B value in regression analysis

#1
Hi all,

Please bear with me here. I am new to the forum and am only just learning about statistics and regression analysis.

I have a question about interpreting regression coefficients in multiple regression models. One of my variables comes out statistically significant, but its sign is the wrong way around: it should have a positive relationship with the dependent variable, yet its coefficient comes out negative. Logically this makes no sense. The variable in question is the number of vehicles on a road section. Road surface deterioration is expected to increase as the number of vehicles rises, but the regression model picks it up as a negative factor.

How do I get the model to recognise this variable as a positive factor?

Thanks in advance.
 

obh

Active Member
#2
Hi Engineer,

I would guess that there is some multicollinearity in your model.
There may be another variable that is correlated with the "number of vehicles" predictor.
If you remove that other variable, you may get a positive coefficient.

For example, the "material" used on the road, or the "technique".

You can use the following: http://www.statskingdom.com/410multi_linear_regression.html
It also checks the multicollinearity level (but don't let any automatic process choose your variables).
Look at the validation and at the VIF result.
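
To see how a correlated predictor can flip a sign, here is a minimal sketch in plain Python. The numbers are hand-built toy values, not the road data from this thread, and the names `vehicles` and `other` are hypothetical. Both predictors are given a true effect of +1, but because `other` is almost a copy of `vehicles`, a little noise is enough to push the joint estimate for `vehicles` to about -2, while `vehicles` on its own gets a slope of about +2.

```python
# Toy data, hand-built so the arithmetic is easy to follow.
# NOT the road data from this thread; all names and values are assumptions.
vehicles = [-2.0, -1.0, 0.0, 1.0, 2.0]
wiggle = [1.0, -1.0, 0.0, -1.0, 1.0]   # small part of "other" unrelated to vehicles
other = [v + 0.1 * w for v, w in zip(vehicles, wiggle)]
noise = [0.5, -0.7, 0.0, 0.1, 0.1]
# True relationship: both predictors have a POSITIVE effect (+1 each).
y = [v + o + e for v, o, e in zip(vehicles, other, noise)]


def centred(xs):
    """Subtract the mean so the intercept drops out of the slope formulas."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def simple_slope(x, y):
    """Slope of y regressed on x alone (with intercept)."""
    xc, yc = centred(x), centred(y)
    return dot(xc, yc) / dot(xc, xc)


def two_predictor_slopes(x1, x2, y):
    """Slopes of y regressed on x1 and x2 together (with intercept)."""
    a, b, yc = centred(x1), centred(x2), centred(y)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, yc), dot(b, yc)
    det = s11 * s22 - s12 ** 2          # close to zero when x1 and x2 are collinear
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


slope_alone = simple_slope(vehicles, y)
slope_vehicles, slope_other = two_predictor_slopes(vehicles, other, y)
print(f"vehicles alone:        {slope_alone:+.2f}")
print(f"vehicles (with other): {slope_vehicles:+.2f}")
print(f"other (with vehicles): {slope_other:+.2f}")
```

The near-zero `det` is the whole story: the two predictors carry almost the same information, so the model cannot split the effect between them reliably.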
 
#3
Hi,
Thank you very much for your comment.

I have now calculated the VIF for each of my predictor variables. The maximum is 18.86 for 'Lane', a categorical variable that indicates whether the measurement was taken on the northbound or southbound side of the road section. The next highest are 10.98 for 'number of cars AM' and 11.11 for 'number of cars PM'. When there are more cars southbound in the AM period, there are fewer cars northbound at the same time; it is the other way around in the PM period. So I assume these variables are related to one another, which would cause the multicollinearity. But I have no idea how to account for this, or which variables to include in or exclude from the model.

Any suggestions would be much appreciated.

Thanks in advance.
 

obh

Active Member
#4
This is multicollinearity.

It would be better if you showed the results, or at least the list of predictors and their VIFs.

So maybe, for example, there is multicollinearity between "Lane" and the "number of vehicles", so the positive effect of the "number of vehicles" on the "deterioration" has already entered the model through "Lane".

If you are interested in the model itself and want to know how each IV influences the DV, you need to do something.
1. Remove the predictor with the highest VIF, "Lane", but also use your common sense.
From my limited understanding of roads, "Lane" itself shouldn't influence the "deterioration"; the "number of vehicles" should.
So if you didn't have the "number of vehicles" you could use "Lane", but since you do have it, it doesn't make sense to keep "Lane" in the model.

2. You may decide that "Lane" has an influence in another way. Maybe the deterioration is influenced by the peak number of vehicles, as the peak number of vehicles increases the heat on the road, and heat speeds up chemical reactions exponentially ... or any other, better reason...
In this case, you need to check whether you can collect more data that will break the multicollinearity.
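
As a rough illustration of how the VIFs can be checked before and after dropping "Lane", here is a sketch in plain Python. The counts are made up (they are not the data from this thread), the column names `lane`, `cars_am` and `cars_pm` are hypothetical, and `r_squared` and `vif_table` are small helpers written for this example rather than functions from any library. The toy data only mimics the pattern described above: heavy AM traffic southbound, heavy PM traffic northbound.

```python
# Made-up counts for 8 road sections. NOT the data from this thread.
# lane: 0 = northbound, 1 = southbound (hypothetical coding).
data = {
    "lane":    [0, 0, 0, 0, 1, 1, 1, 1],
    "cars_am": [200, 220, 210, 230, 600, 640, 620, 610],
    "cars_pm": [610, 650, 600, 630, 210, 230, 220, 240],
}


def centred(xs):
    m = sum(xs) / len(xs)
    return [x - m for x in xs]


def r_squared(y, predictors):
    """R^2 of an OLS fit of y on the predictors (intercept handled by centring)."""
    yc = centred(y)
    cols = [centred(p) for p in predictors]
    k = len(cols)
    # Augmented normal equations [X'X | X'y] on the centred columns.
    a = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols]
         + [sum(p * q for p, q in zip(ci, yc))] for ci in cols]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [vr - f * vi for vr, vi in zip(a[r], a[i])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    resid = [yt - sum(b * c[t] for b, c in zip(beta, cols))
             for t, yt in enumerate(yc)]
    return 1.0 - sum(e * e for e in resid) / sum(v * v for v in yc)


def vif_table(columns):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the others."""
    table = {}
    for name, values in columns.items():
        others = [v for other, v in columns.items() if other != name]
        table[name] = 1.0 / (1.0 - r_squared(values, others))
    return table


vif_all = vif_table(data)
vif_without_lane = vif_table({k: v for k, v in data.items() if k != "lane"})
for name, value in vif_all.items():
    print(f"{name:8s} VIF with lane:    {value:8.1f}")
for name, value in vif_without_lane.items():
    print(f"{name:8s} VIF without lane: {value:8.1f}")
```

In this toy data the VIFs drop once `lane` is removed, but the AM and PM counts are still strongly correlated with each other, so their VIFs stay well above 10. Whether the same happens with the real measurements can only be seen by rerunning the check on them.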

Maybe, for example, the best predictor would be the "number of vehicles in peak hour", and both IVs, "Lane" and the "number of vehicles", correlate with this predictor.

If you are interested only in the DV and not in the coefficients, you probably shouldn't do anything (see the following example: http://www.talkstats.com/threads/multicollinearity-in-regression.73292/#post-213213).
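
A small sketch of why that can be reasonable, again with hand-built toy numbers rather than the road data (the names `vehicles` and `other` are hypothetical). Both true effects are +1. The estimated coefficients end up far from that, yet the fitted values stay close to the true signal, because the errors in the two coefficients largely cancel when the predictors move together.

```python
# Hand-built toy data; NOT the road data from this thread.
vehicles = [-2.0, -1.0, 0.0, 1.0, 2.0]
other = [-1.9, -1.1, 0.0, 0.9, 2.1]       # almost a copy of vehicles
noise = [0.5, -0.7, 0.0, 0.1, 0.1]
signal = [v + o for v, o in zip(vehicles, other)]   # true effects: +1 and +1
y = [s + e for s, e in zip(signal, noise)]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


# Every column here already has mean zero, so the fitted intercept is zero
# and the two-predictor OLS slopes can be computed directly.
s11, s22, s12 = dot(vehicles, vehicles), dot(other, other), dot(vehicles, other)
s1y, s2y = dot(vehicles, y), dot(other, y)
det = s11 * s22 - s12 ** 2
b_vehicles = (s22 * s1y - s12 * s2y) / det
b_other = (s11 * s2y - s12 * s1y) / det

fitted = [b_vehicles * v + b_other * o for v, o in zip(vehicles, other)]
coef_error = max(abs(b_vehicles - 1.0), abs(b_other - 1.0))
fit_error = max(abs(f - s) for f, s in zip(fitted, signal))

print(f"coefficients: vehicles={b_vehicles:+.2f}, other={b_other:+.2f}")
print(f"largest coefficient error: {coef_error:.2f}")
print(f"largest error in fitted values: {fit_error:.2f}")
```

Here the coefficients are off by as much as 3 while the fitted values are within 0.3 of the true signal. This only carries over to new predictions if the new data has the same correlation pattern between the predictors.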

PS @Karabiner, per my understanding bold is not shouting, it only helps me to read. If you want to shout, try SHOUT :)