Dear Community,
Currently I am modeling a multiple regression for a research project with Stata.
I want to examine the influence of patents/innovation and output on the prices of batteries over the last 30 years.
Therefore, I log-transformed the inflation-adjusted prices, the cumulative output, and the cumulative patent count.
Each independent variable on its own explains the price decline quite well (R² = 0.94 and 0.98, respectively). When I include both as independent variables, R² increases to 0.99.
I tested the correlation of the independent variables; it is r = 0.9752.
Further, I calculated the variance inflation factor. It amounts to 20.
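(As a quick sanity check on those numbers: with only two predictors, the VIF follows directly from their correlation via VIF = 1/(1 − r²). A minimal sketch, using nothing but the reported r:)

```python
# With two predictors, VIF = 1 / (1 - R^2), where R^2 comes from regressing
# one predictor on the other; for just two variables, R^2 is simply r^2.
r = 0.9752                  # reported correlation of the two predictors
vif = 1.0 / (1.0 - r ** 2)
print(round(vif, 1))        # about 20.4, matching the reported VIF of 20
```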
Therefore, in my opinion it is obvious that I have to address multicollinearity. One solution is to combine both independent variables into a single one. That does not work in my case, as I want to identify the separate impact of each on the price.
The literature suggests a two-step regression approach where the correlation is removed by using a residual variable. However, I do not understand what to do exactly.
I would really appreciate your comments and your help.
Best,
Anton
I have not heard of the residual variable before. Please continue to report on it. Another option is to do nothing. There will be obvious collinearity that you would report, but the explanatory value of both seems high.
Stop cowardice, ban guns!
Hi hlsmith,
Thank you very much for replying so quickly. I attached the explanation of the two-step approach, including the given equations, as a picture (to make the equations easier to read).
You should not look at R² when adding variables: it always goes up when you do so. Look at adjusted R² instead. Other than adding more data and combining variables, there really are no easy solutions for multicollinearity. It has no impact on the actual slope estimates, just on the tests, through the inflated standard errors. If all you care about is your model's predictions, not the individual variables, multicollinearity does not matter at all.
"Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995
hi,
a rough qualitative explanation would go like this: if you have two variables x1 and x2 which are strongly correlated, then including both in the regression means that you include the information common to x1 and x2 twice, which is what causes the collinearity problem. To avoid this, you need to make sure to include that information only once, e.g. by including x1 and only the component of x2 that is independent of x1. The way to find that part is to build a regression for x2 using the variable x1 and to take the residuals from that regression as the second variable.
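(A minimal numerical sketch of this residualization, with all data simulated and all numbers hypothetical, using plain NumPy rather than Stata:)

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: x2 is x1 plus a little extra variability, so the
# two predictors are strongly correlated (hypothetical numbers).
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.2 * rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

# Step 1: regress x2 on x1 and keep the residuals -- the part of x2
# that is independent of x1.
X1 = np.column_stack([np.ones(n), x1])
b = np.linalg.lstsq(X1, x2, rcond=None)[0]
x2_resid = x2 - X1 @ b

# Step 2: use x1 and the residualized x2 in the main regression.
# By construction, x2_resid is orthogonal to x1.
X = np.column_stack([np.ones(n), x1, x2_resid])
coef = np.linalg.lstsq(X, y, rcond=None)[0]

print(np.corrcoef(x1, x2_resid)[0, 1])  # essentially zero
print(coef)
```

Note the interpretation shift: the coefficient on the residual equals the usual coefficient on x2, while the coefficient on x1 now absorbs the shared component (here roughly 2 + 3·1 = 5 instead of 2), so only the residual's coefficient keeps its original meaning.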
BTW, as noetsi pointed out, the increase in your R² is no sign that you need a second variable; R² always increases when you include any new variable. The question is whether your adjusted R² increases and, if yes, whether this increase is worth complicating the model. Maybe there is a common factor influencing both of your variables, and that factor alone should go into the regression?
Regards
There is a form of linear regression called hierarchical regression (not to be confused with multilevel models, which some confusingly give the same name). Rather than adding all the variables at once, which is how the software usually does it, you specify a specific order in which to add variables, based on theory (this is not stepwise regression). When this is done, there are tests (the F-change test, I believe) that tell you whether adding a variable improved the model's ability to predict. I imagine this becomes a lot more difficult with very high multicollinearity, which is a good example of why stepwise is not an ideal way to do regression.
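(The F-change test for nested models can be computed by hand from the residual sums of squares of the two fits; a sketch on simulated, hypothetical data:)

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: does adding x2 improve prediction beyond x1?
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.3 * rng.normal(size=n)        # collinear with x1
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares of an OLS fit."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return resid @ resid

ones = np.ones(n)
rss_reduced = rss(np.column_stack([ones, x1]), y)      # step 1: x1 only
rss_full = rss(np.column_stack([ones, x1, x2]), y)     # step 2: x1 + x2

# F-change statistic for the q = 1 added predictor; compare against
# an F(q, n - p_full) reference distribution.
q, p_full = 1, 3
f_change = ((rss_reduced - rss_full) / q) / (rss_full / (n - p_full))
print(round(f_change, 1))  # a large value means x2 adds predictive value
```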
Hi rogojel & noetsi,
Thank you very much for your comments. The next days I will look at these methods and try to apply them to my dataset. I will let you know if it worked out.
Best,
Anton
Rogojel, that was a nice basic description. I would be interested in seeing a simple worked-out example.
hi hlsmith,
good idea! I will hopefully work it out this week. In fact, the approach is, imo, a simplified version of doing a principal component analysis first and applying the regression to the first few principal components.
regards
Thanks. Yeah, this seems like it could be shown with a simulation where X2 is just X1 with a little extra variability. The idea makes sense to me and as I stewed on it last night I could kind of remember seeing something on it once in the past. Though, that could just mean you posted a similar reply two years ago and my brain is just trying to remember that!
Over the last few days I worked with the dataset and the equations. I also searched extensively for anyone who has taken a similar approach; however, I could not find papers following a similar method in my area of research (some researchers even ignore the multicollinearity issue).
I did the modeling and, in the end, Eqs. 3/4 yielded plausible forecasts. Even though the results look plausible, I still have some problems with Eq. 1, which models cumulative patent applications (Ti) as a function of the log-transformed annual output.
Ti in my dataset increases exponentially, not linearly; thus the regression without log-transforming Ti leads in my case to a low R², which is actually "wanted", as I use this equation to obtain the residual. If R² were 1, there would be no difference between the two independent variables, and introducing a residual variable would not make sense. However, I am wondering whether it is possible to log-transform only the output (which also increases exponentially) while not log-transforming Ti.
What do the experts think?
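(One thing to keep in mind: if Ti really grows exponentially, leaving it in levels misspecifies the functional form, so the resulting low R² mixes specification error with genuinely independent variation, rather than isolating the "unique" information in Ti. A small simulated sketch, with entirely hypothetical numbers, of how much the fit depends on that choice:)

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical series: cumulative patents Ti grow exponentially,
# with multiplicative noise, in log(cumulative output).
n = 30
log_output = np.linspace(0.0, 3.0, n)
Ti = np.exp(1.0 + 1.2 * log_output + 0.1 * rng.normal(size=n))

def r_squared(x, y):
    """R^2 of a simple OLS fit of y on an intercept and x."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

r2_levels = r_squared(log_output, Ti)        # Ti in levels: misses curvature
r2_logs = r_squared(log_output, np.log(Ti))  # log(Ti): nearly linear fit
print(r2_levels, r2_logs)
```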
Sorry to sidetrack, but I know collinearity is also sometimes addressed with principal component analysis.
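(A minimal sketch of that idea, principal component regression on simulated, hypothetical data: with two nearly collinear predictors, the first component carries almost all of the predictor variance, so regressing on it alone avoids the collinearity:)

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical collinear predictors, as in the thread: x2 is x1 plus noise.
n = 150
x1 = rng.normal(size=n)
x2 = x1 + 0.2 * rng.normal(size=n)
y = 0.5 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

# Principal components of the centered predictor matrix via SVD.
X = np.column_stack([x1, x2])
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                     # principal component scores

# Share of predictor variance carried by the first component;
# it is close to 1 when the predictors are nearly collinear.
share = s[0] ** 2 / (s ** 2).sum()

# Principal component regression: regress y on the first component only.
Z = np.column_stack([np.ones(n), scores[:, 0]])
gamma = np.linalg.lstsq(Z, y, rcond=None)[0]
print(round(share, 3))
```

The trade-off is interpretability: the coefficient on a component mixes the original variables, which is why the residualization approach above may suit this thread's goal (separating the two effects) better.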