1. ## Solving Multicollinearity Issues

Dear Community,

I am currently fitting a multiple regression in Stata for a research project.
I want to examine the influence of patents/innovation and output on battery prices over the last 30 years.
To do so, I log-transformed the inflation-adjusted prices, cumulative output and cumulative patents.
Each independent variable on its own explains the price decline quite well (R²: 0.94 and 0.98). When I include both as independent variables, R² increases to 0.99.
I tested the correlation between the independent variables and got r = 0.9752.
I also calculated the variance inflation factor, which amounts to 20.
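
The two figures above can be checked against each other: with exactly two predictors, the variance inflation factor is 1 / (1 − r²), where r is the correlation between them. A minimal Python sketch (the function name is only illustrative, and this shortcut does not hold for three or more predictors):

```python
def vif_two_predictors(r: float) -> float:
    """VIF when the model has exactly two predictors with correlation r."""
    return 1.0 / (1.0 - r ** 2)

r = 0.9752  # correlation reported in the post
vif = vif_two_predictors(r)
print(round(vif, 1))  # prints 20.4
```

So a correlation of 0.9752 and a VIF of roughly 20 are consistent with each other.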

So in my opinion it is obvious that I have to deal with multicollinearity. One solution is to combine both independent variables into a single one. That does not work in my case, as I want to identify the impact of each of them on the price.

The literature suggests a two-step regression approach where the correlation is removed by using a residual variable. However, I do not understand what exactly to do.

Best,

Anton

2. ## Re: Solving Multicollinearity Issues

I have not heard of the residual variable before. Please continue to report on it. Another option is to do nothing. There will be obvious collinearity that you would report, but the explanatory value of both seems high.

3. ## Re: Solving Multicollinearity Issues

Hi hlsmith,

Thank you very much for replying so quickly. I have attached the explanation of the two-step approach, including the equations, as a picture (so the equations are easier to read).

4. ## Re: Solving Multicollinearity Issues

You should not look at R squared when adding variables. It never goes down when you add one. You should look at adjusted R squared instead. Other than adding more data or combining variables, there really are no easy solutions for multicollinearity (MC). It does not bias the slope estimates; it inflates their standard errors, and so it affects the significance tests. If all you care about is the model as a whole, not the individual variables, MC does not matter at all.
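
To make the adjusted R squared comparison concrete, here is a small Python sketch using the R² values from the first post. The sample size `n = 30` is an assumption (one observation per year); the thread never states it. The reported R² values are also rounded, so treat the numbers as illustrative only.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k predictors plus an intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

n = 30  # assumption: one observation per year; not stated in the thread
one_var = adjusted_r2(0.98, n, k=1)  # better of the two single-variable models
two_var = adjusted_r2(0.99, n, k=2)  # model with both variables
print(round(one_var, 3), round(two_var, 3))  # prints 0.979 0.989
```

Under that assumption the adjusted value still rises when the second variable is added, but only by about one percentage point.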

5. ## Re: Solving Multicollinearity Issues

hi,
a rough qualitative explanation would go like this: if you have two variables x1 and x2 which are strongly correlated, then including both in the regression means that you include the information common to x1 and x2 twice, which causes the collinearity problem. To avoid this, you need to make sure the information is included only once, e.g. by including x1 and only that component of x2 that is independent of x1. The way to find that part is to build a regression for x2 using the variable x1 and to take the residuals from that regression as the second variable.
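
The two-step idea described above can be sketched on simulated data. This is not Anton's data, nor the equations from his attachment; the coefficients (-0.5 and -0.3), the noise levels and the helper names are all made up for illustration. It uses only the Python standard library.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def corr(a, b):
    return cov(a, b) / (cov(a, a) * cov(b, b)) ** 0.5

def simple_ols(x, y):
    """Return (intercept, slope) of y regressed on a single x."""
    slope = cov(x, y) / cov(x, x)
    return mean(y) - slope * mean(x), slope

random.seed(1)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [a + random.gauss(0, 0.2) for a in x1]  # x2 is x1 plus a little noise
# Hypothetical "true" model: y depends on both x1 and x2
y = [1.0 - 0.5 * a - 0.3 * b + random.gauss(0, 0.1) for a, b in zip(x1, x2)]

# Step 1: regress x2 on x1 and keep the residuals
a0, a1 = simple_ols(x1, x2)
x2_resid = [b - (a0 + a1 * a) for a, b in zip(x1, x2)]

# Step 2: regress y on x1 and the residual. The two regressors are
# uncorrelated in the sample, so each multiple-regression slope equals
# the corresponding simple-regression slope.
_, b_x1 = simple_ols(x1, y)
_, b_resid = simple_ols(x2_resid, y)

print("corr(x1, x2)       =", round(corr(x1, x2), 3))
print("corr(x1, x2_resid) =", round(corr(x1, x2_resid), 3))
print("slope on x1        =", round(b_x1, 3))
print("slope on x2_resid  =", round(b_resid, 3))
```

One thing to keep in mind when interpreting the result: the slope on the residual is the same as the slope x2 would get in the ordinary two-variable regression, while the slope on x1 now also absorbs the part of the effect that x1 and x2 share (here roughly -0.8 rather than -0.5). The collinearity is gone, but the shared information has been credited entirely to x1.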

BTW, as noetsi pointed out, the increase in your r-squared is no sign that you need a second variable; r-squared never decreases when you include a new variable. The question is whether your adjusted r-squared increases and, if so, whether the increase is worth complicating the model. Maybe there is a common factor influencing both of your variables, and that factor alone should go into the regression?

Regards

6. ## The Following User Says Thank You to rogojel For This Useful Post:

hlsmith (02-08-2016)

7. ## Re: Solving Multicollinearity Issues

There is a form of linear regression called hierarchical regression (not to be confused with multilevel models, which some people confusingly call by the same name). Rather than adding all the variables at once, which is how the software usually does it, you specify the order in which variables are added (based on theory; this is not stepwise regression). When this is done, there is a test (the F change test, I believe) that tells you whether adding the variable improved the model's ability to predict. I imagine this would be a lot more difficult with very high multicollinearity, which is a good example of why stepwise is not an ideal way to do regression.
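
For reference, the usual F change statistic for adding q variables to a model is ((R²_full − R²_reduced) / q) / ((1 − R²_full) / (n − k − 1)), where k is the number of predictors in the full model. A hedged Python sketch with the R² values from the first post follows. The sample size `n = 30` is an assumption, and because the reported R² values are rounded to two decimals, the resulting F is only a rough illustration.

```python
def f_change(r2_reduced: float, r2_full: float, n: int, k_full: int, q: int = 1) -> float:
    """F statistic for the R-squared gain from adding q predictors.

    k_full is the number of predictors in the full model (intercept excluded).
    """
    numerator = (r2_full - r2_reduced) / q
    denominator = (1.0 - r2_full) / (n - k_full - 1)
    return numerator / denominator

n = 30  # assumption: one observation per year; not stated in the thread
# Adding the second variable to the single-variable model with R² = 0.98
f_stat = f_change(r2_reduced=0.98, r2_full=0.99, n=n, k_full=2)
print(round(f_stat, 1))
```

Under these assumptions F comes out at about 27 on 1 and 27 degrees of freedom, which would be compared against the F distribution to get a p-value.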

8. ## Re: Solving Multicollinearity Issues

Hi rogojel & noetsi,

Thank you very much for your comments. Over the next few days I will look at these methods and try to apply them to my dataset. I will let you know whether it worked out.

Best,

Anton

9. ## Re: Solving Multicollinearity Issues

Rogojel, that was a nice basic description. I would be interested in seeing a simple worked-out example.

10. ## Re: Solving Multicollinearity Issues

hi hlsmith,
good idea! I will work it out hopefully this week. In fact the approach is imo a simplified version of doing a principal component analysis first and applying the regression to the first few principal components.
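
As a side note on the principal component connection: for two standardized variables with correlation r, the correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + r and 1 − r, so the share of variance carried by the first component can be read off directly. A small Python sketch using the correlation from the first post (the function name is illustrative, and it assumes r > 0 and standardized variables):

```python
import math

def pca_two_standardized(r: float):
    """Eigenvalues and variance shares of the 2x2 correlation matrix [[1, r], [r, 1]]."""
    eigenvalues = (1.0 + r, 1.0 - r)          # valid ordering for r > 0
    shares = tuple(ev / 2.0 for ev in eigenvalues)
    return eigenvalues, shares

r = 0.9752  # correlation reported in the first post
eigenvalues, shares = pca_two_standardized(r)
# First component: equal weights on both standardized variables
pc1_loadings = (1 / math.sqrt(2), 1 / math.sqrt(2))
print(round(shares[0], 4))  # prints 0.9876
```

In other words, the first component (essentially the average of the two standardized variables) would carry almost 99% of their joint variance, and the second component (their difference) carries the remaining 1%.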

regards

11. ## Re: Solving Multicollinearity Issues

Thanks. Yeah, this seems like it could be shown with a simulation where X2 is just X1 with a little extra variability. The idea makes sense to me and as I stewed on it last night I could kind of remember seeing something on it once in the past. Though, that could just mean you posted a similar reply two years ago and my brain is just trying to remember that!
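
A simulation along those lines might look like the sketch below, where x2 is x1 plus some noise and the two-variable regression is refit on many simulated samples. All numbers (the slopes -0.5 and -0.3, the noise levels, `n = 30`, 500 repetitions) are made up for illustration. It compares how much the estimated slope on x1 jumps around when x2 is almost a copy of x1 versus when it is only loosely related.

```python
import random
import statistics

def cov(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def ols_two(x1, x2, y):
    """Slopes (b1, b2) of y regressed on x1 and x2 with an intercept."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def simulate_b1(noise_sd, reps=500, n=30, seed=0):
    """Mean and standard deviation of the x1 slope over repeated samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        x2 = [a + rng.gauss(0, noise_sd) for a in x1]  # x2 = x1 + extra variability
        y = [-0.5 * a - 0.3 * b + rng.gauss(0, 0.5) for a, b in zip(x1, x2)]
        estimates.append(ols_two(x1, x2, y)[0])
    return statistics.fmean(estimates), statistics.stdev(estimates)

mean_hi, sd_hi = simulate_b1(noise_sd=0.2)  # strongly collinear (corr about 0.98)
mean_lo, sd_lo = simulate_b1(noise_sd=2.0)  # weakly collinear (corr about 0.45)
print("strong collinearity: mean b1 =", round(mean_hi, 2), " sd =", round(sd_hi, 2))
print("weak collinearity:   mean b1 =", round(mean_lo, 2), " sd =", round(sd_lo, 2))
```

In both settings the average estimate should sit near the true slope of -0.5, but the spread is several times larger in the strongly collinear case. That matches the earlier point that multicollinearity affects the standard errors rather than biasing the slopes.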

12. ## Re: Solving Multicollinearity Issues

Over the last few days I worked with the dataset and the equations. I also searched extensively to see whether anyone had taken a similar approach. However, I could not find papers following a similar method in my area of research (some researchers even ignore the multicollinearity issue).

I did the modeling and in the end, Eq. 3/4 yielded plausible forecasts. Even though the results look plausible, I still have some problems with Eq. 1, which models cumulative patent applications (Ti) as a function of the logarithmized annual output.

Ti in my dataset increases exponentially, not linearly, so the regression without log-transforming Ti gives a low R² in my case. That is actually "wanted", since I use this equation to obtain the residual. If R² were 1, there would be no difference between the two independent variables, and introducing a residual variable would not make sense. However, I am wondering whether it is acceptable to log-transform only the output (which also increases exponentially) and not Ti.

What do the experts think?

13. ## Re: Solving Multicollinearity Issues

Sorry to sidetrack, but I know collinearity is also sometimes addressed with Principal Component Analysis.
