
Thread: Solving Multicollinearity Issues

  1. #1
    Anton

    Solving Multicollinearity Issues




    Dear Community,

    Currently I am modeling a multiple regression in Stata for a research project.
    I want to examine the influence of patents/innovation and of output on the prices of batteries over the last 30 years.
    Therefore, I took the logarithm of the inflation-adjusted prices, the cumulative output, and the cumulative patents.
    Each independent variable on its own explains the price decline quite well (R = 0.94 and 0.98, respectively). When I include both as independent variables, R increases to 0.99.
    I tested the correlation of the independent variables, which came out at r = 0.9752.
    Further, I calculated the variance inflation factor; it amounts to 20.

    Therefore, in my opinion it is obvious that I have to deal with multicollinearity. One solution would be to combine both independent variables into a single predictor, but that does not work in my case, as I want to identify the separate impact of each on the price.

    The literature suggests a two-step regression approach where the correlation is removed by using a residual variable. However, I do not understand exactly what to do.

    I would really appreciate your comments and your help.

    Best,

    Anton
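    A quick way to reproduce the diagnostics described above (the pairwise correlation and the VIF) outside Stata is sketched below in Python. The data are synthetic stand-ins for the two logged predictors, since the actual dataset is not posted; in Stata itself, `estat vif` after `regress` reports the same quantity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30  # roughly one observation per year

# Hypothetical stand-ins for log cumulative output (x1)
# and log cumulative patents (x2), built to be highly correlated
x1 = np.linspace(0.0, 5.0, n) + rng.normal(0.0, 0.1, n)
x2 = 0.9 * x1 + rng.normal(0.0, 0.2, n)

# Pairwise correlation of the two predictors
r = np.corrcoef(x1, x2)[0, 1]

# VIF for x2: regress x2 on x1 (with intercept), then VIF = 1 / (1 - R^2)
X = np.column_stack([np.ones(n), x1])
beta, *_ = np.linalg.lstsq(X, x2, rcond=None)
resid = x2 - X @ beta
r2 = 1.0 - resid.var() / x2.var()
vif = 1.0 / (1.0 - r2)

print(f"r = {r:.3f}, VIF = {vif:.1f}")
```

    A VIF well above the usual rule-of-thumb cutoff of 10 confirms that the two predictors carry largely the same information.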

  2. #2
    hlsmith

    Re: Solving Multicollinearity Issues

    I have not heard of this residual-variable approach before; please keep us posted on it. Another option is to do nothing: there would be obvious collinearity that you would report, but the explanatory value of both variables seems high.
    Stop cowardice, ban guns!

  3. #3
    Anton

    Re: Solving Multicollinearity Issues

    Hi hlsmith,

    Thank you very much for replying so quickly. I attached the explanation of the two-step approach, including the given equations, as a picture (so the equations are easier to read).
    Attached Images  

  4. #4
    noetsi

    Re: Solving Multicollinearity Issues

    You should not look at R-squared when adding variables, because it always goes up when you do so; look at adjusted R-squared instead. Other than adding more data or combining variables, there really are no easy solutions for multicollinearity. It has no impact on the actual slopes, only on the tests, through the standard errors. If all you care about is the overall model, not the individual variables, multicollinearity does not matter at all.
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  5. #5
    rogojel

    Re: Solving Multicollinearity Issues

    hi,
    a rough qualitative explanation would go like this: if you have two variables x1 and x2 which are strongly correlated, then including both in the regression means you include the information common to x1 and x2 twice, which is what causes the collinearity problem. To avoid this, you need to make sure the information enters only once, e.g. by including x1 and only that component of x2 that is independent of x1. The way to find that part is to build a regression for x2 using x1 as the predictor and to take the residuals from that regression as the second variable.

    BTW, as noetsi pointed out, the increase in your R-squared is no sign that you need a second variable; R-squared always increases when you include any new variable. The question is whether your adjusted R-squared increases and, if yes, whether this increase is worth complicating the model. Maybe there is a common factor influencing both of your variables, and that factor alone should go into the regression?

    Regards
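    The two-step recipe above can be sketched numerically. This is an illustration on simulated data with made-up coefficients (Python rather than Stata; in Stata it is roughly two `regress` calls plus `predict, residuals` in between).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Simulated predictors: x2 is strongly correlated with x1
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.3 * rng.normal(size=n)
y = 1.0 * x1 + 0.5 * x2 + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients, with an intercept column prepended."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Step 1: regress x2 on x1 and keep the residuals --
# the part of x2 that carries no information about x1
b = ols(x1[:, None], x2)
x2_resid = x2 - (b[0] + b[1] * x1)

# Step 2: regress y on x1 and the residualised x2
beta = ols(np.column_stack([x1, x2_resid]), y)

# By construction the new predictor is uncorrelated with x1
print(np.corrcoef(x1, x2_resid)[0, 1])
```

    One caveat worth keeping in mind: after residualisation, the coefficient on x1 absorbs both its own effect and the shared part of x2's effect, so its interpretation changes, even though the coefficient on the residualised x2 matches the usual multiple-regression coefficient on x2.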

  6. The Following User Says Thank You to rogojel For This Useful Post:

    hlsmith (02-08-2016)

  7. #6
    noetsi

    Re: Solving Multicollinearity Issues

    There is a form of linear regression called hierarchical regression (not to be confused with multilevel models, which some authors confusingly also call this). Rather than adding all the variables at once, which is how the software usually does it, you specify a particular order in which to add variables, based on theory (this is not stepwise regression). When this is done, there are tests (the F-change test, I believe) that tell you whether adding a variable improved the model's ability to predict. I imagine this would be made a lot more difficult by very high multicollinearity, which is a good example of why stepwise selection is not an ideal way to do regression.
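    The F-change test mentioned above can be sketched as a partial F-test on simulated data. This is a generic illustration in Python, not Stata output, and the data-generating numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.3 * rng.normal(size=n)  # highly collinear with x1
y = x1 + 0.5 * x2 + rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares of an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

# Block 1: x1 alone.  Block 2: x1 and x2, added second by design.
rss1 = rss(x1[:, None], y)
rss2 = rss(np.column_stack([x1, x2]), y)

# F-change for adding one predictor to a model that already has
# one predictor plus an intercept (hence n - 3 residual df)
f_change = ((rss1 - rss2) / 1.0) / (rss2 / (n - 3))
print(f"F-change = {f_change:.2f}")
```

    With x1 and x2 this collinear, x2 adds little beyond x1, so the F-change statistic tends to be small; with an independent second predictor of the same effect size it would be much larger.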

  8. #7
    Anton

    Re: Solving Multicollinearity Issues

    Hi rogojel & noetsi,

    Thank you very much for your comments. Over the next few days I will look at these methods and try to apply them to my dataset. I will let you know whether it works out.

    Best,

    Anton

  9. #8
    hlsmith

    Re: Solving Multicollinearity Issues

    Rogojel, that was a nice basic description. I would be interested in seeing a simple worked out example.

  10. #9
    rogojel

    Re: Solving Multicollinearity Issues

    hi hlsmith,
    good idea! I will hopefully work it out this week. In fact, the approach is, in my opinion, a simplified version of first doing a principal component analysis and then applying the regression to the first few principal components.

    regards
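    A minimal sketch of that idea, using SVD-based principal components on simulated collinear predictors (Python, with made-up data-generating values):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.3 * rng.normal(size=n)
y = x1 + 0.5 * x2 + rng.normal(size=n)

# Standardise the predictors and extract principal components via SVD
X = np.column_stack([x1, x2])
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)

# Component scores for each observation; the columns are orthogonal
scores = Xs @ Vt.T

# Share of predictor variance carried by each component
explained = s**2 / np.sum(s**2)

# Regress y on the first component only (plus an intercept)
Z = np.column_stack([np.ones(n), scores[:, 0]])
gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(explained)
```

    When the predictors are this collinear, the first component carries nearly all of their variance, so dropping the second component loses little information while removing the collinearity entirely.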

  11. #10
    hlsmith

    Re: Solving Multicollinearity Issues

    Thanks. Yeah, this seems like it could be shown with a simulation where X2 is just X1 with a little extra variability. The idea makes sense to me and as I stewed on it last night I could kind of remember seeing something on it once in the past. Though, that could just mean you posted a similar reply two years ago and my brain is just trying to remember that!
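    The simulation described above (X2 equal to X1 plus a little extra variability) is easy to run, and it shows the mechanism noetsi pointed out earlier: the slopes stay estimable, but their standard errors blow up. A Python sketch with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100

def slope_se(X, y):
    """OLS coefficient standard errors (intercept column prepended)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return np.sqrt(np.diag(cov))

x1 = rng.normal(size=n)
noise = rng.normal(size=n)

# Collinear case: X2 is X1 plus a little extra variability
x2_coll = x1 + 0.1 * rng.normal(size=n)
# Control case: X2 independent of X1
x2_ind = rng.normal(size=n)

se_coll = slope_se(np.column_stack([x1, x2_coll]), x1 + 0.5 * x2_coll + noise)
se_ind = slope_se(np.column_stack([x1, x2_ind]), x1 + 0.5 * x2_ind + noise)

print(se_coll[1], se_ind[1])  # the collinear SE is many times larger
```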

  12. #11
    Anton

    Re: Solving Multicollinearity Issues

    Over the last few days I worked with the dataset and the equations. I also did extensive research into whether anyone has taken a similar approach, but I could not find papers following a similar method in my area of research (some researchers even ignore the multicollinearity issue).

    I did the modeling and, in the end, Eq. 3/4 yielded plausible forecasts. Even though the results look plausible, I still have some problems with Eq. 1, which models cumulative patent applications (Ti) as a function of the logarithm of annual output.

    Ti in my dataset increases exponentially, not linearly; thus the regression without taking the logarithm of Ti leads, in my case, to a low R. That is actually "wanted", since I use this equation to obtain the residual: if R were 1, there would be no difference between the two independent variables, and introducing a residual variable would make no sense. However, I am wondering whether it is legitimate to take the logarithm of output only (which also increases exponentially) while leaving Ti untransformed.

    What do the experts think?

  13. #12
    hlsmith

    Re: Solving Multicollinearity Issues


    Sorry to sidetrack, but I know collinearity is also sometimes addressed with principal component analysis.
