
Thread: Stepwise Regression Limitations Explanation

  1. #1
    Points: 14, Level: 1
    Level completed: 27%, Points required for next Level: 36

    Posts
    2
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Stepwise Regression Limitations Explanation




    I've recently been working on building a model and have come across a number of different approaches. I'm particularly interested in the limitations of stepwise regression: it attracts a huge amount of criticism online, but I can't find much material detailing why it's a poor method to use. Specifically, I've seen the following claims:

    - R-squared values are biased too high.
    - p-values are too low due to multiple comparisons.
    - Parameter estimates are biased high.

    Could someone please explain briefly how stepwise regression gives rise to the problems above?

  2. #2
    Omega Contributor
    Points: 38,396, Level: 100
    Level completed: 0%, Points required for next Level: 0
    hlsmith's Avatar
    Location
    Not Ames, IA
    Posts
    7,001
    Thanks
    398
    Thanked 1,186 Times in 1,147 Posts

    Re: Stepwise Regression Limitations Explanation

    "p-values are too low due to multiple comparisons": well, stepwise fits more than one model, so you may be inclined to correct for false discoveries. If I throw all the variables into a model and then do that over and over, I run the risk of finding spurious correlations.

    In general, if you are not using knowledge of the context to guide which variables go into the model, you can end up with the problems listed above.





    The issues behind your first and last critiques seem similar.
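
    The spurious-correlation risk described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the outcome and all 20 candidate predictors are pure noise, the function name `simulate_noise_screening` is made up for illustration, and 2.011 is the two-sided 5% critical t value for 48 degrees of freedom. Picking the best of many candidates is the first step of a forward selection.

```python
import math
import random


def corr(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_noise_screening(n=50, n_predictors=20, reps=500, t_crit=2.011, seed=2):
    """Screen pure-noise predictors and keep the best one in each dataset.

    Returns (share of datasets where the best predictor looks significant,
             mean R-squared of the best predictor,
             mean R-squared of one pre-specified predictor).
    """
    rng = random.Random(seed)
    hits = 0
    best_r2_sum = 0.0
    first_r2_sum = 0.0
    for _rep in range(reps):
        y = [rng.gauss(0, 1) for _i in range(n)]
        r2_values = []
        for _j in range(n_predictors):
            x = [rng.gauss(0, 1) for _i in range(n)]
            r2_values.append(corr(x, y) ** 2)
        best = max(r2_values)
        # t statistic of a simple regression slope, written in terms of r^2
        t_stat = math.sqrt(best * (n - 2) / (1.0 - best))
        if t_stat > t_crit:
            hits += 1
        best_r2_sum += best
        first_r2_sum += r2_values[0]
    return hits / reps, best_r2_sum / reps, first_r2_sum / reps


if __name__ == "__main__":
    rate, best_r2, first_r2 = simulate_noise_screening()
    print("share of noise datasets with a 'significant' best predictor:", round(rate, 2))
    print("mean R^2 of the selected predictor:", round(best_r2, 3))
    print("mean R^2 of a pre-specified predictor:", round(first_r2, 3))
```

    Although nothing here is related to the outcome, the selected predictor passes the nominal 5% test far more often than 5% of the time, and its R-squared is well above that of a predictor chosen in advance. That is the sense in which both the p-values and the R-squared are too optimistic after selection.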
    Stop cowardice, ban guns!

  3. #3
    Fortran must die
    Points: 58,790, Level: 100
    Level completed: 0%, Points required for next Level: 0
    noetsi's Avatar
    Posts
    6,532
    Thanks
    692
    Thanked 915 Times in 874 Posts

    Re: Stepwise Regression Limitations Explanation

    The basic problem with stepwise is that it relies heavily on chance. The variables it selects in one survey may be totally different from those it selects in another. There is also a serious risk of misspecification when variables are correlated with each other as well as with the DV: if stepwise excludes one and includes the other, misspecification will occur.

    The decision about the last variable included or the first excluded can be particularly unreliable. Stepwise gets blasted by statisticians. I once read a chapter entitled "Death to Stepwise: Think for Yourself".
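
    The dependence on chance can be sketched with two strongly correlated predictors that matter equally. This is an illustration under assumptions, not anyone's actual data: `first_pick_share` is a hypothetical helper, both predictors share a common component `z`, and the "first forward step" is taken to be the predictor with the larger absolute correlation with the DV.

```python
import math
import random


def corr(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_pick_share(n=40, reps=500, seed=3):
    """Share of repeated samples in which x1 (rather than x2) enters first."""
    rng = random.Random(seed)
    picked_x1 = 0
    for _rep in range(reps):
        z = [rng.gauss(0, 1) for _i in range(n)]
        # x1 and x2 are noisy copies of the same underlying variable
        x1 = [zi + 0.3 * rng.gauss(0, 1) for zi in z]
        x2 = [zi + 0.3 * rng.gauss(0, 1) for zi in z]
        # both predictors have the same true effect on y
        y = [a + b + rng.gauss(0, 2) for a, b in zip(x1, x2)]
        if abs(corr(x1, y)) > abs(corr(x2, y)):
            picked_x1 += 1
    return picked_x1 / reps


if __name__ == "__main__":
    print("share of samples where x1 is entered first:", first_pick_share())
```

    Because the two predictors are interchangeable in the population, which one is entered first is close to a coin flip from sample to sample, so the "selected model" is not stable across surveys.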
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  4. #4
    Omega Contributor
    Points: 38,396, Level: 100
    Level completed: 0%, Points required for next Level: 0
    hlsmith's Avatar
    Location
    Not Ames, IA
    Posts
    7,001
    Thanks
    398
    Thanked 1,186 Times in 1,147 Posts

    Re: Stepwise Regression Limitations Explanation

    noetsi has an interesting point about collinearity among the predictors. Perhaps two variables overlap in predicting the outcome. The model may grab the better of the two, but the second never gets a fair shake at being included, since the analyst assumes it is not significant. Yet the second variable may be the better predictor to use in practice (cheaper/easier to collect, etc.), even though the first trumps it.

  5. #5
    TS Contributor
    Points: 12,227, Level: 72
    Level completed: 45%, Points required for next Level: 223
    rogojel's Avatar
    Location
    I work in Europe, live in Hungary
    Posts
    1,470
    Thanks
    160
    Thanked 332 Times in 312 Posts

    Re: Stepwise Regression Limitations Explanation

    But one should check the VIF and, if it is large, take the appropriate steps, right? IMO this is not a good argument against stepwise.

    regards

  6. #6
    Omega Contributor
    Points: 38,396, Level: 100
    Level completed: 0%, Points required for next Level: 0
    hlsmith's Avatar
    Location
    Not Ames, IA
    Posts
    7,001
    Thanks
    398
    Thanked 1,186 Times in 1,147 Posts

    Re: Stepwise Regression Limitations Explanation

    Partially agree, since you can set the criteria used in stepwise regression, so you may feasibly be able to control for this during the process. But I would guess many folks who use it miss this step, especially since the VIF check typically comes from running a non-stepwise model.


    So you fit a regression to get the VIFs, then run stepwise afterwards and drop or otherwise address potential collinearity concerns. Seems roundabout. Plus, what about the other assumptions for model fit or appropriateness? Stepwise will not tell you that you have high-leverage points, etc.; it is automated, with only basic criteria to fulfil.
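
    For reference, the VIF check discussed above is simple to compute by hand in the two-predictor case, where the VIF of either predictor reduces to 1 / (1 - r^2) with r the correlation between the two. A minimal sketch, with `vif_two_predictors` as a made-up helper name and simulated data standing in for real covariates (with more than two predictors, r^2 is replaced by the R-squared from regressing each predictor on all the others):

```python
import math
import random


def corr(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model contains exactly x1 and x2."""
    r = corr(x1, x2)
    return 1.0 / (1.0 - r * r)


if __name__ == "__main__":
    rng = random.Random(0)
    a = [rng.gauss(0, 1) for _ in range(200)]
    b = [ai + 0.3 * rng.gauss(0, 1) for ai in a]  # nearly a copy of a
    c = [rng.gauss(0, 1) for _ in range(200)]     # unrelated to a
    print("VIF for the collinear pair:", round(vif_two_predictors(a, b), 1))
    print("VIF for the unrelated pair:", round(vif_two_predictors(a, c), 2))
```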

  7. #7
    TS Contributor
    Points: 18,889, Level: 87
    Level completed: 8%, Points required for next Level: 461
    CowboyBear's Avatar
    Location
    New Zealand
    Posts
    2,062
    Thanks
    121
    Thanked 427 Times in 328 Posts

    Re: Stepwise Regression Limitations Explanation

    To me, bias is the biggest problem here. The bias occurs because you intentionally throw out variables that are non-significant.

    Imagine the following scenario. You have a variable, X1, that (in the population), actually has a moderate effect size. You also have some other variables, X2-X5, which you would consider as part of your model. (But we'll focus on X1).

    Now imagine we conducted repeated studies, each time randomly drawing a sample from the population and estimating a regression model. The (estimated) sample coefficient for X1 would vary: Sometimes it will be smaller than the true parameter value, and sometimes larger. Make sense?

    Importantly, the cases when the sample coefficient for X1 is smaller will also tend to be the cases when the coefficient is not statistically significant.

    If, each time we collected a sample, we used stepwise regression to exclude non-significant predictors, we would tend to systematically exclude all the instances in which the effect of X1 is relatively small.

    Across stepwise models estimated on repeated samples, the average estimate of the effect of X1 will therefore be larger than the true parameter value of X1. In other words, the sample coefficient based on a stepwise regression is biased.

    (NB: If using SPSS, this problem occurs regardless of whether you actually click "stepwise" "forward" or "backward" selection in SPSS - all are broadly stepwise methods. Further, it also applies if you manually exclude predictors based on their p values).
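
    The repeated-sampling argument above can be checked with a short simulation. This is a sketch under stated assumptions rather than a full stepwise procedure: there is a single predictor with a true slope of 0.3, the "selection" step simply keeps the estimate when it is significant at the 5% level, the name `simulate_selection_bias` is hypothetical, and 2.048 is the two-sided 5% critical t value for 28 degrees of freedom.

```python
import math
import random


def slope_and_t(x, y):
    """OLS slope of y on x (with intercept) and its t statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se


def simulate_selection_bias(true_beta=0.3, n=30, reps=2000, t_crit=2.048, seed=1):
    """Compare the average slope over all samples with the average over
    only those samples in which the slope is statistically significant."""
    rng = random.Random(seed)
    all_estimates = []
    kept_estimates = []
    for _rep in range(reps):
        x = [rng.gauss(0, 1) for _i in range(n)]
        y = [true_beta * xi + rng.gauss(0, 1) for xi in x]
        slope, t_stat = slope_and_t(x, y)
        all_estimates.append(slope)
        if abs(t_stat) > t_crit:  # the predictor survives the selection step
            kept_estimates.append(slope)
    mean_all = sum(all_estimates) / len(all_estimates)
    mean_kept = sum(kept_estimates) / len(kept_estimates)
    return mean_all, mean_kept, len(kept_estimates) / reps


if __name__ == "__main__":
    mean_all, mean_kept, share_kept = simulate_selection_bias()
    print("true slope: 0.3")
    print("mean estimate over all samples:", round(mean_all, 3))
    print("mean estimate when the predictor is retained:", round(mean_kept, 3))
    print("share of samples in which it is retained:", round(share_kept, 2))
```

    Averaged over every sample, the estimate sits close to the true value; averaged over only the samples where the predictor survives selection, it is noticeably larger. The estimator itself is fine; it is the conditioning on significance that produces the bias.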

  8. #8
    Omega Contributor
    Points: 38,396, Level: 100
    Level completed: 0%, Points required for next Level: 0
    hlsmith's Avatar
    Location
    Not Ames, IA
    Posts
    7,001
    Thanks
    398
    Thanked 1,186 Times in 1,147 Posts

    Re: Stepwise Regression Limitations Explanation

    CB, I like your post, but I am unsure whether a person wouldn't exclude that variable even with a non-automated approach. I agree that, in the arena of publication bias or publishing in general, we are more likely to see those samples with greater effects; the standard error then comes into play somewhat to remind us of the distribution of effects.

    Side note: stepwise typically has inclusion/exclusion criteria to help catch those small-effect variables.

  9. #9
    TS Contributor
    Points: 18,889, Level: 87
    Level completed: 8%, Points required for next Level: 461
    CowboyBear's Avatar
    Location
    New Zealand
    Posts
    2,062
    Thanks
    121
    Thanked 427 Times in 328 Posts

    Re: Stepwise Regression Limitations Explanation

    Quote Originally Posted by hlsmith View Post
    CB, I like your post, but I am unsure whether a person wouldn't exclude that variable even with a non-automated approach. I agree that, in the arena of publication bias or publishing in general, we are more likely to see those samples with greater effects
    Yes. Bias happens whether the selection is automated, whether predictors are binned manually based on p values, or whether papers aren't published because of "insignificant" results. Preregistration + predetermined analyses + publishing whatever you study (in some format) is one better option.

  10. The Following User Says Thank You to CowboyBear For This Useful Post:

    hlsmith (05-04-2016)

  11. #10
    Points: 14, Level: 1
    Level completed: 27%, Points required for next Level: 36

    Posts
    2
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Stepwise Regression Limitations Explanation

    Thank you for all of your responses, they're greatly appreciated!

  12. #11
    TS Contributor
    Points: 6,786, Level: 54
    Level completed: 18%, Points required for next Level: 164

    Location
    Sweden
    Posts
    524
    Thanks
    44
    Thanked 112 Times in 100 Posts

    Re: Stepwise Regression Limitations Explanation

    I must say that stepwise regression has its place in an analyst's toolbox. Not so much in the toolbox of an academic who runs experimental trials, for example. But in, say, ecommerce companies with lots and lots of data on customer purchases and online behaviour, I'd say that it is greatly advantageous to use stepwise regression.

    I once built a predictive model for a large Swedish company that wanted to predict the probability that a customer would not place an order within one year. I had a couple of hundred variables to work with, and I could identify a dozen that would surely have an impact on the DV.

    Within many fields, you don't care if you've included the 'correct' IVs; the only thing you care about is whether the model gives accurate predictions. And if you can assure that your model can do that - why not stepwise?

    So what I did was use stepwise regression to find the 'best' model (or in other words, a good model) based on the lowest out-of-sample validation error. I did this for three different time periods and then averaged the predictions from the three models. Because the models are built on three different time periods, the variables that do not affect the DV are expected to cancel out when the predicted probabilities are averaged.

    TL;DR - Stepwise regression can be useful if you don't care whether you accidentally include variables that do not affect the DV. Or in other words: if you only care about the model's predictive capability.
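
    The idea of choosing variables by out-of-sample error, as opposed to p-values, can be sketched roughly as follows. This is not the model described above: it uses ordinary least squares on simulated data instead of a churn model, a single hold-out set instead of three time periods, and every function name here is hypothetical. A predictor is added only while it lowers the validation error.

```python
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y, cols):
    """Least-squares coefficients (intercept first) for the predictors in cols."""
    design = [[1.0] + [row[j] for j in cols] for row in rows]
    k = len(cols) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def mse(rows, y, cols, coef):
    """Mean squared prediction error of a fitted model on the given data."""
    preds = [coef[0] + sum(c * row[j] for c, j in zip(coef[1:], cols)) for row in rows]
    return sum((p - yi) ** 2 for p, yi in zip(preds, y)) / len(y)


def forward_select(train_x, train_y, valid_x, valid_y):
    """Forward selection that stops when validation error no longer improves."""
    chosen = []
    best_err = mse(valid_x, valid_y, [], fit_ols(train_x, train_y, []))
    remaining = set(range(len(train_x[0])))
    while remaining:
        trials = []
        for j in sorted(remaining):
            cols = chosen + [j]
            coef = fit_ols(train_x, train_y, cols)
            trials.append((mse(valid_x, valid_y, cols, coef), j))
        err, j = min(trials)
        if err >= best_err:
            break
        chosen.append(j)
        remaining.remove(j)
        best_err = err
    return chosen, best_err


def demo(seed=4, n=200, p=10):
    """Simulated data: only predictors 0 and 1 affect y; the rest are noise."""
    rng = random.Random(seed)

    def draw(count):
        xs = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(count)]
        ys = [2.0 * r[0] - 1.5 * r[1] + rng.gauss(0, 1) for r in xs]
        return xs, ys

    train_x, train_y = draw(n)
    valid_x, valid_y = draw(n)
    return forward_select(train_x, train_y, valid_x, valid_y)


if __name__ == "__main__":
    chosen, err = demo()
    print("predictors chosen (in order):", chosen)
    print("validation MSE of the chosen model:", round(err, 3))
```

    Note that the validation error of the finally chosen model is itself somewhat optimistic, because the same hold-out data were used to pick the model; an honest estimate needs a further untouched test set.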

  13. #12
    Omega Contributor
    Points: 38,396, Level: 100
    Level completed: 0%, Points required for next Level: 0
    hlsmith's Avatar
    Location
    Not Ames, IA
    Posts
    7,001
    Thanks
    398
    Thanked 1,186 Times in 1,147 Posts

    Re: Stepwise Regression Limitations Explanation

    Agreed. I was going to mention this as well. On occasion, when I have many variables, I will run a stepwise to get a feel for the covariates I may need to control for.

  14. #13
    TS Contributor
    Points: 18,889, Level: 87
    Level completed: 8%, Points required for next Level: 461
    CowboyBear's Avatar
    Location
    New Zealand
    Posts
    2,062
    Thanks
    121
    Thanked 427 Times in 328 Posts

    Re: Stepwise Regression Limitations Explanation


    Quote Originally Posted by Englund View Post
    Within many fields, you don't care if you've included the 'correct' IVs; the only thing you care about is whether the model gives accurate predictions. And if you can assure that your model can do that - why not stepwise?
    I guess my main responses to this would be:
    1. Stepwise regression is not set up to maximise prediction accuracy - it's based purely on the significance of predictors
    2. Stepwise regression will give you an overly optimistic estimate of prediction accuracy

    Using cross-validation is great, and that helps to deal with point 2. But it doesn't really deal with point 1. If you want to maximise out-of-sample prediction accuracy, stepwise regression isn't really the best tool - that's not what stepwise regression attempts to achieve. Some other options off the top of my head would be AIC, BIC, cross-validation (for selection, not just validation), or lasso.
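
    For readers who have not met the alternatives named above, their standard definitions are given here in their usual textbook form. Here k is the number of estimated parameters, n the number of observations, \hat{L} the maximised likelihood, RSS the residual sum of squares, and \lambda \ge 0 a tuning parameter that is commonly chosen by cross-validation. The 1/(2n) scaling of the lasso loss is one common convention, not the only one.

```latex
\begin{align*}
\mathrm{AIC} &= 2k - 2\ln\hat{L} \\
\mathrm{BIC} &= k\ln n - 2\ln\hat{L} \\
-2\ln\hat{L} &= n\ln\!\left(\frac{\mathrm{RSS}}{n}\right) + \text{const}
  \qquad \text{(linear model with Gaussian errors)} \\
\hat{\beta}_{\text{lasso}} &= \arg\min_{\beta}\;
  \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2 + \lambda\,\lVert \beta \rVert_1
\end{align*}
```

    In each case the fit term is traded off against an explicit complexity penalty (2k, k ln n, or the L1 norm of the coefficients), whereas stepwise selection by p-value has no such term aimed at out-of-sample accuracy. BIC penalises each extra parameter more heavily than AIC whenever ln n > 2, i.e. for n of 8 or more.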
