
Thread: Regression Methods

  1. #1
    Bahareh

    Regression Methods




    Hello,
    I have 1 dependent variable and 30 independent variables. I used the forward selection method to pick the important variables. The adjusted R-squared was very high (close to 1!), and I thought something might be wrong with the method. So, based on the forward selection results, I selected only the significant independent variables and ran the regression again with the Enter method. This time the adjusted R-squared was low (about 0.4), and only one of the IVs was significant. I'm confused by these results and would appreciate any help.

  2. #2
    victorxstc

    Re: Regression Methods

    We would love to hear more information (sample size, etc.). How many of the variables selected by the computer (in your perfectly fitted model) were entered into the model with the adjusted R-squared of 0.4? Also note that the forward selection method does not simply pick the significant variables for the next model. It uses its own entry criteria, which are not exactly what you did by hand to replicate it. So if you enter the same independent variables from the final model of the forward selection into your manually specified model, the result should be the same as the computerized forward selection (again, an adjusted R-squared of about 1.0).

    The adjusted R-squared indicates how well your model fits the data, so when you enter more variables, it is possible that your model improves as a whole and the adjusted R-squared increases. However, when you prune some of the non-significant variables, the model, which depended on them too, becomes weaker at explaining your experiment, and the adjusted R-squared decreases. So it is possible to have an adjusted R-squared close to 1 while the model is very complicated and of little or no practical use (despite being very accurate).
    "victor is the reviewer from hell" -Jake
    "victor is a machine! a publication machine!" -Vinux

  3. The Following User Says Thank You to victorxstc For This Useful Post:

    Bahareh (11-01-2013)

  4. #3
    Bahareh

    Re: Regression Methods

    Dear victorxstc,
    In my research n = 32. I chose the model with acceptable VIFs and t-test results for each variable in the forward regression. That model kept only 3 of my 30 IVs, with an adjusted R-squared of 1. Then I wanted to test what would happen if I used only these 3 IVs in a new regression analysis with the Enter method. But the results were not the same as the best model from the forward regression: according to the t-tests, only one of the 3 IVs was significant, and the adjusted R-squared dropped to 0.4.

  5. #4
    Englund

    Re: Regression Methods

    Also, your adjusted R² will be highly inflated since you have so many variables. If n − k is small, the inflation of the adjusted R² can be severe! As an illustration, I ran a simple simulation estimating the 95th percentile of the adjusted R², based on 100 runs with 30 independent, uncorrelated predictors and 50 observations: the 95th percentile was estimated at 0.41 with forward selection and 0.61 with backward selection.

    The inflation of the adjusted R² is not as bad if n − k is large, though. When I performed the same simulation with n = 300 and k = 30, I got 0.06 for both methods.

    I would strongly recommend against forward, backward, or all-subsets regression, especially if n − k is small!
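    A rough sketch of that simulation in Python (numpy only). The stopping rule here, keep adding whichever candidate most improves the adjusted R², is my own assumption and differs from SPSS's p-to-enter criterion, so the exact percentile will not match the numbers above:

    ```python
    import numpy as np

    def adj_r2(X, y):
        """Adjusted R-squared of an OLS fit of y on X (intercept included)."""
        n, k = X.shape
        Xc = np.column_stack([np.ones(n), X])
        beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
        resid = y - Xc @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

    def forward_adj_r2(X, y):
        """Greedy forward selection: repeatedly add the candidate that most
        improves adjusted R-squared; stop when nothing improves it.
        Returns the final model's adjusted R-squared."""
        p = X.shape[1]
        chosen, best = [], -np.inf
        while len(chosen) < p:
            score, j = max((adj_r2(X[:, chosen + [j]], y), j)
                           for j in range(p) if j not in chosen)
            if score <= best:
                break
            best, chosen = score, chosen + [j]
        return best

    rng = np.random.default_rng(0)
    n, p, sims = 50, 30, 100
    # y is pure noise: the "true" R-squared of every model is zero.
    results = [forward_adj_r2(rng.standard_normal((n, p)), rng.standard_normal(n))
               for _ in range(sims)]
    print(round(float(np.percentile(results, 95)), 2))
    ```

    Even though nothing relates y to the predictors, the selected models report a substantial adjusted R², which is exactly the inflation described above.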
    Last edited by Englund; 11-01-2013 at 04:53 PM.

  6. The Following User Says Thank You to Englund For This Useful Post:

    Bahareh (11-02-2013)

  7. #5
    Englund

    Re: Regression Methods

    Quote Originally Posted by Bahareh View Post
    according to the t-tests, only one of these 3 IVs was significant.
    I bet none of the variables would be significant if you accounted for the total number of tests carried out.

  8. #6
    victorxstc

    Re: Regression Methods

    Quote Originally Posted by Bahareh View Post
    Dear victorxstc,
    In my research n = 32. I chose the model with acceptable VIFs and t-test results for each variable in the forward regression. That model kept only 3 of my 30 IVs, with an adjusted R-squared of 1. Then I wanted to test what would happen if I used only these 3 IVs in a new regression analysis with the Enter method. But the results were not the same as the best model from the forward regression: according to the t-tests, only one of the 3 IVs was significant, and the adjusted R-squared dropped to 0.4.
    You mean you have two identical models with the same three independent variables (one selected through a forward-selection process, the other an exact replication of the former), plus the intercepts in both, yet properties like the adjusted R-squared differ? My only guess is that something is not exactly the same in your manual replication. Have you entered any interactions, or manipulated the intercept? Perhaps the defaults of your statistical program differ between the Enter and Forward methods, so the two runs might not be exactly the same. If the models were truly identical (regardless of how the variables were selected), the adjusted R-squared would be the same too. By the way, which software do you use?

  9. #7
    Bahareh

    Re: Regression Methods

    Dear Englund,
    According to what you said, it is obvious that my n − k is very small (n = 32 and k = 30). So should I omit some of my IVs and run the analysis again? Which method should I use this time? I'm an amateur in statistics and don't know exactly what to do. I appreciate your help.

  10. #8
    Bahareh

    Re: Regression Methods

    Dear victorxstc
    Yes, that's exactly what I did... I wonder what is different between these two analyses. I'm an amateur in statistics, and I use SPSS.

  11. #9
    victorxstc

    Re: Regression Methods

    Quote Originally Posted by Bahareh View Post
    Dear Englund,
    According to what you said, it is obvious that my n − k is very small (n = 32 and k = 30). So should I omit some of my IVs and run the analysis again? Which method should I use this time? I'm an amateur in statistics and don't know exactly what to do. I appreciate your help.
    But you said you had entered only 3 IVs (both in the final model of the forward-selection regression and in your manually specified model), right? So your k should be 3, not 30. If possible, please post the SPSS output for both models, so we can see how many variables, and which ones, each model includes. I believe that if the two models were exactly the same, they would have exactly the same R-squared values, regardless of the procedure used to select the variables.
    Besides, you don't sound like an amateur at all.

  12. #10
    Bahareh

    Re: Regression Methods

    Sorry! I meant that in the forward regression I had 30 IVs, and the third model kept 3 important IVs; in the other analysis I entered only these 3 IVs. I expected to get the same results. Anyway, I've attached my results.
    Attached Files

  13. #11
    Englund

    Re: Regression Methods

    Quote Originally Posted by Bahareh View Post
    Dear Englund,
    According to what you said, it is obvious that my n − k is very small (n = 32 and k = 30). So should I omit some of my IVs and run the analysis again? Which method should I use this time? I'm an amateur in statistics and don't know exactly what to do. I appreciate your help.
    Well, if you decide to use a forward/backward selection method, I'd strongly suggest that you evaluate your model on held-out data (data that was not used when estimating the model). As I said before, my simulation showed that the 95th percentile of the adjusted R² with forward selection was 0.41, when its expected value was 0. Thus: a highly inflated adjusted R².

    I think 'data mining' methods such as these selection procedures are very dangerous to use if you do not know about their side effects. If your model gives a high adjusted R², it fits your sample well, but it may not fit out-of-sample data at all (given that the adjusted R² has a low expected value, as in my simulation).
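    A minimal sketch of that in-sample vs. out-of-sample contrast, assuming pure-noise data (for brevity it enters all 30 predictors at once rather than running a selection step):

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 50, 30
    X = rng.standard_normal((2 * n, p))
    y = rng.standard_normal(2 * n)               # pure noise: nothing to find
    Xtr, Xte, ytr, yte = X[:n], X[n:], y[:n], y[n:]

    # Fit OLS with all 30 predictors on the training half only.
    A = np.column_stack([np.ones(n), Xtr])
    beta, *_ = np.linalg.lstsq(A, ytr, rcond=None)

    def r2(Xm, ym):
        """Plain R-squared of the fitted model on a given data half."""
        pred = np.column_stack([np.ones(len(ym)), Xm]) @ beta
        return 1.0 - ((ym - pred) ** 2).sum() / ((ym - ym.mean()) ** 2).sum()

    print(round(r2(Xtr, ytr), 2))   # in-sample: looks impressive (overfitting)
    print(round(r2(Xte, yte), 2))   # held-out: near zero or negative
    ```

    The in-sample fit looks good purely because of overfitting, while the held-out half exposes the model as useless, which is the point about evaluating on data not used for estimation.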

  14. The Following User Says Thank You to Englund For This Useful Post:

    Bahareh (11-02-2013)

  15. #12
    victorxstc

    Re: Regression Methods

    Quote Originally Posted by Bahareh View Post
    Sorry! I meant that in the forward regression I had 30 IVs, and the third model kept 3 important IVs; in the other analysis I entered only these 3 IVs. I expected to get the same results. Anyway, I've attached my results.
    It is strangely interesting. The only thing that comes to my mind is that your data differ between the two models. I guess you have some missing values in variables other than the last three (agrirangzland, urbPSCV, urbzland), so the forward selection run excluded some cases (rows) due to missing values in those other variables (listwise deletion). Once you entered only the last three variables, the missing cells in the other columns no longer interfered with the regression, so SPSS did not exclude any rows because of them.

    Perhaps the reduced n of the forward-selection run (rows excluded to handle missing values in any of the 30 variables) also contributes to the inflated adjusted R-squared of its final model.
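    A toy pandas illustration of this listwise-deletion effect (the column names and missingness pattern are made up; SPSS's default "exclude cases listwise" behaves like `dropna` over the variables offered to the procedure):

    ```python
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(2)
    n = 32
    # Toy data: 30 predictors, but only x5 contains missing values here.
    df = pd.DataFrame(rng.standard_normal((n, 30)),
                      columns=[f"x{i}" for i in range(30)])
    df.loc[rng.choice(n, size=10, replace=False), "x5"] = np.nan

    # Offering all 30 IVs triggers listwise deletion over every column:
    full_run = df.dropna()
    # Entering only the 3 chosen IVs (complete columns) keeps every row:
    three_var_run = df[["x0", "x1", "x2"]].dropna()

    print(len(full_run), len(three_var_run))   # → 22 32
    ```

    The two regressions are therefore fitted on different numbers of cases, so their adjusted R-squared values need not agree.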

    I can't think of anything else in your case. Please check and update us too.

  16. The Following User Says Thank You to victorxstc For This Useful Post:

    Bahareh (11-02-2013)

  17. #13
    Bahareh

    Re: Regression Methods

    Dear victorxstc and Englund,
    Thanks a lot for following my posts and for your helpful comments and suggestions. I will check my data and try to solve the problem based on your comments.

  18. #14
    Bahareh

    Re: Regression Methods

    Dear victorxstc,
    You were completely right! I checked my data, and yes, there are missing values in the other variables. I didn't know that forward selection excludes those cases and reduces my n!
    Thank you very much again.

  19. #15
    victorxstc

    Re: Regression Methods


    You are so welcome, dear Bahareh. I am glad it worked.

  20. The Following User Says Thank You to victorxstc For This Useful Post:

    GretaGarbo (11-07-2013)
