
Thread: What to do when the predictors are not what I expected (when the model is fine)?

  #1 victorxstc

    What to do when the predictors are not what I expected (when the model is fine)?




    I will first clarify the problem and then ask my questions.

    The problem (variable names are masked due to confidentiality):

    I ran a binary logistic regression with 5 independent variables (IVs): A, B, C, D, and E. A and B are my main concern; C also concerns me, and I will come back to it later. A and B have two levels each (A1 and A2; B1 and B2). When I ran the estimation, some significant findings appeared. Nevertheless, the direction of the coefficients (beta values) for A and B was the opposite of what I expected! According to most of the literature (not all of it), I expected A and B to have positive betas, while they both had negative coefficients.
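
    A minimal sketch of the setup in R, with simulated stand-ins for the confidential data (every name here is a placeholder, not one of the real variables):

    Code:
    set.seed(1)
    n <- 200
    A <- factor(sample(c("A1", "A2"), n, replace = TRUE))
    B <- factor(sample(c("B1", "B2"), n, replace = TRUE))
    C <- rnorm(n); D <- rnorm(n); E <- rnorm(n)
    # simulate a binary outcome that depends on A, B, and C only
    y <- rbinom(n, 1, plogis(0.5 * (A == "A2") + 0.5 * (B == "B2") + 0.3 * C))
    fit <- glm(y ~ A + B + C + D + E + A:B, family = binomial)
    summary(fit)   # check the sign and standard error of each coefficient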

    First I checked the models for about three full days. There was no mistake in them, and the directions did not change with any modification I made (although in none of those changes did I try dropping the interactions). The log likelihoods also suggested I was on the right track.

    Then I decided to set aside my subjective view of the strange results and trust the regression analysis. I went on to discuss those strange results and tried to justify the controversial findings. While discussing them, I reached this point: those two variables are heavily interconnected. First, they had a significant interaction. Second, the distribution of predictor A was heavily affected by B, and according to the literature, A and B could have opposite effects; this could matter in my sample, which was not balanced in terms of B. That imbalance could confound the effect of A as well.

    So I thought maybe these were causing some problems. Then, since C was strange as well, I thought maybe the whole model was being adversely affected by problems such as multicollinearity. I asked myself, "What will happen if I isolate only A and B in the model?" If the interactions between A, B, and C are sources of bias, can reducing the number of IVs lead to different results? The answer was yes: when I excluded all the other IVs and left only A, B, and A*B, one of the coefficients became favorable and more in line with the literature and common sense. So it seems that something (perhaps multicollinearity) is disrupting the main model.

    Then I decided to examine every strange predictor in isolation. When I excluded the interactions and left only the five IVs, the results seemed much more consistent. Apparently, the problem begins when some specific interactions (but not all of them) are added to the model. After adding them, the directions of the betas for A and C get reversed. It is a little annoying, since adding those specific interactions improves the log likelihood considerably (from about -75 to -48), so I cannot easily ignore those interactions.

    Questions (the main ones are 3 and 4, but answers to the rest are much appreciated as well):

    1. When the model acts strangely, but the LRT and log likelihood say it is fine, which should we trust: subjective common sense, or the objective statistical measures?

    2. Do you think there is a "problem" in my case in the first place? Maybe everything is fine. If you wish, I can provide the raw data too.

    3. What would you do in my case? At least three choices can be made: A. Drop the interactions. B. Keep them and report the strange model. C. Report both the models with and without interactions, along with models with limited numbers of IVs (for example, only A and B), and then argue subjectively that "it is the interactions that make the main, large model strange".

    4. I am going to do the last of these (3.C), but that would be messy and not very presentable. I wonder if there is an elegant, objective way of finding the source of error in the main model (if there is any error, of course), so that instead of subjective discussion I could base my claims on objective statistical measures. For example, is there a way to highlight the problematic interactions with some statistic? (See the sketch after this list.)

    5. Do you have any other valuable suggestions or ideas?
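
    For question 4, one objective screen is a likelihood-ratio comparison of nested models with and without each interaction, e.g. in R (a sketch with simulated placeholder data, not the real variables):

    Code:
    set.seed(1)
    d <- data.frame(A = factor(sample(1:2, 150, replace = TRUE)),
                    B = factor(sample(1:2, 150, replace = TRUE)),
                    C = rnorm(150))
    d$y <- rbinom(150, 1, plogis(0.8 * (d$A == 2) - 0.8 * (d$B == 2)))
    full   <- glm(y ~ A * B + C, family = binomial, data = d)
    no_int <- glm(y ~ A + B + C, family = binomial, data = d)
    anova(no_int, full, test = "LRT")  # does the A:B interaction improve the fit?
    drop1(full, test = "LRT")          # a per-term LRT for each droppable term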

  #2 trinker

    Re: What to do when the predictors are not what I expected (when the model is fine)?

    I think multicollinearity is not the same as interaction. An interaction changes how x1 affects x2's impact on y. Multicollinearity is when two predictors are so closely aligned that they are basically taking the same chunk out of y.

    What do the standard errors look like?
    Are you sure things are dummy coded correctly?
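
    In R the coding can be checked directly (a toy sketch, since I don't know your variable names):

    Code:
    A <- factor(c("A1", "A2", "A1"))   # stand-in for one of your two-level factors
    levels(A)       # the first level is the reference category
    contrasts(A)    # shows which level gets the dummy value 1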

    PS: I'm not the expert in the room, but I figure I might as well put an idea out, and if it's wrong one of the experts will correct it.
    "If you torture the data long enough it will eventually confess."
    -Ronald Harry Coase -

  #3 victorxstc

    Re: What to do when the predictors are not what I expected (when the model is fine)?

    lol, of course you are. Thanks trinker.

    Yeah, you are right; that is not a case of multicollinearity. I should correct that part. Thanks again.

    What do the standard errors look like?
    They are so small that the p-values come out significant at the 0.01 or 0.001 level. They do not look problematic. I can post everything (temporarily) here too. I am gathering the outputs.

    Are you sure things are dummy coded correctly?
    Yeah, the categorical variables have only two levels each, and there is an ordinal variable as well.

  #4 Dason

    Re: What to do when the predictors are not what I expected (when the model is fine)?

    I think what trinker was asking about the coding is: are you sure your 'success' condition is coded as 1 (as opposed to being coded as 0, with the failure condition coded as 1)?
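
    A quick toy illustration (not your data): flipping which outcome is coded 1 flips the sign of every coefficient.

    Code:
    set.seed(1)
    x <- rnorm(100)
    y <- rbinom(100, 1, plogis(x))   # 'success' coded as 1
    yflip <- 1 - y                   # the same outcome with the coding reversed
    coef(glm(y ~ x, family = binomial))       # slope roughly +1
    coef(glm(yflip ~ x, family = binomial))   # same magnitudes, every sign flipped
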
    I don't have emotions and sometimes that makes me very sad.


  #5 victorxstc

    Re: What to do when the predictors are not what I expected (when the model is fine)?

    UPDATE: I think it is a multicollinearity case.

    Oh, of course; during the previous three days I checked that many times! I doubted it at first, but the file is OK, and besides, if that were the case, everything would be reversed (not just a couple of things). I am now attaching things in our private room.

  #6 Dason

    Re: What to do when the predictors are not what I expected (when the model is fine)?

    You mention that A and B have slopes opposite of what you expect. If you fit a model with just A, do you get the slope you expect? What about for B? Humans are pretty bad at guessing/understanding slopes when there are multiple predictors in the model.
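
    For instance (a toy sketch, not your data), two collinear predictors can look fine marginally but shift, even in sign, when fit jointly:

    Code:
    set.seed(1)
    n <- 200
    A <- rnorm(n)
    B <- A + rnorm(n, sd = 0.3)    # deliberately collinear with A
    y <- rbinom(n, 1, plogis(A))   # outcome driven by A alone
    coef(glm(y ~ A, family = binomial))       # marginal slope for A
    coef(glm(y ~ B, family = binomial))       # marginal slope for B
    coef(glm(y ~ A + B, family = binomial))   # joint slopes can shift, even in sign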

  #7 victorxstc

    Re: What to do when the predictors are not what I expected (when the model is fine)?

    Well, the literature is a little controversial about B, so whether it is positive, negative, or even non-significant is OK. But less controversy exists over A, so it should be positive. When not too many interactions had been included yet, the beta for A was all right (file attached). However, the problem started once I entered specific variables (interactions) into the model.

    So I checked the correlation matrices, and there is huge multicollinearity: correlations up to 0.9 or maybe more between some of the variables; in fact, between many of them. I also read somewhere that it is wise to treat correlations greater than 0.4 as cases of collinearity.

    Overall, I tried to rule out any sources of bad guessing and to value the model over my own subjective judgment. Now I have objective evidence that the culprit is multicollinearity, which is a relief. Next I should find ways to deal with it.
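
    The check above, as a sketch (toy data; note that interaction columns of the design matrix are typically correlated with their parent main effects):

    Code:
    set.seed(1)
    d <- data.frame(A = factor(sample(1:2, 100, replace = TRUE)),
                    B = factor(sample(1:2, 100, replace = TRUE)))
    X <- model.matrix(~ A * B, data = d)[, -1]   # predictor columns, intercept dropped
    round(cor(X), 2)   # the A:B column correlates strongly with the A and B columns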

  #8 Dason

    Re: What to do when the predictors are not what I expected (when the model is fine)?

    Multicollinearity gets blamed for increased standard errors. But switching a slope isn't something you should 'blame' on multicollinearity. Like I said - it's just hard to understand how variables interact when fitting a model with multiple predictors.
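
    A toy illustration of the standard-error point (not your data):

    Code:
    set.seed(1)
    x1 <- rnorm(200)
    x2 <- x1 + rnorm(200, sd = 0.1)   # nearly a copy of x1
    y  <- rbinom(200, 1, plogis(x1))
    summary(glm(y ~ x1, family = binomial))$coefficients        # modest SE for x1
    summary(glm(y ~ x1 + x2, family = binomial))$coefficients   # SEs balloon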

  #9 victorxstc

    Re: What to do when the predictors are not what I expected (when the model is fine)?

    Yeah, mostly, but there can be some other consequences of multicollinearity, according to Wikipedia:

    "Indicators that multicollinearity may be present in a model:
    1. Large changes in the estimated regression coefficients when a predictor variable is added or deleted"

    I think my beta gets reversed to offset the effect of the last variable included.

  #10 victorxstc

    Re: What to do when the predictors are not what I expected (when the model is fine)?

    SUCCESS! I finally found the perfect model after four or five days of banging my head against the wall. I am going to reward myself!

  #11 GretaGarbo

    Re: What to do when the predictors are not what I expected (when the model is fine)?

    Of course, when multicollinearity is present, it is possible for the sign of an estimated parameter to change when another collinear variable is included. The values of the betas can "flip around".


    Even if the design had been perfectly balanced and perfectly orthogonal (so that all x-variables were uncorrelated with each other and with linear combinations of one another), it would still not be easy to interpret the parameter estimates when interaction effects are present. Even with perfect balance, it is good practice to draw an "interaction plot" to look at the combined effect of factors A and B.
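
    In R, such a plot is one line (toy data standing in for the masked variables):

    Code:
    set.seed(1)
    A <- factor(sample(c("A1", "A2"), 200, replace = TRUE))
    B <- factor(sample(c("B1", "B2"), 200, replace = TRUE))
    p <- plogis(-1 + (A == "A2") + (B == "B2") - 2 * (A == "A2") * (B == "B2"))
    y <- rbinom(200, 1, p)
    interaction.plot(A, B, y)   # mean of y per A-by-B cell; non-parallel lines suggest interaction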

    With multicollinearity it is even more difficult.

    One way to avoid mistakes is to "collapse" the two factors into one, so that the two levels of factor A and the two levels of factor B are combined into a single factor, say G, with four levels. This gives exactly the same fit as the model with "A + B + A:B", so nothing is statistically gained or lost, but the model with the single factor G is not as "tricky"/difficult. Two levels of G can then be compared directly (e.g. level 2 with level 3).

    Next, the model with G can be estimated together with the other variables, like "G + C + D" (plus possible interactions). A sketch of the collapsing trick is below.
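
    The collapsing trick in R (toy data; the equivalence shows up as identical log-likelihoods):

    Code:
    set.seed(1)
    d <- data.frame(A = factor(sample(1:2, 200, replace = TRUE)),
                    B = factor(sample(1:2, 200, replace = TRUE)))
    d$y <- rbinom(200, 1, 0.5)
    d$G <- interaction(d$A, d$B, drop = TRUE)   # one four-level factor built from A and B
    fit_ab <- glm(y ~ A * B, family = binomial, data = d)
    fit_g  <- glm(y ~ G,     family = binomial, data = d)
    logLik(fit_ab)
    logLik(fit_g)   # identical log-likelihoods: same model, reparameterized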

    Multicollinearity is a problem with the sample, not the population, so one (common) suggestion is to get better data. However, it is often the case that nature and society have a "bad design", so collinear patterns appear.

    Also note that in logistic regression the individual observations are "weighted" by their different variances, in contrast to a linear regression model estimated by least squares, so a design that appeared balanced in the linear regression model would not be perfectly balanced in logistic regression.

    It also makes me a little worried when someone has found "the perfect model" after having struggled with multicollinearity for a long time. There are no perfect models. As Box said: all models are wrong, but some are useful.


  #12 noetsi

    Re: What to do when the predictors are not what I expected (when the model is fine)?

    I am not sure whether the concern is interaction or multicollinearity (MC). MC can reverse a sign when a variable is highly collinear and another variable is removed; this is one of the warning signs of high MC. My suggestion is to test for high MC and see what you get. What is your VIF? (A sketch is below.)
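
    A minimal VIF check in R (assumes the car package is installed; toy data, not yours):

    Code:
    library(car)   # provides vif(); reports generalized VIFs (GVIF) when factors are present
    set.seed(1)
    x1 <- rnorm(100)
    x2 <- x1 + rnorm(100, sd = 0.2)   # deliberately collinear with x1
    y  <- rbinom(100, 1, plogis(x1))
    vif(glm(y ~ x1 + x2, family = binomial))   # values well above 10 flag trouble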

    I think what GretaGarbo said about MC would be the standard treatment of that topic, although I don't think you can argue that it's only an issue with the sample. It could well be that the two IVs are confounded in their impact on the DV in the actual population (which is moot, however, since you will rarely if ever know the real population). John Fox covers this topic extensively in Regression Diagnostics (Sage), notably pages 10-20. He does a much better job of explaining what does not work than what does.

    1. When the model acts strangely, but the LRT and log likelihood say it is fine, which should we trust: subjective common sense, or the objective statistical measures?
    I would choose the one that makes the most theoretical (substantive) sense, if I knew that. I was taught to always try to figure out why you got strange results, and it makes me nervous when I can't explain an anomaly. Trying to figure out why the strange results occur is always important (which of course you are doing).

    2. Do you think there is a "problem" in my case in the first place? Maybe everything is fine. If you wish, I can provide the raw data too.
    What is your VIF?

    3. What would you do in my case? At least three choices can be made: A. Drop the interactions. B. Keep them and report the strange model. C. Report both the models with and without interactions, along with models with limited numbers of IVs (for example, only A and B), and then argue subjectively that "it is the interactions that make the main, large model strange".
    I would choose C, which I think is how a journal would go as well. You might not present the full results, but at least enough of the summary results, why you think they occur, and what they mean. Having said that, I have never submitted to a statistics journal, and it's been a long time since I submitted to any...


    So I thought maybe these were causing some problems. Then, since C was strange as well, I thought maybe the whole model was being adversely affected by problems such as multicollinearity. I asked myself, "What will happen if I isolate only A and B in the model?" If the interactions between A, B, and C are sources of bias, can reducing the number of IVs lead to different results? The answer was yes: when I excluded all the other IVs and left only A, B, and A*B, one of the coefficients became favorable and more in line with the literature and common sense. So it seems that something (perhaps multicollinearity) is disrupting the main model.
    Perhaps what this is showing is that existing theory is wrong: that it has failed to consider the impact of some phenomenon (C) which changes the relationship of the other factors to the DV. Our understanding of reality is often simplistic. The real question is whether this is simply a sampling issue, where C is distorting your results because of limits of the sample, or a problem in the real world. To answer that, you would need to consider the theoretical implications of the impact of C: why would it logically have this effect?

    It could be that existing theory is wrong, or incomplete, and your model is telling you that.
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995


  #13 hlsmith

    Re: What to do when the predictors are not what I expected (when the model is fine)?

    Whew, this is a long thread with big posts. I will save 15 minutes of my life and skip reading this monster. Vic, I bet you are one of those people with 50 slides jammed full of text, even when giving a 3-minute presentation. Word of the day: "brevity".
    Stop cowardice, ban guns!

  #14 noetsi

    Re: What to do when the predictors are not what I expected (when the model is fine)?

    Quote Originally Posted by hlsmith
    Whew, this is a long thread with big posts. I will save 15 minutes of my life and skip reading this monster. Vic, I bet you are one of those people with 50 slides jammed full of text, even when giving a 3-minute presentation. Word of the day: "brevity".
    lol

    parsimony
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  #15 hlsmith

    Re: What to do when the predictors are not what I expected (when the model is fine)?


    Maybe we need a stepwise model for writing a post. Rank statements, take the highest-ranked statement; rank statements, take the new highest-ranked statement; rank statements, ... Now make sure your post is not saturated. How many words per concept do you need? A long post has a high R^2 since it covers everything, but that is not corrected for length. This seems like a Trinker R program waiting to happen!
