
Thread: heteroskedasticity and non normal residuals in linear regression - please help!

  1. #16
    SiBorg



    In other words, is this 'wrong'? Or does it help to know that the starting conditions predict change, even though they don't really predict anything and the association just arises out of the maths?

  2. #17
    IBM Rules

    Given your QQ plot, I strongly suggest you run a skewness test and look at the DFBETAs. The data appear to be non-normal, with a lot of outliers in the tail. It's best to check.
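    A minimal R sketch of the two checks suggested above, on hypothetical simulated data with one outlier injected deliberately (not the thread's data). Base R has no built-in skewness test, so the sketch only computes the moment-based sample skewness of the residuals; `dfbetas()` is from the stats package, and the 2/sqrt(n) cutoff is a common rule of thumb rather than a hard threshold.

```r
# Hypothetical data (not from the thread): a linear trend plus one
# injected outlier at the largest x value, purely for illustration.
set.seed(1)
x <- 1:50
y <- 2 * x + rnorm(50)
y[50] <- y[50] + 100

fit <- lm(y ~ x)
res <- residuals(fit)

# Moment-based sample skewness of the residuals: about 0 for a symmetric
# distribution, positive when the right tail is long.
skew <- mean((res - mean(res))^3) / mean((res - mean(res))^2)^1.5

# DFBETAS: change in each coefficient when an observation is dropped,
# scaled by that coefficient's standard error.
dfb <- dfbetas(fit)
cutoff <- 2 / sqrt(nrow(dfb))   # common rule-of-thumb threshold
influential <- which(abs(dfb[, "x"]) > cutoff)

print(skew)
print(influential)
```

    Formal skewness tests are available in add-on packages; the point here is only to show where the outlying tail shows up numerically.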
    "Facts are stubborn things, but statistics are more pliable." Mark Twain

  3. #18
    Banned
    GretaGarbo

    Quote Originally Posted by SiBorg
    If so, I may go as far as to say that Greta is a Genius.
    Now, I am really embarrassed! Very embarrassed!

    OK, it is nice that we say friendly words to each other. Thank you!

    Sometimes we (I mean I myself) make stupid comments. Then maybe it is good that we are not too frank.

    If we regress (random number - A) on A, the regression coefficient will of course be around -1.

    That is what the left-hand side of the equation says: there is a -1 in front of A, as Dason pointed out later. (So I don't know where I dreamt up the -0.5 coefficient. A stupid guess on my part!)

    Anyway, this is an important model that is used again and again. It is very common to take the difference from the baseline and to use the baseline as an explanatory factor. I say again: it is not wrong, but is it relevant?

    Would it be better to just use the later period's value as the response, with the baseline as the explanatory variable? (I don't think so, but I don't know why.)

    It is very natural to take the difference so that "each individual acts as its own control".

    There is a simple solution to this, but at the moment I can't see it.

    Please explain it to us!

    (A note: it was nice of Noetsi to explain where he saw the heteroscedasticity. I can't see it. Anyway, thanks!)

  4. #19
    SiBorg

    Is this simply an example of 'regression to the mean'? It's just regression to a lower mean, which is why the effects look significant. What we really need is a way of looking at the slopes of the lines between each baseline and follow-up depth, corrected for the difference in means, to see whether there is any 'true' effect.

  5. #20
    Multicollinearity hater
    victorxstc

    Quote Originally Posted by noetsi
    As I said it is close. To me it seemed there was enough difference to comment on. But there is no solid rule that I am aware of (when the differences are not extreme) for how far the range has to change to indicate serious heteroskedasticity. It comes down to eyeing the data and making a judgement call.

    Another way of saying you could well be right. The first time I looked at it, the differences seemed further apart.
    Quote Originally Posted by GretaGarbo
    A note: It was nice of Noetsi to explain where he saw the heteroscedasticity. I can't see that. Anyway thanks!
    Quote Originally Posted by Dason
    You see heteroskedasticity in that plot? I don't. I mean the left side of the plot doesn't seem to have as large a range of values, but it also has far fewer values, which can influence our perception of the spread of the data. It looks alright to me. And even if it was present, it's not strong enough to care too much about in my opinion.
    Greta, this is why I said recently that interpreting the QQ plot is subjective: the reading differs from person to person, or even from time to time for the same person.

    Besides, I too agree on that G thing! GGG = GretaGarboGenius

    Or GGGG = GretaGreatGarboGenius

  6. #21
    SiBorg

    One more thing before I sleep. If the expected regression coefficient for purely random (unrelated) data is -1, then is it the departure from -1 that we are interested in? I got -0.5, so does that mean that deeper chambers do not shallow as much as we would expect from simple regression to the mean? I.e. the opposite of what I concluded before (that deeper chambers shallow more...).

    Or does my -0.5 simply reflect that these were not sampled from a perfectly normal distribution?

  7. #22
    SiBorg

    OK, I honestly think that's it. So -1.00 is the expected coefficient if there is no real association, and any deviation from it means more or less change than expected, depending on whether the coefficient is above or below -1.00. For the other coefficients, by contrast, we are interested in the difference from 0, which would indicate no association.

    Do we agree??
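    If the reference value really is -1, one way to test against -1 rather than 0 in R is to move the -1 into the model with `offset()`. A minimal sketch on hypothetical simulated data (independent baseline `A` and follow-up `B`; these names are illustrative, not from the thread): with `offset(-A)` the reported coefficient for `A` is the original slope plus 1, so its t test is a test of slope = -1. The same fit comes out of regressing follow-up on baseline directly.

```r
# Hypothetical data: baseline A and follow-up B drawn independently,
# so change = B - A has an expected slope of -1 on A.
set.seed(2)
A <- rnorm(200, mean = 5, sd = 1)
B <- rnorm(200, mean = 5, sd = 1)
d <- data.frame(A, B, change = B - A)

# Usual fit: the t test on A is a test against 0.
m0 <- lm(change ~ A, data = d)

# offset(-A) shifts the reference to -1: the coefficient reported for A
# is (slope + 1), so its t test is a test of slope = -1.
m1 <- lm(change ~ A + offset(-A), data = d)

# The offset model is the same fit as regressing follow-up on baseline.
m2 <- lm(B ~ A, data = d)

print(coef(m0)["A"] + 1)
print(coef(m1)["A"])
print(coef(m2)["A"])
```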

  8. #23
    Banned
    GretaGarbo

    Quote Originally Posted by victorxstc
    Besides, I too agree on that G thing! GGG = GretaGarboGenius

    Or GGGG = GretaGreatGarboGenius
    I am even more embarrassed!

    (Still: friendly words! Thanks!)

    But there are a number of people here who are really good at this. (I am not one of them. But I am reading, listening and learning.)

    (@victorxstc, I expected you to pick up on this point about how the graphs are perceived. But still, what we observe is an objective fact! Let's talk about that later.)

    Suppose the error term is small enough to be negligible, and add A (the baseline measurement) to both sides of B - A = a + b*A:

    B-A+A = a+b*A +A

    B = a+(1+b)A

    So if b= -0.6 then

    B= a+ (1-0.6)*A

    B=a+(0.4)*A

    Therefore A will have predictive value for B whenever 1 + b ≠ 0, i.e. whenever b ≠ -1.

    But is it a good model?
    Last edited by GretaGarbo; 09-28-2012 at 09:55 PM.
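    The algebra above holds exactly for least-squares fits, not just approximately: the slope from regressing B - A on A always equals the slope from regressing B on A, minus 1. A small R check on hypothetical simulated data in which B depends on A with slope 0.4 (so b should come out near -0.6, matching the example above):

```r
# Hypothetical data: true slope of B on A is 0.4, so b in
# B - A = a + b*A should be close to -0.6.
set.seed(3)
A <- rnorm(200, mean = 3, sd = 0.5)
B <- 0.4 * A + rnorm(200, sd = 0.2)

b_change <- coef(lm(I(B - A) ~ A))["A"]   # b      in  B - A = a + b*A
b_level  <- coef(lm(B ~ A))["A"]          # 1 + b  in  B = a + (1 + b)*A

print(c(b_change = unname(b_change), b_level_minus_1 = unname(b_level) - 1))
```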

  9. #24
    SiBorg

    Adjusted R-squared: 0.5637
    The adjusted R-squared for the random model I created is 0.56. Interestingly, even if you sample from exactly the same normal distribution you get the same result (as predicted by Dason). The next test is to see whether you get the same result when sampling from other distributions.

    So, an additional question: how do you adjust the R2 to reflect that this IV should be compared against -1, rather than against 0 as for all the other ones?
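    Part of the answer may be that an R2 of about 0.5 is built into this model. If A and B are independent with the same variance σ², then Cov(B - A, A) = -σ² and Var(B - A) = 2σ², so corr(B - A, A) = -σ² / (σ · √2 σ) = -1/√2 and R2 = 1/2, close to the adjusted R-squared values of about 0.56 and 0.54 reported in this thread. A quick R check under that assumption (hypothetical normal data, large n so sampling noise is small):

```r
# Hypothetical data: A and B independent with equal variance.
set.seed(4)
n <- 1e5
A <- rnorm(n, mean = 5, sd = 2)
B <- rnorm(n, mean = 5, sd = 2)

r2 <- summary(lm(I(B - A) ~ A))$r.squared
print(r2)   # expected to be close to 0.5
```

    The same argument applies to the uniform example below, since two Uniform(0, 10) samples also have equal variances.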

  10. #25
    SiBorg

    The next test is to see whether you get the same result when sampling from other distributions.
    OK, so I've done this one too. With two independent uniform random variables you get the same regression coefficient (roughly -1.0). The only difference is that the QQ plot becomes strongly sigmoid-shaped (and this is not fixed by taking logs before taking the difference).

    Code below and plots attached...

    Code: 
    # Two independent samples of 200 values from Uniform(0, 10)
    ACD5 <- runif(200, 0, 10)
    ACD6 <- runif(200, 0, 10)
    ACDtestR <- data.frame(ACD5, ACD6)
    t.test(ACD5, ACD6)
    
    	Welch Two Sample t-test
    
    data:  ACD5 and ACD6 
    t = 0.6981, df = 397.287, p-value = 0.4855
    alternative hypothesis: true difference in means is not equal to 0 
    95 percent confidence interval:
     -0.3725438  0.7828126 
    sample estimates:
    mean of x mean of y 
     5.107567  4.902433 
    
    # Change score (ACD6 - ACD5) regressed on the "baseline" ACD5
    ACDtestR$ACDdiff <- ACDtestR$ACD6 - ACDtestR$ACD5
    model.test <- lm(ACDdiff ~ ACD5, data = ACDtestR)
    summary(model.test)
    
    Call:
    lm(formula = ACDdiff ~ ACD5, data = ACDtestR)
    
    Residuals:
        Min      1Q  Median      3Q     Max 
    -4.8977 -2.5170  0.0591  2.7380  5.0182 
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
    (Intercept)  5.14488    0.40274   12.78   <2e-16 ***
    ACD5        -1.04747    0.06803  -15.40   <2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
    
    Residual standard error: 2.879 on 198 degrees of freedom
    Multiple R-squared: 0.5449,	Adjusted R-squared: 0.5426 
    F-statistic:   237 on 1 and 198 DF,  p-value: < 2.2e-16

  11. #26
    Multicollinearity hater
    victorxstc

    (@victorxstc, I understood that you would hit on this about the impression of the graphs. But still, what we observe is an objective fact! Lets talk about that later.)
    Greta, I agree that what we observe is objective (although one can argue that there is nothing objective in the whole universe, as anything we perceive is only a subjective image of something which may or may not exist in the real world (a real world which itself may or may not exist at all) [but that's another story and I think you are familiar with it too]).

    However, if we set aside this true but philosophical and impractical point, and agree (in fact assume) that the QQ plot itself is 100% objective, we still have to admit that its interpretation is not objective at all, since there is no clear-cut way of interpreting it. By "clear-cut" I mean as exact as a P value, which switches between significant and non-significant when it crosses 0.05 in either direction.

    I'm certainly looking forward to discussing it.

    --------------------------------

    About the technical parts on the model and correlation coefficients, all I can say is that I'm totally lost!!

  12. #27
    SiBorg

    Let's consider a slightly different situation, i.e. I assume that ACD change is generally negative and follows a normal distribution [i.e. ACDchange<-rnorm(200,-0.159,0.215)].

    If I subtract this from the baseline measurement (i.e. I am now keeping the values 'paired'), the regression shows no correlation at all between starting depth and shallowing.

    I feel that my situation is more akin to this, since I have kept the measurements 'paired' within patients.

    However, let's assume that there is a 20% measurement error proportional to the actual depth measured. Is it possible that THIS error is regressing toward the mean and causing the apparent correlation?

    So what I am asking is whether I should do a sensitivity analysis in which I add a random measurement error of, say, 20%, and see what happens when there is no correlation other than that produced by this random error. Then I can say whether the effect I have found is larger or smaller than what this error alone would produce.

    Does this sound like a good idea?
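    The proposed sensitivity analysis can be sketched in R as follows. All inputs are assumptions for illustration: true baseline depths ~ N(2.5, 0.4) mm (hypothetical), true change drawn from the N(-0.159, 0.215) distribution given in the post, and each reading multiplied by (1 + e) with e ~ N(0, err_frac) to represent proportional measurement error. The function `simulate_slope()` is a hypothetical helper, not SiBorg's actual code.

```r
# Hypothetical helper: slope of measured change on measured baseline when
# the true change is unrelated to the true baseline.
simulate_slope <- function(n, err_frac, seed = 1) {
  set.seed(seed)
  true_start  <- rnorm(n, mean = 2.5, sd = 0.4)      # assumed true baseline depths (mm)
  true_change <- rnorm(n, mean = -0.159, sd = 0.215) # change distribution from the post
  true_end    <- true_start + true_change
  # Proportional error: each reading is off by roughly err_frac of its value
  obs_start  <- true_start * (1 + rnorm(n, 0, err_frac))
  obs_end    <- true_end   * (1 + rnorm(n, 0, err_frac))
  obs_change <- obs_end - obs_start
  unname(coef(lm(obs_change ~ obs_start))[2])
}

print(simulate_slope(200, 0))     # no measurement error: slope expected near 0
print(simulate_slope(200, 0.2))   # 20% error: slope expected to be clearly negative
```

    The observed coefficient can then be compared with the spread of slopes from many seeds at the assumed error level.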

  13. #28
    SiBorg

    It works!!! A 20% measurement error will give a correlation of -0.75!!! Eureka!

  14. #29
    Banned
    GretaGarbo

    I think it is nice to try to participate in SiBorg's work. Not only is SiBorg a fellow member of this community, but I also think that this is a problem that appears frequently.

    Don't worry about R2. It is largely irrelevant anyway.

    However, let's assume that there is a 20% measurement error proportional to the actual depth measured. Is it possible that THIS is regressing toward the mean and that is what is causing the apparent correlation.
    I don't understand this. If it (the 20% error) were added to the dependent variable, it would just increase the random error. If it were added to the independent variable, it would create measurement error in the independent variable and cause biased and inconsistent estimates.
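    The second case, error in the explanatory variable, is the classical errors-in-variables setting: adding error with variance σ_u² to X pulls the least-squares slope toward zero by the factor Var(X) / (Var(X) + σ_u²). A small R illustration with hypothetical values (true slope 2, Var(X) = 1, σ_u² = 1, so the expected attenuated slope is 2 · 1/(1 + 1) = 1):

```r
# Hypothetical values chosen only to make the attenuation factor easy to see.
set.seed(6)
n <- 1e5
x_true <- rnorm(n, sd = 1)
y      <- 2 * x_true + rnorm(n, sd = 1)   # true slope = 2
x_obs  <- x_true + rnorm(n, sd = 1)       # X measured with error, error variance 1

slope_true_x <- unname(coef(lm(y ~ x_true))[2])   # expected close to 2
slope_obs_x  <- unname(coef(lm(y ~ x_obs))[2])    # expected close to 1 (attenuated)
print(c(slope_true_x, slope_obs_x))
```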

    Besides, it is obvious that if the baseline measurement is included as an explanatory variable, there will be a "significant" R2.

    If this study had more than two time periods, say three or more, then it would have been natural to model it as a repeated-measures series. We could also do that now, with just two time periods. Note that in that case we would not use the baseline measurement as an "explanatory variable".

    Then there would be a between-subject random error (among the circa 200 patients) and a within-subject random error. Each patient would have their own individual level, shared by all of that patient's measurements.
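    A minimal sketch of that repeated-measures formulation in R, assuming the nlme package (a recommended package shipped with standard R installations) and hypothetical simulated data in long format: one row per patient per time point, a random intercept per patient for the between-subject error, and the residual for the within-subject error. Baseline enters as a level of `time`, not as a predictor. With complete, balanced data the fixed effect for `time` equals the mean paired change.

```r
library(nlme)

# Hypothetical data: 200 patients measured at baseline and follow-up.
set.seed(7)
n <- 200
patient_level <- rnorm(n, mean = 2.5, sd = 0.4)   # assumed between-patient spread
long <- data.frame(
  patient = factor(rep(seq_len(n), times = 2)),
  time    = factor(rep(c("baseline", "followup"), each = n),
                   levels = c("baseline", "followup")),
  depth   = c(patient_level + rnorm(n, sd = 0.1),
              patient_level - 0.16 + rnorm(n, sd = 0.1))
)

# Random intercept per patient (between-subject) plus residual (within-subject).
fit_rm <- lme(depth ~ time, random = ~ 1 | patient, data = long)
time_effect <- unname(fixef(fit_rm)["timefollowup"])

# For complete, balanced data this equals the mean paired change.
mean_change <- with(long, mean(depth[time == "followup"] - depth[time == "baseline"]))
print(c(time_effect, mean_change))
```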

    Another point:
    The variable name "Racd_screean_median", in particular the word "median", suggests that several measurements were made and, since the median was (maybe) used, that the measured variable had a skewed distribution. This problem might have been cured by taking the logarithm, as we have been discussing. It also points to a sort of multilevel formulation in which each patient is measured several times. This does not matter much if we stay in the world of normally distributed models, where several random components are simply lumped together in a common normally distributed random error. But for other distributions it might matter.

  15. The Following User Says Thank You to GretaGarbo For This Useful Post:

    SiBorg (10-02-2012)

  16. #30
    SiBorg


    Hi Greta. What I did was take 200 random starting cACD depths. I then subtracted a random sample of 200 depth changes sampled from a normal distribution. That then gives a finishing cACD depth.

    So, depth change is finishing depth - starting depth. If you do the regression Change~starting depth there is NO correlation using these simulated values.

    However, if you add a 20% random error to the starting AND finishing depths, then use this to estimate change and then do the regression Change~starting depth, suddenly there is a correlation. So what I am saying is that the random error from the start measurement and the end measurement is regressing toward the mean and giving an apparent correlation.

    You get the same effect (but less so) if you add a 0.4mm random error to the start and end measurements that does not depend on the value measured.

    So, it's not the regression that's at fault (because if I do the regression with 'perfect' simulated values I don't see a correlation). It's the random error in the 'real' measurements that's causing the problem.
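    For the constant-error case the size of the effect can be worked out. Suppose true baseline depths have SD σ_T, the true change is independent of the true baseline, and each reading gets independent error with SD σ_e. The baseline error enters the measured baseline with a plus sign and the measured change with a minus sign, so the expected slope of measured change on measured baseline is -σ_e² / (σ_T² + σ_e²). An R check with hypothetical values σ_T = 0.4 mm and σ_e = 0.4 mm (so the theoretical slope is -0.5); the baseline mean of 2.5 mm is also an assumption:

```r
set.seed(8)
n <- 1e5
s_true <- 0.4   # hypothetical SD of true baseline depths (mm)
s_err  <- 0.4   # constant measurement-error SD (mm), as in the post
true_start <- rnorm(n, 2.5, s_true)
true_end   <- true_start + rnorm(n, -0.159, 0.215)   # change unrelated to baseline
obs_start  <- true_start + rnorm(n, 0, s_err)
obs_end    <- true_end   + rnorm(n, 0, s_err)

slope_sim    <- unname(coef(lm(I(obs_end - obs_start) ~ obs_start))[2])
slope_theory <- -s_err^2 / (s_true^2 + s_err^2)     # -0.5 for these values
print(c(slope_sim, slope_theory))
```

    The formula shows why the apparent effect grows as measurement error becomes large relative to the true spread of baseline depths.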

    I suspect that a repeated-measures design would be more robust to the random error... but I don't know anything about repeated-measures designs...
