
Thread: Heteroscedasticity with a large number of cases

  1. #1
    noetsi

    I am setting aside here the fact that I commonly have populations rather than samples.

    I have heard that heteroscedasticity is not an issue with large sample sizes (I normally have ten thousand or more cases) because, even if it exists, the statistical tests will be asymptotically correct. But I have also read that this is not the case: because heteroscedasticity changes the distribution the tests assume, it will invalidate them even with large sample sizes.

    It would be nice if statisticians could agree on something.
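One way to probe the "asymptotically correct" claim is a small Monte Carlo simulation. The sketch below is in R (the language used later in this thread) and assumes an illustrative design of my own choosing: y = 1 + 2x + e, x uniform on (0, 1), error SD equal to x^2, n = 10,000. It compares the average classical OLS standard error and the average White (HC0) standard error of the slope with the slope's actual spread across replications. In this kind of setup the slope itself stays unbiased, but the classical SE does not converge to the right value as n grows, because its formula assumes constant variance; the White SE is built to remain consistent.

```r
# Monte Carlo sketch: does a large n make classical OLS standard errors
# trustworthy when the error variance depends on x?
# Assumed, illustrative design: y = 1 + 2x + e, x ~ U(0, 1), sd(e) = x^2
set.seed(123)
n <- 10000     # "ten thousand plus cases"
reps <- 1000   # Monte Carlo replications

slope <- se_classical <- se_white <- numeric(reps)
for (r in seq_len(reps)) {
  x <- runif(n)
  y <- 1 + 2 * x + rnorm(n, sd = x^2)
  xc <- x - mean(x)
  sxx <- sum(xc^2)
  b1 <- sum(xc * y) / sxx                  # OLS slope
  e <- (y - mean(y)) - b1 * xc             # OLS residuals
  slope[r] <- b1
  # classical SE: assumes one common error variance
  se_classical[r] <- sqrt(sum(e^2) / (n - 2) / sxx)
  # White (HC0) sandwich SE for the slope of a simple regression
  se_white[r] <- sqrt(sum(xc^2 * e^2)) / sxx
}

true_sd <- sd(slope)   # actual sampling variability of the slope
round(c(classical = mean(se_classical) / true_sd,
        white     = mean(se_white) / true_sd), 2)
```

With this design the first ratio should come out well below 1 (a rough calculation puts it near 0.8) and the second close to 1. A milder variance pattern would shrink the gap, so how much it matters depends on the data, not on n alone.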
    Last edited by noetsi; 10-14-2016 at 05:08 PM.
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  2. #2
    rogojel

    I am just a practitioner, but since I came across generalized least squares (GLS) I keep wondering why we do not just switch to it and model the variance structure directly.

  3. #3
    noetsi

    The suggestion I commonly follow is to use White standard errors. But honestly, I am not sure you even need to do that with so many cases, which is really the point of this thread. There is disagreement about whether hetero has an impact with very large sample sizes.
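For reference, White standard errors can be computed by hand in base R with the "sandwich" formula (X'X)^-1 X' diag(e^2) X (X'X)^-1. The simulated data below are purely illustrative (error SD assumed to grow with x); the point is to show what the estimator does, not to reproduce anyone's analysis.

```r
# White (HC0) heteroscedasticity-consistent standard errors by hand, base R only.
# Simulated, illustrative data: the error SD is assumed to grow with x.
set.seed(1)
n <- 2000
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = x^2)

fit <- lm(y ~ x)
X <- model.matrix(fit)                 # n x 2 design matrix
e <- residuals(fit)

bread   <- solve(crossprod(X))         # (X'X)^-1
meat    <- crossprod(X * e)            # sum over i of e_i^2 * x_i x_i'
V_white <- bread %*% meat %*% bread    # sandwich covariance matrix

se_classical <- sqrt(diag(vcov(fit)))  # assumes constant variance
se_white     <- sqrt(diag(V_white))
round(cbind(estimate = coef(fit), se_classical, se_white), 4)
```

If the sandwich package is installed, `sandwich::vcovHC(fit, type = "HC0")` should return the same matrix; its default is `type = "HC3"`, a small-sample variant, and Stata's robust option corresponds to HC1.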

    Incidentally, R. A. Fisher should be shot for using this word, one of the hardest to spell in the entire English language. I usually just call it hetero.

    Of course, he is dead and a statistical legend, so it's probably moot.

  4. #4
    rogojel

    Quote, originally posted by noetsi:
    The suggestion I commonly follow is to use White standard errors. But honestly, I am not sure you even need to do that with so many cases, which is really the point of this thread. There is disagreement about whether hetero has an impact with very large sample sizes.

    The problem with this would be the need to explain clearly, every time, why you think heteroskedasticity is not an issue for the particular data set. It probably depends on a combination of the size of the dataset and the magnitude of the variance differences, so there would probably not be a clear-cut decision.

    My guess is that it would be easier and less controversial to have a standard procedure that always addresses the variance structure, and to always do the analysis that way.

    Quote, originally posted by noetsi:
    Incidentally, R. A. Fisher should be shot for using this word, one of the hardest to spell in the entire English language. I usually just call it hetero.

    Of course, he is dead and a statistical legend, so it's probably moot.

    I do some training sessions in basic statistics, and I always use the word to get a laugh, telling trainees to use it if they reeeeally want to show off.

    regards

  5. #5
    hlsmith

    A couple of comments: does White's test provide a test statistic that can be directly interpreted? What I am getting at is whether you can look at its effect size per se, to get around the large sample size, so that you could say the heteroscedasticity is actually fairly small.
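On the effect-size question: White's statistic is n R^2 from an auxiliary regression of the squared residuals on the regressors, their squares and cross-products, so with n in the tens of thousands even a trivial variance pattern gives a tiny p-value. Two quantities that could serve as a rough effect size (my suggestion, not a standard recipe) are the auxiliary R^2 itself and the ratio of the White SE to the classical SE for the coefficient you care about. The sketch assumes simulated data with only mild heteroscedasticity (error SD rising from 1 to 1.5 across x).

```r
# Sketch: White's test versus practical size of the problem at large n.
# Assumed, illustrative data: mild heteroscedasticity, n = 10000.
set.seed(42)
n <- 10000
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 1 + 0.5 * x)

fit <- lm(y ~ x)
e  <- residuals(fit)
e2 <- e^2

# White's test: squared residuals on x and x^2 (a single regressor has no
# cross-products), then n * R^2 is compared with chi-squared on 2 df
aux <- lm(e2 ~ x + I(x^2))
r2_aux     <- summary(aux)$r.squared
white_stat <- n * r2_aux
p_value    <- pchisq(white_stat, df = 2, lower.tail = FALSE)

# Practical yardstick: how much does the slope's standard error change?
X <- model.matrix(fit)
bread   <- solve(crossprod(X))
V_white <- bread %*% crossprod(X * e) %*% bread
se_ratio <- sqrt(V_white[2, 2] / vcov(fit)[2, 2])

signif(c(stat = white_stat, p = p_value, r2_aux = r2_aux, se_ratio = se_ratio), 3)
```

In a setup like this the test should be overwhelmingly significant while the SE ratio stays within a few percent of 1: the heteroscedasticity is real but practically irrelevant for inference on the slope. `lmtest::bptest(fit, ~ x + I(x^2))`, the studentized Breusch-Pagan test with White's regressors, should give essentially the same statistic.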

    I am similar to noetsi in that if I think there is a threat, I use sandwich estimators. For me this is because I typically run only about one linear regression model a year. My understanding was that you need some theory about the cause of the heteroskedasticity to use GLS appropriately. Rogojel, what approaches do you usually use when applying GLS?
    Stop cowardice, ban guns!

  6. #6
    hlsmith

  7. #7
    rogojel

    Quote, originally posted by hlsmith:
    My understanding was that you need some theory about the cause of the heteroskedasticity to use GLS appropriately. Rogojel, what approaches do you usually use when applying GLS?

    Hi,
    I pretty much follow the recommendations of Zuur et al.

    https://www.amazon.de/Effects-Extens...&keywords=Zuur

    Check the residuals, form some theory about the source of the heteroskedasticity and build a model accordingly, then check the new residual pattern and repeat.
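The loop described above might look like this with nlme (a recommended package that ships with R). The data are simulated, and the names y, MyVar and mydata are hypothetical placeholders. The error SD is assumed to grow linearly with MyVar, so varPower(), which lets the variance scale as a power of MyVar, should beat both constant variance and varFixed(). Comparing variance structures by AIC this way assumes identical fixed effects in every model and REML fits, which is the gls() default.

```r
library(nlme)   # recommended package, ships with R

# Hypothetical data: error SD assumed to grow linearly with MyVar
set.seed(7)
n <- 500
mydata <- data.frame(MyVar = runif(n, 1, 10))
mydata$y <- 3 + 0.5 * mydata$MyVar + rnorm(n, sd = 0.4 * mydata$MyVar)

# 1. Start with constant variance and look at the residuals
m0 <- gls(y ~ MyVar, data = mydata)
plot(fitted(m0), residuals(m0, type = "normalized"))  # a funnel suggests hetero

# 2. Candidate variance structures suggested by that pattern
#    varFixed: Var = sigma^2 * MyVar
#    varPower: Var = sigma^2 * |MyVar|^(2 * delta), delta estimated
m1 <- gls(y ~ MyVar, data = mydata, weights = varFixed(~ MyVar))
m2 <- gls(y ~ MyVar, data = mydata, weights = varPower(form = ~ MyVar))

# 3. Compare (same fixed effects, REML fits) and re-check the residuals
AIC(m0, m1, m2)
plot(fitted(m2), residuals(m2, type = "normalized"))
coef(m2$modelStruct$varStruct, unconstrained = FALSE)  # estimated delta
```

The normalized residuals of the preferred model should no longer fan out; if they still do, that is the cue to form a new hypothesis about the variance and go round again.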

    regards

  8. #8
    hlsmith

    So you initially assume random effects, then try different variance structures, then replot the residuals and see whether they look better?

    The first part is similar to what I do with mixed models, where I then look at AICc.

  9. #9
    rogojel

    Yep, my point is that I need not worry about transformations and the other usual remedies for heteroscedasticity.

    regards

  10. #10
    GretaGarbo

    Quote, originally posted by rogojel:
    Hi,
    I pretty much follow the recommendations of Zuur et al.

    https://www.amazon.de/Effects-Extens...&keywords=Zuur

    Check the residuals, form some theory about the source of the heteroskedasticity and build a model accordingly, then check the new residual pattern and repeat.

    regards

    Does this mean that you use a GLM (generalized linear model)?

    What distribution? And what link function? How do you specify the heteroscedasticity?

  11. #11
    rogojel


    Hi,
    nope, I use the gls function from the nlme package. It is a weighted linear regression, but it makes it easy to specify different variance structures. E.g., if the hypothesis is that the variance increases in proportion to one variable, you can specify something like

    vmod <- varFixed(~ MyVar)
    and add it to the gls call like this:

    gls(..., weights = vmod, ...)
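A quick way to see the "weighted linear regression" point: varFixed(~ MyVar) says Var(e_i) = sigma^2 * MyVar_i, so the coefficient estimates should match lm() with weights = 1 / MyVar. The data and the names y, MyVar and mydata below are hypothetical placeholders.

```r
library(nlme)

# Hypothetical data satisfying the varFixed assumption: Var(e) = sigma^2 * MyVar
set.seed(11)
n <- 300
mydata <- data.frame(MyVar = runif(n, 1, 10))
mydata$y <- 2 + 1.5 * mydata$MyVar + rnorm(n, sd = sqrt(mydata$MyVar))

vmod <- varFixed(~ MyVar)
g <- gls(y ~ MyVar, data = mydata, weights = vmod)

# Ordinary weighted least squares with weights inversely proportional to MyVar
w <- lm(y ~ MyVar, data = mydata, weights = 1 / MyVar)

cbind(gls = coef(g), wls = coef(w))   # the two columns should agree
```

What gls() adds is mainly convenience: varPower(), varExp(), varIdent() and similar classes let the weights contain parameters estimated from the data, which plain lm() weights cannot do.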

    regards
