
Thread: Interpreting residuals

  1. #1
    noetsi




    I'm still getting used to checking regression assumptions when there are large numbers of data points.

    The attached plot looks fan-shaped to me, which is said to violate the assumption of equal variance. But I thought I would ask for a second opinion.

    Using a Q-Q plot I clearly have non-normality in my data; it's highly skewed. But with 30,000 data points I don't think non-normality will have a great impact on the p-values...
    Attached Images  
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  2. #2
    Dason

    Is your response a count? Or possibly a non-negative random variable?
    I don't have emotions and sometimes that makes me very sad.

  3. The Following User Says Thank You to Dason For This Useful Post:

    noetsi (06-09-2016)

  4. #3
    bryangoodrich

    A heat map approach (hex binning) would show you where the points are more dense. If the spread were uniform, the color pattern would look roughly the same along the horizontal axis. That's a good way to view the distribution within a dense set of points (overplotting).

    See http://www.r-bloggers.com/5-ways-to-...stograms-in-r/

    In any case, the only thing that matters is: https://en.wikipedia.org/wiki/Hetero...y#Consequences

    The coefficients remain as unbiased as they would normally be, but because the standard errors are incorrect you might rule something not statistically significant when it actually is. That may or may not be a significant issue in your final model (e.g., if Y ~ A + B + C requires C by fiat and C is not significant, possibly due to heteroscedasticity, I'm still going to keep it in).
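
    As a numeric stand-in for the hexbin picture, here is a minimal sketch that sorts the points by fitted value, cuts them into equal-count bins, and reports the residual standard deviation in each bin. The function name `spread_by_bin`, the choice of five bins, and the simulated data are all assumptions for illustration, not anything from the attached plot. A fan shape shows up as spreads that grow from the first bin to the last.

```python
import random
import statistics


def spread_by_bin(fitted, residuals, n_bins=5):
    """Residual standard deviation within equal-count bins of the fitted values."""
    pairs = sorted(zip(fitted, residuals))  # order points by fitted value
    size = len(pairs) // n_bins
    spreads = []
    for b in range(n_bins):
        start = b * size
        # The last bin absorbs any leftover points.
        end = (b + 1) * size if b < n_bins - 1 else len(pairs)
        spreads.append(statistics.pstdev([r for _, r in pairs[start:end]]))
    return spreads


# Made-up fan-shaped data: the noise SD grows in proportion to the fitted value.
random.seed(1)
fitted = [random.uniform(1, 10) for _ in range(5000)]
residuals = [random.gauss(0, 0.5 * f) for f in fitted]

spreads = spread_by_bin(fitted, residuals)
print([round(s, 2) for s in spreads])
```

    Roughly equal numbers across the bins would point toward constant variance; a steady climb is the numeric version of the fan.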
    You should definitely use jQuery. It's really great and does all things.

  5. The Following User Says Thank You to bryangoodrich For This Useful Post:

    noetsi (06-09-2016)

  6. #4
    noetsi

    Dason, it's cycle time: how many days you were in a given status. I guess that's a count, but it has a very wide range, with thousands of possible values.

    I understand, bryan, that it will not bias the coefficient estimates. However, I am deciding whether to leave something in the model, and two interesting variables have extremely high p-values this way but effect sizes that are substantively important in my judgment. I ran White's SEs, although they rarely change the results with my data, and they did not here. I have no theory here (that is common in my analysis, as none exists), but logically the variables that would be excluded might influence the results.

    I am guessing you both think it is not homoscedastic....

  7. #5
    noetsi

    OK, two points from the link bryan posted.

    One author wrote, "unequal error variance is worth correcting only when the problem is severe."
    What is severe? For example, to me the data I posted shows unequal error variance. Is it severe? Nothing I have read discusses when it is or is not.

    I did run White's heteroscedasticity-consistent standard errors and, as nearly always with my data, they did not change the SEs very much. But one thing I have never come across when reading about White's SEs is how you can know for sure whether they corrected the problem. Or can you assume they nearly always will?
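
    To make the comparison concrete, here is a minimal sketch of the classical and White (HC0) standard errors for the slope of a one-predictor regression, written out by hand. The function name, the simulated data, and the noise pattern (SD proportional to x squared) are assumptions chosen so the two SEs visibly differ; none of it comes from the data in this thread. The HC0 formula used is sqrt(sum((x_i - x_bar)^2 * e_i^2)) / Sxx.

```python
import math
import random


def slope_standard_errors(x, y):
    """OLS slope with its classical and White (HC0) standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Classical SE assumes one common error variance for every point.
    classical = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    # HC0 lets each point carry its own squared residual.
    white = math.sqrt(
        sum((xi - x_bar) ** 2 * e * e for xi, e in zip(x, resid))
    ) / sxx
    return slope, classical, white


# Made-up heteroscedastic data: the noise SD grows with x squared.
rng = random.Random(3)
x = [rng.uniform(1, 10) for _ in range(5000)]
y = [2.0 + 3.0 * xi + rng.gauss(0, 0.2 * xi ** 2) for xi in x]

slope, classical, white = slope_standard_errors(x, y)
print(f"slope={slope:.3f}  classical SE={classical:.4f}  White SE={white:.4f}")
```

    If the two SEs come out close on real data, as they did here, one common reading is that the heteroscedasticity is not distorting the inference much. That is a rule of thumb rather than a proof.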

  8. #6
    bryangoodrich

    The problem with trying to define severe is that it isn't an issue with the model. It's an issue with the objective the model is being used to serve. If I'm trying to predict an outcome within a certain margin of error and it can be shown that this unequal error variance results in wildly fluctuating predictions outside the margin I'm willing to tolerate, then it's not an issue with the model. It's an issue with my expectations of using the model to service this application. In some applications, it might be alright, but in others it may not.

    The problem with testing severity with the model itself is that you've already fit the data. This is why cross-validation methods aim to see how the model performs with new data. You have a ton of data: break it into 10 groups, fit to 9 of those groups, and see how well it predicts the 10th. Repeat so that you do a prediction for each group. The average of those 10 prediction errors is a good estimate of how well your model fits new data (ceteris paribus). While each model may have some unequal variance, if it doesn't lead to poor out-of-sample predictions according to your judgement, then it's a win (and a benchmark against which you can test other models trying to do the same thing).
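
    The 10-group procedure above can be sketched in a few lines. This is a minimal illustration with a one-predictor least-squares line and simulated data; the function names `fit_line` and `k_fold_rmse`, the seed values, and the use of RMSE as the error measure are assumptions, and a real analysis would swap in its own model and data.

```python
import math
import random


def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


def k_fold_rmse(x, y, k=10, seed=0):
    """Out-of-sample RMSE for each of k randomly assigned folds."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)       # random assignment to folds
    folds = [idx[i::k] for i in range(k)]  # k roughly equal groups
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit on the other k - 1 folds only.
        intercept, slope = fit_line([x[i] for i in train], [y[i] for i in train])
        # Predict the held-out fold and score the predictions.
        sq = [(y[i] - (intercept + slope * x[i])) ** 2 for i in held_out]
        errors.append(math.sqrt(sum(sq) / len(sq)))
    return errors


# Made-up data: a straight line plus noise with SD 1.
rng = random.Random(42)
x = [rng.uniform(0, 10) for _ in range(3000)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1.0) for xi in x]

fold_rmse = k_fold_rmse(x, y)
mean_rmse = sum(fold_rmse) / len(fold_rmse)
print(f"mean out-of-sample RMSE over {len(fold_rmse)} folds: {mean_rmse:.3f}")
```

    The same loop works for thousands or millions of rows; the only manual step is deciding which error measure matches the margin you are willing to tolerate.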

  9. The Following User Says Thank You to bryangoodrich For This Useful Post:

    noetsi (06-09-2016)

  10. #7
    noetsi

    That seems useful advice. One thing I have never understood is how, in actual software, you predict the values of the hold-out data from the estimated model and determine the error. (With time series there is a simple process, but then you are predicting only a few points.) I know how this works in principle, and I can do it essentially manually with small data sets, but I don't know how to do it with thousands of data points.

    It looks to me like White's is the generally accepted solution in the literature I looked at, but I found no simulations showing how likely White's correction is to be right or wrong. Commonly authors point out that heteroscedasticity can be driven by a misspecified model and that you should specify it correctly. That always brings me back to the question I have had since my first regression course decades ago: in social science your model is always going to be misspecified, because reality is complex and we know too little about what we model (in my area there appears to be little empirical theory; its stress is on social interaction, not data).

    So how do you fix something that is certainly, and unavoidably, wrong?

  11. #8
    bryangoodrich

    All models are wrong, but some are useful.

    You can do k-fold cross-validation on time series data, albeit with a little creativity. In ordinary cross-validation, the folds (groups) are just random assignments of the data, but time series requires that each group maintain the time series structure inherent in the data. Thus, you do a sort of stratified k-fold cross-validation. Alternatively, you can do variations of feed-forward leave-one-out CV (LOOCV) or repeated LOOCV.

    http://robjhyndman.com/hyndsight/tscvexample/
    http://robjhyndman.com/hyndsight/crossvalidation/
    https://en.wikipedia.org/wiki/Cross-...on_(statistics)
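
    One feed-forward variant, often called rolling-origin or forward-chaining evaluation, can be sketched as follows: fit on everything up to time t, forecast time t, record the error, then move the origin forward by one. The linear-trend forecaster, the function names, the `min_train` window of 50, and the simulated series are assumptions for illustration and are not taken from the linked posts.

```python
import random


def fit_trend(t, y):
    """Least-squares intercept and slope of y against the time index t."""
    n = len(t)
    t_bar, y_bar = sum(t) / n, sum(y) / n
    stt = sum((ti - t_bar) ** 2 for ti in t)
    slope = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / stt
    return y_bar - slope * t_bar, slope


def rolling_origin_errors(series, min_train=50):
    """Absolute one-step-ahead errors, always training on the past only."""
    errors = []
    for origin in range(min_train, len(series)):
        # Training data are the observations strictly before the origin.
        intercept, slope = fit_trend(list(range(origin)), series[:origin])
        forecast = intercept + slope * origin
        errors.append(abs(series[origin] - forecast))
    return errors


# Made-up series: a linear trend plus noise with SD 2.
rng = random.Random(7)
series = [10 + 0.5 * t + rng.gauss(0, 2.0) for t in range(200)]

errors = rolling_origin_errors(series)
mae = sum(errors) / len(errors)
print(f"{len(errors)} one-step forecasts, mean absolute error {mae:.3f}")
```

    Because each forecast uses only earlier observations, the time ordering is never broken, which is the property random folds would destroy.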

  12. The Following User Says Thank You to bryangoodrich For This Useful Post:

    noetsi (06-20-2016)

  13. #9
    hlsmith

    Stop cowardice, ban guns!

  14. The Following User Says Thank You to hlsmith For This Useful Post:

    noetsi (06-20-2016)

  15. #10
    hlsmith

    Similar figure with transparency and histograms:


    http://analytics.ncsu.edu/sesug/2011/RV08.Watts.pdf

  16. #11
    hlsmith
