Thread: Linear regression with non-normal data?

  1. #31
    Location
    MN
    Posts
    13
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Linear regression with non-normal data?




    I enjoyed every bit of this discussion and learned (hopefully) something. Thanks.
    +1

    Are the errors all pebbles or are they a mix of grains, pebbles, boulders, and mountains? This is the best illustration of the problem I came across when reading through the history of regression, which was originally developed independently by Gauss and Adrien-Marie Legendre. Gauss was trying to figure out the most likely path of an orbit.

    I'm a little confused how the belief that "predictors must be normally distributed" can be reconciled with the simple observation that we often include categorical predictors in multiple regression. It's hard to think of a more non-normal variable than a binary variable! So do people think that those variables are just exempt from the assumption or what?
    It seems that the important thing when deciding whether to use regression on a binary variable is whether your errors in assigning 0's and 1's will be normal. If you had a group of people who earn either $10 or $100000 and you 'assign' a 0 (earns $10) to a true 1 (earns $100000), then regression may not be the best tool to use, as your errors will not be normal.
    Last edited by leavesof3; 11-16-2011 at 09:41 AM.

  2. #32
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Linear regression with non-normal data?

    But I feel a little guilty if I think I might be messing up the original poster by raising these issues. I take consolation from the fact that they probably forgot it long ago since they have not posted here in a while.
    The discussion (stimulating as it is) has not messed me up - in all honesty I have ignored most of it because to be frank, since I am not a statistician, most of it has been over my head. And come on, my absence has only been about 6 days!

    Your understanding is incorrect
    I find amazing is how often I have read, and been taught, that normality is important for linear regression
    I do find it a common misconception that the variables in a linear regression model have to be normally distributed.
    The overwhelming consensus is that my belief that the variables must be normally distributed is incorrect. So I thank everyone for at least setting me straight on that.

  3. #33
    Fortran must die
    noetsi's Avatar
    Posts
    6,532
    Thanks
    692
    Thanked 915 Times in 874 Posts

    Re: Linear regression with non-normal data?

    Thanks. I was really rationalizing taking your thread off topic.....

    It was my belief as well (that it was required for linear regression). And, remarkably, I have been taught that by professors and read it in many statistical links.

    One minor thing to remember: while the consensus is that normality is not required, there is an exception for statistical tests with small sample sizes. In that case you need it for the tests (that is, to generate p values that are accurate). But you probably won't run regression very often with that few cases anyway.
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  4. #34
    Devorador de queso
    Dason's Avatar
    Location
    Tampa, FL
    Posts
    12,932
    Thanks
    307
    Thanked 2,629 Times in 2,245 Posts

    Re: Linear regression with non-normal data?

    I hope you're not saying you need the predictor variables to be normally distributed for small sample sizes (because that isn't true either). We need the error terms to be normally distributed for small sample sizes. For larger sample sizes we can get away with departures from normality, but at no point do we ever require the predictors to be normally distributed.

  5. #35
    Posts
    12
    Thanks
    10
    Thanked 2 Times in 2 Posts

    Re: Linear regression with non-normal data?

    Sorry for resurrecting this thread, but is there any confirmation of this claim?

    "For larger sample sizes we can get away with departures from normality but at no point do we ever require the predictors to be normally distributed."

    A reference, or something? I ask because in my regression analyses the residuals are generally not normally distributed, with the P-P plot showing a bow-shaped curve, which is really annoying.

    Thanks.

  6. #36
    Devorador de queso
    Dason's Avatar
    Location
    Tampa, FL
    Posts
    12,932
    Thanks
    307
    Thanked 2,629 Times in 2,245 Posts

    Re: Linear regression with non-normal data?

    We get the "for larger sample sizes" part from the asymptotic normality of OLS estimates of the regression coefficients: http://en.wikipedia.org/wiki/Proofs_...c_normality_of

    Since the OLS estimate coincides with the maximum likelihood estimate when the errors are assumed normal, you can also use the asymptotic normality of the MLEs if you want.

    The part about not requiring normality of the predictors comes from the fact that we don't require any assumptions about the distribution of the predictor when deriving the distribution of the parameter estimates. It's literally just something that we don't require to derive the theory - so there really isn't a proof that we don't need the predictors to be normally distributed other than the fact that we don't need the predictors to be normally distributed to derive all of the properties of the estimates.

    With that said we typically want normally distributed errors. And depending on your situation there might be other more appropriate methods to use to analyze the data.
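
    A minimal simulation sketch of the "larger sample sizes" point, not part of the original post. It assumes a simple regression with an exponentially distributed (skewed) predictor and skewed, mean-zero errors; the names `ols_slope`, `simulate_slopes`, and `share_within_1_96_sd`, and all parameter values, are made up for illustration.

```python
import random
import statistics


def ols_slope(x, y):
    """Closed-form OLS slope for a simple regression of y on x."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def simulate_slopes(n, reps=2000, beta0=1.0, beta1=2.0, seed=0):
    """Redraw a skewed predictor and skewed errors, refit, and collect the slopes."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x = [rng.expovariate(1.0) for _ in range(n)]        # non-normal predictor
        e = [rng.expovariate(1.0) - 1.0 for _ in range(n)]  # non-normal, mean-zero errors
        y = [beta0 + beta1 * xi + ei for xi, ei in zip(x, e)]
        slopes.append(ols_slope(x, y))
    return slopes


def share_within_1_96_sd(values):
    """Fraction of values within 1.96 standard deviations of their mean."""
    m = statistics.fmean(values)
    s = statistics.stdev(values)
    return sum(abs(v - m) <= 1.96 * s for v in values) / len(values)


slopes_large = simulate_slopes(n=200)
print("mean slope:", statistics.fmean(slopes_large))
print("share within 1.96 sd:", share_within_1_96_sd(slopes_large))
```

    If the slope estimates are approximately normal, the mean should sit near the true slope (2.0 here) and roughly 95% of the estimates should fall within 1.96 standard deviations of it, even though neither the predictor nor the errors are normal. Rerunning with a small `n` is one way to see where the approximation starts to strain.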
    I don't have emotions and sometimes that makes me very sad.

  7. The Following User Says Thank You to Dason For This Useful Post:

    Donald (03-18-2013)

  8. #37
    Posts
    12
    Thanks
    10
    Thanked 2 Times in 2 Posts

    Re: Linear regression with non-normal data?

    Thank you so much !

  9. #38
    Location
    Raleigh,NC
    Posts
    80
    Thanks
    0
    Thanked 9 Times in 9 Posts

    Re: Linear regression with non-normal data?

    This has been a great thread. I just want to confirm what I'm reading.

    If we have the model Y_i = \beta_1 x_{i1} +\epsilon_i

    We are saying that \epsilon_i is assumed to be normally distributed. X does not have to be; y does not need to be (though it might help).

    Or, said another way, \epsilon_i being normally distributed implies that y is normally distributed conditional on x.

    Correct?
    Last edited by Jrb599; 07-10-2013 at 07:28 PM.

  10. #39
    Cookie Scientist
    Jake's Avatar
    Location
    Austin, TX
    Posts
    1,293
    Thanks
    66
    Thanked 584 Times in 438 Posts

    Re: Linear regression with non-normal data?

    Right. Except I think you can even drop this part:
    (though it might help)
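
    Written out, the statement being confirmed can be sketched as follows. The constant error variance \sigma^2 and the independence of the errors from x are assumptions added here for the illustration; they are not spelled out in the posts above.

```latex
Y_i = \beta_1 x_{i1} + \epsilon_i,
\qquad \epsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)
\quad \Longrightarrow \quad
Y_i \mid x_{i1} \sim N\!\left(\beta_1 x_{i1},\, \sigma^2\right)
```

    Nothing in this says anything about the marginal distribution of x or of y, which is why neither needs to be normal.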
    “In God we trust. All others must bring data.”
    ~W. Edwards Deming

  11. #40
    Fortran must die
    noetsi's Avatar
    Posts
    6,532
    Thanks
    692
    Thanked 915 Times in 874 Posts

    Re: Linear regression with non-normal data?

    If I understand correctly, normality is critical in linear regression to whether the p value is valid or not. And commonly among data analysts the p value is the most important thing people care about (the specific level of the IV is rarely critical if it's significant, since a substantive interpretation of whether an effect size is large is very difficult to do and comparisons of relative importance are not simple to do in linear regression, or at least not agreed on).

    Is it the normality of the error terms or the raw data that would determine the validity of p (or is p not influenced as I assume by normality)?

  12. #41
    Cookie Scientist
    Jake's Avatar
    Location
    Austin, TX
    Posts
    1,293
    Thanks
    66
    Thanked 584 Times in 438 Posts

    Re: Linear regression with non-normal data?

    The error term.

  13. The Following User Says Thank You to Jake For This Useful Post:

    noetsi (07-15-2013)

  14. #42
    Fortran must die
    noetsi's Avatar
    Posts
    6,532
    Thanks
    692
    Thanked 915 Times in 874 Posts

    Re: Linear regression with non-normal data?

    So you would want to see if the residual distribution is normal I assume.

  15. #43
    Devorador de queso
    Dason's Avatar
    Location
    Tampa, FL
    Posts
    12,932
    Thanks
    307
    Thanked 2,629 Times in 2,245 Posts

    Re: Linear regression with non-normal data?

    Quote Originally Posted by noetsi View Post
    So you would want to see if the residual distribution is normal I assume.
    Yes - that is the typical way to assess that assumption.
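
    A minimal sketch of that check, assuming simulated data with a binary predictor and normal errors. The helper names `fit_simple_ols` and `qq_correlation` and all the numbers are invented for illustration; the Q-Q correlation used here is just one rough numeric stand-in for eyeballing a Q-Q or P-P plot.

```python
import random
import statistics


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


def qq_correlation(values):
    """Correlation between sorted values and standard-normal quantiles.

    Values close to 1 are consistent with normality; clearly lower values are not.
    """
    n = len(values)
    std_normal = statistics.NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    observed = sorted(values)
    t_bar, o_bar = statistics.fmean(theoretical), statistics.fmean(observed)
    cov = sum((t - t_bar) * (o - o_bar) for t, o in zip(theoretical, observed))
    var_t = sum((t - t_bar) ** 2 for t in theoretical)
    var_o = sum((o - o_bar) ** 2 for o in observed)
    return cov / (var_t * var_o) ** 0.5


rng = random.Random(1)
x = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(500)]  # binary predictor
y = [10.0 + 50.0 * xi + rng.gauss(0.0, 1.0) for xi in x]      # normal errors

intercept, slope = fit_simple_ols(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print("Q-Q correlation of raw y:    ", round(qq_correlation(y), 3))
print("Q-Q correlation of residuals:", round(qq_correlation(residuals), 3))
```

    In this setup the raw y values are strongly bimodal and the predictor is binary, yet the residuals line up closely with normal quantiles, which is the quantity the assumption is about.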

  16. #44
    TS Contributor
    CowboyBear's Avatar
    Location
    New Zealand
    Posts
    2,062
    Thanks
    121
    Thanked 427 Times in 328 Posts

    Re: Linear regression with non-normal data?


    Quote Originally Posted by noetsi View Post
    And commonly among data analysts the p value is the most important thing people care about (the specific level of the IV is rarely critical if it's significant, since a substantive interpretation of whether an effect size is large is very difficult to do and comparisons of relative importance are not simple to do in linear regression, or at least not agreed on).
    I guess it's hard to determine what people care about most, but a lot of people would argue that the p value is not very important (think of the whole practical vs statistical significance issue).

    At the end of the day, all the p value tells you is the probability of observing a coefficient at least as large (in absolute value) as the one observed, given that the true population parameter is exactly zero and the model's assumptions hold. A lot of the time, the idea that the true parameter is exactly zero is really implausible anyway. Personally I'm usually a lot more interested in point and interval estimates for the parameter.

    a substantive interpretation of whether an effect size is large is very difficult to do
    I think interpreting coefficients is hard when the variables are measured on arbitrary scales. E.g. regressing score on some psychometric test on score on some other psychometric test. Then we often need to convert coefficients to standardised form, and interpret them in terms of t-shirt sizes (e.g. correlation of 0.5 = "large").

    But when the scaling of the variables carries some substantive meaning, things aren't so bad.

    E.g., consider a regression of income on height in the US given by Baguley, 2010, adapted from Gelman and Hill, 2007.

    earnings = -60515 + 1256 × height in inches

    (I think earnings are per annum, but am not 100% sure)

    Knowing that an extra inch of height is associated with an extra USD 1,256 of earnings is an interesting piece of information whose importance we can grasp without any extra standardization or complicated interpretive scheme. Heck, standardizing this into a correlation (r = .24) hides the magnitude of the effect, if anything. Furthermore, having a best estimate of the quantity of earnings associated with a one-inch increase in height is a lot more informative than simply knowing that the data would be unlikely if the true relationship were zero (per a p value).
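
    As a small worked example of reading the fitted equation directly (the function name and the two heights are arbitrary choices for illustration, and the units are taken to be USD as quoted above):

```python
def predicted_earnings(height_in):
    """Fitted earnings from the quoted equation: -60515 + 1256 * height in inches."""
    return -60515 + 1256 * height_in


# Compare two hypothetical people who differ by four inches in height.
shorter = predicted_earnings(66)
taller = predicted_earnings(70)
diff = taller - shorter

print("predicted at 66 in:", shorter)
print("predicted at 70 in:", taller)
print("difference:", diff)  # four times the per-inch coefficient
```

    The difference depends only on the slope, so it can be read straight off the coefficient without any standardization. The intercept matters only for the predicted levels, and it has no sensible meaning at a height of zero.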

    Baguley, Thom. “When Correlations Go Bad.” The Psychologist 23, no. 2 (2010): 122–123.
