Page 1 of 2
Results 1 to 15 of 16

Thread: PLS HELP! Regression with non-normally distributed errors

  1. #1
    Points: 491, Level: 9
    Level completed: 82%, Points required for next Level: 9

    Posts
    14
    Thanks
    3
    Thanked 0 Times in 0 Posts

    PLS HELP! Regression with non-normally distributed errors




    Hello Everyone,

    I'm desperately looking for help with a regression analysis I'd like to conduct. The residuals (error terms) of the regression are apparently not normally distributed; a Shapiro-Wilk test has confirmed this. However, several visualizations suggest that the distribution is not that far from a normal distribution. I have already tried to transform the Y variable, but about half the values are negative, which rules out logarithms and square roots. The transformation 1/y made the problem even worse. The sample contains about 100 observations.

    What should I do?
    Thank you so much!
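    A minimal sketch of checking residual normality with only the Python standard library. Shapiro-Wilk needs tabulated coefficients the standard library does not provide, so this swaps in a different test, Jarque-Bera, which is built from the skewness and kurtosis of the residuals. The data are simulated and hypothetical (100 points with heavy-tailed errors), not the poster's file, and the helper names `ols_fit` and `jarque_bera` are illustrative.

```python
import math
import random


def ols_fit(x, y):
    """Least-squares fit of y = a + b*x; returns (intercept a, slope b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def jarque_bera(resid):
    """Jarque-Bera statistic n/6 * (S^2 + (K - 3)^2 / 4) and its asymptotic p-value.

    S is the sample skewness and K the sample kurtosis. Under normal errors JB is
    approximately chi-square with 2 df, whose upper tail probability is exp(-JB / 2).
    """
    n = len(resid)
    m = sum(resid) / n
    m2 = sum((r - m) ** 2 for r in resid) / n
    m3 = sum((r - m) ** 3 for r in resid) / n
    m4 = sum((r - m) ** 4 for r in resid) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return jb, math.exp(-jb / 2)


# Hypothetical data: 100 points with Laplace-like (heavy-tailed) errors.
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(100)]
y = [1.0 + 0.5 * xi + rng.choice([-1, 1]) * rng.expovariate(1.0) for xi in x]

a, b = ols_fit(x, y)
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]   # test these, not Y itself
jb, p = jarque_bera(resid)
print(f"slope = {b:.3f}, JB = {jb:.2f}, asymptotic p = {p:.4f}")
```

    With about 100 observations the chi-square p-value is only approximate, so it is best read alongside a residual plot.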

  2. #2
    TS Contributor
    Points: 12,287, Level: 72
    Level completed: 60%, Points required for next Level: 163
    rogojel
    Location
    I work in Europe, live in Hungary
    Posts
    1,471
    Thanks
    160
    Thanked 332 Times in 312 Posts

    Re: PLS HELP! Regression with non-normally distributed errors

    hi,
    Could you give a bit more detail, maybe some data? As a quick idea: adding a constant to all your data points will not change your model (except for the intercept), so you can easily get rid of negative values if that is the problem.
    regards
    rogojel
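    A quick numerical check of the point above, as a sketch with made-up numbers: shifting every Y value by a constant `c` leaves the OLS slope unchanged and moves the intercept by exactly `c`. The values and the constant are hypothetical.

```python
def ols_fit(x, y):
    """Least-squares fit of y = a + b*x; returns (intercept a, slope b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [-3.0, 0.5, -1.0, 2.5, 4.0]   # hypothetical values, some negative
c = 10.0                          # any constant large enough to make every y positive

a, b = ols_fit(x, y)
a_shift, b_shift = ols_fit(x, [yi + c for yi in y])
print(f"slope: {b:.4f} vs {b_shift:.4f}")          # slopes agree
print(f"intercept change: {a_shift - a:.4f}")       # equals c
```

    This holds for the shift alone. Once the shifted values are logged, the choice of constant does affect the fit, because the logarithm is nonlinear.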

  3. #3
    CheersToStata

    Re: PLS HELP! Regression with non-normally distributed errors

    Hi rogojel,

    thanks for your answer. I have uploaded the data; IRR is the dependent variable. I know some values could be considered outliers, but I have tried excluding some of them without success in terms of normality.

    What else do you need?

  4. #4
    rogojel

    Re: PLS HELP! Regression with non-normally distributed errors

    hi,
    having had a quick look at the data, I think the problem is that you essentially have a flat line (no dependence between exit rate and IRR). The variability of the data increases with the exit rate, though, and this explains why your residuals are not normal. Taking the logarithm of IRR helps a bit, but I do not see much sense in trying to regress IRR on exit rate with such a weak relationship (R-squared for log IRR against exit rate is 2.5%).

    Trying to model the variability of IRR as a function of exit rate looks more promising, if that is an interesting question for you.

    regards
    rogojel
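    One way to put a number on rogojel's observation that the variability of IRR grows with exit rate is to regress the squared OLS residuals on the predictor, which is the idea behind the Breusch-Pagan test (here Koenker's studentized n * R^2 version). This is a stdlib-only sketch on simulated, hypothetical data whose error spread grows with x; it is not a re-analysis of the attached file.

```python
import random


def ols_fit(x, y):
    """Least-squares fit of y = a + b*x; returns (intercept a, slope b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(x, y, a, b):
    """Share of the variance of y explained by the line a + b*x."""
    my = sum(y) / len(y)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_res / ss_tot


def koenker_bp(x, y):
    """Regress squared OLS residuals on x; return (n * R^2, slope of that fit).

    Under constant error variance, n * R^2 is approximately chi-square with
    1 df (5% critical value about 3.84). A positive slope means the residual
    spread grows with x.
    """
    a, b = ols_fit(x, y)
    e2 = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    a2, b2 = ols_fit(x, e2)
    return len(x) * r_squared(x, e2, a2, b2), b2


# Hypothetical data: nearly flat mean, error standard deviation proportional to x.
rng = random.Random(7)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [2.0 + 0.05 * xi + rng.gauss(0, 0.5 * xi) for xi in x]

stat, spread_slope = koenker_bp(x, y)
print(f"n*R^2 = {stat:.2f} (compare with 3.84), spread slope = {spread_slope:.3f}")
```

    The auxiliary regression of squared residuals on x is itself a crude model of the variability, which is the kind of question rogojel suggests may be more promising than the mean model.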

  5. #5
    rogojel

    Re: PLS HELP! Regression with non-normally distributed errors

    hi,
    looking at the data further, the variability is also not simple to model. Is it possible that you have a mixture of different types of data here?

    regards
    rogojel

  6. The Following User Says Thank You to rogojel For This Useful Post:

    CheersToStata (07-26-2013)

  7. #6
    CheersToStata

    Re: PLS HELP! Regression with non-normally distributed errors

    Hi, thanks for your input!
    The data is both in percent, so there is no difference in the type of the data.
    I'm aware that the explanatory power is rather low. I'm replicating a model that has been used many times in the literature. The replication includes some additional explanatory variables (see the attached file). Depending on which variables I include, I get R-squared values between 15% and 28%, which is quite all right. Even if R-squared were lower, it would be okay.

    Is there some way I can still proceed with a regression in some other form? Do you think a robust regression would solve the problem with the independent variable?

    Thank you so much!

  8. #7
    Fortran must die
    Points: 58,790, Level: 100
    Level completed: 0%, Points required for next Level: 0
    noetsi
    Posts
    6,532
    Thanks
    692
    Thanked 915 Times in 874 Posts

    Re: PLS HELP! Regression with non-normally distributed errors

    CheersToStata wrote: "I have already tried to transform the data of the Y variable, but about half the data has negative values, which eliminates logarithms and square roots."
    Actually, what you do is add a constant to all the data so that the lowest value is positive. Then you log it. So if the lowest value is -42, you add 43 to all points and then log the results.
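    A small sketch of the shift-then-log recipe, assuming the constant is chosen so the smallest value becomes exactly 1 (which gives the 43 in the example when the minimum is -42). The function name and the sample values are hypothetical.

```python
import math


def shift_log(values):
    """Add c so the smallest value becomes 1 (only if needed), then take natural logs.

    Returns the transformed values and the constant c that was added.
    """
    c = max(0, 1 - min(values))
    return [math.log(v + c) for v in values], c


irr = [-42, -10, 0, 5, 20, 150]   # hypothetical returns in percent
logged, c = shift_log(irr)
print(c)                          # 43, as in the example above
print([round(v, 3) for v in logged])
```

    Because the logarithm is nonlinear, how much the shape changes depends on `c`: with a much larger shift the log is close to linear over the data range and reshapes the distribution very little.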
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  9. #8
    rogojel

    Re: PLS HELP! Regression with non-normally distributed errors

    CheersToStata wrote: "Hi, thanks for your input! The data is both in percent, so there is no difference in the type of the data."

    hi,
    I mean something like some of the data coming from one type of input and some from another. E.g. if these were related to returns on stocks, you might have some stocks from the auto industry and some from chemicals, and they might behave differently.

    regards
    rogojel

  10. #9
    CheersToStata

    Re: PLS HELP! Regression with non-normally distributed errors

    rogojel wrote: "I mean something like some of the data coming from one type of input and some from another. E.g. if these were related to returns on stocks, you might have some stocks from the auto industry and some from chemicals, and they might behave differently."
    This should not be the issue. These are private equity returns; that's why there might be such large discrepancies and extreme values in the data set.

    noetsi wrote: "Actually what you do is add a constant to all the data so that the lowest value is positive. Then you log it. So if the lowest point is -42 you add 43 to all points and then log the results."
    I have tried this in the meantime. It does not get any better.

    Is there some other regression that is a bit more lax regarding the normality assumption?

  11. #10
    noetsi

    Re: PLS HELP! Regression with non-normally distributed errors

    Logistic regression does not assume normality, although you would normally not use it with interval data. Robust regression is designed to deal with outliers (as are methods based on M- and S-estimators). These may reduce the impact of normality violations.
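    To illustrate what an M-estimator does, here is a minimal stdlib-only sketch of Huber regression for one predictor, fitted by iteratively reweighted least squares. It assumes the commonly used tuning constant k = 1.345 and a scale estimate of median absolute residual / 0.6745; the data, including the single gross outlier, are made up. Packaged routines make other choices (Stata's `rreg`, for example, combines Huber and biweight iterations), so their numbers will differ.

```python
import statistics


def wls_fit(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def huber_fit(x, y, k=1.345, iters=50):
    """Huber M-estimate of y = a + b*x via iteratively reweighted least squares.

    Each pass rescales the residuals by a robust scale estimate; points whose
    scaled residual exceeds k get weight k/|u| instead of 1, so outliers are
    down-weighted rather than deleted.
    """
    a, b = wls_fit(x, y, [1.0] * len(x))   # start from ordinary least squares
    for _ in range(iters):
        r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        s = statistics.median([abs(ri) for ri in r]) / 0.6745
        if s == 0:
            break
        w = [1.0 if abs(ri / s) <= k else k / abs(ri / s) for ri in r]
        a, b = wls_fit(x, y, w)
    return a, b


# Hypothetical data: y = 1 + 2x plus small noise, with one gross outlier.
x = [float(i) for i in range(10)]
noise = [0.1, -0.2, 0.15, -0.05, 0.0, 0.2, -0.1, 0.05, -0.15, 0.1]
y = [1 + 2 * xi + e for xi, e in zip(x, noise)]
y[9] += 30

print("OLS slope:  ", round(wls_fit(x, y, [1.0] * len(x))[1], 3))
print("Huber slope:", round(huber_fit(x, y)[1], 3))
```

    Note that this targets outliers in Y; it does not by itself make the usual t-tests valid under arbitrary non-normality.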

    One issue that has not been raised is why exactly you are concerned with non-normality. It only influences the standard errors and the tests of significance, not the parameter estimates. More importantly, regression is robust to violations of normality, at least if you have a fair number of cases.
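    A small Monte Carlo sketch of the two claims above, under assumptions chosen for illustration: a correctly specified straight-line model, n = 100 (roughly the thread's sample size) and strongly skewed, mean-zero errors (exponential minus 1). It checks whether the OLS slope is right on average and how often the usual 95% t-interval covers the true slope. The critical value 1.984 is the t quantile for about 98 degrees of freedom; all data are simulated.

```python
import random


def ols_with_se(x, y):
    """OLS slope and its conventional standard error for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se = (ssr / (n - 2) / sxx) ** 0.5
    return b, se


rng = random.Random(42)
true_a, true_b = 1.0, 0.5
n, reps, t_crit = 100, 2000, 1.984
x = [rng.uniform(0, 10) for _ in range(n)]   # fixed design across replications

slopes, covered = [], 0
for _ in range(reps):
    # Skewed errors: exponential(1) has mean 1, so subtract 1 to centre them at 0.
    y = [true_a + true_b * xi + rng.expovariate(1.0) - 1.0 for xi in x]
    b, se = ols_with_se(x, y)
    slopes.append(b)
    covered += abs(b - true_b) <= t_crit * se

mean_b = sum(slopes) / reps
print(f"average slope {mean_b:.4f} (true {true_b}), 95% CI coverage {covered / reps:.3f}")
```

    This speaks only to this one simulated setting. With heteroscedastic errors, like the pattern rogojel described, conventional standard errors can be off even in large samples.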
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  12. The Following 2 Users Say Thank You to noetsi For This Useful Post:

    CheersToStata (07-26-2013), gksiddiqui (08-21-2013)

  13. #11
    CheersToStata

    Re: PLS HELP! Regression with non-normally distributed errors

    Thank you very much noetsi!
    noetsi wrote: "They may reduce the impact of normality violations."
    Does this also apply to the independent variable (sorry for my lack of knowledge)? If so, I should be fine with a robust regression, shouldn't I?

    noetsi wrote: "One issue that has not been raised is why exactly you are concerned with non-normality. They only influence the standard errors, and the test of significance, not the parameter estimates."
    Well, I just checked the relevant literature for what I should look out for. Furthermore, as you wrote, non-normality might influence the tests of significance, which are important to me.

    noetsi wrote: "More importantly regression is robust to assumptions of normality at least if you have a fair number of cases."
    What counts as a fair number, ideally one I can cite in academic terms?
    Last edited by CheersToStata; 07-26-2013 at 10:38 AM.

  14. #12
    noetsi

    Re: PLS HELP! Regression with non-normally distributed errors

    Robust regression deals with the estimation of the regression line. You don't care about normality in the DV or IV; you care about normality of the residuals in the regression. The normality of the IV and DV does not matter at all for regression. All that matters is whether the residuals are normal. That point gets missed a lot in treatments of normality, which tend to focus on univariate analysis of normality (that is, in the raw data).
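    A toy example of the point about residuals versus raw variables, using hypothetical data: a 0/1 predictor with a large effect makes Y itself bimodal (far from normal), while the residuals from the regression are just the normal noise that was added. Only the standard library is used, and the helper names are illustrative.

```python
import random


def ols_fit(x, y):
    """Least-squares fit of y = a + b*x; returns (intercept a, slope b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def skew_kurt(v):
    """Sample skewness and kurtosis (a normal distribution has 0 and 3)."""
    n = len(v)
    m = sum(v) / n
    m2 = sum((t - m) ** 2 for t in v) / n
    m3 = sum((t - m) ** 3 for t in v) / n
    m4 = sum((t - m) ** 4 for t in v) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


rng = random.Random(3)
g = [i % 2 for i in range(200)]                 # hypothetical 0/1 predictor
y = [10.0 * gi + rng.gauss(0, 1) for gi in g]   # normal errors around two group means

a, b = ols_fit(g, y)
resid = [yi - (a + b * gi) for gi, yi in zip(g, y)]
print("Y:         skew %.2f, kurtosis %.2f" % skew_kurt(y))
print("residuals: skew %.2f, kurtosis %.2f" % skew_kurt(resid))
```

    A univariate normality test on Y here would reject strongly, even though the regression assumptions hold exactly.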

    In my experience, and I found this out painfully coming here, normality is badly distorted in the literature. First, they focus on normality specifically in the DV or IV, which does not matter. Second, they ignore that regression is robust to violations of normality (although what that means in practice is never very clear). But you can likely have mild non-normality in the residuals, and if you have enough cases it won't matter much. What counts as a fair number is never really defined in concrete terms, in part because it depends on how many predictors you have. If you have several hundred cases and a few predictors, you likely have a large sample.

    One possibility is to run one of the non-parametric tests and see if the results you get are generally similar. If they are, use the regression (or at least you can have more confidence in the regression results).

  15. The Following User Says Thank You to noetsi For This Useful Post:

    CheersToStata (07-26-2013)

  16. #13
    CheersToStata

    Re: PLS HELP! Regression with non-normally distributed errors

    Well, thank you! That pretty much clarifies the issue for the purpose of my research.

    One last question: Do you think I am fine going with the Spearman correlation as a back-up?

    Greetings,
    CheersToStata

  17. #14
    noetsi

    Re: PLS HELP! Regression with non-normally distributed errors

    I am not sure what you mean by a backup, but if you mean as a substitute for a non-parametric test, then I have not heard of that being done. Note that I am not particularly experienced with non-parametric tests, which I don't use (they are very important; I just never had a chance to learn them).

  18. #15
    CheersToStata

    Re: PLS HELP! Regression with non-normally distributed errors


    I meant using Spearman as a backup for the OLS regression, as you suggested.
    Since the regression tells the story I want to tell, I will just double-check the significance of the variables with a Spearman correlation test. I'm pretty sure that is the right choice!
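    A minimal sketch of a Spearman cross-check using only the standard library: rho is the Pearson correlation of the ranks (ties get averaged ranks), and the p-value comes from a permutation test, one of several ways to test rho, chosen here only because it needs no distribution tables. One caveat: Spearman's rho is a bivariate measure, so in a model with several predictors it does not adjust for the other variables the way the regression coefficients do. Function names and data are hypothetical.

```python
import random


def average_ranks(values):
    """Ranks 1..n; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # average of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5


def spearman(x, y, reps=5000, seed=0):
    """Spearman's rho and a two-sided permutation p-value."""
    rx, ry = average_ranks(x), average_ranks(y)
    rho = pearson(rx, ry)
    rng = random.Random(seed)
    shuffled = list(ry)
    hits = 0
    for _ in range(reps):
        rng.shuffle(shuffled)   # break any link between x and y
        if abs(pearson(rx, shuffled)) >= abs(rho) - 1e-12:
            hits += 1
    return rho, (hits + 1) / (reps + 1)


exit_rate = [12.0, 35.5, 20.1, 50.0, 8.3, 41.2, 27.9, 15.4, 33.0, 46.8]   # hypothetical
irr = [-5.0, 18.2, 3.1, 25.0, -12.4, 9.8, 14.0, -1.5, 30.2, 22.7]        # hypothetical
rho, p = spearman(exit_rate, irr)
print(f"rho = {rho:.3f}, permutation p = {p:.4f}")
```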

    Thanks again for your help!
