I've run several linear regressions.
Now I would like to test if my residuals are normally distributed.
This is what my Q-Q plots look like:
But I'm not sure what to make of them. Most of the points are pretty much on the line, except at the two ends.
What do you think?
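For anyone reading along, here is a rough sketch of what a normal Q-Q plot actually compares. It uses made-up residuals (not the data from this thread), and the plotting positions (i - 0.5)/n are just one common convention; software packages differ slightly here. The function name `qq_points` is my own.

```python
import random
from statistics import NormalDist, mean, stdev

def qq_points(residuals):
    """Return (theoretical, sample) quantile pairs for a normal Q-Q plot."""
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    # Standardize and sort the residuals: these are the sample quantiles
    ordered = sorted((r - m) / s for r in residuals)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n, an assumed convention
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

random.seed(1)
fake_residuals = [random.gauss(0, 1) for _ in range(414)]  # made-up data
points = qq_points(fake_residuals)

# Print the three smallest and three largest points (the two tails)
for t, s in points[:3] + points[-3:]:
    print(f"theoretical={t:6.2f}  sample={s:6.2f}")
```

The extreme order statistics are the noisiest, so some wobble at the two ends of the line is common even when the residuals really are normal.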
Thank you for your answer.
That's right, my sample size is n=414.
But I thought that a normal distribution of the residuals was a requirement for linear regression.
This matters for my hypothesis tests.
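In theory, normality of the errors is required for the usual t and F tests to be exact. In practice, when you have several hundred cases it is not, or at least that is the general view these days: by the CLT, the coefficient estimates are approximately normal, so the regression will still give correct (or close enough) tests and confidence intervals. I think this is referred to as being asymptotically correct.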
I think this view has changed in recent decades, but courses have not caught up with the new view.
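If you want to convince yourself, a small simulation along these lines may help. It is only a sketch under my own assumptions: one predictor, a true slope of 0.5, errors drawn from a shifted exponential distribution (clearly skewed), and n = 414 as in this thread. It counts how often the ordinary 95% confidence interval for the slope contains the true value.

```python
import math
import random

def slope_ci_covers(n, true_slope, rng):
    """Simulate one simple regression with skewed errors; check the 95% CI."""
    x = [rng.uniform(0, 10) for _ in range(n)]
    # Exponential errors shifted to mean zero: non-normal and right-skewed
    y = [1.0 + true_slope * xi + (rng.expovariate(1.0) - 1.0) for xi in x]
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # classical standard error of the slope
    # Normal critical value 1.96; with n = 414 the t value is almost identical
    return abs(slope - true_slope) <= 1.96 * se

rng = random.Random(42)
reps = 400
coverage = sum(slope_ci_covers(414, 0.5, rng) for _ in range(reps)) / reps
print(f"Empirical coverage of the nominal 95% interval: {coverage:.3f}")
```

If the CLT argument holds, the printed coverage should land close to 0.95 despite the skewed errors. Rerunning with a much smaller n shows how the approximation degrades.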
You have been told twice that normality of the residuals is not a necessary requirement.
In addition, you have been told that this is based on the central limit theorem (CLT).
What else do you need to come to a conclusion?
Other assumptions are not affected by this discussion, of course.
Yes, it's a funny thing that beginners always seem to be told to care
about normality, while this is the least important feature.
The most important assumptions (linear relationships, and equal
variance of errors, i.e. homoscedasticity) should be checked, and
whether there's multicollinearity.
Multicollinearity, except in rare cases, is not going to be a problem, nor is there any real solution if it is present. Non-linearity is your greatest problem when it occurs, imho. White (heteroscedasticity-robust) standard errors deal with heteroscedasticity; use them and don't worry about it.
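Normality is usually taught alongside the Gauss-Markov assumptions, which is why it is stressed (strictly speaking it is an extra assumption on top of them, needed only for exact t and F tests). I guess no one considered the CLT.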
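To make the White standard errors mentioned above a bit more concrete, here is a minimal sketch for the one-predictor case, on made-up data whose error spread grows with x. It uses the simplest variant (often called HC0), where the slope variance is the sum of (x_i - mean(x))^2 * e_i^2 divided by Sxx^2. The function name and the data-generating choices are my own assumptions; real software typically offers several small-sample corrections as well.

```python
import math
import random

def ols_slope_with_ses(x, y):
    """Simple regression slope with classical and White (HC0) standard errors."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    dx = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - ybar) for d, yi in zip(dx, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Classical SE assumes one common error variance for all cases
    classical = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    # White/HC0 SE lets every case contribute its own squared residual
    white = math.sqrt(sum((d * e) ** 2 for d, e in zip(dx, resid))) / sxx
    return slope, classical, white

rng = random.Random(7)
x = [rng.uniform(0, 10) for _ in range(414)]
# Made-up heteroscedastic data: error SD grows with x squared
y = [2.0 + 0.5 * xi + rng.gauss(0, 0.05 * xi ** 2) for xi in x]

slope, classical, white = ols_slope_with_ses(x, y)
print(f"slope={slope:.3f}  classical SE={classical:.4f}  White SE={white:.4f}")
```

The slope estimate is the same either way; only the standard error (and hence the test) changes. When the errors are in fact homoscedastic, the two standard errors should come out close to each other.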
@Karabiner - funnily enough, I recently read someone of influence recommending to just always use robust SEs as well. I don't think I will unless there is an obvious threat. And yeah, I have also heard it is just safe to always use pooled SEs in t-tests.
@tt13 - you should look for heteroscedasticity. Even if you can address it via robust SEs, etc., it is good to know your data and the underlying relationships, since this could also prompt you to incorporate data transformations.
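One way to look for it formally is a Breusch-Pagan type test. Below is a rough sketch of the studentized (Koenker) version for a single predictor, on made-up data: regress the squared residuals on x and use n times the R-squared of that auxiliary regression as a chi-square statistic with 1 degree of freedom. The function names are my own, and this is an illustration of the idea rather than a replacement for what a statistics package reports.

```python
import math
import random

def simple_ols(x, y):
    """Return intercept, slope and residuals of a one-predictor regression."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, resid

def koenker_bp_test(x, y):
    """Studentized Breusch-Pagan test: LM = n * R^2 of e^2 regressed on x."""
    n = len(x)
    _, _, resid = simple_ols(x, y)
    e2 = [e * e for e in resid]
    _, _, aux_resid = simple_ols(x, e2)
    mean_e2 = sum(e2) / n
    tss = sum((v - mean_e2) ** 2 for v in e2)
    rss = sum(r * r for r in aux_resid)
    lm = max(n * (1.0 - rss / tss), 0.0)
    # Upper tail of chi-square with 1 df: P(X > lm) = erfc(sqrt(lm / 2))
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value

rng = random.Random(3)
x = [rng.uniform(0, 10) for _ in range(414)]
y_hom = [1.0 + 0.5 * xi + rng.gauss(0, 1.0) for xi in x]             # constant spread
y_het = [1.0 + 0.5 * xi + rng.gauss(0, 0.3 + 0.3 * xi) for xi in x]  # spread grows with x

lm_hom, p_hom = koenker_bp_test(x, y_hom)
lm_het, p_het = koenker_bp_test(x, y_het)
print(f"constant spread: LM={lm_hom:.2f}, p={p_hom:.4f}")
print(f"growing spread:  LM={lm_het:.2f}, p={p_het:.4f}")
```

A small p-value points to error variance that changes with x. A plot of residuals against fitted values is still worth a look, since it shows the shape of the problem and not just whether one exists.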
I have seen that recommended. At worst it makes your test conservative.
I think it is good practice to test for heteroscedasticity and non-linearity regardless. I generally don't think there is much value in testing for multicollinearity or normality unless you have very few cases.
Note that some say that high skew can distort regression results, and that is a form of non-normality. I don't know if sample size affects this.
Thank you for your answers.
I don't know if it has any relevance, but I haven't mentioned it yet:
In the linear regression the mean values of Likert scales are used. For example:
Here are the results of my heteroscedasticity and linearity tests:
But what is that supposed to tell me in relation to my hypotheses? Are there any that I should discard?
Is the dependent variable or the predictor a Likert scale? If the dependent variable is Likert, then linear regression is questionable, because the data are not interval-scaled, although some say it is OK. Formally it does not matter for the predictor, although not everyone agrees on that part. I suspect your lack of normality and your heteroscedasticity are tied to the Likert scale, if that is what you are predicting.
Yes, both are Likert scales.
I know that this is controversial, but a linear regression is explicitly required.
In my case the Likert scales are to be treated as interval-scaled variables.
A Likert dependent variable has issues, and people disagree about them.
It's beyond my expertise whether you should evaluate your hypotheses based on it. I would look at the literature on Likert variables and also run an ordered logistic regression to see if the results agree. You won't have to worry about normality or heteroscedasticity that way.