Uncertainty about normality (Q-Q Plot)

tt13

New Member
#1
Hey everyone,
I've run several linear regressions.
Now I would like to test whether my residuals are normally distributed.
This is what my Q-Q plots look like:
Bildschirmfoto 2021-04-19 um 17.36.14 (2).png
But I'm not sure what to make of the Q-Q plots. Most of the points are pretty much on the line, except at the beginning and the end.
What do you think?
 

Karabiner

TS Contributor
#2
Why do you want to know this? Your sample size appears to be large,
so the normality assumption is of little practical importance.

With kind regards

Karabiner
 

tt13

New Member
#3
Thank you for your answer.
That's right, my sample size is n=414.
But I thought that a normal distribution of the residuals was a requirement for linear regression.
It is in the context of my hypothesis testing.
 
#4
In theory, normality is required. In practice, when you have several hundred cases it is not, or at least that is the general view these days: by the CLT, the coefficient estimates are approximately normally distributed in large samples, so the usual tests and confidence intervals still give correct results (or close enough). I think this is referred to as being asymptotically correct. :p

I think this view has changed in recent decades, but courses have not caught up with the new view.
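
A small simulation can illustrate the CLT argument above. This is only a sketch under stated assumptions: a single predictor, a true slope of zero, and strongly right-skewed errors (exponential minus one). None of that comes from the data in this thread; only n = 414 is taken from it. If the large-sample argument holds, a nominal 5% t-test on the slope should reject in roughly 5% of replications.

```python
import math
import random


def slope_t_stat(x, y):
    """t statistic for the slope in a simple OLS regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope / se_slope


def rejection_rate(n=414, reps=2000, seed=0):
    """Share of replications in which |t| > 1.96 although the true slope is 0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Assumed error distribution: exponential(1) - 1, mean zero, right-skewed.
        y = [rng.expovariate(1.0) - 1.0 for _ in range(n)]
        # 1.96 is the normal critical value, a close stand-in for t with 412 df.
        if abs(slope_t_stat(x, y)) > 1.96:
            rejections += 1
    return rejections / reps


rate = rejection_rate()
print(f"Empirical rejection rate at nominal 5%: {rate:.3f}")
```

Rerunning with a much smaller n (say 10) and the same skewed errors is a quick way to see where the approximation starts to matter.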
 

Karabiner

TS Contributor
#6
You have been told twice that normality of the residuals is not a necessary requirement,
in addition you have been told that this is based on the central limit theorem (CLT).
What else do you need for coming to a conclusion?

Other assumptions are not affected by this discussion, of course.

With kind regards

Karabiner
 

tt13

New Member
#7
Okay.

Should I check any other assumptions as part of the hypothesis testing?

Sorry, I don't have that much experience in statistics.
 

Karabiner

TS Contributor
#8
Yes, it's a funny thing that beginners always seem to be told to care
about normality, while this is the least important feature.
The most important assumptions (linear relationships, and equal
variance of errors, i.e. homoscedasticity) should be checked, as well as
whether there's multicollinearity.

With kind regards

Karabiner
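
For readers who want to see what two of these checks compute, here is a hand-rolled sketch using NumPy. It is not taken from anyone's analysis in this thread: the data are simulated, the variable names (`x1`, `x2`) are made up, and the heteroscedasticity check shown is the studentized (Koenker) form of the Breusch-Pagan test, which is one of several possible choices. The variance inflation factor (VIF) is a common multicollinearity diagnostic.

```python
import numpy as np


def r_squared(y, X):
    """R^2 of an OLS fit of y on X (X must already contain an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def breusch_pagan_lm(y, X):
    """Studentized Breusch-Pagan statistic: n * R^2 of squared residuals on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return len(y) * r_squared(resid ** 2, X)


def vif(X, j):
    """Variance inflation factor of column j (j must not be the intercept)."""
    others = np.delete(X, j, axis=1)
    return 1.0 / (1.0 - r_squared(X[:, j], others))


# Simulated data (assumption): two correlated predictors, and an error
# standard deviation that grows with x1, so both problems are present.
rng = np.random.default_rng(0)
n = 414
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 0.5 * x1 + rng.normal(size=n) * np.exp(0.5 * x1)

lm = breusch_pagan_lm(y, X)
vif_x1 = vif(X, 1)
# Under homoscedasticity, lm is approximately chi-square with 2 df here
# (one df per predictor); the 5% cutoff is about 5.99.
print(f"Breusch-Pagan LM statistic: {lm:.1f}")
print(f"VIF for x1: {vif_x1:.1f}")
```

Linearity is usually easier to judge by eye, for example from a plot of residuals against fitted values, than from a single statistic.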
 
#9
Multicollinearity, except in rare cases, is not going to be a problem, nor is there any real solution if it is present. Non-linearity is your greatest problem when it occurs, IMHO. White (heteroscedasticity-robust) standard errors deal with heteroscedasticity; use them and don't worry about it.

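Normality is usually taught alongside the Gauss-Markov assumptions, which is why it is stressed (strictly speaking, Gauss-Markov itself does not require normal errors; normality is only needed for exact small-sample inference). I guess no one considered the CLT.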
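
To make the White standard errors mentioned above concrete, here is a minimal NumPy sketch of the basic (HC0) sandwich estimator next to the classical one. The data are simulated under an assumed form of heteroscedasticity (error spread proportional to |x|); the function name `ols_with_white_se` is hypothetical, and statistical packages typically offer small-sample refinements of this estimator.

```python
import numpy as np


def ols_with_white_se(y, X):
    """OLS coefficients with classical and White (HC0) standard errors."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # Classical SEs assume one common error variance for all observations.
    classical = np.sqrt(np.diag(xtx_inv) * (resid @ resid) / (n - k))
    # White's estimator weights each observation by its own squared residual.
    meat = X.T @ (X * resid[:, None] ** 2)
    robust = np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))
    return beta, classical, robust


# Simulated data (assumption): the error spread is proportional to |x|.
rng = np.random.default_rng(1)
n = 414
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 2.0 + 0.5 * x + rng.normal(size=n) * np.abs(x)

beta, classical, robust = ols_with_white_se(y, X)
print("slope estimate:    ", round(float(beta[1]), 3))
print("classical slope SE:", round(float(classical[1]), 3))
print("White slope SE:    ", round(float(robust[1]), 3))
```

In this setup the classical standard error for the slope understates the uncertainty, which is the situation the robust version is meant to guard against.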
 

hlsmith

Less is more. Stay pure. Stay poor.
#11
@Karabiner - funnily enough, I recently read someone of influence recommending to just always use robust SEs as well. I don't think I will unless there is an obvious threat. And yeah, I have also heard it is safe to always use unpooled (Welch) SEs in t-tests.

@tt13 - you should look for heteroscedasticity. Even if you can address it via robust SEs, etc., it is good to know your data and the underlying relationships, since this could also prompt you to incorporate data transformations.
 
#12
Should one use them by default (analogous to Welch t-test)?

With kind regards

Karabiner
I have seen that recommended. At worst it makes your test conservative.

I think it is good practice to test for heteroscedasticity and non-linearity regardless. I generally don't think there is much value in testing for multicollinearity or normality unless you have very few cases.

Note that some say that high skew can distort regression results, and skew is a form of non-normality. I don't know whether sample size affects this.
 
#13
Thank you for your answers.
I don't know if it has any relevance, but I haven't mentioned it yet:
In the linear regression the mean values of Likert scales are used. For example:
bsp.png
Here are the results of my heteroscedasticity and linearity tests:
het1.png het2.png
But what is that supposed to tell me in relation to my hypotheses? Are there some that would be better discarded?
 
#14
Is the dependent variable or the predictor a Likert scale? If the dependent variable is Likert, then linear regression is questionable because the data are ordinal rather than interval, although some say it is OK. Formally it does not matter for the predictor, although not everyone agrees on that either. I suspect your lack of normality and your heteroscedasticity are tied to the Likert scale, if that is what you are predicting.
 
#15
Yes, both are Likert scales.
I know that this is controversial, but it is explicitly desired to carry out a linear regression.
In my case the Likert scale should be treated like an interval-scaled variable.
 
#16
That is a big assumption, but it may explain the issues you found.

I doubt a Likert-scale dependent variable will be normally distributed or homoscedastic. Logistic regression does not assume either.
 
#17
Sounds like the usual issues with handling Likert scales in the context of linear regression.
So it would not be a good idea to evaluate my hypotheses based on that?

Currently my hypotheses are evaluated based on R-square, F-value + p-value, beta coefficient, t-value + p-value.
 
#18
A Likert dependent variable has issues, ones that people disagree on.

It's beyond my expertise whether you should evaluate your hypotheses based on it. I would look at the literature on Likert variables and also run an ordered logistic regression to see if the results agree. You won't have to worry about normality or heteroscedasticity that way. :)
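
For reference, the ordered logistic (proportional-odds) model suggested above is commonly written as follows. The symbols are notation introduced here rather than anything from the thread: Y is the ordinal response with J ordered categories, x the predictors, beta the coefficients, and theta_j the ordered cut-points between categories.

```latex
\[
  \Pr(Y \le j \mid x) \;=\; \frac{1}{1 + \exp\!\bigl(-(\theta_j - x^{\top}\beta)\bigr)},
  \qquad j = 1, \dots, J-1,
  \qquad \theta_1 < \theta_2 < \dots < \theta_{J-1}.
\]
```

The model only uses the ordering of the categories, so it makes no assumption about normally distributed or homoscedastic residuals; it does assume that the same beta applies at every cut-point.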