# Uncertainty about normality (Q-Q Plot)

#### tt13

##### New Member
Hey everyone,
I've run several linear regressions.
Now I would like to test whether my residuals are normally distributed.
This is what my Q-Q plots look like:

But I'm not sure what to make of them. Most of the points lie pretty much on the line, except at the beginning and the end.
What do you think?
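For reference, a normal Q-Q plot of residuals pairs each sorted, standardized residual with the standard normal quantile expected at its rank. Below is a minimal stdlib sketch. The plotting position (i - 0.5)/n is one common choice among several (an assumption; your software may use a slightly different one), and `qq_points` is a hypothetical helper name.

```python
from statistics import NormalDist, mean, stdev

def qq_points(residuals):
    """Return (theoretical, sample) pairs for a normal Q-Q plot.

    Sample values: residuals standardized by their mean and SD, sorted ascending.
    Theoretical values: standard normal quantiles at positions (i - 0.5) / n.
    """
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    sample = sorted((r - m) / s for r in residuals)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sample))

# Hypothetical residuals, only to show the output format.
pts = qq_points([-1.2, 0.3, 0.8, -0.4, 2.5, -0.1, 0.0, -1.9])
for theo, samp in pts:
    print(f"{theo:+.2f}  {samp:+.2f}")
```

Points close to the diagonal y = x suggest roughly normal residuals; points that bend away from it at both ends indicate tails that are heavier or lighter than normal.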

#### Karabiner

##### TS Contributor
Why do you want to know this? Your sample size seems to be large,
so the normality assumption is irrelevant.

With kind regards

Karabiner

#### tt13

##### New Member
That's right, my sample size is n=414.
But I thought that normally distributed residuals were a requirement for linear regression.
I need this for my hypothesis testing.

#### noetsi

##### Fortran must die
In theory, normality is required. In practice, when you have several hundred cases it is not, or at least that is the general view these days: because of the CLT, the regression's t and F tests and confidence intervals will still be correct (or close enough). I think this is referred to as being asymptotically correct.

I think this view has changed in recent decades, but courses have not caught up with the new view.
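One way to see the CLT argument is a small simulation: generate data where the true slope is 0 but the errors are strongly skewed, and count how often the usual slope t-test rejects at the 5% level. This is a sketch under stated assumptions: n = 414 matches the thread, while the exponential errors, normal predictor and 2000 repetitions are arbitrary choices of mine. It checks only the slope test's false-positive rate in one setting, not every possible departure from normality. It uses only the standard library and may take a few seconds.

```python
import random
from statistics import NormalDist

def slope_t_stat(x, y):
    """Classical OLS t statistic for the slope in y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = (sse / (n - 2) / sxx) ** 0.5
    return b1 / se_b1

def rejection_rate(n=414, reps=2000, alpha=0.05, seed=1):
    """Share of simulated datasets in which H0: slope = 0 is rejected,
    although H0 is true and the errors are skewed (exponential)."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # large-n approximation to t(n - 2)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        # True slope is 0; exponential errors shifted to have mean 0.
        y = [rng.expovariate(1.0) - 1.0 for _ in range(n)]
        if abs(slope_t_stat(x, y)) > crit:
            rejections += 1
    return rejections / reps

rate = rejection_rate()
print(f"rejection rate: {rate:.3f}")  # expected to be close to the nominal 0.05
```

If the rate stays near 0.05 despite clearly non-normal errors, the test is behaving as the large-sample theory predicts.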

#### tt13

##### New Member
So is it better to skip residual analysis?

#### Karabiner

##### TS Contributor
You have been told twice that normality of the residuals is not a necessary requirement here,
and you have also been told that this is based on the central limit theorem (CLT).
What else do you need to reach a conclusion?

Other assumptions are not affected by this discussion, of course.

With kind regards

Karabiner

#### tt13

##### New Member
Okay.

Should I check any other assumptions as part of the hypothesis testing?

Sorry, I don't have that much experience in statistics.

#### Karabiner

##### TS Contributor
Yes. It's a funny thing that beginners always seem to be told to care
about normality, while it is the least important assumption.
The most important assumptions (linear relationships and equal
error variance, i.e. homoscedasticity) should be checked, as should
whether there is multicollinearity.
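A common multicollinearity check is the variance inflation factor (VIF). With exactly two predictors, both share the same VIF, 1 / (1 - r²), where r is their correlation. With more predictors, r² is replaced by the R² from regressing each predictor on all the others. Here is a minimal stdlib sketch of the two-predictor case; `vif_two` and the data are hypothetical. Often-quoted rules of thumb flag VIFs above 5 or 10, but those cut-offs are conventions, not tests.

```python
from statistics import mean

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5

def vif_two(x1, x2):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
print(round(vif_two(x1, x2), 3))  # → 3.19
```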

With kind regards

Karabiner

#### noetsi

##### Fortran must die
Multicollinearity, except in rare cases, is not going to be a problem, nor is there any real solution if it is one. Non-linearity is your greatest problem when it occurs, imho. White (heteroskedasticity-robust) SEs deal with heteroskedasticity; use them and don't worry about it.
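To make "White SEs" concrete: for a single predictor, the classical slope variance is σ̂² / Sxx, while White's HC0 estimator weights each squared residual by its own squared x-deviation, sum((x_i - x̄)² e_i²) / Sxx², so it does not assume a common error variance. A minimal stdlib sketch for the one-predictor case (`slope_se` is a hypothetical helper). Software usually offers variants (HC1 multiplies the HC0 variance by n / (n - k); HC2 and HC3 adjust for leverage); statsmodels, for example, exposes them via `OLS(...).fit(cov_type="HC1")`.

```python
def slope_se(x, y):
    """Return (classical SE, White HC0 SE) of the OLS slope in y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [xi - mx for xi in x]
    sxx = sum(d * d for d in dx)
    b1 = sum(d * (yi - my) for d, yi in zip(dx, y)) / sxx
    b0 = my - b1 * mx
    e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Classical SE: one pooled error variance estimate, sse / (n - 2).
    classical = (sum(ei * ei for ei in e) / (n - 2) / sxx) ** 0.5
    # White HC0: each squared residual weighted by its own squared x-deviation.
    hc0 = (sum(d * d * ei * ei for d, ei in zip(dx, e)) / sxx ** 2) ** 0.5
    return classical, hc0

classical, hc0 = slope_se([0, 1, 2, 3], [0, 2, 1, 3])
print(f"classical SE = {classical:.4f}, White HC0 SE = {hc0:.4f}")
```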

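Normality is part of the classical linear regression assumptions, which is why it is stressed (strictly, the Gauss-Markov theorem itself does not need it; it is needed for exact t and F tests in small samples). I guess no one considered the CLT.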

#### Karabiner

##### TS Contributor
> Non-linearity is your greatest problem when it occurs imho. White SE deals with hetero, use them and don't worry about it.

Should one use them by default (analogous to the Welch t-test)?
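The analogy rests on the fact that Welch's t-test drops the equal-variance assumption by giving each group its own variance (an unpooled SE) and using the Welch-Satterthwaite degrees of freedom. A minimal stdlib sketch of the statistic and degrees of freedom; `welch_t` and the two groups are hypothetical, and the p-value step (which needs a t distribution) is left out.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike Student's pooled t-test, each group keeps its own variance,
    so equal population variances are not assumed.
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10, 12]
t, df = welch_t(group_a, group_b)
print(f"t = {t:.3f}, df = {df:.2f}")
```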

With kind regards

Karabiner

#### hlsmith

##### Less is more. Stay pure. Stay poor.
@Karabiner - funnily enough, I recently read a recommendation from someone influential to just always use robust SEs as well. I don't think I will unless there is an obvious threat. And yeah, I have also heard that it is safest to always use unpooled SEs (i.e. the Welch test) in t-tests.

@tt13 - you should look for heteroskedasticity. Even if you can address it via robust SEs etc., it is good to know your data and the underlying relationships, since this could also prompt you to incorporate data transformations.
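Besides plotting residuals against fitted values, a common formal check for unequal error variance is the Breusch-Pagan test. The sketch below uses Koenker's n·R² form: regress the squared OLS residuals on the predictor and compare n·R² with a chi-square distribution (1 df for one predictor). It is a stdlib-only illustration for a single predictor; `bp_test` and the simulated heteroskedastic data are hypothetical.

```python
import math
import random

def bp_test(x, y):
    """Koenker's n*R^2 form of the Breusch-Pagan test, one predictor.

    Regresses squared OLS residuals on x; with one predictor the auxiliary
    R^2 is the squared correlation between x and e^2.
    Returns (LM statistic, p-value from chi-square with 1 df).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    e2 = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]
    m2 = sum(e2) / n
    cov = sum((xi - mx) * (ei - m2) for xi, ei in zip(x, e2))
    s22 = sum((ei - m2) ** 2 for ei in e2)
    lm = n * cov * cov / (sxx * s22)
    p = math.erfc(math.sqrt(lm / 2))  # upper tail of chi-square with 1 df
    return lm, p

rng = random.Random(0)
x = [rng.uniform(1, 5) for _ in range(414)]
# Hypothetical data: the error SD grows with x (heteroskedastic).
y = [2 + 0.5 * xi + rng.gauss(0, 0.3 * xi) for xi in x]
lm, p = bp_test(x, y)
print(f"LM = {lm:.1f}, p = {p:.3g}")
```

A small p-value suggests the error variance changes with the predictor. With n = 414 even modest heteroskedasticity will be flagged, which is one reason robust SEs are often preferred over acting on the test alone.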

#### noetsi

##### Fortran must die
> Should one use them by default (analogous to Welch t-test)?
I have seen that recommended. At worst it makes your test conservative.

I think it is good practice to test for heteroskedasticity and non-linearity regardless. I generally don't think there is much value in testing for multicollinearity or normality unless you have very few cases.
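For non-linearity, one simple formal check (alongside a residuals-vs-fitted plot) is to add a squared term and look at its t statistic. The sketch below computes that t without a matrix library by using the Frisch-Waugh-Lovell theorem: the x² coefficient in y = b0 + b1·x + b2·x² equals the slope from regressing the residuals of y-on-x on the residuals of x²-on-x. This is my illustration, not a method named in the thread; `quadratic_term_t` and the curved example data are hypothetical.

```python
import random

def simple_fit_resid(x, y):
    """Residuals from the OLS fit y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    return [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

def quadratic_term_t(x, y):
    """t statistic for b2 in y = b0 + b1*x + b2*x^2 (via Frisch-Waugh-Lovell)."""
    n = len(x)
    e = simple_fit_resid(x, y)                      # y with intercept and x partialled out
    z = simple_fit_resid(x, [xi * xi for xi in x])  # x^2 with intercept and x partialled out
    szz = sum(zi * zi for zi in z)
    b2 = sum(zi * ei for zi, ei in zip(z, e)) / szz
    # Residuals of the full quadratic model are e - b2*z; 3 parameters estimated.
    sse = sum((ei - b2 * zi) ** 2 for ei, zi in zip(e, z))
    se_b2 = (sse / (n - 3) / szz) ** 0.5
    return b2 / se_b2

rng = random.Random(42)
x = [rng.uniform(1, 5) for _ in range(200)]
# Hypothetical curved relationship: the true model contains 0.2 * (x - 3)^2.
y = [1 + 0.2 * (xi - 3) ** 2 + rng.gauss(0, 0.5) for xi in x]
t_quad = quadratic_term_t(x, y)
print(f"t for x^2 term = {t_quad:.2f}")
```

A large |t| for the squared term points to curvature that a straight line misses.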

Note that some say high skew can distort regression results, and skew is a form of non-normality. I don't know whether sample size affects this.

#### tt13

##### New Member
I don't know if it has any relevance, but I haven't mentioned it yet:
In the linear regressions, the mean scores of Likert scales are used. For example:

Here are the results of my heteroskedasticity and linearity tests:

But what does that tell me about my hypotheses? Should some of them be discarded?

#### noetsi

##### Fortran must die
Is the dependent variable or the predictor a Likert scale? If the dependent variable is Likert, linear regression is questionable because the data are not interval, although some say it is OK. Formally it does not matter for the predictor, although not everyone agrees on that part. If a Likert scale is what you are predicting, I suspect your non-normality and heteroskedasticity are tied to it.

#### tt13

##### New Member
Yes, both are Likert scales.
I know that this is controversial, but carrying out a linear regression is explicitly desired.
In my case the Likert scale should be treated as an interval-scaled variable.

#### noetsi

##### Fortran must die
That is a big assumption, but it may explain the issues you found.

I doubt a Likert-scale dependent variable will be normally distributed or homoskedastic. Logistic regression does not assume either.

#### tt13

##### New Member
Sounds like the usual issues with Likert scales in linear regression.
So it would not be a good idea to evaluate my hypotheses based on that?

Currently my hypotheses are evaluated using R-squared, the F-value and its p-value, the beta coefficients, and the t-values and their p-values.
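With a single predictor these numbers are not independent pieces of evidence: F equals t², the two p-values coincide, R² equals r², and the standardized beta equals Pearson's r. With several predictors, F tests the model as a whole while each t tests one coefficient. A minimal stdlib sketch that shows the relationships, assuming "beta coefficient" means the standardized coefficient (SPSS labels that column "Beta"); `simple_regression_summary` and the toy data are hypothetical.

```python
def simple_regression_summary(x, y):
    """R^2, F, slope b1, standardized beta and slope t for y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)  # total sum of squares
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - sse / syy
    f = ((syy - sse) / 1) / (sse / (n - 2))  # model df = 1, residual df = n - 2
    t = b1 / (sse / (n - 2) / sxx) ** 0.5
    beta_std = b1 * (sxx / syy) ** 0.5       # = b1 * sd(x) / sd(y)
    return {"R2": r2, "F": f, "b1": b1, "beta": beta_std, "t": t}

s = simple_regression_summary([0, 1, 2, 3], [0, 2, 1, 3])
print(s)
print("F equals t^2:", abs(s["F"] - s["t"] ** 2) < 1e-9)
```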

#### noetsi

##### Fortran must die
A Likert dependent variable has issues, ones that people disagree on.

Whether you should evaluate your hypotheses based on it is beyond my expertise. I would look at the literature on Likert variables, and also run an ordered logistic regression and see whether the results agree. You won't have to worry about normality or heteroskedasticity that way.
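For reference, ordered logistic regression is usually written as a cumulative-logit (proportional-odds) model. In the standard form sketched below, $Y$ takes ordered categories $1, \dots, J$, the cut points $\theta_j$ are increasing, and a single coefficient vector $\boldsymbol\beta$ is shared by all cut points (the proportional-odds assumption). Software differs on the sign convention (some packages write $\theta_j + \mathbf{x}^\top\boldsymbol\beta$), so it is worth checking your package's documentation before interpreting signs.

$$
\log\frac{P(Y \le j \mid \mathbf{x})}{1 - P(Y \le j \mid \mathbf{x})} = \theta_j - \mathbf{x}^\top\boldsymbol\beta,
\qquad j = 1, \dots, J-1, \qquad \theta_1 < \theta_2 < \dots < \theta_{J-1}.
$$

Under this convention, a positive coefficient shifts probability toward higher categories. The model uses only the ordering of the categories, so it makes no normality or equal-variance assumption about the outcome; its own key assumption (one shared $\boldsymbol\beta$) is worth checking separately.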