Testing the Normality of Errors in Regression?

mity

New Member
#1
Hi all,

I have a simple conceptual question:

In the simple linear regression problem, where the true relationship is,

[math] y = ax + b + e [/math]

the error terms, [math] e [/math], are assumed to be normally distributed [math] N(0,\sigma^2) [/math].

However, linear regression only yields estimates [math] \alpha \approx a [/math] and [math] \beta \approx b [/math]. The resulting equation is,

[math] y = \alpha x + \beta + \epsilon [/math]

where [math] \epsilon [/math] is the observable residual instead of the unobservable error [math] e [/math].

1. How can one test the normality assumption of the error terms if the true errors are unobservable?

2. The residual is an estimator of the error. What kind of estimator is it? Unbiased? Maximum likelihood?

3. The true heart of my question is this: if the reasoning is to test the normality of the errors by testing the normality of the estimator of the errors (the residuals), how does one justify a test for normality, since all of the residuals are correlated? From my understanding, the residuals are all correlated through their dependence on the line of best fit, while the tests of normality (goodness-of-fit tests) assume independent, identically distributed (iid) random variables.
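
For concreteness, here is a rough sketch (Python with numpy/scipy, simulated data; the numbers are made up for illustration) of what I mean by applying a normality test to the residuals in place of the unobservable errors:

[code]
# Fit a simple linear regression on simulated data and run a normality
# test on the observable residuals, since the true errors e are never seen.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 10, n)           # predictor
e = rng.normal(0, 1.5, n)           # true (unobservable) errors
y = 2.0 * x + 1.0 + e               # true model: y = a*x + b + e

# Least-squares estimates: alpha (slope) and beta (intercept)
alpha, beta = np.polyfit(x, y, 1)
residuals = y - (alpha * x + beta)  # observable residuals epsilon

# Shapiro-Wilk test applied to the residuals as a stand-in for the errors
w_stat, p_value = stats.shapiro(residuals)
print(f"slope={alpha:.3f}, intercept={beta:.3f}, Shapiro-Wilk p={p_value:.3f}")
[/code]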

Thanks,
mity
 

hlsmith

Omega Contributor
#2
Not sure I am getting at your questions, but the test of normality is looking for homogeneity in the residuals to check the appropriateness of the procedure.

This is probably not the cogent remark that you want, but wouldn't many values tested for normality be bounded by some type of constraint (ceilings or floors)? So, in a general sense, they have the possibility of being correlated in the same generic way?
 

Dragan

Super Moderator
#3
mity said:
3. The true heart of my question is this: if the reasoning is to test the normality of the errors by testing the normality of the estimator of the errors (the residuals), how does one justify a test for normality, since all of the residuals are correlated? From my understanding, the residuals are all correlated through their dependence on the line of best fit, while the tests of normality (goodness-of-fit tests) assume independent, identically distributed (iid) random variables.

The major point you are missing is that X is (classically) assumed to be fixed under repeated sampling at each level of X. Thus, the errors are iid and normally distributed with equal variances at each level of X. Further, there is the assumption that there is no autocorrelation between the error terms.

In short, the structural and distributional assumptions associated with the error terms are that they are iid with expected values of zero and equal variances at each level of X under repeated sampling.

Note that if you relax this and treat X as random instead of fixed, these assumptions still hold.
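
Written out with the notation from the original post (my own formalization of the above), these assumptions amount to

[math] E(e_i) = 0, \quad Var(e_i) = \sigma^2, \quad Cov(e_i, e_j) = 0 \;\; (i \neq j), \quad e_i \sim N(0, \sigma^2) \;\; \text{iid} [/math]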
 

Dason

Ambassador to the humans
#4
Dragan said:
In short, the structural and distributional assumptions associated with the error terms are that they are iid with expected values of zero and equal variances at each level of X under repeated sampling.

I actually think the OP has a fairly good grasp of what is going on. Note that they said the residuals are correlated (which is true). Typically, though, by the time you have enough observations to do a normality test, the correlation isn't strong enough to care about.
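
To make this concrete: writing the model in matrix form, the residual vector is [math] \hat{\epsilon} = (I - H)e [/math] with [math] H = X(X^T X)^{-1} X^T [/math] the hat matrix, so [math] Cov(\hat{\epsilon}) = \sigma^2 (I - H) [/math] and the residuals are indeed correlated. Here is a small sketch (my own illustration in Python/numpy, with a simulated predictor) showing how the largest residual correlation shrinks as the sample size n grows:

[code]
# Residual correlations implied by Cov(residuals) = sigma^2 * (I - H).
# The off-diagonal correlations are nonzero but shrink as n grows, which is
# why normality tests on residuals are usually considered acceptable.
import numpy as np

def max_residual_correlation(n, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 10, n)
    X = np.column_stack([np.ones(n), x])   # design matrix [1, x]
    H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
    cov = np.eye(n) - H                    # Cov(residuals) up to sigma^2
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)          # residual correlation matrix
    off_diag = corr - np.diag(np.diag(corr))
    return np.abs(off_diag).max()

for n in (10, 50, 200):
    print(n, round(max_residual_correlation(n), 3))
[/code]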