LinReg Assumption on errors ConfIV

Jacov

New Member
#1
Hi all

Why is it assumed that the errors e_i are normally distributed in linear regression (see e.g. Seber)?

What would be the influence, e.g. on confidence intervals for the parameter estimates, if one neglects this assumption?
 

trinker

ggplot2orBust
#2
This is a nonstatistician's stab at your questions:
Why is it assumed that the errors e_i are normally distributed in linear regression (see e.g. Seber)?
First, the word "assumed" means something a bit different from what you may think it means. It means that in order for this test to be appropriate, the data must fit this specification. So one of the assumptions of the test you are using is that the data come from a population that follows the normal distribution (notice I didn't say the data are normally distributed, as this is rarely the case). If the errors are normally distributed, then this is an indication that the data come from a normally distributed population.

What would be the influence, e.g. on confidence intervals for the parameter estimates, if one neglects this assumption?
According to Cohen, Cohen, Aiken and West (2003), the violation of the assumption of normally distributed error terms does not lead to biased estimates: "The effect of violation of the normality assumption on significance tests and confidence intervals depends on the sample size, with problems occurring in small samples."
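A rough way to see this point numerically is a quick simulation (my own sketch, not from Cohen et al.; the centered-exponential error distribution and all settings are illustrative assumptions): fit a simple regression with skewed errors and check how often the usual t-based 95% confidence interval for the slope actually covers the true value, at a small and a larger sample size.

```python
# Sketch: coverage of the usual t-based 95% CI for the slope when the
# errors are skewed (centered exponential), at small vs. larger n.
# All names and settings are illustrative assumptions, not from the thread.
import numpy as np
from scipy import stats

def slope_ci_coverage(n, reps=2000, beta0=1.0, beta1=2.0, seed=0):
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        x = rng.uniform(0.0, 1.0, n)
        e = rng.exponential(1.0, n) - 1.0   # mean 0, variance 1, skewed
        y = beta0 + beta1 * x + e
        X = np.column_stack([np.ones(n), x])
        bhat, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ bhat
        s2 = resid @ resid / (n - 2)        # unbiased variance estimate
        se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
        tcrit = stats.t.ppf(0.975, n - 2)   # t critical value, df = n - p
        if abs(bhat[1] - beta1) <= tcrit * se:
            hits += 1
    return hits / reps

print("coverage, n=10 :", slope_ci_coverage(10))
print("coverage, n=200:", slope_ci_coverage(200))
```

Under these assumptions, coverage at n=200 should sit close to the nominal 0.95, while at small n it can drift from the nominal level, in line with the warning quoted above.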
 

Jacov

New Member
#3
Thx for the reply,

actually I am not interested in data but in mathematical derivations.

In LinReg (see Seber) one "assumes" that e_i ~ N(0, \sigma^2); therefore one can show
that [TEX](n-p)s^2 / \sigma^2 \sim \chi^2_{n-p} [/TEX], where p is the number of
parameters, n the sample size, s^2 the variance estimator and \sigma^2 the true
variance. Based on this result, the confidence interval based on the t-distribution is constructed.
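To spell out that construction (a standard sketch along the lines of Seber; notation as in the post above):

```latex
% Under e_i \sim N(0,\sigma^2) i.i.d.:
%   \hat\beta_j is normal,  (n-p)s^2/\sigma^2 \sim \chi^2_{n-p},
%   and the two are independent, so
\frac{\hat\beta_j - \beta_j}{s\,\sqrt{(X^\top X)^{-1}_{jj}}}
  \;=\;
\frac{(\hat\beta_j - \beta_j)\big/\big(\sigma\sqrt{(X^\top X)^{-1}_{jj}}\big)}
     {\sqrt{\big((n-p)s^2/\sigma^2\big)\big/(n-p)}}
  \;\sim\; t_{n-p},
% which yields the exact confidence interval
\hat\beta_j \;\pm\; t_{n-p,\,1-\alpha/2}\; s\,\sqrt{(X^\top X)^{-1}_{jj}}.
```

The numerator is standard normal, the denominator is the square root of a \chi^2_{n-p} variable divided by its degrees of freedom, and their independence is exactly what the normality of the e_i buys you.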

Now my problem:

What would happen if the e_i are not normally distributed? Then the whole argumentation does not work anymore.

Are there any asymptotic / approximate results available? For example, based on
a central limit theorem?

Do you understand my issue? (It is about mathematical argumentation and not about data)

I totally agree with your citation from Cohen in practice!
 

Dason

Ambassador to the humans
#4
Yes, there are CLTs that can be applied. For instance, if all we assume is that the error terms satisfy \(E[e_i] = 0\) and \(Var[e_i] = \sigma^2 < \infty\), then, along with a mild condition on the predictor, we can show that the parameter estimates are asymptotically normal.
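A quick numerical illustration of that CLT (my own sketch; the centered-exponential errors and all settings are illustrative assumptions): with only E[e_i] = 0 and Var[e_i] = \sigma^2 assumed, the slope estimate standardized by its true standard error should behave approximately like N(0,1) for large n, e.g. about 95% of the standardized values should land in [-1.96, 1.96].

```python
# Sketch: asymptotic normality of the OLS slope under non-normal errors.
# Standardize (bhat_1 - beta_1) by its true standard error and check that
# roughly 95% of the standardized values fall in [-1.96, 1.96].
import numpy as np

def standardized_slopes(n=200, reps=3000, beta0=1.0, beta1=2.0,
                        sigma=1.0, seed=1):
    rng = np.random.default_rng(seed)
    zs = np.empty(reps)
    for i in range(reps):
        x = rng.uniform(0.0, 1.0, n)
        # mean 0, variance sigma^2, but clearly not normal:
        e = sigma * (rng.exponential(1.0, n) - 1.0)
        y = beta0 + beta1 * x + e
        X = np.column_stack([np.ones(n), x])
        bhat, *_ = np.linalg.lstsq(X, y, rcond=None)
        true_se = sigma * np.sqrt(np.linalg.inv(X.T @ X)[1, 1])
        zs[i] = (bhat[1] - beta1) / true_se
    return zs

zs = standardized_slopes()
frac = float(np.mean(np.abs(zs) <= 1.96))
print("fraction in [-1.96, 1.96]:", frac)  # should be close to 0.95
```

Replacing the true sigma with the estimate s gives the "approximate" t/normal confidence interval discussed below.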
 

Jacov

New Member
#5
Okay thx

Let me summarize. In linear regression we have the following results (independence always assumed):

[TEX] e_i \sim N(0,\sigma^2)[/TEX]: we obtain an "exact" confidence interval for the parameters.

[TEX] E[e_i] = 0 [/TEX], [TEX] Var[e_i] = \sigma^2 < \infty [/TEX]: based on the asymptotic normality of the estimator, we obtain an "approximate" confidence interval for the parameters.

Am I right?