# LinReg Assumption on errors ConfIV

#### Jacov

##### New Member
Hi all

Why is it assumed that the errors e_i are normally distributed in linear regression (see e.g. Seber)?

What would be the influence, e.g. on confidence intervals for parameter estimates, if one neglects this assumption?

#### trinker

##### ggplot2orBust
This is a nonstatistician's stab at your questions:
> Why is it assumed that the errors e_i are normally distributed in linear regression (see e.g. Seber)?
First, the word "assumed" means something a bit different from what you may think. It means that in order for this test to be appropriate, the data must fit this specification. So one of the assumptions of the test you are using is that the data come from a population that follows the normal distribution (notice I didn't say the data themselves are normally distributed, as this is rarely the case). If the errors are normally distributed, this is an indication that the data come from a normally distributed population.

> What would be the influence, e.g. on confidence intervals for parameter estimates, if one neglects this assumption?
According to Cohen, Cohen, Aiken and West (2003), violation of the normality assumption for the error terms does not lead to biased estimates. "The effect of violation of the normality assumption on significance tests and confidence intervals depends on the sample size, with problems occurring in small samples."
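This sample-size dependence is easy to check by simulation. A minimal sketch (my own illustration, not from Cohen et al.): draw skewed errors (centered exponential), fit simple OLS by hand, and count how often the nominal 95% t-interval for the slope actually covers the true value at a small and a large n.

```python
# Monte Carlo check: coverage of the nominal 95% t-interval for the slope
# when the errors are skewed (centered exponential) rather than normal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def coverage(n, reps=2000, beta1=2.0):
    """Fraction of nominal 95% t-intervals for the slope that cover beta1."""
    x = rng.uniform(0, 10, size=n)             # fixed design
    sxx = ((x - x.mean()) ** 2).sum()
    tcrit = stats.t.ppf(0.975, df=n - 2)       # p = 2 parameters (intercept, slope)
    hits = 0
    for _ in range(reps):
        e = rng.exponential(1.0, size=n) - 1.0  # skewed errors with E[e]=0, Var[e]=1
        y = 1.0 + beta1 * x + e
        b1 = ((x - x.mean()) * (y - y.mean())).sum() / sxx
        b0 = y.mean() - b1 * x.mean()
        resid = y - b0 - b1 * x
        s2 = (resid ** 2).sum() / (n - 2)       # s^2; chi-square result needs normality
        se = np.sqrt(s2 / sxx)
        hits += abs(b1 - beta1) <= tcrit * se
    return hits / reps

print(coverage(10), coverage(200))  # compare empirical coverage to the nominal 0.95
```

With a large n the empirical coverage sits close to 0.95 even though the errors are skewed; at small n it can drift from the nominal level, which is exactly the Cohen et al. point.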

#### Jacov

##### New Member

Actually, I am not interested in data but in mathematical derivations.

In linear regression (see Seber) one "assumes" that [TEX]e_i \sim N(0,\sigma^2)[/TEX]; therefore one can show
that [TEX](n-p)s^2 / \sigma^2 \sim \chi^2_{n-p} [/TEX], where p is the number of
parameters, n the sample size, s^2 the variance estimator and \sigma^2 the true
variance. Based on this result, the confidence interval based on a t-distribution is constructed.
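For reference, that construction can be sketched as follows (my notation; Seber's development is the authoritative version). Under normality, [TEX]\hat\beta_j \sim N(\beta_j, \sigma^2 v_j)[/TEX] with [TEX]v_j = [(X^T X)^{-1}]_{jj}[/TEX], independent of s^2, so

[TEX] T = \frac{\hat\beta_j - \beta_j}{s\sqrt{v_j}} = \frac{(\hat\beta_j - \beta_j)/(\sigma\sqrt{v_j})}{\sqrt{\big((n-p)s^2/\sigma^2\big)/(n-p)}} \sim \frac{N(0,1)}{\sqrt{\chi^2_{n-p}/(n-p)}} = t_{n-p} [/TEX]

which yields the exact interval [TEX]\hat\beta_j \pm t_{n-p,\,1-\alpha/2}\, s\sqrt{v_j}[/TEX].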

Now my problem:

What would happen if e_i is not normally distributed? Then the whole argumentation no longer works.

Are there any asymptotic / approximate results available? For example, based on
a central limit theorem?

Do you understand my issue? (It is about mathematical argumentation and not about data)

I totally agree with the Cohen citation you presented, in practice!

#### Dason

Yes, there are CLTs that can be applied. For instance, if all we assume is that the error terms satisfy [TEX]E[e_i] = 0[/TEX] and [TEX]Var[e_i] = \sigma^2 < \infty[/TEX], then along with a mild condition on the predictors we can show that the parameter estimates are asymptotically normal.

#### Jacov

##### New Member
Okay, thanks!

Let me summarize. In linear regression we have the following results (independence always assumed):

[TEX] e_i \sim N(0,\sigma^2)[/TEX] : We obtain an "exact" confidence interval for the parameter

[TEX] E[e_i] = 0 [/TEX], [TEX] Var[e_i] = \sigma^2 < \infty [/TEX]: based on the asymptotic normality of the estimator, we obtain an "approximate" confidence interval for the parameter
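In symbols (my notation, a sketch of both cases): the exact interval uses a t quantile, the approximate one a standard normal quantile,

[TEX] \hat\beta_j \pm t_{n-p,\,1-\alpha/2}\, s\sqrt{v_j} \quad \text{(exact, normal errors)}, \qquad \hat\beta_j \pm z_{1-\alpha/2}\, s\sqrt{v_j} \quad \text{(approximate, via CLT)} [/TEX]

where [TEX]v_j = [(X^T X)^{-1}]_{jj}[/TEX].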

Am I right?