Homogeneity requirements for linear regression

JDB

New Member
#1
I have a large data set (>22,000 observations). This is industrial data with many unequal cells: some cells have thousands of observations and some have only tens. Levene's test for equality of variances gives a very significant p-value, indicating non-homogeneity. Regardless of all that, the analysis makes very good sense and is useful for making a decision.
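For anyone who wants to reproduce this kind of check, here is a minimal sketch in Python using `scipy.stats.levene`. The cell sizes, means and SDs are invented for illustration (they are not JDB's data): two large cells and one small cell with a larger spread. `center="mean"` gives the classic Levene statistic; SciPy's default, `center="median"`, is the Brown-Forsythe variant.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical unequal cells: two large ones with SD 1, one small one with SD 3.
cells = [
    rng.normal(loc=10.0, scale=1.0, size=3000),
    rng.normal(loc=12.0, scale=1.0, size=1500),
    rng.normal(loc=11.0, scale=3.0, size=40),
]

# Classic Levene test (absolute deviations from each cell's mean).
stat, p_value = stats.levene(*cells, center="mean")
print(f"Levene W = {stat:.1f}, p = {p_value:.2e}")

# Per-cell standard deviations show where the heterogeneity comes from.
for i, cell in enumerate(cells):
    print(f"cell {i}: n = {cell.size}, sd = {cell.std(ddof=1):.2f}")
```

Keep in mind that with thousands of observations per cell the test has a lot of power, so a very small p-value can come from a fairly modest difference in spread; the per-cell SDs are worth looking at alongside it.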

Residuals are normally distributed.

My question: is there a need for homogeneity of variance with linear regression? I see the need for normality of the residuals (and mine are normal), but I do not see any discussion of homogeneity of variances for linear regression.

Thanks for any help or ideas.

JDB
 
#2
My question: is there a need for homogeneity of variance with linear regression?
In this case it will not matter. Your parameter estimates will be fine.

(Where constant variance matters is when you want to do significance tests with a small sample size, say n = 20.)
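Both claims in this reply can be checked with a quick simulation. The sketch below (Python with NumPy and SciPy; all sizes, SDs and the seed are arbitrary illustrative choices) first fits OLS to n = 22,000 points whose error SD grows with x and compares the slope with its true value. That speaks only to the point estimate, not to its standard error. It then runs a pooled, equal-variance t-test (equivalent to OLS on a 0/1 group dummy) with n = 20 split 5/15, the small group having the larger SD, and counts rejections when the true means are equal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def ols_slope_heteroscedastic(n=22000, beta0=1.0, beta1=2.0):
    """Fit y = b0 + b1*x by OLS when the error SD grows with x."""
    x = rng.uniform(0.0, 10.0, size=n)
    sd = 0.5 + x                      # non-constant error SD (hypothetical)
    y = beta0 + beta1 * x + rng.normal(0.0, sd)
    X = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

def pooled_t_type1_rate(n_small=5, n_large=15, sd_small=4.0, sd_large=1.0,
                        reps=20000, alpha=0.05):
    """Share of rejections when both group means are truly equal."""
    a = rng.normal(0.0, sd_small, size=(reps, n_small))
    b = rng.normal(0.0, sd_large, size=(reps, n_large))
    # Pooled (equal-variance) t-test, one test per row.
    res = stats.ttest_ind(a, b, axis=1, equal_var=True)
    return np.mean(res.pvalue < alpha)

slope = ols_slope_heteroscedastic()
rate = pooled_t_type1_rate()
print(f"OLS slope estimate: {slope:.3f} (true value 2.0)")
print(f"pooled t-test rejection rate at nominal 5%: {rate:.3f}")
```

If the rejection rate comes out well above 0.05, the test is not holding its nominal level for that design, even though nothing is wrong with the estimates themselves.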
 

obh

Active Member
#3
Hi Greta,

Above what sample size does the "homogeneity of variances" assumption stop being important for linear regression?
I assume the same holds for ANOVA?
 

noetsi

Fortran must die
#5
Generally speaking, as the sample size gets larger, violations of most (although not all) of the assumptions become less important. The results are asymptotically correct :p
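A small illustration of "asymptotically correct" for one assumption, normality: the sample mean of a right-skewed Exponential(1) variable has skewness 2/sqrt(n) for samples of size n, so it looks more and more normal as n grows. The sample sizes, replicate count and seed below are arbitrary choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def skewness(v):
    """Sample skewness: third central moment over SD cubed."""
    d = v - v.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

reps = 10000
results = {}
for n in (5, 50, 500):
    # Each row is one sample of size n from Exponential(1); take its mean.
    means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    results[n] = skewness(means)
    print(f"n = {n:4d}: skewness of sample mean = {results[n]:.3f} "
          f"(theory 2/sqrt(n) = {2 / np.sqrt(n):.3f})")
```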
 

hlsmith

Less is more. Stay pure. Stay poor.
#6
Yeah, you all say this, but I think it should still be looked at to make sure something crazy extreme isn't going on. Visualizations are so important. They could also reveal, say, that some outcome values aren't present or that values are bounded, or help find erroneous outliers. It can't hurt to look.
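A minimal sketch of that kind of look with matplotlib: residuals against fitted values (funnels and outliers show up here), plus a finely binned histogram of the raw outcome (bounds and gaps show up here). The simulated data, an outcome clipped at 0 with spread growing in x, are hypothetical and only there to give the plots something to show.

```python
import io

import matplotlib
matplotlib.use("Agg")          # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

def diagnostic_plots(fitted, residuals, y):
    """Residuals vs fitted values, plus a histogram of the raw outcome."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.scatter(fitted, residuals, s=4, alpha=0.3)
    ax1.axhline(0.0, color="black", linewidth=1)
    ax1.set_xlabel("fitted value")
    ax1.set_ylabel("residual")
    ax1.set_title("Residuals vs fitted")
    # Many bins so gaps, or a spike at a bound, stand out.
    ax2.hist(y, bins=100)
    ax2.set_xlabel("outcome")
    ax2.set_title("Outcome distribution")
    fig.tight_layout()
    return fig

# Hypothetical data: outcome clipped at 0, error spread growing with x.
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 10.0, size=5000)
y = np.clip(1.0 + 2.0 * x + rng.normal(0.0, 0.5 + x), 0.0, None)
slope, intercept = np.polyfit(x, y, 1)   # polyfit returns highest degree first
fitted = intercept + slope * x
fig = diagnostic_plots(fitted, y - fitted, y)

buf = io.BytesIO()
fig.savefig(buf, format="png")           # or fig.savefig("diagnostics.png")
```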
 

obh

Active Member
#7
Generally speaking, as the sample size gets larger, violations of most (although not all) of the assumptions become less important. The results are asymptotically correct :p
Is there a "common" sample size above which it becomes less important to check the "homogeneity of variances"?
 

obh

Active Member
#8
Yeah, you all say this, but I think it should still be looked at to make sure something crazy extreme isn't going on. Visualizations are so important. They could also reveal, say, that some outcome values aren't present or that values are bounded, or help find erroneous outliers. It can't hurt to look.
Correct. For example, the CLT (if relevant ...) doesn't always work: try applying the CLT to data with undefined skewness, like F(3,3), which also has infinite variance. The sample average will be skewed even for a huge sample size.
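One way to see this by simulation, as a sketch with arbitrary sizes and seed: F(3,3) has mean d2/(d2 - 2) = 3 but infinite variance (the variance needs d2 > 4), so the usual CLT does not apply. If the sample mean were close to normal, about half of the sample means would fall below the true mean. For F(3,3) the share stays noticeably above one half even at n = 1000, whereas for Exponential(1), which has finite variance, it sits near one half.

```python
import numpy as np

rng = np.random.default_rng(3)
reps, n = 2000, 1000

# F(3, 3): finite mean (3) but infinite variance.
f_means = rng.f(3, 3, size=(reps, n)).mean(axis=1)
# Exponential(1): mean 1, finite variance, for comparison.
exp_means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

# Share of sample means falling below the population mean.
share_f = np.mean(f_means < 3.0)
share_exp = np.mean(exp_means < 1.0)
print(f"F(3,3): share of sample means below the true mean = {share_f:.3f}")
print(f"Exp(1): share of sample means below the true mean = {share_exp:.3f}")
```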