Testing heteroscedasticity when you cannot graph the results

noetsi

Fortran must die
#1
I have more than 5,400 data points. It is very difficult to see anything in a graph with that many (SAS won't actually plot that many by default, although I am working on changing the defaults).

How do you test for heteroscedasticity given this? To make things worse, at least one of the hetero tests assumes that a covariance matrix is nonsingular, which was not met, so it might not be useful (I don't think this means the data is not heteroscedastic, just that the test does not work).
 

obh

Active Member
#2
Did you try the White test? (It is a nice one: you run a regression where the squared residuals are the DV and Y and Y squared are the IVs.)
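A minimal sketch of the regression described above, in Python with only the standard library. It assumes that the "Y" in the auxiliary regression is the fitted value from the original model (a common special form of White's test), that the statistic is n times the auxiliary R-squared, and that it is compared against a chi-square with 2 degrees of freedom. The helper names (`solve`, `ols`, `white_special`) and the simulated data are made up for illustration; this is not what SAS runs.

```python
import math
import random

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(columns, y):
    """OLS of y on an intercept plus the given columns.

    Returns (coefficients, fitted values, R-squared).
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
           for p in range(k)]
    xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    ybar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return beta, fitted, 1.0 - ss_res / ss_tot

def white_special(columns, y):
    """Special-form White test: squared residuals on fitted and fitted^2."""
    _, fitted, _ = ols(columns, y)
    resid_sq = [(yi - fi) ** 2 for yi, fi in zip(y, fitted)]
    fitted_sq = [f * f for f in fitted]
    _, _, r2 = ols([fitted, fitted_sq], resid_sq)
    lm = max(0.0, len(y) * r2)       # LM statistic = n * R^2
    p_value = math.exp(-lm / 2.0)    # chi-square upper tail, exact for 2 df
    return lm, p_value

# Simulated example (assumption): noise SD grows with x.
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(2000)]
y = [1 + 2 * xi + rng.gauss(0, 0.5 * xi) for xi in x]
lm, p = white_special([x], y)
print(f"LM = {lm:.1f}, p = {p:.3g}")
```

A small p value here says the squared residuals move with the fitted values. Because this needs no plot, the number of data points is not an obstacle.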
 

noetsi

Fortran must die
#4
P.S. As a bad workaround you can draw a chart based on a random sample of the 5400.
You can. As it turns out I found a way to graph all the data, but it is always hard to see anything in a blob of 5000+ data points.
I ran what was probably White's test, but the results were not interpretable because an assumption of the test was violated. SAS calls this a heteroskedasticity test without naming it.
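For what it is worth, here is a rough sketch of two ways around the "blob" problem, in Python with only the standard library: plot a random subsample, or skip the plot and compare the residual spread across bins of the fitted values. The function names (`subsample`, `binned_spread`), the bin count, and the simulated data are assumptions for illustration only.

```python
import random
import statistics

def subsample(pairs, k, seed=0):
    """Return k (fitted, residual) pairs drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(pairs, k)

def binned_spread(pairs, n_bins=10):
    """Sort by fitted value, split into n_bins roughly equal-count bins,
    and return (mean fitted value, residual SD) for each bin."""
    ordered = sorted(pairs, key=lambda p: p[0])
    size = len(ordered) // n_bins
    out = []
    for b in range(n_bins):
        start = b * size
        # The last bin absorbs any leftover points.
        end = (b + 1) * size if b < n_bins - 1 else len(ordered)
        chunk = ordered[start:end]
        mean_fitted = statistics.fmean([f for f, _ in chunk])
        resid_sd = statistics.pstdev([r for _, r in chunk])
        out.append((mean_fitted, resid_sd))
    return out

# Simulated example (assumption): residual SD grows with the fitted value.
rng = random.Random(0)
points = []
for _ in range(5400):
    fitted = rng.uniform(0, 10)
    points.append((fitted, rng.gauss(0, 0.1 + 0.3 * fitted)))

small = subsample(points, 500)          # plot these instead of all 5400
for mean_fitted, resid_sd in binned_spread(points):
    print(f"fitted ~ {mean_fitted:5.2f}   residual SD = {resid_sd:.3f}")
```

If the residual SD is roughly flat across the bins, that supports homoscedasticity. A steady rise or fall is the pattern that is hard to see in a cloud of 5000+ points.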
 

noetsi

Fortran must die
#5
These residuals do not look heteroskedastic to me:

[Image: residual plot (1555541058613.png)]


But I ran a version of the White test. From the SAS documentation:

Specifying the SPEC option in the MODEL statement in PROC REG performs the test described in Theorem 2, page 823, of White (1980). This is a test of the joint null hypothesis that the errors are homoscedastic, that they are independent of the regressors, and that the model is correctly specified.

The results were highly significant, which suggests that at least one of these conditions does not hold (it could be that the model is misspecified, of course). I also got a warning:

The average covariance matrix for the SPEC test has been deemed singular which violates an assumption of the test. Use
caution when interpreting the results of the test.

So I am not sure whether the errors are heteroskedastic or not. The QQ plot is not entirely normal either; it's heavy tailed, but I am not sure how much that will influence the results.
 

obh

Active Member
#6
Hi Noetsi.

I also think that the residuals do not look heteroskedastic.
So does the White test claim the same?

When you say "And the results were highly significant", do you mean the regression model?

The residuals are not normally distributed, but they are probably close enough, so that probably shouldn't be a problem.

The problems with heteroskedasticity:
1. The estimates will be less accurate (OLS is no longer efficient).
2. The SEs of the coefficients are biased, so the significance of the results is not accurate.
But if your regression model results are highly significant, it probably won't change the conclusions about the model's predictors.
 

noetsi

Fortran must die
#7
In the end I just used White's SEs. I think that p values, which are what heteroscedasticity influences, don't even matter when you have a population, but a colleague stresses them, so I have to address this.
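For anyone reading later, a minimal sketch of what White's SE does for the slope in a one-predictor regression, in Python with only the standard library. It assumes the basic HC0 form with no small-sample correction, which may differ from what a given SAS option reports. The function name `slope_standard_errors` and the simulated data are made up for illustration.

```python
import math
import random

def slope_standard_errors(x, y):
    """Return (slope, classical SE, White HC0 SE) for y = a + b*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    dx = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - ybar) for d, yi in zip(dx, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Classical SE: one common error variance for every observation.
    classical = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    # HC0: each observation contributes its own squared residual.
    robust = math.sqrt(sum((d * e) ** 2 for d, e in zip(dx, resid))) / sxx
    return slope, classical, robust

# Simulated example (assumption): noise SD grows with x.
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(5400)]
y = [1 + 2 * xi + rng.gauss(0, 0.1 * xi * xi) for xi in x]
slope, se_classical, se_white = slope_standard_errors(x, y)
print(f"slope = {slope:.3f}")
print(f"classical SE = {se_classical:.4f}, White SE = {se_white:.4f}")
```

The slope estimate is the same either way. Only the standard error, and therefore the p value, changes.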