Collinearity, Logistic GEE


I'm running a model and I think I've run into a collinearity issue. We have approximately 300,000 students in our dataset. For these students we have information on their mothers, but approximately 6,000 kids have missing mother information. We've decided to include 'missing' as a category for the mother covariates:

Age at first pregnancy: less than 19, 19 or older, and missing
Immigration status: immigrant, non-immigrant, and missing
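Here is a minimal sketch of that coding in plain Python. The records and field names (`age_first_preg`, `immigrant`) are hypothetical. Each three-level covariate becomes two 0/1 indicator columns measured against a reference level, and 'missing' is treated as an ordinary level:

```python
# Hypothetical mother-level records; None marks missing mother information.
records = [
    {"age_first_preg": 17, "immigrant": True},
    {"age_first_preg": 24, "immigrant": False},
    {"age_first_preg": None, "immigrant": None},
]

def age_level(age):
    """Age at first pregnancy: '<19', '>=19', or 'missing'."""
    if age is None:
        return "missing"
    return "<19" if age < 19 else ">=19"

def immigration_level(flag):
    """Immigration status: 'immigrant', 'non-immigrant', or 'missing'."""
    if flag is None:
        return "missing"
    return "immigrant" if flag else "non-immigrant"

def indicators(level, non_reference_levels):
    """Reference-cell coding: one 0/1 column per non-reference level."""
    return [1 if level == lvl else 0 for lvl in non_reference_levels]

AGE_COLUMNS = [">=19", "missing"]       # reference level: '<19'
IMM_COLUMNS = ["immigrant", "missing"]  # reference level: 'non-immigrant'

rows = [
    indicators(age_level(r["age_first_preg"]), AGE_COLUMNS)
    + indicators(immigration_level(r["immigrant"]), IMM_COLUMNS)
    for r in records
]
for row in rows:
    print(row)
```

In this sketch, a kid with no mother information gets a 1 in both 'missing' columns (the second and fourth entries of each row). Every other kid gets a 0 in both.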

If I include these variables, along with the other kid-level characteristics, in my logistic GEE model, the parameter estimates for the two mother-level variables are odd.

For age at first pregnancy, I get OR estimates for missing vs. <19 and for ≥19 vs. <19.

For immigration status, I get an estimate for immigrant vs. non-immigrant, but the estimate for missing vs. non-immigrant comes out as 0, with no p-value.

I'm using SAS PROC GENMOD.

I suspect collinearity, because the kids in the missing category are the same kids for both variables. If I run a univariate analysis for each variable, I can estimate all pairwise ORs.
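The suspicion can be checked on the design matrix directly. The sketch below uses simulated data; the sample size, missing rate, and column layout are assumptions and do not come from the actual dataset. It applies NumPy's `matrix_rank`. When the two 'missing' indicators are identical columns, the matrix loses one rank, so only their combined effect is estimable. That would be consistent with GENMOD reporting 0 and no p-value for the second of the two parameters.

```python
import numpy as np

def build_design(n=1000, missing_rate=0.02, seed=0):
    """Simulated design matrix with reference-cell dummies (hypothetical data)."""
    rng = np.random.default_rng(seed)
    missing = (rng.random(n) < missing_rate).astype(float)
    # Kids with missing mother information get 0 on the substantive dummies.
    age_ge19 = np.where(missing == 1, 0.0, (rng.random(n) < 0.7).astype(float))
    immigrant = np.where(missing == 1, 0.0, (rng.random(n) < 0.3).astype(float))
    kid_covariate = rng.normal(size=n)  # stand-in for other kid-level variables
    return np.column_stack([
        np.ones(n),     # 0: intercept
        age_ge19,       # 1: age >=19 vs <19
        missing,        # 2: age missing vs <19
        immigrant,      # 3: immigrant vs non-immigrant
        missing,        # 4: immigration missing vs non-immigrant (same kids)
        kid_covariate,  # 5: other kid-level covariate
    ])

X = build_design()
print("columns:", X.shape[1], "rank:", np.linalg.matrix_rank(X))

# Dropping either duplicate 'missing' column restores full column rank.
X_reduced = np.delete(X, 4, axis=1)
print("reduced columns:", X_reduced.shape[1], "rank:", np.linalg.matrix_rank(X_reduced))

# One variable on its own (intercept plus its two dummies) is full rank,
# matching the univariate fits where every pairwise OR was estimable.
X_age_only = X[:, [0, 1, 2]]
print("age-only rank:", np.linalg.matrix_rank(X_age_only))
```

Dropping one duplicate column leaves a single 'mother information missing' indicator. This is effectively what happens when the software aliases the second parameter to 0.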

Any suggestions?


TS Contributor
You may want to look into the Heckman two-step estimator or the Särndal & Lundström estimator. In my opinion, these methods deal with missing data in an elegant way.