Missing value imputation on a non-normal distribution - EM vs Regression? Or else?


I'm a bit green when it comes to some of these methods, so please be gentle. I'll try to provide as much information as concisely as possible. Any help is sincerely appreciated.

I have a non-normal dataset with data that are missing at random (MAR), and I am trying to determine which imputation method to use. I am using SPSS, so if possible please keep this in mind when offering practical suggestions.

Two things:
1. I don't think deleting cases with missing values is a good option, because those cases are still viable, valid records with complete data on important variables.
2. I cannot use multiple imputation, because I need to import the imputed dataset into another dataset of different dimensionality for the final analysis.


So I am looking at imputing with either Expectation Maximisation (EM) or regression, as both are built into SPSS.

Does one of these approaches fit my data better than the other?

Or is there a more appropriate approach I could try?

About my data

Missingness model: I am confident that my missing data are MAR. I have come up with a missingness model that I'm pretty satisfied with: it shows a very strong relationship, in both statistical and semantic terms, between the variable in which the missing data occur and a specific auxiliary variable I brought in. I won't say more than that, as I don't think it's necessary here.

Some statistics for the variable with the missing data:
- Missing values: 14% (139 of 987 cases)
- Skewness: 2.984 (SE 0.084)
- Kurtosis: 7.427 (SE 0.168)
- Based on Q-Q plots, the distribution seems closest to a gamma distribution (shape 0.2, scale 0.000003).


Thanks in advance for answers/suggestions.


Re: Missing value imputation on a non-normal distribution - EM vs Regression? Or else

Can you normalize the data via a transformation?
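One common way to act on this suggestion is a log transformation: impute on the log scale, then back-transform, which also guarantees that imputed values stay positive. The stdlib-only sketch below is illustrative; the helper names and the `offset` parameter (for data that contain zeros) are assumptions, and `skewness` uses the simple moment estimator, so its value may differ slightly from the skewness SPSS reports.

```python
import math
import statistics


def skewness(values):
    """Simple moment-based sample skewness (g1)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log_transform(values, offset=0.0):
    """Natural log of (v + offset); a positive offset allows zeros."""
    return [math.log(v + offset) for v in values]


def back_transform(values, offset=0.0):
    """Inverse of log_transform, applied after imputing on the log scale."""
    return [math.exp(v) - offset for v in values]


data = [1, 2, 4, 8, 16, 32, 64]  # toy right-skewed data
print(skewness(data), skewness(log_transform(data)))
```

On this toy series the raw skewness is clearly positive while the log-scale skewness is essentially zero; real data will rarely become exactly symmetric, so it is worth re-checking the Q-Q plot after transforming.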

What is this variable, a predictor or your outcome?

Whichever approach you use, you need to do multiple imputation (MI), meaning you don't impute the missing values just once. MI accounts for the uncertainty in the imputed values. With MI, you run your analysis on each of the imputed datasets and then combine the results using Rubin's rules.
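The combining step can be sketched as follows. This is a minimal illustration of Rubin's rules for a single scalar estimate (for example, one regression coefficient), assuming each imputed dataset yields a point estimate and its squared standard error; the function name `pool_rubin` is hypothetical, and the degrees-of-freedom adjustment used for confidence intervals is omitted.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Pool m completed-data results with Rubin's rules.

    estimates: point estimates, one per imputed dataset (m >= 2)
    variances: squared standard errors, one per imputed dataset
    Returns (pooled estimate, total variance, pooled standard error).
    """
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need matching lists with at least two imputations")
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w = statistics.fmean(variances)       # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b               # total variance
    return q_bar, t, math.sqrt(t)
```

With estimates [1.0, 2.0, 3.0] and squared standard errors [0.5, 0.5, 0.5], the pooled estimate is 2.0 and the total variance is 0.5 + (4/3)(1.0), about 1.833; the between-imputation term is what a single imputation would leave out.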