missing data in multiple regression

spunky

Super Moderator
#1
so... here's the situation. i'm slowly becoming more and more interested in the problem of missing data because of how pervasive it is. there are usually two ways to get around it: Full Information Maximum Likelihood (FIML) and Multiple Imputation (MI). routines for both methods have been automated in Structural Equation Modelling software programs but i was hoping to use them for simpler analyses (ANOVA, regression... the tea-test :p)

anyway, so i'm trying to help someone who has missing data and wishes to perform a straightforward multiple regression analysis. i thought to myself "no problem. with lavaan/R i can get the EM (expectation-maximization) covariance matrix, operate on it and obtain what i want." there is a problem, though, with the standard errors.

the formula i have for the standard errors is [MATH]\sqrt{\sigma^{2}C_{jj}}[/MATH] (so the t-statistic is [MATH]\frac{\beta_{j}}{\sqrt{\sigma^{2}C_{jj}}}[/MATH]), where [MATH]\sigma^{2}[/MATH] is the variance of the residuals and [MATH]C_{jj}[/MATH] is the diagonal element of [MATH](X'X)^{-1}[/MATH] associated with that particular variable.
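to make the "naive" calculation concrete, here is a minimal sketch in Python/NumPy (not lavaan) of what operating on a covariance matrix would look like. the function name, `S` and `n` are made up for illustration. it assumes the outcome is the last row/column of `S`, that the model has an intercept, and that `S` uses the n - 1 denominator (an ML/EM covariance matrix is usually scaled by n instead, so it would need rescaling by n/(n - 1) first). note that this is just the complete-data formula: it treats `n` as if every case were fully observed, which is exactly the part i suspect is too optimistic.

```python
import numpy as np

def naive_regression_from_cov(S, n):
    """Slopes, naive SEs and t-statistics from a covariance matrix.

    S : (p+1) x (p+1) covariance matrix, predictors first, outcome last
        (assumed to use the n - 1 denominator).
    n : sample size, treated as if all n cases were complete.
    """
    S = np.asarray(S, dtype=float)
    p = S.shape[0] - 1
    S_xx = S[:p, :p]   # covariances among the predictors
    s_xy = S[:p, p]    # covariances of each predictor with the outcome
    s_yy = S[p, p]     # variance of the outcome

    S_xx_inv = np.linalg.inv(S_xx)
    beta = S_xx_inv @ s_xy  # regression slopes

    # residual variance, with n - p - 1 degrees of freedom (intercept included)
    sigma2 = (n - 1) * (s_yy - s_xy @ beta) / (n - p - 1)

    # for the slopes, the diagonal of (X'X)^{-1} equals diag(S_xx^{-1}) / (n - 1)
    C_diag = np.diag(S_xx_inv) / (n - 1)

    se = np.sqrt(sigma2 * C_diag)  # naive standard errors
    return beta, se, beta / se

# toy complete-data example (simulated, only to show the call)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(size=200)
S = np.cov(np.column_stack([X, y]), rowvar=False)
beta, se, t = naive_regression_from_cov(S, n=200)
print(beta, se, t)
```

with complete data this reproduces the usual OLS slopes and standard errors, so any difference under missingness has to come from how `S` was estimated and from which `n` gets plugged in.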

i believe i have heard before that i cannot "naively" estimate the SEs of the regression coefficients because that underestimates the true variability due to missing data. does anyone know how to correct for it?