missing data

noetsi

Fortran must die
#1
Is there any way to get a good sense of how much missing data invalidates an analysis, that is, when you have to start questioning your model? For example, I would guess that at worst we might lose 2 percent of our cases in a multivariate regression. No single variable would have 2 percent missing; rather, 2 percent of the cases would be missing data on one variable or another, which means the regression excludes those cases.
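To make the distinction between per-variable missingness and cases lost to listwise deletion concrete, here is a minimal sketch. The data and the variable names (`age`, `income`, `race`) are made up for illustration and are not from the actual dataset.

```python
# Hypothetical toy data: each dict is one case, None marks a missing value.
rows = [
    {"age": 34, "income": 52000, "race": "A"},
    {"age": None, "income": 61000, "race": "B"},
    {"age": 45, "income": 48000, "race": None},
    {"age": 29, "income": 39000, "race": "A"},
    {"age": 51, "income": None, "race": "B"},
]


def missing_summary(rows):
    """Return (per-variable missing fractions, fraction of cases that
    listwise deletion would drop because any variable is missing)."""
    n = len(rows)
    variables = list(rows[0].keys())
    per_variable = {
        v: sum(r[v] is None for r in rows) / n for v in variables
    }
    dropped = sum(any(r[v] is None for v in variables) for r in rows) / n
    return per_variable, dropped


per_variable, dropped = missing_summary(rows)
print(per_variable)  # missing share for each variable on its own
print(dropped)       # share of cases a regression on all variables would exclude
```

The point of the sketch is that the dropped share can be larger than any single variable's missing share, because the missing values sit in different cases.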
 

hlsmith

Not a robit
#2
Seems pretty trivial. If you assume MAR, it's not an issue. You can always play around with things like imputing, or removing more cases systematically, and see if that does anything. It's not always the best approach, but you could also carry values forward, or impute the mean, min, or max, and see if there is an effect, etc.
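A rough sketch of that kind of sensitivity check, assuming a single predictor with a few missing values and made-up numbers. The helper `ols_slope` and the toy data are hypothetical; a real analysis would use a proper regression package and likely multiple imputation.

```python
from statistics import mean


def ols_slope(xs, ys):
    """Slope of a one-predictor least-squares fit of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy data: None marks a missing predictor value.
x = [1.0, 2.0, None, 4.0, 5.0, None, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]

observed = [a for a in x if a is not None]

# Complete-case fit: drop every case with a missing predictor.
pairs = [(a, b) for a, b in zip(x, y) if a is not None]
slopes = {
    "complete_case": ols_slope([a for a, _ in pairs], [b for _, b in pairs])
}

# Refit after filling the gaps with the observed mean, min, and max.
fills = {"mean": mean(observed), "min": min(observed), "max": max(observed)}
for name, value in fills.items():
    x_filled = [value if a is None else a for a in x]
    slopes[name] = ols_slope(x_filled, y)

for name, slope in slopes.items():
    print(f"{name}: {slope:.3f}")
```

If the slopes barely move across the fills, the missing cases probably aren't driving the result. In this one-predictor setup, mean-filling the predictor gives the same slope as the complete-case fit, so the min and max fills are the more informative stress test.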
 

noetsi

Fortran must die
#3
Honestly, I don't know why the data is missing. Part of it involves race, where customers chose not to identify their race (that is not missing at random, but it is a small part of the data).
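One crude way to probe whether the undisclosed-race cases look different from the rest is to compare another observed variable across the two groups. This is only a sketch: the `income` values and race codes below are invented, and a gap in one variable does not by itself establish the missingness mechanism.

```python
from statistics import mean

# Hypothetical toy data: (income, race); None marks an undisclosed race.
cases = [
    (52000, "A"),
    (61000, None),
    (48000, "B"),
    (39000, "A"),
    (75000, None),
    (44000, "B"),
]


def split_by_missing(cases):
    """Mean income for cases with race reported vs. race missing."""
    reported = [income for income, race in cases if race is not None]
    missing = [income for income, race in cases if race is None]
    return mean(reported), mean(missing)


mean_reported, mean_missing = split_by_missing(cases)
gap = mean_missing - mean_reported
print(mean_reported, mean_missing, gap)
```

A large gap would suggest the missingness is related to something observed, which is worth knowing before deciding how much to trust a complete-case regression.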