A question regarding missing variables

Hey all,

If you have missing values in your data set, how do you go about doing the statistical analysis?

For example, if you have a data table with entries for gender, age, and weight, but some weight entries are missing (e.g. only 45 weight entries out of a sample of 50 people), how would you compute the mean and standard deviation? Would you just add the 45 known weights and divide by 50 to get the average?


TS Contributor
Deletion or imputation

Hi moomoo345,

As you may have noticed, missing values are a huge topic and many books have been written about it. The procedure you should use will depend mainly on the nature of your data, the analysis you intend to perform, and especially on the reason those values are presumably missing.

What most people do when dealing with missing data (even if they don't notice it) is a simple process called listwise deletion. This is nothing but dropping the cases with missing data. In that case, if you have 50 observations and 20 of them have missing data, you would use only the remaining 30 to obtain the average (so you divide by 30, not 50). The disadvantages are that you throw away information and lose statistical power, and the results can be biased if the missing cases differ systematically from the observed ones.
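To make the arithmetic concrete, here is a minimal Python sketch of listwise deletion using only the standard library. The records and the names (`people`, `complete_cases`, `wrong_mean`) are made up for illustration; `None` stands for a missing weight.

```python
import statistics

# Hypothetical records: (gender, age, weight in kg); None marks a missing weight.
people = [
    ("F", 23, 70.0), ("M", 35, 82.5), ("F", 41, None), ("F", 29, 64.0),
    ("M", 52, 91.0), ("M", 47, None), ("F", 31, 77.5), ("M", 60, 68.0),
    ("F", 38, 85.0), ("M", 44, 73.0),
]

# Listwise deletion: drop every case that has any missing entry.
complete_cases = [p for p in people if all(v is not None for v in p)]
observed_weights = [weight for _, _, weight in complete_cases]

n_total = len(people)
n_observed = len(observed_weights)

# Mean and sample standard deviation use the number of OBSERVED values.
mean_weight = statistics.mean(observed_weights)
sd_weight = statistics.stdev(observed_weights)  # divides by n_observed - 1

# Dividing by the full sample size silently treats each missing weight as 0,
# which drags the average down.
wrong_mean = sum(observed_weights) / n_total

print(f"observed {n_observed} of {n_total}")
print(f"mean = {mean_weight:.3f}, sd = {sd_weight:.3f}")
print(f"mean if you divide by n_total (wrong) = {wrong_mean:.3f}")
```

So for the original question, the 45 known weights would be summed and divided by 45, not by 50.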

The other option is to use an imputation technique. There are many available and each one is appropriate for certain situations. Just as an example, one can impute using the mean, or using fitted values from a regression, or using fitted values from a regression plus a random component. There is also multiple imputation, which is generally regarded as one of the most robust methods available. Of course, some of these techniques require good knowledge of the topic.
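As a rough illustration of the three single-imputation ideas just mentioned, here is a small standard-library Python sketch. The data are invented, and using age as the only predictor of weight is an assumption made purely for the example; a real analysis would choose predictors with more care.

```python
import random
import statistics

# Hypothetical data; None marks a missing weight (kg).
ages = [23, 35, 41, 29, 52, 47, 31, 60]
weights = [62.0, 74.0, None, 68.0, 85.0, None, 70.0, 90.0]

# Cases where the weight was actually observed.
obs = [(a, w) for a, w in zip(ages, weights) if w is not None]
obs_ages = [a for a, _ in obs]
obs_weights = [w for _, w in obs]

# 1) Mean imputation: fill every gap with the observed mean.
mean_w = statistics.mean(obs_weights)
mean_imputed = [mean_w if w is None else w for w in weights]

# 2) Regression imputation: least-squares fit of weight on age,
#    then fill each gap with the fitted value.
mean_a = statistics.mean(obs_ages)
sxx = sum((a - mean_a) ** 2 for a in obs_ages)
sxy = sum((a - mean_a) * (w - mean_w) for a, w in obs)
slope = sxy / sxx
intercept = mean_w - slope * mean_a
reg_imputed = [
    intercept + slope * a if w is None else w
    for a, w in zip(ages, weights)
]

# 3) Stochastic regression imputation: fitted value plus random noise
#    drawn with the residual standard deviation of the fit.
residuals = [w - (intercept + slope * a) for a, w in obs]
resid_sd = (sum(r ** 2 for r in residuals) / (len(obs) - 2)) ** 0.5
rng = random.Random(0)  # fixed seed so the sketch is reproducible
stoch_imputed = [
    intercept + slope * a + rng.gauss(0, resid_sd) if w is None else w
    for a, w in zip(ages, weights)
]

# Mean imputation keeps the mean but shrinks the spread.
print(statistics.stdev(obs_weights), statistics.stdev(mean_imputed))
```

Note how mean imputation leaves the average unchanged but understates the standard deviation, which is one reason the random component in option 3 is useful. Multiple imputation is not shown here; very roughly, it repeats a random imputation several times and pools the results across the completed data sets.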