
Thread: A question regarding missing variables

  1. #1

    A question regarding missing variables

    Hey all,

    If you have missing values in your data set, how do you go about doing the statistical analysis?

    For example, if you have a data table with entries for gender, age, and weight, but some weight entries are missing (e.g. only 45 weights recorded for a sample of 50 people), how would you compute the mean and standard deviation? Would you just add up the 45 known weights and divide by 50 to get the average?
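    To make the question concrete, here is a minimal sketch with made-up weights (the values and the positions of the five missing entries are purely hypothetical). It contrasts dividing the sum of the 45 known weights by 50 with the usual available-case calculation, which divides by 45 and computes the sample SD from the same 45 values.

```python
import statistics

# Hypothetical data: 50 people, weight (kg) recorded for only 45 of them.
# None marks a missing weight.
weights = [55.0 + (i % 9) * 2.5 for i in range(50)]
for i in (3, 11, 24, 37, 48):  # five arbitrary positions made missing
    weights[i] = None

observed = [w for w in weights if w is not None]
n_total, n_obs = len(weights), len(observed)

# Dividing the 45 observed values by 50 silently treats the missing ones as 0 kg.
wrong_mean = sum(observed) / n_total
# Available-case estimates: use only the 45 observed values.
mean = statistics.mean(observed)
sd = statistics.stdev(observed)  # sample SD, divisor n_obs - 1

print(f"n = {n_obs} of {n_total}")
print(f"sum/50 = {wrong_mean:.2f}, sum/45 = {mean:.2f}, SD = {sd:.2f}")
```

    Because every weight is positive, the "divide by 50" figure is always pulled below the mean of the values you actually observed.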

  2. #2
    TS Contributor

    MD, USA

  3. #3
    TS Contributor
    terzi
    Mexico City, Mexico

    Deletion or imputation

    Hi moomoo345,

    As you may have noticed, missing values are a huge topic, and many books have been written about it. The procedure you should use depends mainly on the nature of your data, the analysis you intend to perform and, especially, the reason the cases are missing in the first place.

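    What most people do when dealing with missing data (often without realizing it) is a simple procedure called listwise deletion: every case that has a missing value in any variable is dropped. For example, if you have 50 observations and 20 of them have missing data, you would compute the average from the remaining 30 only. If weight is the only variable with gaps in your example, that means dividing the sum of the 45 known weights by 45, not by 50; dividing by 50 amounts to treating the five missing weights as zero. The approach has clear disadvantages: it throws away information, shrinks the sample, and gives biased estimates if the cases with missing data differ systematically from the complete ones.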
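    A minimal sketch of listwise deletion on a small, entirely hypothetical gender/age/weight table, using only the Python standard library (the helper name `listwise_delete` is made up for illustration). A record is kept only if none of its fields is missing, and the mean weight is then computed from the surviving cases.

```python
import statistics

# Hypothetical records (gender, age, weight in kg); None marks a missing value.
records = [
    {"gender": "F", "age": 34, "weight": 61.2},
    {"gender": "M", "age": 41, "weight": None},
    {"gender": "F", "age": None, "weight": 58.9},
    {"gender": "M", "age": 29, "weight": 80.4},
    {"gender": "F", "age": 52, "weight": 66.0},
    {"gender": None, "age": 47, "weight": 74.3},
    {"gender": "M", "age": 38, "weight": 77.5},
]


def listwise_delete(rows):
    """Keep only the cases that have no missing value in any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]


complete = listwise_delete(records)
weights = [r["weight"] for r in complete]
print(f"{len(complete)} of {len(records)} cases kept")
print(f"mean weight = {statistics.mean(weights):.3f} kg")
```

    Only 4 of the 7 cases survive, even though 6 of them have a recorded weight: the rows missing only age or gender are discarded too. That loss of usable information is one of the disadvantages mentioned above.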

    The other option is to use an imputation technique, that is, to fill in the missing values with plausible estimates. Many are available, and each one is appropriate in certain situations. For example, one can impute using the mean, using fitted values from a regression, or using fitted values from a regression plus a random component. There is also multiple imputation, which is generally considered one of the most powerful methods available. Of course, some of these techniques require a good knowledge of the topic.
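    A rough sketch of the imputation options listed above, assuming (purely for illustration) that age is fully observed and is used to predict the missing weights; all data and function names are hypothetical. `impute` covers mean imputation, regression imputation, and regression plus a random residual. `multiple_imputation_mean` is a simplified multiple-imputation sketch: each completed dataset refits the regression on a bootstrap resample of the observed cases before adding noise, and the resulting means are pooled with Rubin's rules. It is not a substitute for dedicated MI software.

```python
import random
import statistics

# Hypothetical data: age is fully observed, weight (kg) is missing for three people.
ages = [23, 35, 41, 29, 52, 47, 38, 60, 31, 44]
weights = [62.0, 70.5, None, 66.1, 79.8, None, 72.4, 83.0, None, 75.2]
observed = [(a, w) for a, w in zip(ages, weights) if w is not None]


def fit_line(pairs):
    """OLS fit of weight on age; returns (intercept, slope, residual SD)."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in pairs]
    sigma = (sum(r * r for r in resid) / (len(pairs) - 2)) ** 0.5
    return intercept, slope, sigma


def impute(method, pairs=observed, rng=None):
    """Return a completed copy of `weights` using one single-imputation method."""
    b0, b1, sigma = fit_line(pairs)
    mean_w = statistics.mean([w for _, w in pairs])
    filled = []
    for a, w in zip(ages, weights):
        if w is not None:
            filled.append(w)
        elif method == "mean":
            filled.append(mean_w)  # every gap gets the observed mean
        elif method == "regression":
            filled.append(b0 + b1 * a)  # fitted value from weight ~ age
        elif method == "stochastic":
            filled.append(b0 + b1 * a + rng.gauss(0, sigma))  # fitted value + random residual
        else:
            raise ValueError(f"unknown method: {method}")
    return filled


def multiple_imputation_mean(m=20, seed=0):
    """Multiply impute the data and pool the mean weight with Rubin's rules."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        # Bootstrap the observed cases so the regression parameters vary between
        # imputations; require 3+ distinct ages so the line can be fitted.
        while True:
            boot = [rng.choice(observed) for _ in observed]
            if len({a for a, _ in boot}) >= 3:
                break
        filled = impute("stochastic", pairs=boot, rng=rng)
        estimates.append(statistics.mean(filled))
        variances.append(statistics.variance(filled) / len(filled))  # squared SE of the mean
    q_bar = statistics.mean(estimates)  # pooled point estimate
    w_bar = statistics.mean(variances)  # within-imputation variance
    b = statistics.variance(estimates)  # between-imputation variance
    total = w_bar + (1 + 1 / m) * b  # Rubin's total variance
    return q_bar, total ** 0.5


rng = random.Random(1)
for method in ("mean", "regression", "stochastic"):
    filled = impute(method, rng=rng)
    print(f"{method:>10}: mean = {statistics.mean(filled):.2f}, SD = {statistics.stdev(filled):.2f}")
est, se = multiple_imputation_mean()
print(f"multiple imputation: mean = {est:.2f}, SE = {se:.2f}")
```

    The comparison illustrates why the choice matters. Mean imputation leaves the mean unchanged but shrinks the SD, because the filled-in values add nothing to the spread. Plain regression imputation also understates variability, since every imputed value sits exactly on the fitted line. Adding a random residual restores some of that spread, and multiple imputation additionally folds the between-imputation variability into the standard error.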
    Statisticians are engaged in an exhausting but exhilarating struggle with the biggest challenge that philosophy makes to science: how do we translate information into knowledge?



Similar Threads

  1. fill missing variables by cycle
    By gingerfish in forum SAS
    Replies: 5
    Last Post: 01-21-2011, 11:55 AM
  2. question about independent variables
    By mzimmers in forum Probability
    Replies: 8
    Last Post: 12-06-2010, 11:14 PM
  3. Combining Variables to Fill Missing Data
    By williamsons in forum Statistical Research
    Replies: 0
    Last Post: 12-02-2010, 05:19 AM
  4. Question regarding missing values
    By Rachie in forum SPSS
    Replies: 9
    Last Post: 09-30-2010, 10:55 AM
  5. Using Aggregate Proxies for Missing Micro Level Variables
    By gatormka in forum Psychology Statistics
    Replies: 0
    Last Post: 03-14-2008, 10:50 AM
