
Thread: How to Handle Blank Fields in Multiple Linear Regression

  1. #1

    I am building a regression with 45 samples and 20+ independent variables. To help avoid multicollinearity, I randomly select subsets of the variables and run many combinations of regressions. My main issue, however, is missing data: each of the 45 samples is missing at least one independent-variable value.

    Ideally I would gather the missing data, but that is not feasible in this situation. Instead, I am looking for common ways to handle missing data in a regression analysis that do the least damage to the strength of the analysis.

    One idea is to take the median of each independent variable across the samples and use it to fill every missing value of that variable. Another idea is to use all the other independent variables to 'predict' an appropriate value for each variable with missing data, and use that prediction equation to populate the missing fields.

    What other methods are commonly used for a regression analysis with blank fields in the dataset?

    Thank you for any help you can provide!
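
For readers who want to try the two ideas described in this post, here is a minimal sketch using NumPy. It assumes the data sit in a 2-D array with missing cells stored as NaN. The function names `impute_median` and `impute_by_regression` are hypothetical, and the regression version fills the other columns with their medians first so they can act as predictors, which is a simplification rather than the poster's stated method.

```python
import numpy as np

def impute_median(X):
    """Replace each NaN with the median of the observed values in its column."""
    X = np.asarray(X, dtype=float)
    medians = np.nanmedian(X, axis=0)        # per-column median, ignoring NaN
    filled = X.copy()
    rows, cols = np.where(np.isnan(filled))  # positions of the missing cells
    filled[rows, cols] = medians[cols]
    return filled

def impute_by_regression(X, target):
    """Fill NaNs in column `target` from a least-squares fit on the other columns.

    Assumption for this sketch: NaNs in the predictor columns are
    median-filled first so that every row has a usable predictor vector.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X[:, target])
    predictors = np.delete(impute_median(X), target, axis=1)
    design = np.column_stack([np.ones(len(X)), predictors])  # add intercept
    # Fit only on rows where the target column was actually observed.
    coef, *_ = np.linalg.lstsq(design[~missing], X[~missing, target], rcond=None)
    filled = X.copy()
    filled[missing, target] = design[missing] @ coef
    return filled

# Toy data (made up): the second column is exactly twice the first.
toy = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, np.nan], [4.0, 8.0]])
median_filled = impute_median(toy)
regression_filled = impute_by_regression(toy, target=1)
```

Both approaches give a single filled-in value per blank, so a regression run on the filled data treats imputed values as if they had been observed and will tend to understate uncertainty.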

  2. #2
    kiton

    If fewer than 5-10% of your values are missing, you are good to go. However, what is your sample size: 45 observations? That is pretty low as it is. The rule of thumb is 10 observations per regressor. Or do you have 45 separate samples? If so, how many observations are in each?
    Cases with missing values (if they make up less than 10%) are dropped in regression analysis, assuming the listwise approach.
    If more than that is missing, you can look at multiple imputation or SEM with full information maximum likelihood (FIML).
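
A rough way to see where a dataset falls relative to that threshold is to count missing cells and complete cases before fitting anything. The sketch below assumes NaN-coded missing values in a NumPy array, and the names `missingness_summary`, `listwise_delete`, and `pool_rubin` are hypothetical. Note that in the original post every one of the 45 samples is missing at least one value, so listwise deletion over all variables at once would leave no complete cases. The last function shows only the pooling step of multiple imputation (Rubin's rules) for one coefficient. Producing the imputed datasets themselves is left out.

```python
import numpy as np

def missingness_summary(X):
    """Report how much is missing by cell and by row."""
    X = np.asarray(X, dtype=float)
    nan_mask = np.isnan(X)
    incomplete = nan_mask.any(axis=1)  # rows with at least one missing value
    return {
        "fraction_missing_cells": float(nan_mask.mean()),
        "fraction_incomplete_rows": float(incomplete.mean()),
        "complete_rows": int((~incomplete).sum()),
    }

def listwise_delete(X, y):
    """Keep only the cases with no missing predictor and no missing response."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = ~(np.isnan(X).any(axis=1) | np.isnan(y))
    return X[keep], y[keep]

def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    `estimates` are the m point estimates and `variances` their squared
    standard errors. Returns (pooled estimate, total variance).
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    within = variances.mean()        # average within-imputation variance
    between = estimates.var(ddof=1)  # variance between imputations
    total = within + (1 + 1 / m) * between
    return float(estimates.mean()), float(total)

# Toy data (made up): one missing cell in a 4 x 2 predictor matrix.
X_toy = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0], [7.0, 8.0]])
y_toy = np.array([1.0, 2.0, 3.0, 4.0])
summary = missingness_summary(X_toy)
X_complete, y_complete = listwise_delete(X_toy, y_toy)
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```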

    Also, to address multicollinearity: have you checked the pairwise correlations between the variables? Are any above 0.5?
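
The correlation check can be sketched as follows. Because the data in question have blanks, this version computes each correlation on the rows where both variables are observed (pairwise-complete), which is one reasonable choice among several. The function name `high_correlations` and the `min_pairs` guard are hypothetical, and the 0.5 default simply mirrors the cutoff mentioned in this reply.

```python
import numpy as np

def high_correlations(X, threshold=0.5, min_pairs=3):
    """List column pairs whose absolute Pearson correlation exceeds `threshold`.

    Each correlation uses only the rows where both columns are observed.
    Pairs with fewer than `min_pairs` such rows are skipped.
    """
    X = np.asarray(X, dtype=float)
    flagged = []
    n_cols = X.shape[1]
    for i in range(n_cols):
        for j in range(i + 1, n_cols):
            both = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            if both.sum() < min_pairs:
                continue
            r = np.corrcoef(X[both, i], X[both, j])[0, 1]
            if abs(r) > threshold:
                flagged.append((i, j, float(r)))
    return flagged

# Toy data (made up): column 1 is exactly twice column 0.
corr_toy = np.column_stack([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, -1.0, 1.0, -1.0],
])
flagged_pairs = high_correlations(corr_toy)
```

Pairwise correlations only catch two-variable relationships. A variable that is nearly a linear combination of several others can still cause multicollinearity without any single pair standing out.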

  3. #3
    hlsmith


    A question: do you know the mechanism causing the missing data? Are the values missing purely at random, or is there a systematic reason?
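
When the mechanism is not known from how the data were collected, one informal check is to see whether the other variables look different in the rows where a given variable is blank. The sketch below is only a rough diagnostic under the assumption of NaN-coded missing values, and the name `mean_shift_when_missing` is hypothetical. A large shift suggests the blanks are related to observed variables, but a small shift does not prove they are random, and no check on the observed data can rule out missingness that depends on the unobserved values themselves.

```python
import numpy as np

def mean_shift_when_missing(X, target):
    """Compare each other column's mean between rows where `target` is
    missing and rows where it is observed.

    Returns {column index: mean when missing minus mean when observed}.
    Columns with no usable values in either group are skipped.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X[:, target])
    shifts = {}
    for k in range(X.shape[1]):
        if k == target:
            continue
        when_missing = X[missing, k]
        when_observed = X[~missing, k]
        # np.all is True for an empty group, so empty groups are skipped too.
        if np.all(np.isnan(when_missing)) or np.all(np.isnan(when_observed)):
            continue
        shifts[k] = float(np.nanmean(when_missing) - np.nanmean(when_observed))
    return shifts

# Toy data (made up): column 1 is blank exactly where column 0 is large.
mech_toy = np.array([
    [1.0, 5.0], [2.0, 6.0], [3.0, 7.0],
    [10.0, np.nan], [11.0, np.nan], [12.0, np.nan],
])
shifts = mean_shift_when_missing(mech_toy, target=1)
```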
