How to Handle Blank Fields in Multiple Linear Regression

I am building a regression with 45 samples and 20+ independent variables. To help avoid multicollinearity, I am randomly selecting subsets of the variables and running many combinations of regressions. My main concern, however, is missing data: each of the 45 samples is missing at least one independent-variable value. Ideally I would gather the missing data, but that is not feasible in this situation. Instead, I am looking for common ways of handling missing data in a regression analysis that will weaken the analysis as little as possible. One idea is to take the median of each independent variable across the samples and fill every missing value of that variable with it. Another idea is to use all the other independent variables to 'predict' an appropriate value for each variable with missing data, and use that prediction equation to populate the missing fields. What other methods are commonly used for a regression analysis with blank fields in the dataset?
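
For concreteness, the two ideas described here (median fill and predicting a variable from the others) could be sketched roughly as follows. This is only an illustration on made-up data: the array `X_missing`, the helper names `impute_median` and `impute_regression`, and the 10% blank rate are all assumptions, not part of the actual dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 45 samples, 4 predictors, with some values blanked out.
X = rng.normal(size=(45, 4))
X[:, 1] += 0.8 * X[:, 0]  # make column 1 depend on column 0
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.1] = np.nan


def impute_median(data):
    """Fill each column's blanks with the median of that column's observed values."""
    filled = data.copy()
    medians = np.nanmedian(filled, axis=0)
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = medians[cols]
    return filled


def impute_regression(data, target):
    """Fill blanks in one column using a least-squares fit on the other columns.

    The other columns are median-filled first so that every row can be used.
    """
    filled = data.copy()
    others = impute_median(np.delete(data, target, axis=1))
    design = np.column_stack([np.ones(len(others)), others])  # add intercept
    observed = ~np.isnan(data[:, target])
    coef, *_ = np.linalg.lstsq(design[observed], data[observed, target], rcond=None)
    filled[~observed, target] = design[~observed] @ coef
    return filled


X_median = impute_median(X_missing)              # idea 1: median fill, all columns
X_reg = impute_regression(X_missing, target=1)   # idea 2: predict column 1 only
```

Both versions fill each blank with a single fixed value, so a regression run on the filled data treats guessed values as if they had been observed and will tend to understate the uncertainty in its estimates.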

Thank you for any help you can provide!


New Member
If fewer than 5-10% of your values are missing, you are good to go. But what is your sample size: 45 observations? That is pretty low as it is. The rule of thumb is 10 observations per regressor. Or do you have 45 separate samples? If so, how many observations are there in each?
Regression analysis drops cases with missing values (assuming the listwise approach), which is acceptable when they make up less than 10% of the data.
If you have more than that, you can look at multiple imputation or SEM with full information maximum likelihood (FIML).
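
To see how the threshold above plays out, it may help to count both the share of blank cells and the share of cases that listwise deletion would drop, since the two can differ a lot. A minimal sketch, assuming a made-up 45 x 20 array in which every row has exactly one blank (as the question describes); `listwise_summary` is a hypothetical helper name:

```python
import numpy as np


def listwise_summary(data):
    """Return (complete-row count, fraction of rows dropped, fraction of cells missing)."""
    missing = np.isnan(data)
    complete_rows = int((~missing.any(axis=1)).sum())
    dropped_fraction = 1.0 - complete_rows / data.shape[0]
    return complete_rows, dropped_fraction, float(missing.mean())


# Hypothetical dataset: 45 rows, 20 columns, one blank placed in each row.
rng = np.random.default_rng(1)
data = rng.normal(size=(45, 20))
data[np.arange(45), rng.integers(0, 20, size=45)] = np.nan

complete, dropped, cell_rate = listwise_summary(data)
print(complete, dropped, cell_rate)
```

In this made-up case only 5% of the cells are blank, yet no row is complete, so listwise deletion would leave nothing to fit.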

Also, to address multicollinearity: have you checked the pairwise correlations between the variables? Are any above 0.5?
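
One rough way to run that check when the data has blanks is to compute each pairwise correlation on the rows where both variables are observed. The sketch below assumes made-up data; `high_correlations` is a hypothetical helper, and the 0.5 cutoff is simply the figure mentioned above.

```python
import numpy as np


def high_correlations(data, threshold=0.5):
    """List (i, j, r) for column pairs whose |Pearson r| exceeds the threshold.

    Each pair uses only the rows where both columns are observed.
    """
    flagged = []
    n_cols = data.shape[1]
    for i in range(n_cols):
        for j in range(i + 1, n_cols):
            both = ~np.isnan(data[:, i]) & ~np.isnan(data[:, j])
            if both.sum() < 3:
                continue  # too few shared rows to estimate a correlation
            r = np.corrcoef(data[both, i], data[both, j])[0, 1]
            if abs(r) > threshold:
                flagged.append((i, j, float(r)))
    return flagged


# Hypothetical example: column 1 is built from column 0, column 2 is independent.
rng = np.random.default_rng(2)
x0 = rng.normal(size=45)
data = np.column_stack([x0, x0 + 0.3 * rng.normal(size=45), rng.normal(size=45)])
data[::9, 1] = np.nan  # blank out a few entries in column 1

pairs = high_correlations(data)
print(pairs)
```

Pairwise correlations only catch two-variable relationships; a variable that is nearly a combination of several others can still slip through this check.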


Omega Contributor
Question: do you know the mechanism causing the missing data? Are the values just randomly missing, or is there a systematic reason?