A method to estimate linear regression coefficients when some data is missing

Is there a method to estimate linear regression coefficients when some data is missing, without deleting the records that have missing values? I know that this publication (Robins, 1994) deals with it. Is there software that implements this? (Excuse my English.)

Re: A method to estimate linear regression coefficients when some data is missing

I think the default setting for most packages is to estimate the coefficients after removing the cases with missing values (listwise deletion).
If you want to keep those observations, then you can use some sort of imputation method.

For example, in R:

Code:

set.seed(58383)
test <- data.frame(group = c(rep("A", 5), rep("B", 5)),
                   measure = c(rnorm(8), NA, NA))
> test
   group     measure
1      A  1.19098463
2      A  1.00703099
3      A -0.07903821
4      A -1.01747050
5      A  1.15636110
6      B  0.57750629
7      B -0.53048394
8      B -0.83861051
9      B          NA
10     B          NA
#So, we have two missing values
#Regular Linear Model
> model1 <- lm(measure ~ group, data = test)
> summary(model1)

Call:
lm(formula = measure ~ group, data = test)

Residuals:
    Min      1Q  Median      3Q     Max
-1.4690 -0.5416  0.1444  0.7134  0.8414

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.4516     0.4042   1.117    0.307
groupB       -0.7154     0.6600  -1.084    0.320

Residual standard error: 0.9038 on 6 degrees of freedom
  (2 observations deleted due to missingness)   ## R removes those 2 observations
Multiple R-squared: 0.1638,  Adjusted R-squared: 0.02438
F-statistic: 1.175 on 1 and 6 DF,  p-value: 0.32
# Running a random imputation in R: each NA is replaced by a value drawn
# at random (with replacement) from the observed values of the same variable
random.imp <- function(a) {
  missing <- is.na(a)
  n.missing <- sum(missing)
  a.obs <- a[!missing]
  imputed <- a
  imputed[missing] <- sample(a.obs, n.missing, replace = TRUE)
  return(imputed)
}
test$measure.imp <- random.imp(test$measure)
# Source code for imputing from: http://lane.compbio.cmu.edu/courses/gelmanmissing.pdf
## New dataset with imputed values
   group     measure measure.imp
1      A  1.19098463  1.19098463
2      A  1.00703099  1.00703099
3      A -0.07903821 -0.07903821
4      A -1.01747050 -1.01747050
5      A  1.15636110  1.15636110
6      B  0.57750629  0.57750629
7      B -0.53048394 -0.53048394
8      B -0.83861051 -0.83861051
9      B          NA  1.19098463
10     B          NA -0.53048394
# Now fit the model on the imputed data
> model2 <- lm(measure.imp ~ group, data = test)
> model2

Call:
lm(formula = measure.imp ~ group, data = test)

Coefficients:
(Intercept)       groupB
     0.4516      -0.4778

Code:

# Estimates using the regular model (listwise deletion)
Call:
lm(formula = measure ~ group, data = test)

Coefficients:
(Intercept)       groupB
     0.4516      -0.7154

# Estimates with those observations kept in (using a random imputation)
Call:
lm(formula = measure.imp ~ group, data = test)

Coefficients:
(Intercept)       groupB
     0.4516      -0.4778
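
Note that the imputed estimate depends on the particular random draw: rerunning random.imp fills the NAs differently and gives a different groupB coefficient each time. A quick way to see this (just an illustration, not output from above):

Code:

# Each pass fills the NAs with a fresh random draw, so the groupB
# estimate changes from run to run
for (i in 1:3) {
  test$measure.imp <- random.imp(test$measure)
  print(coef(lm(measure.imp ~ group, data = test)))
}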

However, imputation methods have their own fair share of criticism, so you have to be careful when choosing which method you want to use for imputation.
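
If you want something more defensible than a single random draw, multiple imputation is one option. Here is a minimal sketch with the mice package (assuming you have it installed; the call below just uses the package defaults, predictive mean matching for a numeric variable, and pools the fits with Rubin's rules):

Code:

library(mice)

# Build m = 5 completed datasets; keep only the original columns so the
# earlier single-imputation column is not used as a predictor
imp <- mice(test[, c("group", "measure")], m = 5, printFlag = FALSE)

# Fit the regression on each completed dataset
fits <- with(imp, lm(measure ~ group))

# Pool the m sets of estimates; the standard errors now reflect
# the extra uncertainty from the missing values
summary(pool(fits))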

Not sure if this is what you meant, or whether I'm going off on a tangent from your question.

