A method to estimate linear regression coefficients when some data is missing

#1
Is there a method to estimate linear regression coefficients when some data are missing, without deleting the records that have missing values? I know that this publication (Robins, 1994) deals with it. Is there software that implements the procedure? (Excuse my English.)

thanks for your help,
Michael
 

ledzep

#2
I think the default setting in most packages is to estimate the coefficients after removing the incomplete cases (listwise deletion).
If you want to keep those observations, you can use some sort of imputation method.


For example in R.

Code:
> set.seed(58383)
> test <- data.frame(group=c(rep("A",5), rep("B",5)), measure=c(rnorm(8), NA, NA))
> test
   group     measure
1      A  1.19098463
2      A  1.00703099
3      A -0.07903821
4      A -1.01747050
5      A  1.15636110
6      B  0.57750629
7      B -0.53048394
8      B -0.83861051
9      B          NA
10     B          NA

#So, we have two missing values

#Regular Linear Model

> model1<-lm(measure~group, data=test)
> summary(model1)

Call:
lm(formula = measure ~ group, data = test)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.4690 -0.5416  0.1444  0.7134  0.8414 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.4516     0.4042   1.117    0.307
groupB       -0.7154     0.6600  -1.084    0.320

Residual standard error: 0.9038 on 6 degrees of freedom
  (2 observations deleted due to missingness)                   ##R removes those 2 observations
Multiple R-squared: 0.1638,     Adjusted R-squared: 0.02438 
F-statistic: 1.175 on 1 and 6 DF,  p-value: 0.32 


# Random imputation in R: replace each NA with a value drawn at random
# (with replacement) from the observed values of the same variable
random.imp <- function(a) {
  missing <- is.na(a)
  n.missing <- sum(missing)
  a.obs <- a[!missing]
  imputed <- a
  imputed[missing] <- sample(a.obs, n.missing, replace = TRUE)
  return(imputed)
}

test$measure.imp <- random.imp(test$measure)

#source code for imputing from: http://lane.compbio.cmu.edu/courses/gelmanmissing.pdf


##New dataset with imputed values
  group     measure measure.imp
1      A  1.19098463  1.19098463
2      A  1.00703099  1.00703099
3      A -0.07903821 -0.07903821
4      A -1.01747050 -1.01747050
5      A  1.15636110  1.15636110
6      B  0.57750629  0.57750629
7      B -0.53048394 -0.53048394
8      B -0.83861051 -0.83861051
9      B          NA  1.19098463
10     B          NA -0.53048394


#Now fit the model on the imputed data

> model2 <- lm(measure.imp ~ group, data=test)
> model2

Call:
lm(formula = measure.imp ~ group, data = test)

Coefficients:
(Intercept)       groupB  
     0.4516      -0.4778

Code:
# Estimates using regular model

Call:
lm(formula = measure ~ group, data = test)

Coefficients:
(Intercept)       groupB  
     0.4516      -0.7154  

# Estimates with those observations kept in (using random imputation).
# Note the intercept (the group A mean) is unchanged, since both missing
# values were in group B; only the groupB estimate shifts.
Call:
lm(formula = measure.imp ~ group, data = test)

Coefficients:
(Intercept)       groupB  
     0.4516      -0.4778
However, imputation methods have their own fair share of criticism, so you have to be careful when choosing which method to use. A single random imputation like the one above also understates the uncertainty due to the missing values, since the imputed values are treated as if they were observed.
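If you want the standard errors to reflect the imputation uncertainty, multiple imputation is the usual fix: impute several complete datasets, fit the model on each, and pool the results. A minimal sketch with the `mice` package (assuming it is installed; it is on CRAN), recreating the same toy data:

```r
# Multiple imputation with mice: m = 5 imputed datasets,
# one lm fit per dataset, estimates pooled across fits
library(mice)

set.seed(58383)
test <- data.frame(group = factor(c(rep("A", 5), rep("B", 5))),
                   measure = c(rnorm(8), NA, NA))

imp <- mice(test, m = 5, printFlag = FALSE, seed = 1)  # impute 5 datasets
fit <- with(imp, lm(measure ~ group))                  # fit lm on each
summary(pool(fit))                                     # pooled estimates
```

The pooled standard errors combine the within-imputation and between-imputation variance, so they are honest about the two missing values in a way the single random imputation is not.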

Not sure if this is what you meant, or whether I'm going off on a tangent from your question.
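On the Robins (1994) angle specifically: that line of work is about inverse probability weighting (IPW), where you keep only the complete cases but re-weight them by the inverse of their estimated probability of being observed. A rough sketch in R on the same toy data (my own illustration, not code from the paper):

```r
set.seed(58383)
test <- data.frame(group = factor(c(rep("A", 5), rep("B", 5))),
                   measure = c(rnorm(8), NA, NA))

# Indicator for whether the response was observed
test$obs <- as.numeric(!is.na(test$measure))

# Estimated probability of being observed: here simply the
# observed fraction within each group
p.obs <- ave(test$obs, test$group)

# Weight each case by 1 / P(observed); lm drops the NA rows itself
test$w <- 1 / p.obs
model.ipw <- lm(measure ~ group, data = test, weights = w)
coef(model.ipw)
```

With only the group covariate, the weights are constant within each group, so the estimates coincide with the complete-case fit; the reweighting starts to matter once the missingness model uses additional covariates.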