# Thread: A method to estimate linear regression coefficients when some data is missing

1. ## A method to estimate linear regression coefficients when some data is missing

Is there a method to estimate linear regression coefficients when some data are missing, without deleting the records that have missing values? I know that this publication (Robins, 1994) deals with it. Is there software that performs this process? (Excuse my English)

Michael

2. ## Re: A method to estimate linear regression coefficients when some data is missing

I think the default in most packages is to estimate the coefficients after removing the cases with missing values (listwise deletion).
If you want to keep those observations, then I think you can use some sort of imputation method.

For example, in R:

Code:
``````set.seed(58383)
test<-data.frame(group=c(rep("A",5),rep("B",5)), measure=c(rnorm(8),NA,NA))
>test
group     measure
1      A  1.19098463
2      A  1.00703099
3      A -0.07903821
4      A -1.01747050
5      A  1.15636110
6      B  0.57750629
7      B -0.53048394
8      B -0.83861051
9      B          NA
10     B          NA

#So, we have two missing values

#Regular Linear Model

> model1<-lm(measure~group, data=test)
> summary(model1)

Call:
lm(formula = measure ~ group, data = test)

Residuals:
Min      1Q  Median      3Q     Max
-1.4690 -0.5416  0.1444  0.7134  0.8414

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.4516     0.4042   1.117    0.307
groupB       -0.7154     0.6600  -1.084    0.320

Residual standard error: 0.9038 on 6 degrees of freedom
(2 observations deleted due to missingness)                   ##R removes those 2 observations
Multiple R-squared: 0.1638,     Adjusted R-squared: 0.02438
F-statistic: 1.175 on 1 and 6 DF,  p-value: 0.32

# Running a random imputation in R

#code to run random imputation
random.imp <- function(a) {
  missing <- is.na(a)
  n.missing <- sum(missing)
  a.obs <- a[!missing]
  imputed <- a
  imputed[missing] <- sample(a.obs, n.missing, replace = TRUE)
  return(imputed)
}

test$measure.imp <- random.imp(test$measure)

#source code for imputing from: http://lane.compbio.cmu.edu/courses/gelmanmissing.pdf

##New dataset with imputed values
group     measure measure.imp
1      A  1.19098463  1.19098463
2      A  1.00703099  1.00703099
3      A -0.07903821 -0.07903821
4      A -1.01747050 -1.01747050
5      A  1.15636110  1.15636110
6      B  0.57750629  0.57750629
7      B -0.53048394 -0.53048394
8      B -0.83861051 -0.83861051
9      B          NA  1.19098463
10     B          NA -0.53048394

> model2 <- lm(measure.imp~group, data=test)
> model2

Call:
lm(formula = measure.imp ~ group, data = test)

Coefficients:
(Intercept)       groupB
0.4516      -0.4778``````

Code:
``````
# Estimates using regular model

Call:
lm(formula = measure ~ group, data = test)

Coefficients:
(Intercept)       groupB
0.4516      -0.7154

# Estimates with those observations kept in (using a random imputation)
Call:
lm(formula = measure.imp ~ group, data = test)

Coefficients:
(Intercept)       groupB
0.4516      -0.4778``````
However, imputation methods have their fair share of criticism, so you have to be careful when choosing which method to use for imputation.

Not sure if this is what you meant? Or am I going off on a tangent with your question.
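One specific criticism of the single random imputation above: it treats the imputed values as if they had been observed, so the standard errors come out too small. Multiple imputation addresses this by imputing several times, fitting the model to each completed dataset, and pooling the estimates with Rubin's rules. Here is a minimal sketch of that idea in Python/NumPy (the thread's code is R; the data values are rounded copies of the example above, and all the variable and function names here are my own, not from any package):

```python
import numpy as np

rng = np.random.default_rng(58383)

# Toy data mirroring the R example: two groups, last two "B" measures missing.
measure = np.array([1.19, 1.01, -0.08, -1.02, 1.16,
                    0.58, -0.53, -0.84, np.nan, np.nan])
group_b = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)  # group B dummy

def fit_slope(y, x):
    """OLS slope and its sampling variance for a one-predictor regression."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)
    var_beta = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], var_beta[1, 1]

# Multiple imputation: impute M times from the observed values,
# fit each completed dataset, then pool with Rubin's rules.
M = 50
obs = measure[~np.isnan(measure)]
n_miss = int(np.isnan(measure).sum())
slopes, variances = [], []
for _ in range(M):
    filled = measure.copy()
    filled[np.isnan(filled)] = rng.choice(obs, size=n_miss, replace=True)
    b, v = fit_slope(filled, group_b)
    slopes.append(b)
    variances.append(v)

slopes = np.array(slopes)
pooled_slope = slopes.mean()
within = np.mean(variances)                 # average within-imputation variance
between = slopes.var(ddof=1)                # between-imputation variance
total_var = within + (1 + 1 / M) * between  # Rubin's pooled variance
print(pooled_slope, np.sqrt(total_var))
```

The pooled standard error is larger than any single completed-data standard error, because the between-imputation term carries the extra uncertainty from not knowing the missing values.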

3. ## Re: A method to estimate linear regression coefficients when some data is missing

You should look into EM (expectation-maximization) algorithms. They try to work around the missing-data problem.
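To make the EM idea concrete: instead of imputing values once, EM alternates between computing the *expected* sufficient statistics given the current parameter estimates (E-step) and re-estimating the parameters from those expectations (M-step). Below is a minimal sketch in Python/NumPy (the thread's code is R) for a bivariate normal model of (x, y) where some predictor values x are missing; the simulated data, names, and model choice are my own illustration, not from the thread:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate bivariate data, then hide some predictor values.
n = 200
x_full = rng.normal(0.0, 1.0, n)
y = 2.0 + 1.5 * x_full + rng.normal(0.0, 0.5, n)
x = x_full.copy()
x[:40] = np.nan  # first 40 predictor values are missing
miss = np.isnan(x)

# EM for a bivariate normal (x, y) with missing x entries.
mu = np.array([np.nanmean(x), y.mean()])
Sigma = np.eye(2)

for _ in range(200):
    # E-step: expected sufficient statistics for the missing x values,
    # using the conditional distribution x | y under the current params.
    ex = x.copy()
    ex2 = x ** 2
    b_xy = Sigma[0, 1] / Sigma[1, 1]
    cond_var = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]
    cond_mean = mu[0] + b_xy * (y[miss] - mu[1])
    ex[miss] = cond_mean
    ex2[miss] = cond_mean ** 2 + cond_var
    # M-step: re-estimate mean and covariance from expected statistics.
    mu = np.array([ex.mean(), y.mean()])
    cov_xy = np.mean(ex * y) - mu[0] * mu[1]
    Sigma = np.array([
        [ex2.mean() - mu[0] ** 2, cov_xy],
        [cov_xy, np.mean(y ** 2) - mu[1] ** 2],
    ])

# Regression of y on x implied by the fitted joint distribution.
slope = Sigma[0, 1] / Sigma[0, 0]
intercept = mu[1] - slope * mu[0]
print(slope, intercept)
```

Unlike single imputation, the E-step also carries the conditional *variance* of the missing values (the `cond_var` term), so the covariance estimate is not artificially shrunk. The recovered slope and intercept land close to the true values (1.5 and 2.0) used in the simulation.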

