
Thread: A method to estimate linear regression coefficients when some data is missing

  1. #1

    A method to estimate linear regression coefficients when some data is missing




    Is there a method to estimate linear regression coefficients when some data is missing, without deleting the records that contain missing values? I know that this publication (Robins, 1994) deals with it. Is there software that implements the procedure? (Excuse my English.)

    thanks for your help,
    Michael

  2. #2
    ledzep

    Re: A method to estimate linear regression coefficients when some data is missing

    I think the default setting in most packages is to estimate the coefficients after removing the cases with missing values (complete-case analysis).
    If you want to keep those observations, then you can use some sort of imputation method.


    For example, in R:

    Code: 
    set.seed(58383)
    test<-data.frame(group=c(rep("A",5),rep("B",5)), measure=c(rnorm(8),NA,NA))
    > test
       group     measure
    1      A  1.19098463
    2      A  1.00703099
    3      A -0.07903821
    4      A -1.01747050
    5      A  1.15636110
    6      B  0.57750629
    7      B -0.53048394
    8      B -0.83861051
    9      B          NA
    10     B          NA
    
    #So, we have two missing values
    
    #Regular Linear Model
    
    > model1<-lm(measure~group, data=test)
    > summary(model1)
    
    Call:
    lm(formula = measure ~ group, data = test)
    
    Residuals:
        Min      1Q  Median      3Q     Max 
    -1.4690 -0.5416  0.1444  0.7134  0.8414 
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   0.4516     0.4042   1.117    0.307
    groupB       -0.7154     0.6600  -1.084    0.320
    
    Residual standard error: 0.9038 on 6 degrees of freedom
      (2 observations deleted due to missingness)                   ##R removes those 2 observations
    Multiple R-squared: 0.1638,     Adjusted R-squared: 0.02438 
    F-statistic: 1.175 on 1 and 6 DF,  p-value: 0.32 
    
    
    # Running a random imputation in R 
    
    #code to run random imputation
    random.imp <- function (a){
     missing <- is.na(a)
     n.missing <- sum(missing)
     a.obs <- a[!missing]
     imputed <- a
     imputed[missing] <- sample (a.obs, n.missing, replace=TRUE)
     return (imputed)
    }
    
    test$measure.imp <- random.imp (test$measure)
    
    #source code for imputing from: http://lane.compbio.cmu.edu/courses/gelmanmissing.pdf
    
    
    ##New dataset with imputed values
      group     measure measure.imp
    1      A  1.19098463  1.19098463
    2      A  1.00703099  1.00703099
    3      A -0.07903821 -0.07903821
    4      A -1.01747050 -1.01747050
    5      A  1.15636110  1.15636110
    6      B  0.57750629  0.57750629
    7      B -0.53048394 -0.53048394
    8      B -0.83861051 -0.83861051
    9      B          NA  1.19098463
    10     B          NA -0.53048394
    
    
    #Now fit the model on the imputed data
    
    > model2 <- lm(measure.imp ~ group, data=test)
    > model2
    
    Call:
    lm(formula = measure.imp ~ group, data = test)
    
    Coefficients:
    (Intercept)       groupB  
         0.4516      -0.4778

    Code: 
    
    # Estimates using regular model
    
    Call:
    lm(formula = measure ~ group, data = test)
    
    Coefficients:
    (Intercept)       groupB  
         0.4516      -0.7154  
    
    # Estimates with those observations kept in (using a random impuation)
    Call:
    lm(formula = measure.imp ~ group, data = test)
    
    Coefficients:
    (Intercept)       groupB  
         0.4516      -0.4778
    However, imputation methods have their own fair share of criticism, so you have to be careful when choosing which imputation method to use.
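
    As an illustration of a less noisy alternative (a sketch of my own; `group.mean.imp` is a made-up helper, not a package function), you could impute each missing value with the mean of the observed values in its group:

    ```r
    # Group-mean imputation: replace each NA with the mean of the
    # observed values in the same group.
    set.seed(58383)
    test <- data.frame(group = c(rep("A", 5), rep("B", 5)),
                       measure = c(rnorm(8), NA, NA))

    group.mean.imp <- function(x, g) {
      imputed <- x
      for (lev in unique(g)) {
        idx <- g == lev
        imputed[idx & is.na(x)] <- mean(x[idx], na.rm = TRUE)
      }
      imputed
    }

    test$measure.gm <- group.mean.imp(test$measure, test$group)
    coef(lm(measure.gm ~ group, data = test))
    ```

    Because each imputed value equals its group mean, the coefficient estimates here match the complete-case fit exactly; what changes is the (now overstated) apparent precision, which is exactly the kind of pitfall the criticism is about.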

    Not sure if this is what you meant, or whether I'm going off on a tangent from your question.
    Last edited by ledzep; 12-23-2011 at 10:55 AM.
    Oh Thou Perelman! Poincare's was for you and Riemann's is for me.

  3. #3
    Link

    Re: A method to estimate linear regression coefficients when some data is missing


    You should look into EM algorithms. They try to work around the missing-data problem by iterating between filling in the missing values and re-estimating the model.
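
    A minimal sketch of that idea for the toy example above (the `em.lm` helper is hypothetical, not a standard function): alternate between filling the missing responses with their current fitted values (E-step) and refitting the regression (M-step).

    ```r
    # EM-style loop for a linear model with missing responses:
    # E-step: replace missing y with the current fitted values
    # M-step: refit the regression on the completed data
    set.seed(58383)
    test <- data.frame(group = c(rep("A", 5), rep("B", 5)),
                       measure = c(rnorm(8), NA, NA))

    em.lm <- function(formula, data, yname, iters = 20) {
      miss <- is.na(data[[yname]])
      # crude starting value: the grand mean of the observed responses
      data[[yname]][miss] <- mean(data[[yname]], na.rm = TRUE)
      for (i in seq_len(iters)) {
        fit <- lm(formula, data = data)          # M-step
        data[[yname]][miss] <- fitted(fit)[miss] # E-step
      }
      fit
    }

    coef(em.lm(measure ~ group, test, "measure"))
    ```

    With missingness only in the response, this converges to the complete-case estimates; EM earns its keep when covariates are missing or the model is multivariate.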
