Results 1 to 9 of 9

Thread: Need help with Multiple Imputations ("pooled" dataset)

  1. #1
    Points: 4,664, Level: 43
    Level completed: 57%, Points required for next Level: 86
    kiton's Avatar
    Location
    Corn field
    Posts
    234
    Thanks
    47
    Thanked 51 Times in 46 Posts

    Need help with Multiple Imputations ("pooled" dataset)




    Hello everybody,

    My original data set (n=901) contains missing values (10-20% per variable). I am applying the multiple imputation technique using SPSS (I also tried Stata and LISREL). SPSS, for example, creates a new data set containing all of the imputed data sets (say 5 imputations -> 5 “stacked” data sets). SPSS can run analyses on this imputed data set (a “swirly” icon appears next to each supported analysis in the menu). As far as I understand, it uses “pooled” variances, standard errors, etc. from all these imputed data sets to run the analysis.

    As of today, I have not been able to find a way to export this so-called “pooled” data set from SPSS (or the other software I tried) so that I can use it in SmartPLS (the primary software I am using for my research project).

    Does anyone have any ideas or suggestions on how I can do this? I would truly appreciate your time and response.

    Thank you!

    P.S. Dear moderators, I apologize if I posted this in the wrong section.

  2. #2
    TS Contributor
    Points: 22,389, Level: 93
    Level completed: 4%, Points required for next Level: 961
    spunky's Avatar
    Location
    vancouver, canada
    Posts
    2,135
    Thanks
    166
    Thanked 537 Times in 431 Posts

    Re: Need help with Multiple Imputations ("pooled" dataset)

    Quote Originally Posted by kiton View Post
    As of today, I have not been able to find a way to export this so-called “pooled” data set from SPSS (or the other software I tried) so that I can use it in SmartPLS (the primary software I am using for my research project).
    uhm... i think there's a small mistake in your understanding of how multiple imputation works. when people talk about pooled parameter estimates they don't mean that the multiply-imputed datasets are somehow "pooled" and the analysis is then run on that one dataset. multiple imputation generates several datasets, fits the model you specify to each of them, and then pools those parameter estimates (and their standard errors) according to Rubin's rules.

    so there is no pooled dataset, only pooled parameter estimates (and their uncertainties).
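
    For reference, Rubin's rules for a single scalar parameter are usually written as below. The symbols are notation introduced here, not taken from the thread: m is the number of imputations, \hat{Q}_i is the estimate from the i-th imputed dataset, and U_i is its squared standard error.

```latex
\begin{aligned}
\bar{Q} &= \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i
  && \text{(pooled estimate)}\\
\bar{U} &= \frac{1}{m}\sum_{i=1}^{m}U_i
  && \text{(within-imputation variance)}\\
B &= \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i-\bar{Q}\bigr)^2
  && \text{(between-imputation variance)}\\
T &= \bar{U} + \Bigl(1+\frac{1}{m}\Bigr)B
  && \text{(total variance, pooled SE} = \sqrt{T}\text{)}
\end{aligned}
```

    Note that nothing in these formulas is a dataset: only the m estimates and their variances are combined.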

    but you mentioned that you tried to use Stata and LISREL. why aren't you using the gllamm package for Stata, or LISREL, instead of smartPLS? both Stata and LISREL can handle missing data through multiple imputation or FIML (full-information maximum likelihood).

    the whole field of PLS as a substitute for traditional covariance modelling is... well, quite fishy to be honest. from what i've learnt about it, i see limited use in its methods and a lot of questionable assumptions.
    for all your psychometric needs! https://psychometroscar.wordpress.com/about/

  3. The Following User Says Thank You to spunky For This Useful Post:

    kiton (06-20-2014)

  4. #3
    kiton's Avatar

    Re: Need help with Multiple Imputations ("pooled" dataset)

    Quote Originally Posted by spunky View Post
    uhm... i think there's a small mistake in your understanding of how multiple imputation works. when people talk about pooled parameter estimates they don't mean that the multiply-imputed datasets are somehow "pooled" and the analysis is then run on that one dataset. multiple imputation generates several datasets, fits the model you specify to each of them, and then pools those parameter estimates (and their standard errors) according to Rubin's rules.

    but you mentioned that you tried to use Stata and LISREL. why aren't you using the gllamm package for Stata, or LISREL, instead of smartPLS? both Stata and LISREL can handle missing data through multiple imputation or FIML (full-information maximum likelihood).

    the whole field of PLS as a substitute for traditional covariance modelling is... well, quite fishy to be honest. from what i've learnt about it, i see limited use in its methods and a lot of questionable assumptions.
    1. Thank you very much for this clarification. It is clear to me now (yes, I was understanding it the wrong way).

    2. I have not heard of the gllamm package. I will surely explore it right away.
    - A question I have though: say I am using LISREL to impute data. I obtain a new dataset where the "total sample size = the original sample size * the number of imputations", right? How is LISREL going to treat it - like a dataset with the "total sample size"? I assume this is wrong, too much variation. Or am I misunderstanding it?

    3. I agree with you on SmartPLS. However, people do get published in top-tier journals using it. The reason I am using it is that some of my data is not normally distributed and there is nothing I can do to normalize it (FB, Twitter, and YouTube data).

    Once again, thank you for a comprehensive response.

  5. #4
    Phineas Packard
    Points: 16,013, Level: 81
    Level completed: 33%, Points required for next Level: 337
    Lazar's Avatar
    Location
    Sydney
    Posts
    1,159
    Thanks
    198
    Thanked 336 Times in 299 Posts

    Re: Need help with Multiple Imputations ("pooled" dataset)

    "I have done things to data. Dirty things. Things I am not proud of."

  6. The Following User Says Thank You to Lazar For This Useful Post:

    kiton (06-21-2014)

  7. #5
    spunky's Avatar

    Re: Need help with Multiple Imputations ("pooled" dataset)

    Quote Originally Posted by kiton View Post
    Or am I misunderstanding it?
    yes, you are still misunderstanding it. let's work with a very simple example.

    say you want to fit a simple regression Y = b0 + bX + e to some data set that has missing data. you decide to use multiple imputation to take care of this and use some software. let's imagine that you're requesting 3 imputed datasets.

    what the computer will do is:

    1. generate a complete dataset (first imputation)
    2. fit the Y = b0 + bX + e simple linear regression model to it and store the parameter estimates
    3. the old dataset gets thrown out, and the computer generates another dataset (second imputation)
    4. repeat steps #2 and #3 until you've gone through all the imputations.

    what the computer will then do is combine the stored parameter estimates in the particular ways described in Lazar's link. the original sample size remains the same. the magic happens in how you average the regression coefficients across imputations (and in how you combine their standard errors, since you need larger standard errors to account for the extra uncertainty that comes from having imputed the missing values).
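
    The steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not what SPSS, Stata or LISREL actually run: the data are simulated, the helper names (ols_slope, impute_x, pool_rubin) are made up for this example, and the imputation step is a simplified stochastic regression imputation that does not redraw the imputation-model parameters the way a proper multiple imputation routine would. The last few lines also fit the model to the "stacked" dataset of size n * m, to show why treating it as one big sample understates the standard error.

```python
import numpy as np


def ols_slope(x, y):
    """Fit y = b0 + b*x by OLS; return the slope and its squared standard error."""
    n = len(x)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], cov[1, 1]


def impute_x(x, y, rng):
    """Fill missing x with a regression prediction from y plus random noise.

    Simplification (assumption): the regression parameters are not redrawn.
    """
    miss = np.isnan(x)
    coefs = np.polyfit(y[~miss], x[~miss], 1)
    fitted = np.polyval(coefs, y)
    resid_sd = np.std(x[~miss] - fitted[~miss], ddof=2)
    filled = x.copy()
    filled[miss] = fitted[miss] + rng.normal(0.0, resid_sd, miss.sum())
    return filled


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and variances with Rubin's rules."""
    m = len(estimates)
    within = np.mean(variances)
    between = np.var(estimates, ddof=1)
    return np.mean(estimates), within + (1 + 1 / m) * between


rng = np.random.default_rng(0)
n, m = 901, 5                                  # sample size and number of imputations
x_full = rng.normal(size=n)
y = 1.0 + 2.0 * x_full + rng.normal(size=n)    # simulated data, true slope = 2
x_obs = x_full.copy()
x_obs[rng.random(n) < 0.15] = np.nan           # about 15% of X missing at random

# Steps 1-4: impute, fit, store, repeat. Each completed dataset still has n rows.
completed = [impute_x(x_obs, y, rng) for _ in range(m)]
fits = [ols_slope(x_imp, y) for x_imp in completed]
estimates = np.array([f[0] for f in fits])
variances = np.array([f[1] for f in fits])

# Pooling: only the stored estimates are combined, never the datasets.
pooled_slope, pooled_var = pool_rubin(estimates, variances)
pooled_se = np.sqrt(pooled_var)

# The mistaken approach: analyse the stacked data as if it had n * m cases.
_, stacked_var = ols_slope(np.concatenate(completed), np.tile(y, m))
stacked_se = np.sqrt(stacked_var)

print(f"pooled slope = {pooled_slope:.3f}, pooled SE = {pooled_se:.4f}")
print(f"stacked SE   = {stacked_se:.4f} (too small: it pretends n = {n * m})")
```

    The pooled standard error can never be smaller than the average within-imputation one, while the stacked analysis shrinks it by roughly the square root of m, which is exactly the overconfidence asked about above.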

    Quote Originally Posted by kiton View Post
    3. I agree with you on SmartPLS. However, people do get published in top-tier journals using it. The reason I am using it is that some of my data is not normally distributed and there is nothing I can do to normalize it (FB, Twitter, and YouTube data)
    yes, the PLS approach to covariance structure modelling is quite popular, particularly with people who work in information systems... and that's about it (maybe some marketing). it's quite unpopular in basically every other area of science (with the exception of the areas that use it the way its creator, Herman Wold, intended it to be used).

    the part where i get really angry is that smartPLS was designed for business-oriented people with little consideration for the statistical nuances involved in covariance modelling. when they tell you "oh, maximum-likelihood-based SEM cannot handle nonnormal data, you should use PLS", that is a lie. there's quite a bit of literature on robust estimators for pretty much every type of non-normality you can imagine, and they're so easy to use that you only need to switch from "estimator A" to "estimator B" for everything to be taken care of.
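
    As a rough illustration of the "switch the estimator" idea, here is a minimal numpy sketch. It uses plain regression rather than SEM (an assumption made to keep the example self-contained), with simulated skewed, heteroskedastic errors. The point estimates are identical under both options; only the standard errors change, from the classical formula to a heteroskedasticity-robust "sandwich" (HC0) formula. The function name ols_with_se and the data are hypothetical.

```python
import numpy as np


def ols_with_se(x, y):
    """Return OLS coefficients with classical and sandwich (HC0) standard errors."""
    n = len(x)
    X = np.column_stack([np.ones(n), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Classical covariance: assumes one constant error variance.
    classical_cov = (resid @ resid / (n - 2)) * XtX_inv
    # Sandwich covariance: lets every case carry its own squared residual.
    meat = X.T @ (X * resid[:, None] ** 2)
    robust_cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(classical_cov)), np.sqrt(np.diag(robust_cov))


rng = np.random.default_rng(1)
n = 2000
x = rng.exponential(size=n)
errors = (rng.exponential(size=n) - 1.0) * (0.5 + x)   # skewed, variance grows with x
y = 1.0 + 2.0 * x + errors

beta, se_classical, se_robust = ols_with_se(x, y)
print("slope:", round(beta[1], 3))
print("classical SE:", round(se_classical[1], 4))
print("robust SE:   ", round(se_robust[1], 4))
```

    In this simulated setup the classical standard error of the slope is too optimistic and the robust one is noticeably larger, while the slope itself does not move.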

    PLS, on the other hand, suffers from much more serious statistical deficiencies. you may not know this, but currently there exists no proof that the parameter estimates derived from PLS are efficient, unbiased or EVEN consistent. and this is very, very bad in Statistics because you're basically only estimating noise. but, of course, Chin & the people who work on PLS don't want others to know that because they want to sell you their software.

    maybe you could consider using the lavaan package in R? it's free and very much cutting-edge in the area of SEM. if you're worried about non-normal data, the only thing you need to do is add estimator = "MLR" to the lavaan call (or estimator = "MLM" if you want the Satorra-Bentler scaled chi-square), and that tells it to use an estimator that handles nonnormal data.

  8. The Following User Says Thank You to spunky For This Useful Post:

    kiton (06-21-2014)

  9. #6
    kiton's Avatar

    Re: Need help with Multiple Imputations ("pooled" dataset)

    Quote Originally Posted by Lazar View Post
    Thank you! Definitely a good read.

  10. #7
    kiton's Avatar

    Re: Need help with Multiple Imputations ("pooled" dataset)

    Quote Originally Posted by spunky View Post
    yes, you are still misunderstanding it. let's work with a very simple example.

    say you want to fit a simple regression Y = b0 + bX + e to some data set that has missing data. you decide to use multiple imputation to take care of this and use some software. let's imagine that you're requesting 3 imputed datasets.

    what the computer will do is:

    1. generate a complete dataset (first imputation)
    2. fit the Y = b0 + bX + e simple linear regression model to it and store the parameter estimates
    3. the old dataset gets thrown out, and the computer generates another dataset (second imputation)
    4. repeat steps #2 and #3 until you've gone through all the imputations.

    what the computer will then do is combine the stored parameter estimates in the particular ways described in Lazar's link. the original sample size remains the same. the magic happens in how you average the regression coefficients across imputations (and in how you combine their standard errors, since you need larger standard errors to account for the extra uncertainty that comes from having imputed the missing values).
    Ok, I get it now.

    Quote Originally Posted by spunky View Post
    yes, the PLS approach to covariance structure modelling is quite popular, particularly with people who work in information systems... and that's about it (maybe some marketing). it's quite unpopular in basically every other area of science (with the exception of the areas that use it the way its creator, Herman Wold, intended it to be used).
    You pointed out this exactly the way it is.

    Quote Originally Posted by spunky View Post
    the part where i get really angry is that smartPLS was designed for business-oriented people with little consideration for the statistical nuances involved in covariance modelling. when they tell you "oh, maximum-likelihood-based SEM cannot handle nonnormal data, you should use PLS", that is a lie. there's quite a bit of literature on robust estimators for pretty much every type of non-normality you can imagine, and they're so easy to use that you only need to switch from "estimator A" to "estimator B" for everything to be taken care of.
    Oh, yes - for business people who know very little of stats.

    Quote Originally Posted by spunky View Post
    maybe you could consider using the lavaan package in R? it's free and very much cutting-edge in the area of SEM. if you're worried about non-normal data, the only thing you need to do is add estimator = "MLR" to the lavaan call (or estimator = "MLM" if you want the Satorra-Bentler scaled chi-square), and that tells it to use an estimator that handles nonnormal data.
    Honestly, I feel that my major weakness right now is not knowing R. Not only have I not had time for it in the past couple of years, but I am also very reluctant to use "programming"-type software. Looks like I must make a push and get myself into an R course or something. Or hire someone who knows it ))

    THANK YOU for your feedback.

  11. #8
    spunky's Avatar

    Re: Need help with Multiple Imputations ("pooled" dataset)

    Quote Originally Posted by kiton View Post
    Or hire someone who knows it
    please do! i've been recently doing research on unemployment and recent college graduates and the statistics are just quite daunting.

    i'm sure that some young programmer will be grateful for it!

  12. The Following User Says Thank You to spunky For This Useful Post:

    kiton (06-23-2014)

  13. #9
    kiton's Avatar

    Re: Need help with Multiple Imputations ("pooled" dataset)


    Quote Originally Posted by spunky View Post

    i'm sure that some young programmer will be grateful for it!
    Just in case you come across someone, let me know, please.
