# Thread: How to obtain individual factor scores after PCAs on multiple imputed datasets???

1. ## How to obtain individual factor scores after PCAs on multiple imputed datasets???

Hello,

I really hope someone here can help me. I've searched the internet for hours and could not find any clue on how to solve my problem.
I want to do a principal component analysis (PCA) on my dataset to combine my variables into a few components. The main analysis, however, is not the PCA but a logistic regression with the PCA components as predictors. This means I need to compute the individual factor scores before I can proceed with the logistic regression. The problem is that there are a lot of missing values in my dataset, so I cannot simply use listwise deletion, because this would reduce my N too much. Therefore, I have to use an imputation method to handle my missing data. My method of choice is multiple imputation (MI), and I've already generated 5 imputed datasets. BUT WHAT NOW???

In the literature on MI methods, it is usually recommended to run the main analysis on each imputed dataset and then combine the results. So I could do 5 PCAs on my five imputed datasets, but then I need individual factor scores to continue with the logistic regression. And here's my dilemma: how can I do that with multiple imputed datasets?! I would have to combine the imputed values from the 5 datasets directly, but this is not the usual way to analyse imputed datasets, and the idea of modelling the uncertainty would get lost. I really do not know how I would combine the individual estimates of the missing values if I tried it. Should I just calculate, for each case, the mean of the 5 values I obtained from the multiple imputation?!

If it were possible to create one single dataset out of the 5 imputed ones I could - theoretically - do all the analysis including the PCA just on that one data set...

Do you have any, ANY ideas how to solve this problem?!

Greetings,
Marina

2. ## Re: How to obtain individual factor scores after PCAs on multiple imputed datasets???

Thank you all for having read this awfully long and much too complicated piece of writing... but I have finally, FINALLY, after hours and hours of research, found a SOLUTION (and it isn't really as complicated as I thought), so consider this TOPIC SOLVED! Thx

3. ## Re: How to obtain individual factor scores after PCAs on multiple imputed datasets???

Would you mind sharing your solution?

4. ## Re: How to obtain individual factor scores after PCAs on multiple imputed datasets???

I've changed my general approach a little bit. Instead of using MI for the PCA, I used an EM algorithm to estimate the covariances/correlations of my original sample directly, without taking a detour via imputing the missing values. Then I used the estimated correlation matrix as direct input to a single PCA. With the resulting components I'm calculating factor scores for each of my five imputed datasets, followed by five logistic regressions. Finally, I'm combining the results of all five logistic regressions. That's it, and I think it's statistically the best solution. Originally I had wished to use a full information maximum likelihood (FIML) algorithm to estimate my covariances, but I couldn't find properly integrated syntax or a program for it, so I used the EM estimation option in SPSS (without imputing there, because those imputations are biased). For imputing I used an R library (Amelia).

Greetings,
Marina
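The workflow described in this post can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not Marina's actual SPSS/R procedure: the correlation matrix, means and standard deviations stand in for the EM estimates, the two tiny "imputed datasets" are made up, and `first_component` and `component_scores` are hypothetical helper names. The key point it shows is that one fixed set of component weights is applied to every imputed dataset.

```python
import math

def first_component(corr, iters=500):
    """Leading eigenvector and eigenvalue of a symmetric correlation matrix,
    found by power iteration (assumes the start vector is not orthogonal
    to the leading eigenvector, which holds for positively correlated items)."""
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p)) for i in range(p))
    return v, eigval

def component_scores(rows, weights, means, sds):
    """Score the rows of one imputed dataset with FIXED weights, means and SDs.
    Divide the result by sqrt(eigenvalue) if unit-variance scores are wanted."""
    return [
        sum(w * (x - m) / s for w, x, m, s in zip(weights, row, means, sds))
        for row in rows
    ]

# Hypothetical EM estimates for three variables (not real data).
corr_em = [[1.0, 0.6, 0.5], [0.6, 1.0, 0.4], [0.5, 0.4, 1.0]]
means_em = [3.0, 2.5, 4.0]
sds_em = [1.0, 0.8, 1.2]

weights, eigval = first_component(corr_em)

# Two made-up imputed datasets with two respondents each.
imputed = [
    [[3.5, 2.9, 4.6], [2.0, 2.1, 3.1]],
    [[3.5, 2.7, 4.6], [2.0, 2.1, 3.4]],
]

# Same weights for every dataset; only the imputed values differ.
scores_per_dataset = [
    component_scores(rows, weights, means_em, sds_em) for rows in imputed
]
```

Each list in `scores_per_dataset` would then feed one logistic regression, and the regression results would be pooled afterwards.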


6. ## Re: How to obtain individual factor scores after PCAs on multiple imputed datasets???

Hi Marina, and many thanks for this brilliant idea! However, I have some questions about it that I would be most grateful if you could help me with.

1) Could you please explain what you mean by using an EM algorithm to estimate just the covariances/correlations of the original sample directly? Could you describe the steps in SPSS, or give some SPSS syntax, so that I can do it too?
2) How did you use the estimated correlation matrix as direct input to the PCA?

Thank you so much for your support on this.

Christos

7. ## Re: How to obtain individual factor scores after PCAs on multiple imputed datasets???

Two simpler ways to do it:

1. Mplus will do what you want in a single step using either MI or FIML.
2. You do a PCA on each dataset individually and treat the resulting factor scores as plausible values (i.e. each dataset contains slightly different PCA results; they should be fairly similar if your missing data model is efficient). You then run your logistic regression on each dataset and only then combine the results from the imputed datasets.
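For the final "combine the results" step of option 2, the usual tool is Rubin's rules. The sketch below is a minimal illustration, assuming you already have one coefficient and one standard error per imputed dataset; the numbers are invented and `pool_rubin` is a hypothetical helper name. Note that the sign and order of components are arbitrary in separately run PCAs, so "component 1" has to mean the same thing in every dataset before its coefficients are pooled.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed datasets with Rubin's rules.
    Returns the pooled estimate and its pooled standard error."""
    m = len(estimates)
    q_bar = mean(estimates)                        # pooled point estimate
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                  # variance across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between         # total variance
    return q_bar, math.sqrt(total)

# Made-up logistic regression coefficients for one component, 5 imputations.
coefs = [0.50, 0.55, 0.45, 0.52, 0.48]
ses = [0.10, 0.10, 0.10, 0.10, 0.10]

pooled_coef, pooled_se = pool_rubin(coefs, ses)
```

The pooled standard error is larger than any single-dataset standard error whenever the estimates differ across imputations, which is how the imputation uncertainty is carried into the final result.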


9. ## Re: How to obtain individual factor scores after PCAs on multiple imputed datasets???

One silly question: if my only purpose is to do a PCA on a complete dataset and stop at the extraction and interpretation of factors, should I still do the logistic regression???

Many thanks

C

10. ## Re: How to obtain individual factor scores after PCAs on multiple imputed datasets???

This is somewhat tough. It depends on how many factors you pull, how well defined the factor structure is and (most importantly) how much missing data you have and how effective your missing data model is. In most cases I would think that if your imputed datasets gave you wildly different results, such that it was not possible to integrate the findings, then this suggests you have to go back to the drawing board and improve your missing data model, or give up in defeat.
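One way to put a number on "wildly different results" is to compare the loadings of the same component across imputed datasets. The sketch below uses Tucker's congruence coefficient for that; this is my own suggestion rather than something named in the thread, the loadings are invented, and `congruence` and `align_sign` are hypothetical helper names. Because the sign of a component is arbitrary, the loadings are sign-aligned to a reference solution before they are compared.

```python
import math

def congruence(a, b):
    """Tucker's congruence coefficient between two loading vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den

def align_sign(reference, loadings):
    """Flip the loadings if they point in the opposite direction to the reference."""
    if congruence(reference, loadings) < 0:
        return [-x for x in loadings]
    return list(loadings)

# Made-up loadings of the first component in two imputed datasets.
loadings_imp1 = [0.80, 0.70, 0.60]
loadings_imp2 = [-0.78, -0.72, -0.58]   # same structure, reflected sign

aligned = align_sign(loadings_imp1, loadings_imp2)
phi = congruence(loadings_imp1, aligned)
```

Values of `phi` close to 1 indicate that the two solutions describe essentially the same component; clearly lower values are a hint that the imputations disagree about the factor structure.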

You can help everything along a little by a) having a clear idea about the number of factors you want to extract and what the likely factor structure is; and b) using some form of target rotation, so that you are giving the imputations less chance to diverge from each other. Of course the easiest thing is to let Mplus or SPSS do it for you. Recent versions of SPSS allow for multiple imputation and should automatically combine results for most analyses (I don't use SPSS much, so I cannot be sure which analyses it does this for and which it doesn't).

On a side note I think FIML is the way to go where possible. You will not have these problems of integrating multiple datasets with FIML.

11. ## Re: How to obtain individual factor scores after PCAs on multiple imputed datasets???

Thank you very much for this detailed answer. To be honest, I used the Amelia program, which uses something like EM to impute the missing data, but instead of one solution it gives 5 imputed datasets. The reason I did that was that I have Likert-scale items with ordinal data, so I could not do FIML or multiple imputation (at least not quickly), and I did something in the middle. Now my problem was how to combine these results to do the PCA and finish this part of the analysis.

Again thank you soooooooo much for this reply. I really appreciate it.

Chris

12. ## Re: How to obtain individual factor scores after PCAs on multiple imputed datasets???

Both FIML and MI will work perfectly fine with ordinal Likert scales. You can either assume an underlying continuous variable (in which case both MI and FIML will be relatively fast) or directly impute ordinal data (in which case you will need a relatively small missing data model and be willing to watch Amelia, or whatever you use, grind away in the background for some time).

I have left Amelia running for days at a time!

13. ## Re: How to obtain individual factor scores after PCAs on multiple imputed datasets???

Weeeeeeeell... Regarding FIML, I only know AMOS. AMOS assumes that everything is continuous and normally distributed, so if you want to include dummy variables, you have to dummy-code them in SPSS before you attach them in AMOS. So I wanted to avoid doing this analysis and I chose Amelia, but I guess now this consumed more time than I had anticipated...

14. ## Re: How to obtain individual factor scores after PCAs on multiple imputed datasets???

This is because AMOS is a piece of crap. If you are using Amelia you know R, in which case I would suggest the lavaan, sem, or even OpenMx packages over AMOS.

15. ## Re: How to obtain individual factor scores after PCAs on multiple imputed datasets???

Actually no, I only used AmeliaView, which was friendlier for me to use, but I will look into the other open-source software and R tools.

Many thanks anyway for this insightful conversation!

16. ## Re: How to obtain individual factor scores after PCAs on multiple imputed datasets???

Hey!

I have a similar problem and can't seem to find a solution. Maybe someone can help me!!

I have to validate a questionnaire, so I'm doing correlations (with other questionnaires) and a PCA. I did multiple imputation with SPSS, so now I have the original data and 5 imputations. Correlations weren't a problem with multiply imputed data in SPSS.

NOW: My problem is that SPSS doesn't offer PCA for imputed data, so I don't get a "pooled" result!

Is there any other way for me to do it? Or do I have to handle my missing data for the PCA some other way?

If someone has any advice on what to do and how to do it, I would be very grateful!!!

Susanna

17. ## Re: How to obtain individual factor scores after PCAs on multiple imputed datasets???

This is how I did it:

1. Create dummy variables for Imputation_ (the variable SPSS uses to index the imputed datasets).

2. Put your factor analysis syntax between the "filter by" commands.

3. Run 5 factor analyses and save the factor scores 5 times, each time filtered by a different imputed dataset. Make sure you rename the factor score variables each time to prevent confusion.

4. Sort the dataset by respondent ID.

5. Cut and paste the columns of factor scores one by one, so that each respondent's 5 factor scores sit on a single row instead of along a diagonal.

6. Take the mean of the 5 factor scores for each individual with the MEAN.5 function, so that no missing values are allowed.

7. Now you have N averaged factor scores and 5*N missing factor scores, but that's OK, because every respondent now has a mean factor score and the 5 duplicate rows per respondent will be deleted listwise anyway in subsequent analyses.
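Outside SPSS, steps 4 to 7 amount to grouping the saved scores by respondent and averaging them. The sketch below is a rough Python equivalent, not Paul's actual syntax: the long-format tuples are made-up data and `average_scores` is a hypothetical helper name. Two caveats apply. The sign of a component can flip between separately run factor analyses, so the scores should be sign-aligned before averaging. And averaging collapses the between-imputation variability, which is exactly the loss of uncertainty raised in the first post.

```python
from collections import defaultdict
from statistics import mean

def average_scores(long_rows, n_imputations):
    """Average factor scores per respondent across imputations.
    long_rows holds (respondent_id, imputation_number, score) tuples.
    Like MEAN.5 in SPSS, a mean is only returned when every imputation
    contributes a valid score; otherwise the respondent gets None."""
    by_id = defaultdict(dict)
    for rid, imp, score in long_rows:
        if score is not None:
            by_id[rid][imp] = score
    averaged = {}
    for rid, scores in by_id.items():
        if len(scores) == n_imputations:
            averaged[rid] = mean(list(scores.values()))
        else:
            averaged[rid] = None
    return averaged

# Made-up scores: respondent 1 has all 5 imputations, respondent 2 only 4.
rows = [(1, k, s) for k, s in enumerate([0.1, 0.2, 0.3, 0.4, 0.5], start=1)]
rows += [(2, k, s) for k, s in enumerate([-1.0, -0.9, -1.1, -1.2], start=1)]

mean_scores = average_scores(rows, n_imputations=5)
```

Here `mean_scores` maps each respondent ID to a single averaged score, which corresponds to the one filled-in row per respondent that the SPSS steps produce.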

Please let me know if anything is unclear or if you know a more efficient way.

Kind regards,

Paul Tromp
Research Master in Social and Behavioral Sciences, Tilburg University
