# Generating factor analysis variable output for a new sample

#### markcarver

##### New Member
There may indeed be a better way to do this, but here's my current thinking. I have a survey of 500 students at a university and have performed principal component analysis. Two factors are relevant for my research question, and I have output a score for each student so that there are now 2 new variables in my data (one I've named compliance, one I've named strategising).
I now have an extra 100 students who responded to the same survey but did so online (the first 500 were on paper). I want to know if there is a difference between those who completed it online and those who were encouraged to fill in a sheet while they were in a lecture. The online group might take more time to think, might be more engaged, have stronger views - whatever. So here's what I want to do:
1. Create a score for each of the 100 new students based on the variables from the previous factor analysis (the new 100 students are not part of that analysis; they're just given a score from the 'formula' that effectively defines my new variable).
2. Perform a second factor analysis/PCA with all 600 students, creating new variable scores for each student on the new factors which are closest in interpretation to the old ones.
3. Compare the old and new scores using Pearson correlation (or maybe chi-squared would be better?) to see how much difference it makes adding the extra 100 students.

I hope this makes sense. Please let me know if you know a way to do this, or a better way to achieve a similar outcome. All I can think of at the moment is to write an expression using the factor loadings as multipliers, but I have up to 35 contributing variables so it would be a bit of a pain.
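For what it's worth, the "loadings as multipliers" expression does not have to be typed out by hand: it is a single matrix product, however many variables contribute. Below is a minimal sketch of steps 1 to 3 in Python with NumPy, under several assumptions that are not in the post: the data are synthetic, the helper names `fit_pca` and `score` are hypothetical, items are standardised before the PCA, and scores are plain unrotated component scores. A stats package may scale or rotate its saved factor scores differently, so treat this as an illustration of the idea rather than a reproduction of any package's output.

```python
import numpy as np

def fit_pca(X, n_components=2):
    """Fit a PCA on standardised columns; return mean, std and component weights."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    weights = Vt[:n_components].T  # shape: (n_variables, n_components)
    return mean, std, weights

def score(X, mean, std, weights):
    """Apply a previously fitted 'formula' to any set of respondents."""
    return ((X - mean) / std) @ weights

# Synthetic stand-in for the survey (assumption): 35 items, two latent traits.
rng = np.random.default_rng(0)
latent = rng.normal(size=(600, 2))
pattern = np.zeros((2, 35))
pattern[0, :20] = 1.0   # first trait drives items 0..19
pattern[1, 20:] = 1.0   # second trait drives items 20..34
answers = latent @ pattern + rng.normal(scale=0.5, size=(600, 35))
paper, online = answers[:500], answers[500:]

# Step 1: fit on the 500 paper respondents only, then score the 100 online
# respondents with the paper sample's mean, std and weights.
mean500, std500, w500 = fit_pca(paper)
online_scores_old = score(online, mean500, std500, w500)

# Step 2: refit on all 600 and score everyone with both solutions.
mean600, std600, w600 = fit_pca(answers)
scores_old = score(answers, mean500, std500, w500)
scores_new = score(answers, mean600, std600, w600)

# Step 3: Pearson correlation between old and new scores, per component.
# The sign of a component is arbitrary, so the absolute value is what matters.
correlations = [
    np.corrcoef(scores_old[:, k], scores_new[:, k])[0, 1] for k in range(2)
]
for k, r in enumerate(correlations):
    print(f"component {k + 1}: |r| = {abs(r):.3f}")
```

With real data it is worth checking the weights of the two solutions side by side before correlating, since components can swap order between fits as well as flip sign.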

Thank you

Mark

#### Injektilo

##### New Member
I'm no expert in stats, but personally I would just run the PCA on the full sample of 600 and check that the 2 factors as you defined them still exist (they should; if they don't, that tells you something right there about how different those 100 new cases are from the other 500). Add a variable indicating which of the 2 groups each student belongs to. Then check the distributions of the two variables created from the PCA for normality. If they both appear normal, run an independent t-test on both factor variables. If they do not appear normal, run a Mann-Whitney U test instead. Either way, look for p-values < 0.05 from the tests.
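The decision rule in this reply can be sketched roughly as follows, using SciPy. Some assumptions here go beyond the post: the normality check is a Shapiro-Wilk test applied to each group separately (the post does not name a test), the function name `compare_groups` is hypothetical, and the scores are synthetic. The function would be called once per factor variable.

```python
import numpy as np
from scipy import stats

def compare_groups(paper_scores, online_scores, alpha=0.05):
    """Choose a t-test or Mann-Whitney U test based on a normality check."""
    # Assumption: Shapiro-Wilk on each group; a small p-value suggests non-normality.
    looks_normal = True
    for group in (paper_scores, online_scores):
        _, p_normal = stats.shapiro(group)
        if p_normal <= alpha:
            looks_normal = False

    if looks_normal:
        test_name = "independent t-test"
        statistic, p_value = stats.ttest_ind(paper_scores, online_scores)
    else:
        test_name = "Mann-Whitney U"
        statistic, p_value = stats.mannwhitneyu(
            paper_scores, online_scores, alternative="two-sided"
        )
    return test_name, statistic, p_value

# Synthetic factor scores (assumption): the online group is shifted upwards.
rng = np.random.default_rng(1)
paper_scores = rng.normal(loc=0.0, scale=1.0, size=500)
online_scores = rng.normal(loc=0.8, scale=1.0, size=100)

test_name, statistic, p_value = compare_groups(paper_scores, online_scores)
print(f"{test_name}: statistic = {statistic:.3f}, p = {p_value:.4g}")
```

With groups of 500 and 100, a formal normality test can flag even small departures from normality, so a histogram or Q-Q plot of each score is a useful companion to the p-value.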