Polychoric Principal Component Analysis

#1
Hi everyone:

I am working with a questionnaire to evaluate socioeconomic position in a sample. It basically counts a list of items used to assess living standards: home ownership, number of bathrooms, ownership of household items (cars, bicycles, etc.)...

I want to use polychoric principal component analysis to examine the variability of the sample and retain the first PC as an indicator of wealth, but I couldn't find a way to do that in R.

Thank you in advance.
 

spunky

Super Moderator
#2
it's relatively easy to get this with the psych package if you do it in two steps.

say you have some data (i'm gonna call it my_data) that's either a dataframe or a matrix of discrete scores. then you just do something like:

Code:
library(psych)

Poly_cor <- polychoric(my_data)$rho

principal(Poly_cor)
and by exploring the arguments of the principal() function you can specify rotations and other things like that.
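Putting the two steps together, a minimal sketch (the data here are simulated stand-ins for the questionnaire, not the OP's actual scores):

```r
library(psych)

set.seed(42)
# stand-in data: 200 respondents, 4 discrete items scored 0-3
my_data <- as.data.frame(replicate(4, sample(0:3, 200, replace = TRUE)))

# step 1: polychoric() returns a list; $rho holds the correlation matrix
Poly_cor <- polychoric(my_data)$rho

# step 2: PCA on that matrix; nfactors and rotate can be tuned as needed
fit <- principal(Poly_cor, nfactors = 1, rotate = "none")
fit$loadings
```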
 
#3
spunky said:
it's relatively easy to get this with the psych package if you do it in two steps. [...]
Thanks for the reply.
I was reading the guide on the psych package, but couldn't find the polychoric function. Is it the polychoric.matrix function?
 

spunky

Super Moderator
#4
Thanks for the reply.
I was reading the guide on the psych package, but couldn't find the polychoric function. Is it the polychoric.matrix function?
uhm... well, that is strange because i found the polychoric() function here

but i guess if polychoric.matrix() gives you the correct correlations you can use it.
 
#5
spunky,

I was trying to use the polychoric function on my dataset but got this error:
You have more than 8 categories for your items, polychoric is probably not needed


I thought that the polychoric function was used exactly for that... Any thoughts?

Just to make a quick test, I changed my dataset to have a maximum value of 8 and ran the principal function, and I got a PC for every variable, but what I wanted was the first PC for every individual (observation) in my dataset. Is there a way to do that?

I hope you can help me with this.
Thanks again.
 

noetsi

Fortran must die
#6
You might want to send questions about R to the R forum in the future as answers might come quicker there. On the other hand we have so many R types here that might not be true:p

This forum is really for questions about theory or a method, not R.
 

spunky

Super Moderator
#7
I thought that the polychoric function was used exactly for that... Any thoughts?
you usually see little gain (and can run into a lot more problems) by using polychoric correlations if the number of response options is greater than 5. by the time you reach 7 categories the bias between the standard Pearson correlations and the polychoric correlations is so small that there really is no point in using them.
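That claim is easy to check with a small simulation sketch (the values and cutpoints below are illustrative assumptions, not from the OP's data): discretizing two correlated continuous traits into 7 ordered categories leaves the Pearson estimate close to the polychoric one.

```r
library(psych)
library(MASS)

set.seed(1)
# two latent continuous traits correlated at .5
z <- mvrnorm(1000, mu = c(0, 0), Sigma = matrix(c(1, .5, .5, 1), 2))
# cut each trait into 7 ordered categories (codes 1-7)
d <- as.data.frame(apply(z, 2, cut, breaks = 7, labels = FALSE))

pearson <- cor(d)[1, 2]             # treats the 1-7 codes as continuous
poly    <- polychoric(d)$rho[1, 2]  # models the underlying continuum
c(pearson = pearson, polychoric = poly)  # with 7 categories, the two are close
```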

but what I wanted was the first PC for every individual (observation) on my dataset. Is there a way to do that?
uhm... i haz no cluez what you meanz here?
 
#8
My second question was the following:

I had (let's say) 200 observations with 15 variables (the list of items used to assess living standards).
I wanted to extract from those 15 variables the first principal component (PC) so I could have ONLY ONE variable as a wealth indicator for every individual.

What I got from using the principal function was the PC for every variable (the 15 items) and not for every individual (which should have been 200).

Thanks again spunky :)
 
Last edited:

spunky

Super Moderator
#9
i guess i'm struggling to understand what you mean by that you want "one variable per individual". your data are not "individuals". your data are the responses they gave to the items.

what about calculating a factor/principal component score on the PC that represents "wealth"? i'm assuming "wealth" is a latent variable here where you're using items as proxies, so certain proxies would be more correlated with this PC that stands for "wealth"? if you get a PC score, you would get a measure of this "wealth" per individual.
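One way to get such a score per respondent with psych is to feed the raw data and the fitted component back into factor.scores(); a sketch with simulated stand-in data (the item counts and scoring below are assumptions for illustration):

```r
library(psych)

set.seed(7)
# stand-in data: 200 respondents, 5 discrete "wealth" items scored 0-4
ses <- as.data.frame(replicate(5, sample(0:4, 200, replace = TRUE)))

Poly_cor <- polychoric(ses)$rho
fit <- principal(Poly_cor, nfactors = 1)

# turn the loadings into one component score per respondent (200 x 1)
wealth <- factor.scores(ses, fit)$scores
head(wealth)
```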
 

noetsi

Fortran must die
#10
In factor analysis there is a difference between doing the analysis at the individual case level and at the factor level (the former is much less commonly used). I suspect that is what is involved here, although PCA rather than EFA is occurring.
 

spunky

Super Moderator
#11
yes. R-factor analysis (on the items) VS Q-factor analysis (on the people). i wanted to first see if the OP was just misinterpreting the use of PCA because Q-factor analysis is somewhat questionable and dubious, which is why it fell into disuse around the 60s or 70s (and rightly so. it was too darn close to qualitative methods and people don't like that).

among the many, many criticisms wielded against it is the fact that the subspace on which the factors are being projected is undefined. but then again Q-factor analysts didn't concern themselves about that. the guy who invented it merely thought "hey, wouldn't it be cool if we transposed the dataset and did a factor analysis on the covariance matrix of the people?" whatever *that* is supposed to mean.
 

noetsi

Fortran must die
#12
No fair insulting dead guys who can't defend their methods :(

While it may not be used anymore it commonly is discussed in books on factor analysis.
 

spunky

Super Moderator
#13
No fair insulting dead guys who can't defend their methods :(
While it may not be used anymore it commonly is discussed in books on factor analysis.
it isn't an insult as much as making sure the division is clear. people of the reputation of Cronbach himself characterized the methods of Stephenson (the inventor of Q-factor analysis and Q-methodology) as "arbitrary and untrustworthy".

you actually need to dig a little deeper into Psychometrics to bring it forth. i have to say i was surprised you knew it, since it is no longer regularly taught (or even mentioned) in modern Psychometrics courses and you don't usually find good explanations of it unless you read older (aka 'classic') books on factor analysis.

my beef with Stephenson's Q-factor analysis is that he tried to make it a mainstream psychometric method without providing much formal justification for its use. that's why his ideas were abandoned.

with the rise of qualitative methods, Q-methodologies have seen a revival (particularly among grounded-theory advocates) because that's where they were supposed to be from the beginning: qualitative methods, not quantitative. when he tried to masquerade them as quantitative he got chastised.

we're Psychometricians, after all. WE NEVER FORGIVE AND NEVER FORGET! (<--- this should be the motto of my House. House Spunky!). we like to punish subjectivity with the heavy hammer of publication bans. ask William Chambers who tried to overthrow traditional covariance modeling a la Joreskog with his method of "corresponding regressions" - Who is this Willam Chambers, you may ask? - EXACTLY. that's what happens to dissidents of mainstream psychometrics!
 

noetsi

Fortran must die
#14
How much theoretical justification is there for something like, say, the extremely common Tukey boxplot:p Or, for decades, for exponential smoothing, which (while commonly shown to be more accurate than more complex methods) had no theoretical basis before state space models were developed (decades after ESM became popular). When something works consistently over time there is likely a reason it does, even if we have not found out why yet.:)

I rarely read statistical texts from before (at the earliest) the nineties. Q models are commonly discussed at the back of EFA books (and I note most of those don't state that this approach is discredited).
 

spunky

Super Moderator
#15
How much theoretical justification is there for something like, say, the extremely common Tukey boxplot:p Or, for decades, for exponential smoothing, which (while commonly shown to be more accurate than more complex methods) had no theoretical basis before state space models were developed (decades after ESM became popular). When something works consistently over time there is likely a reason it does, even if we have not found out why yet.:)
well, there's plenty of theoretical justification for the use of the boxplot since it conveys, within one shot, the visual characteristics of the empirical distribution of your data. i know nothing about ESM but given that it is considerably more sophisticated than a boxplot, i'm pretty sure there is some very strong theory behind what it does.

I rarely read statistical texts from before (at the earliest) the nineties. Q models are commonly discussed at the back of EFA books (and I note most of those don't state that this approach is discredited).
uhm... that's interesting. which textbook are you referring to? but the most important part is... do you ever use it? or do you ever see it used within the context of Psychometrics? no, you do not. it has seen a revival in other areas, though, where Psychometricians and quantitative analysts don't work. i'm not saying it's a useless method, but it was definitely not intended to be used the way Stephenson would've wanted it used.
 
#16
what about calculating a factor/principal component score on the PC that represents "wealth"? i'm assuming "wealth" is a latent variable here where you're using items as proxies, so certain proxies would be more correlated with this PC that stands for "wealth"? if you get a PC score, you would get a measure of this "wealth" per individual.
Yes, I just wanted to retain the first principal component as an indicator of wealth, so I could work with this PC as a measure of "wealth" per individual as you said.
But when I used the principal function I got PC1 for each of my "items" (variables).

http://imgur.com/gcqREq8

How do I get a PC for every individual?
 

spunky

Super Moderator
#18
what about the option that i mentioned to you regarding calculating principal component scores from the principal component that represents the latent variable 'wealth'?

that way you would obtain a reading of 'wealth' for each participant.

other than that, i cannot see any other way to get a measure of 'wealth' per individual. if the scores don't work for you, i don't think principal components is the technique you should be using to get the information you want.
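A sketch of that score-based route in one step (this assumes a reasonably recent psych, where principal() accepts raw data and a `cor = "poly"` argument; verify against your installed version, and note the data below are simulated stand-ins):

```r
library(psych)

set.seed(99)
# stand-in for the questionnaire: 200 respondents, 5 items scored 0-4
ses <- as.data.frame(replicate(5, sample(0:4, 200, replace = TRUE)))

# fitting on the raw data (not a correlation matrix) lets psych also return scores;
# cor = "poly" asks it to build the polychoric matrix internally
fit <- principal(ses, nfactors = 1, cor = "poly", scores = TRUE)

head(fit$scores)   # one PC1 ("wealth") score per respondent
```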
 
#19
Yes I want to do that, but I don't know how to do it. I know this isn't a space for R questions, but the only result I get is the one (the picture) I uploaded in the post above.

These are the only functions I am using:

Code:
library(psych)

ses <- read.table("SES_matrix_edit.txt", fill = TRUE)
Poly_cor <- polychoric(ses)$rho
principal(Poly_cor, nfactors = 1)

Thank you again.