# Thread: PCA - horribly confused...

1. ## PCA - horribly confused...

Hi, I am doing a Principal Components Analysis (with SPSS) on Census data. My aim is to get a socio-economic typology of suburbs, and then to work out whether water consumption differs significantly between suburb types (e.g. suburbs of high socio-economic status).

I am confused about *many* aspects, so please bear with me...

- First, I assume that Census data, being counts, is Nominal or Ordinal? Am I correct?
- Second, my methodology is to calculate each new variable as a percentage of a population (e.g. people with postgraduate qualifications as a % of total persons over 15), using the calculator function ("postgraduate qualifications / total people over 15 * 100") to create the new variable. I have then added all of these new variables into the PCA as Scale; is this correct?
- I then run the PCA (and a Varimax rotation) with all my variables. This initially gives about 22 factors but only really the first four or so have many variables loading on to them.
- I then get rid of variables which have low communalities, complex variables (e.g. high loadings on more than two of the components) and those which only load on minor components, and rerun the PCA a few times. (It is significant at 0.01, and the test (can't remember the name, sorry) gives it over .80.) Is this the correct procedure?
- This normally leaves me with about 10 factors. As I am only looking for 3-6, I then forcibly extract a lower number of factors and/or select eigenvalues over, say, 4. But when I do this, it changes the numbers in the Communalities table, and some variables that I still want to use now score very low. Should I just leave the eigenvalues over 1 and ignore minor factors and their variables, or is it acceptable to say I only want 4 factors (I've done other analysis which suggests this is correct)?
- And, I have one variable which is measured as an average, whilst all the others are percentages of a population; can I add this variable to my PCA?
- And, as this is my dependent variable (water consumption), do I leave it out entirely and then use it in a regression against the factors obtained from the PCA?
- And, sorry for all these questions, lastly, a paper I read did a PCA, got three main factors and then split each factor into two; how do I do this in SPSS?
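For anyone following the steps above outside SPSS, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the SPSS procedure: the counts are made up, every variable and function name is hypothetical, all variables share one base population for simplicity, and only the unrotated extraction is shown (no Varimax). It computes percentage variables from counts, runs a PCA on the correlation matrix, and shows how the communalities depend on how many components are retained.

```python
import numpy as np

def percent_of(count, base):
    """Convert raw counts into a percentage of the base population."""
    return np.asarray(count, dtype=float) / np.asarray(base, dtype=float) * 100.0

def pca_from_correlation(X, n_components=None):
    """PCA on the correlation matrix of X (rows = suburbs, columns = variables).

    Returns all eigenvalues (descending), plus the loadings and communalities
    for the retained components. If n_components is None, components with
    eigenvalue > 1 are retained.
    """
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)          # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if n_components is None:
        n_components = int(np.sum(eigvals > 1.0))
    kept = np.clip(eigvals[:n_components], 0.0, None)  # guard against round-off
    loadings = eigvecs[:, :n_components] * np.sqrt(kept)
    communalities = np.sum(loadings ** 2, axis=1)
    return eigvals, loadings, communalities

# Made-up census-style counts for 60 hypothetical suburbs.
rng = np.random.default_rng(0)
n_suburbs = 60
over_15 = rng.integers(2000, 10000, size=n_suburbs)   # base population
status = rng.normal(size=n_suburbs)                   # hidden "status" dimension
family = rng.normal(size=n_suburbs)                   # hidden "family" dimension

def fake_count(share):
    """Turn a made-up population share into a whole-number count."""
    return np.round(over_15 * np.clip(share, 0.01, 0.95))

def noise():
    return rng.normal(0.0, 0.02, n_suburbs)

counts = {
    "postgrad": fake_count(0.10 + 0.04 * status + noise()),
    "high_income": fake_count(0.20 + 0.06 * status + noise()),
    "managers": fake_count(0.25 + 0.05 * status + noise()),
    "mortgage": fake_count(0.35 + 0.08 * family + noise()),
    "couples_with_children": fake_count(0.30 + 0.07 * family + noise()),
}
names = list(counts)
X = np.column_stack([percent_of(counts[name], over_15) for name in names])

eigvals, loadings, comm_default = pca_from_correlation(X)      # eigenvalue > 1
_, _, comm_one = pca_from_correlation(X, n_components=1)       # forced to one

print("eigenvalues:", np.round(eigvals, 2))
for name, c_default, c_one in zip(names, comm_default, comm_one):
    print(f"{name:>22}: {c_default:.2f} (eigenvalue > 1)  {c_one:.2f} (1 component)")
```

In PCA a communality is the sum of a variable's squared loadings over the retained components only, so it can only stay the same or fall when fewer components are kept. That is the arithmetic behind the changing Communalities numbers described above; it does not by itself say which number of components is the right one.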

Thanks so much for any help. I am really confused and I don't want my supervisor to think I am an idiot.

2. One quick question!
Do you plan to use a wealth index (WI) as a measure of socioeconomic status, and are you trying to build quintiles for classifying individuals?

3. No, I want to reduce a large number of variables to approximately 6 against which I can map areas having common socio-demographic characteristics (ie. suburban mortgage belt, middle income, family suburbs VS inner city, single, high income households).

My dependent variable is household water use, and I want to map this against the reduced set of independent variables (i.e. suburbs having a large proportion of high income/high qualification people) and see whether the relationship is significant. It is the first cut of a larger study, and I would like to perhaps establish a typology of water use against socio-demographics, and then select further people to survey in depth from paired suburbs in different areas but having similar characteristics (hope this isn't too confusing...).
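A rough sketch of that second step in Python with NumPy, assuming the dependent variable is kept out of the PCA and then regressed on the component scores. Everything here is hypothetical: the data are random, the function names are made up, and the scores are unrotated principal component scores rather than the rotated factor scores SPSS can save. Significance tests are left out; only the coefficients and R² are computed.

```python
import numpy as np

def component_scores(X, n_components):
    """Scores of each row of X on the first n_components principal components.

    X is standardised first, so this matches a PCA on the correlation matrix.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)            # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    return Z @ eigvecs[:, order]

def fit_ols(y, scores):
    """Ordinary least squares of y on an intercept plus the component scores."""
    A = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coef
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return coef, 1.0 - ss_res / ss_tot

# Made-up socio-demographic percentages for 40 hypothetical suburbs.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
X[:, 1] += X[:, 0]          # make some variables correlated
X[:, 2] += X[:, 0]

scores = component_scores(X, n_components=2)

# Made-up household water use that partly follows the first component.
water_use = 250.0 + 40.0 * scores[:, 0] + rng.normal(0.0, 20.0, size=40)

coef, r_squared = fit_ols(water_use, scores)
print("intercept and slopes:", np.round(coef, 2))
print("R squared:", round(float(r_squared), 3))
```

Because the component scores are uncorrelated with each other, each slope can be read on its own, which is one practical reason to regress on scores rather than on the original, correlated percentages.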

I'm also debating whether just to use the PCA as an exploratory tool (as it should be) and then to use the scores on each factor as a guide for creating a new weighted variable, i.e. % of population with postgrad qualifications * (weight x) + % household income over x * (weight y) + % managers and professionals * (weight z) = variable A (which, to some extent, should correspond with the first rotated factor that I extracted).
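As a small illustration of that weighted variable, here is a sketch in plain Python. The weights, suburb names and percentages are all made up for the example; in practice the weights would have to be chosen and justified by the analyst, perhaps informed by the loadings on the first rotated factor.

```python
def composite_index(suburb, weights):
    """Weighted sum of a suburb's percentage variables ("variable A")."""
    return sum(weights[name] * suburb[name] for name in weights)

# Hypothetical weights x, y and z for the three percentage variables.
weights = {
    "pct_postgrad": 0.5,
    "pct_high_income": 0.3,
    "pct_managers": 0.2,
}

# Made-up percentages for two hypothetical suburbs.
suburbs = {
    "Suburb A": {"pct_postgrad": 12.0, "pct_high_income": 35.0, "pct_managers": 40.0},
    "Suburb B": {"pct_postgrad": 3.0, "pct_high_income": 10.0, "pct_managers": 15.0},
}

variable_a = {name: composite_index(values, weights) for name, values in suburbs.items()}
for name, value in variable_a.items():
    print(f"{name}: {value:.1f}")
```

A hand-built index like this is easier to explain than a factor score, but unlike a factor score it is not standardised, so variables with a wider spread of percentages will dominate unless the inputs are rescaled first.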
