# Factor analysis - how to create a composite scale?

#### labeauteestdanslarue

##### New Member
Hi,

I am doing a secondary analysis of the British Election study and am testing attitudinal variables (Likert scale) for their suitability in creating an attitudinal scale. I have selected 15 variables which I am testing for suitability for an Internationalism scale. Having run a “maximum likelihood” factor analysis, SPSS has established that these variables are contributing to four distinct factors. The literature and text books I have access to talk about how to decide how many items and factors to retain, but I can’t find any information regarding what I should do next.
Would someone be able to tell me whether what I propose to do is completely wrong, and point me in the right direction (either in their response or by pointing me towards a textbook that might help)?

Because SPSS found four factors, I will create four subscales for the four factors I have identified. I will then take an average score of these four factors and combine them to create an overall score for internationalism. Is this correct so far? Should I only keep items that load greater than .6?

Once I recode new variables for each subscale, should I run a new factor analysis to see if they load on a single factor? Is this nonsense and would such a test yield identical results as the first factor analysis? (My brain hurts trying to figure this out).

Some subscales comprise two variables, others seven, and so on. Provided I take the mean score for each factor, is this okay? It seems problematic to take an overall average of all items together, as those factors made up of more items would weigh more heavily on the overall scale, right? Furthermore, is it possible to weight each subscale differently with regard to the overall scale? So for instance, if I regard the “cultural openness” scale as more important than another factor per theory, can I multiply this factor by, say, 3, and then take the overall average (so this factor would contribute 3 times as much to the overall scale)? E.g. with 4 factors, A, B, C, D, score = (3A+B+C+D)/6?

Finally, I want to create a unidimensional scale because I am expected to run multivariate regressions on it (so I'm not really interested in investigating four separate subscales that I believe contribute to a single concept). Is it statistically sound to score 5-level variables as (0, 25, 50, 75, 100) so I can get a percentage score, so that when analyzing the regression tables I can talk about differences in terms of percentage points on my internationalism scale? Alternatively, can I calculate a summative score of 1-5 and then multiply by 20 to get a percentage score? Is this entering dodgy territory?

Thanks so much in advance, any help, pointers or clarification would be enormously appreciated - TJ

#### CB

##### Super Moderator
> SPSS has established that these variables are contributing to four distinct factors. The literature and text books I have access to talk about how to decide how many items and factors to retain, but I can’t find any information regarding what I should do next.

Hiya!
First question: How have you decided how many factors to retain? The fact that you say "SPSS has established" the number of factors hints to me that perhaps you have used the default setting in SPSS, which is simply to extract all factors with an eigenvalue greater than one ("Kaiser's stopping rule"). This, unfortunately, is quite possibly the worst way to select the appropriate number of factors (it will tend to overestimate the number of factors). Of the selection methods available in SPSS, referring to the scree plot is probably the best way to decide how many factors to choose (but is rather subjective). Better methods are Velicer's MAP and parallel analysis, but these aren't available as menu options in SPSS. They can be done in R.
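For anyone who wants to see what parallel analysis actually does, here is a minimal sketch in Python rather than R. It is only an illustration under stated assumptions: the data are simulated (six variables driven by two factors, not the British Election Study items), the function name `parallel_analysis` is made up for this post, and it compares eigenvalues of the full correlation matrix (the principal-components version of the method) against the 95th percentile of eigenvalues from random normal data of the same shape.

```python
import numpy as np

def parallel_analysis(data, n_iter=200, percentile=95, seed=0):
    """Hypothetical helper: count leading eigenvalues that beat random-data eigenvalues."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # Eigenvalues of the observed correlation matrix, largest first.
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    # Eigenvalues from uncorrelated normal data with the same n and p.
    random_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        noise = rng.standard_normal((n, p))
        random_eigs[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    threshold = np.percentile(random_eigs, percentile, axis=0)
    exceeds = observed > threshold
    # Retain factors up to (not including) the first eigenvalue that fails the comparison.
    retained = p if exceeds.all() else int(np.argmin(exceeds))
    return retained, observed, threshold

# Simulated example (assumption): 500 respondents, 6 items, 2 underlying factors.
rng = np.random.default_rng(42)
n = 500
factors = rng.standard_normal((n, 2))
loadings = np.zeros((6, 2))
loadings[:3, 0] = 0.8   # items 1-3 load on factor 1
loadings[3:, 1] = 0.8   # items 4-6 load on factor 2
data = factors @ loadings.T + 0.6 * rng.standard_normal((n, 6))

n_factors, observed, threshold = parallel_analysis(data)
kaiser_count = int((observed > 1).sum())  # Kaiser's rule, for comparison
print("parallel analysis:", n_factors, "| eigenvalue > 1 rule:", kaiser_count)
```

With structure this clean the two rules will usually agree; the overestimation by the eigenvalue-greater-than-one rule tends to show up with many items and weaker loadings.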

> Because SPSS found four factors, I will create four subscales for the four factors I have identified.

This sounds fine (though the chosen number of factors may differ with one of the above-mentioned techniques).

> I will then take an average score of these four factors and combine to create an overall score for internationalism.

It might be better to just add up the responses to all items to get your overall internationalism score. This will deal with the problem you mention of the subscales being of different sizes, resulting in some items having more influence than others if you average the four subscale scores.
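To make the weighting issue concrete, here is a rough sketch in Python. The subscale sizes (2, 7, 3 and 3 items, totalling 15) and the random responses are assumptions for illustration only; the (3A+B+C+D)/6 weighting is the one proposed in the question. The point is just to show the implicit weight each single item ends up with under each scoring rule.

```python
import numpy as np

# Hypothetical subscale sizes; only "some have 2 items, others 7" comes from the question.
sizes = {"A": 2, "B": 7, "C": 3, "D": 3}
n_items = sum(sizes.values())  # 15

rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(5, n_items))  # 5 fake respondents, Likert 1-5

# Column boundaries of each subscale within the response matrix.
bounds = np.cumsum([0] + list(sizes.values()))
subscale_means = np.column_stack(
    [responses[:, bounds[i]:bounds[i + 1]].mean(axis=1) for i in range(len(sizes))]
)

mean_of_all_items = responses.mean(axis=1)        # every item counts equally
mean_of_subscales = subscale_means.mean(axis=1)   # every subscale counts equally
w = np.array([3, 1, 1, 1])                        # (3A + B + C + D) / 6
weighted_score = subscale_means @ w / w.sum()

# Implicit weight of a single item under each rule.
item_w_all = np.full(n_items, 1 / n_items)
item_w_sub = np.concatenate([np.full(k, 1 / (len(sizes) * k)) for k in sizes.values()])
item_w_weighted = np.concatenate(
    [np.full(k, wi / (w.sum() * k)) for k, wi in zip(sizes.values(), w)]
)

print("item weight, mean of all items:   ", item_w_all.round(3))
print("item weight, mean of subscales:   ", item_w_sub.round(3))
print("item weight, (3A+B+C+D)/6 scoring:", item_w_weighted.round(3))
```

Under these assumed sizes, averaging the subscale means gives each item in the 2-item subscale a weight of 0.125 but each item in the 7-item subscale only about 0.036, whereas summing or averaging all items gives every item the same weight.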

> Should I only keep items that load greater than .6?

Deciding how big a loading has to be before an item is considered to "load" on a subscale is a bit subjective, but .6 seems like a rather high cutoff. .3 might be more common.
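As a quick illustration of how much the cutoff matters, here is a small Python sketch. The loading matrix is entirely made up (five items, two factors) and `assign_items` is a hypothetical helper; it simply assigns each item to the factor with its largest absolute loading, or to none if that loading falls below the cutoff.

```python
import numpy as np

# Hypothetical rotated loadings for 5 items on 2 factors (illustrative numbers only).
loadings = np.array([
    [0.72, 0.10],
    [0.65, 0.05],
    [0.45, 0.20],
    [0.12, 0.58],
    [0.25, 0.28],
])

def assign_items(loadings, cutoff):
    """Return each item's strongest factor index, or -1 if no loading reaches the cutoff."""
    abs_loadings = np.abs(loadings)
    strongest = abs_loadings.argmax(axis=1)
    strongest_value = abs_loadings.max(axis=1)
    return np.where(strongest_value >= cutoff, strongest, -1)

assign_03 = assign_items(loadings, 0.3)
assign_06 = assign_items(loadings, 0.6)
print("cutoff .3:", assign_03)  # four of the five items are kept
print("cutoff .6:", assign_06)  # only the first two items are kept
```

With these made-up numbers the .6 cutoff leaves the second factor with no items at all, which is the kind of side effect a very strict cutoff can have.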

> Once I recode new variables for each subscale, should I run a new factor analysis to see if they load on a single factor? Is this nonsense and would such a test yield identical results as the first factor analysis? (My brain hurts trying to figure this out).

This would seem a little superfluous, yes.

> Is it statistically sound to score 5-level variables into scores of (0, 25, 50, 75, 100) so I can get a percentage score, so when analyzing the regression tables I can talk about differences in terms of percentage points on my internationalism scale? Alternatively, can I calculate a summative score of 1-5 and then multiply by 20 to get a percentage score? Is this entering dodgy territory?

This does sound a little dodgy, yes. When you start to talk about percentage point differences on the internationalism scale, you're making the implicit assumption that your scale has a meaningful zero point (i.e. that it is a ratio scale). In fact, for a percentage score you also need a meaningful 100% point (i.e. a point at which someone has "complete" internationalism). As much as I'm skeptical about the S.S. Stevens levels of measurement, this does seem to be going a bit overboard in terms of assumptions about the meaningfulness of measurements via your scale. Is it really meaningful to talk about someone having no internationalism, or complete internationalism?
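On the pure arithmetic side, both rescalings are linear transformations of the 1-5 score, so they change the units of a regression coefficient without changing the fit. A short Python sketch with simulated data (the predictor `x` and the scores are assumptions, not real survey data) shows this, and also that multiplying by 20 gives a 20-100 range rather than 0-100:

```python
import numpy as np

likert = np.array([1, 2, 3, 4, 5])
rescaled = (likert - 1) / 4 * 100   # 0, 25, 50, 75, 100
times20 = likert * 20               # 20, 40, 60, 80, 100 (never reaches 0)

# Simulated predictor and a 1-5 scale score that depends on it.
rng = np.random.default_rng(1)
x = rng.standard_normal(200)
score_1to5 = np.clip(np.round(3 + x + rng.standard_normal(200)), 1, 5)
score_0to100 = (score_1to5 - 1) * 25

# Simple regression slopes of the score on x, in each metric.
slope_raw = np.polyfit(x, score_1to5, 1)[0]
slope_pct = np.polyfit(x, score_0to100, 1)[0]

# The correlation with x is unaffected by the rescaling.
r_raw = np.corrcoef(x, score_1to5)[0, 1]
r_pct = np.corrcoef(x, score_0to100)[0, 1]

print("slope ratio (0-100 vs 1-5):", slope_pct / slope_raw)
print("same correlation:", np.isclose(r_raw, r_pct))
```

So the rescaling itself is harmless; the questionable step is reading the resulting units as percentages of some real quantity of internationalism.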