How can I find out, if subscales are independent?

LtStarbuck

New Member
I need to find out whether four subscales of a scale are independent of each other in my sample. Correlating all four is a good start, so I did that and found that 4 of the 6 correlations are significant (r = .34 and higher). But I want to do something a little more sophisticated. My teacher said something about EFA and alphas.
But EFA is not just one calculation, so I'm curious what I have to do specifically. Do I need to try to replicate the four subscales with all items? Or do I explore how many factors the items would form? And what does all this have to do with alpha? I know that alpha can tell me how good the items of a single scale are, but what role does alpha play across several scales?


spunky

Smelly poop man with doo doo pants.
I need to find out whether four subscales of a scale are independent of each other in my sample. Correlating all four is a good start, so I did that and found that 4 of the 6 correlations are significant (r = .34 and higher). But I want to do something a little more sophisticated. My teacher said something about EFA and alphas.
But EFA is not just one calculation, so I'm curious what I have to do specifically. Do I need to try to replicate the four subscales with all items? Or do I explore how many factors the items would form? And what does all this have to do with alpha? I know that alpha can tell me how good the items of a single scale are, but what role does alpha play across several scales?
the logic behind EFA i do understand: if you do EFA with an oblique rotation and you notice that the factors underlying each subscale are correlated (and they quite likely will be), then you cannot claim each subscale is independent of the others. a much better way to do this is through Confirmatory Factor Analysis (as opposed to Exploratory Factor Analysis, EFA), because you could look at the p-values of the correlations among the factors to see whether they are significant or not.

i'm lost as far as the logic behind the alphas goes, though. i'd venture a guess that your prof could be a fan of the "old school" of psychometrics, where it was trumpeted that if subscales had similar alphas they are probably measuring the same construct. this is, of course, wrong on like SO many levels... but then again i'm venturing a guess here.

LtStarbuck

New Member
Hey spunky!
I doubt that my prof is that old school, since he advised me to use Hayes' method of calculating mediations (with the 'PROCESS' macro in SPSS) instead of the causal steps of Baron and Kenny. I need to check whether the scales are independent, so I can choose a method for my mediation (all X's in one calculation vs. 4 calculations with one X each).

The alpha thing is confusing me as well, but I refuse to ask my prof about it before I have researched a few things myself (profs are too busy with other things anyway).

By CFA you mean that I predetermine the number of factors (four, because I have four subscales) in SPSS, right? (I apologize for my superficial knowledge.) Then I would check whether the items of each scale have their highest loading on a different one of the four factors?

Miner

TS Contributor
i'm lost as far as the logic behind the alphas goes, though. i'd venture a guess that your prof could be a fan of the "old school" of psychometrics, where it was trumpeted that if subscales had similar alphas they are probably measuring the same construct. this is, of course, wrong on like SO many levels... but then again i'm venturing a guess here.
Are we talking Cronbach's alpha and Item Analysis?
BTW, another option would be a Cluster Variables analysis.

spunky

Smelly poop man with doo doo pants.
Hey spunky!
I doubt that my prof is that old school, since he advised me to use Hayes' method of calculating mediations (with the 'PROCESS' macro in SPSS) instead of the causal steps of Baron and Kenny. I need to check whether the scales are independent, so I can choose a method for my mediation (all X's in one calculation vs. 4 calculations with one X each).
PROCESS uses the Baron & Kenny framework for mediation underneath because it relies on multiple regression. you don't see that because it does everything for you simultaneously, but it is the same thing. Wright's path analysis and Joreskog's structural equations are the more modern way of doing this. it is not really a matter of which software you choose but of which statistical framework you rely on. in this case, that's still old school.

The alpha thing is confusing me as well, but I refuse to ask my prof about it before I have researched a few things myself (profs are too busy with other things anyway).
you're paying your prof! you should be able to bother him/her as much as you please!

anyhoo, i'm willing to bet my brownies (s)he may say something like what i mentioned, in which case the truth will be revealed

By CFA you mean that I predetermine the number of factors (four, because I have four subscales) in SPSS, right? (I apologize for my superficial knowledge.) Then I would check whether the items of each scale have their highest loading on a different one of the four factors?
nope. SPSS cannot do Structural Equation Modelling by itself. you'd need to buy the horrible add-on AMOS *or* you could do it for free in R using the lavaan package (or pay money and buy Mplus).

now that you mention this, however, it does prompt me to mention three things:

- if you're letting SPSS decide the number of factors for you using its eigenvalue>1 rule, you're doing it wrong.
- if you're using SPSS' default of doing Principal Component Analysis instead of Factor Analysis, you're doing it wrong.
- if you're letting SPSS rotate the factors (components?) using its default varimax rotation, you're not actually testing what you'd like to test.

spunky

Smelly poop man with doo doo pants.
Are we talking Cronbach's alpha and Item Analysis?
BTW, another option would be a Cluster Variables analysis.
yup. there is no alpha in psychology but Cronbach's alpha!

yeah... cluster analysis sounds like a nice idea too. i dunno why we don't use it as often here in social sciency land

LtStarbuck

New Member
Hmm okay, if CFA is not possible in SPSS, let's go back to the EFA.
Spunky, you're saying that these default settings won't help me. Are there any specific settings you can recommend?

spunky

Smelly poop man with doo doo pants.
Hmm okay, if CFA is not possible in SPSS, let's go back to the EFA.
Spunky, you're saying that these default settings won't help me. Are there any specific settings you can recommend?
yes.

- the number of factors to extract must ultimately be determined by YOU and what your research question/theory mandates, not the computer. you can get some help from the computer, though. it seems like for your case specifically, you'll want 4 factors. for a much better, more robust way to help you decide the number of factors, Parallel Analysis beats the eigenvalue>1 rule. you can do it here:

http://ires.ku.edu/~smishra/parallelengine.htm

- for the method of factor extraction choose maximum likelihood. do the whole analyze -> dimension reduction -> Factor. click on "extraction" and in the drop-down menu that reads "Method" change that to "Maximum Likelihood" (it should read Principal Components but you don't want that one). then go down and where it reads "Maximum Iterations for Convergence" change that from whichever number you have (i think it's 20-something) to something very large. like a bazillion. you wanna give maximum likelihood a fair chance.

- for the rotation go back to the previous menu, click on "rotation" and click on "promax" (you can leave the kappa default as it is).

if this does not converge, repeat step #2 but instead of choosing maximum likelihood choose principal axis factoring.
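For readers who'd rather script the parallel-analysis step than use the website, here is a minimal numpy sketch of Horn's parallel analysis (the function name, defaults, and return values are my own for illustration, not what the linked engine uses): generate many random datasets of the same size as yours, take the eigenvalues of their correlation matrices, and keep only as many factors as have observed eigenvalues above the random threshold.

```python
import numpy as np

def parallel_analysis(data, n_sims=100, percentile=95, seed=1000):
    """Horn's parallel analysis: compare the observed eigenvalues of the
    correlation matrix against eigenvalues from random normal data of the
    same shape (n cases x p variables)."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # eigenvalues of the observed correlation matrix, largest first
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    # eigenvalues from n_sims random datasets of the same size
    rand = np.empty((n_sims, p))
    for i in range(n_sims):
        r = rng.standard_normal((n, p))
        rand[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(r, rowvar=False)))[::-1]
    # per-position threshold (e.g. 95th percentile of the random eigenvalues)
    threshold = np.percentile(rand, percentile, axis=0)
    # count how many leading observed eigenvalues beat the random threshold
    n_factors = int(np.sum(np.cumprod(obs > threshold)))
    return n_factors, obs, threshold
```

The `cumprod` trick counts only the leading run of eigenvalues above the threshold, so a later chance crossing doesn't inflate the factor count.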

LtStarbuck

New Member
Hey, thank you so much for the help!
Luckily, no bazillion iterations were needed, only 6, haha.
So I just ran the analysis, and except for 3 items, all of my 45 items have their highest loading on the expected one of the 4 factors in the pattern matrix. In other words, except for 3 items, the assignments match up, although a few items have very high (but not highest) loadings on other factors as well. For the structure matrix, the pattern is almost the same: only one highest factor loading does not match the 4 subscales, and it's one of the 3 items already mentioned for the pattern matrix. But in the structure matrix, some items also have high (but not highest) loadings on the other factors.
So can I say that, for all except 3 items, the 4 subscales can be confirmed?

I have 40% explained variance with 4 factors (but there would be 10 factors with eigenvalues > 1) and 35% explained variance for the squared factor loadings. I don't know if that is a lot, but I don't think so?
Also, the factor correlation matrix shows values as high as .4 between some factors. Not a good sign for independent subscales, right?


spunky

Smelly poop man with doo doo pants.
Hey, thank you so much for the help!
Luckily, no bazillion iterations were needed, only 6, haha.
So I just ran the analysis, and except for 3 items, all of my 45 items have their highest loading on the expected one of the 4 factors in the pattern matrix. In other words, except for 3 items, the assignments match up, although a few items have very high (but not highest) loadings on other factors as well. For the structure matrix, the pattern is almost the same: only one highest factor loading does not match the 4 subscales, and it's one of the 3 items already mentioned for the pattern matrix. But in the structure matrix, some items also have high (but not highest) loadings on the other factors.
So can I say that, for all except 3 items, the 4 subscales can be confirmed?

I have 40% explained variance with 4 factors (but there would be 10 factors with eigenvalues > 1) and 35% explained variance for the squared factor loadings. I don't know if that is a lot, but I don't think so?
well, strictly speaking you can't really confirm anything using exploratory factor analysis, right? you'd need to do confirmatory factor analysis to answer this question, and you already mentioned you don't wanna step outside the boundaries of SPSS, so it is what it is. the eigenvalues>1 rule tends to overextract most of the time. how many factors did parallel analysis say you could have? how big is your sample? how are these items being scored? i'm assuming on some sort of Likert-type scale? if so, how many response options did the people have?

Also, the factor correlation matrix shows values as high as .4 between some factors. Not a good sign for independent subscales, right?
nope, it ain't looking good in terms of "independent" subscales. which is reasonable, i guess. it's very difficult to find things that are uncorrelated with real data.

LtStarbuck

New Member
The help here is really great, thank you a lot! Also, I apologize if I ask mundane questions; I'm really not very deep into statistics :-/

how many factors did parallel analysis say you could have? how big is your sample? how are these items being scored? i'm assuming on some sort of Likert-type scale? if so, how many response options did the people have?
Some data:
Sample size: 333
Items: 45 (all on a 4-point Likert scale)

I'm not sure if I understand the parallel analysis right; here is what I did:
Number of Variables in your Dataset to be Factor Analyzed: 45
Sample Size of Your dataset: 333
Type of analysis: 1
Number of Random Correlation Matrices to Generate: 111
Percentile of Eigenvalues: 95
Seed: 1000

It goes to root 21 (of 45) before the mean (that's the eigenvalue, right?) goes under 1.0.

nope, it ain't looking good in terms of "independent" subscales. which is reasonable, i guess. it's very difficult to find things that are uncorrelated with real data.
Yeah, I guess that's just the natural thing to find. It's always hard to trust your own data when they go a different way than expected, because so many oh-so-great studies showed differently. Of course, that's the only reason they were published in the first place: nobody wants to read non-significant results (at least that's what many papers seem to imply, sadly).

spunky

Smelly poop man with doo doo pants.
The help here is really great, thank you a lot! Also, I apologize if I ask mundane questions; I'm really not very deep into statistics :-/

Some data:
Sample size: 333
Items: 45 (all on a 4-point Likert scale)

I'm not sure if I understand the parallel analysis right; here is what I did:
Number of Variables in your Dataset to be Factor Analyzed: 45
Sample Size of Your dataset: 333
Type of analysis: 1
Number of Random Correlation Matrices to Generate: 111
Percentile of Eigenvalues: 95
Seed: 1000

It goes to root 21 (of 45) before the mean (that's the eigenvalue, right?) goes under 1.0.
you almost got it right. what you have to do now is check the eigenvalues that SPSS reports from your data and compare those to the numbers that appear in the 'Means' column. the logic behind this is that you only keep the factors whose eigenvalues from SPSS are greater than the ones in the 'Means' column. so, for example, if your eigenvalues from SPSS were 5, 4, 3 and then they went down to 0.000001, 0.000000001, 0.000000001, and so on, you would compare those to the 'Means' column and would only keep the first 3 (the hypothetical 5, 4 and 3). in your case you already have a theoretical reason for why you should use 4 factors, so that in itself should trump any statistical rule about how many factors you should have. but i guess it's always good to have extra reassurances.

because your data are ordinal and not continuous, you could potentially be getting weird results because you're not treating your data as categorical and doing your factor analysis on the polychoric correlation matrix, which is the one you should be using. but, once again, SPSS will not calculate that one for you (did i mention that you could do this in R for free? just saying... )

but since you've already admitted that statistics is not your forte, we'll just leave it like that.

Yeah, I guess that's just the natural thing to find. It's always hard to trust your own data when they go a different way than expected, because so many oh-so-great studies showed differently. Of course, that's the only reason they were published in the first place: nobody wants to read non-significant results (at least that's what many papers seem to imply, sadly).
well, there are some places now where researchers with null findings publish them (like here: http://www.jasnh.com/) because those could also be informative. but, in general, you're right: nobody (particularly in the social sciences) really cares much about stuff that happens if you don't get p <.05

maybe as a reassurance i would be willing to say that, from a data-analytic perspective, you're not going about this quite right. as i mentioned to you, your research question (are my subscales independent of one another?) does not quite match the analysis that you're doing (exploratory factor analysis). you have a well-defined hypothesis that you could test (i.e. the null hypothesis H0: factor correlations = 0) if you were using the correct methodology (structural equation modelling/confirmatory factor analysis). but because you don't want to/can't step outside of SPSS, we sort of have to do some hand-waving, close our eyes and 'pretend' that exploratory factor analysis can masquerade as a confirmatory technique. the crux of the issue is that you don't have a standard error for those factor correlations, so you cannot test whether or not they are 0 in the population. they're on the bigger side of things, though (r=0.4), but it could happen. or you could do some more advanced methods (a latent mediation model) but then again that implies you can do structural equation modelling.

all in all this is just to say that you could be right and what you expect about your scales could be true, but you'd need to use the proper analytic tools to discover this.
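To make the "no standard error" point concrete: for an ordinary correlation between observed scores (say, subscale sum scores), H0: r = 0 can be tested with the classic Fisher z-transform. This sketch uses the thread's r = .4 and n = 333 as assumed inputs; note it does NOT apply to the EFA factor correlations discussed above, which is exactly the problem.

```python
import math

def fisher_z_test(r, n):
    """Test H0: population correlation = 0 for an observed sample
    correlation r with n cases, via the Fisher z-transform.
    Returns the z statistic and a two-sided p-value (normal approximation)."""
    z = 0.5 * math.log((1 + r) / (1 - r))   # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)             # approximate standard error
    stat = z / se
    # two-sided p-value from the standard normal distribution
    p = math.erfc(abs(stat) / math.sqrt(2))
    return stat, p

# with the thread's numbers: r = .4, n = 333 -> p far below .05
stat, p = fisher_z_test(0.4, 333)
```

With n = 333, an observed r of .4 is overwhelmingly significant, which is consistent with spunky's reading that the subscales don't look independent.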

LtStarbuck

New Member
Oh statistics, it's so complex sometimes, haha.
So here is my comparison (eigenvalues from my data on the left, random-data eigenvalues from the website on the right):

My Data ..... Random Data
7.97 ........... 1.79
4.84 ........... 1.70
3.18 ........... 1.63
2.41 ........... 1.58
1.41 ........... 1.53

So the first four eigenvalues from my own data are bigger than the random ones; after that, they are smaller.
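The retention rule can be written out mechanically. Applied to the five eigenvalue pairs listed in this post (observed vs. random), it keeps exactly four factors, matching the four subscales:

```python
# eigenvalue pairs from the post: (observed, random 95th-percentile)
pairs = [(7.97, 1.79), (4.84, 1.70), (3.18, 1.63), (2.41, 1.58), (1.41, 1.53)]

n_keep = 0
for observed, random_ev in pairs:
    if observed <= random_ev:   # stop at the first eigenvalue the random data beats
        break
    n_keep += 1

print(n_keep)  # prints 4: four factors retained
```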

About your SPSS-only comment: you are totally right. But I have to admit that my calculations are causing me enough (mental) trouble already, so another application to learn now would be a disaster for my schedule :-D. I also realise that this is probably not the best way to work with data, but that's how it is. :-/


LtStarbuck

New Member
Hey, I have another question. I just read about collinearity in Andy Field's book. Isn't that something I can use? It's there to check the independence of my variables as well, right?

spunky

Smelly poop man with doo doo pants.
Hey, I have another question. I just read about collinearity in Andy Field's book. Isn't that something I can use? It's there to check the independence of my variables as well, right?
collinearity as in multiple regression? that just means your variables are correlated. if you're doing factor analysis, you need some of that.

LtStarbuck

New Member
Yes, exactly. Remember that the EFA was one of my prof's suggestions for checking the independence of my subscales. But the independence mostly matters for deciding how I implement them in my mediation (one by one separately, or all together; the latter method would require the IVs to be independent/uncorrelated).
So instead of going down the road of different methods that don't seem too useful anyway, as you pointed out for the EFA, I could just check for (multi)collinearity among my IVs.

LtStarbuck

New Member
I actually had some free time to do something with AMOS (yes, we have it on campus; I did not know), and I can run a CFA.
Since I have four subscales that are supposed to be independent, is this the model I need for my analysis, including the covariances?


spunky

Smelly poop man with doo doo pants.
oh god, i had forgotten how horrible AMOS is.

anyhoo, the key here would be to see what the p-values of the correlations between those latent factors are. if significant = subscales NOT independent.

you could also test the model where you constrain the factors to be orthogonal VS a model where they're freely estimated and see which model fits better. i bet my brownies the model with freely estimated correlations fits better.
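The comparison of the orthogonal and freely estimated models is usually done with a chi-square difference (likelihood-ratio) test: fit both models, subtract their chi-squares and degrees of freedom, and look up the difference. A sketch follows; the chi-square values are made-up placeholders, while the df arithmetic assumes the thread's 45-item, 4-factor model with factor variances fixed to 1 (1035 observed moments; 45 loadings + 45 residual variances + 6 factor correlations gives df = 939 for the free model, 945 for the orthogonal one, so Δdf = 6).

```python
import math

def chi2_sf(x, df):
    """Survival function of the chi-square distribution for EVEN df,
    using the closed form P(X > x) = exp(-x/2) * sum_{k<df/2} (x/2)^k / k!."""
    assert df % 2 == 0, "closed form used here only covers even df"
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def chi2_difference_test(chisq_constrained, df_constrained,
                         chisq_free, df_free):
    """Likelihood-ratio test between a constrained model (e.g. orthogonal
    factors) and a freely estimated model. Returns (delta chi2, delta df, p)."""
    d_chisq = chisq_constrained - chisq_free
    d_df = df_constrained - df_free
    return d_chisq, d_df, chi2_sf(d_chisq, d_df)

# placeholder fit statistics: orthogonal CFA chi2 = 1480 on 945 df,
# correlated-factors CFA chi2 = 1390 on 939 df -> delta chi2 = 90, delta df = 6
d, ddf, p = chi2_difference_test(1480.0, 945, 1390.0, 939)
```

A significant p here would mean the orthogonality constraint hurts fit, i.e. the freely correlated model wins, which is what spunky is betting his brownies on.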

LtStarbuck

New Member
oh god, i had forgotten how horrible AMOS is.
Although I had never worked with it before (and never will again), I can say it is really hideous and a total nightmare ergonomically.

Anyway, I have been looking for the correlations and found them under "Estimates/Scalars/Correlations". Are those the ones I am looking for? I have all my estimates there, but no p-values. Am I looking in the wrong place?

How can I make the models with orthogonal vs. freely estimated factors? (Betting brownies is a good idea, I think :-D )