Factor analysis

noetsi

Fortran must die
#1
I have a personal view that some of 32 spending types influence a dependent variable. But 32 different types of spending seem like too many variables (there are many other predictors to use besides these). I have no real academically based theory to build on. I was going to use factor analysis to reduce the 32 variables to a more manageable number.

The only way I can think of to do this is to see how much a customer spends in each category and run correlations between the spending they get (in factor analysis). I don't know if that is a valid way to do factor analysis (the theory would be that something unmeasured lies behind spending, so that if you spend high on X1 and X2 there is something behind that). All the factor analysis I have done before used Likert data for satisfaction, never spending.
 

spunky

Doesn't actually exist
#2
For the type of problem you're describing, Principal Component regression seems more appropriate. It's almost like Factor Analysis, but Principal Components does exactly what you're interested in doing: it takes a large collection of variables and extracts the 2-4 or so components (linear combinations of those variables) that explain the majority of the variance.

For Factor Analysis you need to take the extra step of assuming the existence of latent variables that explain the covariances among your variables, but it appears that you simply want a principled "dimension reduction" method. So Principal Components can take care of that.
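
A minimal sketch of the dimension-reduction step described above, using NumPy. The data are simulated and every name here (`pca`, `spending`, `habits`, the 500 customers, the 3 underlying habits) is an assumption made for illustration, not something from the thread. Columns are standardized first, so this is PCA on the correlation matrix.

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD on standardized columns (equivalent to using the correlation matrix)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)        # share of total variance per component
    scores = Z @ Vt[:n_components].T       # component scores, one row per customer
    return scores, Vt[:n_components], explained

# Hypothetical data: 500 customers, 32 spending categories,
# generated from 3 made-up spending habits plus noise.
rng = np.random.default_rng(0)
n_customers, n_categories = 500, 32
habits = rng.normal(size=(n_customers, 3))
weights = rng.normal(size=(3, n_categories))
spending = habits @ weights + 0.5 * rng.normal(size=(n_customers, n_categories))

scores, loadings, explained = pca(spending, n_components=3)
print(scores.shape)              # reduced data: customers x components
print(explained[:5].round(3))    # variance share of the first five components
```

The `explained` vector is what you would inspect (for example in a scree plot) to decide how many components to keep; the 2-4 range mentioned above is a rule of thumb, not a guarantee.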
 

noetsi

Fortran must die
#4
thanks spunky and jake.

I think in practice what I called factor analysis is principal component extraction. Both generate factors that variables load on; principal components is an option inside the factor analysis procedure for extracting these factors.

Is it valid to use spending as a way to generate covariance among the variables? (I have not seen that done, in all honesty.)

I don't know partial least squares regression. I will have to look that up.
 

spunky

Doesn't actually exist
#5
Is it valid to use spending as a way to generate covariance among the variables? (I have not seen that done, in all honesty.)
I... don't really think I follow what you're asking here. I mean, sure. If there are all these ways to measure spending and all relate to it, perhaps (hopefully) you're going to end up with only 1 or 2 principal components that account for most of the variance. And since they are uncorrelated/orthogonal by construction, you'll be able to clearly see which clusters of variables contribute most to your prediction. I guess it really depends on how these other 32 variables related to spending are being measured.
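
To make the "uncorrelated by construction" point concrete, here is a rough sketch of principal component regression on simulated data. Everything in it is assumed for illustration: the two latent drivers, the outcome `y`, and the choice of `k = 2` components. It checks that the component scores are uncorrelated and then regresses the outcome on them with ordinary least squares.

```python
import numpy as np

# Hypothetical data: 400 customers, 32 spending variables, 2 made-up latent drivers.
rng = np.random.default_rng(1)
n, p, k = 400, 32, 2
latent = rng.normal(size=(n, k))
X = latent @ rng.normal(size=(k, p)) + 0.5 * rng.normal(size=(n, p))
y = 2.0 * latent[:, 0] - 1.0 * latent[:, 1] + rng.normal(size=n)

# Step 1: principal component scores from the standardized predictors.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
scores = Z @ Vt[:k].T

# Step 2: the scores are uncorrelated, so the off-diagonal entries are ~0.
score_corr = np.corrcoef(scores, rowvar=False)
print(score_corr.round(6))

# Step 3: regress the outcome on the k scores (plus an intercept).
design = np.column_stack([np.ones(n), scores])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
fitted = design @ coef
r_squared = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
print(coef.round(3), round(r_squared, 3))
```

To see which of the original 32 variables drive each component, you would look at the rows of `Vt[:k]` (the loadings).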
 

noetsi

Fortran must die
#6
What I meant is that I have 32 types of spending. I want to reduce that to a small number. The only thing these factors vary on is spending, of course. The examples of factor analysis or PCE I have looked at always seem to use some form of Likert scale to generate covariance. I have literally never seen what I am doing used to generate covariance.
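
On the mechanics of the question: a correlation matrix can be computed from continuous amounts in the same way as from Likert items, since all the extraction step needs is that matrix. The sketch below uses simulated, right-skewed spending; the data, the six categories, and the `log1p` transform are all assumptions for illustration (the transform is a common choice for skewed amounts, not something proposed in the thread).

```python
import numpy as np

# Hypothetical data: 300 customers, 6 spending categories in dollars,
# positive and right-skewed, sharing one made-up common driver.
rng = np.random.default_rng(2)
n_customers, n_categories = 300, 6
base = rng.normal(size=(n_customers, 1))
log_spend = 3.0 + 0.8 * base + 0.6 * rng.normal(size=(n_customers, n_categories))
spending = np.exp(log_spend)

# Assumed preprocessing: log1p tames the skew and tolerates zero spending.
transformed = np.log1p(spending)

# Correlation matrix among categories, then its eigenvalues (largest first).
corr = np.corrcoef(transformed, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)[::-1]
share = eigenvalues / eigenvalues.sum()
print(corr.round(2))
print(share.round(3))
```

Whether the resulting components are meaningful still depends on how the spending variables are measured, as noted earlier in the thread.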