Item Response Question

hlsmith

Omega Contributor
#1
I have received some survey data for 200 respondents.

DV: Binary (20=1; 180=0)
20 IVs: 5-point Likert items, currently treated as continuous, but the data are inflated at a score of 5.


I have been asked which IVs are associated with the binary outcome. These data are clearly sparse, given the rare outcome and the lack of dispersion in the IVs. I ran 20 bivariate logistic models with Firth correction and alpha = 0.05/20. Most of the variables come up significant, though I believe I need to account for collinearity between variables, since several questions may be measuring the same thing.
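A minimal sketch of that screening step, assuming numpy is available: one unadjusted Firth-penalized logistic model per item, each judged at the Bonferroni cutoff 0.05/20. The function names firth_logit and screen_items are hypothetical, and the p-values here are Wald p-values for simplicity; dedicated implementations such as R's logistf default to profile penalized-likelihood tests and intervals, which tend to behave better with rare outcomes.

```python
import math
import numpy as np


def firth_logit(X, y, max_iter=200, tol=1e-8, max_step=5.0):
    """Firth-penalized logistic regression via modified-score Newton iterations.

    X: (n, k) design matrix that already contains an intercept column.
    y: (n,) array of 0/1 outcomes. Returns coefficients and Wald standard errors.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        info_inv = np.linalg.inv(X.T @ (X * w[:, None]))  # (X'WX)^-1
        # Diagonal of the hat matrix H = W^1/2 X (X'WX)^-1 X' W^1/2
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth-modified score: U*(beta) = X'(y - p + h(0.5 - p))
        step = info_inv @ (X.T @ (y - p + h * (0.5 - p)))
        largest = np.max(np.abs(step))
        if largest > max_step:  # damp very large Newton steps
            step *= max_step / largest
        beta = beta + step
        if largest < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    se = np.sqrt(np.diag(np.linalg.inv(X.T @ (X * w[:, None]))))
    return beta, se


def screen_items(items, y, alpha=0.05):
    """Fit one unadjusted Firth model per item and apply a Bonferroni cutoff."""
    n, m = items.shape
    cutoff = alpha / m
    rows = []
    for j in range(m):
        X = np.column_stack([np.ones(n), items[:, j]])
        beta, se = firth_logit(X, y)
        z = beta[1] / se[1]
        p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided Wald p-value
        rows.append((j, beta[1], p_value, p_value <= cutoff))
    return cutoff, rows


# Usage on simulated data (not the survey): 200 respondents, 20 items, ~10% events.
rng = np.random.default_rng(0)
items = rng.integers(1, 6, size=(200, 20)).astype(float)
y = (rng.random(200) < 0.1).astype(float)
cutoff, rows = screen_items(items, y)
```

The Firth penalty keeps the estimates finite even when an item completely separates events from non-events, which is the usual reason to prefer it over plain maximum likelihood with only 20 events.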


I am guessing I could look at VIF and tolerance statistics or a correlation matrix to better understand how the items are inter-related; I am getting ready to do this. However, is there a factor analysis or some other method I can use to build a composite, or to parcel the 20 down into just a couple of variables that are the most telling? I don't want to overfit the model, since I know the confidence intervals will easily blow up.
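A short sketch of the VIF/tolerance check, assuming numpy: the VIF for item j equals the j-th diagonal element of the inverse of the item correlation matrix, and tolerance is its reciprocal (1 - R_j^2 from regressing item j on the other items). The function name vif_and_tolerance is hypothetical.

```python
import numpy as np


def vif_and_tolerance(items):
    """Return the item correlation matrix, VIFs, and tolerances.

    VIF_j = [R^-1]_jj for correlation matrix R; tolerance_j = 1 / VIF_j.
    """
    corr = np.corrcoef(items, rowvar=False)  # item-by-item Pearson correlations
    vif = np.diag(np.linalg.inv(corr))
    return corr, vif, 1.0 / vif


# Usage on simulated columns (not the survey): c is nearly a + b, so it is redundant.
rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + b + 0.1 * rng.normal(size=200)
corr, vif, tol = vif_and_tolerance(np.column_stack([a, b, c]))
```

A common rule of thumb flags VIF above 5 or 10 (tolerance below 0.2 or 0.1), though those cutoffs are conventions rather than tests.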


Thanks.
 

spunky

#2
If you're interested in reducing the dimension of your explanatory variables (20 IVs), then principal component regression (based on principal component analysis) would be a better choice. That is, unless you have some reason to believe the 20 IVs are manifest variables reflecting some latent factor.

Remember, Factor Analysis assumes a statistical model that has a different purpose from Principal Components and, from the way you describe the problem, it seems like PCA is what you want.

I have not heard of principal component regression in the context of logistic regression before, but it seems like a common-enough problem that someone has surely addressed it somewhere.
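A minimal sketch of the PCA step behind principal component regression, assuming numpy: standardize the items, take the SVD, and keep the first few component scores. Those scores would then replace the 20 raw items as predictors in a logistic (or Firth) model. The function name and the choice of how many components to keep are assumptions; the components are built without looking at the outcome, so the ones explaining the most item variance are not guaranteed to be the most predictive.

```python
import numpy as np


def principal_component_scores(items, n_components):
    """Standardize items, run PCA via the SVD, and return component scores."""
    z = (items - items.mean(axis=0)) / items.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)      # share of total variance per component
    loadings = vt[:n_components]         # rows: component weights on each item
    scores = z @ loadings.T              # respondent-level component scores
    return scores, loadings, explained


# Usage on simulated data (not the survey): 200 respondents, 20 items.
rng = np.random.default_rng(3)
items = rng.integers(1, 6, size=(200, 20)).astype(float)
scores, loadings, explained = principal_component_scores(items, n_components=3)
```

Because the component scores are mutually uncorrelated, feeding a handful of them into the logistic model sidesteps the collinearity among the original items.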
 

hlsmith

Omega Contributor
#3
The item responses are definitely correlated, and the items already fall into 5 thematic categories. Let me know what you think of the following plan. I was asked to find out which variables are related to the binary outcome, and most of them are, so I am planning to run 5 stepwise regressions, one per category, each set to keep only the best predictor. I would then report the 5 final bivariate models, i.e. the best predictor for each category.
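If the goal is only to keep the single best item per category, a stepwise procedure can be replaced by a direct comparison. The sketch below, assuming numpy, ranks the items within each theme by the c-statistic of a one-item logistic model. That model's linear predictor is monotone in the item, so its c-statistic is the item's own AUC, flipped when the fitted slope is negative (for ordinary maximum likelihood the slope takes the sign of the event/non-event mean difference). The theme-to-column mapping is hypothetical, and in-sample c-statistics chosen this way will be optimistic.

```python
import numpy as np


def c_statistic(x, y):
    """Concordance (AUC) of score x for binary outcome y; ties count one half."""
    x1, x0 = x[y == 1], x[y == 0]
    diff = x1[:, None] - x0[None, :]  # every event / non-event pair
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


def univariate_c(x, y):
    """c-statistic of a one-item logistic model (assuming a nonzero slope).

    The slope's sign matches the mean difference between events and non-events.
    """
    auc = c_statistic(x, y)
    return auc if x[y == 1].mean() >= x[y == 0].mean() else 1.0 - auc


def best_item_per_theme(items, y, themes):
    """themes: hypothetical mapping of theme name -> column indices of its items."""
    best = {}
    for name, cols in themes.items():
        c, j = max((univariate_c(items[:, j], y), j) for j in cols)
        best[name] = (j, c)
    return best


# Usage on simulated data (not the survey), with a hypothetical 4-items-per-theme split.
rng = np.random.default_rng(2)
items = rng.integers(1, 6, size=(200, 20)).astype(float)
y = (rng.random(200) < 0.1).astype(int)
themes = {f"theme_{t + 1}": list(range(4 * t, 4 * t + 4)) for t in range(5)}
best = best_item_per_theme(items, y, themes)
```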


I will divide my significance level by 25 (a Bonferroni correction) to address multiple testing. What should I use as the selection (inclusion) criterion in my stepwise procedure? I am thinking of accuracy (the c-statistic).
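Dividing alpha by the number of tests is a Bonferroni correction, which controls the familywise error rate; if the false discovery rate is the actual target, the usual procedure is Benjamini-Hochberg. A small sketch of both, assuming numpy; with the 25 tests mentioned above, the Bonferroni cutoff would be 0.05/25 = 0.002.

```python
import numpy as np


def bonferroni(pvals, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the familywise error rate)."""
    pvals = np.asarray(pvals, dtype=float)
    return pvals <= alpha / pvals.size


def benjamini_hochberg(pvals, q=0.05):
    """Step-up Benjamini-Hochberg procedure (controls the FDR at level q)."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)
    thresholds = q * np.arange(1, m + 1) / m  # i * q / m for rank i = 1..m
    below = pvals[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank with p_(i) <= i * q / m
        reject[order[: k + 1]] = True   # reject that p-value and all smaller ones
    return reject
```

Benjamini-Hochberg is less conservative: with p-values 0.04 and 0.04 it rejects both at q = 0.05, while Bonferroni (cutoff 0.025) rejects neither.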


This seems slightly less like a general fishing expedition.
 
#4
Agreed, it offers a better choice.
 
#5
I think it's the only choice. :)