# Item Response Question

#### hlsmith

##### Omega Contributor
I have received some survey data for 200 respondents.

DV: binary (20 respondents coded 1; 180 coded 0)
IVs: 20 items on a 5-point Likert scale, currently treated as continuous, although the responses are inflated at a score of 5.

I have been asked which IVs are associated with the binary outcome. There is obvious sparsity in these data, given the rare outcome and the lack of dispersion in the IVs. I ran 20 bivariate logistic models with Firth correction and alpha = 0.05/20. Most of the variables come up significant, though I believe I need to control for collinearity between variables, since the questions could be saying the same thing.
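For readers who want to see what the bivariate Firth step might look like, here is a minimal NumPy sketch. It is not the code used in this thread: the function names `firth_logit` and `wald_p`, the step cap, and the simulated data are all assumptions for illustration, and it reports simple Wald p-values, whereas dedicated Firth software typically offers penalized likelihood ratio tests.

```python
import math
import numpy as np

def firth_logit(X, y, max_iter=200, tol=1e-8, max_step=5.0):
    """Firth-penalized logistic regression; X must include an intercept column.

    Returns coefficient estimates and Wald standard errors.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        info_inv = np.linalg.inv(X.T @ (X * w[:, None]))  # (X'WX)^-1
        # Leverages h_i = w_i * x_i' (X'WX)^-1 x_i
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth's modified score: X'(y - p + h * (1/2 - p))
        step = info_inv @ (X.T @ (y - p + h * (0.5 - p)))
        step = np.clip(step, -max_step, max_step)  # crude safeguard (assumption)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors from the Fisher information at the final estimate
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = p * (1.0 - p)
    se = np.sqrt(np.diag(np.linalg.inv(X.T @ (X * w[:, None]))))
    return beta, se

def wald_p(estimate, se):
    """Two-sided Wald p-value from a normal approximation."""
    return math.erfc(abs(estimate / se) / math.sqrt(2.0))

# Simulated stand-in data (NOT the survey): 200 respondents, 20 items
# piled up at 5, and a rare outcome that is unrelated to the items.
rng = np.random.default_rng(0)
n, n_items = 200, 20
items = rng.choice([3, 4, 5], size=(n, n_items), p=[0.1, 0.2, 0.7]).astype(float)
outcome = (rng.random(n) < 0.10).astype(float)

alpha = 0.05 / n_items  # Bonferroni threshold, as described in the post
for j in range(n_items):
    design = np.column_stack([np.ones(n), items[:, j]])
    beta, se = firth_logit(design, outcome)
    p_value = wald_p(beta[1], se[1])
    flag = "*" if p_value < alpha else ""
    print(f"item {j + 1:2d}: beta={beta[1]: .3f}  p={p_value:.4f} {flag}")
```

A useful property of the penalty is that the estimates stay finite even when one level of a predictor contains only events or only non-events, which is the situation sparse data like these tend to produce.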

I am guessing I could look at VIF and tolerance statistics or a correlation matrix to better figure out their inter-relationships, and I am getting ready to do this. But is there a factor analysis or something else that I can use to get a composite cluster, or a way to parcel the 20 down into just a couple of variables that are the most telling? I don't want to overfit the model, since I know the confidence intervals will easily get blown out of control.
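As a rough illustration of the collinearity check, the sketch below computes the correlation matrix, VIF, and tolerance together. It relies on the identity that the VIFs are the diagonal of the inverse correlation matrix, with tolerance as the reciprocal. The function name `vif_and_tolerance` and the simulated items are assumptions, not anything from the actual survey.

```python
import numpy as np

def vif_and_tolerance(items):
    """Return (correlation matrix, VIF, tolerance) for the columns of `items`.

    VIF_j is the j-th diagonal element of the inverse correlation matrix,
    and tolerance_j = 1 / VIF_j = 1 - R^2 of item j on the other items.
    """
    items = np.asarray(items, dtype=float)
    corr = np.corrcoef(items, rowvar=False)
    vif = np.diag(np.linalg.inv(corr))
    return corr, vif, 1.0 / vif

# Simulated stand-in data (NOT the survey): items 0 and 1 are close to
# restatements of each other, item 2 is unrelated to both.
rng = np.random.default_rng(1)
base = rng.normal(size=200)
demo_items = np.column_stack([
    base,
    base + rng.normal(scale=0.2, size=200),
    rng.normal(size=200),
])
corr, vif, tol = vif_and_tolerance(demo_items)
for j, (v, t) in enumerate(zip(vif, tol)):
    print(f"item {j}: VIF={v:.2f}  tolerance={t:.3f}")
```

Items that restate one another show up with large VIFs and tolerances near zero, which gives a quick first look before deciding on any data reduction.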

Thanks.

#### spunky

##### Smelly poop man with doo doo pants.
If you're interested in reducing the dimension of your explanatory variables (20 IVs), then principal component regression (based on principal component analysis) would be a better choice. That is, unless you have some reason to believe the 20 IVs are manifest variables relating to some type of latent factor.

Remember, Factor Analysis assumes a statistical model that has a different purpose from Principal Components and, from the way you describe the problem, it seems like PCA is what you want.

I have not come across principal component regression in the context of logistic regression before, but it seems like a common enough problem that someone has surely come up with something somewhere.
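To make the idea concrete, here is a minimal sketch of what principal component logistic regression could look like: standardize the items, keep the first few component scores, and use those as predictors in an ordinary logistic fit. Everything here is an assumption for illustration, including the helper names `pca_scores` and `logit_fit`, the choice of two components, and the simulated data. With an outcome as rare as the one described, a penalized fit may be preferable to the plain maximum likelihood fit shown.

```python
import numpy as np

def pca_scores(items, n_components):
    """Standardize the columns, then return the leading PC scores and the
    share of total variance each retained component explains."""
    items = np.asarray(items, dtype=float)
    z = (items - items.mean(axis=0)) / items.std(axis=0, ddof=1)
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return z @ vt[:n_components].T, explained[:n_components]

def logit_fit(X, y, max_iter=50, tol=1e-10):
    """Plain maximum likelihood logistic regression via Newton steps."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated stand-in data (NOT the survey): 20 Likert-style items driven by
# one common trait with a ceiling at 5, and a rare outcome tied to the trait.
rng = np.random.default_rng(2)
n = 200
trait = rng.normal(size=n)
raw = 4.2 + 0.8 * trait[:, None] + rng.normal(scale=0.8, size=(n, 20))
items = np.clip(np.round(raw), 1, 5)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(2.6 - trait))).astype(float)

scores, explained = pca_scores(items, n_components=2)  # 2 is an arbitrary choice
design = np.column_stack([np.ones(n), scores])
beta = logit_fit(design, y)
print("variance explained by retained components:", np.round(explained, 3))
print("intercept and component coefficients:", np.round(beta, 3))
```

The trade-off is interpretability: each coefficient describes a weighted blend of all 20 items rather than a single question, so the loadings have to be inspected to say which items drive a component.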

#### hlsmith

##### Omega Contributor
The item responses are definitely correlated, and the items already fall into 5 categorical themes. Let me know what you think about the following. I was asked to find out which variables are related to the binary outcome; well, most of them are. So I am planning to run 5 stepwise regressions, one for each category, and tell each to keep only the best predictor. I would then report the best predictor for each category via the 5 final bivariate models.

I will divide my level of significance by 25 to address FDR. What should I use as the selection (inclusion) criterion in my stepwise procedure? I am thinking accuracy (c-statistic).
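One way to picture "keep the best predictor per category" with the c-statistic as the criterion is sketched below. Note that this ranks single items directly and is not a stepwise regression. For a one-predictor logistic model the fitted probabilities are a monotone function of the item, so the item's own concordance can stand in for the model's c-statistic. The names `c_statistic` and `best_per_theme`, the split of the 20 items into 5 themes of 4, and the simulated data are all assumptions; the thread does not say how the items are grouped.

```python
import numpy as np

def c_statistic(x, y):
    """Concordance between one predictor and a binary outcome.

    Counts event/non-event pairs where the event has the higher score,
    with ties counted as one half. Reported orientation-free, so a
    protective item scores as well as a risk item of equal strength.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    diff = x[y == 1][:, None] - x[y == 0][None, :]
    auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size
    return max(auc, 1.0 - auc)

def best_per_theme(items, y, themes):
    """For each theme, return (column index, c-statistic) of its top item."""
    best = {}
    for name, cols in themes.items():
        stats = [c_statistic(items[:, j], y) for j in cols]
        k = int(np.argmax(stats))
        best[name] = (cols[k], stats[k])
    return best

# Simulated stand-in data (NOT the survey); the theme layout is hypothetical.
rng = np.random.default_rng(3)
n = 200
demo_items = rng.choice([3, 4, 5], size=(n, 20), p=[0.1, 0.2, 0.7]).astype(float)
demo_y = (rng.random(n) < 0.10).astype(int)
demo_themes = {f"theme_{t + 1}": list(range(4 * t, 4 * t + 4)) for t in range(5)}

alpha = 0.05 / 25  # the adjusted threshold proposed in the post
for name, (col, c) in best_per_theme(demo_items, demo_y, demo_themes).items():
    print(f"{name}: best item = column {col}, c-statistic = {c:.3f}")
```

Because the winner in each theme is picked after looking at the outcome, its c-statistic and p-value will tend to look better than they would in new data, so some form of resampling check would be worth considering.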

This seems slightly less like a general fishing expedition.

#### WyattTankersley

##### New Member
Agreed, this offers a better choice.

#### Beth641

##### New Member
I think it's the only choice.