Collinearity in Logistic Regression


TS Contributor
If my independent variables are correlated in a linear regression, I can first check for it using the VIF and then deal with it (PLS or other methods).

What do I do if I have a logistic model? How do I check for collinearity, and how do I solve it if it exists?

Thank you!
You can check for collinearity in logistic regression the same way as you would for linear regression, i.e. just run a linear regression with the same predictors and dependent variable you are using for the logistic model. You are only running it to get the collinearity statistics, which you then interpret in the same way. This works because VIF and tolerance are computed from the predictors alone, so the type of outcome does not affect them.
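As a rough illustration of the check described above, here is a minimal sketch in Python using only numpy. It relies on the standard result that the VIFs are the diagonal of the inverse of the predictors' correlation matrix, so no outcome variable is needed at all. The function name `vif` and the simulated predictors `x1`, `x2`, `x3` are made up for the example and are not from the original poster's data.

```python
import numpy as np

def vif(X):
    """Return the VIF of each column of X (rows = cases, columns = predictors).

    Uses the identity VIF_j = j-th diagonal element of the inverse of the
    predictor correlation matrix. The outcome variable is not involved.
    """
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)   # p x p correlation matrix
    return np.diag(np.linalg.inv(corr))

# Hypothetical data: x1 and x2 correlate at roughly 0.6, x3 is unrelated.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + 0.8 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

vifs = vif(X)
for name, value in zip(["x1", "x2", "x3"], vifs):
    print(f"{name}: VIF = {value:.2f}")
```

If you prefer to stay in your usual stats package, running the linear regression as described and reading off its VIF/tolerance column should give the same numbers.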

Others may be able to comment more on this, but here are some suggestions for solving it (each with its own issues): remove one of the collinear measures (obviously not ideal!); just leave it in and comment on it as an issue in your report; or factor analyse the collinear measures and use the resulting factor score in their place.

Hope this helps :)


TS Contributor
I was thinking about factor analysis or principal components, but I can only do that when I have many independent variables. If I only have a few, I might get stuck with 1 or 2 factors.
I agree that principal components would be the way to go if you were going to try this (whilst examining, as lumhearts said, how correlated the variables actually are to see if it is a major problem). I have not done principal components with only two variables (so others may be able to comment on the validity of this), but I don't see why you couldn't just put them in and ask for only 1 component, especially if you just want a component score for a couple of variables that you know are highly collinear. This would create a single standardised score for just those two variables.
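To make the "ask for only 1 component" idea concrete, here is a minimal numpy-only sketch. It standardises the two variables, takes the first principal component via an SVD, and rescales the score to unit variance. The function name `first_component_score` and the simulated variables are assumptions for illustration; whether a single component is adequate for your data is a separate question.

```python
import numpy as np

def first_component_score(X):
    """Return (score, loadings, explained) for the first principal component.

    X is standardised first, so this is a PCA of the correlation matrix.
    score     : standardised component score, one value per case
    loadings  : weights applied to the standardised variables
    explained : proportion of total variance carried by the component
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[0]
    if loadings.sum() < 0:          # fix the arbitrary sign of the component
        loadings = -loadings
    score = Z @ loadings
    explained = s[0] ** 2 / np.sum(s ** 2)
    return score / score.std(ddof=1), loadings, explained

# Hypothetical pair of highly collinear predictors.
rng = np.random.default_rng(1)
a = rng.normal(size=300)
b = a + 0.5 * rng.normal(size=300)

score, loadings, explained = first_component_score(np.column_stack([a, b]))
print("loadings:", np.round(loadings, 3))
print("variance explained:", round(float(explained), 3))
```

With exactly two positively correlated variables the loadings are always equal, so the component is essentially a standardised average of the two, and it carries a share (1 + r) / 2 of the total variance. The `score` column would then replace both original variables as a predictor in the logistic model.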


TS Contributor
So you are saying that I could do a principal component analysis, get the factors, and use them as independent variables in a logistic regression model?
Sounds interesting. The only issue will be interpreting the results, which won't be easy...


TS Contributor
I got some information, so maybe you could give me some advice now :)

I ran a correlation check between my variables and some of them ARE correlated: for example, one pair has r=0.6, and some pairs have r=0.45 or thereabouts.
Then I ran a linear model just to calculate the VIFs, and I was surprised to find that the highest VIF was 2.38... not 5, not even 3!

What would you do?


Super Moderator
That does seem to make sense. As I understand it, collinearity is mainly a problem when the IVs are highly correlated (the fact that they're correlated isn't in and of itself necessarily a massive drama). Those VIFs sound pretty low, so perhaps you can just go ahead with your LR as normal.
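The figures quoted in this thread fit together if you recall that VIF = 1 / (1 - R²), where R² comes from regressing one predictor on all the others. The short sketch below just does that arithmetic; the helper names `vif_from_r2` and `r2_from_vif` are made up for the example. Note that the pairwise calculation only applies exactly when there are two predictors; with 8 IVs the actual VIF depends on all of them jointly.

```python
def vif_from_r2(r2):
    """VIF implied by the R-squared of one predictor regressed on the others."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must be in [0, 1)")
    return 1.0 / (1.0 - r2)

def r2_from_vif(vif):
    """R-squared implied by a given VIF (inverse of the formula above)."""
    if vif < 1.0:
        raise ValueError("a VIF cannot be below 1")
    return 1.0 - 1.0 / vif

# A lone pairwise correlation of r = 0.6 implies a VIF of only about 1.56.
vif_pair = vif_from_r2(0.6 ** 2)

# The largest reported VIF of 2.38 implies R-squared of about 0.58,
# i.e. a multiple correlation of about 0.76 with the other predictors.
r2_max = r2_from_vif(2.38)
multiple_r = r2_max ** 0.5

print(f"VIF for r = 0.6 alone: {vif_pair:.2f}")
print(f"R-squared behind VIF = 2.38: {r2_max:.2f} (multiple R = {multiple_r:.2f})")
```

So a correlation of 0.6 between one pair is nowhere near enough on its own to push a VIF towards 5, which would need an R² of 0.8.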


TS Contributor
I did that. I started with simple (single-predictor) models, and out of 8 IVs only 3 were significant. Then I put all 3 in one model, and none were significant! Of course, when I put in all 8, none are significant either. Maybe the IVs are simply not related to the dependent variable, or maybe it's because of the correlation among them?
Will it be OK to have a final model containing 1 or 2 IVs out of the 8? (Otherwise there will be no relationships at all.)