# Thread: Multicollinearity problem in binary logistic regression

1. ## Multicollinearity problem in binary logistic regression

I'd like to ask for some help with a binary logistic regression. In SPSS I am trying to build a binary logistic regression model with 4 continuous independent variables (sample size: 85).

I have a dichotomous dependent variable (a clinical form of multiple sclerosis) and quite a few independent variables that are good predictors of it individually: 10 variables have an AUC > 0.8, and each of them is significant when entered alone in a binary logistic regression.

I want to build a regression model with the 4 variables that have the highest individual AUC values. I would like to include all 4 because a model with more variables has a higher AUC and should predict the outcome (clinical form of MS) better. However, when I enter all 4 variables into the equation, the p-values and confidence intervals show most of them to be non-significant. I am pretty sure this is a multicollinearity issue, as the values change substantially if I remove one or a few of the variables (even though the VIF values are no higher than ~3). The largest number of variables for which every variable in the equation is significant is two.

Therefore my question is: is it possible to build a model with more than two independent variables in this case and somehow overcome the multicollinearity issue, or should I stick to only two?

2. ## Re: Multicollinearity problem in binary logistic regression

Originally Posted by kranas
I am pretty sure this is a multicollinearity issue as the values change significantly if I remove one or a few of the variables (even though the VIF values are not more than ~3)
Did you actually check for multicollinearity (i.e., by calculating pairwise correlation coefficients among your predictors)? From what you wrote, it seems that you are just guessing that the predictors are highly correlated.
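
For anyone who wants to run this check outside SPSS, here is a minimal sketch in Python with NumPy. The data are synthetic stand-ins (4 predictors, 85 cases, generated from a shared factor), not the poster's data, and the helper names `pairwise_correlations` and `vif` are made up for illustration. It computes the pairwise Pearson correlations and the VIF of each predictor, where the VIF is 1 / (1 - R²) from regressing that predictor on the others.

```python
import numpy as np

def pairwise_correlations(X):
    """Pearson correlation matrix of the columns of X (cases x predictors)."""
    return np.corrcoef(X, rowvar=False)

def vif(X):
    """VIF per column: 1 / (1 - R^2) from regressing it on the other columns."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - resid.var() / target.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic stand-in data (assumption): 4 predictors driven by one shared factor.
rng = np.random.default_rng(0)
shared = rng.normal(size=85)
X = np.column_stack([shared + 0.5 * rng.normal(size=85) for _ in range(4)])

corr = pairwise_correlations(X)
vifs = vif(X)
print(np.round(corr, 2))
print(np.round(vifs, 2))
```

The two checks are related: the VIFs are the diagonal of the inverse of the correlation matrix, so moderate pairwise correlations spread over several predictors can still add up to a noticeable VIF.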

3. ## Re: Multicollinearity problem in binary logistic regression

I'm guessing it is multicollinearity because individually (or with at most two variables) the variables are significant in the equation, but after entering all 4 variables into one binary logistic regression in SPSS, the p-values rise and the confidence intervals widen considerably.
Here are the regression results with all 4 variables (the ones with the highest individual AUC values) and the correlation matrix:

Could this be because of some other issue instead of multicollinearity?
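
The pattern described here (a predictor looks precise on its own but its standard error grows once a correlated predictor is added) can be reproduced on synthetic data. The following is only a sketch under stated assumptions: two made-up predictors `x1` and `x2` that both measure the same latent factor, 85 cases, and a small Newton-Raphson logistic fit written by hand (`fit_logit` is a hypothetical helper, not an SPSS or library routine). It says nothing about what is actually happening in the real data.

```python
import numpy as np

def fit_logit(X, y, n_iter=50):
    """Logistic regression by Newton-Raphson.

    Returns (coefficients, standard errors); an intercept is added as column 0.
    """
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        w = p * (1.0 - p)
        info = A.T @ (A * w[:, None])           # information matrix
        step = np.linalg.solve(info, A.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    p = 1.0 / (1.0 + np.exp(-A @ beta))
    w = p * (1.0 - p)
    cov = np.linalg.inv(A.T @ (A * w[:, None]))
    return beta, np.sqrt(np.diag(cov))

# Synthetic data (assumption): both predictors are noisy copies of one factor.
rng = np.random.default_rng(1)
n = 85
latent = rng.normal(size=n)
x1 = latent + 0.3 * rng.normal(size=n)
x2 = latent + 0.3 * rng.normal(size=n)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-latent))).astype(float)

_, se_single = fit_logit(x1[:, None], y)
_, se_joint = fit_logit(np.column_stack([x1, x2]), y)
print("SE of x1 entered alone:  ", se_single[1])
print("SE of x1 entered with x2:", se_joint[1])
```

Comparing the two printed standard errors shows how much precision is lost when the second, overlapping predictor enters the model, which is the same effect that widens the confidence intervals in the SPSS output.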
