Binary Logistic Regression Assumptions Testing

Hi there, I am having difficulties testing some of the assumptions while conducting a binary logistic regression for an exploratory study.

1. Some research studies perform a bivariate analysis and enter only the significant variables into their model. My question is: why perform a bivariate correlation rather than a chi-square analysis? Some of my variables were shown to have predictive power in prior research; however, they show no significant relationship in my bivariate correlation table. Under such circumstances, if I go ahead and include them in my regression model, will that make my model invalid?

2. Can you recommend a step-by-step guide to testing binary logistic regression assumptions in SPSS?

3. What other options do I have if my sample shows multicollinearity?

I appreciate your help and time. Thank you.



TS Contributor
I have used LR for research purposes. I have seen studies in which bivariate tests were performed as a preliminary step before LR, and I am not happy with that practice. The fitted model will tell you which predictors actually contribute significantly to predicting the positive outcome of the dependent variable, after adjusting for the other predictors; a variable can look unrelated to the outcome on its own and still matter in the full model. So why bother with those pairwise tests?! Do them if you have ink to waste :)
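To make the point concrete, here is a small simulation, a minimal sketch assuming Python with NumPy. The data-generating model, coefficients and seed are invented for illustration: predictor x2 has a genuine effect once x1 is in the model, yet its bivariate correlation with the outcome is essentially zero, so a bivariate screen would have discarded it. The helper `fit_logistic` is a hypothetical name; it fits the model by plain Newton-Raphson rather than calling any SPSS or statsmodels routine.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton-Raphson.
    X must already include a column of ones for the intercept."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        w = p * (1.0 - p)                     # Bernoulli variances
        grad = X.T @ (y - p)                  # score vector
        hess = X.T @ (X * w[:, None])         # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(42)
n = 20_000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + np.sqrt(0.75) * rng.normal(size=n)   # corr(x1, x2) = 0.5, var(x2) = 1

# Hypothetical true model: x2 lowers the log-odds by 1 at fixed x1, but
# cov(x2, 2*x1 - x2) = 2*0.5 - 1 = 0, so x2 is unrelated to y on its own.
eta = 2.0 * x1 - 1.0 * x2
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

ones = np.ones(n)
r_x2_y = np.corrcoef(x2, y)[0, 1]                           # bivariate screen
b_alone = fit_logistic(np.column_stack([ones, x2]), y)       # x2 by itself
b_joint = fit_logistic(np.column_stack([ones, x1, x2]), y)   # adjusted for x1

print(f"corr(x2, y)             = {r_x2_y:+.3f}")
print(f"x2 coefficient, alone   = {b_alone[1]:+.3f}")
print(f"x2 coefficient, with x1 = {b_joint[2]:+.3f}")
```

In this setup the bivariate correlation and the single-predictor coefficient come out close to zero, while the adjusted coefficient lands close to the simulated value of -1. A non-significant bivariate result is therefore not, by itself, a reason to leave a theoretically motivated predictor out of the model.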

There are a couple of good articles on collinearity in the context of modelling. I could give you some references, but I have no PC with me at the moment; try searching on JSTOR. In any event, I remember reading about using 0.7 as a threshold for the correlation between two predictors. For binary categorical predictors, use Pearson's phi coefficient as the correlation measure. Also, check the Variance Inflation Factor (VIF).
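For two binary variables, the phi coefficient is simply Pearson's r computed on 0/1 codes, and the Pearson chi-square statistic of their 2x2 table (without continuity correction) equals n times phi squared. So for a pair of binary variables the "correlation or chi-square?" question from the opening post leads to the same test. The sketch below, in plain Python, checks this on a small hypothetical pair of predictors and applies the 0.7 rule of thumb mentioned above; the function names and data are illustrative assumptions, and 0.7 is a convention, not a standard.

```python
import math

def two_by_two(a, b):
    """Cell counts (n11, n10, n01, n00) for two 0/1 sequences."""
    n11 = sum(1 for x, z in zip(a, b) if x == 1 and z == 1)
    n10 = sum(1 for x, z in zip(a, b) if x == 1 and z == 0)
    n01 = sum(1 for x, z in zip(a, b) if x == 0 and z == 1)
    n00 = sum(1 for x, z in zip(a, b) if x == 0 and z == 0)
    return n11, n10, n01, n00

def phi_coefficient(a, b):
    """Phi from the 2x2 table: (n11*n00 - n10*n01) / sqrt(product of margins)."""
    n11, n10, n01, n00 = two_by_two(a, b)
    den = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / den

def pearson_r(a, b):
    """Ordinary Pearson correlation, here applied to 0/1 codes."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov / math.sqrt(va * vb)

def chi_square(a, b):
    """Pearson chi-square for the 2x2 table (no continuity correction)."""
    n11, n10, n01, n00 = two_by_two(a, b)
    n = n11 + n10 + n01 + n00
    rows, cols = (n11 + n10, n01 + n00), (n11 + n01, n10 + n00)
    observed = ((n11, n10), (n01, n00))
    return sum((observed[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(2) for j in range(2))

# Two hypothetical binary predictors
a = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
b = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

phi = phi_coefficient(a, b)
print(f"phi = {phi:.3f}, Pearson r = {pearson_r(a, b):.3f}")                  # phi = 0.600, Pearson r = 0.600
print(f"chi-square = {chi_square(a, b):.3f}, n*phi^2 = {len(a) * phi**2:.3f}")  # chi-square = 3.600, n*phi^2 = 3.600
print("flag as collinear (|phi| > 0.7):", abs(phi) > 0.7)                      # False
```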

As for SPSS, I do not use it, so I cannot help with that.

Thank you, gianmarco. I am searching for the references on JSTOR. Hopefully I can find something for my study. If there is multicollinearity in my sample, what can I do? I appreciate your help. Thank you.

My two cents on multicollinearity:
Don't get stressed by it. Yes, it may be an issue, but I'm assuming you've selected variables that have been shown to be important in previous studies and theories, and to an extent you should always be guided by theory and previous practice. If you seem to have multicollinearity but your selection of variables and data is theoretically and practically sound, think carefully about how much weight to give the multicollinearity before dropping anything. Secondly, my rule of thumb is: an individual VIF below 10 and a mean VIF below 2, and you're all right.
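A minimal NumPy sketch of a VIF check. VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors; it depends only on the predictors, so the same numbers apply whether the outcome is continuous or binary. The thresholds are parameters: the defaults below (no single VIF of 10 or more, mean VIF below 2) follow the commonly quoted form of this kind of rule of thumb, and O'Brien (2007), listed later in this thread, cautions against treating any such cut-off as a hard limit. The function names and simulated data are hypothetical.

```python
import numpy as np

def vif(X):
    """Variance inflation factors for the columns of X (n rows, k predictors,
    no intercept column). The diagonal of the inverse correlation matrix
    equals 1 / (1 - R_j^2) for each predictor j."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

def vif_ok(X, max_individual=10.0, max_mean=2.0):
    """Rule-of-thumb check; both thresholds are adjustable assumptions."""
    v = vif(X)
    return v, bool(v.max() < max_individual and v.mean() < max_mean)

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.1 * rng.normal(size=n)   # almost a linear combination of x1 and x2

v_good, ok_good = vif_ok(np.column_stack([x1, x2]))
v_bad, ok_bad = vif_ok(np.column_stack([x1, x2, x3]))
print("x1, x2:    ", np.round(v_good, 2), ok_good)
print("x1, x2, x3:", np.round(v_bad, 1), ok_bad)
```

With only x1 and x2 the VIFs sit near 1; adding x3, which is almost determined by the other two, pushes every VIF far above 10.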

Thirdly, and I wouldn't really recommend this, you can try the following under two conditions: 1) you have selected variables that are theoretically justified, and 2) you will come under pressure if you do nothing about VIFs that are too high. --> Centering of variables
Take particular note of the second point, and also the article as a whole.
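Centering mainly helps with "non-essential" collinearity, i.e. the correlation between a predictor and terms built from it, such as its square or an interaction involving it. It does not change the correlation between two distinct main-effect predictors. The NumPy sketch below uses made-up data (a strictly positive predictor x and a second predictor z correlated with it; the helper `r` is a hypothetical name) to show both effects.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2_000
x = rng.uniform(1.0, 5.0, size=n)          # strictly positive predictor
z = 0.6 * x + rng.normal(size=n)           # a second predictor correlated with x

def r(a, b):
    """Pearson correlation of two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

xc, zc = x - x.mean(), z - z.mean()        # mean-centered copies

print(f"corr(x, x^2): raw {r(x, x**2):.3f}  centered {r(xc, xc**2):+.3f}")
print(f"corr(x, x*z): raw {r(x, x*z):.3f}  centered {r(xc, xc*zc):+.3f}")
print(f"corr(x, z):   raw {r(x, z):.3f}  centered {r(xc, zc):.3f}")
```

Because centering only shifts each variable by a constant, a model containing x, z and x*z is a reparameterisation of the same model built from the centered variables: the fitted probabilities are identical, and only the lower-order coefficients and their VIFs change.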

As for the other questions, I don't really understand what you're getting at in 1), and for 2), a few minutes of googling should get you set.

Hope all goes well.



Omega Contributor
You can use whichever test you want when exploring your variables, but why not just fit a simple logistic regression model with one predictor at a time?

Also, a limitation of looking at each variable by itself at first is that you run the risk of missing interaction terms, so keep that in mind as well.
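The point about interactions can be illustrated with simulated data. In this minimal NumPy sketch (the model, seed and the helper `fit_logistic` are hypothetical), the outcome depends only on the product x1*x2. Each one-predictor logistic model then shows essentially no effect, because averaging over a symmetric partner variable cancels the interaction, while the model that includes the product term recovers it.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton-Raphson.
    X must already include a column of ones for the intercept."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        w = p * (1.0 - p)                     # Bernoulli variances
        grad = X.T @ (y - p)                  # score vector
        hess = X.T @ (X * w[:, None])         # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(3)
n = 20_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Hypothetical model: a pure interaction, no main effects at all
eta = 1.5 * x1 * x2
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

ones = np.ones(n)
b1_alone = fit_logistic(np.column_stack([ones, x1]), y)[1]   # simple model, x1 only
b2_alone = fit_logistic(np.column_stack([ones, x2]), y)[1]   # simple model, x2 only
b_full = fit_logistic(np.column_stack([ones, x1, x2, x1 * x2]), y)

print(f"x1 alone: {b1_alone:+.3f}   x2 alone: {b2_alone:+.3f}")
print(f"full model (x1, x2, x1*x2): {np.round(b_full[1:], 3)}")
```

This is an extreme case, but it shows why flat one-variable results are weak evidence that a variable is irrelevant.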
Thank you, jbwettergreen. The biggest problem for this study is its exploratory nature: some of the variables are not well researched, and I am hoping to develop a new model. For assumption testing, different websites and books suggest different tests, and I could not find a consensus. Hence, I am wondering whether there are any assumptions that I must test before proceeding with the regression model.
Thank you again, hlsmith. I will keep that in mind. However, I am aiming for a simple model, probably without interactions.


TS Contributor
Some literature about multicollinearity:

Dormann, C.F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., Marquéz, J.R.G., Gruber, B., Lafourcade, B., Leitão, P.J., Münkemüller, T., McClean, C., Osborne, P.E., Reineking, B., Schröder, B., Skidmore, A.K., Zurell, D., Lautenbach, S., 2013. Collinearity: A review of methods to deal with it and a simulation study evaluating their performance. Ecography 36, 27–46. doi:10.1111/j.1600-0587.2012.07348.x

Midi, H., Sarkar, S.K., Rana, S., 2010. Collinearity diagnostics of binary logistic regression model. J. Interdiscip. Math. 13, 253–267. doi:10.1080/09720502.2010.10700699

O’Brien, R.M., 2007. A caution regarding rules of thumb for variance inflation factors. Qual. Quant. 41, 673–690. doi:10.1007/s11135-006-9018-6

Another thing that comes to mind about modelling:
It could be worth assessing to what extent your model is able to generalize beyond the training data (i.e., the data on which you build your model). I found this article interesting:
Arboretti Giancristofaro, R., Salmaso, L., 2003. Model performance analysis and model validation in logistic regression. Statistica 63, 375–396.

I have implemented part of the procedure described there in R (
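The article's own procedure and the R implementation mentioned above are not reproduced here. As a much simpler stand-in, here is a split-sample check, a minimal NumPy sketch with simulated data: fit the model on a training subset, then compare the concordance (c-statistic, equal to the area under the ROC curve) on the training cases and on held-out cases. A large drop from training to held-out data is a sign of overfitting. The split sizes, seed, data-generating model and the helpers `fit_logistic` and `c_statistic` are assumptions for illustration.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton-Raphson.
    X must already include a column of ones for the intercept."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        w = p * (1.0 - p)                     # Bernoulli variances
        grad = X.T @ (y - p)                  # score vector
        hess = X.T @ (X * w[:, None])         # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def c_statistic(scores, y):
    """Probability that a random event case scores higher than a random
    non-event case (ties count one half); equals the ROC AUC."""
    pos, neg = scores[y == 1], scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

rng = np.random.default_rng(1)
n, k = 3_000, 6
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
eta = 2.0 * X[:, 1]                      # only the first predictor matters
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

idx = rng.permutation(n)
train, test = idx[:2_000], idx[2_000:]   # 2/3 training, 1/3 held out
beta = fit_logistic(X[train], y[train])

auc_train = c_statistic(X[train] @ beta, y[train])
auc_test = c_statistic(X[test] @ beta, y[test])
print(f"c-statistic: training {auc_train:.3f}, held-out {auc_test:.3f}")
```

With small samples, repeated cross-validation or bootstrap validation is usually preferred to a single split, since one split sets aside data and gives a noisy estimate.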

Hope this helps,