Binary Logistic Regression Classification table

#1
Hi there!

I am conducting a binary logistic regression; however, I have run into an issue. In my output there does not appear to be any difference in the classification table between the null model and the model with predictors. Would anyone be able to explain why this may be happening, what it means, and how I can get around it?

Kind regards,
 
#5
There seem to be many variables that are significant, so SPSS does take account of them. But maybe you are thinking of the classification table? Maybe change the "cut value"?

By the way, don't use stepwise regression.
 

hlsmith

Less is more. Stay pure. Stay poor.
#6
I am not used to these outputs, but it seems like there are 14000 outcomes and the model is predicting everyone into the other group. It might be interesting to look at the predicted probabilities and the calibration plot for the model. Doing this may help you understand the model and whether @GretaGarbo's suggestion of changing the 'cut value' from 0.5 to something lower is an option.
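To make the cut-value point concrete, here is a minimal sketch (the probabilities are made up for illustration, not taken from the thread's data): when the outcome is rare and no predicted probability reaches 0.5, every case is classified into the majority group, so the classification table looks identical to the null model's even though the model does separate the groups. Lowering the cut value reveals the separation.

```python
# Hypothetical sketch: why a classification table can look the same for the
# null model and the fitted model when the outcome is rare, and how the
# cut value changes it. All probabilities below are invented for illustration.

def classification_table(y_true, probs, cut):
    """Cross-tabulate observed outcomes against predictions at a cut value."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, probs):
        pred = 1 if p >= cut else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

# Imbalanced data: 90 non-events, 10 events. The model gives events higher
# predicted probabilities, but none reaches the default cut of 0.5.
y = [0] * 90 + [1] * 10
probs = [0.05] * 90 + [0.35] * 10

print(classification_table(y, probs, cut=0.5))  # everyone classified as 0
print(classification_table(y, probs, cut=0.2))  # events now classified as 1
```

At cut = 0.5 the table is indistinguishable from always predicting "no event"; at cut = 0.2 the same fitted probabilities classify all events correctly. That is why looking at the predicted probabilities themselves is more informative than the default table.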
 
#7
@GretaGarbo Thank you for the reply! I'm not very confident with logistic regression, so would you be able to explain why I shouldn't use stepwise regression and what would be more appropriate? I can change the 'cut value'; however, I am unsure of what would be a more appropriate value for my data. Is there anything else I could do that would indicate what would be suitable? Kind regards
 
#8
Someone said: "stepwise is unwise". I think it is better that you think for yourself about what model is appropriate (that is what the rest of science does) instead of letting an arbitrary algorithm decide which variables to use as it throws variables in and out.

When you have decided on the model, you simply estimate that model. And you can think and re-estimate. It is actually allowed to think and use judgement.
 

obh

Active Member
#9
Someone said: "stepwise is unwise". I think it is better that you think for yourself about what model is appropriate (that is what the rest of science does) instead of letting an arbitrary algorithm decide which variables to use as it throws variables in and out.

When you have decided on the model, you simply estimate that model. And you can think and re-estimate. It is actually allowed to think and use judgement.
I wouldn't mind letting an arbitrary algorithm decide/suggest which variables to use, if it made a good suggestion :)

I assume the main problems of this unwise method are:
1. Multiple tests increase the chance of a random "significant" result; each test of another variable is another opportunity for a chance mistake.
2. It may remove a variable due to multicollinearity; in that case the predicted Y may be okay, but you may drop an important IV.
3. A different order of steps may change the result.
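Point 1 can be put in numbers. Under the simplifying assumption of independent tests at alpha = 0.05, the chance that at least one pure-noise candidate variable comes out "significant" is 1 − 0.95^k, which grows quickly with the number of candidates k:

```python
# Rough illustration of the multiple-testing problem with stepwise selection:
# with k candidate variables that are pure noise, the probability that at
# least one tests "significant" at alpha = 0.05. Assumes independent tests,
# which is a simplification.

alpha = 0.05
for k in (1, 5, 10, 20):
    p_any = 1 - (1 - alpha) ** k  # P(at least one false positive)
    print(f"{k:2d} noise variables -> P(spurious 'significant' hit) = {p_any:.2f}")
```

With 20 noise candidates the chance of at least one spurious hit is already about 64%, which is why an algorithm that keeps whatever tests significant will routinely keep junk.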
 

hlsmith

#10
There can also be differences in practice given a person's intent. If you solely want prediction, things do end up getting done differently. However, if you are attempting to estimate the relationships between variables, more rigor is typically needed. For example, a variable that is a mutual effect of a predictor and the outcome can help predict the outcome better. However, it may be a problem when explaining the relationship between the variable of interest and the target outcome, since the mutual-effect variable occurs after the variable of interest. A model is not going to know your agenda and can't call you out for using inappropriate terms in your model.

Secondly, sometimes it is a good idea to control for certain variables in the model that are known to impact the outcome. However, given sampling variability, such a variable may fail a binary "significant" yes/no rule and get excluded from the model. Automated procedures don't know this information either.