Getting there with logistic regression - some (hopefully!) simple questions.

tamsynann

New Member
Hi all,
Following my post the other day I am now at the writing up stage of my hierarchical logistic regression.

I inputted 5 variables across 3 blocks, with the 3rd block becoming non-significant.

3 questions really:

1. How do I acknowledge in my write-up that the 3rd block was non-significant? Do I just say that it was non-significant and that the results reported are therefore for block 2? Or do I need to report the non-significant results and then move on to block 2?

So far I am saying something along the lines of:
"The full model containing all predictors was non-significant, X^2 (6, N = 458) = 35.16, p = .179. However, the model containing the predictors of smoking, race and mother's weight was statistically significant, X^2 (4, N = 458) = 31.72, p = .000"

2. When reporting for block 2, do I report the block significance (.003) or the overall model significance (.000)? I'm thinking the overall model, because I'm concerned with all the variables up to this point rather than the addition of the block itself. (Trying to show that I really have thought about this as much as I can!)

3. Do I include all the variables in the table presenting my logistic regression (i.e. all those in the table in the block 3 output), or just those variables in block 2?

Many thanks in advance for any help or assistance you might be able to provide.

EDIT: I've just realised that in block 3 the step/block significance is .179, but the model significance (.000) is still significant. Am I doing this wrong? My reasoning is that I want to keep it as simple as possible, i.e. have as few predictors as possible explaining the largest amount of variance. Between block 2 and block 3 there is no significant increase in variance explained, but the overall model is still significant, so should I be reporting it? Got myself confused again!
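
The two p-values in block 3 answer different questions. The step/block chi-square is the difference between the model chi-squares of two nested models, tested on the difference in their degrees of freedom, whereas the model chi-square compares everything entered so far with the constant-only model. A minimal sketch using only the figures quoted in this post; the helper name `chi2_sf_even_df` is made up for illustration, and it relies on a closed-form tail probability that holds only for even degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail p-value of a chi-square statistic, valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form only holds for positive even df")
    half = x / 2.0
    # P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

# Model chi-squares quoted in the post (each compared with the constant-only model)
model_block2_chi2, model_block2_df = 31.72, 4
model_block3_chi2, model_block3_df = 35.16, 6

# Block 3 step test = improvement of the block 3 model over the block 2 model
step_chi2 = model_block3_chi2 - model_block2_chi2
step_df = model_block3_df - model_block2_df
step_p = chi2_sf_even_df(step_chi2, step_df)
model_block3_p = chi2_sf_even_df(model_block3_chi2, model_block3_df)

print(f"Block 3 step:  chi2({step_df}) = {step_chi2:.2f}, p = {step_p:.3f}")
print(f"Block 3 model: chi2({model_block3_df}) = {model_block3_chi2:.2f}, p = {model_block3_p:.6f}")
```

The step test reproduces the .179, while the overall block 3 model comes out at p < .001. So .179 belongs to the block 3 step (df = 2), not to the full model (df = 6).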

noetsi

No cake for spunky
I have not seen hierarchical regression written up, but I would guess, based on other methods, that you would have a table showing all the variables, including significance, and then a brief text comment pointing out that the variables in the third block were not significant and what this means for theory, if you know. I doubt most readers are going to care whether a block is significant. They want to know the model results and the individual odds ratios and significance tests of individual variables. In truth if the model is significant I move right on to the individual results.

Your table should include all variables that are in your model, significant or not. The real question is whether you should remove non-significant variables and rerun the model. I have never seen anyone come down firmly on this one way or the other. However, the tables I have seen commonly include variables that are not significant. Generally speaking parsimony is desirable, but what this means in practice is not clear.

tamsynann

New Member
Thank you very much for your reply! I'm not sure whether I made myself clear or whether I am misunderstanding you.

"In truth if the model is significant I move right on to the individual results."

So would you be saying that because the model is significant in block 3 you would report the individual results from there, even though the variables in that block add nothing significant to the overall variance and model?

Sorry if I am sounding really stupid - I don't do well with statistics at all!

(I've attached my output which will hopefully make more sense than how I am trying to explain it)

noetsi

No cake for spunky
I would report whatever final model you run (with whatever variables are in it). It is not clear to me whether the variables in hierarchical regression that are found to be non-significant are actually in the model (as they would be if, as is more common, you added them all at one time) or not. I think most readers want to know not only what was significant but also what was not, and it is common to report variables that are not significant. After all, you are presumably testing variables that make theoretical sense (perhaps based on past research), and finding that these variables don't contribute to the model's ability to predict is an important result.

You chose a more difficult form of adding variables to a model than I usually do and I have run regression on and off for a very long time.

tamsynann

New Member
That's really helpful - thank you! I've now come to the point of making conclusions about the efficacy of my model and am a little stumped!

The model identifies 3 significant predictors of low birth weight and correctly classified 69% of cases.

The -2LL decreased from block 0 to the overall model thus suggesting the model explains more variance than the constant only model.

The overall model only explains between 7.4% (Cox and Snell R square) and 10.4% (Nagelkerke R squared) of the variance in low birth weight.

The Hosmer and Lemeshow test was non-significant.

What conclusions can I draw from this? At the moment I am thinking that the model fits the data well but does not explain a great deal of the variance in low birth weight. Applied to a real-world situation, the model correctly classifies 69% of cases overall but only 19.6% of the low birth weights (<2500g), which is probably the most important aspect, as this is what we would really want to predict.
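
The gap between overall accuracy and accuracy on the low-birth-weight cases can be made explicit by splitting the classification table into sensitivity and specificity. A minimal sketch follows; the counts are hypothetical and are NOT taken from the attached output, and the function name `classification_summary` is made up for illustration.

```python
def classification_summary(true_pos, false_neg, true_neg, false_pos):
    """Return overall accuracy, sensitivity and specificity from a 2x2 classification table."""
    total = true_pos + false_neg + true_neg + false_pos
    return {
        "accuracy": (true_pos + true_neg) / total,
        "sensitivity": true_pos / (true_pos + false_neg),   # share of events correctly flagged
        "specificity": true_neg / (true_neg + false_pos),   # share of non-events correctly flagged
    }

# Hypothetical counts for illustration only (event = low birth weight)
summary = classification_summary(true_pos=10, false_neg=40, true_neg=95, false_pos=5)
for name, value in summary.items():
    print(f"{name}: {value:.1%}")
```

With an unbalanced outcome, a model can reach a respectable overall rate mostly by predicting the more common category, which is the pattern described above. If the table was produced with a 0.5 probability cutoff (an assumption, since the cutoff is not stated in the thread), lowering the cutoff would trade specificity for sensitivity.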

I also need to think of ways in which the model could be improved through future projects - I'm really not sure how I would go about this.

noetsi

No cake for spunky
Was the model chi-square (the change in -2LL) significant? I think so, based on your comments, but that is the starting point. If not, your model has no predictive value and you should not interpret the individual slopes. Your Hosmer-Lemeshow statistic is good: a non-significant result indicates the model fits the data adequately.

You have to be careful about the pseudo R squares. First, they don't show explained variance at all (what they do show is not really well defined). They were created because people wanted an equivalent of R square, but they really are not. Some suggest they are generally lower than linear R square and again what they actually mean is not obvious substantively or theoretically.
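
For reference, both pseudo R squares are transformations of the -2LL values rather than variance ratios. The sketch below uses the standard Cox and Snell and Nagelkerke formulas; the function names are made up for illustration. The only inputs taken from this thread are the overall model chi-square (35.16) and N (458). The constant-only -2LL is not quoted anywhere in the thread, so the Nagelkerke function is defined but not evaluated here.

```python
import math

def cox_snell_r2(model_chi2, n):
    """Cox and Snell R^2 = 1 - exp(-model_chi2 / n).

    model_chi2 is the drop in -2LL from the constant-only model.
    """
    return 1.0 - math.exp(-model_chi2 / n)

def nagelkerke_r2(model_chi2, null_minus2ll, n):
    """Cox and Snell R^2 rescaled by its maximum attainable value.

    The maximum is 1 - exp(-null_minus2ll / n), where null_minus2ll
    is the -2LL of the constant-only (block 0) model.
    """
    max_r2 = 1.0 - math.exp(-null_minus2ll / n)
    return cox_snell_r2(model_chi2, n) / max_r2

r2_cs = cox_snell_r2(35.16, 458)
print(f"Cox and Snell R^2 = {r2_cs:.3f}")
```

This matches the 7.4% reported earlier in the thread, so that figure reflects the improvement in likelihood per case rather than a share of variance.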

I have not worked with the classification table (classified right/wrong), so I can't comment on that. What classification rates have other researchers in this field reported? That allows one to make a better judgement of how useful your analysis is. ROC curves are growing in popularity for evaluating models and you might want to look at those (again, I am new to that so I won't comment on it).

Once the model is significant (the -2LL) and the HL test is not significant, I focus on the odds ratios of the individual variables rather than the overall model value. I think that is the common approach, although that does not make it right of course. Often the best way to quickly improve your model is to look at what other researchers have done. Considering interaction effects may also be of value.
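
As a small illustration of the odds-ratio step: each odds ratio is the exponential of the logit coefficient (B), and a Wald 95% confidence interval comes from exponentiating B plus or minus 1.96 standard errors. The coefficient and standard error below are hypothetical, not values from the attached output, and the function name `odds_ratio_with_ci` is made up for illustration.

```python
import math

def odds_ratio_with_ci(b, se, z=1.96):
    """Convert a logit coefficient and its standard error to an odds ratio with a Wald CI."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

# Hypothetical coefficient and standard error, for illustration only
or_, lower, upper = odds_ratio_with_ci(b=0.70, se=0.30)
print(f"OR = {or_:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

A confidence interval that excludes 1 corresponds to a significant Wald test at the 5% level, so reporting the interval covers both the size and the significance of each predictor.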