1) Test multicollinearity before/after stepwise logistic regression? 2) Final equation 3) Model selection

#1
Hi all,

I need to perform logistic regression for my study. I'm pretty new to it, but I have read up on it and tried it in SPSS. I would like to ask for your advice on a few points.

A brief description of my study: I want to see which factors (categorical and continuous) significantly influence the likelihood of an event, namely birds colliding with buildings in flight. As I'm not sure which factors are the more important ones, I've used stepwise logistic regression to pinpoint the more significant factors. The two outcomes are collision or no collision.

1) Some of my factors are correlated with one another. But because I used stepwise selection, it so happens that all the correlated ones were eliminated from the final model. Is it still legitimate to keep the model? Or do I have to check for multicollinearity before I perform stepwise logistic regression and remove the correlated factors before even putting them to the test? (From what I gather, it seems I could check the remaining factors afterwards, but I just want to confirm it.)

2) I was expecting that I would need to write a final equation like y = b0 + b1x1 + b2x2 + ..., but in the papers I've found, I don't see anyone doing it. They merely report the statistically significant factors and the associated odds ratios etc. So we don't have to explicitly write out the model? If we do, can someone advise me on how it is written? (Sorry, I think this is quite a newbie question!)

3) From the SPSS output I have, I can calculate sensitivity, specificity, positive predictive value and negative predictive value. I would like your opinion on which of these is more important in telling us how good a model is, and the pros and cons of relying on one of them more than the others.
(I know I could use AICc, but I think that in SPSS it is only available for multinomial logistic regression?)

Thanks for all of your help!!
 

gianmarco

TS Contributor
#2

Hello,
I am not a statistician, so I will provide my two cents on the matter, waiting for stat guys to jump in and possibly correct me.
As for question n.1, it is quite interesting and I myself look forward to hearing from more experienced guys here...
As for question n.2, I think that whether to write out the logistic regression equation is a matter of individual choice: as long as you provide the constant and the beta coefficients, anyone can work out the LR equation.
As for question n.3, this is a HUGE topic and I think that you should read some literature: it would be easy to search for some references in Google Scholar or JSTOR. What I can say is that the performance of a model can be judged from two different standpoints. The first is performance on the data you already have, i.e. the data on which your model is built. The second is performance on 'future' observations. In the first case, a good starting point may be the analysis of the ROC curve (LINK). In the second case, cross-validation should be used (LINK).
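To make the ROC idea concrete, here is a minimal sketch in plain Python of the area under the ROC curve, computed as the share of (collision, no collision) pairs in which the collision case gets the higher predicted probability. The function name and the labels and scores below are made up for illustration; they are not from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case gets a
    higher score than a randomly chosen negative case (ties count 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = collision, 0 = no collision; scores are the
# predicted probabilities from some fitted model.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1, 0.8, 0.3]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

An AUC of 0.5 means the model ranks cases no better than chance, and 1.0 means perfect separation. Computed on the same data used to fit the model, it tends to be optimistic, which is where cross-validation comes in.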

Hope this helps,
gm
 
#3

Question 1: You should not do stepwise at all. It is guaranteed to give wrong results. The p-values are too low, the parameter estimates are biased, the models are too complex and may not make any sense and it denies you the ability to use substantive knowledge. It is particularly egregious when there is collinearity as the variable eliminated is arbitrary. See my paper Stopping stepwise.
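Whatever selection method is used, the correlation among predictors can be inspected directly before fitting anything. The following is a minimal sketch in plain Python for the simplest case of two continuous predictors, where the variance inflation factor reduces to 1 / (1 - r^2). The predictor names and values are hypothetical, and with more than two predictors the VIF has to come from regressing each predictor on all the others.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x, y):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)


# Hypothetical predictors (not from the study).
building_height = [10, 20, 30, 40, 50]
glass_area = [12, 19, 33, 38, 52]

r = pearson_r(building_height, glass_area)
vif = vif_two_predictors(building_height, glass_area)
print(f"r = {r:.3f}, VIF = {vif:.1f}")
```

A VIF near 1 indicates little shared variance. Common rules of thumb flag values above 5 or 10, but these thresholds are conventions rather than tests.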

Question 2: In logistic regression, the odds ratios are more interpretable than the parameters of the equation you wrote out. The table summarizing the regression (and the text about it) should discuss the odds ratios. Whether you have to write the formula is really a matter of journal style, but it's uncommon to do so.
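For reference, the equation from the original question can be written as below. This is a sketch using the b0, b1, x1 notation from the question; p (the probability of a collision) and OR_j (the odds ratio for predictor j) are symbols introduced here. The linear form applies to the log-odds, not to the outcome itself, and each odds ratio is the exponential of its coefficient, which is what SPSS reports as exp(B).

```latex
\[
\ln\frac{p}{1-p} = b_0 + b_1 x_1 + b_2 x_2 + \dots
\qquad\Longleftrightarrow\qquad
p = \frac{1}{1 + e^{-(b_0 + b_1 x_1 + b_2 x_2 + \dots)}},
\qquad
\mathrm{OR}_j = e^{b_j}
\]
```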

Question 3: This depends on your question and which error is worse. That is, is it worse to predict that a bird crashes when it doesn't or to predict that it doesn't crash when it does?
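The four measures from the original question all come from the same 2x2 classification table, so the trade-off between the two kinds of error can be seen side by side. A minimal sketch in Python follows; the counts are invented for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Summary measures from a 2x2 classification table.

    tp: collisions predicted as collisions
    fp: non-collisions predicted as collisions
    fn: collisions predicted as non-collisions
    tn: non-collisions predicted as non-collisions
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of real collisions caught
        "specificity": tn / (tn + fp),  # share of non-collisions caught
        "ppv": tp / (tp + fp),          # predicted collisions that are real
        "npv": tn / (tn + fn),          # predicted non-collisions that are real
    }


# Hypothetical counts (not from the study).
metrics = classification_metrics(tp=30, fp=10, fn=20, tn=140)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity describe the model for a given true outcome, while PPV and NPV also depend on how common collisions are in the sample. All four change with the classification cutoff (0.5 by default in SPSS).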
 
#4

Hi both,

Thanks so much for your inputs! :)

Regarding Question 2, I realised that I wanted to know how people write it because I was actually looking for an answer to another, not-so-related question. (Sorry! I wasn't clear about it myself when I asked. Hope you guys can help again.) For a categorical I.V. with 3 or more categories, the SPSS output gives a different p value and odds ratio for each category. I want to know whether the model includes only the categories that are significant within that categorical I.V.

E.g. for Building(1), p = 0.3 and exp(B) = 0.45; for Building(2), p = 0.01 and exp(B) = 0.12; and so on. A third building category is used as the reference category. The overall p for Building is < 0.05, but no B or exp(B) is given for the overall I.V. In this case, does it mean only Building(2) is included in the model?

To gianmarco, thanks for the link! Very useful info. Will try to figure out how to use it for my study.

To PeterFlom, thanks for sharing your paper. I will have to think it through too. And that's a very valid point you raised for my Qn 3! Thanks!! I will think along that direction. :)
 
#5

The exact way to present findings depends on the journal. Many use the American Psychological Association style guide, which has sample tables for various types of statistical analysis.

My view is that you need to include the ORs for each level of any categorical variable that is in your model.
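To illustrate why every level belongs in the model: a three-level categorical variable enters the equation as two dummy variables, each with its own coefficient, and the reference level corresponds to both dummies being zero. Below is a minimal sketch in Python using the exp(B) values quoted earlier in the thread. The intercept and the function name are hypothetical, chosen only so the example runs.

```python
import math

# Odds ratios exp(B) quoted in the thread for the non-reference levels.
ODDS_RATIOS = {"Building(1)": 0.45, "Building(2)": 0.12}

# Coefficients on the log-odds scale: B = ln(exp(B)).
COEFFICIENTS = {level: math.log(ratio) for level, ratio in ODDS_RATIOS.items()}

# Hypothetical intercept: the log-odds of collision at the reference level.
B0 = -1.0


def predicted_probability(building_level):
    """Predicted collision probability for one building level."""
    if building_level == "reference":
        logit = B0  # both dummy variables are 0
    elif building_level in COEFFICIENTS:
        logit = B0 + COEFFICIENTS[building_level]  # one dummy is 1
    else:
        raise ValueError(f"unknown level: {building_level}")
    return 1.0 / (1.0 + math.exp(-logit))


for level in ("reference", "Building(1)", "Building(2)"):
    print(level, round(predicted_probability(level), 4))
```

Dropping Building(1) because its own p value is large would merge it into the reference category and so change what the Building(2) odds ratio is being compared against.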
 

gianmarco

TS Contributor
#6

On the issue of stepwise regression, I found this article interesting:
Austin PC, Tu JV. Bootstrap methods for developing predictive models. Am Stat. 2004;58: 131–137
The method has been implemented in R, though from an AIC standpoint (LINK).

As for assessing model performance (i.e., the first case I was referring to in my earlier reply), this book can be useful:
Hosmer DW, Lemeshow S, Sturdivant RX. Applied Logistic Regression. 3rd ed. Hoboken, NJ, USA: John Wiley & Sons, Inc.; 2013.

As for LR in SPSS, maybe Field's book on statistics and SPSS can prove useful (LINK).
 
#7

Hi both,

Thanks for your input! I thought I had replied, but it seems my post didn't go through.

PeterFlom, I think you are right. From this website, it does appear that we must add each and every level of the variable that is statistically significant. Look at the variable ses in this link (http://www.ats.ucla.edu/stat/spss/output/logistic.htm)
 
#8

You don't have to do anything. You don't have to include variables that are significant (suppose you had never included it?) and you don't have to delete ones that are insignificant (there are other reasons for keeping them in).

OK, I was wrong, there are things you have to do: You have to be able to defend your choices and you have to think.
 

Lazar

Phineas Packard
#9

PeterFlom wrote (#3): "Question 1: You should not do stepwise at all. It is guaranteed to give wrong results. The p-values are too low, the parameter estimates are biased, the models are too complex and may not make any sense and it denies you the ability to use substantive knowledge. It is particularly egregious when there is collinearity as the variable eliminated is arbitrary. See my paper Stopping stepwise."

This of course depends on purpose. Stepwise is fine when your goal is prediction rather than explanation. Indeed, stepwise is similar in logic to a fair bit of machine learning (think ridge, lasso, or elastic net regression). "The Elements of Statistical Learning" by Hastie, Tibshirani and Friedman covers this nicely in the first few chapters (chapter 3 in particular): http://statweb.stanford.edu/~tibs/ElemStatLearn/

So again it comes down to purpose. If the OP wants to explain WHY birds fly into a building, then I agree that stepwise is a bad idea. However, if the aim is to PREDICT when a bird will fly into a building, for some practical purpose, then stepwise is a viable option.