Comparing coefficients in Logistic Regression

gianmarco

TS Contributor
#1
Hello !
I would like to have guidance on what follows.

I was fitting a Logistic Regression model, and I got the coefficients for the model's significant predictors.
Now, for the sake of a report I am writing, is it correct/sound to rank the coefficients (i.e., ordering them from greatest to smallest) to give an idea of their relative contribution to the prediction of the outcome of the (binary) dependent variable?

It is my understanding that this would not be a viable option, since coefficients could refer to variables measured on different scales (as indeed happens in my model). So, I am wondering whether it would be sound to standardize them. Is that a viable strategy? On the other hand, I have also read that the interpretation of standardized coefficients is not very straightforward...
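To make the standardization idea concrete, here is a minimal sketch of one simple variant, x-standardization, where each raw coefficient is multiplied by the standard deviation of its predictor. This is only one of the many standardized coefficients discussed in this thread, and the predictor names, coefficients, and data values are made up for illustration.

```python
import statistics

# Hypothetical fitted logit coefficients (log-odds per raw unit of each predictor).
coefs = {"age_years": 0.04, "income_k": -0.01}

# Hypothetical raw predictor values the model was fitted on.
data = {
    "age_years": [23, 35, 41, 52, 60, 29, 47],
    "income_k": [18, 42, 55, 120, 75, 30, 64],
}

def x_standardized(b, values):
    """Change in log-odds per one-standard-deviation increase in the predictor."""
    return b * statistics.stdev(values)

std_coefs = {name: x_standardized(b, data[name]) for name, b in coefs.items()}

# Rank predictors by the absolute size of the x-standardized coefficient.
ranking = sorted(std_coefs, key=lambda name: abs(std_coefs[name]), reverse=True)

for name in ranking:
    print(f"{name}: {std_coefs[name]:+.3f} log-odds per SD")
```

The result reads as "change in log-odds per one SD of the predictor", which is exactly the kind of interpretation that is less intuitive than a per-unit effect.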

If the latter strategy is not viable, could the 'percentage change' be put to work instead? I got the percentage change from Allison's book on Logistic Regression in SAS. It can be calculated from the odds ratio of each coefficient: (OR-1)*100. This indicates the percentage change in the odds of the positive outcome of the dependent variable for each 1-unit increase in the independent variable. Maybe ordering the significant predictors by percentage change would make more sense in the context of comparing coefficients.
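As a rough illustration of the (OR-1)*100 calculation, the sketch below converts logit coefficients to percentage changes in the odds. The coefficient values and the names x1 and x2 are hypothetical, and the `delta` argument is an addition for illustration only.

```python
import math

def percent_change_in_odds(beta, delta=1.0):
    """Percentage change in the odds for a `delta`-unit increase in the predictor."""
    odds_ratio = math.exp(beta * delta)  # OR = exp(coefficient * size of the increase)
    return (odds_ratio - 1.0) * 100.0

# Hypothetical logit coefficients, not taken from any real model.
coefs = {"x1": 0.05, "x2": -0.30}
pct = {name: percent_change_in_odds(b) for name, b in coefs.items()}

for name, value in pct.items():
    print(f"{name}: {value:+.1f}% change in odds per 1-unit increase")
```

Note that the result still depends on the unit of measurement: calling `percent_change_in_odds(0.05, delta=10)` for the same predictor gives a much larger number, so a ranking by percentage change inherits the scale problem of the raw coefficients.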

Cheers
Gm
 

hlsmith

Omega Contributor
#2
Yeah, this isn't an easy task, and it's one I have not resolved myself. What you are missing is that one covariate may have a bigger effect, but also a giant standard error. You would think, hey, standardize, but as you mentioned, the interpretation of that process is highly debated in logistic regression.

I think the percentage change seems like a good idea, unless others have suggestions.

P.S., How would you handle categorical variables in comparison to continuous, given your above description of percentage change?
 

noetsi

Fortran must die
#3
I spent a lot of time studying this because we do satisfaction surveys and want to rank the variables' impact on satisfaction. :p Using coefficients, odds ratios, etc. is not a good way of getting at relative impact. Dozens of standardized coefficients have been created for this purpose (SAS has one of them built in); they serve the same function as beta weights in linear regression. Unfortunately, there are major differences among these coefficients, and there appears to be significant disagreement on which to use. After discussions here I chose to use the Wald statistic associated with each parameter (the higher, the more important).

Scott Richter wrote an extensive article on the standardized coefficients in logistic regression with his recommendations. I will try to find it and send you the link.
 

gianmarco

TS Contributor
#6
P.S., How would you handle categorical variables in comparison to continuous, given your above description of percentage change?
I won't, because I do not have categorical predictors :)

Anyway, thanks, Noetsi, Jake, and Spunky, for your comments. I am looking into dominance analysis, but I guess I will not have the time to get my head around it for the time being... too much pressure in preparing a paper for a presentation, so percentage change will suffice for now.

As for Noetsi's suggestion, while I grasp the 'meaning' of Allison's percentage change, I am wondering what the Wald statistic is actually communicating. I understand that it should be equal to the square of each Beta divided by the square of its Standard Error...

Cheers
Gm
 

spunky

Super Moderator
#7
I'm pretty sure budescu and azen (the people involved in creating dominance analysis) have freely available excel and SAS Macros that do this. Or you can download the relaimpo package from R. It's been a few years so I can't quite remember whether or not it does logistic regression. But I'm pretty sure there are SAS macros.
With that being said I resent the fact that you didn't consider using the Pratt index :p
 

noetsi

Fortran must die
#8
To be fair, it is not my suggestion (I am not that clever). Either Dason or Jake suggested it to me, although they both expressed doubts about the value of standardizing the variables for relative contribution, period. But since I needed to do it, this was the approach that seemed best. I never asked why it worked substantively; I honestly never thought about it till now :p
 
#9
I frequently use the relaimpo package in R, and although it is easy to use, it's not based on logistic regression. You can read more about it here: http://cran.r-project.org/web/packages/relaimpo/relaimpo.pdf
 

noetsi

Fortran must die
#11
I went back to the tome I created to help me deal with logistic regression issues :p

This is a Sage monograph, now dated. Pages 52-56 deal with standardized logistic regression coefficients.

http://books.google.com/books?id=EA... impact variables logistic regression&f=false

A conference paper that I have, but do not have a link for, might also be of value (I am sure I found it online): "Standardized Coefficients in Logistic Regression" by Jason King from Baylor University, circa 2007.

This is an excellent article that I once had but no longer have access to. You might look it up.

http://sf.oxfordjournals.org/content/89/4/1409.abstract

I believe this is the lead in to that article..


There is little consensus on how best to rank predictors in logistic regression. This paper describes and illustrates six possible methods for ranking predictors: 1) standardized coefficients, 2) p‐values of Wald chi‐square statistics, 3) a pseudo partial correlation metric for logistic regression, 4) adequacy, 5) c‐statistics, and 6) information values. There are many other ways, these were chosen because the author used them or saw others do so this way.
Another view...

Another solution might be to report the Wald statistics or R-values from logistic regression. They're also scale independent measures that indicate strength and direction of an effect, and have the advantage that they're available for categorical variables as well. On the downside, they're conservative estimates, they tend to be a little lower than the actual likelihood ratio for an effect, and this is stronger as effects are larger.
https://groups.google.com/forum/?fromgroups#!topic/comp.soft-sys.stat.spss/W4Ri--ySjN8

These are my own comments based on readings so take them with a large grain of salt...

There is little agreement on how to compare the relative impact of variables in logistic regression, or even on whether you should. The unstandardized parameters and odds ratios cannot be directly compared because of differing variation and scale issues. The fact that variation matters here is another reason not to estimate results with different sample sizes for specific questions.
And purely for amusement [you had to be involved in the chat discussion to know how little enthusiasm Jake and Dason actually had for my question or my understanding of the issues at hand] :p

Jake suggested (with limited enthusiasm) the following. Use the higher Wald statistic to show which has more impact although he and Dason actually thought bivariate comparisons made more sense (for reasons that I don’t understand).
And probably never will....
 

spunky

Super Moderator
#12
I will be glad to consider it, provided that you put me on the right track elaborating a bit more on that and/or providing further links. :)
well, for the sake of simplicity on your part, you're probably better off working with budescu & azen's dominance analysis, mostly because there is a "plug-and-use" readily available SAS macro for you to use. i thought the relaimpo package had already been extended to logistic regression as well, but as Injektilo correctly mentioned, it only works on multiple regression right now.

the relevant article for this is here. it's basically extending the concept of the Pratt measure for relative "importance" of variables from standard OLS multiple regression to logistic regression. the Pratt measure is basically the standardized regression coefficient of one specific predictor multiplied by the correlation coefficient between that predictor and the criterion variable. if you do that with all your predictors and add up all the numbers, you'll notice that they add up to the model R-squared, so they are considered measures of 'importance' of each predictor since they tell you how much variance can be accounted for by every covariate (kind of like the squared semi-partial correlation but better).
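A minimal sketch of the additive property for the ordinary OLS case (not the logistic extension) is below. The toy data are made up, and the closed-form standardized coefficients used here only apply to a model with exactly two predictors.

```python
import math

def zscores(v):
    """Standardize a list of values (sample standard deviation, n - 1)."""
    m = sum(v) / len(v)
    sd = math.sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))
    return [(a - m) / sd for a in v]

def corr(a, b):
    """Pearson correlation between two equal-length lists."""
    za, zb = zscores(a), zscores(b)
    return sum(p * q for p, q in zip(za, zb)) / (len(a) - 1)

# Hypothetical toy data: two predictors and a continuous criterion.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
y = [1.5, 2.0, 3.9, 4.1, 6.2, 5.8, 8.1, 7.7]

r_y1, r_y2, r_12 = corr(y, x1), corr(y, x2), corr(x1, x2)

# Standardized OLS coefficients (closed form, valid for two predictors only).
beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)

# Pratt measure: standardized coefficient times zero-order correlation.
pratt = {"x1": beta1 * r_y1, "x2": beta2 * r_y2}

# R-squared computed separately from the residuals of the standardized fit.
z1, z2, zy = zscores(x1), zscores(x2), zscores(y)
sse = sum((c - (beta1 * a + beta2 * b)) ** 2 for a, b, c in zip(z1, z2, zy))
sst = sum(c ** 2 for c in zy)
r_squared = 1 - sse / sst

print(pratt, sum(pratt.values()), r_squared)
```

The two printed totals should agree up to floating-point error, which is the additive decomposition described above.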

for logistic regression to work out, an extension based on weighted least squares had to be made to the R-squared measure. this is probably more math than you have the time to look over, so maybe you'll consider going over it when you have more time? i'm kind of just throwing it out there though because my advisor was one of the inventors of that measure, so i kinda have to market it :D
 

gianmarco

TS Contributor
#14
Resuming an old thread just to ask how in R I can get the Wald statistics from the glm() summary output.

gm
 

Jake

Cookie Scientist
#15
The Wald statistics are given as part of the default output, as Wald z-statistics. If you want, you can square them and then they are Wald chi-squares.
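In R, those z-statistics sit in the "z value" column of the coefficient table, which can be pulled out with coef(summary(fit)), where fit is whatever name your glm() object has. The arithmetic itself is simple enough to sketch by hand; below is a minimal Python illustration with a made-up estimate and standard error, showing the z-statistic, its square (the Wald chi-square on 1 degree of freedom), and the two-sided p-value.

```python
import math

def wald_summary(beta, se):
    """Wald z, Wald chi-square (1 df), and two-sided p-value for one coefficient."""
    z = beta / se                                 # Wald z-statistic
    chi_sq = z ** 2                               # Wald chi-square = beta^2 / se^2
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return z, chi_sq, p_value

# Hypothetical estimate and standard error, not from any real model.
z, chi_sq, p = wald_summary(0.8, 0.4)
print(f"z = {z:.2f}, chi-square = {chi_sq:.2f}, p = {p:.4f}")
```

Squaring z does not change the p-value, so ranking predictors by |z| or by the Wald chi-square gives the same order.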