Relative impact

noetsi

I keep circling back to a question that is in demand at work but for which I have no real answer, despite a lot of talking and reading. :p

I need to determine which variables have the greatest relative impact. Some suggest beta weights, but others reject them for dummy variables or for variables that are not normally distributed (and I will always have some).

They suggest this:
It is more sensible to estimate the change in Y when X is changed by an amount that is subject-matter relevant. For binary predictors this is the change from 0 to 1.
For many continuous predictors the interquartile range is a reasonable default choice. If the .25 and .75 quantiles of X are g and h, linearity holds, and the estimated coefficient of X is b, then b * (h - g) is the effect of increasing X by h - g units, which is a span that contains half of the sample values of X.
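As a worked illustration of that arithmetic, here is a short sketch assuming a linear relationship. The slope b = 0.5 and the five X values are made up, and `iqr_effect` is a hypothetical helper name.

```python
import numpy as np

def iqr_effect(b, x):
    """Change in predicted Y when X moves from its .25 to its .75 quantile,
    assuming Y is linear in X with slope b."""
    g, h = np.quantile(x, [0.25, 0.75])  # g = .25 quantile, h = .75 quantile
    return b * (h - g)

# Hypothetical predictor values and a hypothetical fitted slope b = 0.5
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
print(iqr_effect(0.5, x))
```

With NumPy's default linear interpolation, g = 4 and h = 8 here, so the effect is 0.5 * 4 = 2.0.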

Are they saying that you compare the coefficients of the dummy variables with the effect of moving a continuous X across its IQR, and whichever is larger has the greatest impact? I am not entirely sure how to set X to the IQR to get those effects. Do you just make every X equal to the IQR of the variable?
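A minimal sketch of how this comparison could be done, assuming an ordinary least squares model fit with NumPy on made-up data (the names `d`, `x1`, `x2`, `base`, `effect` and the simulated coefficients are all hypothetical). Rather than setting every X to its IQR, each predictor is moved on its own, from 0 to 1 for the dummy or from its .25 to its .75 quantile for a continuous variable, while the other predictors are held at fixed reference values. The resulting changes in predicted Y are then on the same scale and can be ranked.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Hypothetical data: one dummy (0/1), one skewed and one symmetric continuous predictor
d = rng.integers(0, 2, size=n)
x1 = rng.exponential(scale=2.0, size=n)
x2 = rng.normal(loc=50.0, scale=10.0, size=n)
y = 1.0 + 3.0 * d + 1.5 * x1 + 0.2 * x2 + rng.normal(scale=1.0, size=n)

# Ordinary least squares fit with an intercept column
X = np.column_stack([np.ones(n), d, x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(row):
    return float(np.dot(coef, row))

# Reference row: dummy at 0, continuous predictors at their medians
base = np.array([1.0, 0.0, np.median(x1), np.median(x2)])

def effect(col, low, high):
    """Predicted change in Y when only column `col` moves from low to high."""
    lo_row, hi_row = base.copy(), base.copy()
    lo_row[col], hi_row[col] = low, high
    return predict(hi_row) - predict(lo_row)

effects = {
    "d (0 -> 1)": effect(1, 0.0, 1.0),
    "x1 (IQR)": effect(2, *np.quantile(x1, [0.25, 0.75])),
    "x2 (IQR)": effect(3, *np.quantile(x2, [0.25, 0.75])),
}
# Rank predictors by the size of their effect
for name, eff in sorted(effects.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {eff:.2f}")
```

In a purely linear model the reference values do not change the result: each effect reduces to the dummy's coefficient or to b * (h - g). The reference row only starts to matter once interactions are added.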
 

Miner

@noetsi The following is my take on it. I had to develop a simple, managerial-level explanation for an analysis of survey data that was bounded between 1 and 10. There are two ways that you can evaluate the influence of independent variables.

One way, as you mentioned, is the beta weights (coefficients). The other is the % contribution. They do not measure the same thing. The beta weights describe the slope of the line relating X to Y: the greater the weight, the greater the change in Y for a unit change in X. This is independent of the variation in X and Y actually seen in the data. % contribution brings that observed variation into the picture, so the Xs with the highest % contribution help explain the spread (i.e., the tails) of the Y distribution.
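To make the distinction concrete, here is a small simulated sketch, assuming two uncorrelated predictors with the same true slope but very different spreads. The raw coefficients (what the post calls beta weights) come out nearly equal, while the share of fitted variance each predictor accounts for is lopsided. The % contribution formula used here, b_j^2 * Var(X_j) divided by its sum over predictors, is an assumption that only splits the fitted variance cleanly when the predictors are uncorrelated; it is not necessarily the calculation Miner's software performs.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
# Hypothetical predictors: same true slope, very different spread
x_narrow = rng.normal(scale=0.5, size=n)
x_wide = rng.normal(scale=3.0, size=n)
y = 2.0 * x_narrow + 2.0 * x_wide + rng.normal(size=n)

X = np.column_stack([np.ones(n), x_narrow, x_wide])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b = coef[1:]  # raw slopes, intercept dropped
sds = np.array([x_narrow.std(ddof=1), x_wide.std(ddof=1)])

# Raw slopes: change in Y per unit change in X (ignores how much X varies)
print("raw slopes:", np.round(b, 2))

# Standardized beta weights: slope rescaled by SD(X) / SD(Y)
beta = b * sds / y.std(ddof=1)
print("standardized betas:", np.round(beta, 2))

# Assumed % contribution: share of b_j^2 * Var(X_j); only a clean
# decomposition of the fitted variance when predictors are uncorrelated
contrib = (b * sds) ** 2
pct = 100 * contrib / contrib.sum()
print("% contribution:", np.round(pct, 1))
```

Standardized beta weights, b_j * SD(X_j) / SD(Y), also rescale by the spread of X, so they rank the predictors in the same order as this variance share.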

In my analysis, I explained that if you want to increase the average survey response (all customers), focus on the highest weights, but if you want to raise the lowest survey responses (the most dissatisfied customers), focus on the highest % contribution.
 

noetsi

Miner, this is why I am reluctant to use beta weights. My reasoning comes from "Regression Modeling Strategies" by Harrell.

He argues that beta weights are questionable for dummy variables or for variables that are not normally distributed, and we have many of both in our analysis. His suggested alternative is:

"It is more sensible to estimate the change in Y when X is changed by an amount that is subject-matter relevant. For binary predictors this is a change from 0 to 1. For many continuous predictors the IQR is a reasonable default change. In this case you are interested in changing X by the IQR, which is the span that contains half of the data." X and Y have to be linearly related for this to work.

I am not entirely sure what his alternative means. Is it the difference between the continuous predictor's values at its .25 and .75 quantiles? No further details are provided.
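If X and Y are not linearly related, b * (h - g) no longer applies, but the same reading of the alternative (compare the model at the .25 and .75 quantiles of X) can still be sketched by differencing predicted values. This assumes a quadratic model chosen purely for illustration, fit with NumPy on simulated data; the coefficients are made up.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(0, 10, size=n)
# Hypothetical curved relationship, so a single slope b is not enough
y = 1.0 + 2.0 * x - 0.15 * x**2 + rng.normal(scale=0.5, size=n)

# Fit Y on X and X^2 by least squares
X = np.column_stack([np.ones(n), x, x**2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(v):
    return coef[0] + coef[1] * v + coef[2] * v**2

# Change in predicted Y as X moves from its .25 quantile (g) to its .75 quantile (h)
g, h = np.quantile(x, [0.25, 0.75])
iqr_effect = predict(h) - predict(g)
print(f"X moves from {g:.2f} to {h:.2f}; predicted change in Y: {iqr_effect:.2f}")
```

Because the curve flattens and turns down at higher X, the same IQR-sized move would give a different answer over a different part of the range; that is why the quantiles themselves, not just their difference, matter once linearity is dropped.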

You might be interested in this; it largely confused me :)
https://stats.stackexchange.com/questions/202277/what-are-variable-importance-rankings-useful-for