- Thread starter noetsi

A question I get asked a lot is: if we have three predictors of Y, which of the three has the most, next most, and least impact? I have tried various approaches and never come up with one I am really happy with.

I need to do this for both interval and binary DVs (dependent variables).

The classic slope interpretation would be: for every 1-unit increase in X(n), we expect Y to change by beta(n) (an increase if beta(n) is positive, a decrease if it is negative), holding all other predictors constant.

The issue arises because you can't easily say that increasing mileage by 1 mile is equivalent to a 1-person increase in previous owners. The units are different, so it doesn't really make sense to say which has the "most impact" on the DV. Sure, one predictor may produce a larger change in the DV per unit, but that change comes from a one-unit change in X(n), which is not comparable to a one-unit change in another X variable.

I think one (partial) solution is to standardize (at least) the predictors. This way, you can say that a 1 SD change in X1 is associated with a larger change in Y than a 1 SD change in X2. It's not a perfect solution, because each standard deviation is still expressed in its own variable's units, but it does help in a small way (I think, anyway), because it puts these 1-unit increases on a common scale of "statistical unusualness" within their respective distributions.
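To make the standardization idea concrete, here is a minimal sketch in plain Python. The data are simulated and entirely made up (a hypothetical car price driven by mileage and number of previous owners), and the helper names `solve`, `ols`, and `zscore` are my own, not from any library. It only shows how the ranking of slopes can flip once each predictor is rescaled to SD units.

```python
import random
import statistics

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ols(X, y):
    """OLS coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(XtX, Xty)

def zscore(values):
    """Rescale a variable to mean 0 and SD 1."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]

# Simulated (made-up) data: mileage is on a large scale, owners on a small one.
random.seed(0)
n = 500
mileage = [random.gauss(60000, 20000) for _ in range(n)]
owners = [float(random.randint(1, 5)) for _ in range(n)]
price = [20000 - 0.1 * m - 500 * o + random.gauss(0, 1000)
         for m, o in zip(mileage, owners)]

raw = ols(list(zip(mileage, owners)), price)                  # slopes per mile, per owner
std = ols(list(zip(zscore(mileage), zscore(owners))), price)  # slopes per 1 SD

print("raw slopes (mileage, owners):   ", raw[1:])
print("per-SD slopes (mileage, owners):", std[1:])
```

In this simulation the raw slope for owners is far larger in absolute value than the one for mileage, while the per-SD slopes rank the other way round, which is exactly the units problem described above.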

Thoughts?

Using impact on R squared is an interesting idea, although obviously it does not work with a categorical DV. For categorical DVs I have used, based on suggestions here years ago, the magnitude of each predictor's Wald statistic to rank impact. SAS does something very similar with one of its built-in functions for binary DVs.
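For anyone who wants to see what ranking by Wald statistic looks like mechanically, here is a rough sketch in plain Python. It fits a logistic regression by Newton-Raphson on simulated (made-up) data and computes the Wald chi-square (coefficient / standard error) squared for each predictor. The function `fit_logistic` and the predictor names x1, x2, x3 are hypothetical, and this is not how SAS or any particular package implements it.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_logistic(X, y, iters=25):
    """Newton-Raphson logistic fit; returns (coefficients, standard errors), intercept first."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])

    def score_and_info(beta):
        probs = [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
                 for r in rows]
        grad = [sum((yi - pi) * r[j] for r, yi, pi in zip(rows, y, probs))
                for j in range(p)]
        info = [[sum(pi * (1 - pi) * r[j] * r[k] for r, pi in zip(rows, probs))
                 for k in range(p)] for j in range(p)]
        return grad, info

    beta = [0.0] * p
    for _ in range(iters):
        grad, info = score_and_info(beta)
        beta = [b + s for b, s in zip(beta, solve(info, grad))]

    # Standard errors: square roots of the diagonal of the inverse information matrix.
    _, info = score_and_info(beta)
    se = []
    for j in range(p):
        unit = [1.0 if i == j else 0.0 for i in range(p)]
        se.append(math.sqrt(solve(info, unit)[j]))
    return beta, se

# Simulated (made-up) data: x1 has a strong effect, x2 a weaker one, x3 none.
random.seed(1)
n = 1000
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]
true_beta = [1.5, 0.5, 0.0]
y = []
for r in X:
    eta = -0.5 + sum(b * v for b, v in zip(true_beta, r))
    y.append(1 if random.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)

beta, se = fit_logistic(X, y)
wald = {name: (b / s) ** 2 for name, b, s in zip(["x1", "x2", "x3"], beta[1:], se[1:])}
ranking = sorted(wald, key=wald.get, reverse=True)
print(wald)
print("ranking by Wald chi-square:", ranking)
```

One caveat worth keeping in mind: the Wald statistic grows with sample size and with the spread of the predictor, so it ranks strength of evidence at least as much as size of effect.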

I don't know much about LASSO, although I will look into it. I don't understand what this means (what would you actually do to carry it out)?

If you have a sufficient amount of data, you can run cross-validation and see how well the variables perform in other subsamples.
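One possible reading of the cross-validation suggestion, sketched in plain Python under my own assumptions: fit the model on part of the data, predict the held-out part, and see how much the out-of-sample error grows when each predictor is left out. The data are simulated (made up), and `kfold_mse` and the predictor names are hypothetical helpers, not anything from the thread.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ols(X, y):
    """OLS coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(XtX, Xty)

def kfold_mse(X, y, k=5):
    """Mean squared prediction error over k held-out folds."""
    n = len(y)
    squared_errors = []
    for fold in range(k):
        test = list(range(fold, n, k))          # every k-th row is held out
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        beta = ols([X[i] for i in train], [y[i] for i in train])
        for i in test:
            pred = beta[0] + sum(b * v for b, v in zip(beta[1:], X[i]))
            squared_errors.append((y[i] - pred) ** 2)
    return sum(squared_errors) / n

# Simulated (made-up) data: x1 matters most, x2 less, x3 not at all.
random.seed(2)
n = 300
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]
y = [2.0 * r[0] + 1.0 * r[1] + 0.0 * r[2] + random.gauss(0, 1) for r in X]

full_mse = kfold_mse(X, y)
increase = {}
for j, name in enumerate(["x1", "x2", "x3"]):
    reduced = [[v for c, v in enumerate(r) if c != j] for r in X]
    increase[name] = kfold_mse(reduced, y) - full_mse   # error added by dropping it

print("held-out MSE, full model:", full_mse)
print("increase in held-out MSE when dropped:", increase)
```

A predictor whose removal barely changes the held-out error is contributing little to prediction in new samples, whatever its in-sample coefficient looks like.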

I think you can still use partial R^2 with categorical variables. It would be intuitive with binary variables, though with more groups you would just have to make sure you mention what the reference group is when explaining.

The 'relaimpo' package in R implements all these R-squared partition measures and a few more, but I don't like the other ones.
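As an illustration of what an R-squared partition does, here is a small plain-Python sketch of the averaging-over-orderings idea (the measure relaimpo calls "lmg"): each predictor's share is its average gain in R squared across every order in which the predictors could enter the model. This is my own toy version on simulated (made-up) data with hypothetical helper names, not the package's code.

```python
import itertools
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ols(X, y):
    """OLS coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(XtX, Xty)

def r_squared(X, y, cols):
    """R squared of the model using only the predictor columns in `cols`."""
    if not cols:
        return 0.0
    sub = [[row[c] for c in cols] for row in X]
    beta = ols(sub, y)
    mean_y = sum(y) / len(y)
    sse = sum((yi - beta[0] - sum(b * v for b, v in zip(beta[1:], r))) ** 2
              for r, yi in zip(sub, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst

def lmg_shares(X, y):
    """Average sequential R-squared gain of each predictor over all orderings."""
    p = len(X[0])
    cache = {}

    def r2(cols):
        key = frozenset(cols)
        if key not in cache:
            cache[key] = r_squared(X, y, sorted(key))
        return cache[key]

    shares = [0.0] * p
    orders = list(itertools.permutations(range(p)))
    for order in orders:
        for pos, j in enumerate(order):
            shares[j] += r2(order[:pos + 1]) - r2(order[:pos])
    return [s / len(orders) for s in shares], r2(range(p))

# Simulated (made-up) data with correlated predictors x1 and x2.
random.seed(3)
n = 400
X = []
for _ in range(n):
    x1 = random.gauss(0, 1)
    x2 = 0.6 * x1 + 0.8 * random.gauss(0, 1)
    x3 = random.gauss(0, 1)
    X.append([x1, x2, x3])
y = [2.0 * r[0] + 1.0 * r[1] + 0.5 * r[2] + random.gauss(0, 1) for r in X]

shares, full_r2 = lmg_shares(X, y)
print("shares (x1, x2, x3):", shares)
print("sum of shares:", sum(shares), "full-model R squared:", full_r2)
```

The shares add up to the full-model R squared, which is what makes this a partition; with correlated predictors the answer depends on averaging over orders rather than picking one.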

I am going to look those approaches up, spunky; I know neither. My comment on R squared is that there is no generally accepted pseudo R squared for logistic models. Last time I looked there were something like 33 of them, and they differed significantly from each other.
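To show how much pseudo R squared measures can disagree on the very same fit, here is a tiny Python sketch of three common ones (McFadden, Cox-Snell, Nagelkerke), computed from the log-likelihoods of the null and fitted models. The log-likelihood values and sample size are made-up numbers for illustration only, and the function name `pseudo_r2` is hypothetical.

```python
import math

def pseudo_r2(ll_null, ll_model, n):
    """Three common pseudo R-squared measures from model log-likelihoods."""
    mcfadden = 1.0 - ll_model / ll_null
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Nagelkerke rescales Cox-Snell by its maximum attainable value.
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll_null / n))
    return {"mcfadden": mcfadden, "cox_snell": cox_snell, "nagelkerke": nagelkerke}

# Made-up illustrative values: 100 cases, intercept-only vs fitted model.
result = pseudo_r2(ll_null=-69.0, ll_model=-50.0, n=100)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Same model, same data, three noticeably different numbers, which is why ranking predictors by their contribution to "the" pseudo R squared is shaky for binary DVs.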

I do not quite remember all of them, but a big one for me is that it requires you to assume that the observed binary variable arose from the discretization of a continuous, latent variable.

BTW, when you get your PhD, are you going to continue to be humble or become an arrogant jerk ...