Relative impact of an X on Y.

noetsi

Fortran must die
#1
This is probably the most important question I get asked, and I have never found a great answer for it (of course, there may not be one). If you have multiple X (some of which will be dummy variables) and you want to know which has the greatest impact on Y, controlling for the other X, how do you do this? The question applies both to an interval Y and to an ordinal/binary Y.

The latter is particularly in dispute. I ended up using the Wald statistic to determine this, but I would love to see answers to this question generally. It seems a very basic question, but I have not seen it addressed very often in the literature I have run into.
 

rogojel

TS Contributor
#3
On the pragmatic side, if I have an acceptable prediction model then I would probably not need a separate heuristic for variable importance. If the real life question were, which factor to change by how much in order to achieve an expected result, I could use the model to figure this out, right?

I just built a simulation of a production line, ran a DoE on it and calculated the effect sizes for each parameter. That implicitly gave the impact of each variable. Would there be theoretical problems with this approach?

regards
 

noetsi

Fortran must die
#4
I come back to this question every year or so because it's really important to my job and I have never found an answer I was happy with.

This is a practical question. Those who make decisions want to know which of these variables are the most important (and by how much).

From what I have read, effect size is not the same thing as impact, one reason being that the scale differs between predictors. If Y changes 1 for a one-inch change in X1, how does that compare to Y changing 1 for a one-pound change in X2, or for a one-unit change in a dummy predictor [which can only change by one unit]?
 

hlsmith

Omega Contributor
#5
The difficult thing with effect sizes is choosing between two variables that have nearly comparable effect sizes once you also look at the SE: one variable could have a higher effect size but a larger SE.


I wonder whether the change in -2 log-likelihood could serve as the criterion?
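A minimal sketch of the -2 log-likelihood idea, assuming an ordinary Gaussian linear model rather than the ordinal/binary case raised at the top of the thread. Each predictor is dropped in turn and the increase in -2 log-likelihood is recorded. The helper names `ols` and `neg2_loglik` and the simulated data are illustrative assumptions, not anything taken from the posters' work.

```python
import math
import random

def ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def neg2_loglik(X, y, cols):
    """-2 log-likelihood of a Gaussian linear model using only columns `cols`."""
    n = len(y)
    Xs = [[row[c] for c in cols] for row in X]
    b = ols(Xs, y)
    rss = sum((yi - b[0] - sum(bj * v for bj, v in zip(b[1:], row))) ** 2
              for row, yi in zip(Xs, y))
    # Maximum-likelihood variance estimate rss / n plugged into the likelihood
    return n * (math.log(2 * math.pi * rss / n) + 1)

# Simulated data (assumption): x1 has a strong effect, x2 a weak one
random.seed(2)
n = 150
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
y = [2.0 * a + 0.3 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]
X = list(zip(x1, x2))

full = neg2_loglik(X, y, [0, 1])
# Increase in -2 log-likelihood when each predictor is removed from the full model
drops = [neg2_loglik(X, y, [k for k in (0, 1) if k != j]) - full for j in (0, 1)]
print(drops)
```

Each drop is the likelihood-ratio statistic for that predictor, so it grows with sample size and measures strength of evidence more than size of impact. That may or may not match what a decision maker means by "importance".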


P.S. It seems like cross-validation could come into play here, as in random forests. That is, how often do variables come up significant in bootstrap versions?
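One possible reading of the bootstrap suggestion, sketched under assumptions: instead of counting how often a variable is "significant", the code below counts how often each predictor has the largest standardized coefficient across bootstrap resamples. That is a swapped-in criterion, not necessarily what was meant. The names `ols`, `std_betas`, and `top_rank_share` and the simulated data are hypothetical.

```python
import random
import statistics

def ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def std_betas(X, y):
    """Standardized slopes: raw slope times sd(x) / sd(y)."""
    b = ols(X, y)[1:]  # drop the intercept
    sd_y = statistics.stdev(y)
    return [bj * statistics.stdev(col) / sd_y for bj, col in zip(b, zip(*X))]

def top_rank_share(X, y, n_boot=300, seed=0):
    """Share of bootstrap resamples in which each predictor ranks first."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    wins = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        bs = std_betas([X[i] for i in idx], [y[i] for i in idx])
        wins[max(range(p), key=lambda j: abs(bs[j]))] += 1
    return [w / n_boot for w in wins]

# Simulated data (assumption): x1 matters more than x2
random.seed(3)
n = 100
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 * a + 0.4 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]
X = list(zip(x1, x2))

shares = top_rank_share(X, y)
print(shares)
```

When two predictors have similar effects, the shares should move toward an even split, which is one way to show that a ranking is not well determined by the data.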
 

noetsi

Fortran must die
#6
it seems like the use of cross-validation could come into play here, like say in random forests. So how often do variables come up significant in bootstrap versions.
I am not sure what this means.

The SE tells you, roughly, how precisely the effect size is estimated. Perhaps it would make more sense to compare the 95% intervals of the effect sizes, although I am not sure how that is done in practice. And this assumes effect size really determines relative impact, which I am not certain is accurate.

Spunky's article suggests what I thought was right: there is little agreement on how to measure this, there is disagreement about whether the proposed measures actually work, and regression generally was not created to look at relative impact.
 

rogojel

TS Contributor
#7
From what I have read effect size is not the same thing as impact - one reason being that the scale is different between each predictor. If Y changes 1 for a one inch change in X1 how does that compare to Y changing 1 for a one pound change in X2 or a one unit change in a predictor dummy variable [which can only change one unit].
hi,
if someone only wants to know what to change and by how much in order to achieve a given result, then the above question is not really interesting. I would not care whether the one-inch change can or cannot be compared to the one-pound change, as long as I can figure out that I need to increase X1 by half an inch and decrease X2 by two pounds to achieve my goal.

regards
 

rogojel

TS Contributor
#8
P.S., it seems like the use of cross-validation could come into play here, like say in random forests. So how often do variables come up significant in bootstrap versions.
My guess is that you would need CV in order to develop a good predictive model. AFAIK it is not really interesting whether the variables are significant or not, and, for instance with random forests, you do not even have the option to calculate significance.

regards
 
#9
hi,
if someone only wants to know what to change and by how much in order to achieve a given result, then the above question is not really interesting. I would not care whether the one-inch change can or cannot be compared to the one-pound change, as long as I can figure out that I need to increase X1 by half an inch and decrease X2 by two pounds to achieve my goal.

regards
I think the point is that people commonly equate a larger coefficient magnitude with a more important impact on the DV, which isn't logical unless you can also equate the units of measure between the independent variables. Then the conversation might get somewhere, but it will still have limitations.

The idea noetsi mentioned is one I've heard from people who have a Ph.D. in Statistics, some with decades' worth of additional consulting experience, so I think the argument is fair. Your point, though, may focus more on decision making to achieve a goal than on describing relationships.
 

spunky

Smelly poop man with doo doo pants.
#10
Spunky's article suggests what I thought was right, there is little agreement on how to measure this, there is disagreement that the ways to do so actually work, and regression generally was not created to look at relative impact.
To that note, is there any particular reason why you wouldn't consider variance-decomposition type measures like the ones Grömping discusses? I mean, they address a few of the shortcomings you've mentioned. They have an unambiguous interpretation in dividing up the R-squared. They are all in the same standardized metric so you can make relative comparisons among them. They derive their meaning both from the role the predictors have by themselves and in conjunction with other predictors in the model. They come with their own theory (series of axioms) from which they are derived so you can read the axioms and see if you agree with them or not (I find them quite reasonable, to be honest).

So... you like them? Don't like them?
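A rough sketch of one variance-decomposition measure, assuming that what is meant includes the approach of averaging each predictor's R-squared increment over all orderings of the predictors (often called LMG). The functions `ols`, `r_squared`, and `lmg_shares` and the simulated data are illustrative assumptions, not code from the article.

```python
import random
from itertools import permutations

def ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def r_squared(X, y, cols):
    """R-squared of the model that uses only the predictor columns in `cols`."""
    if not cols:
        return 0.0
    Xs = [[row[c] for c in cols] for row in X]
    b = ols(Xs, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - b[0] - sum(bj * v for bj, v in zip(b[1:], row))) ** 2
                 for row, yi in zip(Xs, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def lmg_shares(X, y):
    """Average R-squared increment of each predictor over all entry orders."""
    p = len(X[0])
    shares = [0.0] * p
    orders = list(permutations(range(p)))
    for order in orders:
        used, prev = [], 0.0
        for j in order:
            used.append(j)
            cur = r_squared(X, y, sorted(used))
            shares[j] += cur - prev  # what predictor j adds at this position
            prev = cur
    return [s / len(orders) for s in shares]

# Simulated data (assumption): x1 and x2 are correlated, x3 is weak and independent
random.seed(4)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.6 * a + 0.8 * random.gauss(0, 1) for a in x1]
x3 = [random.gauss(0, 1) for _ in range(n)]
y = [a + b + 0.2 * c + random.gauss(0, 1) for a, b, c in zip(x1, x2, x3)]
X = list(zip(x1, x2, x3))

shares = lmg_shares(X, y)
total_r2 = r_squared(X, y, [0, 1, 2])
print(shares, total_r2)
```

The shares add up to the full-model R-squared, which is the "dividing up the R-squared" property. The number of orderings grows factorially with the number of predictors, so this brute-force version is only practical for a handful of them.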
 

rogojel

TS Contributor
#11
The idea noetsi mentioned is one I've heard from people who have a Ph.D. in Statistics and some with additional consulting experience, decades worth, so I think the argument is fair. Though your point may focus more on decision making to achieve a goal rather than describing relationships.
Do we have a clear definition of the importance of a variable, as different from effect size? I see the theoretical point - I just can't see myself (as a consultant) explaining that one variable is more "important" than the other even though one would use the first one for practical reasons.

E.g. in a chemical plant you might have the quantity of an additive and temperature as influencing factors. The model might tell you that the additive is more "important", yet you might still want to use the reaction temperature to regulate the process.

regards
 

Miner

TS Contributor
#12
My two cents' worth: you can look at which factor (IV) explains the most of the variation seen in the response (DV) using a measure such as eta-squared or something similar. However, this is influenced by the range of the factor levels in the experiment/study. You can also look at the coefficients, which show how much leverage each factor has on the response, but these are influenced by the different units of measure. I have read that you can resolve this by standardizing the factors, but my needs have never required that I try this, so I cannot argue for or against it.

And, in my field, I completely agree with rogojel. Some factors have greater influence, but may be very difficult to adjust in a precise manner. So you use that factor to get your response in the general area then use another factor that has less influence, but can be controlled with high precision to zero the response in on the desired target.
 

noetsi

Fortran must die
#13
To that note, is there any particular reason as for why you wouldn't consider variance-decomposition type measures like the ones Grömping discusses? I mean, they address a few of the shortcomings you've mentioned. They have an unambiguous interpretation in dividing up the R-squared. They are all in the same standardized metric so you can make relative comparisons among them. They derive their meaning both from the role the predictors have by themselves and in conjunction with other predictors in the model. They come with their own theory (series of axioms) from which they are derived so you can read the axioms and see if you agree with them or not (I find them quite reasonable, to be honest).

So... you like them? Don't like them?
The reason is that I have not been able to get the article yet. Once I do, assuming I can understand them, I might like them a lot.

All I have been able to read so far is the abstract, which is where my comments came from. I pretty much quoted the abstract. :p
 

noetsi

Fortran must die
#14
The reason I have not stressed the change in Y for a specific X [given the different metrics for our variables] is that the question I get asked is "which variable has greater impact," not "if we change a given X by this amount, what impact will it have on Y." If the question were the latter I would just stress effect size, but it's the former. As posters here know, my statistical focus is entirely practitioner oriented; I am not smart enough to deal with the theory behind it. :p What this thread makes me realize, again, is that relative impact is a complex topic, a point stressed by Dason when I first raised this issue 6 years ago!

I notice that no one commented on using standardized slopes to compare predictors (which some suggest as a way to deal with the different metrics of the variables). Does this reflect a lack of respect for those approaches on this issue? :)
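For reference, a small sketch of what standardized slopes do, assuming the usual definition (raw slope multiplied by sd(x) / sd(y)) and simulated data in inches and pounds. The helpers `ols` and `standardized_betas` are hypothetical names. The example also re-expresses the first predictor in centimetres to show which quantities depend on the unit.

```python
import random
import statistics

def ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def standardized_betas(X, y):
    """Standardized slopes: raw slope times sd(x) / sd(y)."""
    b = ols(X, y)[1:]  # drop the intercept
    sd_y = statistics.stdev(y)
    return [bj * statistics.stdev(col) / sd_y for bj, col in zip(b, zip(*X))]

# Simulated data (assumption): X1 in inches, X2 in pounds
random.seed(1)
n = 200
inches = [random.gauss(70, 3) for _ in range(n)]
pounds = [random.gauss(160, 25) for _ in range(n)]
y = [0.5 * a + 0.1 * b + random.gauss(0, 1) for a, b in zip(inches, pounds)]
X = list(zip(inches, pounds))

raw = ols(X, y)[1:]
std = standardized_betas(X, y)

# Same data with the first predictor converted from inches to centimetres
X_cm = [(a * 2.54, b) for a, b in X]
raw_cm = ols(X_cm, y)[1:]
std_cm = standardized_betas(X_cm, y)

print("raw slopes:         ", raw, raw_cm)
print("standardized slopes:", std, std_cm)
```

In this simulation the raw slope is larger for inches while the standardized slope is larger for pounds, and only the standardized slope is unchanged by the unit conversion. The standardized values still depend on how much each predictor happens to vary in the sample, which is the range caveat raised earlier in the thread.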
 

hlsmith

Omega Contributor
#15
Yes, for linear regression you can use the eta^2 values' test statistics to rank variables. It must be a field thing: in medicine we usually really care about ranking predictors, or at least knowing their general ranking. The best predictor may then be targeted to affect the outcome of interest.


noetsi, I believe you now have access to the article!
 
#16
Do we have a clear definition of the importance of a variable, as different from effect size? I see the theoretical point - I just can't see myself (as a consultant) explaining that one variable is more "important" than the other even though one would use the first one for practical reasons.

E.g. in a chemical plant you might have the quantity of an additive and temperature as influencing factors. The model might give you that the additive is more "important" still you might want to use the reaction temperature to regulate the process.

regards
I definitely think a practical definition of the distinction is important, but I'd leave that to the person who is defining importance for his or her purposes. "Importance" is one of those vague terms that needs a firm definition. I think you're absolutely right that it's usually about a practical reason and that effect size doesn't necessarily fit the definition of importance.
 
#17
I notice that no one commented on using standardized slopes to compare (which some suggest to deal with the different metric issue between variables). Does this reflect a lack of respect for those approaches in this issue? :)
I did raise the standardization argument in another thread (in the short time I've been a member of the forum). I haven't the time to look for it right now, but I did mention it at that time :p
 

noetsi

Fortran must die
#18
I will look for it. Wacky things at work distracted me today [critical data we run analyses on turned out to be wrong, and no one caught it, either here or in the other organizations that use it].