
Thread: Relative impact

  1. #1
    noetsi (Fortran must die)

    Relative impact
    I circle back to this question every few years. It is becoming critical these days, and my readings have not turned up a strong consensus on the topic, or even agreement that it can be done in the context of regression.

    I have an interval DV (income gains). My IVs are either dummy or interval variables, although in the most important analyses the predictors are all dummy variables, and that is the case I am most interested in. I am trying to determine which variable has the greatest impact on the DV, that is, on income gain. I am not interested in predicting Y or in determining whether a variable is significant; of five variables, I want to know which has the greatest impact on the DV, which has the next greatest, and so on.

    The way I decided to do it was to rank the dummy predictors by the size of their slopes (which in this case means ranking them by how large a change in income they produce).

    I would deeply appreciate any comments on whether that is a valid way to measure relative impact in a multivariate model.
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  2. #2
    kiton

    Re: Relative impact

    Hello!

    Although I am not an expert, let me try to contribute to your quest. A while ago I attended a doctoral consortium that included a very brief introduction to machine learning models. One of the applications discussed was the case where a researcher has many (i.e., hundreds of) IVs and wants to determine which of them are the most influential with respect to a given DV. Based on the results, one could figure out which IVs (out of hundreds) should be retained for future modeling and which should be dropped because of "weak influence".

    If this sounds plausible for your task, I can go through my archive and try to find the notes for that lecture (should have jotted down a few references).

  3. The Following User Says Thank You to kiton For This Useful Post:

    noetsi (06-21-2016)

  4. #3
    Miner (TS Contributor)

    Re: Relative impact

    There are two different ways of looking at this. The one I recommend is whichever is most relevant to your specific needs; that is, the correct answer is context sensitive.
    • The first option is to use the standardized coefficients. Those show which independent variable moves the dependent variable needle most per standard-deviation change in the IV.
    • The second is to use a metric such as epsilon^2. These metrics quantify how much of the DV's variation is explained by each IV.
    Again, your interpretation of each, and the corresponding usefulness of each, is context sensitive.
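    Miner's first option is easy to sketch numerically. Below is a minimal Python illustration on made-up data (the thread's own examples are in R; everything here, including the data, is purely illustrative): standardized slopes rescale the raw slopes by sd(x)/sd(y), putting IVs measured on different scales onto a common footing.

    ```python
    import numpy as np

    # Made-up data: x1 has a much stronger effect on y than x2
    rng = np.random.default_rng(0)
    n = 500
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    y = 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)

    # Raw OLS slopes (column of ones = intercept)
    X = np.column_stack([np.ones(n), x1, x2])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]

    # Standardized coefficients: slope * sd(x) / sd(y),
    # i.e. SD change in y per SD change in each IV
    std_beta = beta[1:] * np.array([x1.std(), x2.std()]) / y.std()
    print(std_beta)
    ```

    Ranking the IVs by |standardized slope| here recovers x1 as the stronger predictor, which is the kind of ordering the original question is after.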

  5. The Following User Says Thank You to Miner For This Useful Post:

    noetsi (06-21-2016)

  6. #4
    noetsi (Fortran must die)

    Re: Relative impact

    Miner, I thought about using standardized coefficients, except I have read they are not recommended for dummy variables (and in this case all my variables are dummies). But I will run it and see whether it changes the ordering from what I did.

    I am not familiar with eta squared. I will look that up.

  7. #5
    noetsi (Fortran must die)

    Re: Relative impact

    Quote Originally Posted by kiton View Post
    Hello!

    Although I am not an expert, let me try to contribute to your quest. A while ago I attended a doctoral consortium that included a very brief introduction to machine learning models. One of the applications discussed was the case where a researcher has many (i.e., hundreds of) IVs and wants to determine which of them are the most influential with respect to a given DV. Based on the results, one could figure out which IVs (out of hundreds) should be retained for future modeling and which should be dropped because of "weak influence".

    If this sounds plausible for your task, I can go through my archive and try to find the notes for that lecture (should have jotted down a few references).
    I would be interested in that. I think I am going to try multiple approaches and see if that changes the results.

  8. #6
    kiton

    Re: Relative impact

    The method I had in mind is LASSO (least absolute shrinkage and selection operator) regression. If you're interested, PM me and I'll share a great presentation on the topic.

    noetsi, please check a PM from me.
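    For anyone without that presentation: the core idea of LASSO is an L1 penalty that shrinks weak coefficients exactly to zero, so "weak influence" IVs drop out of the model automatically. Here is a minimal coordinate-descent sketch in Python on made-up data (a real analysis would use something like glmnet in R; the data and tuning value below are purely illustrative):

    ```python
    import numpy as np

    def lasso_cd(X, y, lam, n_iter=200):
        """Plain coordinate-descent LASSO (columns here are roughly unit-variance)."""
        n, p = X.shape
        beta = np.zeros(p)
        for _ in range(n_iter):
            for j in range(p):
                # Partial residual: leave predictor j out of the current fit
                r = y - X @ beta + X[:, j] * beta[j]
                rho = X[:, j] @ r / n
                z = (X[:, j] ** 2).sum() / n
                # Soft-threshold: small effects are set exactly to zero
                beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z
        return beta

    # Ten candidate IVs, only the first two actually matter
    rng = np.random.default_rng(1)
    n, p = 300, 10
    X = rng.normal(size=(n, p))
    y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(size=n)

    beta = lasso_cd(X, y, lam=0.5)
    ```

    The eight noise coefficients come out exactly zero while the two real signals survive (shrunk somewhat toward zero), which is how LASSO screens hundreds of IVs down to the influential ones.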

  9. The Following User Says Thank You to kiton For This Useful Post:

    noetsi (06-21-2016)

  10. #7
    spunky (TS Contributor)

    Re: Relative impact

    Hello noetsi! Sorry I didn’t have the chance to respond to this thread before. Thing is, measures of relative importance were kind of a “thing” we used to study in my lab back when I started my MA. I got a little tired of the subject because I feel it raises more questions than answers, but maybe this is something that could help you? Anyway, it’s all in R (sorry!), but I think a few of these ideas have been implemented in SAS, or in a few cases you can calculate them (laboriously!) by hand.
    Anyway, let’s use R to generate some data:

    Code: 
    library(MASS)       # for mvrnorm()
    library(relaimpo)   # relative importance metrics
    
    # Population correlation matrix for (y, x1, x2, a, b)
    S <- matrix(c(1.0, 0.5, 0.5, 0.0, 0.0,
                  0.5, 1.0, 0.3, 0.3, 0.3,
                  0.5, 0.3, 1.0, 0.3, 0.3,
                  0.0, 0.3, 0.3, 1.0, 0.3,
                  0.0, 0.3, 0.3, 0.3, 1.0), 5, 5)
    
    # Draw 200 observations from a multivariate normal with these correlations
    datam <- as.data.frame(mvrnorm(200, rep(0, 5), S))
    
    # Split V3 and V5 at their means to create two-level dummy factors
    datam$V3 <- as.factor(ifelse(datam$V3 >= mean(datam$V3), 1, 0))
    datam$V5 <- as.factor(ifelse(datam$V5 >= mean(datam$V5), 1, 0))
    
    colnames(datam) <- c("y", "x1", "x2", "a", "b")
    Nothing very important here. Notice that x1 and x2 have some correlation (r=0.5) with the dependent variable y. Variables a and b, however, have a correlation of 0 with y *BUT* a non-zero correlation with x1 and x2 (r=0.3), so they might somehow influence the relationship x1 and x2 have with y. I made x2 and b two-level factors in dummy coding. Now, I KNOW this is not how you’re supposed to generate factors for a linear model, but bear with me; this is just an example.

    You can fit the model and take a look at the results:

    Code: 
    > mod1 <- lm(y~x1 + x2 + a + b, data=datam)
    > summary(mod1)
    
    Call:
    lm(formula = y ~ x1 + x2 + a + b, data = datam)
    
    Residuals:
        Min      1Q  Median      3Q     Max 
    -1.8060 -0.4714  0.0705  0.4484  2.2025 
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
    (Intercept) -0.10048    0.08754  -1.148 0.252421    
    x1           0.63551    0.05747  11.058  < 2e-16 ***
    x21          0.80964    0.10735   7.542 1.71e-12 ***
    a           -0.24526    0.05640  -4.349 2.20e-05 ***
    b1          -0.39230    0.10620  -3.694 0.000287 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    Residual standard error: 0.7234 on 195 degrees of freedom
    Multiple R-squared:  0.5217,    Adjusted R-squared:  0.5119 
    F-statistic: 53.18 on 4 and 195 DF,  p-value: < 2.2e-16
    So yeah, it looks good and everything. Coefficients are significant, everything looks pretty, right? Well, now you load the ‘relaimpo’ package and do this:

    Code: 
    > calc.relimp(mod1, rela=TRUE, type=c("lmg","pratt"))
    Response variable: y 
    Total response variance: 1.072211 
    Analysis based on 200 observations 
    
    4 Regressors: 
    x1 x2 a b 
    Proportion of variance explained by model: 52.17%
    Metrics are normalized to sum to 100% (rela=TRUE). 
    
    Relative importance metrics: 
    
              lmg       pratt
    x1 0.59972616 0.640443155
    x2 0.31616532 0.330426166
    a  0.04506672 0.007039895
    b  0.03904180 0.022090783
    
    Average coefficients for different model sizes: 
    
                1X        2Xs        3Xs        4Xs
    x1  0.62317792  0.6277305  0.6317808  0.6355053
    x2  0.90872043  0.8663028  0.8341169  0.8096434
    a  -0.01698384 -0.1129560 -0.1869412 -0.2452618
    b  -0.12542072 -0.2576869 -0.3388214 -0.3923031
    The stuff you should be looking at is where it reads “lmg” (which stands for Lindeman, Merenda, and Gold) and “pratt”, because that gives you a breakdown of the contribution each variable makes to the model R-squared. So, under the LMG metric, approximately 60% of the R-squared is contributed by x1, 32% by x2, 5% by a, and 4% by b. The total R-squared of the model is 52.17%, and that 52.17% can be further subdivided per variable. You can see that the Pratt metric gives a similar ordering, albeit with different (but close) relative contributions. In any case, you would say variables x1 and x2 carry the weight of the predictive power of the model.

    Now, how did the LMG and Pratt measures come to be? Well, that’s where things get a little tricky (and the main reason why I abandoned this line of work), because each defines “importance” as a way to partition the R-squared and attribute a contribution to each variable, and those definitions do not necessarily agree. For instance, the core of the LMG approach is to calculate all-possible-subsets regressions (so y = b1x1, y = b1x1 + b2x2, y = b1x1 + b2a, y = b1x1 + b2b, y = b1x1 + b2x2 + b3a, …), get the R-squared in each case, and then average how much R-squared each variable brings to each subset of the model. The Pratt metric takes a much more elegant approach (in my opinion) and uses the geometry of linear models to partition the length of the y-hat vector by projecting the vector of each variable onto that subspace. For a more detailed description, the author of the relaimpo package (Ulrike Gromping) has a nice overview of measures of variable importance in linear regression:

    https://prof.beuth-hochschule.de/fil...t07mayp139.pdf
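    The LMG idea (average each variable's R-squared increment over every possible order of entry) is simple enough to compute by brute force for small models. A short Python sketch on toy data (illustrative only; the relaimpo output above is the real thing):

    ```python
    from itertools import permutations
    import numpy as np

    def r2(X, y, cols):
        """R-squared of an OLS fit of y on the listed columns (plus intercept)."""
        A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
        resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
        return 1.0 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()

    def lmg(X, y):
        """Average each variable's R-squared increment over all entry orders."""
        p = X.shape[1]
        contrib = np.zeros(p)
        perms = list(permutations(range(p)))
        for perm in perms:
            seen = []
            for j in perm:
                before = r2(X, y, seen) if seen else 0.0
                contrib[j] += r2(X, y, seen + [j]) - before
                seen.append(j)
        return contrib / len(perms)

    # Toy data: x1 strong, x2 weaker and correlated with x1, x3 pure noise
    rng = np.random.default_rng(2)
    n = 400
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)
    x3 = rng.normal(size=n)
    y = 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)
    X = np.column_stack([x1, x2, x3])

    shares = lmg(X, y)   # per-variable shares that sum to the full-model R-squared
    ```

    Because each variable's increment is averaged over orderings where it enters first, last, and everywhere in between, the shares are well defined even when predictors are correlated, at the cost of fitting 2^p submodels.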

    I believe that in SAS the LMG approach is called “dominance analysis,” championed by Budescu & Azen. The Pratt approach is championed by Zumbo (my advisor) & Thomas. Each measure has its pros and cons and has been extended to other models (logistic regression, multilevel models, etc.).

    Anyhoo, those are my 2 cents for your thread. Have fun!
    for all your psychometric needs! https://psychometroscar.wordpress.com/about/

  11. The Following 4 Users Say Thank You to spunky For This Useful Post:

    gianmarco (06-25-2016), GretaGarbo (06-26-2016), noetsi (06-25-2016), rogojel (06-25-2016)

  12. #8
    noetsi (Fortran must die)

    Re: Relative impact

    How does this relate to eta squared, Spunky?

    This is the first I have heard of this. I will have to explore it some.

  13. #9
    spunky (TS Contributor)

    Re: Relative impact

    Quote Originally Posted by noetsi View Post
    how does this related to eta squared Spunky?
    Well, eta-squared is the analog of R-squared for the case of ANOVA. Since ANOVA is a type of regression, you can also use the variable importance metrics I pointed out to see how much each factor (i.e., predictor) contributes to the variance explained, as measured by eta-squared.
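    That equivalence is easy to verify directly: for a one-way ANOVA, eta-squared (SS_between / SS_total) is exactly the R-squared from regressing y on the group dummies. A quick Python check on simulated data (group means and sizes made up for the example):

    ```python
    import numpy as np

    # Three groups of 100 with true means 0, 1, 2 plus unit noise
    rng = np.random.default_rng(3)
    groups = np.repeat([0, 1, 2], 100)
    y = np.array([0.0, 1.0, 2.0])[groups] + rng.normal(size=300)

    # eta^2 = SS_between / SS_total for a one-way ANOVA
    grand = y.mean()
    ss_total = ((y - grand) ** 2).sum()
    ss_between = sum((groups == g).sum() * (y[groups == g].mean() - grand) ** 2
                     for g in (0, 1, 2))
    eta2 = ss_between / ss_total

    # The same number as R^2 from regressing y on two group dummies
    X = np.column_stack([np.ones(300), groups == 1, groups == 2]).astype(float)
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    r2 = 1.0 - (resid ** 2).sum() / ss_total
    ```

    The two quantities agree to floating-point precision, because the dummy regression's fitted values are exactly the group means, so its residual sum of squares equals SS_within.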

  14. #10
    noetsi (Fortran must die)

    Re: Relative impact


    Spunky, I can't open this link; it says there is no such page.
    https://prof.beuth-hochschule.de/fil...t07mayp139.pdf

    Do you know anywhere else I can go and find it?
