
Thread: Variance inflation and interactions between variables?

  #1
    enur

    Variance inflation and interactions between variables?




    Hello everyone

    I am having trouble interpreting some of my results.
    I am using logistic regression to infer a model based on measured data. Some of my explanatory variables are continuous (e.g. temperature [°C]) and some are categorical (e.g. time of day [night, morning, day, afternoon, evening]). To investigate multicollinearity issues, I have calculated Generalized Variance Inflation Factors (GVIF) in R (using the car package). R automatically calculates GVIF^(1/(2*Df)), which, to my understanding, estimates the factor by which the confidence interval of each coefficient is inflated (please correct me if I am wrong).
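For concreteness, here is a minimal R sketch of the setup described above. The data are simulated stand-ins, and `event`, `Temperature`, and `time` are placeholders for the actual measured variables:

```r
# Simulated stand-in data -- the real analysis uses measured values.
library(car)  # provides vif(), which reports GVIFs for multi-df terms

set.seed(1)
mydata <- data.frame(
  Temperature = runif(200, min = 15, max = 30),
  time = factor(sample(c("Night", "Morning", "Day", "Afternoon", "Evening"),
                       200, replace = TRUE)),
  event = rbinom(200, size = 1, prob = 0.5)
)

# Logistic regression with a continuous-by-categorical interaction
fit <- glm(event ~ Temperature * time, family = binomial, data = mydata)
vif(fit)  # columns: GVIF, Df, GVIF^(1/(2*Df))
```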

    My problem is: How should I interpret the GVIF of interaction terms between continuous and categorical variables?

    One of my simple models looks like this:

    Code: 
    Coefficients:
                              Estimate Std. Error z value Pr(>|z|)    
    (Intercept)               -8.76208    2.54158  -3.447 0.000566 ***
    Temperature                0.08847    0.11441   0.773 0.439363    
    timeMorning                3.20504    2.82524   1.134 0.256614    
    timeDay                    0.72913    2.77043   0.263 0.792409    
    timeAfternoon             -0.34141    2.77430  -0.123 0.902057    
    timeEvening               -0.97397    3.16012  -0.308 0.757926    
    Temperature:timeMorning   -0.02669    0.12782  -0.209 0.834601    
    Temperature:timeDay        0.06239    0.12415   0.503 0.615302    
    Temperature:timeAfternoon  0.09116    0.12386   0.736 0.461711    
    Temperature:timeEvening    0.06535    0.13907   0.470 0.638410
    with the following GVIFs

    Code: 
                            GVIF Df GVIF^(1/(2*Df))
    Temperature      2.091206e+01  1        4.572971
    time             1.595779e+08  4       10.601604
    Temperature:time 1.899285e+08  4       10.834872
    I would like to make a table like the one below:

    Code: 
     
                      Estimate     std.Dev   std.Err   C.I. 2.5%   C.I. 97.5%   Inflation
    Intercept        
            Night     -8.76208     2.54158   0.010494  -8.78       -8.74           XX
            Morning   -5.55704     3.800212  0.01569   -5.59       -5.53           XX
            Day       -8.03295     3.759642  0.015523  -8.06       -8.00           XX
            Afternoon -9.10349     3.762495  0.015535  -9.13       -9.07
            Evening   -9.73605     4.055365  0.016744  -9.77       -9.70
    Temperature
            Night      0.08847     0.11441   0.000472  0.0875       0.0894
            Morning    0.06178     0.171545  0.000708  0.0604       0.0632
            Day        0.15086     0.168828  0.000697  0.1495       0.1522
            Afternoon  0.17963     0.168615  0.000696  0.1783       0.1810
            Evening    0.15382     0.180084  0.000744  0.1524       0.1553
    My problem is: How do I calculate the Inflation of the confidence intervals?

    I would really appreciate it if anyone can help!
    Last edited by enur; 08-03-2012 at 10:43 AM.

  #2
    hlsmith

    Re: Variance inflation and interactions between variables?

    By categorical variable, do you mean ordinal?

  #3
    Jake

    Re: Variance inflation and interactions between variables?

    You should not think too hard about the VIFs in this scenario. If you do not center your predictors (and it looks like you haven't!), then there will almost always be apparently extreme multicollinearity between the interaction term and the simple effect terms. This makes sense: if you have an interaction term A*B, it should not be surprising that this is highly correlated with A, because half of what comprises A*B is A itself!

    However, this multicollinearity is a red herring. It is an artifact of having not centered your predictors and does not actually inflate your confidence intervals to an undue degree.

    To see this, take a look at the formula for the 100(1 - \alpha)\% confidence interval of the coefficient \beta_j for a predictor X_j:

    b_j \pm \sqrt{\frac{(F_{1,n-p;\alpha})(MSE)}{(SSX_j)(TOL_j)}}

    F_{1,n-p;\alpha} is the critical value of F, MSE is the mean squared error of the model, SSX_j is the sum of squared deviations of the predictor X_j from its mean (SSX_j = s_j^2(n - 1), where s_j^2 is the sample variance of X_j), and TOL_j is the "tolerance" of X_j, which is just \frac{1}{VIF_j}.

    As you can see, as the tolerance decreases (conversely, as the VIF increases), the confidence interval expands. (This also answers your question about what exactly the inflation factor is -- the confidence interval expands with the square root of the VIF.) However, when the predictors are products of uncentered variables, it turns out that the decrease in tolerance caused by not centering the predictors is counterbalanced by an increase in the variance of the predictor, SSX_j, so that the two effects cancel out and the width of the confidence interval is net unchanged.

    The following tables might help to illustrate the effect of centering on both multicollinearity and variance:

    Uncentered
    Code: 
    > uncen
          x1 x2 x1x2
     [1,]  7  8   56
     [2,]  4  6   24
     [3,]  9  9   81
     [4,]  6  8   48
     [5,]  6  9   54
     [6,]  6  5   30
     [7,]  6  9   54
     [8,]  6  1    6
     [9,]  8  3   24
    [10,]  5  9   45
    > 
    > # correlations
    > cor(uncen)
                   x1           x2      x1x2
    x1    1.000000000 -0.002730559 0.4655388
    x2   -0.002730559  1.000000000 0.8715734
    x1x2  0.465538807  0.871573389 1.0000000
    > 
    > # variances
    > apply(uncen, 2, var)
            x1         x2       x1x2 
      2.011111   8.233333 459.733333
    Centered
    Code: 
    > cen
            x1   x2  x1x2
     [1,]  0.7  1.3  0.91
     [2,] -2.3 -0.7  1.61
     [3,]  2.7  2.3  6.21
     [4,] -0.3  1.3 -0.39
     [5,] -0.3  2.3 -0.69
     [6,] -0.3 -1.7  0.51
     [7,] -0.3  2.3 -0.69
     [8,] -0.3 -5.7  1.71
     [9,]  1.7 -3.7 -6.29
    [10,] -1.3  2.3 -2.99
    > 
    > # correlations
    > cor(cen)
                   x1           x2      x1x2
    x1    1.000000000 -0.002730559 0.1632143
    x2   -0.002730559  1.000000000 0.1961749
    x1x2  0.163214305  0.196174937 1.0000000
    > 
    > # variances
    > apply(cen, 2, var)
           x1        x2      x1x2 
     2.011111  8.233333 10.530667
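For anyone who wants to reproduce the two tables above, the ten (x1, x2) pairs can be read off the uncentered block, and the rest follows directly:

```r
# Reproduce the centering demonstration using the same data as the tables.
x1 <- c(7, 4, 9, 6, 6, 6, 6, 6, 8, 5)
x2 <- c(8, 6, 9, 8, 9, 5, 9, 1, 3, 9)

uncen <- cbind(x1 = x1, x2 = x2, x1x2 = x1 * x2)
cen   <- cbind(x1 = x1 - mean(x1),
               x2 = x2 - mean(x2),
               x1x2 = (x1 - mean(x1)) * (x2 - mean(x2)))

cor(uncen)            # uncentered: product correlates ~0.87 with x2
cor(cen)              # centered: correlations with the product fall below 0.2
apply(uncen, 2, var)  # product variance ~459.7
apply(cen, 2, var)    # product variance ~10.5
```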
    “In God we trust. All others must bring data.”
    ~W. Edwards Deming


  #4
    hlsmith

    Re: Variance inflation and interactions between variables?

    enur, what would be the purpose of examining this collinearity?

  #5
    enur

    Re: Variance inflation and interactions between variables?

    Thank you for your replies – I really appreciate it!

    hlsmith: yes, when I wrote categorical I meant ordinal – I often make this mistake (I don’t know why).

    Jake: I am not really sure what you mean by centering of predictors. Maybe I was not clear enough about my data.
    I have been measuring temperature (and some other variables) in residential buildings for a period of time. The variables, including temperature, were measured at 10-minute intervals. Based on the time of day, I have created an ordinal variable called time [night, morning, day, afternoon, evening].
    I have also recorded different events (on/off) in the buildings. My aim is to create models that can predict events based on the measured variables. I have used logistic regression with stepwise forward and backward selection of variables to infer the different models (the selection was based on AIC). The model in my example was the simplest I could think of; most of the inferred models include more variables.

    I would like to calculate the possible inflation of the confidence intervals due to collinearity, so I (and others) can be aware of this in the future, when I start using (and validating) the models.

    If I understand it correctly, a GVIF^(1/(2*Df)) of 10.6 for the variable ‘time’ means that the inferred confidence intervals for the ‘time’ coefficients (the level-specific shifts of the intercept) may be up to 10.6 times too large. A GVIF^(1/(2*Df)) of 4.6 for the variable ‘Temperature’ means that the confidence interval for the ‘Temperature’ coefficient may be up to 4.6 times too large, compared to the case with no multicollinearity. My problem is that I have interactions between ‘time’ and ‘Temperature’, resulting in five different coefficients for the variable ‘Temperature’. How do I interpret a GVIF^(1/(2*Df)) of 10.8 for the interaction between Temperature and time?
    Can I simply add the GVIFs, so that the Temperature confidence intervals may be 4.6 + 10.8 = 15.4 times too large?

    Any insights are highly appreciated!

  #6
    Jake

    Re: Variance inflation and interactions between variables?

    Quote Originally Posted by enur:
    Jake: I am not really sure what you mean by centering of predictors. Maybe I was not clear enough about my data.
    Yes, I think I understand the example. To "center" a predictor means to subtract off the mean value of that predictor from all the individual values, so that the new mean is 0. Observe the values of x1 and x2 in the first code block that I posted and compare them to the values of x1 and x2 in the second code block.
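A one-line illustration of this, using the x1 values from the earlier code block:

```r
# Centering: subtract the mean so the centered variable has mean 0.
x <- c(7, 4, 9, 6, 6, 6, 6, 6, 8, 5)  # x1 from the earlier example

x_centered  <- x - mean(x)                                   # by hand
x_centered2 <- scale(x, center = TRUE, scale = FALSE)[, 1]   # equivalent

mean(x_centered)  # 0 (up to floating-point rounding)
```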

  #7
    derksheng

    Re: Variance inflation and interactions between variables?

    At risk of hijacking (and asked here to avoid making junk threads out of one-liner questions):

    If two regressors in a 5-regressor cross-sectional regression have a Pearson correlation of 0.5 with p-value < 0.001, is this bad? I have about 350 observations in the cross-section, and the other Gauss–Markov assumptions are intact.

  #8
    Jake

    Re: Variance inflation and interactions between variables?

    Probably not. What are the VIFs?

    Note that even when multicollinearity is a big problem, it's really only a "problem" from the perspective of having a negative influence on power. There is no "assumption" of non-collinearity to be violated. It just works out more nicely to have the predictors be close to orthogonal.
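As a rough benchmark (an assumption worth checking against the actual VIFs, since the other three regressors also contribute): if that pairwise correlation of 0.5 were the only source of collinearity, the implied VIF would be 1/(1 - r^2):

```r
# With only two correlated predictors, VIF = 1 / (1 - r^2).
# r = 0.5 gives about 1.33 -- far below common rules of thumb (5 or 10).
# With five regressors, the true VIF uses the R^2 from regressing each
# predictor on all the others, so it can exceed this pairwise lower bound.
r <- 0.5
1 / (1 - r^2)  # 1.333...
```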

  #9
    noetsi

    Re: Variance inflation and interactions between variables?


    An old thread, but one I have a question on. If I understand Jake's comment...

        As you can see, as the tolerance decreases (conversely, as the VIF increases), the confidence interval expands. (This also answers your question about what exactly the inflation factor is -- the confidence interval expands with the square root of the VIF.) However, when the predictors are products of uncentered variables, it turns out that the decrease in tolerance caused by not centering the predictors is counterbalanced by an increase in the variance of the predictor, SSX_j, so that the two effects cancel out and the width of the confidence interval is net unchanged.

    correctly, then while the VIF will likely indicate multicollinearity for interaction terms [and possibly the main effects associated with them], the multicollinearity will not affect the tests of statistical significance [through the standard errors] as it normally would. I assume this is because there is no actual multicollinearity in this case; it is only the VIF diagnostic that is distorted [although I am not certain of this from the post].

    I assume Josh means main effects when he mentions simple effect terms in his post.
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995
