I have discovered an interesting statistical paradox. If you are familiar with it, please let me know its conventional name (or its author's name) and where I can find more information about it.
I studied the individual contributions of regressors to the explanatory power of a multiple linear regression and found an example where the same variable is the best one to add first to the regression, yet is also the one whose removal from the full regression hurts the least. Here is the example:
Let x1, x2 and x3 be jointly normally distributed and let y = x1 + x2 + x3. Denote variance by V(*) and covariance by Cov(*,*). Set V(x1)=V(x2)=V(x3)=1, Cov(x1,x2)=0, Cov(x1,x3)=0.3 and Cov(x2,x3)=0.7.
Now we have to choose which single variable (x1, x2 or x3) best predicts y. As the criterion we use the variance of y conditional on x, denoted V(y|x). Since all the regression estimators we consider are unbiased (because of joint normality), the choice with the smallest conditional variance is the best one. For a joint normal distribution the conditional variance does not depend on the conditioning value, i.e. V(y|x) is the same for every value of x. The formula I derived (hopefully without mistakes) is: V(y|x1) = V(x2) + V(x3) + 2*Cov(x2,x3) - (Cov(x1,x2) + Cov(x1,x3))^2/V(x1), and the formulas for x2 and x3 follow by permuting the indices. Calculating the conditional variances gives V(y|x1)=3.31, V(y|x2)=2.11 and V(y|x3)=1.00, so x3 on its own predicts y better than x1 or x2 does.
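The three single-predictor variances can be checked numerically. Below is a minimal Python sketch that applies the formula above with the indices permuted for x2 and x3. The names SIGMA and var_given_one are illustrative helpers, not part of the original derivation.

```python
# Covariance matrix of (x1, x2, x3) from the example; all variances are 1.
SIGMA = [
    [1.0, 0.0, 0.3],
    [0.0, 1.0, 0.7],
    [0.3, 0.7, 1.0],
]


def var_given_one(sigma, i):
    """V(y | x_i) for y = x1 + x2 + x3 (i is a 0-based index; hypothetical helper)."""
    j, k = [m for m in range(3) if m != i]
    # Variance of the part of y that x_i does not supply directly: V(x_j + x_k).
    var_rest = sigma[j][j] + sigma[k][k] + 2 * sigma[j][k]
    # Covariance of that part with x_i: Cov(x_i, x_j) + Cov(x_i, x_k).
    cov_rest = sigma[i][j] + sigma[i][k]
    return var_rest - cov_rest ** 2 / sigma[i][i]


single = [var_given_one(SIGMA, i) for i in range(3)]
for i, v in enumerate(single, start=1):
    print(f"V(y|x{i}) = {v:.2f}")
```

Under these assumptions the smallest value belongs to x3, which is what makes it the best single predictor.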
Jointly, x1, x2 and x3 predict y perfectly (i.e. V(y|x1,x2,x3)=0), and we now know that x3 is the best individual predictor of y. So it would be intuitive to expect that removing x3 from the triple (x1,x2,x3) would hurt the predictive power of the triple the most, compared with removing x1 or x2. However, the result is the opposite:
Because y - x1 - x2 = x3, we have V(y|x1,x2) = V(x3|x1,x2), and the formula I derived is: V(y|x1,x2) = V(x3) - [V(x2)*Cov(x1,x3)^2 + V(x1)*Cov(x2,x3)^2 - 2*Cov(x1,x2)*Cov(x1,x3)*Cov(x2,x3)] / (V(x1)*V(x2) - Cov(x1,x2)^2). The other two cases follow by permuting the indices. Calculating the conditional variances gives V(y|x1,x2)=0.42, V(y|x1,x3)=0.46 and V(y|x2,x3)=0.82, so the regression without x3 predicts y better than the regression without x1 or without x2.
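The two-predictor case can be checked the same way. The following Python sketch evaluates the formula above for each pair of regressors. The helper var_given_two and the dictionary labels are illustrative names introduced here.

```python
# Covariance matrix of (x1, x2, x3) from the example; all variances are 1.
SIGMA = [
    [1.0, 0.0, 0.3],
    [0.0, 1.0, 0.7],
    [0.3, 0.7, 1.0],
]


def var_given_two(sigma, i, j):
    """V(y | x_i, x_j) for y = x1 + x2 + x3, which equals V(x_k | x_i, x_j)."""
    (k,) = [m for m in range(3) if m not in (i, j)]
    # Numerator of the variance of x_k explained by x_i and x_j.
    explained = (
        sigma[j][j] * sigma[i][k] ** 2
        + sigma[i][i] * sigma[j][k] ** 2
        - 2 * sigma[i][j] * sigma[i][k] * sigma[j][k]
    )
    # Determinant of the 2x2 covariance matrix of (x_i, x_j).
    det = sigma[i][i] * sigma[j][j] - sigma[i][j] ** 2
    return sigma[k][k] - explained / det


pairs = {(0, 1): "x1,x2", (0, 2): "x1,x3", (1, 2): "x2,x3"}
pair_var = {label: var_given_two(SIGMA, i, j) for (i, j), label in pairs.items()}
for label, v in pair_var.items():
    print(f"V(y|{label}) = {v:.2f}")
```

Here the pair (x1,x2), i.e. the regression with x3 removed, leaves the smallest residual variance.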
As far as I understand, this result means that the presence of multicollinearity in a regression undermines the "individual contribution of a regressor" as a concept. The paradox persists even when all the correlations between regressors approach zero, e.g. Cov(x1,x2)=0 and Cov(x1,x3)=Cov(x2,x3)=e as e approaches zero from above (e>0).
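The small-correlation claim can also be probed numerically. The sketch below assumes unit variances and the covariance pattern just described. It conditions on one or two variables using the standard Gaussian conditional-variance (Schur complement) formula. The functions cond_var_of_sum and paradox_holds are hypothetical helpers written for this check, and a few sample values of e do not replace a proof of the limit.

```python
def cond_var_of_sum(sigma, given):
    """V(x1 + x2 + x3 | x_g for g in given), for one or two conditioning variables."""
    rest = [m for m in range(3) if m not in given]
    # Unconditional variance of the sum of the variables that are not observed.
    var_rest = sum(sigma[a][b] for a in rest for b in rest)
    # Covariance of that sum with each observed variable.
    c = [sum(sigma[a][g] for a in rest) for g in given]
    if len(given) == 1:
        (g,) = given
        return var_rest - c[0] ** 2 / sigma[g][g]
    g, h = given
    # Explicit inverse of the 2x2 covariance matrix of the observed pair.
    det = sigma[g][g] * sigma[h][h] - sigma[g][h] ** 2
    explained = (
        sigma[h][h] * c[0] ** 2
        + sigma[g][g] * c[1] ** 2
        - 2 * sigma[g][h] * c[0] * c[1]
    ) / det
    return var_rest - explained


def paradox_holds(e):
    """True if x3 is both the best single predictor and the least harmful to drop."""
    sigma = [[1.0, 0.0, e], [0.0, 1.0, e], [e, e, 1.0]]
    single = [cond_var_of_sum(sigma, (i,)) for i in range(3)]
    # dropped[k] is the residual variance when x_k is removed from the full set.
    dropped = [
        cond_var_of_sum(sigma, tuple(m for m in range(3) if m != k))
        for k in range(3)
    ]
    return single.index(min(single)) == 2 and dropped.index(min(dropped)) == 2


for e in (0.3, 0.1, 0.01, 0.001):
    print(e, paradox_holds(e))
```

For each of these sample values the check reports that x3 is simultaneously the best variable to add first and the least harmful one to remove.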