univariate vs multivariate

#1
Hi, I have a question about interpreting univariate vs. multivariate (linear) regression. When I'm reading journal articles often both approaches will be mentioned in the results/discussion.

e.g. If 'Y' is the dependent variable
'X' and 'Z' are the predictors/independent variables

What does it mean if:
1. 'X' is a statistically significant predictor in simple/univariate regression, but not in multiple regression (i.e. when 'Z' is included in the model)
2. 'X' is significant in both simple and multiple regression
3. 'X' is significant only in multiple regression but not simple regression

Does this have anything to do with partial correlation?

Thanks
William
 

Masteras

TS Contributor
#2
1. This means that X looks statistically significant on its own, but the effect is probably due to another variable (Z). For instance, the heart weight of mice is affected by their gender, which is reasonable, but once you add body weight to the model, gender is no longer significant. Body weight is itself affected by gender, so heart weight is effectively explained by body weight alone and there is no need to keep gender in the model.
2. This means that the gender effect (for instance) is still significant after you add body weight.
3. The effect of gender is not significant on its own, but it gains significance once you add body weight.
Absolutely, this is about partial correlation. If Y and X are correlated, but the partial correlation between them controlling for Z vanishes, that is exactly the situation in case 1. Case 3 is the opposite. In case 2, the correlation between Y and X holds up after controlling for Z. Is everything clear now?
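
To make case 1 concrete, here is a minimal sketch in plain Python that simulates the mouse example and compares the raw correlation with the partial correlation. All numbers (sample size, effect sizes, noise levels) are made up for illustration, and the helper names `pearson` and `partial_corr` are hypothetical, not from any library.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Assumed data-generating process: gender shifts body weight,
# and heart weight depends on body weight only.
gender = [rng.randint(0, 1) for _ in range(n)]
body = [20 + 5 * g + rng.gauss(0, 2) for g in gender]
heart = [0.1 * w + rng.gauss(0, 0.3) for w in body]

r_raw = pearson(gender, heart)
r_partial = partial_corr(gender, heart, body)

print("corr(gender, heart)               =", round(r_raw, 3))
print("partial corr controlling for body =", round(r_partial, 3))
```

Under these assumptions the raw correlation is clearly positive while the partial correlation sits near zero, which is the pattern described for case 1.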
 
#3
Hi Masteras
Thanks for your reply, it makes sense but I have 2 follow up questions:

1. In case 2, where the independent variable ('X') is significant in both the univariate and multivariate cases, what does the difference between the R-square of the univariate analysis and the partial R-square for 'X' in the multivariate analysis represent? Would it be the variance explained that is common to both 'X' and 'Z'?

2.a) To confirm I have understood case 3: we would get a non-significant Pearson correlation between 'X' and 'Y', but if we did a partial correlation between these two controlling for 'Z', it would be significant.
b) Is something like case 3 (significant in multivariate but not univariate) even possible? I can't think of a real-life example.
 
#4
Yes, case 3 is possible. Here's one way it could occur.

First, we note that the estimate of a regression coefficient is biased if a relevant explanatory variable that is correlated with the included one is omitted. It can be shown that the bias is equal to (the true coefficient on the omitted variable) times (the slope obtained when the omitted variable is regressed on the included variable). The proof isn't hard if you know a bit of matrix algebra--let me know if you want details.

Suppose the true model is Y=bX+cZ+e, where X and Z are explanatory variables, b and c are their coefficients, and e is a normally distributed error.

If we run the regression on both X and Z, we should get accurate estimates, but if we run it on X only, the estimate of b will be biased. As above, the estimated b will on average be:
b+(c*d), where d is the slope from the regression Z=dX+u.
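
For the two-variable case, the result can be seen without matrix algebra by substituting one equation into the other. This is only a sketch of the argument, using the symbols already defined above:

```latex
\begin{aligned}
Y &= bX + cZ + e, \qquad Z = dX + u \\
Y &= bX + c\,(dX + u) + e \\
  &= (b + cd)\,X + (cu + e)
\end{aligned}
```

The residual u is uncorrelated with X by construction of the regression of Z on X, and e is assumed to be uncorrelated with X, so the combined term (cu + e) behaves like an ordinary error and the regression of Y on X alone estimates b + cd rather than b.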

So, if c and d have the same sign (both positive or both negative), the estimate of b is too big, while if they have opposite signs, the estimate of b is too small. In the case where c*d=-b, the estimate of b would be zero on average.

Hence, X could appear insignificant in a univariate regression if c*d is close enough to -b.
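
A quick numerical check of this, as a rough sketch in plain Python: the values b=1, c=2, d=-0.5 are picked purely so that c*d=-b, and the helper `cross` is a hypothetical name for a centered cross-product sum. All regressions include an intercept (handled by centering).

```python
import random

def cross(x, y):
    """Sum of centered cross-products, i.e. n times the sample covariance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y))

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 5000
b_true, c_true, d_true = 1.0, 2.0, -0.5  # assumed values with c*d = -b

X = [rng.gauss(0, 1) for _ in range(n)]
Z = [d_true * x + rng.gauss(0, 1) for x in X]
Y = [b_true * x + c_true * z + rng.gauss(0, 1) for x, z in zip(X, Z)]

Sxx, Szz, Sxz = cross(X, X), cross(Z, Z), cross(X, Z)
Sxy, Szy = cross(X, Y), cross(Z, Y)

# Multiple regression of Y on X and Z (closed form for two predictors).
det = Sxx * Szz - Sxz ** 2
b_hat = (Szz * Sxy - Sxz * Szy) / det
c_hat = (Sxx * Szy - Sxz * Sxy) / det

# Auxiliary regression of Z on X, and simple regression of Y on X.
d_hat = Sxz / Sxx
b_short = Sxy / Sxx

print("multiple regression: b_hat =", round(b_hat, 3), " c_hat =", round(c_hat, 3))
print("simple regression slope    =", round(b_short, 3))
print("b_hat + c_hat * d_hat      =", round(b_hat + c_hat * d_hat, 3))
```

The simple-regression slope matches b_hat + c_hat*d_hat up to rounding error, and with these assumed values it lands near zero even though the true b is 1.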

A real world example: Suppose a certain public works program gets funding from two government agencies X and Z. Each agency must give a certain fixed percentage, say b% and c%, respectively, of its budget to the program. Clearly the total funding for the program depends positively on both budget X and budget Z.

But now suppose that the budgets X and Z are both drawn from a larger pool in such a way that more money for X means less money for Z. (Perhaps X gets a random percentage and then Z gets a random percentage of whatever's left, with the final remainder going elsewhere). We now have X and Z negatively correlated.

These conditions would result in a univariate estimate of b that is biased towards 0 and which may make X appear insignificant.
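
The agency example can be simulated along the same lines. This is a sketch under several assumptions that are not in the description above: a fixed pool of 100, uniform random splits, contribution rates of 10% and 20%, and a small noise term standing in for other sources of funding (without it the two-variable fit would be exact). The helper `cross` is again a hypothetical name.

```python
import math
import random

def cross(x, y):
    """Sum of centered cross-products, i.e. n times the sample covariance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y))

rng = random.Random(2)  # fixed seed so the run is repeatable
n = 5000
pool = 100.0                     # assumed total budget pool
b_share, c_share = 0.10, 0.20    # assumed contribution rates for X and Z

# X gets a random share of the pool, Z a random share of what is left.
X = [rng.random() * pool for _ in range(n)]
Z = [rng.random() * (pool - x) for x in X]
# Program funding; the noise term is an added assumption.
Y = [b_share * x + c_share * z + rng.gauss(0, 0.5) for x, z in zip(X, Z)]

Sxx, Szz, Sxz = cross(X, X), cross(Z, Z), cross(X, Z)
Sxy, Szy = cross(X, Y), cross(Z, Y)

corr_xz = Sxz / math.sqrt(Sxx * Szz)
b_short = Sxy / Sxx                                   # Y on X only
b_hat = (Szz * Sxy - Sxz * Szy) / (Sxx * Szz - Sxz ** 2)  # Y on X and Z

print("corr(X, Z)                  =", round(corr_xz, 3))
print("slope of Y on X alone       =", round(b_short, 4))
print("coefficient on X with Z in  =", round(b_hat, 4))
```

With this split rule the slope of Z on X is about -0.5, so the assumed rates give c*d close to -b: the univariate slope is pulled to roughly zero, while the regression that includes Z recovers a coefficient near 0.10.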
 
#5
Hi Atlasfrysmith,

Thanks for your enlightening comments. Regarding the examples of a predictor variable being significant in univariate but not multivariate analysis (or vice versa), what then is the point of having univariate analyses? Especially given that the more 'real' situation is described by the multivariate case (and rarely will there be only one possible predictor). At worst, univariate analyses could be misleading (see Masteras's example of heart weight of mice above).

Also, could you answer one of my questions from above:
1. In case 2, where the independent variable ('X') is significant in both the univariate and multivariate cases, what does the difference between the R-square of the univariate analysis and the partial R-square for 'X' in the multivariate analysis represent? Would it be the variance explained that is common to both 'X' and 'Z'?

Thanks
William