Dear all,
I am trying to run an OLS regression in Stata 13, with log per capita calories as my dependent variable and age of the household head, years of education of the household head, and log per capita expenditure as my independent variables (other controls to be added eventually). When I run the regression with just age and education as controls, both are positive and significant. However, as soon as I add log per capita expenditure, education becomes negative and significant. I am puzzled by this result, since the literature on calorie consumption argues that education of the household head has a positive impact. I understand that education of the household head might reflect a "wealth" effect, but the correlation coefficient is not that large. I have posted my regression results below, as well as summary statistics. I was wondering if someone could help me understand what is going on here. I realize that this sort of problem might (or might not) be overcome using techniques other than OLS, but I have just started learning OLS and would like to understand how to deal with this in OLS, or at least know why OLS cannot deal with it.
Thanks,
Monzur
Code:
. regress log_pccal age_hhhead eduy_hhhead [pw=hhweight], r

Linear regression                               Number of obs     =       3355
                                                F(  2,  3352)     =     105.40
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0692
                                                Root MSE          =     .25583

------------------------------------------------------------------------------
             |               Robust
   log_pccal |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  age_hhhead |   .0049182   .0003602    13.65   0.000      .004212    .0056244
 eduy_hhhead |   .0075136   .0011997     6.26   0.000     .0051613    .0098659
       _cons |   7.537586   .0171067   440.62   0.000     7.504045    7.571126
------------------------------------------------------------------------------

. regress log_pccal age_hhhead eduy_hhhead log_pcexp [pw=hhweight], r

Linear regression                               Number of obs     =       3355
                                                F(  3,  3351)     =     601.38
                                                Prob > F          =     0.0000
                                                R-squared         =     0.4123
                                                Root MSE          =     .20332

------------------------------------------------------------------------------
             |               Robust
   log_pccal |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  age_hhhead |    .001919   .0002945     6.52   0.000     .0013415    .0024964
 eduy_hhhead |  -.0082508    .001044    -7.90   0.000    -.0102977   -.0062039
   log_pcexp |   .3777407   .0100402    37.62   0.000     .3580552    .3974262
       _cons |   4.795607   .0730719    65.63   0.000     4.652337    4.938877
------------------------------------------------------------------------------

. estat vif

    Variable |       VIF       1/VIF
-------------+----------------------
   log_pcexp |      1.20    0.832228
 eduy_hhhead |      1.16    0.863121
  age_hhhead |      1.07    0.930743
-------------+----------------------
    Mean VIF |      1.14

. su log_pccal eduy_hhhead log_pcexp, d

                          log_pccal
-------------------------------------------------------------
Obs                 3698
Mean            7.783589
Std. Dev.        .276406
Variance        .0764003
Skewness        .0350145
Kurtosis        3.511389

            years of education of household head
-------------------------------------------------------------
Obs                 3698
Sum of Wgt.         3698
Mean            2.984857
Std. Dev.       3.776812
Variance        14.26431
Skewness        .9461994
Kurtosis        2.751041

              log of hh per capita expenditure
-------------------------------------------------------------
Obs                 3698
Sum of Wgt.         3698
Mean            7.762185
Std. Dev.       .4636838
Variance        .2150027
Skewness        .4395734
Kurtosis        3.433132

. pwcorr log_pccal age_hhhead eduy_hhhead log_pcexp, sig

             | log~ccal age_hh~d eduy_h~d log_pc~p
-------------+------------------------------------
   log_pccal |   1.0000
             |
             |
  age_hhhead |   0.2282   1.0000
             |   0.0000
             |
 eduy_hhhead |   0.0855  -0.1133   1.0000
             |   0.0000   0.0000
             |
   log_pcexp |   0.6401   0.1796   0.3254   1.0000
             |   0.0000   0.0000   0.0000
             |
perhaps you're dealing with a suppressor effect?
for all your psychometric needs! https://psychometroscar.wordpress.com/about/
You might have multicollinearity or possibly a moderator effect (where one IV influences the impact of another variable on the DV). I do not know how to test for moderator effects (I don't generally work with moderators), but you can test for MC by running a VIF test. If memory serves, a change in sign when you add a variable is often a sign of one of these effects. This is an example of how multivariate and univariate relationships can be very different.
"Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995
they're different but related things.... a moderator could be a suppressor but not all suppressors are moderators. these people do a pretty good job at untangling the whole thing:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2819361/
noetsi (12-19-2014)
JEEBEZUZ! just look at the change in the fit of the model!
without the suppressor variable (log_pcexp) your R-squared is 0.0692.... so basically zero. but adding your suppressor variable makes the R-squared jump to 0.4123!!!
my money's on the suppressor effect
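A rough way to check this against the numbers already posted: with two standardized predictors, the OLS slopes can be written in terms of the three pairwise correlations alone. The sketch below plugs in the values from the pwcorr output above. It is only an approximation of the posted regressions, since it leaves out age_hhhead and uses the unweighted correlations (3698 obs) rather than the weighted sample (3355 obs). The function name `std_betas` is made up for illustration, and Python is used here only because it is easy to run as a quick check.

```python
# Pairwise correlations copied from the pwcorr output in the first post.
# Assumption: unweighted and without age_hhhead, so the results will not
# match the weighted three-variable regression exactly.
r_cal_edu = 0.0855   # log_pccal vs eduy_hhhead
r_cal_exp = 0.6401   # log_pccal vs log_pcexp
r_edu_exp = 0.3254   # eduy_hhhead vs log_pcexp


def std_betas(r_y1, r_y2, r_12):
    """Standardized OLS slopes of y on x1 and x2, from correlations only."""
    denom = 1.0 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2


beta_edu, beta_exp = std_betas(r_cal_edu, r_cal_exp, r_edu_exp)

# With standardized variables, R-squared is the sum of slope * correlation.
r_squared = beta_edu * r_cal_edu + beta_exp * r_cal_exp

print(f"education alone (standardized slope): {r_cal_edu:+.4f}")
print(f"education, expenditure held fixed:    {beta_edu:+.4f}")
print(f"expenditure, education held fixed:    {beta_exp:+.4f}")
print(f"R-squared with both predictors:       {r_squared:.4f}")
```

The education slope turns negative (about -0.14 in standardized units) because r_cal_exp * r_edu_exp is larger than r_cal_edu, and the expenditure slope ends up larger than its raw correlation with calories. Both are the usual fingerprints of suppression, and neither needs a large VIF to happen.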