Dear all,
I am running an OLS regression in Stata 13, with the log of per capita calorie consumption as my dependent variable and age and years of education of the household head, plus log per capita expenditure, as my independent variables (other controls to be added eventually). When I run the regression with just age and education as regressors, both coefficients are positive and significant. However, as soon as I add log per capita expenditure, the education coefficient becomes negative and significant. I am puzzled by this result, since the literature on calorie consumption argues that education of the household head has a positive impact. I understand that education of the household head might be picking up a "wealth" effect, but the correlation between education and log per capita expenditure is not that large (about 0.33). I have posted my regression results below, as well as summary statistics. Could someone help me understand what is going on here? I realize that this sort of problem might (or might not) be overcome using techniques other than OLS, but I have just started learning OLS and would like to understand how to deal with this in OLS, or at least know why OLS cannot deal with it.
Thanks,
Monzur
Code:
. regress log_pccal age_hhhead eduy_hhhead [pw=hhweight], r
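Linear regression                               Number of obs =    3355
                                                F(  2,  3352) =  105.40
                                                Prob > F      =  0.0000
                                                R-squared     =  0.0692
                                                Root MSE      =  .25583

------------------------------------------------------------------------------
             |               Robust
   log_pccal |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  age_hhhead |   .0049182   .0003602    13.65   0.000      .004212    .0056244
 eduy_hhhead |   .0075136   .0011997     6.26   0.000     .0051613    .0098659
       _cons |   7.537586   .0171067   440.62   0.000     7.504045    7.571126
------------------------------------------------------------------------------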

. regress log_pccal age_hhhead eduy_hhhead log_pcexp [pw=hhweight], r
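Linear regression                               Number of obs =    3355
                                                F(  3,  3351) =  601.38
                                                Prob > F      =  0.0000
                                                R-squared     =  0.4123
                                                Root MSE      =  .20332

------------------------------------------------------------------------------
             |               Robust
   log_pccal |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  age_hhhead |    .001919   .0002945     6.52   0.000     .0013415    .0024964
 eduy_hhhead |  -.0082508    .001044    -7.90   0.000    -.0102977   -.0062039
   log_pcexp |   .3777407   .0100402    37.62   0.000     .3580552    .3974262
       _cons |   4.795607   .0730719    65.63   0.000     4.652337    4.938877
------------------------------------------------------------------------------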

. estat vif
    Variable |       VIF       1/VIF
-------------+----------------------
   log_pcexp |      1.20    0.832228
 eduy_hhhead |      1.16    0.863121
  age_hhhead |      1.07    0.930743
-------------+----------------------
    Mean VIF |      1.14

. su log_pccal eduy_hhhead log_pcexp, d

                 log_pccal
-------------------------------------------
Obs               3698
Mean          7.783589
Std. Dev.      .276406
Variance      .0764003
Skewness      .0350145
Kurtosis      3.511389

   years of education of household head
-------------------------------------------
Obs               3698
Sum of Wgt.       3698
Mean          2.984857
Std. Dev.     3.776812
Variance      14.26431
Skewness      .9461994
Kurtosis      2.751041

     log of hh per capita expenditure
-------------------------------------------
Obs               3698
Sum of Wgt.       3698
Mean          7.762185
Std. Dev.     .4636838
Variance      .2150027
Skewness      .4395734
Kurtosis      3.433132

. pwcorr log_pccal age_hhhead eduy_hhhead log_pcexp, sig

             | log~ccal age_hh~d eduy_h~d log_pc~p
-------------+------------------------------------
   log_pccal |   1.0000
             |
             |
  age_hhhead |   0.2282   1.0000
             |   0.0000
             |
 eduy_hhhead |   0.0855   0.1133   1.0000
             |   0.0000   0.0000
             |
   log_pcexp |   0.6401   0.1796   0.3254   1.0000
             |   0.0000   0.0000   0.0000
             |
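
One way to check whether the sign change is simply the expenditure channel at work is the omitted-variable decomposition: the education coefficient in the short regression equals the education coefficient in the long regression plus the log_pcexp coefficient times the education coefficient from an auxiliary regression of log_pcexp on age and education. The following is only a minimal sketch of that check, assuming all three regressions use the same sample and the same weights; the names insample, b_edu_long, b_exp_long and d_edu are hypothetical and introduced here for illustration.

```stata
* Long regression (as posted above); store its estimation sample and coefficients
regress log_pccal age_hhhead eduy_hhhead log_pcexp [pw=hhweight], r
generate byte insample = e(sample)      // hypothetical marker for the common sample
scalar b_edu_long = _b[eduy_hhhead]
scalar b_exp_long = _b[log_pcexp]

* Auxiliary regression: how expenditure moves with education, holding age fixed
regress log_pcexp age_hhhead eduy_hhhead [pw=hhweight] if insample, r
scalar d_edu = _b[eduy_hhhead]

* Short regression on the same sample, then compare the two sides of the identity
regress log_pccal age_hhhead eduy_hhhead [pw=hhweight] if insample, r
display "short coefficient on education : " _b[eduy_hhhead]
display "long + b_exp * d_edu           : " b_edu_long + b_exp_long*d_edu
```

If the two displayed numbers agree, the posted estimates already pin down the auxiliary slope. Taking the education coefficient in the second regression as -.0082508 (negative, as described in the post): (.0075136 + .0082508) / .3777407 is roughly .0417, that is, each extra year of education would go with about 4% higher per capita expenditure at a given age. Under that reading, the positive coefficient in the first regression is education working through expenditure, and the negative coefficient in the second is the association of education with calories among households at the same expenditure level.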
