Robust Regression v. Transformation of Variable (or both?)

#1
Hello everyone,


I have a question about a regression I am running. I believe it's a pretty basic question, although after a couple hours searching I couldn't find the answer.

I ran a regression in my software program (Stata) and saw that there are some concerns about heteroskedasticity in the model. As a result, I transformed my dependent variable to ln(y) and ran the regression that way. That seems to have solved the problem.

However, I know that some programs (like Stata) also have an option to use robust standard errors as a way to combat heteroskedasticity. My question is: Is it better to transform the variable or to use the robust standard error option? Also, is it acceptable to use both at the same time (i.e. to run a regression with robust standard errors AND ln(y) as the DV)?

Thanks,

Jeffrey
 

ondansetron

TS Contributor
#2
It depends on what you're going to do. If you want prediction intervals for y, you can fit the ln(y) model and then take the anti-log of the lower and upper interval limits (this only works for individual y values, not for the mean of Y). You would also need to understand how the coefficients are interpreted: a 1-unit change in x is associated with roughly a 100*beta percent change in y (exactly 100*(exp(beta) - 1) percent). If you want intervals for both Y and the mean of Y, I would probably just use the robust SE model. Just be aware that the dependent variable differs between the two models, so you can't directly compare the R-squared or other model-based statistics. You would need a "pseudo R-squared" computed from the anti-logged y-hat values of the ln(y) model to get an R-squared that is comparable between the two. I don't know much about the pros and cons of the two model types, though (there may be some literature on it).
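To make the pieces above concrete, here is a rough sketch in plain Python (not Stata) of a one-predictor regression of ln(y) on x with both classical and heteroskedasticity-robust (HC1) standard errors, plus a pseudo R-squared computed on the original scale. The data are made up for illustration, the function names `ols_simple` and `pseudo_r2` are my own, and 1 - SSE/SST is only one of several possible definitions of a pseudo R-squared.

```python
import math

def ols_simple(x, y):
    """Fit y = b0 + b1*x by OLS; return (b0, b1, classical SE of b1, HC1 SE of b1)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Classical SE: assumes the error variance is constant.
    se_classical = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    # HC1 robust SE: White estimator times n / (n - k), with k = 2 here.
    meat = sum((xi - xbar) ** 2 * e * e for xi, e in zip(x, resid))
    se_hc1 = math.sqrt(n / (n - 2) * meat / sxx ** 2)
    return b0, b1, se_classical, se_hc1

def pseudo_r2(y, y_pred):
    """1 - SSE/SST on the original scale (one possible definition)."""
    ybar = sum(y) / len(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - sse / sst

# Hypothetical data: multiplicative noise, so ln(y) is linear in x.
x = [0.1 * i for i in range(50)]
y = [math.exp(1.0 + 0.5 * xi + 0.2 * math.sin(7 * i)) for i, xi in enumerate(x)]

# Both at once: ln(y) as the dependent variable AND a robust SE.
log_y = [math.log(v) for v in y]
b0, b1, se_cls, se_rob = ols_simple(x, log_y)

# Anti-log the fitted values to compare with y on its original scale.
y_back = [math.exp(b0 + b1 * xi) for xi in x]
r2_original_scale = pseudo_r2(y, y_back)

# Exact percent change in y per 1-unit change in x.
pct_change = 100 * (math.exp(b1) - 1)

print("slope:", b1, "classical SE:", se_cls, "robust SE:", se_rob)
print("percent change in y per unit x:", pct_change)
print("pseudo R-squared (original scale):", r2_original_scale)
```

The same `pseudo_r2` function can be applied to the fitted values from a regression of y itself on x, which gives two numbers on the same scale that can actually be compared.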
 

rogojel

TS Contributor
#3
With the transformed model you can't directly make statements about the average y, only about the median y, because back-transforming a prediction from the log scale gives the median rather than the mean. This could be a problem, depending on what you need to communicate and to whom. Also, you implicitly switch from an additive to a multiplicative model for y, just to handle heteroskedasticity. So, by default, I would use the robust variant.
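The median-versus-mean point can be written out as follows. This is a sketch under an extra assumption that is not stated in the thread, namely that the errors on the log scale are normal with constant variance $\sigma^2$:

```latex
\ln y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)

\operatorname{Median}(y \mid x) = \exp(\beta_0 + \beta_1 x)

E[y \mid x] = \exp\!\left(\beta_0 + \beta_1 x + \tfrac{\sigma^2}{2}\right)
```

So the anti-log of a fitted value estimates the median of y and understates the mean by the factor $\exp(\sigma^2/2)$. The multiplicative structure is also visible here: a 1-unit change in x multiplies y by $\exp(\beta_1)$ rather than adding a fixed amount.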

regards
 

hlsmith

Omega Contributor
#5
Can you post images of the errors (e.g. residual plots)? If the heteroskedasticity is not too severe, robust SEs, like you mentioned, are also an option. The common suggestion is: the simpler, the better!