Finding predicted values with logged (ln) dependent variable.

Hello all,

I'm currently working on my first quantitative project. I am doing an OLS regression in SPSS. I had to take the natural log of my DV in order for it to pass assumptions, and I want to compute predicted values on the original scale. I understand that merely exponentiating the sum of the regression terms (which correspond with the naturally-logged dependent variable) introduces bias into my predicted values. I have heard of Duan's Smearing Factor, but do not understand it, and I'm not even sure I need to use it. My residuals are normal and my scatterplot for my regression looks fine.
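For reference, Duan's smearing factor is the average of the exponentiated residuals from the logged regression, and each naive exp(prediction) is multiplied by it. Here is a minimal Python sketch on simulated data. The data, the coefficients (3.0 and 0.5), the error SD of 1.0, and all variable names are made up for illustration; none of it is SPSS output or the actual thesis data.

```python
import math
import random

random.seed(0)
n = 5000

# Hypothetical data: one predictor x and a log-scale outcome.
# The coefficients (3.0, 0.5) and the error SD (1.0) are assumptions.
x = [random.gauss(0.0, 1.0) for _ in range(n)]
log_y = [3.0 + 0.5 * xi + random.gauss(0.0, 1.0) for xi in x]
y = [math.exp(v) for v in log_y]

# Simple OLS of ln(y) on x (closed form for a single predictor).
mean_x = sum(x) / n
mean_ly = sum(log_y) / n
sxy = sum((xi - mean_x) * (li - mean_ly) for xi, li in zip(x, log_y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_ly - slope * mean_x

fitted_log = [intercept + slope * xi for xi in x]
resid = [li - fi for li, fi in zip(log_y, fitted_log)]

# Duan's smearing factor: mean of the exponentiated log-scale residuals.
smear = sum(math.exp(r) for r in resid) / n

# Naive back-transform versus smeared back-transform.
naive_pred = [math.exp(f) for f in fitted_log]
smeared_pred = [p * smear for p in naive_pred]

mean_y = sum(y) / n
mean_naive = sum(naive_pred) / n
mean_smeared = sum(smeared_pred) / n
print("smearing factor:", smear)
print("mean of y:", mean_y)
print("mean of naive predictions:", mean_naive)
print("mean of smeared predictions:", mean_smeared)
```

The smearing factor does not rely on the residuals being normal, which is its main appeal. When the residuals are close to normal, it should land near exp(MSE / 2).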

Anyways, I found this article by Newman (1993), which seems to suggest that if my regression residuals are normal, then I can merely find my error term, which makes my predicted values less biased, by using the equation e^(error) = e^(mean square error / 2). I thus took the natural log of both sides, leaving me with my MSE of 1.066 / 2 for my error term (resulting in .533) to add to my predicted value equations before exponentiating their sums.

I had previously calculated predicted values for both my non-logged model and my logged model (without the error term). The former overestimated my DV (job tenure in months) and the latter underestimated it. However, with my new error term of .533, the predicted values seemed more correct. For instance, plugging the median value of each independent variable into the prediction equation AND adding the error term gets me very close to the median of my dependent variable before it was logged. I hope that makes sense! The predicted values seem to be better after adding the error term I got from Newman's (1993) equation, but did I do it right?
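The arithmetic described above can be sketched in Python. The MSE of 1.066 comes from the post; the log-scale prediction of 3.0 is a hypothetical placeholder, and the helper name `retransform` is made up for illustration. The sketch assumes the log-scale residuals are normal with constant variance.

```python
import math


def retransform(log_prediction, mse):
    """Back-transform a log-scale prediction, adding MSE / 2 first.

    Assumes normally distributed, constant-variance errors on the log scale.
    """
    return math.exp(log_prediction + mse / 2.0)


mse = 1.066                    # mean square error reported in the post
correction = mse / 2.0         # the "error term" added before exponentiating
factor = math.exp(correction)  # equivalent multiplier on the naive prediction

log_pred = 3.0                 # hypothetical linear predictor on the log scale
naive = math.exp(log_pred)
corrected = retransform(log_pred, mse)

print("correction:", correction)
print("multiplier:", round(factor, 3))
print("naive prediction:", round(naive, 2))
print("corrected prediction:", round(corrected, 2))
```

Adding .533 before exponentiating is the same as multiplying the naive prediction by about 1.7. Under normal log-scale errors, the naive exp(prediction) estimates the conditional median of the DV, while the corrected value estimates its conditional mean.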

My foremost concern is that Newman's work doesn't suggest to me that just having my dependent variable logged works with this equation to find the error term. I'm worried I need to have every variable logged, but that strikes me as very nonsensical.

Can anyone help me out in any way? My thesis adviser is gone for a couple weeks because of a family emergency, and my deadlines draw ever nearer! Thanks in advance! :]


Interesting article. It does seem to request all terms be logged.

Options: log just the DV and interpret in terms of changes in standard deviations, or log all variables and use the approach proposed above.

I am sure there could be other approaches but I don't run a lot of linear reg models.
If I did log all terms and did this process to see if I got sensible values, how would I go about transforming my categorical variables that have values right now of 0, 1, etc.? I think taking the natural log of those (e.g., sex, marital status, etc.) would obviously not work.