Hi statistics experts,

I run a standard regression model (one dependent variable, several independent variables) for each industry-year and save the residuals and the predicted values in new variables. However, depending on the code I use (in Stata), I get different results, and both seem reasonable to me.

For the first code: When I replace the original dependent variable with the newly generated predicted values in the regression model, adj. R² is 1 and all controls are highly significant. (I think this is reasonable, since the predicted value is the part of the dependent variable that is fully explained by the independent variables.) When I use the residuals as the dependent variable instead, I get an adj. R² of approx. 0.4 and both significant and insignificant coefficients.

For the second code: When I replace the original dependent variable with the newly generated residuals in the regression model, adj. R² is 0 and all independent variables are insignificant (the t-values are all 0). I think this makes sense as well, since the residuals are the part that is not explained by the independent variables. However, this residual variable is also not significantly correlated with almost any other variable.
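To make the two checks concrete, here is a minimal sketch in Python rather than Stata. It uses simulated data and a single regression instead of one per industry-year, and every name in it (`ols`, `solve`, the coefficients used to generate `y`) is a hypothetical choice for illustration, not the code from either of my two versions. It fits an OLS model with a constant, then regresses the fitted values and the residuals on the same regressors. Note that it reports plain R², not adjusted R².

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(X, y):
    """OLS of y on X (X must already contain a constant column).

    Returns (coefficients, fitted values, residuals, R-squared).
    """
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    mean_y = sum(y) / len(y)
    tss = sum((yi - mean_y) ** 2 for yi in y)
    rss = sum(e * e for e in resid)
    return beta, fitted, resid, 1.0 - rss / tss


# Simulated data (assumption): constant plus two regressors and normal noise.
random.seed(0)
n = 200
X = [[1.0, random.gauss(0, 1), random.gauss(0, 1)] for _ in range(n)]
y = [0.5 + 2.0 * r[1] - 1.0 * r[2] + random.gauss(0, 1) for r in X]

# Original regression, then the two follow-up regressions on the same X.
beta, fitted, resid, r2 = ols(X, y)
beta_fit, _, _, r2_fit = ols(X, fitted)
beta_res, _, _, r2_res = ols(X, resid)

print("original:      R2 =", round(r2, 4))
print("fitted values: R2 =", round(r2_fit, 4))
print("residuals:     R2 =", round(r2_res, 4))
print("largest |coefficient| in the residual regression:", max(abs(b) for b in beta_res))
```

In this sketch the fitted-value regression reproduces the original coefficients, and the residual regression returns coefficients that are zero up to floating-point error. That only holds when the residuals come from the same regressors and the same observations as the second regression.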

As I said, both results seem reasonable to me, but the generated variables differ greatly depending on the code. Hence, only one procedure can be correct. Can anyone tell me which result makes more sense in this case?

Your process isn't clear to me at all. What are you doing differently in the two trials?

Hi Dason,

Thanks for your response. To make my actual question clearer: I only want to know what output to expect when I first run a regression model to estimate the residuals and then use the resulting residual variable as the dependent variable in the original regression model.

Does it make sense that adj. R² and all t-values are 0 in this case?
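For reference, the standard OLS algebra behind this question can be written out as follows. This is a sketch under the assumption that the model is ordinary least squares, that the residuals come from exactly the same regressors and observations as the second regression, and that the symbols X, y, b, e are generic textbook notation rather than anything from my Stata code.

```latex
\begin{aligned}
\hat{b} &= (X'X)^{-1}X'y, \qquad e = y - X\hat{b} \\
X'e &= X'y - X'X(X'X)^{-1}X'y = 0 \\
\hat{b}_{e} &= (X'X)^{-1}X'e = 0
\end{aligned}
```

Here \hat{b}_{e} is the coefficient vector from regressing e on X. With every coefficient exactly zero, the t-values are zero as well, and the regression explains none of the variation in e, so R² is 0.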

