Regression: Theory behind Intercepts, Coefficients and Effects of Recoding on Data

#1
Hi all,

I've worked through a huge chunk of my regression homework, but have questions about the theoretical explanations I am supposed to provide for my results.

My specific question(s) are as follows:

1) I recoded a set of data and regressed Y on it (we'll call this data "x_new"). I had previously regressed Y on the data x_new was derived from (we'll call it "x_original"). Comparing the two regression equations, I see that while the unstandardized regression coefficient is identical, the intercept for x_new is not the same as for x_original (in fact, it is at least 3 times higher).

For example, the equation may look like this:

Y regressed on x_original: Y = 2 - .5(x)
Y regressed on x_new: Y = 6 - .5(x)

Can someone help me theoretically understand why this is?

2) After doing this, I then regress Y on both x_original and x_new, and find that the regression equation is IDENTICAL to the equation produced when regressing Y on x_original ALONE. Again, why is this?

I suppose the questions themselves are, well, redundant--so my apologies--but if anyone could explain this to me I would greatly appreciate it! My textbook is poorly organized, and I could not find anything on this without spending hours fishing through it, which, as a very busy student, I already tried and don't have time to finish.

Thanks! :D
 

Dragan

Super Moderator
#2

It appears that all you've done is add a constant to X - i.e. X "original" + constant = X "new". This is why only the intercept term changes and not the slope coefficient. That is, you have changed neither the correlation nor the standard deviations - and thus you will not change the slope coefficient. Algebraically, if Y = a + b*X and X_new = X + c, then Y = a + b*(X_new - c) = (a - b*c) + b*X_new: the slope b is untouched and the intercept shifts by -b*c.
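A quick sketch of this in Python (with made-up data, not the poster's actual values - note that with b = -.5 and c = 8, the intercept moves from 2 to 6, matching the example above):

```python
import numpy as np

# Hypothetical data for illustration only.
rng = np.random.default_rng(0)
x_original = rng.normal(10, 2, size=100)
y = 2 - 0.5 * x_original + rng.normal(0, 0.1, size=100)

def fit(x, y):
    """Return (intercept, slope) from an OLS fit of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

x_new = x_original + 8                 # recode: add a constant c = 8
b0_orig, b1_orig = fit(x_original, y)
b0_new, b1_new = fit(x_new, y)

# Slopes are identical; the new intercept equals b0_orig - b1 * 8.
```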

So, the X's will be perfectly correlated. As a result, when you regress Y on X "original" and X "new" the algorithm will automatically "throw out" X "new" because you have the untenable situation of perfect collinearity.
 