residuals against Y

Hi all,

I'm looking for an argument or example in simple linear regression where a plot of the residuals against Y shows a relationship (of unspecified form, just clearly not a random scatter), but a plot of the residuals against the predicted value (or X) shows only random scatter.

Any ideas? I keep thinking it will come to me, but I've been waiting a long time ;-)

Hi B Miner,

Assume that Y = X1b1 + X2b2 + e, where the independent variables X1 and X2 are i.i.d. and the error e is orthogonal to X = (X1, X2). Assume that you do not have access to X2, so let u = X2b2 + e. Finally, assume that b1 = 0.

Estimate the following equation: Y = X1b1 + u. There will be a relationship between u_hat and Y because Y contains u, which is orthogonal to X1 and so ends up almost entirely in the residuals. There will be no relationship between u_hat and Y_hat = X1b1_hat, because least-squares residuals are orthogonal to X1 by construction.
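To spell out the orthogonality argument, here is a short derivation. It assumes the equation is fitted by ordinary least squares; u_hat, Y_hat and b1_hat are as in the post, and the vector notation with transposes is added here purely for illustration.

```latex
\begin{aligned}
\hat u &= Y - X_1 \hat b_1, \qquad X_1^\top \hat u = 0 \quad \text{(least-squares normal equation)} \\
\hat Y^\top \hat u &= \hat b_1 \, X_1^\top \hat u = 0 \\
Y^\top \hat u &= (\hat Y + \hat u)^\top \hat u = \hat u^\top \hat u > 0 \quad \text{(unless the fit is exact)}
\end{aligned}
```

So the residuals are orthogonal to the fitted values by construction, but they always share the component u_hat with Y itself.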

More generally, I believe that this will be the case whenever you plot the residuals against some orthogonal variable, i.e. one orthogonal to the variables that generate variation in your Y's.

Am I right?

I don't know... I hope not, because I don't fully understand your answer! :)

Can you explain in a little more detail? I'm just not catching it, I'm afraid...

Okay, here is an example:

I have a dependent variable Y which is obtained as:

Y = X1b1 + X2b2 + e
where X1 and X2 are i.i.d. N(0,1). The coefficient b1 is not different from 0 on average: you can think of b1 as an N(0,1) variable.

Now you don't have X2 in your dataset. Let u = X2b2 + e. You estimate:

Y = X1b1 + u

The residuals that you obtain are orthogonal to the predicted value of Y by construction: least squares makes them orthogonal to X1, and the predicted value is just a multiple of X1. On the other hand, when you plot the residuals against Y, there is a clear relationship because X2 is embedded in u, and u is part of Y.

Take a look at the attached figures: they show an example of this with 1000 observations.
In this example the coefficient b1 is itself a random variable with mean zero. For this reason there is no systematic variation in Y which originates in X1.
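A minimal R sketch of an example along these lines. The specific values are assumptions for illustration, not the settings behind the attached figures: b1 = 0 (the simpler case from the first reply, rather than a random b1), b2 = 1, e drawn from N(0,1), and an arbitrary seed.

```r
# Sketch only. Assumed values (not from the thread): b1 = 0, b2 = 1,
# e ~ N(0, 1), and the seed.
set.seed(1)
n  <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)                 # will be omitted from the fitted model
e  <- rnorm(n)
y  <- 0 * x1 + 1 * x2 + e      # true b1 = 0, b2 = 1

fit   <- lm(y ~ x1)            # regression without X2
u_hat <- resid(fit)
y_hat <- fitted(fit)

cor_fitted <- cor(u_hat, y_hat)        # zero up to rounding, by construction
cor_y      <- cor(u_hat, y)            # close to 1, since R^2 is near 0 here
r2         <- summary(fit)$r.squared

# The two plots discussed in the thread, side by side
par(mfrow = c(1, 2))
plot(y_hat, u_hat, main = "Residuals vs fitted")
plot(y, u_hat, main = "Residuals vs Y")
```

Under these assumptions the left panel should look like random scatter, while the right panel should show points lying close to a straight line.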

This is an extreme example, but it illustrates a general point: when your independent variables have very low explanatory power, your plot of the residuals against the predicted value shows only random scatter.

Thank you! You have been very helpful. Do you use R? If so, check this out:
The phenomenon appears when there is an omitted variable, regardless of whether the explanatory power is low (in my simulation b1 is very significant).

x1 <- rnorm(1000, mean = 0, sd = 1)
x2 <- rnorm(1000, mean = 0, sd = 1)
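The snippet above stops after generating the regressors. One possible continuation, as a sketch only: the coefficients (b1 = 2, b2 = 1), the N(0,1) noise and the seed are assumptions, not necessarily the values used in the original simulation.

```r
# Hypothetical continuation. b1 = 2, b2 = 1, the noise term and the seed
# are assumptions, not the original poster's values.
set.seed(42)
x1 <- rnorm(1000, mean = 0, sd = 1)
x2 <- rnorm(1000, mean = 0, sd = 1)
y  <- 2 * x1 + 1 * x2 + rnorm(1000)

fit <- lm(y ~ x1)                                     # x2 is omitted

p_x1           <- summary(fit)$coefficients["x1", "Pr(>|t|)"]  # p-value for b1
cor_res_fitted <- cor(resid(fit), fitted(fit))        # zero up to rounding
cor_res_y      <- cor(resid(fit), y)                  # clearly positive
```

With these assumed values the slope on x1 is highly significant, yet the residuals are still clearly correlated with y and uncorrelated with the fitted values.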


You're welcome.

By low explanatory power, I meant low R-squared. Indeed, the following propositions are equivalent:

(i) your plot of the residuals against the predicted value shows only random scatter

(ii) for any given value of your Xs, there are still large variations in Y

(iii) a large part of the variations in your Ys is left unexplained

(iv) your R-squared is low
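The link between these points and R-squared can be made explicit. Here is a short derivation, assuming ordinary least squares with an intercept and using sample variances and covariances; the symbols \hat u (residuals) and \hat Y (fitted values) are notation chosen here.

```latex
\begin{aligned}
\operatorname{Cov}(\hat u, \hat Y) &= 0, \qquad
\operatorname{Cov}(\hat u, Y) = \operatorname{Cov}(\hat u, \hat Y + \hat u) = \operatorname{Var}(\hat u) \\
R^2 &= 1 - \frac{\operatorname{Var}(\hat u)}{\operatorname{Var}(Y)} \\
\operatorname{corr}(\hat u, Y) &= \frac{\operatorname{Var}(\hat u)}{\sqrt{\operatorname{Var}(\hat u)\,\operatorname{Var}(Y)}} = \sqrt{1 - R^2}
\end{aligned}
```

For instance, an R-squared of 0.75 gives a correlation of 0.5 between the residuals and Y, and an R-squared of 0.01 gives about 0.995. The correlation between the residuals and the fitted values is zero whatever the R-squared.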

Yes, I use R; I'll take a look at your example. I did the example I gave you in Stata.