residuals against Y

B_Miner

New Member
Hi all-

I'm looking for an argument or example for the following in simple linear regression: if you plot the residuals against Y, there appears to be a relationship (of unspecified form; the plot is just clearly not a random scatter), but if you plot the residuals against the predicted values (or X), the plot shows only random scatter.

Any ideas? I keep thinking it will come to me, but I've been waiting a long time ;-)

Etienne

New Member
Hi B Miner,

Assume that Y = X1b1 + X2b2 + e, where the independent variables are i.i.d. and the error term e is orthogonal to X = (X1, X2). Assume that you do not have access to X2. Hence, let u = X2b2 + e. Finally, assume that b1 = 0.

Estimate the following equation: Y = X1b1 + u. There will be a relationship between u_hat and Y, because u contains X2b2, which drives Y, but no relationship between u_hat and Y_hat = X1b1_hat, because u is orthogonal to X1.

More generally, I believe that this will be the case whenever you plot the residuals against some orthogonal variable - i.e. orthogonal to the variables that generate variation in your Y's.

Etienne

Am I right?

B_Miner

New Member
I don't know... I hope not, because I don't fully understand your answer!

Can you explain in a little more detail? I'm just not catching it, I'm afraid...

Etienne

New Member
Okay, here is an example:

I have a dependent variable Y which is obtained as :

Y = X1b1 + X2b2 + e
where X1 and X2 are i.i.d. N(0,1). The coefficient b1 is not different from 0: you can think of b1 as a N(0,1) variable.

Now you don't have X2 in your dataset. Let u = X2b2 + e. You estimate:

Y = X1b1 + u

The residuals that you obtain are orthogonal to the predicted value of Y: OLS residuals are orthogonal to the fitted values by construction, and here X1 is also unrelated to Y. On the other hand, when you plot the residuals against Y, there is a clear relationship because X2 is embedded in u.

Take a look at the attached figures : it is an example of this with 1000 observations.
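
Since the attached figures are not reproduced here, the following is a minimal R sketch of the same setup, not the code behind the original figures. The seed, the value b2 = 1, the noise standard deviation of 0.5, and all variable names are assumptions chosen for illustration.

```r
# Sketch of the omitted-variable example: X2 drives Y but is left out of the fit.
set.seed(1)                      # arbitrary seed, for reproducibility only
n  <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
e  <- rnorm(n, sd = 0.5)         # assumed noise level
b1 <- 0                          # X1 has no effect on Y
b2 <- 1                          # assumed value for illustration
y  <- b1 * x1 + b2 * x2 + e

fit   <- lm(y ~ x1)              # X2 is omitted, so X2*b2 ends up in the residuals
u_hat <- residuals(fit)
y_hat <- fitted(fit)

cor_fitted <- cor(u_hat, y_hat)  # zero up to rounding error
cor_y      <- cor(u_hat, y)      # close to 1 in this setup
print(c(cor_fitted = cor_fitted, cor_y = cor_y))

# Side-by-side residual plots
op <- par(mfrow = c(1, 2))
plot(y_hat, u_hat, xlab = "Fitted values", ylab = "Residuals")
plot(y, u_hat, xlab = "Y", ylab = "Residuals")
par(op)
```

The left panel should look like a structureless cloud, while the right panel should show the residuals rising almost one-for-one with Y.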

B_Miner

New Member
This may be a very naive question but how can X1 be independent of Y since it is part of its generation?

Etienne

New Member
In this example the coefficient b1 is itself a random variable. For this reason there is no variation in Y which originates in X1.

This is an extreme example but it illustrates a general point : when your independent variables have a very low explanatory power your plot of the residuals against the predicted value shows only random scatter.

B_Miner

New Member
Thank you! You have been very helpful. Do you use R? If so, check this out:
The phenomenon appears when there is an omitted variable, regardless of low explanatory power (in my simulation b1 is very significant).

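# Simulate two independent predictors; y depends on both, with no extra noise
x1 <- rnorm(1000, mean = 0, sd = 1)
x2 <- rnorm(1000, mean = 0, sd = 1)
y <- x1 + x2
result <- data.frame(y, x1, x2)
head(result)

# Fit y on x1 only, so x2 is the omitted variable
obj <- lm(y ~ x1, data = result)
summary(obj)
plot(result$x1, result$y)
abline(obj)

# Residuals against fitted values: random scatter
plot(fitted(obj), residuals(obj))
# Residuals against y: clear upward trend
plot(result$y, residuals(obj))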
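
The two plots can also be summarised numerically. The sketch below reruns the same simulation with a fixed seed (the seed and the variable names are assumptions added here) and compares the correlation between the residuals and y with sqrt(1 - R^2), which is the value expected for an OLS fit that includes an intercept.

```r
# Numerical check on the simulation: correlations instead of plots.
set.seed(42)                     # arbitrary seed, for reproducibility only
x1 <- rnorm(1000)
x2 <- rnorm(1000)
y  <- x1 + x2

fit <- lm(y ~ x1)                # x2 is omitted from the model
r2  <- summary(fit)$r.squared    # about 0.5, since x1 and x2 contribute equally

cor_res_fitted <- cor(residuals(fit), fitted(fit))  # zero up to rounding error
cor_res_y      <- cor(residuals(fit), y)            # clearly positive
expected       <- sqrt(1 - r2)                      # value implied by R^2

print(c(r2 = r2, cor_res_fitted = cor_res_fitted,
        cor_res_y = cor_res_y, expected = expected))
```

Here x1 is highly significant, yet the residuals still correlate with y at roughly 0.7, because about half of the variance of y is left in the residuals.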

Etienne

New Member
You're welcome.

By low explanatory power, I meant a low R-squared. Indeed, the following propositions are equivalent:

(i) your plot of the residuals against the predicted value shows only random scatter

(ii) for any given value of your Xs, there are still large variations in Y

(iii) a large part of the variations in your Ys is left unexplained
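
One way to make the link with R-squared precise is the short derivation below. It is a sketch that assumes an OLS fit with an intercept and uses sample variances and covariances. The symbols \hat{u} (residuals) and \hat{Y} (fitted values) are notation introduced here.

```latex
\begin{align*}
Y &= \hat{Y} + \hat{u}, \qquad \operatorname{Cov}(\hat{u}, \hat{Y}) = 0, \\
\operatorname{Cov}(\hat{u}, Y) &= \operatorname{Cov}(\hat{u}, \hat{Y}) + \operatorname{Var}(\hat{u}) = \operatorname{Var}(\hat{u}), \\
\operatorname{Var}(\hat{u}) &= (1 - R^2)\,\operatorname{Var}(Y), \\
\operatorname{Corr}(\hat{u}, Y) &= \frac{\operatorname{Var}(\hat{u})}{\sqrt{\operatorname{Var}(\hat{u})\,\operatorname{Var}(Y)}} = \sqrt{1 - R^2}.
\end{align*}
```

Under these assumptions the sample correlation between residuals and fitted values is zero whatever the R-squared, while the correlation between residuals and Y approaches 1 as the R-squared falls towards 0. This covers linear association only; a nonlinear pattern in the residual plot would need a separate check.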