# Thread: (Li'l) theoretical question about correlations in regression...

1. ## (Li'l) theoretical question about correlations in regression...

hey y'all! how're y'all doin'? i hope everyone had the most awesome holidays!!! heh.

so... who else is feelin' the pinch this semester? oh man, it's my last (course-intensive) semester before finishin' up my master's (just waiting on the thesis, heh), so i had to cram in 2 seminars in the stats dept and 2 courses in my home dept in education... in any case, best of luck to everyone.

so last friday i got asked a somewhat interesting question that i haven't quite been able to figure out yet... it goes like this:

let's pretend that we have a regression that looks like Y = b0 + b1*X + b2*Z + b3*W. now, as usually happens in these cases, these variables have certain correlations, so cor(X,Z), cor(X,W) and cor(Z,W) — you know, all of those — are not zero.

say that i now have a reduced regression model that looks like Y = b0 + b1*X + b2*Z, so it's the same as the previous one but without one predictor, W.

the question would then be:

what would be the correlation between the omitted predictor W and the predicted scores Yhat from the second, reduced model?

i'm having a bit of a hard time because there are a few too many correlations, and i think the algebra's gonna get somewhat complicated if i try to sort it out by re-expressing Yhat in terms of its correlations with X and Z...

oh god, i'm really hoping someone knows a smart matrix algebra trick or some relationship (maybe through the reduced model's R^2) to simplify things before i tackle this in full force...

thanks to everyone!

2. ## Re: (Li'l) theoretical question about correlations in regression...

It seems that the problem does not actually involve the first regression model. Am I missing anything?

Also, do you mean something like Yhat = b0hat + b1hat*X + b2hat*Z, with the coefficients being least-squares estimates, assuming you have data? Or are you fitting some other regression, like total least squares?

I have tried, without success, to simplify things under these assumptions.

3. ## Re: (Li'l) theoretical question about correlations in regression...

hello BGM... thank you very much for taking the time to look at this...

let me re-express the question in terms of correlation matrices and see if it makes more sense.

under the first model, where you have Y as the dependent variable and X, Z and W as predictors, you would have a 4 x 4 correlation matrix among all of those, right? now, if you run a regression predicting Y from X and Z (without the W predictor) and consider the predicted scores Yhat as a new variable, you could write up a new 3 x 3 correlation matrix (among Y, W and Yhat) that would have:

the correlation between Y and W (because you already know that from the original 4 x 4 correlation matrix we started from)

the correlation between Y and Yhat (because you can get that from the square root of the R^2, which you obtained from the regression predicting Y from X and Z)

and then i'd just be missing the correlation between Yhat and the predictor W that was not included in the reduced regression...

... i know there are bounds on that correlation, but that's as far as i've got. and, just as you said, at some point my algebra became so extensive and confusing that i'm reaching out for help, hoping someone knows a smart trick or has an insight into what to do...
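(editorial note: the bounds mentioned here are the generic ones that come from requiring the 3 x 3 correlation matrix to be positive semi-definite; a small R sketch with made-up correlation values, not numbers from this thread:)

```r
# given r13 = cor(Y, W) and r23 = cor(Y, Yhat), the unknown r12 = cor(W, Yhat)
# must satisfy
#   r13*r23 - sqrt((1 - r13^2)*(1 - r23^2)) <= r12 <= r13*r23 + sqrt((1 - r13^2)*(1 - r23^2))
# or else the 3 x 3 correlation matrix would not be positive semi-definite
r13 <- -0.5   # made-up value for cor(Y, W)
r23 <-  0.8   # made-up value for cor(Y, Yhat), i.e. the multiple R of the reduced model
bound <- sqrt((1 - r13^2) * (1 - r23^2))
c(lower = r13*r23 - bound, upper = r13*r23 + bound)
```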

4. ## Re: (Li'l) theoretical question about correlations in regression...

Y only matters to get the coefficients for the linear model. But Yhat is just a linear combination of X and Z once we have the parameters.

Code:
n <- 10000
X <- rnorm(n, 10, 3)
Z <- X + runif(n, 2, 10)
W <- .2*X - .7*Z + rnorm(n)

a <- .2
b <- 30

# Y doesn't even matter!  yhat is just a linear combination
# of X and Z.  Note I omit an intercept but it doesn't matter for correlation.
yhat <- a*X + b*Z

# cwx stands for the correlation between W and X, and so on
cwx <- cor(W,X)
cwz <- cor(W,Z)
cxz <- cor(X,Z)
sx <- sd(X)
sz <- sd(Z)

# All in terms of correlations and variances of X, Z, W
(a*cwx*sx + b*cwz*sz)/sqrt(a^2*sx^2 + b^2*sz^2 + 2*a*b*cxz*sx*sz)

# Or without the simplified names...
(a*cor(W,X)*sd(X) + b*cor(W,Z)*sd(Z))/sqrt(a^2*var(X) + b^2*var(Z) + 2*a*b*cor(X,Z)*sd(X)*sd(Z))

# and it matches...
cor(W, yhat)
Note that we would need to replace 'a' and 'b' with the coefficients b1 and b2 from the fitted linear model Y ~ b0 + b1*X + b2*Z...

Note that all I did was replace Yhat with a*X + b*Z and then used definitions and rules about manipulating covariances.

Also now that I think about it... it would be a lot easier to do this with matrix manipulations. But I'm too lazy to work that out right now.
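(editorial note: a minimal sketch of that matrix version, reusing the same simulated X, Z and W as above — not necessarily the manipulation Dason had in mind:)

```r
# with beta = (a, b), Sigma = Cov((X, Z)) and s = (Cov(W, X), Cov(W, Z)):
#   Cov(W, Yhat) = s' beta,  Var(Yhat) = beta' Sigma beta,
# so Cor(W, Yhat) = (s' beta) / (sd(W) * sqrt(beta' Sigma beta))
set.seed(1)
n <- 10000
X <- rnorm(n, 10, 3)
Z <- X + runif(n, 2, 10)
W <- .2*X - .7*Z + rnorm(n)

beta  <- c(.2, 30)               # the same a and b as above
Sigma <- cov(cbind(X, Z))        # 2 x 2 covariance matrix of the predictors
s     <- c(cov(W, X), cov(W, Z)) # covariances of W with each predictor

yhat <- drop(cbind(X, Z) %*% beta)
drop(s %*% beta) / (sd(W) * sqrt(drop(t(beta) %*% Sigma %*% beta)))
cor(W, yhat)  # matches
```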

5. ## The Following User Says Thank You to Dason For This Useful Post:

spunky (01-08-2012)

6. ## Re: (Li'l) theoretical question about correlations in regression...

well... that is kind of what i wanted to ask you... where did you start with the replacement and the covariance rules? i see it works and i believe you... but how did you get to that final part, where the product of correlations and regression coefficients just gets you that correlation? i've spent a good chunk of yesterday trying to tackle this problem and... oh wow!

7. ## Re: (Li'l) theoretical question about correlations in regression...

I just used a few facts.

1) Cor(X, Y) = Cov(X, Y)/(sd(X)sd(Y)) implies that Cov(X,Y) = sd(X)sd(Y)Cor(X, Y)

2) Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z)

3) Cov(a*X, b*Y) = abCov(X, Y)

Then I wrote Cor(W, Yhat) = Cor(W, a*X + b*Z) = Cov(W, a*X + b*Z)/(sd(W)*sd(a*X + b*Z)). I used (2) and (3) to break up the numerator and then (1) to get it into terms of correlations and standard deviations. I used variance rules to expand out the denominator.
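(editorial note: the three facts above are easy to verify numerically; a quick sanity check in R with arbitrary simulated data:)

```r
# numerical sanity check of the three covariance facts used in the derivation
set.seed(42)
X <- rnorm(100); Y <- rnorm(100); Z <- rnorm(100)
a <- 2; b <- -3

# 1) Cov(X, Y) = sd(X) * sd(Y) * Cor(X, Y)
all.equal(cov(X, Y), sd(X) * sd(Y) * cor(X, Y))

# 2) Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z)
all.equal(cov(X, Y + Z), cov(X, Y) + cov(X, Z))

# 3) Cov(a*X, b*Y) = a * b * Cov(X, Y)
all.equal(cov(a*X, b*Y), a * b * cov(X, Y))
```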

8. ## Re: (Li'l) theoretical question about correlations in regression...

Originally Posted by spunky
well... that is kind of what i wanted to ask you... where did you start with the replacement and the covariance rules? i see it works and i believe you... but how did you get to that final part, where the product of correlations and regression coefficients just gets you that correlation? i've spent a good chunk of yesterday trying to tackle this problem and... oh wow!

Spunky: Look at Equation (4.7) on page 90 in my book. I think you can get your answer by using that equation and multiplying the result by the value of R for the reduced model.

9. ## Re: (Li'l) theoretical question about correlations in regression...

i feel so... so... SOOOO ****... never in my life (until now) had i ever considered that cor(x,y) * sd(x) * sd(y) = cov(x,y)... oh god, i really, really feel the need for a smack-my-forehead emoticon... lol

@Dragan i think i'll need to go grab your book from the library yet one more time .... at this point i think i'm gonna just add it to my amazon wishlist, hehe...

10. ## Re: (Li'l) theoretical question about correlations in regression...

Previously I thought that spunky's question meant that a and b are the least-squares estimators b1hat and b2hat, which are functions of Y. In that case it seems very complicated.

But if spunky just means that a and b are the true parameters, which are constants, then it is much simpler, and just using the identities that Dason suggested will be enough.

11. ## Re: (Li'l) theoretical question about correlations in regression...

Originally Posted by BGM
Previously I thought that spunky's question meant that a and b are the least-squares estimators b1hat and b2hat, which are functions of Y. In that case it seems very complicated.

But if spunky just means that a and b are the true parameters, which are constants, then it is much simpler, and just using the identities that Dason suggested will be enough.

Actually, there's a method that makes it much easier to compute the answer to Spunky's question than what Dason suggested, BGM.

