hey y'all people! how're y'all doin'? i hope everyone had the most awesome holidays! heh.
so... who else is feeling the pinch this semester? oh man, it's my last (course-intensive) semester before finishing up my master's (just waiting on the thesis, heh), so i had to cram in 2 seminars in the stats dept and 2 courses in my home dept in education... in any case, best of luck to everyone.
so last friday i got asked a somewhat interesting question that i haven't been able to quite figure out yet... it goes like this:
let's pretend that we have a regression that looks like Y = b0 + b1*X + b2*Z + b3*W + e. now, as usually happens in these cases, these variables have certain correlations, so that cor(X,Z), cor(X,W), cor(Z,W)... you know, all of those are not zero.
say that i now have a reduced regression model that looks like Y = b0 + b1*X + b2*Z + e, so it's the same as the previous one but without one predictor, W.
the question would then be:
what would be the correlation between the omitted predictor W and the predicted scores yhat from the second, reduced model?
i am having a little bit of a hard time because there are a few too many correlations, and i think the algebra's gonna get somewhat complicated if i try to sort it out by re-expressing yhat in terms of its correlations with X and Z...
oh god, i'm really hoping someone knows maybe a smart matrix algebra trick or some relationship (maybe through the reduced model's R^2) to simplify things before i tackle this in full force...
thanks to everyone!
for all your psychometric needs! https://psychometroscar.wordpress.com/about/
hello BGM... thank you very much for taking the time to look at this...
let me re-express the question in terms of correlation matrices and see if it makes more sense.
under the first model, where you have Y as dependent and X, Z and W as predictors, you would have a 4 X 4 correlation matrix among all those, right? now, if you do a regression predicting Y from X and Z (without the W predictor) and consider the predicted scores yhat as a new variable, you could write up a new, 3 X 3 correlation matrix (among Y, W and yhat) that would have:
the correlation between Y and W (because you already know that from the original 4 X 4 correlation matrix we started from)
the correlation between Y and yhat (because you can get that from the square root of the R^2 you obtained from the previous regression predicting Y from X and Z)
and then i'd just be missing the correlation between yhat and the predictor W that was not included in the reduced regression...
... i know there are boundaries for that correlation but that's as far as i've got. and, just as you said, at some point my algebra became so extensive and confusing that i'm reaching out for help, hoping someone knows some smart trick or has an insight into what to do...
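for what it's worth, those two known correlations do pin down a range: the 3 X 3 correlation matrix of (W, Y, yhat) has to be positive semi-definite, which forces cor(W, yhat) to lie between r1*r2 - sqrt((1-r1^2)*(1-r2^2)) and r1*r2 + sqrt((1-r1^2)*(1-r2^2)), where r1 = cor(W, Y) and r2 = cor(Y, yhat). here's a quick sketch with simulated data (the variable definitions are just made up for illustration):

```r
set.seed(1)
n <- 5000
X <- rnorm(n); Z <- 0.5*X + rnorm(n); W <- 0.3*X - 0.4*Z + rnorm(n)
Y <- 1 + 2*X - Z + 0.5*W + rnorm(n)
fit  <- lm(Y ~ X + Z)            # reduced model, W omitted
yhat <- fitted(fit)
# cor(Y, yhat) equals the square root of the reduced model's R-squared
all.equal(cor(Y, yhat), sqrt(summary(fit)$r.squared))
# bounds implied by positive semi-definiteness of the correlation matrix
r1 <- cor(W, Y); r2 <- cor(Y, yhat)
lo <- r1*r2 - sqrt((1 - r1^2)*(1 - r2^2))
hi <- r1*r2 + sqrt((1 - r1^2)*(1 - r2^2))
c(lo, cor(W, yhat), hi)          # the observed correlation sits inside [lo, hi]
```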
Y only matters to get the coefficients for the linear model. But Yhat is just a linear combination of X and Z once we have the parameters.
Note that we would need to replace 'a' and 'b' with the coefficients from the linear model Y ~ b0 + b1*x + b2*z...
Code:
n <- 10000
X <- rnorm(n, 10, 3)
Z <- X + runif(n, 2, 10)
W <- .2*X - .7*Z + rnorm(n)
a <- .2
b <- 30
# Y doesn't even matter! yhat is just a linear combination
# of X and Z. Note I omit an intercept but it doesn't matter for correlation.
yhat <- a*X + b*Z
# cwx stands for the correlation between W and X, and so on
cwx <- cor(W, X)
cwz <- cor(W, Z)
cxz <- cor(X, Z)
sx <- sd(X)
sz <- sd(Z)
# All in terms of correlations and variances of X, Z, W
(a*cwx*sx + b*cwz*sz)/sqrt(a^2*sx^2 + b^2*sz^2 + 2*a*b*cxz*sx*sz)
# Or without the simplified names...
(a*cor(W,X)*sd(X) + b*cor(W,Z)*sd(Z))/sqrt(a^2*var(X) + b^2*var(Z) + 2*a*b*cor(X,Z)*sd(X)*sd(Z))
# and it matches...
cor(W, yhat)
Note that all I did was replace Yhat with a*X + b*Z and then used definitions and rules about manipulating covariances.
Also now that I think about it... it would be a lot easier to do this with matrix manipulations. But I'm too lazy to work that out right now.
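A sketch of that matrix version, under the same simulated setup as the code above (so everything here is illustrative): with beta = (a, b) the coefficient vector, Cov(W, yhat) = beta'cw and Var(yhat) = beta'S beta, where S is the 2 X 2 covariance matrix of (X, Z) and cw holds the covariances of W with each predictor.

```r
set.seed(2)
n <- 10000
X <- rnorm(n, 10, 3)
Z <- X + runif(n, 2, 10)
W <- .2*X - .7*Z + rnorm(n)
a <- .2; b <- 30
yhat <- a*X + b*Z
beta <- c(a, b)
S  <- cov(cbind(X, Z))           # covariance matrix of the predictors
cw <- c(cov(W, X), cov(W, Z))    # covariances of W with each predictor
# cor(W, yhat) = beta'cw / sqrt(beta'S beta * Var(W))
drop(beta %*% cw / sqrt(beta %*% S %*% beta * var(W)))
cor(W, yhat)                     # matches
```

The nice thing about this form is that it generalizes to any number of retained predictors without rewriting the scalar formula.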
well... that is kind of what i wanted to ask you... where did you start with the replacement and the covariance rules? like, i see it works and i believe you... but how did you get to that final part where the product of correlations and regression coefficients just gets you that correlation? i spent a good chunk of yesterday trying to tackle this problem and... oh wow!
I just used a few facts.
1) Cor(X, Y) = Cov(X, Y)/(sd(X)sd(Y)) implies that Cov(X,Y) = sd(X)sd(Y)Cor(X, Y)
2) Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z)
3) Cov(a*X, b*Y) = abCov(X, Y)
Then I wrote Cor(W, Yhat) = Cor(W, a*X + b*Z) = Cov(W, a*X + b*Z)/(sd(W)*sd(a*X + b*Z)). I used (2) and (3) to break up the numerator and then (1) to get it into terms of correlations and standard deviations. I used variance rules to expand out the denominator.
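Written out (with the same a and b as in the code), that chain of steps is:

```latex
\mathrm{Cor}(W,\hat Y)
 = \frac{\mathrm{Cov}(W,\,aX+bZ)}{\sigma_W\,\sigma_{aX+bZ}}
 = \frac{a\,\mathrm{Cov}(W,X)+b\,\mathrm{Cov}(W,Z)}
        {\sigma_W\sqrt{a^2\sigma_X^2+b^2\sigma_Z^2+2ab\,\mathrm{Cov}(X,Z)}}
 = \frac{a\,r_{WX}\sigma_X+b\,r_{WZ}\sigma_Z}
        {\sqrt{a^2\sigma_X^2+b^2\sigma_Z^2+2ab\,r_{XZ}\sigma_X\sigma_Z}}
```

where fact (1) turns each covariance in the numerator into a correlation times standard deviations, and the sd(W) factors cancel between numerator and denominator.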
i feel so... so... SOOOO ****... never in my life (until now) had i ever considered that cor(x,y) *sd(x) *sd(y) = cov(x,y)..... oh god, i really, really feel the need for a smack-my-forehead emoticon.... lol
@Dragan i think i'll need to go grab your book from the library yet one more time .... at this point i think i'm gonna just add it to my amazon wishlist, hehe...
Previously I thought that spunky's question means
a and b are some sort of least-squares estimators, which are functions of Y. In that case it seems that it is very complicated.
But if spunky just means a and b are the true parameters, which are constants, then it is much simpler, and just using the identities Dason suggested will be enough
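(In sample terms the distinction matters less for computing the number itself: once the model is fit, yhat is a fixed linear combination of X and Z with the estimated coefficients plugged in, so the same identity holds numerically. A sketch with made-up simulated data:)

```r
set.seed(3)
n <- 10000
X <- rnorm(n, 10, 3)
Z <- X + runif(n, 2, 10)
W <- .2*X - .7*Z + rnorm(n)
Y <- 1 + 0.5*X - 0.3*Z + 0.4*W + rnorm(n)
fit <- lm(Y ~ X + Z)                     # a and b are now estimates
a <- coef(fit)["X"]; b <- coef(fit)["Z"]
lhs <- (a*cor(W,X)*sd(X) + b*cor(W,Z)*sd(Z)) /
       sqrt(a^2*var(X) + b^2*var(Z) + 2*a*b*cor(X,Z)*sd(X)*sd(Z))
unname(lhs) - cor(W, fitted(fit))        # essentially zero (floating point)
```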