
Thread: (Li'l) theoretical question about correlations in regression...

  1. #1 spunky (TS Contributor, vancouver, canada)

    (Li'l) theoretical question about correlations in regression...




    hey y'all people!??! how're y'all doooin' huh? i hope everyone had the most awesomest holidays!!! heh.

    so... who else is feelin' the pinch this semester? oh man, it's my last (course-intensive) semester before finishin' up my master's (just waiting on the thesis, heh), so i had to cram in 2 seminars in the stats dept and 2 courses in my home dept in education... in any case, best of luck to everyone.

    so last friday i got asked a somewhat interesting question that i haven't been able to quite figure out yet... it goes like this:

    let's pretend that we have a regression that looks like Y_{1} = \beta_{0}+\beta_{1}X+\beta_{2}Z+\beta_{3}W+\epsilon. now, as usually happens in these cases, the variables are correlated, so r_{XY}, r_{XZ}, r_{XW}, r_{YZ}, r_{YW}, ... and you know, all of those are not zero.

    say that i now have a reduced regression model that looks like Y_{2} = \beta _{0}+\beta_{1}X+\beta_{2}Z+\epsilon, so it's the same as the previous one but without one predictor, W.

    the question would then be:

    what would be the correlation between the omitted predictor W and the predicted scores \widehat{Y_{2}} from the second, reduced model?

    i am having a little bit of a hard time because there are a few too many correlations and i think the algebra's gonna get somewhat complicated if i try to sort it out by re-expressing \widehat{Y_{2}} in terms of its correlation with X and Z ...

    oh god, i'm really hoping someone knows maybe a smart matrix algebra trick or some relationship (maybe through the reduced model's R^{2}) to simplify things before i kind of tackle this in full force...

    thanks to everyone!
    for all your psychometric needs! https://psychometroscar.wordpress.com/about/

  2. #2 BGM (TS Contributor)

    Re: (Li'l) theoretical question about correlations in regression...

    It seems that the problem is not related to the first regression model. Am I missing something?

    Also, do you mean something like \hat{\mathbf{Y}} = \mathbf{H}\mathbf{Y} (with \mathbf{H} the hat matrix), assuming you have n data points? Or are you considering some other regression, like total least squares?

    I have tried, without success, to simplify things under these assumptions.

  3. #3 spunky (TS Contributor, vancouver, canada)

    Re: (Li'l) theoretical question about correlations in regression...

    hello BGM... thank you very much for taking the time to look at this...

    let me re-express the question in terms of correlation matrices and see if it makes more sense.

    under the first model, where Y is the dependent variable and X, Z, and W are predictors, you would have a 4 × 4 correlation matrix among all of them, right? now, if you run a regression predicting Y from X and Z (without the predictor W) and treat \widehat{Y} as a new variable, you could write up a new 3 × 3 correlation matrix that would have:

    the correlation between Y and W (because you already know that from the original 4 X 4 correlation matrix from which we started)

    the correlation between Y and \widehat{Y} (because you can get that from the square root of the R^{2}, which you obtained from the previous regression predicting Y from X and Z)

    and then i'd just be missing the correlation between \widehat{Y} and the predictor W that was not included in the original regression...

    ... i know there are boundaries for that correlation but that's as far as i've got. and, just as you said, at some point my algebra became so extensive and confusing that i'm reaching out for help, hoping someone knows some smart trick or has an insight into what to do...
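
    just to make this concrete, here's my sketch of that 3 × 3 matrix, ordering the variables as (Y, W, \widehat{Y}), writing r_{Y\widehat{Y}} = \sqrt{R^{2}} from the reduced model, and marking the entry i'm missing with a question mark:

    \begin{pmatrix} 1 & r_{YW} & \sqrt{R^{2}} \\ r_{YW} & 1 & ? \\ \sqrt{R^{2}} & ? & 1 \end{pmatrix}

    so the whole question boils down to pinning down that one ? entry.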

  4. #4 Dason (Devorador de queso, Tampa, FL)

    Re: (Li'l) theoretical question about correlations in regression...

    Y only matters to get the coefficients for the linear model. But Yhat is just a linear combination of X and Z once we have the parameters.

    Code: 
    n <- 10000
    X <- rnorm(n, 10, 3)
    Z <- X + runif(n, 2, 10)
    W <- .2*X - .7*Z + rnorm(n)
    
    a <- .2
    b <- 30
    
    # Y doesn't even matter!  yhat is just a linear combination
    # of X and Z.  Note I omit an intercept but it doesn't matter for correlation.
    yhat <- a*X + b*Z
    
    
    # cXY stands for the correlation between X and Y
    cwx <- cor(W,X)
    cwz <- cor(W,Z)
    cxz <- cor(X,Z)
    sx <- sd(X)
    sz <- sd(Z)
    
    # All in terms of correlations and variances of X, Z, W
    (a*cwx*sx + b*cwz*sz)/sqrt(a^2*sx^2 + b^2*sz^2 + 2*a*b*cxz*sx*sz)
    
    # Or without the simplified names...
    (a*cor(W,X)*sd(X) + b*cor(W,Z)*sd(Z))/sqrt(a^2*var(X) + b^2*var(Z) + 2*a*b*cor(X,Z)*sd(X)*sd(Z))
    
    # and it matches...
    cor(W, yhat)
    Note that we would need to replace 'a' and 'b' with the coefficients from the linear model Y ~ b0 + b1*x + b2*z...

    Note that all I did was replace Yhat with a*X + b*Z and then used definitions and rules about manipulating covariances.

    Also now that I think about it... it would be a lot easier to do this with matrix manipulations. But I'm too lazy to work that out right now.
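
    Here's a sketch of what that matrix version would look like (my notation, not from the code: \mathbf{x} = (X, Z)^{T}, \mathbf{b} = (a, b)^{T} for the slopes, \boldsymbol{\Sigma}_{\mathbf{xx}} = \operatorname{Var}(\mathbf{x}), and \boldsymbol{\Sigma}_{\mathbf{x}W} = \operatorname{Cov}(\mathbf{x}, W)):

    \operatorname{Cor}(W, \hat{Y}) = \frac{\operatorname{Cov}(W, \mathbf{b}^{T}\mathbf{x})}{\sigma_{W}\sqrt{\operatorname{Var}(\mathbf{b}^{T}\mathbf{x})}} = \frac{\mathbf{b}^{T}\boldsymbol{\Sigma}_{\mathbf{x}W}}{\sigma_{W}\sqrt{\mathbf{b}^{T}\boldsymbol{\Sigma}_{\mathbf{xx}}\mathbf{b}}}

    which has the nice property of generalizing directly to any number of retained predictors.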

  5. The following user says thank you to Dason for this useful post: spunky (01-08-2012)

  6. #5 spunky (TS Contributor, vancouver, canada)

    Re: (Li'l) theoretical question about correlations in regression...

    well... that is kind of what i wanted to ask you... where did you start with the replacement and the covariance rules? i can see it works and i believe you... but how did you get to that final part, where the product of correlations and regression coefficients gives you that correlation? i spent a good chunk of yesterday trying to tackle this problem and... oh wow!

  7. #6 Dason (Devorador de queso, Tampa, FL)

    Re: (Li'l) theoretical question about correlations in regression...

    I just used a few facts.

    1) Cor(X, Y) = Cov(X, Y)/(sd(X)sd(Y)) implies that Cov(X,Y) = sd(X)sd(Y)Cor(X, Y)

    2) Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z)

    3) Cov(a*X, b*Y) = abCov(X, Y)

    Then I wrote Cor(W, Yhat) = Cor(W, a*X + b*Z) = Cov(W, a*X + b*Z)/(sd(W)*sd(a*X + b*Z)). I used (2) and (3) to break up the numerator and then (1) to get it into terms of correlations and standard deviations. I used variance rules to expand out the denominator.
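
    Written out in full, with a and b the slopes on X and Z as in the code above, that chain of steps gives:

    \operatorname{Cor}(W, \hat{Y}) = \frac{a\operatorname{Cov}(W, X) + b\operatorname{Cov}(W, Z)}{\sigma_{W}\sqrt{a^{2}\sigma_{X}^{2} + b^{2}\sigma_{Z}^{2} + 2ab\operatorname{Cov}(X, Z)}} = \frac{a\, r_{WX}\sigma_{X} + b\, r_{WZ}\sigma_{Z}}{\sqrt{a^{2}\sigma_{X}^{2} + b^{2}\sigma_{Z}^{2} + 2ab\, r_{XZ}\sigma_{X}\sigma_{Z}}}

    where the \sigma_{W} cancels after applying fact (1) to each covariance in the numerator. The final expression is exactly what the R code computes.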

  8. #7 Dragan (Super Moderator, Illinois, US)

    Re: (Li'l) theoretical question about correlations in regression...

    Quote Originally Posted by spunky View Post
    well... that is kind of what i wanted to ask you... where did you start with the replacement and the covariance rules? i can see it works and i believe you... but how did you get to that final part, where the product of correlations and regression coefficients gives you that correlation? i spent a good chunk of yesterday trying to tackle this problem and... oh wow!

    Spunky: Look at Equation (4.7) on page 90 in my book. I think you can get your answer by using that equation and multiplying the result by the value of R for the reduced model.

  9. #8 spunky (TS Contributor, vancouver, canada)

    Re: (Li'l) theoretical question about correlations in regression...

    i feel so... so... SOOOO ****... never in my life (until now) had i ever considered that cor(x,y) * sd(x) * sd(y) = cov(x,y)... oh god, i really, really feel the need for a smack-my-forehead emoticon... lol

    @Dragan i think i'll need to go grab your book from the library yet one more time .... at this point i think i'm gonna just add it to my amazon wishlist, hehe...

  10. #9 BGM (TS Contributor)

    Re: (Li'l) theoretical question about correlations in regression...

    Previously I thought spunky's question meant

    \hat{Y}_2 = \hat{\beta}_0 + \hat{\beta}_1 X + \hat{\beta}_2 Z

    where (\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2) are least-squares estimators, which are themselves functions of X and Z. In that case it seems very complicated.

    But if spunky just means \hat{Y}_2 = \beta_0 + \beta_1 X + \beta_2 Z, where (\beta_0, \beta_1, \beta_2) are the true parameters and hence constants, then it is much simpler, and the identities Dason suggested are enough.

  11. #10 Dragan (Super Moderator, Illinois, US)

    Re: (Li'l) theoretical question about correlations in regression...


    Quote Originally Posted by BGM View Post
    Previously I thought spunky's question meant

    \hat{Y}_2 = \hat{\beta}_0 + \hat{\beta}_1 X + \hat{\beta}_2 Z

    where (\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2) are least-squares estimators, which are themselves functions of X and Z. In that case it seems very complicated.

    But if spunky just means \hat{Y}_2 = \beta_0 + \beta_1 X + \beta_2 Z, where (\beta_0, \beta_1, \beta_2) are the true parameters and hence constants, then it is much simpler, and the identities Dason suggested are enough.

    Actually, BGM, there is a method to compute the answer to Spunky's question that is much easier than what Dason suggested.
