
Thread: Correlation with residuals of regression

  1. #1

    Correlation with residuals of regression




    Hi all,

    Let's say I have one DV (X) and five IVs: A, B, C, D and E. I'm specifically interested in the effect of E. I could perform a regression like this:

    X = x + (a*A) + (b*B) + (c*C) + (d*D) + (e*E)

    Alternatively, I could do:

    X = x + (a*A) + (b*B) + (c*C) + (d*D);
    save residuals (Xres);
    do a Pearson correlation between Xres and E.

    I would expect the correlation coefficient to be equal to the regression coefficient of E. But it's not: SPSS gives me a correlation coefficient that is higher (and more significant).

    How is this possible? My reasoning was: in both cases, you calculate how much of the variance in X is explained by E while controlling for A-D...

    Hope anyone can clarify!

    Thanks,

    AnLo

  2. #2
    Cookie Scientist

    Re: Correlation with residuals of regression

    The results will match in a sense if you also perform this residualizing procedure on the predictor E (regressing E on the other covariates) and then look at the association of these two sets of residuals.

    I say "in a sense" because, technically, what matches is the slope: the slope from (1) the regression of Xres on Eres equals the slope associated with E in (2) the regression of X on all the predictors. (This result is known as the Frisch–Waugh–Lovell theorem.) Because the denominator (residual) degrees of freedom differ slightly between the two cases, the t-statistics and p-values will differ very slightly, but they will be quite close.

    The simple correlation coefficient in (1) will also equal the partial correlation coefficient in (2), although again the p-value will differ slightly, as mentioned above.

    There will not generally be any sort of equivalence between the correlation coefficient and the regression coefficient, as you thought there might be -- I'm not really clear on why you thought those two things would be equal.
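
    The equivalences described above can be checked numerically. The following is only a sketch on simulated data: the sample size, coefficients, and variable names are made up for illustration, and NumPy stands in for SPSS. It compares the coefficient of E in the full regression with the slope of Xres on Eres, and with the slope of Xres on the raw E.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical data: Z holds the covariates A-D, and E is correlated with them.
Z = rng.normal(size=(n, 4))
E = Z @ np.array([0.5, -0.3, 0.2, 0.4]) + rng.normal(size=n)
X = 1.0 + Z @ np.array([1.0, 2.0, -1.0, 0.5]) + 0.7 * E + rng.normal(size=n)


def ols(y, predictors):
    """Regress y on an intercept plus predictors; return (coefficients, residuals)."""
    design = np.column_stack([np.ones(len(y)), predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, y - design @ coef


# (2) Full regression of X on A-D and E; the last coefficient belongs to E.
full_coef, _ = ols(X, np.column_stack([Z, E]))
e_full = full_coef[-1]

# (1) Residualize both X and E on A-D, then regress Xres on Eres.
_, x_res = ols(X, Z)
_, e_res = ols(E, Z)
e_fwl = ols(x_res, e_res)[0][1]

# The procedure from the original question: Xres against the raw E.
e_naive = ols(x_res, E)[0][1]

# Correlations: residual-on-residual versus residual-on-raw-E.
r_res = np.corrcoef(x_res, e_res)[0, 1]
r_naive = np.corrcoef(x_res, E)[0, 1]

# Partial correlation of X and E given A-D, from the inverse correlation matrix.
prec = np.linalg.inv(np.corrcoef(np.column_stack([X, E, Z]), rowvar=False))
partial_r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])

print("slope of E, full model:   ", e_full)
print("slope of Xres on Eres:    ", e_fwl)
print("slope of Xres on raw E:   ", e_naive)
print("corr(Xres, Eres):         ", r_res)
print("partial corr of X and E:  ", partial_r)
print("corr(Xres, E):            ", r_naive)
```

    In this setup the first two slopes agree and corr(Xres, Eres) agrees with the partial correlation, while the quantities computed from the raw E do not.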
    “In God we trust. All others must bring data.”
    ~W. Edwards Deming

  3. The Following User Says Thank You to Jake For This Useful Post:

    AnLo (11-07-2016)

  4. #3

    Re: Correlation with residuals of regression

    Hi Jake,

    Thank you so much for your reply. During my statistics course, I learned (at least, this is how I remember it) that performing a Pearson correlation between A and B is basically the same as performing a simple linear regression of A on B. That's why I thought that the correlation and regression coefficient should be the same.
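
    For reference, the standard relation between the two quantities in the simple (one-predictor) case can be written out. The symbols here are introduced only for illustration: r is the Pearson correlation between A and B, s_A and s_B are the sample standard deviations, b is the slope from regressing A on B, and n is the sample size.

```latex
b = r \,\frac{s_A}{s_B},
\qquad
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
```

    The t-statistic for testing b = 0 is the same as the one for testing r = 0, which is the sense in which the two analyses are "basically the same". The coefficients themselves coincide only when s_A = s_B, for example when both variables are standardized.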

    But as I understand from your reply, even performing these analyses below would not yield the same results:

    1) X = x + (a*A) + (b*B) + (c*C) + (d*D) + (e*E)

    versus

    2) X = x + (a*A) + (b*B) + (c*C) + (d*D)
    3) Xres = x + (e*E)

    I still find this difficult to understand. The only difference in regression 2) compared to 1) is that the variance in X that can be explained by E is not accounted for yet. So then why wouldn't E in 3) have the same beta as it would in 1)?

    I looked up the Frisch–Waugh–Lovell theorem but it's a bit too mathematical for me... ;-) Could you maybe explain in words why my line of reasoning is not correct?

    Again, thank you for your time!

    Best,

    AnLo

  5. #4
    Cookie Scientist

    Re: Correlation with residuals of regression


    Originally posted by AnLo:
    I still find this difficult to understand. The only difference in regression 2) compared to 1) is that the variance in X that can be explained by E is not accounted for yet. So then why wouldn't E in 3) have the same beta as it would in 1)?

    Because your first method tells you the effect of E controlling for all the other predictors, but in your second method the other predictors have been removed only from X and have not been controlled out of E (the simple effect of E implicitly also contains the effects of all the other predictors with which it is confounded).

    Your intuition would be right IFF the predictor E were completely uncorrelated with the set of other predictors, but only in that very special case.
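
    That special case can be illustrated with a small simulation. This is a sketch under assumptions: the data are made up, and E is constructed to be exactly uncorrelated with A-D in the sample by residualizing random noise on them.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Hypothetical covariates A-D, plus an intercept column.
Z = rng.normal(size=(n, 4))
design_Z = np.column_stack([np.ones(n), Z])


def residualize(y, design):
    """Return the residuals of y after least-squares projection onto design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef


# Build an E that is exactly uncorrelated with A-D in this sample.
E = residualize(rng.normal(size=n), design_Z)
X = 2.0 + Z @ np.array([1.0, -1.0, 0.5, 0.3]) + 0.7 * E + rng.normal(size=n)

# Method 1: full regression of X on A-D and E.
full_coef, *_ = np.linalg.lstsq(np.column_stack([design_Z, E]), X, rcond=None)
e_full = full_coef[-1]

# Method 2: regress X on A-D, then regress the residuals on the raw E.
x_res = residualize(X, design_Z)
two_step_coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), E]), x_res, rcond=None)
e_two_step = two_step_coef[1]

print("slope of E, full model:  ", e_full)
print("slope of Xres on raw E:  ", e_two_step)
```

    Here the two slopes coincide because residualizing E on A-D would leave it unchanged; once E is correlated with the other predictors, they drift apart.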

  6. The Following User Says Thank You to Jake For This Useful Post:

    AnLo (11-10-2016)
