
Thread: Significance aside - correlation coefficient

  1. #1

    Significance aside - correlation coefficient




    Hi everyone:

    It's been a while since I last posted here, and I hope I am posting in the right forum.

    In my field, sometimes people like to order correlation coefficients from strongest to weakest to make statements about which relationships are most important. This is done for a variety of reasons, usually when time, budget, or sample size do not allow for regression analyses.

    Recently, when I was ordering these coefficients from strongest to weakest myself, I became nervous that the ranking might not be meaningful when the coefficients came from samples of different sizes. What I want to know is:

    1) As sample sizes get larger, is it more likely that you would observe a correlation coefficient (r) that is smaller (closer to 0)?
    2) Is that why a lower r (weaker relationship) results in a lower p (less likely to occur by chance)?

    If, for example, I have a sample of 100 and r = .50, is that fundamentally different from a sample of 1000 and r = .50? Or are the two .50s actually the same, and the only thing that changes is the p-value?

    Thank you so much for answering. I am not a statistician, so answers in layman's terms are appreciated.

  2. #2

    Re: Significance aside - correlation coefficient

    All things being equal, as sample size increases the P-value decreases, even when the correlation coefficient stays the same. For example, with r = 0.10 the P-value is well above 0.05 (the typical significance level) when n = 20, but far below 0.05 when n = 20,000. This isn't limited to correlation coefficients: for a fixed effect size, P-values generally get smaller as sample sizes increase. This is why in some fields overemphasis of P-values is somewhat frowned upon.
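    The same pattern can be checked numerically. The sketch below is a hedged illustration rather than the exact test: it uses the Fisher z large-sample approximation, under which atanh(r) * sqrt(n - 3) is roughly standard normal when the true correlation is zero, and `approx_p_value` is a hypothetical helper name. The exact test for a Pearson correlation uses a t statistic with n - 2 degrees of freedom, but the approximation shows the same effect using only Python's standard library.

```python
import math


def approx_p_value(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0, using the Fisher z approximation.

    Under H0, atanh(r) * sqrt(n - 3) is approximately standard normal,
    and the two-sided tail area 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    This is a large-sample approximation, not the exact t test.
    """
    if n <= 3:
        raise ValueError("need n > 3")
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


# Same correlation, very different sample sizes.
for n in (20, 20_000):
    print(f"r = 0.10, n = {n:>6}: p ~ {approx_p_value(0.10, n):.3g}")
```

    With r fixed at 0.10, the n = 20 p-value comes out around 0.68, while the n = 20,000 one is vanishingly small.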

  3. #3
    hlsmith

    Re: Significance aside - correlation coefficient

    The p-value for a Pearson correlation is based on the t statistic:

    t = r*sqrt((n-2)/(1-r^2))

    The t-value is compared against a t distribution with n-2 degrees of freedom. As you can see, n appears in the numerator, so the bigger the sample, the bigger the t-value used to determine the p-value. (Side note: the bigger the t-value, the smaller the p-value.)

    So r = 0.6 with n = 10 or n = 100 gives a t-value of approximately 2.12 or 7.42, respectively.

    In both cases the correlation is exactly the same, but the larger sample gives a much smaller p-value. Provided there are no systematic sampling errors (bias), a larger sample is more likely to be representative of the population and less influenced by a single randomly selected extreme value.
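    To check the arithmetic, here is a standard-library Python sketch; the function names are hypothetical, chosen for this example. It computes the t statistic from the formula above (note that the whole ratio (n-2)/(1-r^2) sits under the square root) and converts it to a two-sided p-value with n - 2 degrees of freedom, using the identity P(|T| > t) = I_x(df/2, 1/2) with x = df/(df + t^2), where I_x is the regularized incomplete beta function evaluated by the usual continued-fraction (Lentz) method. In practice a library routine such as SciPy's `scipy.stats.pearsonr` is the more robust choice; this is only meant to show where the numbers come from.

```python
import math


def t_statistic(r: float, n: int) -> float:
    """t = r * sqrt((n - 2) / (1 - r^2)); the whole ratio is under the square root."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def _nz(v: float, tiny: float = 1e-300) -> float:
    # Guard against division by (near) zero in Lentz's method.
    return v if abs(v) > tiny else tiny


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / _nz(1.0 - qab * x / qap)
    h = d
    for m in range(1, 500):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 / _nz(1.0 + aa * d)
        c = _nz(1.0 + aa / c)
        h *= d * c
        # Odd step.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 / _nz(1.0 + aa * d)
        c = _nz(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h


def _betai(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Use the continued fraction where it converges quickly; otherwise use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def two_sided_p(t: float, df: int) -> float:
    """P(|T| > |t|) for Student's t with df degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def pearson_p_value(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0 from the t statistic with n - 2 df."""
    return two_sided_p(t_statistic(r, n), n - 2)


for n in (10, 100):
    print(f"r = 0.6, n = {n:>3}: t = {t_statistic(0.6, n):.2f}, "
          f"p = {pearson_p_value(0.6, n):.3g}")
```

    For r = 0.6 this gives a p-value above 0.05 at n = 10 and far below it at n = 100, which is the pattern described in the post.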

  4. #4

    Re: Significance aside - correlation coefficient

    Thanks for your reply. I understand that larger sample sizes decrease p-values. But setting the p-values aside and looking only at the correlation coefficients themselves (assume you are only dealing with significant correlations, if that makes this clearer): if I have r = .50 for a sample of 100 and r = .50 for a sample of 1000, are those two r values the same?

    I mean, I realize that the larger sample would have a lower p-value because it would be a rarer occurrence (right?), but even though one is more likely than the other, is the strength measure still the same?

    Here's an example (I'm making up the numbers):

    I have n = 100, r = .50, p = .03
    I have n = 200, r = .48, p = .001

    If I'm just putting the correlations from strongest to weakest in a table, does it still make sense to say that r = .50 is stronger than r = .48? (I use "stronger" in relative terms; I know this might not be a meaningful difference.)

    Thanks!

  5. #5
    hlsmith

    Re: Significance aside - correlation coefficient

    If they both come from the same population and were collected with the same sampling technique, the larger sample should be more reliable, in my opinion. Perhaps you could use something like a Fisher-weighted mean value of r to rank them.
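    One common reading of a "Fisher-weighted mean" of r is sketched below; treat it as an assumption rather than a definitive recipe. Each r is transformed with Fisher's z = atanh(r), the z values are averaged with weights n - 3 (the inverse of z's approximate sampling variance, 1/(n - 3)), and the result is transformed back with tanh. The inputs are the made-up pair from the previous post (r = .50 with n = 100, r = .48 with n = 200), and `fisher_weighted_mean_r` is a hypothetical helper name.

```python
import math


def fisher_weighted_mean_r(pairs):
    """Pool correlations via Fisher's z transform.

    pairs: iterable of (r, n). Each r is mapped to z = atanh(r), whose
    approximate sampling variance is 1 / (n - 3), so it gets weight n - 3.
    The weighted mean z is mapped back to the r scale with tanh.
    """
    num = 0.0
    den = 0.0
    for r, n in pairs:
        w = n - 3
        num += w * math.atanh(r)
        den += w
    return math.tanh(num / den)


pooled = fisher_weighted_mean_r([(0.50, 100), (0.48, 200)])
print(f"Fisher-weighted mean r: {pooled:.3f}")
```

    Because the n = 200 estimate carries roughly twice the weight, the pooled value lands between the two coefficients and closer to .48.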

  6. #6
    CowboyBear

    Re: Significance aside - correlation coefficient

    Quote (originally posted by blue11):
    1) As sample sizes get larger, is it more likely that you would observe a correlation coefficient (r) that is smaller (closer to 0)
    Not as such. I think you are asking here about the bias of the correlation coefficient. Actually, the correlation coefficient is slightly biased toward zero (when the true correlation is non-zero), and this bias is larger when the sample is smaller. (This assumes bivariate normality.) In other words, on average, the small-sample estimates are actually more conservative.

    Admittedly this effect is small, so for practical purposes you can think of the coefficients as approximately unbiased.
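    A small Monte Carlo sketch shows the effect. It uses standard-library Python, and the seed, rho = 0.5, sample sizes and replication counts are arbitrary choices for illustration, not values from the thread. Under bivariate normality the average r at n = 10 falls a little below the true rho, while at n = 200 it is essentially on target. The comparison value uses the familiar first-order approximation E[r] ≈ rho * (1 - (1 - rho^2) / (2n)).

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean_sample_r(rho, n, reps, rng):
    """Average r over `reps` bivariate-normal samples of size n with true correlation rho."""
    total = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # y = rho * x + sqrt(1 - rho^2) * noise gives corr(x, y) = rho.
        ys = [rho * x + math.sqrt(1 - rho * rho) * rng.gauss(0.0, 1.0) for x in xs]
        total += pearson_r(xs, ys)
    return total / reps


rng = random.Random(2014)
rho = 0.5
small = mean_sample_r(rho, n=10, reps=20_000, rng=rng)
large = mean_sample_r(rho, n=200, reps=2_000, rng=rng)
# Rough first-order approximation of E[r] at n = 10.
approx_small = rho * (1 - (1 - rho ** 2) / (2 * 10))
print(f"true rho = {rho}")
print(f"mean r, n = 10:  {small:.3f} (approx. {approx_small:.3f})")
print(f"mean r, n = 200: {large:.3f}")
```

    The shortfall at n = 10 is only about two hundredths, which is why the coefficients can be treated as approximately unbiased for practical purposes.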

    Quote (originally posted by blue11):
    If for example I have a sample of 100 and an r=.50, is that fundamentally different than a sample of 1000 and an r=.50.
    Both coefficients will be approximately unbiased. However, their sampling distributions will have different variances.
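    To put numbers on "different variances", the sketch below uses the Fisher z approximation (an assumption: it is approximate and relies on the same bivariate-normal setting), under which atanh(r) has a standard error of about 1/sqrt(n - 3). Two coefficients of r = .50 give the same point estimate, but the n = 100 interval is much wider than the n = 1000 one; on the z scale the ratio is sqrt(997/97) ≈ 3.2. `fisher_ci` is a hypothetical helper name.

```python
import math


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a correlation via Fisher z."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)  # approximate standard error of atanh(r)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


for n in (100, 1000):
    lo, hi = fisher_ci(0.50, n)
    print(f"r = 0.50, n = {n:>4}: 95% CI ~ ({lo:.2f}, {hi:.2f}), "
          f"SE(z) = {1 / math.sqrt(n - 3):.3f}")
```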

    What this means in a practical sense is this. Say you have a bunch of correlation coefficients, half from small samples and half from large samples. Imagine further that the true population correlations are exactly the same for all of them. The sample estimates will, of course, differ from the true population correlations. You estimate the sample correlations and rank them by size.

    Now all of the coefficients will be approximately unbiased. However, because the estimates based on small samples are more variable, you will find that the largest and smallest correlations tend to come from the small samples, whereas the midrange estimates tend to come from the large samples.

    So this might be something to keep in mind if you're using a rank process to pick only the largest coefficients. The coefficients themselves may be unbiased, but if your decision process is to select for further analysis only the very highest correlation coefficients from a ranked list, this decision process is biased in favour of selecting coefficients from small samples.
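    The ranking effect can be reproduced with a short simulation. This is a sketch under stated assumptions (bivariate normal data, one true correlation rho = 0.3 shared by every "study", 200 studies with n = 20 and 200 with n = 500, an arbitrary seed); none of these values come from the thread. After sorting the estimates from strongest to weakest, the top and bottom of the list are dominated by small-sample estimates, while the middle is mostly large-sample ones, even though every study estimates the same rho.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def sample_r(rho, n, rng):
    """r from one bivariate-normal sample of size n with true correlation rho."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [rho * x + math.sqrt(1 - rho * rho) * rng.gauss(0.0, 1.0) for x in xs]
    return pearson_r(xs, ys)


rng = random.Random(42)
rho = 0.3  # every "study" has the same true correlation
estimates = [(sample_r(rho, 20, rng), "small") for _ in range(200)]
estimates += [(sample_r(rho, 500, rng), "large") for _ in range(200)]

# Rank from strongest to weakest, as in a results table.
ranked = sorted(estimates, key=lambda pair: pair[0], reverse=True)
top, middle, bottom = ranked[:20], ranked[190:210], ranked[-20:]


def share_small(block):
    """Fraction of estimates in `block` that came from small samples."""
    return sum(label == "small" for _, label in block) / len(block)


print(f"small-sample share, top 20:    {share_small(top):.0%}")
print(f"small-sample share, middle 20: {share_small(middle):.0%}")
print(f"small-sample share, bottom 20: {share_small(bottom):.0%}")
```

    Selecting only the top of such a list for follow-up would therefore mostly pick small-sample estimates, which is the selection bias described above.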

  7. The Following User Says Thank You to CowboyBear For This Useful Post:

    blue11 (08-15-2014)

  8. #7

    Re: Significance aside - correlation coefficient


    Thank you very much for this; it is exactly the information I was looking for. I think I had misremembered some information concerning the bias of the coefficient, and I did not think at all about the variability of small samples when ranking the coefficients. I've shared this with my colleagues, as it is typical practice to select the strongest correlations for further analysis.






Advertise on Talk Stats