# Thread: Significance aside - correlation coefficient

1. ## Significance aside - correlation coefficient

Hi everyone:

It's been a while since I have posted here and I hope that I am posting in the right forum.

In my field, sometimes people like to order correlation coefficients from strongest to weakest to make statements about which relationships are most important. This is done for a variety of reasons, usually when time, budget, or sample size do not allow for regression analyses.

Recently, when I was ordering these coefficients from strongest to weakest myself, I became nervous that this was not meaningful when sample sizes were different. What I want to know is:

1) As sample sizes get larger, is it more likely that you would observe a correlation coefficient (r) that is smaller (closer to 0)?
2) Is that why a lower r (weaker relationship) results in lower p (less likely to occur by chance)?

If, for example, I have a sample of 100 and an r=.50, is that fundamentally different from a sample of 1000 and an r=.50? Or are the two .50s actually the same, and the only thing that changes is the p value?

Thank you so much for answering. I am not a statistician, so layman responses are appreciated.

2. ## Re: Significance aside - correlation coefficient

All else being equal, as sample size increases the P-value decreases, even when the correlation coefficient stays the same. For example, with n=20 and r=0.10 the P-value is well above 0.05 (the typical significance level), but with n=20,000 and r=0.10 it is far below 0.05. This isn't limited to correlation coefficients: for any fixed non-zero effect, P-values generally get smaller as sample sizes increase. This is why in some fields overemphasis of P-values is somewhat frowned upon.
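As a rough check of the n=20 versus n=20,000 example, here is a minimal sketch that approximates the two-sided p-value for r=0.10 at both sample sizes. It uses Fisher's z transformation with a normal approximation, which is an assumption made here to keep the code to the standard library; it is not the exact t-test, and the function name `approx_p_value` is made up for illustration.

```python
import math


def approx_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for H0: rho = 0.

    Uses Fisher's z transformation: atanh(r) * sqrt(n - 3) is treated as
    a standard normal deviate. This is an approximation, not the t-test.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided tail area
    return math.erfc(abs(z) / math.sqrt(2))


for n in (20, 20_000):
    print(f"n={n}: r=0.10, approximate p={approx_p_value(0.10, n):.3g}")
```

The same r gives a p-value far above 0.05 at n=20 and far below it at n=20,000.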

3. ## Re: Significance aside - correlation coefficient

The test statistic used to get the p-value for a Pearson correlation is:

t = r*sqrt((n-2)/(1-r^2))

As you can see, n is in the numerator, so for a fixed r the bigger the sample, the bigger the t-value used to determine the p-value (on n-2 degrees of freedom). Side note: the bigger the absolute t-value, the smaller the p-value.

So r=0.6 with n=10 or n=100 results in a t-value of approximately 2.1 or 7.4, respectively.

In both cases the correlation is exactly the same; however, the larger sample has a smaller p-value. A larger sample, given no systematic sampling errors, may be more representative of the population and less influenced by the inclusion of a randomly selected extreme value.
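The t formula in this post can be evaluated directly. A minimal sketch follows; the function name `t_statistic` is chosen here for illustration.

```python
import math


def t_statistic(r: float, n: int) -> float:
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


for n in (10, 100):
    print(f"r=0.6, n={n}: t={t_statistic(0.6, n):.2f}")
# r=0.6, n=10: t=2.12
# r=0.6, n=100: t=7.42
```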

4. ## Re: Significance aside - correlation coefficient

Thanks for your reply. I understand that larger sample sizes decrease p values. But setting aside the p values to just look at the correlation coefficients themselves (assume that you are only dealing with significant correlations, if it makes this clearer): if I have an r=.50 for a sample of 100 and an r=.50 for a sample of 1000, are those two rs the same?

I mean, I realize that the larger sample would have a lower p value because it would be a rarer occurrence (right?), but even though one is more likely than the other, is the strength measure still the same?

Here's an example (I'm making up the numbers):

I have n = 100, r = .50, p = .03
I have n = 200, r = .48, p = .001

If I'm just putting the correlations from strongest to weakest in a table, does it still make sense that r=.50 is stronger than r=.48? (I use the term stronger in relative terms; I know this might not be a meaningful difference.)

Thanks!

5. ## Re: Significance aside - correlation coefficient

If they are both coming from the same population and sampling technique, the larger sample should be more reliable, in my opinion. Perhaps you could use something like a Fisher weighted mean value of r to rank them.
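For reference, one common reading of a "Fisher weighted mean" of r is sketched below: transform each r with Fisher's z (atanh), average the z values with weights n - 3, and back-transform. The post does not spell out the exact procedure or how it would feed into a ranking, so the weighting and the function name `fisher_weighted_mean_r` are assumptions.

```python
import math


def fisher_weighted_mean_r(rs, ns):
    """Pool correlations via Fisher's z, weighting each by n - 3.

    Assumes the correlations estimate the same underlying relationship
    and come from independent samples.
    """
    weights = [n - 3 for n in ns]
    zs = [math.atanh(r) for r in rs]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return math.tanh(z_bar)


# The made-up numbers from the earlier post: r=.50 (n=100), r=.48 (n=200)
print(round(fisher_weighted_mean_r([0.50, 0.48], [100, 200]), 3))
```

The pooled value lands between the two inputs and closer to the one from the larger sample.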

6. ## Re: Significance aside - correlation coefficient

Originally Posted by blue11
> 1) As sample sizes get larger, is it more likely that you would observe a correlation coefficient (r) that is smaller (closer to 0)?

Not as such. I think you are asking here about the bias of the correlation coefficient. Actually, the sample correlation coefficient is slightly biased toward zero (when the true correlation is non-zero), and this bias is larger when the sample is smaller. (This assumes bivariate normality.) In other words, on average, the small-sample estimates are actually more conservative.

Admittedly this effect is small, so for practical purposes you can think of the coefficients as approximately unbiased.
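To give a sense of how small the effect is, here is a sketch using the standard first-order approximation E[r] ≈ ρ − ρ(1 − ρ²)/(2n) under bivariate normality. That formula is brought in here for illustration and is not taken from the thread; the function name `approx_expected_r` is likewise made up.

```python
def approx_expected_r(rho: float, n: int) -> float:
    """First-order approximation to E[r] for bivariate normal data."""
    return rho - rho * (1 - rho ** 2) / (2 * n)


for n in (10, 100, 1000):
    bias = approx_expected_r(0.5, n) - 0.5
    print(f"rho=0.5, n={n}: approximate bias={bias:.5f}")
```

For a true correlation of 0.5, the approximate bias is already below 0.002 in absolute value at n=100.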

> If for example I have a sample of 100 and an r=.50, is that fundamentally different than a sample of 1000 and an r=.50.

Both coefficients will be approximately unbiased. However, their sampling distributions will have different variances.
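One way to see the difference in variance is through approximate confidence intervals. The sketch below uses the Fisher z interval (standard error 1/sqrt(n - 3) on the atanh scale), which is an assumption added here rather than something the post prescribes; `fisher_ci` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def fisher_ci(r: float, n: int, level: float = 0.95):
    """Approximate confidence interval for a correlation via Fisher's z."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


for n in (100, 1000):
    lo, hi = fisher_ci(0.50, n)
    print(f"n={n}: r=0.50, approximate 95% CI=({lo:.2f}, {hi:.2f})")
```

The point estimate is the same in both cases, but the interval is roughly 0.34 to 0.63 at n=100 and roughly 0.45 to 0.55 at n=1000.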

What this means in a practical sense is this. Say you have a bunch of correlation coefficients, with half being from small samples and the other half being from large samples. Imagine further that the true population correlations are actually exactly the same for all the correlations examined. Obviously the sample estimates will differ from the true population correlations. You compute the sample correlations and rank them by size.

Now all of the coefficients will be approximately unbiased. However, because the estimates based on small samples are more variable, you will find that the largest and smallest correlations will tend to be produced by the small samples, whereas the midrange estimates will tend to be from the large samples.

So this might be something to keep in mind if you're using a rank process to pick only the largest coefficients. The coefficients themselves may be unbiased, but if your decision process is to select for further analysis only the very highest correlation coefficients from a ranked list, this decision process is biased in favour of selecting coefficients from small samples.
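The thought experiment in this post can be simulated. The sketch below is an illustration under assumed settings (true correlation 0.3, 200 samples of n=20 and 200 of n=500, bivariate normal data, a fixed random seed), none of which come from the thread. It counts how many of the top 20 and bottom 20 ranked coefficients come from the small samples.

```python
import math
import random


def sample_r(rho: float, n: int, rng: random.Random) -> float:
    """Draw n bivariate normal pairs with correlation rho; return Pearson r."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def count_small_in_extremes(rho=0.3, n_small=20, n_large=500,
                            per_group=200, k=20, seed=1):
    """Rank all sample correlations; count small-sample ones in each tail."""
    rng = random.Random(seed)
    results = [("small", sample_r(rho, n_small, rng)) for _ in range(per_group)]
    results += [("large", sample_r(rho, n_large, rng)) for _ in range(per_group)]
    results.sort(key=lambda item: item[1], reverse=True)
    top_small = sum(1 for label, _ in results[:k] if label == "small")
    bottom_small = sum(1 for label, _ in results[-k:] if label == "small")
    return top_small, bottom_small


top_small, bottom_small = count_small_in_extremes()
print(f"small-sample coefficients in top 20: {top_small}")
print(f"small-sample coefficients in bottom 20: {bottom_small}")
```

Even though every sample shares the same true correlation, both ends of the ranked list are dominated by the small samples.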

7. ## The Following User Says Thank You to CowboyBear For This Useful Post:

blue11 (08-15-2014)

8. ## Re: Significance aside - correlation coefficient

Thank you very much for this - this is exactly the info I was looking for. I think I had misremembered some information concerning the bias of the coefficient, and did not think at all about the variability of small samples in ranking the coefficients. I've shared this information with my colleagues, as it is typical practice to select the strongest correlations for further analysis.
