# Meta Analysis - Correlation Coefficients with only M and SD

#### cabrown89

##### New Member
Hi Folks,

I am currently assisting on a meta-analysis, and we are faced with correlating variables without the raw scores. The question has 2 parts, but I feel like the answer to one will likely relate to the other.

I want to know how to calculate the correlation coefficient between two variables when only the MEAN and STANDARD DEVIATION are available. For example, for a sample of 20 people:

MEAN(SD)
Var 1 time 1: 2.3 (1)
Var 1 time 2: 3.4 (2)
Var 1 time 3: 3 (1.4)

Var 2 time 1: 4.5 (.4)
Var 2 time 2: 3.5 (.5)
Var 2 time 3: 3.4 (.7)

I also would like to know if it is possible to calculate the correlation between two variables with only one time point across a MULTITUDE OF STUDIES. For example:

Study 1 n=20
Var 1: 2.3 (2)
Var 2: 3.4 (1)

Study 2 n=20
Var 1: 4.5 (1.5)
Var 2: 6.4 (.5)

Study 3 n=20
Var 1: 1.4 (.5)
Var 2: 3.4 (.5)

Thanks in advance for any help

#### hlsmith

##### Less is more. Stay pure. Stay poor.
For the second question, are you trying to correlate all of the var1s with each other, or all of the var1s with the var2s from the same study? Add a little more detail please!

#### spunky

##### Can't make spagetti
> I want to know how to calculate the correlation coefficient between two variables when only the MEAN and STANDARD DEVIATION are available.

Without any info regarding the covariance of said variables or some further assumptions, I don't think it is possible to obtain a correlation coefficient only from the info you provided.

Do you have any other information like regression coefficients or something like that that we could transform into a correlation coefficient?

#### hlsmith

##### Less is more. Stay pure. Stay poor.
I think you may be able to convert some of it but they need to clarify objective.

#### spunky

##### Can't make spagetti
> I think you may be able to convert some of it but they need to clarify objective.

you mean *JUST* from two means and two standard deviations? how? particularly if you have like no information on their covariance? that's why i said we either needed further assumptions or info that would indirectly tell us something about their covariance

#### cabrown89

##### New Member
Hello,

Thank you for your replies. Essentially, in the study we are looking at the relation between two hormones (let's call them Hormone A and Hormone B).

In my first question we would like to have one correlation for Hormone A and Hormone B across all the time points.

In my second question we would like to have one correlation from Hormone A and Hormone B across different studies.

For both questions there are clearly many "simple" correlations that are possible, but is there somehow a way to get an aggregate correlation with only M and SD (since we do not have access to raw scores)?

Let me know if you need any more details

We might be on a bit of a wild goose chase, but we just want to make sure we exhaust all options before we move on to something new.

Thanks again!

#### spunky

##### Can't make spagetti
> For both questions there are clearly many "simple" correlations that are possible

care to elaborate more on this, please?

#### cabrown89

##### New Member
Perhaps that was superfluous information, but in any case, for example:

Question #1: Var 1 Time 1 correlated with Var 2 Time 1
Question #2: Var 1 Study 1 correlated with Var 1 Study 2

#### Dason

It would be impossible to get a correlation estimate for those types of questions for the reasons Spunky mentioned. You have no information about the covariance and pretty much nothing else that could be used to estimate it so... sorry?

#### spunky

##### Can't make spagetti
> Question #1: Var 1 Time 1 correlated with Var 2 Time 1
> Question #2: Var 1 Study 1 correlated with Var 1 Study 2

that's why i keep on referring you to having any info about the covariance between your Vars. just having means and standard deviations is not gonna cut it (unless you can make some further assumptions or maybe have some other info that would allow us to work backwards from it and find a correlation coefficient).

as far as i know, with only means and standard deviations the most you'll be able to come up with are bounds for the correlations. like "Var1 and Var2 can be correlated *at the most* at ##some number##". but anything below ##some number## would be possible.

#### Dason

> as far as i know, with only means and standard deviations the most you'll be able to come up with are bounds for the correlations. like "Var1 and Var2 can be correlated *at the most* at ##some number##". but anything below ##some number## would be possible.

With only the means and the standard deviations I don't think the bounds will make anybody happy: it's from -1 to 1. I'm thinking you were thinking about putting bounds on the covariance?

#### spunky

##### Can't make spagetti
> I'm thinking you were thinking about putting bounds on the covariance?

yup... sorry, on the covariance. with |cov(x1,x2)| <= sd(x1)*sd(x2)
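
For reference, that bound is the Cauchy-Schwarz inequality. A short worked version, assuming the time 1 SDs from the first post (1 and 0.4) stand in for sd(x1) and sd(x2):

$$
|\operatorname{cov}(x_1, x_2)| \le \operatorname{sd}(x_1)\,\operatorname{sd}(x_2) = 1 \times 0.4 = 0.4
\quad\Longrightarrow\quad
-1 \le r = \frac{\operatorname{cov}(x_1, x_2)}{\operatorname{sd}(x_1)\,\operatorname{sd}(x_2)} \le 1
$$

So the SDs cap the covariance at 0.4 in absolute value, but they leave the correlation free to sit anywhere between -1 and 1.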

vacation brain is ON right now

#### hlsmith

##### Less is more. Stay pure. Stay poor.
d = (Mean var1 - Mean var2) / (within-groups standard deviation)

d = standardized mean difference

r = d / sqrt(d^2 + a), where a = (n1 + n2)^2 / (n1 * n2)

r = correlation

Var 1 and Var 2 at time 1:

r = -0.84, probably some rounding error since I did most of this quickly and in my head.
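
The arithmetic behind that number can be sketched in R. This is only an illustration: it assumes n = 20 in each "group" and a pooled within-groups SD, neither of which is spelled out in the post, and the variable names are made up for the example.

```r
# Summary statistics for time 1, taken from the first post
m1 <- 2.3; s1 <- 1.0; n1 <- 20   # Var 1 (n per group is an assumption)
m2 <- 4.5; s2 <- 0.4; n2 <- 20   # Var 2

# Pooled within-groups SD (this treats the two variables as independent groups)
s_within <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))

# Standardized mean difference
d <- (m1 - m2) / s_within

# Correction factor and the d-to-r conversion
a <- (n1 + n2)^2 / (n1 * n2)
r <- d / sqrt(d^2 + a)

# Show the intermediate values rounded to two decimals
round(c(s_within = s_within, d = d, a = a, r = r), 2)
```

Under those assumptions r comes out near -0.82 rather than -0.84. Note that the formula only ever sees the means, SDs and sample sizes, so it cannot reflect how the paired observations actually co-vary.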

#### Dason

> d = (Mean var1 - Mean var2) / (within-groups standard deviation)
>
> d = standardized mean difference
>
> r = d / sqrt(d^2 + a), where a = (n1 + n2)^2 / (n1 * n2)
>
> r = correlation
>
> Var 1 and Var 2 at time 1:
>
> r = -0.84, probably some rounding error since I did most of this quickly and in my head.

Where are you pulling this formula from? Because I can easily generate data that meets all the given criteria and has a correlation of 0 - or 1 or whatever you want...

Code:

```r
> library(MASS)
> dat <- as.data.frame(mvrnorm(20, c(2.3, 4.5), Sigma = matrix(c(1, 0, 0, 0.4^2), 2, 2), empirical = TRUE))
> colnames(dat) <- c("Var 1 Time 1", "Var 2 Time 1")
> dat
   Var 1 Time 1 Var 2 Time 1
1     2.6289243     4.784575
2     1.7874852     4.201106
3     2.1826384     4.635094
4     2.2313700     4.655435
5     0.1770363     3.758098
6     0.9997098     4.555028
7     2.7994411     5.014820
8     2.1105130     4.822299
9     1.2844538     4.864666
10    2.7044514     5.163805
11    2.0732136     4.040286
12    1.5431430     4.967794
13    1.5694236     4.772083
14    2.9203259     4.573180
15    4.3932436     4.377488
16    3.6422051     3.898609
17    2.1430956     4.333454
18    1.9113480     4.209506
19    3.8909433     4.447260
20    3.0070349     3.925414
> colMeans(dat)
Var 1 Time 1 Var 2 Time 1
         2.3          4.5
> apply(dat, 2, sd)
Var 1 Time 1 Var 2 Time 1
         1.0          0.4
> var(dat)
              Var 1 Time 1  Var 2 Time 1
Var 1 Time 1  1.000000e+00 -4.606433e-18
Var 2 Time 1 -4.606433e-18  1.600000e-01
> # Correlation is 0 between the two variables at time 1
> cor(dat)
              Var 1 Time 1  Var 2 Time 1
Var 1 Time 1  1.000000e+00 -1.151608e-17
Var 2 Time 1 -1.151608e-17  1.000000e+00
```
The correlation is 0 between the two variables at time 1.
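
The same trick extends to other target correlations. Here is a minimal sketch; `simulate_pair` is a hypothetical helper written for this illustration, and the targets -0.9, 0 and 0.9 are arbitrary choices:

```r
library(MASS)  # provides mvrnorm()

# Hypothetical helper: draw n pairs whose sample means and SDs match the
# reported time 1 values and whose sample correlation equals rho.
# empirical = TRUE forces the sample moments to match exactly.
simulate_pair <- function(rho, n = 20, means = c(2.3, 4.5), sds = c(1, 0.4)) {
  off_diag <- rho * sds[1] * sds[2]
  sigma <- matrix(c(sds[1]^2, off_diag,
                    off_diag, sds[2]^2), nrow = 2, ncol = 2)
  mvrnorm(n, mu = means, Sigma = sigma, empirical = TRUE)
}

set.seed(1)
for (rho in c(-0.9, 0, 0.9)) {
  x <- simulate_pair(rho)
  # Means and SDs are the same every time; only the correlation changes
  cat(sprintf("target r = %4.1f | means = %.1f, %.1f | sds = %.1f, %.1f | sample r = %.2f\n",
              rho, mean(x[, 1]), mean(x[, 2]), sd(x[, 1]), sd(x[, 2]),
              cor(x[, 1], x[, 2])))
}
```

Every simulated data set reproduces the reported means and SDs, yet each has a different correlation, which is why the summary statistics alone cannot pin r down.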

#### hlsmith

##### Less is more. Stay pure. Stay poor.
This is a ubiquitous formula in most meta-analysis textbooks (it comes with some assumptions). See:

Introduction to Meta-Analysis
-Borenstein, Hedges, Higgins, Rothstein
page 48

I agree it seems like something is missing.

#### Dason

There is definitely something missing. And I'd say it's dangerous to advocate the formula until the something that is missing is identified. I don't really do meta analysis but without knowing the missing pieces it would seem irresponsible to use that formula. I know a few people here use meta analysis... maybe they can shed some light...

#### spunky

##### Can't make spagetti
it looks a lot like the r-to-t transformation used to assess the significance of the correlation coefficient. maybe it comes from there?

#### Dason

Just did a quick internet search and got the following:

http://www.campbellcollaboration.org/escalc/html/EffectSizeCalculator-R3.php

Plugged the numbers in and got -0.82.

These conversions between effect sizes (odds ratios, d, r) are commonly used in the meta-analysis area.
I don't think the assumptions of that apply here. I'm pretty sure that is assuming that the observations in both groups are independent. But at least if I'm reading the OP correctly there would be pairing between observations... So the assumptions aren't met.
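
One way to see what the converted value measures is a rough sketch in R. It assumes two independent groups of 20, rescaled to match the time 1 means and SDs exactly; the variable names are made up for the illustration.

```r
set.seed(1)
n <- 20

# Two *independent* groups, rescaled so the sample means and SDs are exact
g1 <- 2.3 + 1.0 * as.numeric(scale(rnorm(n)))  # values labelled "Var 1"
g2 <- 4.5 + 0.4 * as.numeric(scale(rnorm(n)))  # values labelled "Var 2"

# Stack the outcomes and code group membership (1 = first group, 0 = second)
y     <- c(g1, g2)
group <- rep(c(1, 0), each = n)

# Correlation between group membership and the stacked outcome (point-biserial)
r_pb <- cor(group, y)

# The d-to-r conversion applied to the same summary statistics
s_within <- sqrt((sd(g1)^2 + sd(g2)^2) / 2)  # equal n, so a plain average of variances
d <- (mean(g1) - mean(g2)) / s_within
r_from_d <- d / sqrt(d^2 + (n + n)^2 / (n * n))

round(c(r_pb = r_pb, r_from_d = r_from_d), 2)
```

The two values land within about 0.01 of each other (the small gap is a degrees-of-freedom detail), so the conversion is describing how strongly group membership relates to the outcome. That is a different quantity from the correlation between paired Var 1 and Var 2 scores.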

#### hlsmith

##### Less is more. Stay pure. Stay poor.
This is the stated assumption:

"In applying this conversion assume that a continuous variable was dichotomized to create the treatment and control groups."