standard deviation of multiple sample sets


shoes
02-07-2009, 08:55 PM
I'm trying to determine the standard deviation of multiple sample sets (measurements A, B, C taken on 3 different days), for which I know the means and standard deviations (but not the individual values). In a related thread I saw the advice to check out "pooled standard deviation" at http://en.wikipedia.org/wiki/Pooled_standard_deviation, but this doesn't seem to fit my case. The means of A, B, and C vary a bit (with a standard deviation of their means of, say, 0.1), while each individual standard deviation (sA, sB, sC) is pretty tight (say 0.01 - fyi, these standard deviations reflect errors in the measurement device). The wiki link gives a formula based only on sA, sB, and sC, which not surprisingly gives a low standard deviation for the whole population. I know this can't be right, since the means of A, B, and C have greater variability.
Any advice? Thanks!

Masteras
02-08-2009, 02:54 AM
The pooled variance is used in cases where you can assume that the variances do not differ statistically significantly. You say they differ. Have you tested your hypothesis using Levene's test for equality of variances?

Dragan
02-08-2009, 10:59 AM


Let me just ask the following for clarification of your problem. Are you suggesting that your scenario is this:

Let X={x1,x2,…,xN1}, Y={y1,y2,…,yN2}, Z={z1,z2,…,zN3} denote 3 data sets with known means and standard deviations (not necessarily with equal sample sizes N1, N2, N3).

Let A be the union of these data sets, i.e.
A={x1,x2,…,xN1,y1,y2,…,yN2,z1,z2,…,zN3}.

Now, are you asking what the mean and standard deviation of A are when you don't have the data but know only the means and standard deviations of X, Y, and Z? Is the scenario I describe correct?

shoes
02-09-2009, 12:28 AM
Yes, you have it right. I'd also say I wouldn't want to weight X, Y, and Z by the # of measurements of each, as you have effectively done in "A", but since each has the same # of measurements, this is moot. Put another way, consider I have 3 people, with 3 measurements of each one's height, and I'm given the mean value and standard deviation for each person. How does one calculate the standard deviation for the 3 people?
Thank you!

Dragan
02-09-2009, 09:35 AM

Okay, here are the formulae you need. They give you the (exact) mean and variance as if you actually had the data. What you need to do is merge the data sets one at a time, feeding the result of each merge into the merge with the next data set.

mean=[n1 /(n1+n2)]*Xbar1 + [n2 /(n1+n2)]*Xbar2

variance=[ n1^2*Var1 + n2^2*Var2 - n1*Var1 - n1*Var2 - n2*Var1 - n2*Var2 + n1*n2*Var1 + n1*n2*Var2 + n1*n2*(Xbar1 - Xbar2)^2 ] / [ (n1+n2-1)*(n1+n2) ]

I’ll show an example for the means so you can get the idea on how to do this. This idea is the same for the variance (standard deviation).

Example: Suppose I have 3 data sets with:

Xbar1=5; Std.dev1=2; Var1=4; n1=10
Xbar2=15; Std.dev2=3; Var2=9; n2=15
Xbar3=8; Std.dev3=5; Var3=25; n3=20

Now to get the mean of the 3 data sets apply the first two sets of statistics

mean(1,2) = [10 /(10+15)]*5 + [15 /(10+15)]*15 =11

Now, use this result as follows:

mean(1,2,3) = [25 /(25+20)]*11 + [20 /(25+20)]*8 = 9.6667 (rounded).

Now just apply this idea using the formula for variance above.

Obviously, in the end just take the sqrt of the variance to get the standard deviation for the merged (3) sets of data.

BTW, this idea is completely general for k sets of data.
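
For anyone who wants to check the variance step numerically, here is a minimal Python sketch of the merge described above. The function name merge_stats is made up for illustration. It uses a compact form of the variance formula (within-group sums of squares plus a between-means term), which is algebraically the same as the long expression once numerator and denominator are multiplied by (n1+n2). It assumes each Var is a sample variance with an n-1 denominator.

```python
import math

def merge_stats(n1, mean1, var1, n2, mean2, var2):
    """Merge (n, mean, sample variance) of two groups into the stats of their union."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    # Sum of squares about the merged mean:
    # the two within-group parts plus a term for the gap between the means.
    ss = (n1 - 1) * var1 + (n2 - 1) * var2 + n1 * n2 * (mean1 - mean2) ** 2 / n
    return n, mean, ss / (n - 1)

# The three data sets from the example, merged one at a time.
n12, mean12, var12 = merge_stats(10, 5.0, 4.0, 15, 15.0, 9.0)
n123, mean123, var123 = merge_stats(n12, mean12, var12, 20, 8.0, 25.0)

print(mean12, var12)
print(mean123, var123, math.sqrt(var123))
```

With the example numbers, the first merge gives a variance of 31.75, and merging in the third set gives a variance of about 30.39, i.e. a standard deviation of about 5.51.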

shoes
02-15-2009, 05:18 PM

Thanks for the formula. This weights each mean (and standard deviation) by the number of measurements in each set, which is not exactly intuitive to me. For example, if I have 2 people, and A is 4' tall with 3 measurements, while B is 7' tall with 27 measurements, the mean height is:

mean(A,B) = [3/(3+27)]*4 + [27/(3+27)]*7 = 6.7 feet.

Very odd indeed, since I'd expect the mean to be at least near 5.5', but I'll take your word for it - perhaps an indication that one really should have equal numbers of measurements.
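
The two numbers answer different questions, which a small sketch may make clearer. The function names below are made up for illustration. The weighted version is the mean of all 30 measurements lumped together, while the unweighted version treats each person as a single unit regardless of how often they were measured.

```python
def pooled_mean(groups):
    """Mean of all measurements lumped together; groups is a list of (n, mean)."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n

def mean_of_means(groups):
    """Unweighted average of the group means: each group counts once."""
    return sum(m for _, m in groups) / len(groups)

# Person A: 3 measurements averaging 4 ft; person B: 27 measurements averaging 7 ft.
people = [(3, 4.0), (27, 7.0)]

print(pooled_mean(people))    # mean of the 30 measurements, about 6.7
print(mean_of_means(people))  # mean of the 2 people, 5.5
```

If the goal is the average height of the people rather than of the measurements, the unweighted version is probably the one wanted. With equal numbers of measurements per person the two coincide.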

dervast
05-14-2010, 01:22 AM
Could you please tell me what this "theory" is called? I want to do the same, but I would like to study the theory first.

One more thing: is there a more general equation that can handle more groups at once? I have something like 10 such populations, so applying your equation 9 times is a bit time-consuming.

Best Regards
Alex.



BGM
05-18-2010, 07:13 AM
Suppose your data set has a total of r groups, with sample size n_i for group i.

Furthermore, suppose you already have the sample mean estimate
\bar{X}_i = \frac{\sum_{j=1}^{n_i} X_{ij}}{n_i}
and the sample variance estimate
\hat{\sigma}_i^2 = \frac{\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2}{n_i - 1}
for each group i = 1, 2, ..., r.

Then the pooled sample mean is
\frac{\sum_{i=1}^r \sum_{j=1}^{n_i} X_{ij}}{\sum_{i=1}^r n_i} = \frac{\sum_{i=1}^r n_i \bar{X}_i}{\sum_{i=1}^r n_i}
and the pooled sample variance is
\frac{\sum_{i=1}^r \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2}{\sum_{i=1}^r (n_i - 1)} = \frac{\sum_{i=1}^r (n_i - 1) \hat{\sigma}_i^2}{\sum_{i=1}^r (n_i - 1)}

The result is the same if you have the data in the form of the sufficient statistics
\sum_{j=1}^{n_i} X_{ij}, \sum_{j=1}^{n_i} X_{ij}^2 for each group i.
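
A rough Python sketch of both quantities for any number of groups, so that nothing has to be applied 9 times. The function names are made up for illustration. Note that the pooled variance only averages the within-group variances. The variance of all the data lumped into one set (what the pairwise merge earlier in the thread computes) also needs a term for how far the group means sit from the overall mean.

```python
def pooled_variance(groups):
    """Within-group pooled variance; groups is a list of (n, mean, sample variance)."""
    return sum((n - 1) * v for n, _, v in groups) / sum(n - 1 for n, _, _ in groups)

def combined_stats(groups):
    """(n, mean, sample variance) of all groups lumped into one data set."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    within = sum((n - 1) * v for n, _, v in groups)                  # spread inside each group
    between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)   # spread of the group means
    return total_n, grand_mean, (within + between) / (total_n - 1)

# The three data sets from the earlier example in this thread.
groups = [(10, 5.0, 4.0), (15, 15.0, 9.0), (20, 8.0, 25.0)]

print(pooled_variance(groups))
print(combined_stats(groups))
```

For those three sets the pooled variance is about 15.17, while the variance of the lumped data is about 30.39. The gap comes entirely from the differing means.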

dervast
05-19-2010, 03:31 AM
Thank you for your reply. As Masteras said earlier, pooled statistics should only be used for samples whose variances do not differ too much. (How do you actually know whether the variances are similar enough to use this technique?)

In my case the mean value is the same and only the variances change.
Here are some typical examples from my study:
1) N(119,3)
2) N(119,12)
3) N(119,8)
4) N(119,30)

I would like to thank you in advance for your help

Best Regards
Alex.
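
A short worked equation for this equal-means case, assuming the group sample means \bar{X}_i really are identical. The N(119, ...) values above come without sample sizes, so n_i and N = \sum_i n_i are left symbolic. The variance of all the data lumped together splits into a within-group part and a between-group part, and the second part drops out when every group mean equals the overall mean \bar{X}:

```latex
s^2 = \frac{\sum_{i=1}^r (n_i - 1)\hat{\sigma}_i^2 + \sum_{i=1}^r n_i (\bar{X}_i - \bar{X})^2}{N - 1},
\qquad N = \sum_{i=1}^r n_i

\bar{X}_i = \bar{X} \ \text{for all } i
\quad\Longrightarrow\quad
s^2 = \frac{\sum_{i=1}^r (n_i - 1)\hat{\sigma}_i^2}{N - 1}
```

This differs from the pooled sample variance only in the denominator (N - 1 instead of N - r), so the two are nearly equal when the groups are large.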
