An average of standard deviations?

#1
Hello,
My apologies if this isn't the correct forum for this, but I'm trying to analyze some data and can't figure out how to obtain an accurate standard deviation.

Basically, I have several samples spanning the course of a month, each with its own standard deviation (n=10 for each), and I want to calculate a monthly mean and standard deviation. The monthly mean seems fairly straightforward (average the means?), but the standard deviation is less intuitive for me. I don't feel that averaging the standard deviations will accurately represent the data. Any advice on how to handle this problem? A simplified version of my data is below:

Month: January
Week 1 Mean: 67.3 Std. Dev: 0.8
Week 2 Mean: 80.5 Std. Dev: 0.6
Week 3 Mean: 82.4 Std. Dev: 0.8

What formula should I use to calculate the actual standard deviation for the entire month?

Thank you!
 
#2
You might need to treat all samples together as a single sample.
The overall mean is just the average of the individual sample means as long as n is the same for all of them, but the variance then has to be calculated from all the individual observations around that new mean.
 
#3
Thank you for your response Mechnik. That had crossed my mind, but with some of my data I only have the mean and standard deviation that was spit out by an analytical instrument, so I wouldn't know the values of each sample.

Does a formula for averaging standard deviations exist? My searches of this forum and the internet have been unsuccessful.
 

Dragan

Super Moderator
#4
Well, yes. Why don't you just use the square root of the pooled (or weighted) variances?

With equal sample sizes, which is what you have, the standard deviation you are looking for is: Sqrt [ (.64 + .36 + .64) / 3 ] = 0.739369.

I think this should do.
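
As a quick check of the arithmetic above, here is a minimal Python sketch. The function name `pooled_sd_equal_n` is made up for illustration, and it assumes every sample has the same n, as in the January data.

```python
import math

def pooled_sd_equal_n(sds):
    """Square root of the mean of the variances (valid only for equal sample sizes)."""
    variances = [s ** 2 for s in sds]
    return math.sqrt(sum(variances) / len(variances))

# Weekly standard deviations from the original post (n = 10 each)
january_sds = [0.8, 0.6, 0.8]
print(round(pooled_sd_equal_n(january_sds), 6))
```

Note that this averages the within-week spread only; it takes no account of how far apart the weekly means are.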
 
#5
That's what I was looking for. Thank you so much!

For something so simple I'm surprised it wasn't easier to find. I really appreciate your help. :)
 
#6
If "use the square root of the pooled variances" is true what is wrong with the example below:
> xx<-c(2,3,4)
> var(xx)
[1] 1
> sd(xx)
[1] 1
> mean(xx)
[1] 3
> yy<-c(9,10,11)
> mean(yy)
[1] 10
> var(yy)
[1] 1
> zz<-c(2,3,4,9,10,11)
> mean(zz)
[1] 6.5
> var(zz)
[1] 15.5
> sd(zz)
[1] 3.937004
 

Dragan

Super Moderator
#7
What you are providing is the variance for the combined data set (S^2), which is not an average of the two separate variances (1, 1).

In fact, you really don't even need the data to obtain your result (S^2=15.5). All that is needed are the two sample sizes (3, 3), the two means (3, 10), and the two variances (1, 1) of the individual data sets, and then the variance for the combined data (Variance = 15.5) can be obtained as:

\( s^{2}=\frac{n_{x}^{2}s_{x}^{2}+n_{y}^{2}s_{y}^{2}-n_{y}s_{x}^{2}-n_{y}s_{y}^{2}-n_{x}s_{x}^{2}-n_{x}s_{y}^{2}+n_{y}n_{x}s_{x}^{2}+n_{y}n_{x}s_{y}^{2}+n_{x}n_{y}\left ( \bar{X}-\bar{Y} \right )^{2}}{\left (n_{x}+n_{y}-1 \right )\left ( n_{x}+n_{y} \right )} \)

where you can see in the far right-hand side of the numerator how the square of the difference between the two means will play a role in the computation of the variance for the combined data.
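
Grouping the terms may make the structure easier to see. This is only a rearrangement of the expression above (multiplying the numerator and denominator by \( n_{x}+n_{y} \) recovers it), using the same symbols:

\( s^{2}=\frac{\left ( n_{x}-1 \right )s_{x}^{2}+\left ( n_{y}-1 \right )s_{y}^{2}+\frac{n_{x}n_{y}}{n_{x}+n_{y}}\left ( \bar{X}-\bar{Y} \right )^{2}}{n_{x}+n_{y}-1} \)

For the example, (2·1 + 2·1 + (9/6)·49) / 5 = 77.5 / 5 = 15.5.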

Now, if you want the variance for three (or more) combined data sets, then all you need to do is keep applying the equation I provided above as you combine the data sets one at a time, e.g. combine 1 & 2, then (1, 2) & 3, and so on for any number of data sets. Obviously, as you progress you would also need the means of the combined data, which are, of course, easy to obtain.

I would also note that the original poster is asking two different questions. The first question asks for an average of the three variances (or standard deviations) and the second question asks for the variance (or standard deviation) for the combined data....these are different calculations. My first post addresses the first question and this (second) post addresses the second question.
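
The one-at-a-time combination described above can be sketched in Python as follows. The helper name `combine` and the (n, mean, variance) tuple layout are illustrative choices, and the sketch assumes the reported SDs are sample SDs (n - 1 denominator).

```python
from functools import reduce

def combine(a, b):
    """Merge two (n, mean, variance) summaries into one for the combined data."""
    n_a, mean_a, var_a = a
    n_b, mean_b, var_b = b
    n = n_a + n_b
    mean = (n_a * mean_a + n_b * mean_b) / n
    # Within-group sums of squares plus the term for the difference in means
    ss = (n_a - 1) * var_a + (n_b - 1) * var_b + n_a * n_b / n * (mean_a - mean_b) ** 2
    return n, mean, ss / (n - 1)

# The xx / yy example from earlier in the thread
print(combine((3, 3.0, 1.0), (3, 10.0, 1.0)))   # (6, 6.5, 15.5)

# January data from the first post: (n, mean, SD squared) per week
weeks = [(10, 67.3, 0.8 ** 2), (10, 80.5, 0.6 ** 2), (10, 82.4, 0.8 ** 2)]
n, mean, var = reduce(combine, weeks)
print(n, round(mean, 2), round(var ** 0.5, 2))
```

For the January figures this gives a combined standard deviation of roughly 6.87, far larger than the 0.74 obtained by averaging the variances, because the weekly means differ so much.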
 
#8
Very good.
I understand that, to answer "What formula should I use to calculate the actual standard deviation for the entire month?", one would apply the combined variance formula above in succession to every sample to be included.
Edit: Thank you again for the formula, I tried it and it worked as intended.
I found another formula online that did not work:
http://www.emathzone.com/tutorials/basic-statistics/combined-variance.html

I wonder if I am not applying it correctly or it is in error.
 

Dragan

Super Moderator
#9
Right, the expressions are off a bit because the variances are not weighted correctly and the denominator should be (n1 + n2 -1).

It's perhaps easiest to consider the first expression -- the one right above the one you posted. The first two terms (variances) in the numerator should be multiplied by (n1 - 1) and (n2 -1) and then you will get the correct result.
 
#10
Dragan,

Many thanks for sharing this solution. I've implemented it in Excel for a series of n/mean/sd values given by one of our instruments.
Is there a reference in the literature for your formula or the basis from which you derived it?
This thread seems to be the only place that I've found an appropriate solution.
 

Dragan

Super Moderator
#11
Yes, of course. The citation would be:

Headrick, T. C. (2010). Statistical Simulation: Power Method Polynomials and other Transformations. Boca Raton, FL: Chapman & Hall/CRC.

See page 137, Equation 5.38.
 
#14
Hi Dragan,

I have the same question about averaging standard deviations; however, my sample sizes are not the same. So what should I be doing?
Thanks
Biobee
 

Dragan

Super Moderator
#15
Just weight the variances by their respective degrees of freedom (sample size minus one) before taking the square root, like this:

\( s_{w}^{2}=\frac{\left ( n_{1} -1\right )s_{1}^{2}+\left (n _{2}-1 \right )s_{2}^{2}+\cdots +\left ( n_{k} -1\right )s_{k}^{2}}{n_{1}+n_{2}+\cdots +n_{k}-k} \).
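
A minimal Python sketch of the weighted formula above. The function name `pooled_sd` and the unequal sample sizes in the example are hypothetical; only the SDs come from the first post.

```python
import math

def pooled_sd(ns, sds):
    """Pooled SD: each variance weighted by its degrees of freedom (n_i - 1)."""
    numerator = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    denominator = sum(ns) - len(ns)
    return math.sqrt(numerator / denominator)

# Hypothetical sample sizes, paired with the SDs from the first post
print(round(pooled_sd([10, 5, 20], [0.8, 0.6, 0.8]), 3))
```

With equal sample sizes the weights cancel and this reduces to the square root of the simple average of the variances.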
 
#16
Hi,


I run a simulation to collect some data and then calculate its standard deviation.
I then repeat the simulation 4 more times, calculating the standard deviation each time.
Can I apply the formula suggested above to calculate the average standard deviation? (The number of samples is the same each time.)
 
#17
Hi,

I have a set of data that consists of annual population estimates and standard errors of the estimates, over a total of 8 years. The relative standard errors for each year are around 25%. The relative standard error for the sum of the population estimates over all of the years is about 11%

The equation provided by Dragan for weighting the annual variances to get an average standard error was helpful. The relative average standard error was about 25%, as expected.

I would like to determine an annual estimated population and error that represents my data. I could report the average standard error. But somehow that does not feel right, since surely an annual estimate based on a multiyear data set would have a smaller standard error than that based on one year alone?

My instincts sometimes mislead me when using statistics, however ....
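
One way to test that instinct, assuming the yearly estimates are independent: the standard error of a sum is the square root of the summed squared standard errors, and the mean over k years is that sum divided by k. The sketch below uses invented numbers (eight equal estimates, each with a 25% relative standard error), not the data described in this post, and the function name `se_of_mean` is illustrative.

```python
import math

def se_of_mean(ses):
    """SE of the average of independent estimates: sqrt(sum of SE^2) / k."""
    return math.sqrt(sum(se ** 2 for se in ses)) / len(ses)

# Hypothetical: 8 yearly estimates of 1000, each with SE 250 (25% relative SE)
estimates = [1000.0] * 8
ses = [250.0] * 8
mean_estimate = sum(estimates) / len(estimates)
relative_se = se_of_mean(ses) / mean_estimate
print(round(100 * relative_se, 1))  # 8.8 (% relative SE of the multi-year mean)
```

Because the mean is just the sum divided by a constant, its relative standard error is the same as that of the sum, so the roughly 11% already found for the sum would also apply to the multi-year annual mean.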
 
#18
Yes, thanks. That does look familiar!

A quick question: This formula appears appropriate when the samples are independent; that is, the n increases with every combination of samples undertaken. In contrast, I have the means and SDs for data sets (but not the individual data themselves) obtained for repeated measures (i.e., multiple measurements taken from the same sample). In this case, the n does not increase with every combination of data sets, right?

So, I'd like to collapse these measures and obtain the grand mean and the appropriate SD. The mean is easy to obtain but is there an alternative formula to the one above for obtaining the SDs?

Thank you in advance for any help you can provide!

Dave
 

tml

New Member
#19
Hi

I have found this information very useful but have a few queries.

I have analysed a number of genes in a number of muscles at various levels of the mouse limb in mice of various ages (n=3 mice). I have data that I'd like to 'pool'. I have completed all the calculations for pSD and pSEM (taking into account that n is not always equal - see bottom of post). My problem is that the SDs for the means that I'd like to get an overall mean for are not always similar. Can I still use and plot 'pooled SD/pooled SEM' on my graphs against these overall means? I have provided an example below.
         MUSCLE 1  MUSCLE 2  MUSCLE 3  MUSCLE 4   MEAN     SD    SEM   SUM
Mouse 1     26.42      0.00     19.23     28.00  18.41  12.85   6.43
Mouse 2     37.88     13.33     20.00     25.32  24.13  10.39   5.20
Mouse 3     15.48     18.18     14.29     15.00  15.74   1.70   0.85


pSD = sqrt(((n1-1)*(S1^2) + (n2-1)*(S2^2) + ... + (nk-1)*(Sk^2)) / ((n1+n2+...+nk) - k))

pSEM=pSD*sqrt[((S1^2)+(S2^2)+(Si^2))/k]

(k=number of mice; n=number of muscles)
 

tml

New Member
#20
Sorry, I sent the last message accidentally, before I had finished. Here is the table of data. I would very much like help from an expert who could advise on whether it is okay to pool the SD/SEM from all mice (below), given that the SD for mouse 3 is so different. I want to plot the mean (of all three mice combined) with error bars.

         Muscle1  Muscle2  Muscle3  Muscle4   Mean     SD    SEM
Mouse 1    26.42     0.00    19.23    28.00  18.41  12.85   6.43
Mouse 2    37.88    13.33    20.00    25.32  24.13  10.39   5.20
Mouse 3    15.48    18.18    14.29    15.00  15.74   1.70   0.85

Thank you!
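
Since the raw values are available here, the per-mouse statistics and the pooled within-mouse SD can be computed directly. The Python sketch below does only that arithmetic (the `mice` dictionary is simply the table retyped); it does not settle whether pooling is appropriate when one SD is so much smaller than the others.

```python
import math
import statistics

# Values per muscle for each mouse, copied from the table above
mice = {
    "Mouse 1": [26.42, 0.00, 19.23, 28.00],
    "Mouse 2": [37.88, 13.33, 20.00, 25.32],
    "Mouse 3": [15.48, 18.18, 14.29, 15.00],
}

for name, values in mice.items():
    sd = statistics.stdev(values)              # sample SD (n - 1 denominator)
    sem = sd / math.sqrt(len(values))
    print(name, round(statistics.mean(values), 2), round(sd, 2), round(sem, 2))

# Pooled within-mouse SD, weighting each variance by its degrees of freedom
df_total = sum(len(v) - 1 for v in mice.values())
pooled_var = sum((len(v) - 1) * statistics.variance(v) for v in mice.values()) / df_total
pooled_sd = math.sqrt(pooled_var)
print(round(pooled_sd, 2))
```

The per-mouse lines should come out close to the Mean, SD and SEM columns of the table, apart from small rounding differences.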