I'm trying to determine the standard deviation of multiple sample sets (measurements A, B, C taken on 3 different days), for which I know the means and standard deviations (but not the individual values). In a related thread I saw the advice to check out "pooled standard deviation" at http://en.wikipedia.org/wiki/Pooled_standard_deviation, but this doesn't seem to fit my case. The means of A, B, and C vary a bit (with a standard deviation of their means of, say, 0.1), while each individual standard deviation (sA, sB, sC) is pretty tight (say 0.01 - fyi, these standard deviations reflect errors in the measurement device). The wiki link gives a formula based only on sA, sB, and sC, which not surprisingly gives a low standard deviation for the whole population. I know this can't be right, since the means of A, B, and C have greater variability.
Any advice? Thanks!
The pooled variance is used in cases where you can assume that the variances do not differ statistically significantly. You say they differ. Have you tested your hypothesis using Levene's test for equality of variances?
Let me just ask the following for clarification of your problem. Are you suggesting that your scenario is this:
Let X={x1,x2,...,xN}, Y={y1,y2,...,yN}, Z={z1,z2,...,zN} denote 3 data sets with known means and standard deviations (not necessarily with equal sample sizes).
Let A be the union of these data sets, i.e.
A={x1,x2,...,xN,y1,y2,...,yN,z1,z2,...,zN}.
Now, are you asking what the mean and standard deviation of A are when you don't have the data but only know the means and standard deviations of X, Y, and Z? Is the scenario I describe correct?
Yes, you have it right. I'd also say I wouldn't want to weight X, Y, and Z by the # of measurements of each, as you have effectively done in "A", but since each has the same # of measurements, this is moot. Put another way, consider I have 3 people, with 3 measurements of each one's height, and I'm given the mean value and standard deviation for each person. How does one calculate the standard deviation for the 3 people?
Thank you!
Okay, here are the formulae you need. These will give you the (exact) mean and variance as if you actually had the data. What you need to do is merge the data sets one by one, applying the result of each merge to the subsequent data set.
mean=[n1 /(n1+n2)]*Xbar1 + [n2 /(n1+n2)]*Xbar2
variance=[ n1^2*Var1 + n2^2*Var2 - n1*Var1 - n1*Var2 - n2*Var1 - n2*Var2 + n1*n2*Var1 + n1*n2*Var2 + n1*n2*(Xbar1 - Xbar2)^2 ] / [ (n1+n2-1)*(n1+n2) ]
I’ll show an example for the means so you can get the idea on how to do this. This idea is the same for the variance (standard deviation).
Example: Suppose I have 3 data sets with:
Xbar1=5; Std.dev1=2; Var1=4; n1=10
Xbar2=15; Std.dev2=3; Var2=9; n2=15
Xbar3=8; Std.dev3=5; Var3=25; n3=20
Now to get the mean of the 3 data sets apply the first two sets of statistics
mean(1,2) = [10 /(10+15)]*5 + [15 /(10+15)]*15 =11
Now, use this result as follows:
mean(1,2,3) = [25 /(25+20)]*11 + [20 /(25+20)]*8 = 9.6667 (approximately).
Now just apply this idea using the formula for variance above.
Obviously, in the end just take the sqrt of the variance to get the standard deviation for the merged (3) sets of data.
BTW, this idea is completely general for k sets of data.
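The merging procedure described above can be sketched in Python as follows. This is only an illustration, not Dragan's own code: the function name `merge_stats` and the tuple layout are assumptions made here, and the variance line uses an algebraically equivalent form of the formula above (within-group sums of squares plus a between-means term), assuming the given variances are sample variances with n-1 denominators.

```python
import math

def merge_stats(n1, mean1, var1, n2, mean2, var2):
    """Combine (n, mean, sample variance) of two groups into those of their union.

    Hypothetical helper for illustration; assumes var1 and var2 use n-1 denominators.
    """
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    # Within-group sums of squares plus the term for the gap between the two means.
    ss = (n1 - 1) * var1 + (n2 - 1) * var2 + n1 * n2 * (mean1 - mean2) ** 2 / n
    return n, mean, ss / (n - 1)

# The three data sets from the example: (n, mean, variance).
groups = [(10, 5.0, 4.0), (15, 15.0, 9.0), (20, 8.0, 25.0)]

# Merge one by one, feeding each result into the next merge.
n, mean, var = groups[0]
for n_i, mean_i, var_i in groups[1:]:
    n, mean, var = merge_stats(n, mean, var, n_i, mean_i, var_i)

sd = math.sqrt(var)
print(n, mean, var, sd)
```

For the three example data sets this gives a merged mean of 435/45 (about 9.6667), matching the hand calculation, and a merged variance of 1337/44 (about 30.39), so the merged standard deviation is roughly 5.51.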
Last edited by Dragan; 02-09-2009 at 04:42 PM. Reason: correction
[QUOTE=Dragan;19689]Okay, here is the formulae you need.[/QUOTE]
Thanks for the formula. This weights each mean (and standard deviation) by the number of measurements of each, which is not exactly intuitive to me. For example, if I have 2 people, and A is 4' tall with 3 measurements, while B is 7' tall with 27 measurements, the mean height is:
mean(A,B) = [3/(27+3)]*4 + [27/(30)]*7 = 6.7 feet.
Very odd indeed, since I'd expect the mean to be at least near 5.5', but I'll take your word for it - perhaps an indication that one really should have equal numbers of measurements.
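To make the difference concrete, here is a small Python sketch comparing the measurement-weighted mean from the formula with the unweighted mean of the per-person means. The variable names are illustrative assumptions; the numbers are the ones from the example above.

```python
import statistics

# person: (mean height in feet, number of measurements)
heights = {"A": (4.0, 3), "B": (7.0, 27)}

# Weighted by number of measurements: treats all 30 measurements as one data set.
total_n = sum(n for _, n in heights.values())
weighted_mean = sum(m * n for m, n in heights.values()) / total_n

# Unweighted: each person counts once, however often they were measured.
person_means = [m for m, _ in heights.values()]
mean_of_means = statistics.mean(person_means)

# Spread between people, based only on the per-person means.
sd_between_people = statistics.stdev(person_means)

print(weighted_mean, mean_of_means, sd_between_people)
```

The weighted mean comes out near 6.7 and the mean of means is 5.5. Which one is appropriate depends on the question: the weighted formula describes the pooled set of measurements, while the mean of means (and the standard deviation of the means) describes the people.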
Last edited by shoes; 02-15-2009 at 06:21 PM. Reason: clarification
Could you please tell me what this "theory" is called? I want to do the same, but I would also like to study the theory first.
One more thing. Is it possible to have a more general equation that could be used for more parameters? I have something like 10 such populations, so applying your equation 9 times is a little bit time consuming.
Best Regards
Alex.
Suppose in your data set you have [...] total groups and there are sample sizes [...] for each group. Furthermore, suppose you already have the sample mean estimate [...] and the sample variance estimate [...] for each group, i.e. [...].
Then the pooled sample mean is [...] and the pooled sample variance is [...].
It would be the same if you got the data in the form of the sufficient statistics [...] in each group i.
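For reference, one k-group form that reproduces Dragan's pairwise merging in a single step is sketched below. The notation is assumed here for illustration and is not taken from any post in the thread: k groups, group sizes n_i, group means \bar{x}_i, group sample variances s_i^2 (with n_i - 1 denominators), and N for the total number of observations.

```latex
N = \sum_{i=1}^{k} n_i ,
\qquad
\bar{x} = \frac{1}{N} \sum_{i=1}^{k} n_i \, \bar{x}_i ,
\qquad
s^2 = \frac{1}{N - 1} \left[ \sum_{i=1}^{k} (n_i - 1) \, s_i^2
      + \sum_{i=1}^{k} n_i \left( \bar{x}_i - \bar{x} \right)^2 \right]
```

With Dragan's three example data sets this gives \bar{x} = 435/45 and s^2 = (637 + 700)/44 = 1337/44, the same result as merging the sets one at a time. The second sum is the between-group term; dropping it leaves the classical pooled variance with denominator N - k, which is only meaningful when the groups share a common mean.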
I would like to thank you for your reply. As masteras said before, pooled statistics can only be used for samples whose variances do not differ too much. (Actually, how do you know that the variances do not differ too much to use this technique?)
In my case the mean value is the same and only the variances change.
Here are some typical examples from my study:
1) N(119,3)
2) N(119,12)
3) N(119,8)
4) N(119,30)
I would like to thank you in advance for your help
Best Regards
Alex.
Could someone kindly respond to dervast's question above "Actually How do you know that the variances do not differ too much to use this technique?"
If the variances differ too much, what technique should we be using?
I am also interested to know.
Thanks!
No, I do not agree with shoes' comments here. Statistics is a branch of mathematics and always deals honestly with data. If we measure A 3 times, we have 3 pieces of data. Since we have 3 pieces of data to enter into the statistical process, I cannot accept that those 3 pieces of data carry only 1 unit of weight. I would think your example is the same as having 3 persons of size A and 27 persons of size B. So what Dragan said was reasonable in this scenario. His idea was not very odd.
Download or read my document at md.rmutk.ac.th/file.php/471/to-my-students.pdf or here if the site does not allow you.
Last edited by wjt; 03-11-2012 at 11:36 AM. Reason: Gramma correction
Thank you Dragan, does this method have a name? I have been trying to work out this problem for a few days now.
Thank you wjt for the PDF, I can now call this "joint standard deviation".
Last edited by splictionary; 04-26-2012 at 03:37 PM. Reason: Realization
Soooo happy I found this discussion: I've had exactly the same problem a few days ago and couldn't find a solution. The replies of Dragan and wjt are very helpful.
I would also be very interested in a general equation (and a name of this method) to calculate the variance (as shown by Dragan). As I understand, the equation presented by BGM isn't the same since variance between the mean values is not considered?!?
Cheers,
Jo
Hi.
I wonder if you guys could help. I'm a scientist and looking to present my research.
I'm measuring two linked values, substance production and cell number, and I present my research as a value of substance produced per cell. I have experimental data with 25 observations each for several different conditions, measuring the amount of substance produced and the number of cells per reaction (this varies depending on the condition, so it is not constant), each giving a mean and standard deviation. I then take the mean values from each set of observations to give the mean substance production per cell. However, I would also like to be able to present the standard deviation for the substance/cell value. I'm sure there must be an equation to let me combine the standard deviations of substance production and cell number to give an overall standard deviation, but I don't know what it is! Can anyone help?
Many thanks.
Hi bs0srj,
Just for clarification purposes: You grew n batches of cells each under different conditions with 25 observations for cell number and substance produced each. Subsequently, you took the mean and standard deviation for cell number and substance produced for each batch, and calculated the quotient to derive the mean substance produced per cell for each batch?!
If you want to calculate the standard deviation for this quotient you have to apply the rules of error propagation. For multiplication and division the rule is as follows:
If c = a * b, or c = \frac{a}{b}
then \frac{\sigma_{c}}{\left | c \right |} = \sqrt{\left( \frac{\sigma_{a}}{a}\right )^{2} + \left(\frac{\sigma_{b}}{b}\right )^{2}}
Also have a look here: http://en.wikipedia.org/wiki/Propagation_of_uncertainty
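As a rough illustration of the division rule, here is a minimal Python sketch. The function name and the input numbers are made up for this example and are not bs0srj's data. The rule is a first-order approximation that assumes the two quantities are uncorrelated; if substance production and cell number vary together, a covariance term would also be needed.

```python
import math

def ratio_with_sd(a, sd_a, b, sd_b):
    """Return c = a / b and its propagated standard deviation.

    Hypothetical helper: first-order error propagation, assuming a and b
    are uncorrelated and both are non-zero.
    """
    c = a / b
    relative_sd = math.sqrt((sd_a / a) ** 2 + (sd_b / b) ** 2)
    return c, abs(c) * relative_sd

# Made-up example: mean substance 200 (SD 20), mean cell count 50 (SD 5).
c, sd_c = ratio_with_sd(200.0, 20.0, 50.0, 5.0)
print(c, sd_c)
```

Here both inputs have a relative standard deviation of 10%, so the ratio of 4.0 gets a relative standard deviation of about 14.1%, i.e. roughly 0.57.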
Hope that helps!