
Thread: standard deviation of multiple sample sets

  1. #1
    I'm trying to determine the standard deviation of multiple sample sets (measurements A, B, C taken on 3 different days), for which I know the means and standard deviations (but not the individual values). In a related thread I saw the advice to check out "pooled standard deviation" at http://en.wikipedia.org/wiki/Pooled_standard_deviation, but this doesn't seem to fit my case. The means of A, B, and C vary a bit (with a standard deviation of their means of, say, 0.1), while each individual standard deviation (sA, sB, sC) is pretty tight (say 0.01 - fyi, these standard deviations reflect errors in the measurement device). The wiki link gives a formula based only on sA, sB, and sC, which not surprisingly gives a low standard deviation for the whole population. I know this can't be right, since the means of A, B, and C have greater variability.
    Any advice? Thanks!

  2. #2
    The pooled variance is used in cases where you can assume that the variances do not differ significantly. You say they differ. Have you tested your hypothesis using Levene's test for equality of variances?

  3. #3
    Dragan
    Quote Originally Posted by shoes View Post
    I'm trying to determine the standard deviation of multiple sample sets .... Thanks!

    Let me just ask the following for clarification of your problem. Are you suggesting that your scenario is this:

    Let X={x1,x2,…,xN1}, Y={y1,y2,…,yN2}, Z={z1,z2,…,zN3} denote 3 data sets with known means and standard deviations (not necessarily with equal sample sizes).

    Let A be the union of these data sets, i.e.
    A={x1,x2,…,xN1,y1,y2,…,yN2,z1,z2,…,zN3}.

    Now, are you asking what the mean and standard deviation of A are when you don't have the data but know only the means and standard deviations of X, Y, and Z? Is the scenario I describe correct?

  4. #4
    Yes, you have it right. I'd also say I wouldn't want to weight X, Y, and Z by the # of measurements of each, as you have effectively done in "A", but since each has the same # of measurements, this is moot. Put another way, consider I have 3 people, with 3 measurements of each one's height, and I'm given the mean value and standard deviation for each person. How does one calculate the standard deviation for the 3 people?
    Thank you!

  5. #5
    Dragan
    Quote Originally Posted by shoes View Post

    Yes, you have it right.
    Thank you!
    Okay, here are the formulae you need. They give you the (exact) mean and variance as if you actually had the data. Merge the data sets one at a time: combine the first two, then combine that result with the next data set, and so on.

    mean=[n1 /(n1+n2)]*Xbar1 + [n2 /(n1+n2)]*Xbar2

    variance=[ n1^2*Var1 + n2^2*Var2 - n1*Var1 - n1*Var2 - n2*Var1 - n2*Var2
    + n1*n2*Var1 + n1*n2*Var2 + n1*n2*(Xbar1 - Xbar2)^2 ] / [ (n1+n2-1)*(n1+n2) ]

    I’ll show an example for the means so you can get the idea on how to do this. This idea is the same for the variance (standard deviation).

    Example: Suppose I have 3 data sets with:

    Xbar1=5; Std.dev1=2; Var1=4; n1=10
    Xbar2=15; Std.dev2=3; Var2=9; n2=15
    Xbar3=8; Std.dev3=5; Var3=25; n3=20

    Now, to get the mean of the 3 data sets, first apply the formula to the first two sets of statistics:

    mean(1,2) = [10/(10+15)]*5 + [15/(10+15)]*15 = 11

    Now, use this result as follows:

    mean(1,2,3) = [25/(25+20)]*11 + [20/(25+20)]*8 = 9.6667 (rounded).

    Now just apply this idea using the formula for variance above.

    Obviously, in the end just take the sqrt of the variance to get the standard deviation for the merged (3) sets of data.

    BTW, this idea is completely general for k sets of data.
    Last edited by Dragan; 02-09-2009 at 04:42 PM. Reason: correction
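
The merge-one-set-at-a-time procedure in the post above can be sketched in a few lines of Python. This is only an illustration, not code from the thread: the function name `merge` and the `(n, mean, variance)` tuple layout are assumptions made here, and the variance line uses a shorter expression that is algebraically the same as the long formula in the post.

```python
from functools import reduce
from math import sqrt

def merge(a, b):
    """Merge two (n, mean, variance) summaries into one.

    Variances are sample variances (n - 1 denominator), as in the post.
    """
    n1, m1, v1 = a
    n2, m2, v2 = b
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    # Within-set sums of squares plus a term for the gap between the two means.
    ss = (n1 - 1) * v1 + (n2 - 1) * v2 + n1 * n2 / n * (m1 - m2) ** 2
    return n, mean, ss / (n - 1)

# (n, mean, variance) for the three example sets from the post
sets = [(10, 5.0, 4.0), (15, 15.0, 9.0), (20, 8.0, 25.0)]

# Fold the sets together one at a time: ((1, 2), 3)
n, mean, var = reduce(merge, sets)
print(n, round(mean, 4), round(var, 4), round(sqrt(var), 4))
```

For the three example sets this gives a merged mean of about 9.667, a merged variance of 1337/44 (about 30.39), and a standard deviation of about 5.51.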

  7. #6
    Quote Originally Posted by Dragan View Post
    Okay, here are the formulae you need.

    Thanks for the formula. This weights each mean (and standard deviation) by the number of measurements of each, which is not exactly intuitive to me. For example, if I have 2 people, and A is 4' tall with 3 measurements, while B is 7' tall with 27 measurements, the mean height is:

    mean(A,B) = [3/(27+3)]*4 + [27/(30)]*7 = 6.7 feet.

    Very odd indeed, since I'd expect the mean to be at least near 5.5', but I'll take your word for it - perhaps an indication that one really should have equal numbers of measurements.
    Last edited by shoes; 02-15-2009 at 06:21 PM. Reason: clarification

  8. #7
    Could you please tell me what this "theory" is called? I want to do the same, but I would like to study the theory first.

    One more thing: is there a more general equation that could be used for more data sets? I have something like 10 such populations, so applying your equation 9 times is a little bit time consuming.

    Best Regards
    Alex.


    Quote Originally Posted by Dragan View Post
    Okay, here are the formulae you need. This will give you the (exact) mean and variance as if you actually had the data. ....

  9. #8
    Suppose your data set has r groups in total, with sample size n_i for group i.

    Furthermore, suppose you already have the sample mean estimate
    \bar{X}_i = \frac{\sum_{j=1}^{n_i} X_{ij}}{n_i}
    and the sample variance estimate
    \hat{\sigma}_i^2 = \frac{\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2}{n_i - 1}
    for each group, i.e. i = 1, 2, ..., r.

    Then the pooled sample mean is
    \frac{\sum_{i=1}^r \sum_{j=1}^{n_i} X_{ij}}{\sum_{i=1}^r n_i} = \frac{\sum_{i=1}^r n_i \bar{X}_i}{\sum_{i=1}^r n_i}
    and the pooled sample variance is
    \frac{\sum_{i=1}^r \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2}{\sum_{i=1}^r (n_i - 1)} = \frac{\sum_{i=1}^r (n_i - 1) \hat{\sigma}_i^2}{\sum_{i=1}^r (n_i - 1)}

    It would be the same if you got the data in the form of the sufficient statistics
    \sum_{j=1}^{n_i} X_{ij}, \sum_{j=1}^{n_i} X_{ij}^2 in each group i.
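
The pooled variance above measures spread around each group's own mean, so it is not the same quantity as the variance of all observations lumped together. A small Python sketch of the difference follows. The function names and the numbers are hypothetical, chosen here to resemble the first post (tight spread within each day, means that drift between days); they are not data from the thread.

```python
def pooled_variance(groups):
    """Pooled (within-group) variance from (n, mean, variance) summaries."""
    num = sum((n - 1) * v for n, _, v in groups)
    den = sum(n - 1 for n, _, _ in groups)
    return num / den

def combined_variance(groups):
    """Variance of all observations lumped together, from the same summaries."""
    total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total
    within = sum((n - 1) * v for n, _, v in groups)
    between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    return (within + between) / (total - 1)

# Hypothetical: three days, 10 readings each, SD 0.01 within a day,
# daily means of 1.0, 1.1 and 1.2.
days = [(10, 1.0, 0.01 ** 2), (10, 1.1, 0.01 ** 2), (10, 1.2, 0.01 ** 2)]

print(pooled_variance(days) ** 0.5)    # stays at the within-day level
print(combined_variance(days) ** 0.5)  # also picks up the day-to-day drift
```

With these made-up numbers the pooled standard deviation stays at 0.01, while the combined one is roughly eight times larger because it includes the spread of the daily means.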

  10. #9
    I would like to thank you for your reply. As masteras said before, pooled statistics can only be used for samples whose variances do not differ too much (Actually How do you know that the variances do not differ too much to use this technique?)

    In my case the mean value is the same and only the variances change. Here are some typical examples from my study:
    1) N(119,3)
    2) N(119,12)
    3) N(119,8)
    4) N(119,30)

    I would like to thank you in advance for your help

    Best Regards
    Alex.

    Quote Originally Posted by BGM View Post
    Suppose in your data set you have total r groups and there are sample size n_i for each group ....

  11. #10
    Could someone kindly respond to dervast's question above "Actually How do you know that the variances do not differ too much to use this technique?"

    If the variances differ too much, what technique should we be using?

    I am also interested to know.

    Thanks!

  12. #11
    Quote Originally Posted by shoes View Post
    For example, if I have 2 people, and A is 4' tall with 3 measurements, while B is 7' tall with 27 measurements, the mean height is:

    mean(A,B) = [3/(27+3)]*4 + [27/(30)]*7 = 6.7 feet.

    Very odd indeed
    No, I do not agree with shoes' comments here. Statistics is a branch of mathematics and always deals honestly with data. If we measure A 3 times, we have 3 pieces of data. Since those 3 pieces of data all enter the statistical process, I cannot accept that together they carry only 1 unit of weight. I think your example is the same as having 3 persons of size A and 27 persons of size B. So what Dragan said is reasonable in this scenario; his idea is not odd at all.
    Download or read my document at md.rmutk.ac.th/file.php/471/to-my-students.pdf.
    Last edited by wjt; 03-11-2012 at 11:36 AM. Reason: Gramma correction
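
The disagreement above comes down to two different averages, which a short Python sketch can make concrete. The raw readings are hypothetical, constructed only to match the example (3 readings of 4 feet and 27 readings of 7 feet); the variable names are made up for illustration.

```python
from statistics import mean

# Hypothetical raw readings matching the example: 3 of person A, 27 of person B.
a = [4.0] * 3
b = [7.0] * 27

# Mean of every reading lumped together; the weighted formula reproduces this.
mean_of_readings = mean(a + b)

# Mean of the two per-person means; each person counts once.
mean_of_people = mean([mean(a), mean(b)])

print(mean_of_readings, mean_of_people)
```

The first number is the 6.7 feet from the weighted formula, the second is the 5.5 feet that shoes expected. Which one is appropriate depends on whether the question is about the pool of measurements or about the people.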

  14. #12
    Thank you Dragan. Does this method have a name? I have been trying to work out this problem for a few days now.

    Thank you wjt for the PDF; I can now call this "joint standard deviation".
    Last edited by splictionary; 04-26-2012 at 03:37 PM. Reason: Realization

  15. #13
    Soooo happy I found this discussion: I've had exactly the same problem a few days ago and couldn't find a solution. The replies of Dragan and wjt are very helpful.

    I would also be very interested in a general equation (and a name of this method) to calculate the variance (as shown by Dragan). As I understand, the equation presented by BGM isn't the same since variance between the mean values is not considered?!?

    Cheers,
    Jo
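
One way to write the general version for r groups is sketched below. It is not taken from any post in the thread; it is what repeated application of Dragan's two-set formula works out to, using BGM's notation. The symbols N (total sample size), \bar{X} (overall mean) and s^2 (variance of the merged data) are introduced here for convenience.

```latex
N = \sum_{i=1}^{r} n_i, \qquad
\bar{X} = \frac{1}{N} \sum_{i=1}^{r} n_i \bar{X}_i, \qquad
s^2 = \frac{\sum_{i=1}^{r} (n_i - 1)\,\hat{\sigma}_i^2
          + \sum_{i=1}^{r} n_i \,(\bar{X}_i - \bar{X})^2}{N - 1}
```

The first sum in the numerator is the numerator of BGM's pooled variance (spread within the groups). The second sum is the spread of the group means around the overall mean, which is the part the pooled variance leaves out. This is the same within/between split of the sum of squares that one-way ANOVA uses. For Dragan's three example sets the two sums are 637 and 700, giving s^2 = 1337/44.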

  16. #14
    Hi.
    I wonder if you guys could help. I'm a scientist and looking to present my research.
    I'm measuring two linked values, substance production and cell number, and I present my results as the amount of substance produced per cell. I have experimental data with 25 observations for each of several different conditions, measuring the amount of substance produced and the number of cells per reaction (the cell number varies with the condition, so it is not constant). Each set of observations gives a mean and a standard deviation, and I take the mean values from each set to give the mean substance production per cell. However, I would also like to present the standard deviation of the substance-per-cell value. I'm sure there must be an equation that lets me combine the standard deviations of substance production and cell number into an overall standard deviation, but I don't know what it is! Can anyone help?
    Many thanks.

  17. #15
    Hi bs0srj,

    Just for clarification purposes: You grew n batches of cells each under different conditions with 25 observations for cell number and substance produced each. Subsequently, you took the mean and standard deviation for cell number and substance produced for each batch, and calculated the quotient to derive the mean substance produced per cell for each batch?!

    If you want to calculate the standard deviation for this quotient you have to apply the rules of error propagation. For multiplication and division the rule is as follows:

    If c = a * b, or c = \frac{a}{b}

    then \frac{\sigma_{c}}{\left | c \right |} = \sqrt{\left( \frac{\sigma_{a}}{a}\right )^{2} + \left(\frac{\sigma_{b}}{b}\right )^{2}}

    Also have a look here: http://en.wikipedia.org/wiki/Propagation_of_uncertainty

    Hope that helps!
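
The quotient rule above can be sketched as a small Python helper. The function name and the input numbers are hypothetical, not values from the thread. The rule is a first-order approximation and assumes a and b are uncorrelated; if substance production and cell number vary together across reactions, a covariance term would be needed as well (the linked Wikipedia page covers that case).

```python
from math import sqrt

def ratio_with_sd(a, sd_a, b, sd_b):
    """Return c = a / b and its approximate standard deviation.

    First-order error propagation; assumes a and b are uncorrelated.
    """
    c = a / b
    rel = sqrt((sd_a / a) ** 2 + (sd_b / b) ** 2)  # relative SD of c
    return c, abs(c) * rel

# Hypothetical means and SDs for one condition:
# substance produced (a) and cell number (b).
c, sd_c = ratio_with_sd(a=200.0, sd_a=20.0, b=1000.0, sd_b=50.0)
print(c, sd_c)
```

Here the relative standard deviations are 10% and 5%, so the quotient ends up with a relative standard deviation of about 11%.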
