# Calculating missing mean and std deviations of subset

#### jacobhoffman

##### New Member
Hi all, first-time poster, thanks to anyone who may be able to guide me.

I am primarily a clinician, but am busy writing a systematic review and meta-analysis of certain medication trials and was wondering if there might be a simple solution to the problem I am facing.

Essentially, I am trying to infer data from other published data. The paper in question presents means and standard deviations of outcome data for adults and children combined, as well as for adults only. Is there any way to infer the mean and standard deviations for children only?

For example, for adults and children combined, the mean (SD) = 11.5 (6.8) with n = 17.
For adults only, the mean (SD) = 12.5 (6.3) with n = 14.

Is there any way to calculate the mean (SD) for the remainder of n = 3 (i.e. the children by themselves)?

My intuition is that there is likely a fairly simple way to do this?
I am able to infer the missing mean through basic algebra, but the standard deviation I am not sure of.

Thank you for any input!

Kindly,
Jacob
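
For the example figures in this post, the "basic algebra" for the missing mean works out as below. The subscripts T, A, and C (total, adults, children) are labels added here for illustration only.

$$
\bar{y}_C = \frac{n_T\,\bar{y}_T - n_A\,\bar{y}_A}{n_T - n_A} = \frac{17 \times 11.5 - 14 \times 12.5}{3} = \frac{20.5}{3} \approx 6.83
$$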

#### fed2

##### Active Member
i think the easiest way is to apply the formula
(n - 1)*stdev^2 = sum( Y^2 ) - n*average^2 to the total and to the adults, where sum(Y^2) is the sum of squared values. Rearranged, that gives sum(Y^2) = (n - 1)*stdev^2 + n*average^2 for each group.

then sum of squares in children = total sum - adults sum

then compute the children's stdev using the above formula with the children's mean and the children's sum of squares.
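
The steps above can be sketched in R roughly as follows. This is only an illustration: the function name `subset_stats` and its argument names are made up here, and it assumes the reported SDs are sample SDs (n - 1 denominator) and that the known group is fully contained in the total.

```r
# Hypothetical helper (name and arguments are illustrative, not from the thread).
# Recovers the mean and SD of the "remainder" group from summary statistics
# of the combined group and one known subgroup.
subset_stats <- function(n_total, mean_total, sd_total,
                         n_known, mean_known, sd_known) {
  n_rest <- n_total - n_known
  stopifnot(n_rest >= 2)  # an SD needs at least two observations

  # Sum of squared values: sum(Y^2) = (n - 1)*sd^2 + n*mean^2
  ssy <- function(n, m, s) (n - 1) * s^2 + n * m^2

  mean_rest <- (n_total * mean_total - n_known * mean_known) / n_rest
  ssy_rest  <- ssy(n_total, mean_total, sd_total) -
               ssy(n_known, mean_known, sd_known)
  var_rest  <- (ssy_rest - n_rest * mean_rest^2) / (n_rest - 1)

  # Rounded published values can push the variance slightly below zero
  if (var_rest < 0) warning("negative variance: check inputs or their rounding")

  list(n = n_rest, mean = mean_rest, sd = sqrt(max(var_rest, 0)))
}

# Example figures from the first post: total n = 17, adults n = 14
res <- subset_stats(17, 11.5, 6.8, 14, 12.5, 6.3)
print(res)
```

With the example figures this should give a mean of about 6.83 and an SD of about 8.5 for the three children. Because the published means and SDs are rounded, the recovered values are approximate, and more so when the remainder group is small.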

#### jacobhoffman

##### New Member
Thanks fed2 for your reply. I am not sure if I am able to do this, as I don't have the individual participant data, so I am not sure how to calculate the sum of squares. Please see attached pdf with example. I have calculated and verified the mean for children, but trying to calculate for columns M and Y. This is all the data I have access to. If more data is required to calculate SD (ie individual participant data), then I can contact the original authors, I was just hoping there is a way I can calculate SD from what I have available.

Thanks for any further input.
Kind regards,
Jacob

#### Dason

They told you how to get that value. You don't need the raw data - you've got enough. It probably would have been a bit easier to start with showing how to get the mean though as that's a bit easier to follow.

#### jacobhoffman

##### New Member
Thank you, I had another crack at it and seem to be getting some plausible results instead of just errors!
Thanks a lot

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Side note, if the above numbers are real - I am not sure a mean of three patients has much utility or generalizability.

#### fed2

##### Active Member
it wasn't the most direct explanation but I did not want to rob you of the chance to suffer. All true art is born of suffering!

#### jacobhoffman

##### New Member
> Side note, if the above numbers are real - I am not sure a mean of three patients has much utility or generalizability.

Yes, I completely agree, but nevertheless we are reporting whatever data is out there and then commenting on quality / certainty of the evidence as well.

#### jacobhoffman

##### New Member
thank you, that would be most appreciated. Please see results in attached file that I got to.

Kind regards,
Jacob

#### fed2

##### Active Member
no, something is off. can you print formulas?

for 'MGH-HPS C' active arm i am showing 8.5 for child sd?

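R:

    # The children's SD ought to come out at about 8.5.

    # Sum of squared values, from (n - 1)*sd^2 = sum(Y^2) - n*mean^2
    SSY_total = (17 - 1)*6.8^2 + 17*11.5^2
    SSY_adult = (14 - 1)*6.3^2 + 14*12.5^2
    SSY_child = SSY_total - SSY_adult

    var_child = (SSY_child - 3*6.83333^2)/2
    print(sqrt(var_child))

    ## Back check by construction: build samples with exactly the given mean and sd
    genSample = function(n, mn, std) {
      y = rep(mn, n)
      t = sqrt((n - 1)*std^2/2)
      e = c(rep(0, n - 2), -t, t)
      y + e
    }

    childs = genSample(3, 6.8333333, 8.5)
    adults = genSample(14, 12.5, 6.3)
    total  = c(adults, childs)

    # all.equal() rather than ==, since floating-point results rarely match exactly
    print(all.equal(sd(childs), 8.5))
    print(all.equal(mean(childs), 6.8333333))

    print(sd(total))    # should be close to 6.8
    print(mean(total))  # should be close to 11.5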

#### jacobhoffman

##### New Member

Here are the formulae I used for child mean and child SD respectively

1) child mean =((total mean * total N)-(adult mean * adult N))/child N

2) SD = SQRT(((mean * mean * N)/(N-1))-(mean^2))

Thanks again for your help with this

#### fed2

##### Active Member
um, well, it's still wrong.

start by creating rows for 'sum of squares (SSY)', one each for adult, total, and children.

For adult and total, set SSY = ( (n - 1)*SD^2 + n*mean^2 )

for children, set SSY = SSY_total - SSY_adult
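
Applied to the example numbers from the first post (total: n = 17, mean 11.5, SD 6.8; adults: n = 14, mean 12.5, SD 6.3), these rows work out roughly as follows. The inputs are rounded published values, so the results are approximate.

$$
\begin{aligned}
SSY_{total} &= 16 \times 6.8^2 + 17 \times 11.5^2 = 739.84 + 2248.25 = 2988.09 \\
SSY_{adult} &= 13 \times 6.3^2 + 14 \times 12.5^2 = 515.97 + 2187.50 = 2703.47 \\
SSY_{child} &= 2988.09 - 2703.47 = 284.62
\end{aligned}
$$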

#### jacobhoffman

##### New Member
thanks. any closer now?

#### fed2

##### Active Member

For the child's sd i am getting 8.5 where you have 9.779 (active arm MGH-HPS).

Formula for child's sd = sqrt [ ( SSY_child - n*(average^2) )/(n - 1) ];

n refers to the number of children.

then you got it probably.
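
As a worked check with the example numbers from the first post: the children's mean is 20.5/3 ≈ 6.8333 with n = 3, and SSY_child comes to about 284.62 (total SSY of 2988.09 minus adult SSY of 2703.47). Plugging these into the formula gives approximately:

$$
sd_{child} = \sqrt{\frac{284.62 - 3 \times 6.8333^2}{3 - 1}} \approx \sqrt{\frac{284.62 - 140.08}{2}} \approx \sqrt{72.27} \approx 8.50
$$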

#### jacobhoffman

##### New Member
Think I finally got there. Thanks! (I was using the wrong formula from higher up in the page in the original link you sent)

#### fed2

##### Active Member
Looks right. i am awarding you this 'statistical achievement award class 1', the highest award given to non-statisticians for courage in the face of numerical trials. Well done!