Calculating missing mean and std deviations of subset

Hi all, first-time poster, thanks to anyone who may be able to guide me.

I am primarily a clinician, but am busy writing a systematic review and meta-analysis of certain medication trials and was wondering if there might be a simple solution to the problem I am facing.

Essentially, I am trying to infer data from other published data. The paper in question presents means and standard deviations of outcome data for adults and children combined, as well as for adults only. Is there any way to infer the mean and standard deviations for children only?

For example, for adults and children combined the mean (SD) = 11.5 (6.8) with n = 17.
For adults only the mean (SD) = 12.5 (6.3) with n = 14.

Is there any way to calculate the mean (SD) for the remaining n = 3 (i.e. the children by themselves)?

My intuition is that there is likely a fairly simple way to do this?
I am able to infer the missing mean through basic algebra, but the standard deviation I am not sure of.

Thank you for any input!



Active Member
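I think the easiest way is to apply the formula
(n - 1)*stdev^2 = sum(Y^2) - n*average^2 to the total and to the adults, where sum(Y^2) is the sum of squared values. Solving each for sum(Y^2) needs only n, the mean and the SD, not the raw data.

Then the sum of squares in the children = total sum of squares - adults' sum of squares.

Then compute the children's stdev using the above formula with the children's mean and the children's sum of squares.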
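
For reference, a short sketch of where that formula comes from, writing y_i for the (unavailable) individual values, \bar{y} for the group mean and s for the SD. It assumes the reported SD is the usual sample SD with an n - 1 denominator, which the paper would need to confirm.

```latex
\begin{align*}
(n-1)\,s^2 &= \sum_{i=1}^{n} (y_i - \bar{y})^2 \\
           &= \sum_{i=1}^{n} y_i^2 - 2\bar{y}\sum_{i=1}^{n} y_i + n\bar{y}^2 \\
           &= \sum_{i=1}^{n} y_i^2 - n\bar{y}^2
           \qquad \text{since } \sum_{i=1}^{n} y_i = n\bar{y}.
\end{align*}
```

Rearranged, sum(Y^2) = (n - 1)*stdev^2 + n*average^2, so each group's sum of squared values can be recovered from its n, mean and SD.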
Thanks fed2 for your reply. I am not sure I am able to do this: I don't have the individual participant data, so I am not sure how to calculate the sum of squares. Please see the attached pdf with an example. I have calculated and verified the mean for children, but am still trying to calculate for columns M and Y. This is all the data I have access to. If more data is required to calculate the SD (i.e. individual participant data), then I can contact the original authors; I was just hoping there is a way to calculate the SD from what I have available.

Thanks for any further input.
Kind regards,



Ambassador to the humans
They told you how to get that value. You don't need the raw data - you've got enough. It probably would have been a bit easier to start with showing how to get the mean though as that's a bit easier to follow.
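
As a minimal sketch of the mean step in R, using the numbers from the first post: a group's sum is n times its mean, so the children's sum is the combined sum minus the adults' sum. The variable names are only for illustration.

```r
# Summary statistics from the first post
n_total <- 17; mean_total <- 11.5
n_adult <- 14; mean_adult <- 12.5

# The sum of the values in each group is n * mean
sum_total <- n_total * mean_total
sum_adult <- n_adult * mean_adult

# The children are whatever is left over
n_child    <- n_total - n_adult
mean_child <- (sum_total - sum_adult) / n_child
print(mean_child)  # about 6.83
```

The SD works the same way, except that the quantity being subtracted is the sum of squared values rather than the plain sum.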


Less is more. Stay pure. Stay poor.
Side note, if the above numbers are real - I am not sure a mean of three patients has much utility or generalizability.


Active Member
No, something is off. Can you print the formulas?

For the 'MGH-HPS C' active arm I am getting 8.5 for the child SD.

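# The children ought to come out at about sd = 8.5.

# Sum of squared values, from (n - 1)*sd^2 = sum(Y^2) - n*mean^2
SSY_total = (17 - 1)*6.8^2 + 17*11.5^2
SSY_adult = (14 - 1)*6.3^2 + 14*12.5^2
SSY_child = SSY_total - SSY_adult

mean_child = (17*11.5 - 14*12.5)/3   # 6.8333...
var_child  = (SSY_child - 3*mean_child^2)/2
sd_child   = sqrt(var_child)

print(sd_child)

## Back check by construction:
## build a sample with exactly the requested n, mean and sd
genSample = function(n, mn, std){
    y = rep(mn, n)
    t = sqrt((n - 1)*std^2/2)
    e = c(rep(0, n - 2), -t, t)
    y + e
}

# all.equal() compares with a tolerance; == is unreliable for floating point
adults = genSample(14, 12.5, 6.3)
print(isTRUE(all.equal(sd(adults), 6.3)))
print(isTRUE(all.equal(mean(adults), 12.5)))

childs = genSample(3, mean_child, sd_child)
print(isTRUE(all.equal(sd(childs), sd_child)))
print(isTRUE(all.equal(mean(childs), mean_child)))

total = c(adults, childs)
print(sd(total))     # should reproduce the reported 6.8
print(mean(total))   # should reproduce the reported 11.5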
Thanks for your reply. Apologies, I had copied and pasted whilst transposing rows to columns in excel, and forgot to paste values instead of formulae. Please see adjusted results in attached file.

Here are the formulae I used for child mean and child SD respectively

1) child mean =((total mean * total N)-(adult mean * adult N))/child N

2) SD = SQRT(((mean * mean * N)/(N-1))-(mean^2))

Thanks again for your help with this



Active Member
Um, well, it's still wrong.

Start by creating rows for 'sum of squares (SSY)', one each for adult, total, and children.

For adult and total, set SSY = ( (n - 1)*SD^2 + n*mean^2 )

For children, set SSY = SSY_total - SSY_adult


Active Member
Your SSY rows look right.

For the child's SD I am getting 8.5 where you have 9.779 (active arm MGH-HPS).

Formula for child's SD = sqrt[ ( SSY_child - n*(average^2) )/(n - 1) ]

Here n is the number of children and average is the children's mean.

Then you've probably got it.
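
Putting the whole recipe in one place, here is a rough R sketch. The function name remainder_stats and its arguments are made up for illustration, and it assumes the published SDs are sample SDs (n - 1 denominator). Because published means and SDs are rounded, the result is approximate, and with coarse rounding the recovered variance can even come out negative, so the sketch checks for that.

```r
# Hypothetical helper: n, mean and SD of the remainder (e.g. children),
# given summary statistics for the whole group and for one subgroup.
# Assumes the reported SDs are sample SDs (n - 1 denominator).
remainder_stats <- function(n_all, mean_all, sd_all, n_sub, mean_sub, sd_sub) {
  n_rest <- n_all - n_sub
  stopifnot(n_rest >= 2)  # an SD needs at least two values

  # Sum of squared values: SSY = (n - 1)*SD^2 + n*mean^2
  ssy_all  <- (n_all - 1) * sd_all^2 + n_all * mean_all^2
  ssy_sub  <- (n_sub - 1) * sd_sub^2 + n_sub * mean_sub^2
  ssy_rest <- ssy_all - ssy_sub

  mean_rest <- (n_all * mean_all - n_sub * mean_sub) / n_rest
  var_rest  <- (ssy_rest - n_rest * mean_rest^2) / (n_rest - 1)

  # Rounded inputs can push the variance slightly below zero
  if (var_rest < 0) {
    warning("negative variance; the inputs may be rounded too coarsely")
    var_rest <- NA_real_
  }
  c(n = n_rest, mean = mean_rest, sd = sqrt(var_rest))
}

# Numbers from the first post: combined group, then adults only
res <- remainder_stats(17, 11.5, 6.8, 14, 12.5, 6.3)
print(round(res, 2))  # mean about 6.83, sd about 8.5
```

In a spreadsheet the same steps map onto the SSY rows described above, with one column per arm.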


Active Member
Looks right. i am awarding you this 'statistical achievement award class 1', the highest award given to non-statisticians for courage in the face of numerical trials. Well done!