Statistically significant difference between distributions when knowing only the mean and std dev?

I have 3 specimens (A, B & C), with 5 sets of observations each from A and B but only 4 from C.
Each set of observations has been reported as a mean and a standard deviation. I do not have access to the underlying data.

A (mean, std dev):
0.230, 0.104
0.245, 0.117
0.223, 0.098
0.246, 0.131
0.240, 0.135

B (mean, std dev):
0.225, 0.121
0.218, 0.111
0.224, 0.125
0.231, 0.129
0.204, 0.092

C (mean, std dev):
0.245, 0.14
0.225, 0.09
0.249, 0.104
0.229, 0.099

I am curious whether the specimens are statistically different from each other as far as this measurement is concerned.
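If the number of raw measurements behind each reported mean and standard deviation were known, two sets could be compared directly from their summary statistics. A minimal sketch using SciPy's `ttest_ind_from_stats`, comparing the first set from A with the first set from B. The count `n = 20` per set is a hypothetical value chosen only for illustration (the post does not give it), and the sketch assumes the reported standard deviations are sample standard deviations (ddof=1).

```python
import scipy.stats as stats

# Hypothetical number of raw measurements behind each reported mean/std dev.
# Not given in the post; 20 is an illustrative assumption only.
n = 20

# First set of observations from A and from B: (mean, std dev)
mean_a, sd_a = 0.230, 0.104
mean_b, sd_b = 0.225, 0.121

# Welch's t-test computed from summary statistics alone (no raw data needed)
result = stats.ttest_ind_from_stats(mean_a, sd_a, n, mean_b, sd_b, n, equal_var=False)
print(f"statistic = {result.statistic:.3f}, p value = {result.pvalue:.3g}")
```

The result is only as trustworthy as the assumed `n`; with the real counts, the same call could be repeated for any pair of sets.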

My initial attempt used only the means: I ran pairwise t-tests and then a one-way ANOVA on all 3 specimens, using Python and SciPy's stats module.

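import numpy as np
import scipy.stats as stats

# Means of each set of observations, one array per specimen
A = np.array([0.230, 0.245, 0.223, 0.246, 0.240])
B = np.array([0.225, 0.218, 0.224, 0.231, 0.204])
C = np.array([0.245, 0.225, 0.249, 0.229])

# Pairwise two-sample t-tests (Student's t-test; equal variances assumed by default)
AB_ttest = stats.ttest_ind(A, B)
AC_ttest = stats.ttest_ind(A, C)
BC_ttest = stats.ttest_ind(B, C)

# One-way ANOVA across all three specimens
ABC_anova = stats.f_oneway(A, B, C)

pairs = [("A, B", AB_ttest), ("A, C", AC_ttest), ("B, C", BC_ttest)]
for label, res in pairs:
    print(f"{label} T test: statistic = {res.statistic:.3f}, p value = {res.pvalue:.3g}")
print()
print(f"ANOVA: statistic = {ABC_anova.statistic:.3f}, p value = {ABC_anova.pvalue:.3g}")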
Here are the results (statistics rounded to 3 decimal places, p values to 3 significant figures):

A, B T test: statistic = 2.561, p value = 0.0336
A, C T test: statistic = -0.028, p value = 0.978
B, C T test: statistic = -2.263, p value = 0.058

ANOVA: statistic = 3.889, p value = 0.0528
OK, so here's where I realize I don't know what's going on.

Given that p < 0.05 implies a statistically significant difference (ssd), the t-test for A & B suggests such a difference, and yet the ANOVA suggests that there is no ssd among the three? Or am I reading this all wrong?

Also, I'm not convinced that my approach is an acceptable statistical analysis of this kind of data. If not, how should I be approaching this?


Hi DrBwts,

What do you mean by the average of a set? For example, in group A, are 0.230 and 0.245 each the average of a separate set of measurements?

You need to read about multiple comparisons.
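If you run 3 tests, each at a significance level of 0.05, the chance of at least one false positive (the familywise error rate) is about 1 - 0.95^3 = 0.14 if the tests are independent, so you need to use a smaller per-test significance level (a Bonferroni correction or another correction).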
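The arithmetic above, applied to the three t-test p values from the original post, can be sketched in plain Python. This uses a simple Bonferroni adjustment, which is only one of several possible corrections; the 1 - 0.95^3 figure assumes independent tests, which pairwise comparisons sharing groups are not exactly.

```python
# p values from the three pairwise t-tests reported in the question
p_values = {"A-B": 0.0336, "A-C": 0.978, "B-C": 0.058}

alpha = 0.05
m = len(p_values)  # number of comparisons

# Probability of at least one false positive across m independent tests at level alpha
familywise_error = 1 - (1 - alpha) ** m

# Bonferroni: test each comparison at alpha / m
# (equivalently, multiply each p value by m, capped at 1)
bonferroni_alpha = alpha / m
adjusted = {pair: min(p * m, 1.0) for pair, p in p_values.items()}

print(f"familywise error rate: {familywise_error:.3f}")
print(f"Bonferroni per-test level: {bonferroni_alpha:.4f}")
for pair, p_adj in adjusted.items():
    verdict = "significant" if p_adj < alpha else "not significant"
    print(f"{pair}: adjusted p = {p_adj:.3g} ({verdict})")
```

With these numbers the A-B p value of 0.0336 is above the corrected per-test level of 0.05 / 3 (about 0.0167), so it would no longer count as significant.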

If you assume equal means (i.e. your null hypothesis is that all means are equal), the recommended method is to run the ANOVA first and, only if it finds a significant difference, to run Tukey's HSD (instead of Bonferroni-corrected t-tests) to find which groups differ, provided the data meet the assumptions of the test.

Also, I don't see any practical difference between p = 0.049 and p = 0.0528.
The power of the ANOVA is low: about 0.104 for a medium effect size with this small sample size.
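The power figure can be reproduced approximately with SciPy's noncentral F distribution. This sketch assumes that "medium effect size" means Cohen's f = 0.25 and uses the usual noncentrality parameter for one-way ANOVA, λ = f²·N, with N = 14 set means split across 3 groups.

```python
import scipy.stats as stats

k = 3            # number of groups (A, B, C)
n_total = 14     # 5 + 5 + 4 set means
effect_f = 0.25  # Cohen's f for a "medium" effect (assumed convention)
alpha = 0.05

df_between = k - 1
df_within = n_total - k

# Critical F value under the null hypothesis of equal means
f_crit = stats.f.ppf(1 - alpha, df_between, df_within)

# Under the alternative, the F statistic follows a noncentral F
# with noncentrality f^2 * N
noncentrality = effect_f ** 2 * n_total
power = stats.ncf.sf(f_crit, df_between, df_within, noncentrality)

print(f"power = {power:.3f}")
```

The probability of detecting a true medium-sized difference with this design is only around one in ten.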

If you don't assume equal means, you can go directly to Tukey's HSD.

And yes, it is possible for the ANOVA to find no significant difference while Tukey's HSD finds one.

You may see that the Tukey results differ from those of the t-tests you used.
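To compare with the pairwise t-tests, here is a sketch running Tukey's HSD on the same set means with `scipy.stats.tukey_hsd` (available in SciPy 1.8 and later; it handles unequal group sizes such as the 4 sets from C). It uses the pooled within-group variance from all three specimens and adjusts for the three comparisons, so its p values need not match the separate t-tests.

```python
import numpy as np
import scipy.stats as stats

# Set means per specimen, as in the original post
A = np.array([0.230, 0.245, 0.223, 0.246, 0.240])
B = np.array([0.225, 0.218, 0.224, 0.231, 0.204])
C = np.array([0.245, 0.225, 0.249, 0.229])

# Tukey's HSD across all three groups (requires SciPy >= 1.8)
res = stats.tukey_hsd(A, B, C)

# res.statistic[i, j] is mean_i - mean_j; res.pvalue[i, j] is the adjusted p value
names = ["A", "B", "C"]
for i in range(3):
    for j in range(i + 1, 3):
        print(f"{names[i]} vs {names[j]}: diff = {res.statistic[i, j]:.4f}, "
              f"p value = {res.pvalue[i, j]:.3g}")
```

Calling `print(res)` also prints a summary table of all comparisons with confidence intervals.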

Did you calculate the required sample size before the experiment?
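A minimal sketch of an a-priori sample-size calculation for a one-way ANOVA with 3 groups, assuming a balanced design, Cohen's f = 0.25 (medium effect), α = 0.05 and a target power of 0.80. These are common conventions, not values from the thread. `anova_power` is a helper defined here for illustration; it computes power from the noncentral F distribution, and the loop simply increases the group size until the target is reached.

```python
import scipy.stats as stats


def anova_power(n_per_group, k=3, effect_f=0.25, alpha=0.05):
    """Power of a balanced one-way ANOVA with k groups of n_per_group each."""
    n_total = k * n_per_group
    df_between, df_within = k - 1, n_total - k
    f_crit = stats.f.ppf(1 - alpha, df_between, df_within)
    # Noncentrality parameter f^2 * N, as in the usual one-way ANOVA power calculation
    return stats.ncf.sf(f_crit, df_between, df_within, effect_f ** 2 * n_total)


# Smallest group size reaching 80% power (target chosen here for illustration)
target = 0.80
n = 2
while anova_power(n) < target:
    n += 1

print(f"{n} per group ({3 * n} in total) gives power = {anova_power(n):.3f}")
```

This is far more than the 4 or 5 sets per specimen available here.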