ANOVA vs. t-test: completely different results

If I compare three treatment groups with ANOVA (and Bonferroni post-tests), I find no significant differences between the groups (p > 0.05). However, if I compare the treatment groups two at a time with a t-test (or a nonparametric test), I get significant, even highly significant, differences between two of the groups. How can this discrepancy be explained, and how should I deal with it?
First figure out whether the difference is practically significant. For example, if you are measuring human weight and the difference is 1 pound, does that actually matter (to you, to the people you are measuring, etc.)? If your results are practically significant, then you can worry about statistical significance.
The difference is that in one case you are doing a post hoc test and in the other you are doing an a priori (planned) contrast. The Bonferroni test is much more conservative than an uncorrected t-test because, as a post hoc procedure, it tests each comparison at a stricter level (α divided by the number of comparisons). If you planned to compare two specific groups from the beginning (i.e., before you ran your experiment or saw your data), then the t-test would be appropriate. However, if after looking at your data you said "hmmm, I wonder if there is anything significant here," then you should use the Bonferroni test. When you do not plan your tests in advance you are just trying everything, and this is dangerous because it increases the chance of making a type I error. Each additional test adds to the chance of at least one error: at α = 0.05, one contrast carries a 5% probability of a type I error, while three tests carry about a 14% chance if they are independent (and at most 15% in any case). The Bonferroni test corrects for this by keeping the total probability of at least one type I error at or below .05, regardless of how many tests you do.
So which test you use depends on what you planned. However, if the ANOVA itself is not significant, then you really should not follow up with either type of test. You should only use these if the F test is significant.
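A minimal Python sketch of the error-rate arithmetic described above (standard library only; the helper names are illustrative, not taken from any statistics package). `familywise_error_rate` assumes the tests are independent; for dependent tests the rate is bounded above by m × α.

```python
# Family-wise error rate (FWER) for m tests, each run at level alpha,
# and the Bonferroni per-test level that keeps the FWER at or below alpha.

def familywise_error_rate(alpha: float, m: int) -> float:
    """P(at least one type I error) across m independent tests (assumption)."""
    return 1.0 - (1.0 - alpha) ** m

def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-comparison significance level under the Bonferroni correction."""
    return alpha / m

def bonferroni_adjust(p_values: list[float]) -> list[float]:
    """Bonferroni-adjusted p-values: multiply each by m and cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

if __name__ == "__main__":
    alpha, m = 0.05, 3  # three pairwise comparisons among three groups
    print(f"FWER, uncorrected:         {familywise_error_rate(alpha, m):.4f}")
    print(f"Upper bound m * alpha:     {m * alpha:.2f}")
    print(f"Bonferroni per-test alpha: {bonferroni_threshold(alpha, m):.4f}")
    corrected = familywise_error_rate(bonferroni_threshold(alpha, m), m)
    print(f"FWER at corrected level:   {corrected:.4f}")
```

With α = 0.05 and three tests, the uncorrected family-wise rate is 1 - 0.95^3 ≈ 0.143, and testing each comparison at 0.05/3 brings it back just under 0.05.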


Aren't there also differences in the degrees of freedom? The Bonferroni multiple comparison method uses the error variance pooled over all factor groups (N - k degrees of freedom, for N observations in k groups), whereas a separate two-sample t-test of the means is based only on the two groups being compared (n1 + n2 - 2 degrees of freedom). I think I wondered about this when I learned it, and I thought, "isn't this just a t-test for a difference of means?" But then I saw my mistake.


That's true. You get more degrees of freedom, but only because you're assuming constant variance, so every observation helps estimate the error variance. In the multiple t-test situation, the group that isn't being compared doesn't help estimate the variance.
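To make the degrees-of-freedom point concrete, here is a hedged sketch using only Python's standard library. It assumes equal variances across groups (the same assumption the ANOVA makes); `pairwise_pooled` and `anova_mse_contrast` are illustrative names, and the data are made up.

```python
import math
from statistics import mean, variance

def pairwise_pooled(a, b):
    """Equal-variance two-sample t using only groups a and b."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance estimated from the two compared groups only
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(sp2 * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df

def anova_mse_contrast(groups, i, j):
    """t for group i vs group j using the ANOVA error variance (MSE) of all groups."""
    n_total = sum(len(g) for g in groups)
    df = n_total - len(groups)  # N - k error degrees of freedom
    # Within-group sum of squares over every group, including ones not compared
    sse = sum((len(g) - 1) * variance(g) for g in groups)
    mse = sse / df
    gi, gj = groups[i], groups[j]
    se = math.sqrt(mse * (1 / len(gi) + 1 / len(gj)))
    return (mean(gi) - mean(gj)) / se, df

if __name__ == "__main__":
    # Hypothetical data: three groups of four observations each
    groups = [[4.1, 5.0, 4.6, 5.3], [5.9, 6.4, 5.5, 6.8], [4.9, 5.2, 6.1, 5.6]]
    t_pair, df_pair = pairwise_pooled(groups[0], groups[1])
    t_mse, df_mse = anova_mse_contrast(groups, 0, 1)
    print(f"two-group t = {t_pair:.3f} on {df_pair} df")
    print(f"MSE-based t = {t_mse:.3f} on {df_mse} df")
```

With only two groups the two functions agree exactly; with three groups the MSE-based version borrows the third group's spread and gains degrees of freedom (9 instead of 6 here).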
Further information:

1) As the data are not normally distributed, I performed a nonparametric one-way ANOVA (Kruskal-Wallis test with Dunn's multiple comparison post hoc testing).
2) Separately, I performed a Mann-Whitney test between each pair of columns.
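For readers who want to see what the two analyses in the list above compute, here is a from-scratch Python sketch (standard library only, hypothetical helper names). Assumptions: no tie correction in either statistic, a large-sample normal approximation for the Mann-Whitney p-value, and exactly three groups, so the Kruskal-Wallis chi-square p-value with 2 df has the closed form exp(-H/2). It runs pairwise Mann-Whitney tests with a Bonferroni adjustment; it does not implement Dunn's test. In practice `scipy.stats.kruskal` and `scipy.stats.mannwhitneyu` are the usual tools.

```python
import math
from itertools import combinations

def rank_with_ties(values):
    """1-based ranks; tied values get the average of the positions they span."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    return ranks

def kruskal_wallis_3(groups):
    """Kruskal-Wallis H without tie correction, for exactly three groups."""
    if len(groups) != 3:
        raise ValueError("this sketch's closed-form p-value assumes three groups")
    pooled = [x for g in groups for x in g]
    ranks = rank_with_ties(pooled)  # ranks from all groups jointly
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    return h, math.exp(-h / 2)  # chi-square survival function with 2 df

def mann_whitney(a, b):
    """U for a vs b, two-sided normal-approximation p (no tie/continuity correction)."""
    na, nb = len(a), len(b)
    ranks = rank_with_ties(list(a) + list(b))  # ranks from this pair only
    u = sum(ranks[:na]) - na * (na + 1) / 2
    z = (u - na * nb / 2) / math.sqrt(na * nb * (na + nb + 1) / 12)
    return u, math.erfc(abs(z) / math.sqrt(2))

def pairwise_mann_whitney_bonferroni(groups):
    """All pairwise Mann-Whitney tests: (i, j, U, raw p, Bonferroni-adjusted p)."""
    pairs = list(combinations(range(len(groups)), 2))
    out = []
    for i, j in pairs:
        u, p = mann_whitney(groups[i], groups[j])
        out.append((i, j, u, p, min(1.0, p * len(pairs))))
    return out

if __name__ == "__main__":
    # Hypothetical data, one list per column
    groups = [[1.2, 2.3, 1.9, 2.8, 2.1], [2.5, 3.1, 2.9, 3.6, 3.3], [1.8, 2.6, 2.2, 3.0, 2.4]]
    print("Kruskal-Wallis H, p:", kruskal_wallis_3(groups))
    for row in pairwise_mann_whitney_bonferroni(groups):
        print(row)
```

Separate Mann-Whitney tests re-rank each pair on its own, whereas Kruskal-Wallis and Dunn's test use ranks from all groups pooled together, which is one reason the two approaches can disagree.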

With the ANOVA (Kruskal-Wallis): no significance. With Mann-Whitney: some relevant significant differences. Quite astonishing: p > 0.05 vs. highly significant Mann-Whitney results (p < 0.001 and p < 0.0001)!

It is not at all clear to me when I am allowed to do a priori testing and when not. Naturally, I intended to compare the respective parameters before doing the experiments (don't most people?).

How should I proceed now?


A priori testing is just that: you have a good hypothesis that you're fairly sure of, and you run it as the initial (and usually only) test of significance. If not, do the ANOVA with follow-up contrasts that control the family-wise error rate. Generally you can't do both the a priori tests and the ANOVA. If you google "a priori contrasts vs. post hoc contrasts" (or follow-up contrasts), you'll get a pretty clear picture of when to use which and how many contrasts you can perform (some procedures allow J - 1 contrasts, where J is the number of groups, and some allow unlimited contrasts, depending on the test).
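The J - 1 limit mentioned above usually refers to planned contrasts that are mutually orthogonal. As a hedged illustration (hypothetical helper names, standard library only), this checks whether a set of contrast coefficients is orthogonal: with three groups, two orthogonal contrasts fit, but the three pairwise comparisons do not form an orthogonal set.

```python
from itertools import combinations

def is_contrast(coeffs):
    """Contrast coefficients must sum to zero."""
    return abs(sum(coeffs)) < 1e-12

def are_orthogonal(c1, c2, sizes=None):
    """Orthogonal if sum(c1_j * c2_j / n_j) == 0; with equal n this is a dot product."""
    sizes = sizes or [1] * len(c1)
    return abs(sum(a * b / n for a, b, n in zip(c1, c2, sizes))) < 1e-12

def mutually_orthogonal(contrasts, sizes=None):
    """True if every pair of contrasts in the set is orthogonal."""
    return all(are_orthogonal(a, b, sizes) for a, b in combinations(contrasts, 2))

def contrast_estimate(coeffs, group_means):
    """Estimated value of a contrast: sum of c_j * mean_j."""
    return sum(c * m for c, m in zip(coeffs, group_means))

if __name__ == "__main__":
    # J = 3 groups -> at most J - 1 = 2 mutually orthogonal contrasts
    planned = [(1, -1, 0), (1, 1, -2)]   # group 1 vs 2; groups 1 and 2 vs 3
    all_pairs = [(1, -1, 0), (1, 0, -1), (0, 1, -1)]
    print(mutually_orthogonal(planned))    # True
    print(mutually_orthogonal(all_pairs))  # False
```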

It's not that astonishing that you found significance when you did multiple a priori tests. As a metaphor, it would be like having 3 green marbles and 10 red marbles in a bag: if you picked 10 marbles, you would almost certainly pick a green one. (Not to get too far off topic, but the birthday paradox, google that too if you're unfamiliar with it, is a prime example of probability behaving like family-wise error.)
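A short Python check of the two probability examples above, using the standard library's `math.comb` and `math.prod` (Python 3.8+); the function names are illustrative, and the birthday calculation assumes 365 equally likely days.

```python
from math import comb, prod

def p_at_least_one_green(green, red, draws):
    """P(at least one green) when drawing without replacement from the bag."""
    return 1 - comb(red, draws) / comb(green + red, draws)

def p_shared_birthday(people, days=365):
    """P(at least two of `people` share a birthday), uniform days, no leap years."""
    return 1 - prod((days - k) / days for k in range(people))

if __name__ == "__main__":
    print(f"marbles, 10 of 13 drawn: {p_at_least_one_green(3, 10, 10):.4f}")
    print(f"birthday, 23 people:     {p_shared_birthday(23):.4f}")
```

Drawing 10 of 13 marbles leaves only 3 behind, so missing every green marble has probability 1/286; in the same spirit, each extra comparison makes at least one spurious "hit" more likely.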