I have 8 experimental datasets, with 24 data points in each, which I compared using ANOVA followed by post hoc Tukey's HSD in R. The data meet the ANOVA assumptions. I found significant differences at the 0.01 level between two of the groups and all the others; no other comparison was significant.

But if I remove the two significantly different datasets and run the same analysis on the other 6, there are significant differences among them, and not marginal ones: p < 0.01.

Is this OK, or does it mean I did something wrong in my script? I can see that having more groups makes the procedure more cautious about declaring differences significant, but this seems a big jump, and I thought ANOVA was designed to allow for the number of groups.

This is the relevant part of the script:

strain.aov = aov(abs ~ strain, data = df)

strain.tukey = TukeyHSD(strain.aov)

Here df has either 8 or 6 sets of 24 absorbance readings, one set per strain.
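In case it helps to reproduce the setup, here is a minimal sketch with simulated data rather than my real readings. The strain names, means and standard deviation are invented for illustration, and the variable names other than `df`, `abs` and `strain` are hypothetical. It runs the same two calls on all 8 groups and on a 6-group subset, then extracts the residual mean square and the number of pairwise comparisons from each fit so the two analyses can be compared side by side.

```r
# Simulated stand-in for the real data: 8 strains, 24 absorbance readings each.
# The means and the SD below are made up for illustration only.
set.seed(1)
strains <- paste0("s", 1:8)
means <- c(0.50, 0.52, 0.54, 0.56, 0.58, 0.60, 1.00, 1.10)
df <- data.frame(
  strain = factor(rep(strains, each = 24), levels = strains),
  abs    = rnorm(8 * 24, mean = rep(means, each = 24), sd = 0.05)
)

# Same calls as in the script, on all 8 groups
aov8   <- aov(abs ~ strain, data = df)
tukey8 <- TukeyHSD(aov8)

# Same calls on 6 groups; droplevels() removes the two unused factor levels
df6    <- droplevels(subset(df, !(strain %in% c("s7", "s8"))))
aov6   <- aov(abs ~ strain, data = df6)
tukey6 <- TukeyHSD(aov6)

# Residual mean square (pooled within-group variance) for each fit
ms8 <- deviance(aov8) / df.residual(aov8)
ms6 <- deviance(aov6) / df.residual(aov6)

# Number of pairwise comparisons Tukey's HSD adjusts for in each fit
n_pairs8 <- nrow(tukey8$strain)
n_pairs6 <- nrow(tukey6$strain)

print(c(ms8 = ms8, ms6 = ms6))
print(c(n_pairs8 = n_pairs8, n_pairs6 = n_pairs6))
```

With real data, `df` would be the actual data frame and the two excluded strain names would replace `"s7"` and `"s8"`.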

Thanks for any advice. I am not trying to break up my data any old way to get significance; I am wondering why this happens and whether I have misunderstood something.