Which test should I use for two unequal groups?

Hi all!

I have data comprising 20 samples divided into 2 groups (category A and category B). The groups are independent: no value in one group is repeated in the other. N(A) = 14, N(B) = 6.

Here is the data:
category A           category B
0.0119888167559      0.023185483871
0.00101354303189     0.312090168227
8.95231103909e-06    0.503371693147
2.9580256165e-05     0.522824974411
0.0596266691309      0.114932864532
4.02612958098e-05    3.32126606662e-05

I would like to show that the mean values of the 2 groups differ significantly, but I am very confused about which test statistic I should use.

Here are the tests that I've performed so far:

1. Wilcoxon rank sum test (Mann-Whitney test) (two-tailed)
W=20, p=0.07575

2. Student t-test (two-tailed)
t = -2.24259, p = 0.03775

3. Welch t-test (two-tailed, unpaired, correction=False)
t=-1.7109, p = 0.1376

So, as you can see, the 3 tests give 3 different p-values... and to make it more complicated...

4. Normality test (Shapiro-Wilk)
I've also checked the normality of my data: the first group, category A, is normally distributed (Shapiro-Wilk W = 0.704713, p = 0.000413591, p<0.05), but the second, category B, is not (Shapiro-Wilk W = 0.868539, p = 0.220442, p>0.05), probably because of the low number of samples.

A list of questions:

Q1: Can I assume that the data in my 2 groups are normally distributed and use the Student t-test or the Welch t-test?
Q2: Or should I use the non-parametric Mann-Whitney test? (I've read that it has low power for a low number of samples...)
Q3: Another thing is the equality of variances between the groups: if I assume they are equal I can use the Student t-test, if not I can use the Welch t-test... Should I first perform a test for equality of variances?
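
To make the difference behind Q3 concrete, here is a minimal Python sketch (standard library only) of the two t statistics and their degrees of freedom. The function names are illustrative, and it uses only the six values per group listed above, so it will not reproduce the reported results for the full N(A) = 14 data. It omits p-values, which would need a t-distribution CDF from a statistics package.

```python
import math
from statistics import mean, variance

def student_t(a, b):
    """Pooled-variance (Student) t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df

def welch_t(a, b):
    """Welch t statistic with Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Only the six values per group shown in the post (the full group A has 14).
group_a = [0.0119888167559, 0.00101354303189, 8.95231103909e-06,
           2.9580256165e-05, 0.0596266691309, 4.02612958098e-05]
group_b = [0.023185483871, 0.312090168227, 0.503371693147,
           0.522824974411, 0.114932864532, 3.32126606662e-05]

print("Student:", student_t(group_a, group_b))
print("Welch:  ", welch_t(group_a, group_b))
```

With equal group sizes the two statistics coincide and only the degrees of freedom differ. With unequal sizes and unequal variances, as in the full data, both the statistic and the degrees of freedom can differ, which is why the two tests can give different p-values.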

To summarize the post, I need help finding a test that is OK with:
- small number of samples in one group (less than 10)
- unequal number of samples in groups
- data not normally distributed in one group
- showing the difference of means (optional)

I would really appreciate any suggestions.
Please help!

PS. This is for publication. Since the p-value from the Student t-test is the most significant (p<0.05), I would like to stay with that result :) Can I?

Hi Agata, a significant Shapiro-Wilk test means that the data are significantly non-normally distributed. Thus, especially in group A, the assumptions for parametric tests are violated, and you should trust only the results of the non-parametric Mann-Whitney test, which tells you that the differences between the two groups are not significant. This test works fine with all the restrictions you mention above.
Thanks mmercker for the reply. So, since my data are not normally distributed, you suggest doing the Mann-Whitney test, but when I performed it in R I got a warning: "cannot compute exact p-value with ties", so I am afraid I am missing some information. This warning occurred when I compared two groups of samples where the value 0 was duplicated in one group (analogous to the data above, but with 0 values). What do you think about transforming the data to be normally distributed, and then using the Welch t-test?
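
For context on that warning: with tied values an exact Mann-Whitney p-value is not computed, and a normal approximation with a tie correction is commonly used instead. Below is a rough Python sketch of that approach (standard library only). The function names and the example values are made up for illustration and are not data from this thread, and with samples this small the approximation is only a rough guide.

```python
import math

def midranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_normal(a, b):
    """U statistic for group a and a two-sided normal-approximation p-value."""
    na, nb = len(a), len(b)
    n = na + nb
    combined = list(a) + list(b)
    ranks = midranks(combined)
    u = sum(ranks[:na]) - na * (na + 1) / 2
    # Tie correction: each group of t tied values reduces the variance of U.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = na * nb / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values identical: no evidence of a difference
    # Continuity correction of 0.5, clamped so z never goes below zero.
    z = max(0.0, abs(u - na * nb / 2) - 0.5) / math.sqrt(var_u)
    p = math.erfc(z / math.sqrt(2))  # two-sided tail of the standard normal
    return u, p

# Hypothetical values with repeated zeros, for illustration only.
example_a = [0, 0, 1, 2]
example_b = [0, 3, 4, 5, 6]
print(mann_whitney_normal(example_a, example_b))
```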


Ciao Agata (I am supposing you are Italian),
Yes, you could switch to a non-parametric test, or you could give a permutation t-test a try.
I do not know what software you are using, but in case you are familiar with R, you may want to use the function I have implemented, which is described here:

The same webpage explains the rationale of the permutation t-test and provides a bibliographical reference.
The function allows you to compare the results of both the 'regular' t-test and its permuted version, and to assess to what extent the results of the 'regular' t-test would be flawed.
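
For readers who want to see the general idea in code, here is a minimal Python sketch of a Monte Carlo permutation t-test (standard library only). It is not the R function mentioned above. The function names, the choice of the Welch statistic, the number of permutations and the seed are all assumptions made for illustration. It uses only the six values per group listed in the first post.

```python
import math
import random
from statistics import mean, variance

def welch_stat(a, b):
    """Welch t statistic; guards against a zero standard error."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0.0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se

def permutation_t_test(a, b, n_perm=9999, seed=0):
    """Two-sided Monte Carlo permutation p-value for the t statistic."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(welch_stat(a, b))
    pooled = list(a) + list(b)
    na = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of the group labels
        if abs(welch_stat(pooled[:na], pooled[na:])) >= observed:
            hits += 1
    # Adding 1 counts the observed arrangement itself, so p is never 0.
    return (hits + 1) / (n_perm + 1)

# Only the six values per group shown in the first post.
group_a = [0.0119888167559, 0.00101354303189, 8.95231103909e-06,
           2.9580256165e-05, 0.0596266691309, 4.02612958098e-05]
group_b = [0.023185483871, 0.312090168227, 0.503371693147,
           0.522824974411, 0.114932864532, 3.32126606662e-05]

print("permutation p-value:", permutation_t_test(group_a, group_b))
```

The p-value is the share of label shufflings whose statistic is at least as extreme as the observed one, so it does not rely on the normality assumption of the regular t-test.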

Hope this helps
Ciao Gianmarco! (Unfortunately not Italian but Polish)
Thank you for your suggestions. I will try that.
Sorry if this is a stupid question... but is it the same as a non-parametric t-test with Monte Carlo simulation?


I think the two terms should refer to the same thing... it should be easy to check by googling a little.
I have read that the Mann-Whitney test is not recommended for sample sizes lower than 20, so I will go with the non-parametric t-test with permutations. Thank you all for the help! Best,
Heh, the easiest way for me would be to use the http://qiime.org/scripts/group_significance.html script, because I have a lot of bacteria to compare between the 2 groups... and it offers a non-parametric t-test with Monte Carlo simulation that I could use.
The one thing I am wondering about is the output from that script, which presents FDR and Bonferroni p-value corrections. Is it necessary to include the corrections when I have only two groups? Example output below (not connected to the data above):

OTU        Test-Statistic  P               FDR_P           Bonferroni_P  category_A_mean   category_B_mean
bacteria1  2.47479722997   0.023976023976  0.736263736264  1.0           0.00142835984349  0.00044855807928
bacteria2  -2.2425947408   0.02997002997   0.736263736264  1.0           0.0742924977778   0.246073066142
It is probably printed automatically, since this script is designed for comparing multiple groups, but I cannot find any information on whether these corrections can be ignored. Besides the p-value and the t-value, I also need the number of degrees of freedom, which is not presented in this output.
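
As background on those two columns: corrections of this kind adjust for the number of tests performed (here, one test per OTU), not for the number of groups. The Python sketch below shows the textbook Bonferroni and Benjamini-Hochberg adjustments (standard library only). It is not taken from the QIIME source, which may compute its FDR column differently, and the function names and example p-values are made up for illustration.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR-adjusted p-values, in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four OTUs, for illustration only.
raw = [0.01, 0.04, 0.03, 0.20]
print("Bonferroni:", bonferroni(raw))
print("BH FDR:    ", benjamini_hochberg(raw))
```

With many OTUs tested at once, a raw p-value of about 0.02 can easily be adjusted up to 1.0 under Bonferroni, which would be consistent with the pattern in the output above.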

Or I am misunderstanding everything...
There is something strange with these data that are so close to zero for many of the data points. Can you tell us how you got the data?

because I have a lot of groups to compare in case of identified bacteria..
Does this mean that you have counted the number of spots on a petri dish (and taken logs) or something? Then Poisson regression may be a better choice.

Both the t-test and the Mann-Whitney test are small-sample tests, so both can be used for sample sizes less than 20.
Because it is an abundance. Multiplying the values by 100 gives a percentage: for example, for bacteria2 it is 7% vs 24%. These are results from an NGS metagenomic analysis.

because I have a lot of groups to compare in case of identified bacteria.. --> sorry, I've already changed that in the post above