Which test should I use for two groups of unequal size?

#1
Hi all!

I have data from 20 samples divided into 2 groups (category A and category B). The groups are independent: no sample in one group appears in the other. N(A) = 14, N(B) = 6.

Here is the data:
category A          category B
0.0119888167559     0.023185483871
0.00101354303189    0.312090168227
8.95231103909e-06   0.503371693147
2.9580256165e-05    0.522824974411
0.0596266691309     0.114932864532
4.02612958098e-05   3.32126606662e-05
0.337753287524
0.0115114590662
0.19273480545
0.232453117898
3.69713102632e-05
3.00480769231e-05
0.192851577717
1.58790650407e-05

I would like to show that the mean values of the 2 groups differ significantly, but I am very confused about which test I should use.

Here are the tests that I've performed so far:

1. Wilcoxon rank sum test (Mann-Whitney test) (two-tailed)
W=20, p=0.07575

2. Student t-test (two-tailed)
t = -2.24259, p = 0.03775

3. Welch t-test (two-tailed, unpaired, correction=False)
t=-1.7109, p = 0.1376

So, as you can see, the 3 tests give 3 different p-values... and to complicate things further:

4. Normality test (Shapiro-Wilk)
I've also checked the normality of my data: the first group (category A) is normally distributed (Shapiro-Wilk W = 0.704713, p = 0.000413591, p < 0.05), but the second (category B) is not (Shapiro-Wilk W = 0.868539, p = 0.220442, p > 0.05), probably because of the low number of samples.
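
For anyone who wants to check the numbers, here is a minimal sketch in plain Python (standard library only) that recomputes the three test statistics from the values posted above. The function names are my own, not from any package, and the p-values are left out because they need the t distribution and the rank-sum distribution, which the standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance

# Values copied from the table in the post above.
cat_a = [0.0119888167559, 0.00101354303189, 8.95231103909e-06,
         2.9580256165e-05, 0.0596266691309, 4.02612958098e-05,
         0.337753287524, 0.0115114590662, 0.19273480545,
         0.232453117898, 3.69713102632e-05, 3.00480769231e-05,
         0.192851577717, 1.58790650407e-05]
cat_b = [0.023185483871, 0.312090168227, 0.503371693147,
         0.522824974411, 0.114932864532, 3.32126606662e-05]


def student_t(x, y):
    """Pooled-variance (Student) t statistic and its degrees of freedom."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    return (mean(x) - mean(y)) / sqrt(pooled * (1 / nx + 1 / ny)), df


def welch_t(x, y):
    """Unequal-variance (Welch) t statistic with Welch-Satterthwaite df."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return (mean(x) - mean(y)) / sqrt(vx + vy), df


def mann_whitney_u(x, y):
    """U for the first sample: pairs with x > y, ties counted as one half."""
    return sum((a > b) + 0.5 * (a == b) for a in x for b in y)


t_s, df_s = student_t(cat_a, cat_b)
t_w, df_w = welch_t(cat_a, cat_b)
u = mann_whitney_u(cat_a, cat_b)
print(f"Student t = {t_s:.4f} (df = {df_s})")
print(f"Welch t   = {t_w:.4f} (df = {df_w:.2f})")
print(f"Mann-Whitney U = {u}")
```

The statistics agree with the values reported above (t = -2.24259, t = -1.7109, W = 20). The Welch degrees of freedom come out far below the pooled 18, which is a large part of why the two t-tests give such different p-values.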

A list of questions:

Q1: Can I assume that the data in both groups are normally distributed and use the Student t-test or the Welch t-test?
Q2: Or should I use the non-parametric Mann-Whitney test? (I've read that it has low power for a low number of samples...)
Q3: Another thing is the equality of variances between the groups: if I assume they are equal I can use the Student t-test, and if not I can use the Welch t-test... Should I first perform a test for equality of variances?

To summarize the post, I need help finding a test that will be OK with:
- small number of samples in one group (less than 10)
- unequal number of samples in groups
- data not normally distributed in one group
- showing the difference of means (optional)

I would really appreciate any suggestions.
Please help!

PS. This is for publication. Since the p-value from the Student t-test is the most significant (p < 0.05), I would like to stay with that result :) Can I?

Best,
Agata
 
#2
Hi Agata, a significant Shapiro-Wilk test means that the data deviate significantly from a normal distribution. Thus, especially in group A, the assumptions of parametric tests are violated, and you should trust only the result of the non-parametric Mann-Whitney test, which tells you that the difference between the two groups is not significant. This test works fine with all the restrictions you mention above.
 
#3
Thanks, mmercker, for the reply. So, since my data are not normally distributed, you suggest the Mann-Whitney test, but when I ran it in R I got a warning: "You can not calculate the exact value of the likelihood of repeated values", so I am afraid I am missing some information. This warning occurred when I compared two groups of samples in which the value 0 was duplicated in one group (analogous to the data above, but with 0 values). What do you think about transforming the data to be normally distributed and then using the Welch t-test?
 

gianmarco

TS Contributor
#4
Ciao Agata (I am supposing you are Italian),
Yes, you could switch to a non-parametric test, or you could give a permutation t-test a try.
I do not know what software you are using, but if you are familiar with R, you may want to use the function I have implemented, which is described here:
http://cainarchaeology.weebly.com/r-function-for-permutation-t-test.html

The same webpage explains the rationale of the permutation t-test and provides a bibliographical reference.
The function lets you compare the results of the 'regular' t-test and its permuted version, so you can assess to what extent the results of the 'regular' t-test would be flawed.

Hope this helps
Gm
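
The idea of the permutation t-test suggested above can be sketched in plain Python as follows. This is only an illustration, not the R function from the linked page: the name `permutation_t_test`, the choice of the Welch form of the statistic, the number of permutations, and the seed are all assumptions of mine. The group labels are shuffled many times, the t statistic is recomputed for each shuffle, and the two-sided p-value is the share of shuffles whose |t| is at least as large as the observed one.

```python
import random
from math import sqrt
from statistics import mean, variance

# Values copied from the first post of this thread.
cat_a = [0.0119888167559, 0.00101354303189, 8.95231103909e-06,
         2.9580256165e-05, 0.0596266691309, 4.02612958098e-05,
         0.337753287524, 0.0115114590662, 0.19273480545,
         0.232453117898, 3.69713102632e-05, 3.00480769231e-05,
         0.192851577717, 1.58790650407e-05]
cat_b = [0.023185483871, 0.312090168227, 0.503371693147,
         0.522824974411, 0.114932864532, 3.32126606662e-05]


def welch_t(x, y):
    """Welch t statistic (assumed choice; a pooled t would also work)."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x)
                                      + variance(y) / len(y))


def permutation_t_test(x, y, n_perm=4999, seed=0):
    """Monte Carlo permutation test; returns (observed t, two-sided p).

    Note: a shuffled group with zero variance in both groups would raise
    ZeroDivisionError; this sketch does not handle that edge case.
    """
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    observed = welch_t(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)            # random reassignment of group labels
        t = welch_t(pooled[:len(x)], pooled[len(x):])
        if abs(t) >= abs(observed):
            hits += 1
    # The +1 counts the observed labelling as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


t_obs, p_perm = permutation_t_test(cat_a, cat_b)
print(f"observed t = {t_obs:.4f}, permutation p = {p_perm:.4f}")
```

Because the labellings are sampled at random rather than enumerated, the p-value changes slightly with the seed and with the number of permutations.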
 
#5
Ciao Gianmarco! (Unfortunately not Italian but Polish)
Thank you for your suggestions. I will try that.
Sorry if this is a stupid question... but is it the same as a non-parametric t-test with Monte Carlo simulation?
 

gianmarco

TS Contributor
#6
Hello!
I think the two terms should refer to the same thing... it is something that would be easy to confirm by googling a little.
 
#7
I have read that the Mann-Whitney test is not recommended for sample sizes lower than 20, so I will go with the non-parametric t-test with permutations. Thank you all for your help! Best,
Agata
 
#9
Heh, the easiest way for me would be to use the http://qiime.org/scripts/group_significance.html script, because I have a lot of bacteria to compare between the 2 groups... and it offers a non-parametric t-test with Monte Carlo simulation that I could use.
The one thing I am wondering about is the output of that script, which presents FDR and Bonferroni p-value corrections. Is it necessary to include the corrections, since I have only two groups? Example output below (not connected to data above):

OTU        Test-Statistic  P               FDR_P           Bonferroni_P  category_A_mean   category_B_mean
bacteria1  2.47479722997   0.023976023976  0.736263736264  1.0           0.00142835984349  0.00044855807928
bacteria2  -2.2425947408   0.02997002997   0.736263736264  1.0           0.0742924977778   0.246073066142
.
.
.
It is probably printed automatically, since this script is designed for multiple-group comparisons, but I cannot find any information on how to leave these corrections out. Also, besides the p-value and the t-value, I need the number of degrees of freedom... which is not presented in this output.
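
As a rough illustration of what such correction columns do: multiple-testing corrections adjust for the number of tests performed (here, one test per OTU), not for the number of groups. The sketch below shows the textbook Bonferroni and Benjamini-Hochberg adjustments in plain Python. Whether the QIIME script computes its FDR_P column in exactly this way is an assumption I have not checked, and the p-values in the example are made up for illustration.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Textbook Benjamini-Hochberg adjusted p-values, in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that adjusted values never
    # increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four OTUs (not taken from the output above).
raw = [0.01, 0.04, 0.03, 0.5]
bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)
for p, b, f in zip(raw, bonf, bh):
    print(f"raw = {p:.3f}  Bonferroni = {b:.3f}  BH = {f:.3f}")
```

With many OTUs, a raw p-value around 0.02 to 0.03 can easily end up with an adjusted value near 1, which is the pattern in the example output.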

Or I am misunderstanding everything...
 
#10
There is something strange with these data that are so close to zero for many of the data points. Can you tell us how you got the data?

because I have a lot of groups to compare in case of identified bacteria..
Does this mean that you have counted the number of spots on a petri dish (and taken logs) or something similar? Then Poisson regression may be a better choice.


Both the t-test and the Mann-Whitney test are small-sample tests, so both can be used for sample sizes below 20.
 
#11
Because it is an abundance. If you do counts*100, you get a %. For example, for bacteria2 it is 7% vs 24%. These are results from an NGS metagenomic analysis.

because I have a lot of groups to compare in case of identified bacteria.. --> sorry, I've already changed that in the post above