Apparent contradiction in Mann-Whitney-Wilcoxon test results

#1
I have 2 data sets, each N=10, that I would like to compare. They fail Levene's test for homogeneity of variances, and for this reason, I can't use the t-test. I believe that the Mann-Whitney-Wilcoxon nonparametric test is my proper alternative.

Here are the data:
X1 = [90.4 96.0 94.8 90.4 96.0 90.4 96.0 94.8 94.4 96.4];
X2 = [29.6 40.0 44.4 29.6 40.0 29.6 40.0 44.4 44.0 31.6];

I have performed the MWW test, and I get the following results:

MANN-WHITNEY-WILCOXON TEST
--------------------------------------------------------------------------------
Sample size is good enough to use the normal distribution approximation

T: 155.0000
U: 100.0000
mT: 105.0000
sT: 13.2245
zT: 3.7431
p-value (1-tailed): 0.0001


From the p-value (as well as boxplot visualization), the difference between the two groups appears to be significant. However, from the reading I've done about the MWW U statistic, my U value should be *less than* or equal to the following values in order to achieve significance:

for N = 10,
~ at the 5% significance level, the U value must be less than or equal to 23 in order for the result to be significant.
~ at the 1% significance level, the U value must be less than or equal to 16 in order for the result to be significant.

Clearly, my U value of 100 is significantly larger than either of these values.

I would appreciate any clarification about how these 2 seemingly very different data sets can still have a U value that seems to indicate a lack of significant difference. (Also, if there is a better test to use for these data, that would be great to know.) Thanks a lot!
 
#2
The rules you quote for critical U values are simply wrong, although I can guess the thinking that went into them.

For two 10-count samples, the distribution of U under the null hypothesis is approximately normal with mean 50 and standard deviation 13. A value of 100 is thus a nearly four-sigma departure from the expected value, which is highly significant.
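Those figures can be checked with a few lines of Python. This is only a sketch of the plain normal approximation, using the standard library; it applies no tie or continuity correction, so its numbers differ slightly from the sT and zT values in the output in post #1. The function name `u_normal_approx` is made up for illustration.

```python
import math

def u_normal_approx(u, n1, n2):
    """Normal approximation for U under the null hypothesis.

    Assumption: no tie correction and no continuity correction.
    Returns (mean, sd, z, one-tailed p).
    """
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    # Upper-tail probability of a standard normal at z
    p_one_tailed = 0.5 * math.erfc(z / math.sqrt(2))
    return mean, sd, z, p_one_tailed

mean_u, sd_u, z_u, p_u = u_normal_approx(100, 10, 10)
print(f"mean = {mean_u:.1f}, sd = {sd_u:.2f}, z = {z_u:.2f}, p = {p_u:.5f}")
```

With U = 100 and both groups of size 10, this gives a mean of 50, a standard deviation of about 13.2, and z of about 3.8, which is the "nearly four-sigma" departure described above.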

I suspect whoever wrote your rules was considering only one-sided departures in which U is smaller than expected under the null hypothesis, in which case those claims are about right. Of course, you can switch U from being a given distance above its expected value to being the same distance below it simply by switching which group you call #1 and which you call #2.
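To make the label-switching point concrete, here is a small Python sketch that counts U directly from the data in post #1, once with each group treated as #1. The helper name `mann_whitney_u` is hypothetical, and the thresholds 23 and 16 are simply the ones quoted in post #1. Printed tables of critical values are usually entered with the smaller of the two U values.

```python
def mann_whitney_u(a, b):
    """Count the (a, b) pairs in which the a-value is larger; ties count as half."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

x1 = [90.4, 96.0, 94.8, 90.4, 96.0, 90.4, 96.0, 94.8, 94.4, 96.4]
x2 = [29.6, 40.0, 44.4, 29.6, 40.0, 29.6, 40.0, 44.4, 44.0, 31.6]

u1 = mann_whitney_u(x1, x2)  # X1 treated as group #1
u2 = mann_whitney_u(x2, x1)  # labels swapped
u_min = min(u1, u2)

print(f"U1 = {u1}, U2 = {u2}, smaller U = {u_min}")
print("at or below 23 (5% value from post #1):", u_min <= 23)
print("at or below 16 (1% value from post #1):", u_min <= 16)
```

Every X1 value exceeds every X2 value, so U1 is 100 and U2 is 0. The two always sum to n1*n2, and the smaller one, 0, is well under both quoted thresholds.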
 
#5
Is there a reason why almost every second value in X2 is around 29, the others are between 40 and 44, and there are no values between 32 and 40? And why are most of the values in X1 around 90 or 96?

Could there be an omitted explanatory variable (other than the X1/X2 grouping)?

(If so, its effect could be estimated with a two-way analysis of variance.)
 
#6
Thanks for all of the helpful advice, everyone!

ichbin said:
The rules you quote for critical U values are simply wrong, although I can guess the thinking that went into them.
I had intended to cite a source for that interpretation of U: (note: I'm aware that this document uses a 2-tailed test, while my results are one-tailed...)
http://www.sussex.ac.uk/Users/grahamh/RM1web/Mann-Whitney worked example.pdf


Karabiner said:
You could rather use the Welch test.
TheEcologist said:
I second, Karabiner. The Mann-Whitney is not the only "proper" alternative
Thanks, I'll look into the Welch test.
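For reference, a rough sketch of what the Welch test computes for these data, using only the Python standard library. It returns the t statistic and the Welch-Satterthwaite degrees of freedom but not a p-value, which would need a t-distribution routine from a statistics package. The function name `welch_t` is made up for illustration.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's unequal-variance t statistic and approximate degrees of freedom."""
    # Squared standard errors of the two sample means (sample variance / n)
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

x1 = [90.4, 96.0, 94.8, 90.4, 96.0, 90.4, 96.0, 94.8, 94.4, 96.4]
x2 = [29.6, 40.0, 44.4, 29.6, 40.0, 29.6, 40.0, 44.4, 44.0, 31.6]

t_stat, df = welch_t(x1, x2)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

The statistic comes out in the mid-20s on roughly 12 degrees of freedom, so this test also points to a very large difference between the groups.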

You've got n=20.
Gotcha, thanks. I was using 10 because the tables to look up significant U values list one set's N on one axis & the other set's N on the other axis.

Thanks for the input, Greta. The data were generated using a neural network that uses pseudo-random numbers, and for repeated runs of the same experimental setup, it's not unusual for similar or identical values to result. I'll keep estimated two-way ANOVA in mind for future analyses.