# Thread: Mann-Whitney U-test (rank sum)

1. ## Mann-Whitney U-test (rank sum) Urgent!!!

Hi everyone!
I need help interpreting the output of the Mann-Whitney U-test in Stata. I want to compare two groups: online journal articles with few clicks vs. articles with many clicks. How do I interpret the 'rank sum' output? I also want to calculate the mean of the two subgroups. Is it possible to calculate the mean from the 'rank sum' output? Thanks a lot for quick replies!!!
Daniela

2. ## Re: Mann-Whitney U-test (rank sum)

Hi Daniela!
I think that a little more information about your data and the goals you have in mind would allow people here to give sounder help.
So, for example:
- What kind of data do you have?
- Why did you pick the MW test?
- Why are you interested in calculating the mean based on the rank sum?

Regards
Gm

3. ## Re: Mann-Whitney U-test (rank sum)

Hi! I want to find out if there is a significant difference between the two groups, so I ran a linear regression between the length of articles and the number of clicks for each article. Now I want to see if there's a difference between longer and shorter ones, using the MW test (MW because the data are not normally distributed). I split the number of clicks according to the median and ran an MW test. The output is the p-value (of course) and this mysterious rank sum thing. I would be interested to see if the mean differs. Thanks for replies!!!!

4. ## Re: Mann-Whitney U-test (rank sum)

Hi!
There is something I do not fully grasp.
The way you describe your goal does not sound correct to me. In my opinion, dividing your sample into two sub-samples according to the median, and then testing for a difference in median, sounds like circular reasoning...
Maybe your goal, as far as I can tell, is to test whether there is a significant difference in click numbers between short and long articles. Am I correct in assuming this?

Gm

5. ## Re: Mann-Whitney U-test (rank sum)

Yes, maybe I haven't described that right. I want to find out whether longer or shorter articles received more clicks, hence the linear regression. Now I want to find out if there's a difference between the two subgroups, long and short articles. So I split the data set into two groups, long articles vs. short articles (split at the median), took the number of clicks for long and short articles, and assigned a dummy variable to the two groups. Then I ran the MW test.

6. ## Re: Mann-Whitney U-test (rank sum)

What is this test going to tell you that the linear regression of clicks on article length doesn't? Dichotomising a continuous variable is usually a bad idea. (Technically, number of clicks is a count variable rather than continuous, but the same arguments still stand.)

7. ## Re: Mann-Whitney U-test (rank sum)

Hi Daniela,
ok...just to make sure I have understood correctly: when you say "long articles vs short articles (splitted according to median)", which median are you referring to?

Thanks for any clarification
Gm

8. ## Re: Mann-Whitney U-test (rank sum)

Well, the linear regression wasn't significant. But I can see in the data that there are somehow two subgroups, so I thought I could use the MW test to explain the difference. But I still don't know how to interpret the rank sum value. As far as I understand, the MW test compares the median (not the mean, as the t-test does), right? Do you know how to interpret the rank sum value? It's not the median or the mean. Or can I calculate the median or mean from the rank sum value? Thanks!

9. ## Re: Mann-Whitney U-test (rank sum)

Originally Posted by gianmarco
Hi Daniela,
ok...just to make sure I have understood correctly: when you say "long articles vs short articles (splitted according to median)", which median are you referring to?

Thanks for any clarification
Gm
Hi! Sorry, I overlooked your question. I mean the median of the article length: I sorted the data according to review length, and took the number of clicks for the subgroup above the median length vs. the number of clicks for the subgroup below the median length.

10. ## Re: Mann-Whitney U-test (rank sum)

Ok,
so you are testing for a click difference between two groups, i.e. short articles vs long articles.
For a full explanation of the test, see, e.g., here and here. But note that there are many earlier threads on the MW test on this same Forum. So, you could try searching by yourself (see this for example).

I think that there is some confusion (generally speaking, and with reference to people with no extensive stat background, like me) about what MW is actually testing. It is often assumed that MW tests for a difference in medians, but there are cases in which MW indicates a significant difference even in samples with the same median. This is because, as far as I understand it, MW is also sensitive to differences in other "features" of the data.
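As a small illustration of that point, here is a Python sketch (Python rather than Stata, with made-up numbers and a hypothetical helper name `prob_x_exceeds_y`). The two samples share the same median, yet a value drawn at random from one tends to be smaller than a value drawn at random from the other, which is the kind of difference MW responds to. It only computes the pairwise proportion; it is not a significance test.

```python
from statistics import median


def prob_x_exceeds_y(x, y):
    """Share of (x, y) pairs in which x is larger, counting ties as half a win."""
    wins = sum(1.0 for a in x for b in y if a > b)
    ties = sum(0.5 for a in x for b in y if a == b)
    return (wins + ties) / (len(x) * len(y))


# Made-up samples, chosen only so that both medians equal 10.
sample_x = [1, 2, 10, 11, 12]
sample_y = [8, 9, 10, 30, 40]

print("median x:", median(sample_x))
print("median y:", median(sample_y))
print("P(X > Y) estimate:", prob_x_exceeds_y(sample_x, sample_y))
```

Under the null hypothesis this proportion would be expected to sit near 0.5; here it is clearly lower even though the medians are identical.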

I think that a more general reading of what the test assesses is a viable option. In other words, you can perform MW to test if there is a tendency for the values of one sample to be "different" from the values of the other. Or, to put it more formally,
Under the null hypothesis, the distributions of both groups are equal, so that the probability of an observation from one population (X) exceeding an observation from the second population (Y) equals the probability of an observation from Y exceeding an observation from X, that is, there is a symmetry between populations with respect to probability of random drawing of a larger observation
Now, when you perform the test, you should get the test statistic (U), along with the p value. I do not know the specific output of each stat package, but I guess it could return the rank sums, i.e. the sums of the ranks of the observations in each sample. For more info about these figures (which are used in the calculation of the test) see the previous links.
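To show where a rank sum comes from, here is a minimal Python sketch. The click counts and the helper names (`midranks`, `rank_sums_and_u`) are made up for illustration, and this is not what any particular package does internally. All observations are pooled and ranked (tied values share the average rank), the ranks are summed per group, and U follows from each rank sum as U = R - n(n + 1)/2.

```python
def midranks(values):
    """Return 1-based ranks; tied values get the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sums_and_u(x, y):
    """Rank the pooled data, then return (R1, R2, U1, U2)."""
    ranks = midranks(list(x) + list(y))
    n1, n2 = len(x), len(y)
    r1 = sum(ranks[:n1])
    r2 = sum(ranks[n1:])
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = r2 - n2 * (n2 + 1) / 2
    return r1, r2, u1, u2


# Hypothetical click counts for short and long articles.
short_clicks = [12, 15, 9, 20, 15]
long_clicks = [30, 22, 15, 41, 28]

r1, r2, u1, u2 = rank_sums_and_u(short_clicks, long_clicks)
print(r1, r2, u1, u2)  # 17.0 38.0 2.0 23.0
```

Dividing a rank sum by its group size gives the mean rank of that group (here 3.4 and 7.6), not the mean number of clicks, so the mean of the original values cannot be recovered from the rank sum alone.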

As for your last question
Is there any possibility to calculate the mean based on the 'rank sum' outcome?
I do not think it would be useful to calculate the mean in the context of the MW test. If you want to use a measure of central tendency, I believe that you could report the median, maybe along with the MW results. But remember the above remarks on the issue of what MW is actually testing!

So, in summary, I would suggest (but it is my personal opinion):
- report the medians of the two samples (if you are to use a measure of central tendency)
- perform MW (possibly under the more general hypothesis described above)
- report the test statistic (U) and the associated p value
- if the result is significant, you can conclude that the test indicates that there is a tendency for your sample x to have greater values (i.e., greater number of clicks) than sample y.
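The steps in this list could be sketched in Python as follows. This is only a rough illustration under stated assumptions: the click counts are made up, the function name `mann_whitney_normal_approx` is hypothetical, and the p value uses the plain large-sample normal approximation with no tie or continuity correction. A statistical package may apply such corrections or compute an exact p value, so its numbers can differ.

```python
import math
from statistics import median


def mann_whitney_normal_approx(x, y):
    """Return (U for x, z, two-sided p) using the uncorrected normal approximation."""
    n1, n2 = len(x), len(y)
    # U for x: number of pairs where x beats y, ties counted as one half.
    u1 = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u1, z, p_two_sided


# Hypothetical click counts for short and long articles.
short_clicks = [9, 12, 15, 15, 20]
long_clicks = [15, 22, 28, 30, 41]

u, z, p = mann_whitney_normal_approx(short_clicks, long_clicks)
print("median clicks, short articles:", median(short_clicks))
print("median clicks, long articles:", median(long_clicks))
print(f"U = {u}, z = {z:.3f}, two-sided p = {p:.4f}")
```

With samples this small the normal approximation is crude, so an exact test would be preferable in practice; the sketch is only meant to show which figures end up in the report.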

Finally, keep in mind the advice of CowboyBear (I will also read the article he has attached)...

Hope this helps
regards
Gm
