# Is the Mann-Whitney U test valid for percentage data?

#### NateLa

##### New Member
Is the Mann-Whitney U test valid for percentages (there are no discrete counts underlying the percentages)? All my data points are average percent cover of seagrass, with about 53% of the values being zeros across 8 group levels/columns (4 years x 2 seasons per year).

I'd like to test the null hypothesis that there is no statistical difference in median seagrass cover between "dry" and "wet" seasons, for each year.

Each value is the average of 10 randomly chosen, independent, observations per site (n=47 sites). I'm working in R.

I thought beta regression would work, but learned that its values can't equal 0 or 1. Logistic regression is only for success/failure-type data, and my data have unequal variances, so I can't do any parametric tests.

Code:
years <- 2008:2011          # note: seq(2008:2011) would return 1:4, not the years
season <- c("dry", "wet")
site <- 1:47
# percent cover ranges from 0.0 (0%) to 0.85 (85%)

Site   2008_dry   2008_wet   2009_dry  etc...
1      0          0
2      0.10       0.26
3      0          0.71
4      0.52       0
etc...

#### Karabiner

##### TS Contributor
So you have n=47 sites, and you have 2 x 4 percentage values for each site, right?

If you want to compare the values between dry and wet seasons, separately
for each year, then these are dependent measures, and the U test is not appropriate
(U tests are for independent groups; and by the way, U tests do not compare
medians; the median test is the one that compares medians between independent
groups). Since the percentages are a continuous measurement, you could perform
four Wilcoxon signed rank tests, one for each year. The signed rank test doesn't
compare medians, though. The sign test would be easier to interpret, but does not
use the complete information in the data. Admittedly, I do not know whether a
median test for dependent samples also exists.
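As a rough sketch of the per-year paired tests described above, the following uses base R's `wilcox.test()` and `binom.test()`. The data are simulated stand-ins (roughly half zeros, values up to 0.85), not the real measurements, and the matrix name `cover` and its column names are assumptions modelled on the layout in the first post.

```r
# Simulated stand-in data (hypothetical; NOT the real seagrass measurements)
set.seed(1)
years <- 2008:2011
cols <- paste(rep(years, each = 2), c("dry", "wet"), sep = "_")
cover <- matrix(rbinom(47 * 8, 1, 0.47) * runif(47 * 8, 0, 0.85),
                nrow = 47, dimnames = list(NULL, cols))

results <- do.call(rbind, lapply(years, function(y) {
  dry <- cover[, paste(y, "dry", sep = "_")]
  wet <- cover[, paste(y, "wet", sep = "_")]
  d <- wet - dry
  # Wilcoxon signed rank test on the 47 site-wise pairs; exact = FALSE
  # because ties and zero differences rule out an exact p-value
  w <- wilcox.test(wet, dry, paired = TRUE, exact = FALSE)
  # Sign test: number of positive differences among the non-zero ones
  s <- binom.test(sum(d > 0), sum(d != 0), p = 0.5)
  data.frame(year = y, wilcoxon_p = w$p.value, sign_p = s$p.value)
}))
print(results)
```

Sites where both seasons are zero contribute a zero difference and are dropped by both tests, so with many zeros the effective number of pairs can be well below 47.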

Wouldn't a repeated-measures analysis of variance with 2 factors ("year" with 4 levels,
and "season" with 2 levels) be an option? It would use the whole data and you could
include the interaction between the factors.
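One possible way to set up such a two-factor repeated-measures ANOVA in base R is `aov()` with an `Error()` term for the within-site factors. This is only a sketch: the long-format data frame `long` and its column names are assumptions, and the values are simulated placeholders rather than real data.

```r
# Simulated stand-in data in long format (hypothetical values)
set.seed(1)
long <- expand.grid(site = factor(1:47),
                    season = factor(c("dry", "wet")),
                    year = factor(2008:2011))
long$cover <- rbinom(nrow(long), 1, 0.47) * runif(nrow(long), 0, 0.85)

# Both year and season vary within site, so each gets its own error stratum
fit <- aov(cover ~ year * season + Error(site / (year * season)), data = long)
summary(fit)
```

The `summary()` output lists the year, season, and year:season effects, each tested against its own site-level error stratum.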

With kind regards

Karabiner

#### NateLa

##### New Member
Thank you! "So you have n=47 sites, and you have 2 x 4 percentage values for each site, right?" - Correct.

Also, I'm essentially, in the words of my supervisor, "seeing if there is any difference" between the two seasons. It doesn't have to be a difference between the medians, I just thought that would be the easiest thing to test.

Wouldn't I run into trouble though with so many zeros and unequal variance between groups if I ran the repeated measures ANOVA?

#### NateLa

##### New Member
I have also started to consider using a zero-inflated beta regression, but it seems incredibly complicated, with few resources online to follow.

#### Karabiner

##### TS Contributor
"Wouldn't I run into trouble though with so many zeros and unequal variance between groups if I ran the repeated measures ANOVA?"

Well, there's only 1 group.

If the sphericity assumption is violated (as usual), then you can apply an adjustment (such as Greenhouse-Geisser, Huynh-Feldt etc.).
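A sketch of how these adjustments can be obtained in base R, using a multivariate `lm()` fit together with `mauchly.test()` and `anova(..., test = "Spherical")`, which reports Greenhouse-Geisser and Huynh-Feldt adjusted p-values. The data are simulated placeholders, and the names `cover` and `idata` are assumptions. Since "season" has only 2 levels, sphericity is only an issue for "year" and the interaction.

```r
# Simulated stand-in data in wide format (hypothetical values)
set.seed(1)
# Within-site design: one row per column of `cover`, season varying fastest
idata <- expand.grid(season = factor(c("dry", "wet")),
                     year = factor(2008:2011))
cover <- matrix(rbinom(47 * 8, 1, 0.47) * runif(47 * 8, 0, 0.85), nrow = 47)
colnames(cover) <- paste(idata$year, idata$season, sep = "_")

mlmfit  <- lm(cover ~ 1)          # multivariate intercept-only model
mlmfit0 <- update(mlmfit, ~ 0)    # null model for comparison

# Mauchly's test of sphericity for the year effect
mauchly.test(mlmfit, M = ~ year + season, X = ~ season, idata = idata)

# Year effect with G-G and H-F adjusted p-values
year_test <- anova(mlmfit, mlmfit0, M = ~ year + season, X = ~ season,
                   idata = idata, test = "Spherical")
# Year x season interaction with the same adjustments
inter_test <- anova(mlmfit, mlmfit0, X = ~ year + season,
                    idata = idata, test = "Spherical")
print(year_test)
print(inter_test)
```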

With kind regards

Karabiner

#### katxt

##### Well-Known Member
"Since the percentages are a continuous measurement, you could perform four Wilcoxon signed rank tests"

Since the year-to-year differences don't seem to be of particular interest, why not just combine the 4 years for each season for each of the 47 sites and use the Wilcoxon signed rank test suggested by Karabiner on the 47 matched pairs? (Or even the matched-pairs t test. You have plenty of sites, and the combining and differencing makes the resulting data better behaved. Easy to do, easy to interpret, easy to explain.)
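A minimal sketch of this combined approach in base R, assuming "combine" means averaging the four yearly values per season at each site (a sum would lead to the same tests). The data are simulated placeholders, and the names `cover`, `dry_mean`, and `wet_mean` are hypothetical.

```r
# Simulated stand-in data in wide format (hypothetical values)
set.seed(1)
cols <- paste(rep(2008:2011, each = 2), c("dry", "wet"), sep = "_")
cover <- matrix(rbinom(47 * 8, 1, 0.47) * runif(47 * 8, 0, 0.85),
                nrow = 47, dimnames = list(NULL, cols))

# One dry value and one wet value per site, averaged over the 4 years
dry_mean <- rowMeans(cover[, grepl("_dry$", cols)])
wet_mean <- rowMeans(cover[, grepl("_wet$", cols)])

# Wilcoxon signed rank test on the 47 matched pairs
w <- wilcox.test(wet_mean, dry_mean, paired = TRUE, exact = FALSE)
# Matched-pairs t test on the same pairs
tt <- t.test(wet_mean, dry_mean, paired = TRUE)

print(w)
print(tt)
```

Averaging over years also reduces the number of exact zeros, since a site only ends up with a zero if it had no cover in all four years of that season.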

#### NateLa

##### New Member
Observations at each site are independent of each other though.

#### katxt

##### Well-Known Member
"Observations at each site are independent of each other though."

One observation may not influence another, but they both may be influenced in the same way by ecological conditions at the site.
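A small simulation can illustrate this point. It is a toy sketch on an arbitrary normal scale, not a model of percent cover: `site_effect` is a hypothetical stand-in for shared ecological conditions, and the measurement noise for the two seasons is drawn independently.

```r
set.seed(1)
n_sites <- 47
# Shared site-level conditions (hypothetical), affecting both seasons alike
site_effect <- rnorm(n_sites, sd = 1)
# Independent noise for each season's observation
dry <- site_effect + rnorm(n_sites, sd = 0.5)
wet <- site_effect + rnorm(n_sites, sd = 0.5)

# The two seasons are correlated across sites even though the noise is independent
cor_shared <- cor(dry, wet)
print(cor_shared)
```

This correlation across sites is what makes the dry and wet values dependent measures, and it is why a paired test fits the design.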