Please help me choose the appropriate test for a directional hypothesis

Hi everyone,
I've been asked, fairly last minute, to take a new direction with one of the research questions in my dissertation, so I'd be really grateful for your help. I'm looking at the pH values from two treatment groups and my hypothesis is:
H1: The heat samples have a significantly more acidic post-exposure pH in comparison to the nutrient samples.

The data don't fully meet the assumptions for parametric tests (see attached screenshot with Shapiro-Wilk, skewness and kurtosis results). I used the Wilcoxon rank sum test to check for a statistically significant difference between the groups and was able to reject the null, but I assume I can't use those results to answer the directional hypothesis above?
Here is the write up from the Wilcoxon rank sum test:

"Data for pH did not meet the assumptions of normality for an independent samples t-test (see Table X), so the Wilcoxon rank sum test was used. The median pH in the heat treatment was 6.16 (IQR = .37), whereas the median in the nutrients treatment was 7.34 (IQR = 1.01), as seen in Figure X. The Wilcoxon rank sum test revealed that the difference in the location of the medians was significant (95% confidence interval, W = 16, p < .001, effect size r = 0.83)."

Thanks in advance,

Stay safe all



TS Contributor
If the sample size is large enough (n > 30 or so), the t-test (or better: the Welch test) is robust, even if the values in the respective groups are not sampled from normally distributed populations.

The Wilcoxon rank sum test does not compare medians (the median test compares medians, by the way).
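
As a rough illustration of the Welch test mentioned above, here is a minimal sketch using only the Python standard library. The function name `welch_t` and the pH readings are made up for this example (they are not the poster's data), and the p-value step is left out because it needs a t-distribution from a statistics package.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test of mean(a) - mean(b)."""
    na, nb = len(a), len(b)
    # Squared standard error of each group mean (sample variance / n).
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Made-up pH readings, for illustration only.
heat = [6.0, 6.2, 6.1, 6.4, 5.9, 6.3]
nutrient = [7.1, 7.5, 6.8, 7.9, 7.3, 7.0]
t, df = welch_t(heat, nutrient)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A negative t here means the first group has the lower mean. For a directional hypothesis, the one-sided p-value would be the lower tail of a t-distribution with `df` degrees of freedom.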

With kind regards

Each treatment had 30 samples, so that should be large enough. Are the assumptions for the Welch test a bit more relaxed then? I'm not sure how to justify using a t-test if the data don't meet the assumptions.

I'm not sure where I got the information on the Wilcoxon rank sum test, but I definitely got the impression that it compared the location of medians. What would be your summary, if you don't mind sharing?

Many thanks
what would be your summary if you don't mind sharing?
Not a statistician here, but the rank sum test is explained really nicely and easily in the Wikipedia entry (which is not necessarily the case for a number of statistics topics in Wikipedia). The test helps you infer whether the two distributions are different. Rightly or wrongly, I would usually give the medians and interquartile range of the data from each of the 2 arms, alongside the p value derived from the rank sum/Mann-Whitney U test. However, unless you can justify that the data follow a symmetric distribution, a single IQR width such as the one given in the original post would seem insufficient; the lower and upper quartiles say more.

Similarly, if it is central to that chapter of the thesis, maybe include a box and whisker plot of the 2 arms of the trial. It is neither uncommon nor wrong that the test statistic alone cannot convey the magnitude of the difference or the distribution of the data to the reader, so the result description often needs additional information that is not directly used to generate that statistic.

So, making the figures up:

... the pH in the intervention arm was overall lower, with a median of 6.62 (IQR 5.71 – 6.92) vs a control arm median of 7.34 (IQR 5.90 – 7.92), with a common language effect size of 63% (p = 0.023).
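
For anyone wanting to see where a common language effect size and a directional p value could come from, here is a minimal standard-library sketch. The function name `rank_sum_lower` is made up for this example, and it assumes the large-sample normal approximation with no tie or continuity correction, so a statistics package may report slightly different p values.

```python
from math import sqrt
from statistics import NormalDist

def rank_sum_lower(x, y):
    """One-sided Mann-Whitney U check of 'x tends to be lower than y'.

    Returns (u, prob_lower, p_one_sided). Uses the normal approximation
    without tie or continuity correction (an assumption of this sketch).
    """
    n1, n2 = len(x), len(y)
    # U for x: number of (x, y) pairs where x exceeds y, ties counted as half.
    u = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    # Common language effect size: share of pairs in which x is below y.
    prob_lower = 1 - u / (n1 * n2)
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    # A small U supports "x lower", so the one-sided p value is the lower tail.
    p_one_sided = NormalDist().cdf((u - mu) / sigma)
    return u, prob_lower, p_one_sided

# Made-up pH readings, for illustration only.
heat = [6.0, 6.2, 6.1, 6.4, 5.9, 6.3]
nutrient = [7.1, 7.5, 6.8, 7.9, 7.3, 7.0]
u, prob_lower, p = rank_sum_lower(heat, nutrient)
print(f"U = {u}, P(heat < nutrient) = {prob_lower:.2f}, one-sided p = {p:.4f}")
```

The direction is built into which tail is used: swapping the two arguments tests the opposite hypothesis.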

Very happy to be corrected by the experts though.