Sample size too big for Wilcoxon Signed Rank test in R — what to do?

Dear Community,

We are trying to perform a Wilcoxon signed-rank test in R but suspect it is not working because our dataset is too large. We have around 10,000 paired samples: the number of hospital appointments before and after an intervention. About half of the population have a difference of zero, i.e. the same number of appointments in the one-year period before and after the intervention.

The data is in a straightforward format, e.g.:
Patient PRE POST
1 3 9
2 7 6
3 2 1

We have run the following code in R:
test5 <- wilcox.test(mydata$PRE_followup, mydata$POST_followup, mu = 0,
                     alternative = "two.sided", paired = TRUE, conf.int = TRUE,
                     conf.level = 0.95, exact = FALSE, correct = FALSE)

Which gives us:
Wilcoxon signed rank test

data: mydata$PRE_followup and mydata$POST_followup
V = 34697374, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
0.5000322 0.9999830
sample estimates:

Because of the size of our sample we’re getting way too significant p-values. Is there a correction that can be done? Or an alternative test that our incessant googling has missed?

Thank you very much!


TS Contributor
What do you mean by "way too significant"? Statistical significance doesn't refer to anything in
the everyday sense ("remarkable, distinctive, exceptional, impressive, outstanding, prominent,
valuable, worthwhile, distinguished, eminent, great, illustrious, noble, notable, noteworthy,
preeminent, prestigious..."). It is just a statement about the question of
whether the difference between pre and post is exactly = 0.000000000000000000000000000
in the population from which your data were drawn.

With a huge sample size such as yours, even a small pre-post decrease (or increase)
will be sufficient to very clearly reject the hypothesis of a difference of exactly 0.000000000 in the
population.
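
To see this effect in isolation, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration (the Poisson rates, the average shift of 0.1 appointments, the helper name simulate_p); it is not the poster's data. It runs the same signed-rank test on the paired differences for a small and a large sample:

```r
set.seed(1)

# Hypothetical paired differences (POST - PRE) with a small average shift of 0.1.
# The rates 1.1 and 1.0 are made up for illustration.
simulate_p <- function(n) {
  d <- rpois(n, lambda = 1.1) - rpois(n, lambda = 1.0)
  # One-sample signed-rank test on the differences, equivalent to the paired test.
  wilcox.test(d, mu = 0, exact = FALSE)$p.value
}

p_small <- simulate_p(100)      # 100 pairs
p_large <- simulate_p(100000)   # 100,000 pairs, same underlying shift

c(n_100 = p_small, n_100000 = p_large)
```

With the same underlying shift in both runs, the p-value for the large sample should come out many orders of magnitude smaller; only the sample size differs.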

You could perform a paired t-test (as Dason suggested) and have a look at the 99% confidence
interval for the difference, in order to gain an impression of how tiny the standard error
of estimation is with such a huge sample.
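
A minimal sketch of that suggestion, assuming simulated counts in place of the real PRE_followup and POST_followup columns (the Poisson rates and the variable names pre, post, d, se are made up for illustration):

```r
set.seed(42)
n <- 10000

# Hypothetical stand-ins for mydata$PRE_followup and mydata$POST_followup.
pre  <- rpois(n, lambda = 3)
post <- rpois(n, lambda = 3.2)

# Paired t-test with a 99% confidence interval for the mean difference (post - pre).
res <- t.test(post, pre, paired = TRUE, conf.level = 0.99)

d  <- post - pre
se <- sd(d) / sqrt(n)   # standard error of the mean difference

res$estimate   # estimated mean difference
res$conf.int   # 99% confidence interval
se             # shrinks with sqrt(n), hence the narrow interval
```

The width of the interval, rather than the p-value, shows how precisely the mean difference is estimated; whether a difference of that size matters in practice is a separate, subject-matter question.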

With kind regards



Less is more. Stay pure. Stay poor.
It would be informative for you to post histograms of the number of appointments pre, post, and their difference. This may help us to understand your data.
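
Something along these lines would do it. This is only a sketch: the data frame below is simulated so that the code runs on its own, the column names are taken from your wilcox.test call, and DIFF, int_hist and prop_zero are names I made up.

```r
set.seed(7)
n <- 10000

# Hypothetical stand-in for your data frame; replace with your real mydata.
mydata <- data.frame(PRE_followup  = rpois(n, lambda = 3),
                     POST_followup = rpois(n, lambda = 3))
mydata$DIFF <- mydata$POST_followup - mydata$PRE_followup

# Histogram with one bar per integer value, since the data are counts.
int_hist <- function(x, title) {
  hist(x, breaks = seq(min(x) - 0.5, max(x) + 0.5, by = 1),
       main = title, xlab = "Appointments")
}

op <- par(mfrow = c(1, 3))   # three panels side by side
int_hist(mydata$PRE_followup,  "PRE")
int_hist(mydata$POST_followup, "POST")
int_hist(mydata$DIFF,          "POST - PRE")
par(op)                      # restore previous plot settings

# Share of patients with no change between the two periods.
prop_zero <- mean(mydata$DIFF == 0)
prop_zero
```

The simulated data will not reproduce your roughly 50% of zero differences; the point is only the layout of the three plots and the proportion of zeros.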

As @Karabiner mentioned, there is no such thing as too significant. You may hear the phrase "over-powered", but that is a good thing in most scenarios.

I don't follow the output. Is it stating that the median difference is 0.5 appointments? That seems weird. And does the CI for that run up to about 0.9999?