Which Non-parametric Significance Test Should I Use?

#1
Hi,

I've been working on a little project. Last year I delivered pizzas. I recorded data for 300 of my deliveries like tip amounts, order totals, and delivery times. I have a few hypotheses I would like to test. For example, after examining the data I've noticed that the average tip from orders over the phone is pretty substantially less than the average tip from orders over a device. But as you all know, I can't take the difference in the average at face value. I have to perform a significance test to see if it's statistically significant. The problems I have are as follows:

1. The sets of data are pretty skewed and don't pass normality tests, so I don't feel comfortable using a parametric test like Welch's two-sample t-test. Also, I can't log-transform my data because it contains values of 0 (when customers didn't tip).

2. The shapes of the distributions of the data sets I'm comparing aren't really similar (let alone identical), so I don't feel comfortable using the Mann-Whitney U test, Kruskal-Wallis test, or Mood's median test (because they assume identically shaped distributions).

I just found out about a non-parametric test called the randomization test (known by others as the permutation test or exact test) as mentioned here:

https://www.youtube.com/watch?v=BvdNZNl09eE

Unfortunately, this test doesn't seem to be incorporated into Minitab. (I may be mistaken on that; I hope I am.) And there isn't a lot of guidance on the internet about how to perform this test (which makes me worry that some have concerns about its validity).
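For readers following along: the randomization test in the video amounts to shuffling the pooled tips between the two groups many times and seeing how often a shuffled difference in means is at least as extreme as the observed one. A minimal Python sketch, using made-up tip data (the real 300 deliveries aren't shown here), could look like this:

```python
import numpy as np

rng = np.random.default_rng(42)

def permutation_test_mean_diff(group_a, group_b, n_resamples=10_000, rng=rng):
    """Two-sided randomization (permutation) test for a difference in means.

    Repeatedly shuffles the pooled data into two groups of the original
    sizes and counts how often the shuffled difference in means is at
    least as extreme as the observed one.
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = pooled[: len(a)].mean() - pooled[len(a):].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # the +1 correction keeps the estimated p-value away from exactly zero
    return (count + 1) / (n_resamples + 1)

# Hypothetical skewed tip data (dollars), zeros possible, not the OP's data:
phone_tips = rng.exponential(2.0, size=140)
device_tips = rng.exponential(3.0, size=160)
p = permutation_test_mean_diff(device_tips, phone_tips)
print(f"approximate permutation p-value: {p:.4f}")
```

The test makes no distributional assumptions beyond exchangeability under the null, which is why it suits skewed, zero-inflated data like tips.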

I would appreciate any and all solutions to these problems. I've been working on this project for quite a while and am a bit frustrated. Thank you for your help in advance!
 

Karabiner

TS Contributor
#2
I have a few hypotheses I would like to test. For example, after examining the data I've noticed
There's a fundamental problem here. If you examine data,
find some interesting pattern, and then apply a significance test,
the p-values are distorted. The significance test does
not take into account that, implicitly, dozens or hundreds of
possible associations were checked beforehand by eyeballing
the data. It is very difficult to distinguish between truly
significant results and chance results in that case.

1. The sets of data are pretty skewed and don't pass normality tests
It is not the data as a whole that have to be normally distributed; rather,
the data within each group should be sampled from a normally distributed
population. But with n = 300, this assumption is no longer important.
What's more important, IMHO, is whether the mean is a good representation
of the data in the case of extremely skewed distributions. Your idea to use
the U-test, median test, or H-test could be a good alternative. Why these
rank-based tests should require identical distributions, I don't know.
They are non-parametric tests, so distributional assumptions play no role, AFAICS.

With kind regards

K.
 
#3
Karabiner,

I really appreciate your reply. I'm a little confused about why finding a pattern and then checking it with a significance test isn't good. In my mind it shouldn't matter because organizing the tips according to whether someone ordered over the phone or through a device should produce the same results as going up to all 300 of my customers and saying "Hi, did you order over the phone or via a device? Have a great day." Am I making an incorrect assumption here?

So you're saying that, to use a t test, the distribution of the 300 orders has to be normally distributed rather than both the 160 device orders and 140 phone orders each being normally distributed?

I was unpleasantly surprised when I learned about the assumptions of those non-parametric tests. Here's a quote:

"...However, for a Mann-Whitney U test to be able to provide a valid test of the difference between two medians, both distributions must be the same shape..."
https://statistics.laerd.com/minitab-tutorials/mann-whitney-u-test-using-minitab.php

So I think what non-parametric means is that it's free from assuming any particular distribution, but whatever the two distributions are, their shape has to be the same. :/

Again, I really appreciate your help and ask for your continued guidance. What's the next step I should take or what would you do in my situation?

Thank you,
Joel
 

rogojel

TS Contributor
#4
hi,
just to explain the first point a bit more: if you have a collection of measurements, you will always see some patterns in it. Testing the significance of a pattern in the same dataset where you noticed it is a problem: of course the pattern will look unusual, but you picked it out of a huge number of other possible patterns, so in reality you performed a large number of tests before proceeding to the formal one. Multiple tests increase the chances of a false positive a lot, which is why your p-value will not mean anything.

E.g., looking at tips, you probably could have noticed that blondes tend to tip more, or that people living in houses painted yellow do, or those having a large dog, and so on. These are all possible patterns that were unconsciously tested and rejected - but this makes your one test part of a huge number of multiple tests and basically invalidates your p-value.

regards
 

Karabiner

TS Contributor
#5
So you're saying that, to use a t test, the distribution of the 300 orders has to be normally distributed rather than both the 160 device orders and 140 phone orders each being normally distributed?
The other way around. If you want to compare phone versus device,
then within each group the dependent variable should be sampled from
a normally distributed population. But with a total n = 300, normality
assumptions are not necessary.

"...However, for a Mann-Whitney U test to be able to provide a valid test of the difference between two medians, both distributions must be the same shape..."
If the U-test is used as a test between medians, then maybe the assumption
of equal shapes is necessary. But the U-test is not a genuine test for medians;
it just tests whether the values from one group tend to be higher than those in the
other group. For medians one can use the median test.
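The "tendency" Karabiner describes has a direct interpretation: U divided by the number of pairs estimates the probability that a random value from one group exceeds a random value from the other (ties counted as half). A small numpy sketch on made-up tip data (not the OP's):

```python
import numpy as np

def rank_sum_tendency(a, b):
    """Estimate P(random value from `a` > random value from `b`).

    This is U / (len(a) * len(b)), the quantity the Mann-Whitney U
    statistic measures, with ties counted as half.
    """
    a = np.asarray(a, dtype=float)[:, None]   # shape (n, 1)
    b = np.asarray(b, dtype=float)[None, :]   # shape (1, m)
    u = np.sum(a > b) + 0.5 * np.sum(a == b)  # count pairwise wins and half-ties
    return u / (a.size * b.size)

# Hypothetical tips: device orders tend to run higher than phone orders
rng = np.random.default_rng(1)
device = rng.exponential(3.0, size=160)
phone = rng.exponential(2.0, size=140)
print(f"estimated P(device tip > phone tip): {rank_sum_tendency(device, phone):.2f}")
```

A value of 0.5 means neither group tends to be higher; values near 0 or 1 indicate a strong tendency, regardless of whether the two distributions share a shape.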

With kind regards

K.
 

gianmarco

TS Contributor
#6
Just to supplement Karabiner's points. Indeed, the MW test (as formulated in the original 1940s paper) was not 'designed' to be a test of medians, but rather a test to assess whether there is a tendency for the values of one group to score higher than the values of the second group (or vice versa). So, if you are happy with the idea of testing that, you can use the MW test even if the two distributions do not have the same shape. And you could also use some measure of effect size to gauge whether the difference is small, medium, or large. You can find some info (in the context of the use of R) on my website's page: http://cainarchaeology.weebly.com/r-function-for-visually-displaying-mann-whitney-test.html.
Should you wish to test for a difference in central tendency, the assumption about the similarity in shape of the distributions must hold.

Hope this helps,
gm