Thread: Which Non-parametric Significance Test Should I Use?

  #1

    Hi,

    I've been working on a little project. Last year I delivered pizzas, and I recorded data for 300 of my deliveries, such as tip amounts, order totals, and delivery times. I have a few hypotheses I would like to test. For example, after examining the data I've noticed that the average tip on orders placed over the phone is substantially lower than the average tip on orders placed through a device. But as you all know, I can't take the difference in the averages at face value; I have to perform a significance test to see whether it's statistically significant. The problems I have are as follows:

    1. The data sets are pretty skewed and don't pass normality tests, so I don't feel comfortable using a parametric test like Welch's two-sample t test. Also, I can't transform my data (e.g. by taking logs) because it contains values of 0 (when customers didn't tip).

    2. The shapes of the distributions of the data sets I'm comparing aren't very similar (let alone identical), so I don't feel comfortable using the Mann-Whitney U test, the Kruskal-Wallis test, or Mood's median test (because they assume identically shaped distributions).

    I just found out about a non-parametric test called the randomization test (also known as the permutation test or exact test), as mentioned here:

    https://www.youtube.com/watch?v=BvdNZNl09eE

    Unfortunately, this test doesn't seem to be built into Minitab. (I may be mistaken on that; I hope I am.) And there isn't a lot of guidance on the internet about how to perform this test, which makes me worry that some people have concerns about its validity.

    I would appreciate any and all solutions to these problems. I've been working on this project for quite a while and am a bit frustrated. Thank you for your help in advance!
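    For readers wondering how the randomization (permutation) test mentioned above works, here is a minimal sketch, assuming the goal is a two-sided test of the difference in mean tips between two groups. The function name, the 10,000 resamples and the fixed seed are illustrative choices (not anything Minitab provides), and the tip values in the example are made up. Note that the null hypothesis of this test is that the group labels are exchangeable, i.e. that both groups come from the same distribution.

```python
import random
from statistics import mean


def permutation_test_mean_diff(x, y, n_resamples=10_000, seed=0):
    """Two-sided permutation (randomization) test for a difference in means.

    Hypothetical helper. Returns the observed difference mean(x) - mean(y)
    and a Monte Carlo p-value from n_resamples random relabelings.
    """
    rng = random.Random(seed)          # fixed seed so results are reproducible
    observed = mean(x) - mean(y)       # statistics.mean sums exactly
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)            # randomly reassign the group labels
        diff = mean(pooled[:n_x]) - mean(pooled[n_x:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction: count the observed labeling as one of the resamples,
    # so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


if __name__ == "__main__":
    # Made-up tip amounts for illustration only (not the thread's data).
    phone = [0, 0, 1, 2, 2, 3, 3, 4, 5, 0, 2, 3]
    device = [2, 3, 3, 4, 5, 5, 6, 4, 3, 7, 0, 5]
    diff, p = permutation_test_mean_diff(phone, device)
    print(f"difference in means = {diff:.2f}, p = {p:.4f}")
```

    The same loop works for any statistic (e.g. a difference in medians) by swapping out `mean`.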

  #2 Karabiner (TS Contributor)

    "I have a few hypotheses I would like to test. For example, after examining the data I've noticed..."
    There's a fundamental problem here. If you examine data, find some
    interesting pattern and then apply a significance test, the p-values
    are distorted. The significance test does not take into account that
    dozens or hundreds of possible associations have already been checked
    implicitly, by eyeballing the data. In that case it is very difficult
    to distinguish truly significant results from chance results.

    "1. The sets of data are pretty skewed and don't pass normality tests"
    It is not the data as a whole that have to be normally distributed;
    rather, the data within each group should be sampled from a normally
    distributed population. But with n=300, this assumption is no longer
    important, because the sampling distribution of the mean is then
    approximately normal anyway (central limit theorem). What's more
    important, IMHO, is whether the mean is a good representation of the
    data in the case of extremely skewed distributions. Your idea to use the
    U-test, the median test or the H-test could be a good alternative. Why
    these rank-based tests should require identical distributions, I don't
    know. They are non-parametric tests, so distributional assumptions play
    no role, AFAICS.

    With kind regards

    K.

  The following user says thank you to Karabiner for this useful post: 13gentj (06-08-2016)

  #3

    Karabiner,

    I really appreciate your reply. I'm a little confused about why finding a pattern and then checking it with a significance test isn't good. In my mind it shouldn't matter because organizing the tips according to whether someone ordered over the phone or through a device should produce the same results as going up to all 300 of my customers and saying "Hi, did you order over the phone or via a device? Have a great day." Am I making an incorrect assumption here?

    So you're saying that, to use a t test, the distribution of the 300 orders has to be normally distributed rather than both the 160 device orders and 140 phone orders each being normally distributed?

    I was unpleasantly surprised when I learned about the assumptions of those non-parametric tests. Here's a quote:

    "...However, for a Mann-Whitney U test to be able to provide a valid test of the difference between two medians, both distributions must be the same shape..."
    https://statistics.laerd.com/minitab...ng-minitab.php

    So I think what non-parametric means is that it's free from assuming any particular distribution, but whatever the two distributions are, their shapes have to be the same. :/

    Again, I really appreciate your help and ask for your continued guidance. What's the next step I should take or what would you do in my situation?

    Thank you,
    Joel

  #4 rogojel (TS Contributor)

    hi,
    just to explain the first point a bit more: if you have a collection of measurements, you will always see some patterns in it. Testing the significance of a pattern in the same dataset where you noticed it is a problem: of course the pattern will look unusual, but you picked it out of a huge number of other possible patterns, so in reality you performed a large number of informal tests before proceeding to the formal one. Multiple tests increase the chance of a false positive a lot, which is why your p value will not mean anything.

    E.g. looking at tips, you probably could have noticed that blondes tend to tip more, or people living in houses painted yellow, or those with a large dog, and so on. These are all possible patterns that were unconsciously tested and rejected, which makes your one test part of a huge number of multiple tests and basically invalidates your p value.
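    To put a rough number on this, here is a small sketch assuming k independent tests, each at level alpha = 0.05, with every null hypothesis true. Patterns spotted by eyeballing are not really independent formal tests, so treat this as an idealized illustration; both function names are hypothetical.

```python
import random


def familywise_error_rate(alpha, k):
    """P(at least one false positive) among k independent tests at level
    alpha, when every null hypothesis is true (idealized assumption)."""
    return 1 - (1 - alpha) ** k


def simulate_familywise_error_rate(alpha, k, n_trials=20_000, seed=1):
    """Monte Carlo check: under a true null, each p-value is Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        p_values = [rng.random() for _ in range(k)]
        if min(p_values) < alpha:      # at least one pattern "found" by chance
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    for k in (1, 5, 20, 100):
        print(k, round(familywise_error_rate(0.05, k), 3),
              round(simulate_familywise_error_rate(0.05, k), 3))
```

    Under these assumptions, after 20 unplanned looks at the data the chance of at least one "significant" pattern is about 64%, not 5%.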

    regards

  #5 Karabiner (TS Contributor)

    "So you're saying that, to use a t test, the distribution of the 300 orders has to be normally distributed rather than both the 160 device orders and 140 phone orders each being normally distributed?"
    The other way around. If you want to compare phone versus device,
    then within each group the dependent variable should be sampled from
    a normally distributed population. But with a total n=300, the
    normality assumption is not necessary.
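    As an illustration of the large-sample point, here is a minimal sketch of Welch's two-sample t test using only Python's standard library. Because the standard library has no t distribution, the sketch assumes large groups (like the 160 device and 140 phone orders here) and takes the p-value from a normal approximation; for exact t-based p-values, scipy.stats.ttest_ind(x, y, equal_var=False) is the usual tool. The function name is hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def welch_test_normal_approx(x, y):
    """Welch's two-sample t statistic with a large-sample normal p-value.

    Returns (t, df, p): the Welch t statistic, the Welch-Satterthwaite
    degrees of freedom, and a two-sided p-value from N(0, 1).
    """
    nx, ny = len(x), len(y)
    vx = variance(x) / nx            # squared standard error of mean(x)
    vy = variance(y) / ny            # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    # Large-sample approximation: with df in the hundreds, the t distribution
    # is very close to N(0, 1), so the normal tail is used instead.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p
```

    With small groups the normal tail understates the p-value, so this shortcut is only reasonable when df is large.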

    "...However, for a Mann-Whitney U test to be able to provide a valid test of the difference between two medians, both distributions must be the same shape..."
    If the U-test is used as a test of medians, then the assumption of
    equal shapes may indeed be necessary. But the U-test is not a genuine
    test of medians; it just tests whether the values from one group tend
    to be higher than those in the other group. For medians, one can use
    the median test.
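    For reference, a minimal sketch of Mood's median test for two groups, assuming the common textbook form: count the values above the pooled median in each group and run a 2x2 chi-square test on 1 degree of freedom, without continuity correction. Packages differ in how they treat values equal to the median and in whether they apply a correction, so results may not match Minitab exactly; the function name is hypothetical.

```python
import math
from statistics import median


def mood_median_test(x, y):
    """Two-group Mood's median test (chi-square, no continuity correction).

    Values equal to the grand median are counted as "at or below" it.
    Returns (grand_median, chi2, p).
    """
    grand_median = median(list(x) + list(y))
    a = sum(v > grand_median for v in x)   # x above the grand median
    b = len(x) - a                         # x at or below it
    c = sum(v > grand_median for v in y)   # y above
    d = len(y) - c                         # y at or below
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("degenerate table: all values on one side of the median")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Chi-square survival function with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return grand_median, chi2, p
```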

    With kind regards

    K.

  #6 gianmarco (TS Contributor)

    Just to supplement Karabiner's points: indeed, the MW test (as formulated in the original 1940s paper) was not 'designed' to be a test of medians, but rather a test of whether the values of one group tend to score higher than the values of the second group (or vice versa). So, if you are happy with testing that, you can use the MW test even if the two distributions do not have the same shape. You could also use some measure of effect size to gauge whether the difference is small, medium, or large. You can find some info (in the context of R) on my website's page: http://cainarchaeology.weebly.com/r-...tney-test.html.
    Should you wish to test for a difference in central tendency, the assumption about the similarity in shape of the distributions must hold.
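    To make "a tendency for the values of one group to score higher" concrete: the U statistic can be computed by comparing every pair of observations. This sketch does that directly (len(x) * len(y) comparisons, fine for a few hundred deliveries) and derives two common effect sizes from it, the probability of superiority and the rank-biserial correlation. These are not necessarily the effect-size measure used on the linked page, the function is hypothetical, and it returns no p-value (scipy.stats.mannwhitneyu computes one).

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U for x versus y, plus two pairwise effect sizes.

    U counts the (x, y) pairs in which the x value is higher, with ties
    counting one half. Dividing by the number of pairs gives the probability
    of superiority, P(X > Y) + 0.5 * P(X = Y); rescaling it to [-1, 1]
    gives the rank-biserial correlation.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    prob_superiority = u / (len(x) * len(y))
    rank_biserial = 2 * prob_superiority - 1
    return u, prob_superiority, rank_biserial
```

    A probability of superiority of 0.5 (rank-biserial 0) means neither group tends to score higher; values near 1 or 0 mean one group almost always does.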

    Hope this helps,
    gm
    http://cainarchaeology.weebly.com/
