
Thread: Median with "tied values"

  1. #16
    noetsi

    I will send you a PM today rather than steal this thread. What the goal is in the real world differs from what it is in academia....
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  2. #17
    SiBorg

    @Noetsi

    I'm not sure I follow your argument here. At the end of the day, the median is just a summary statistic. In other words we are trying to use one (or a few) numbers to describe a data set. When there are a lot of tied values, however, the median, as it is traditionally calculated, falls short of this.

    The example I was struggling with in my previous post was this. Students rated a video and a practical demonstration from 1 to 5 on a questionnaire. One set of results was the following:

    Group 1 shown video

    3 3 4 2 4 4 3 4 2 3 4 3 4 4 2 3 4 3 4 4 4 3 4 4 3

    Group 2 given presentation

    4 4 3 5 4 4 4 4 4 5 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 4 4 4 4 5 5 5 5 5 4

    Now, anyone looking at this dataset would immediately say the presentation did better (got higher scores). And if you run a Mann-Whitney test you get p<0.0001, which would support this.

    But the median of both these datasets is 4. Therefore, in this particular example, the median (as it is normally calculated) does a poor job of summarising the data. It fails because of the large number of tied values in the middle. It just isn't a good summary statistic for this data.
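    As a quick check of the point above, here is a minimal Python sketch that tallies the two sets of ratings and computes the ordinary median of each. The variable names are made up for illustration; the data are copied from the post.

```python
from collections import Counter
from statistics import median

# Ratings copied from the post above (1-5 scale).
video = [3, 3, 4, 2, 4, 4, 3, 4, 2, 3, 4, 3, 4, 4, 2, 3, 4, 3, 4, 4, 4, 3, 4, 4, 3]
presentation = [4, 4, 3, 5, 4, 4, 4, 4, 4, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
                5, 5, 4, 4, 4, 4, 5, 5, 5, 5, 5, 4]

for name, scores in (("video", video), ("presentation", presentation)):
    # Frequency table, sorted by rating, so the pile-up of tied values is visible.
    counts = sorted(Counter(scores).items())
    print(name, "n =", len(scores), "counts =", counts, "median =", median(scores))
```

    Both medians come out as 4 even though the frequency tables are clearly different, which is the problem being described.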

    So what do you do? You could take the mean of the data, since the mean is not affected by tied values in the middle, but this is technically 'wrong' in this case, as Likert data is ordinal and the mean shouldn't be used.

    So, for my purposes, bootstrapping worked well because it told me what the mean of many medians from the samples would be. Comparing the mean of medians (or, alternatively put, median + bias) was much better for my data as it gave me the following:

    Group 1 3.58
    Group 2 4.03

    This, to me, summarises the data much better.
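    For anyone wanting to try this, below is a rough sketch of the bootstrap described above: resample each group with replacement, take the median of each resample, and average those medians. The function name, the number of resamples and the seed are my own assumptions, not the settings used for the figures quoted, so the output need not match them exactly.

```python
import random
from statistics import mean, median

def bootstrap_mean_of_medians(scores, n_resamples=5000, seed=0):
    """Average the medians of n_resamples bootstrap resamples of scores."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    medians = [
        median(rng.choices(scores, k=len(scores)))  # resample with replacement
        for _ in range(n_resamples)
    ]
    return mean(medians)

# Ratings copied from the post above (1-5 scale).
video = [3, 3, 4, 2, 4, 4, 3, 4, 2, 3, 4, 3, 4, 4, 2, 3, 4, 3, 4, 4, 4, 3, 4, 4, 3]
presentation = [4, 4, 3, 5, 4, 4, 4, 4, 4, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
                5, 5, 4, 4, 4, 4, 5, 5, 5, 5, 5, 4]

for name, scores in (("video", video), ("presentation", presentation)):
    boot = bootstrap_mean_of_medians(scores)
    # The bootstrap bias estimate is the mean of the resampled medians
    # minus the median of the original sample.
    print(name, "median =", median(scores),
          "mean of bootstrap medians =", round(boot, 2),
          "bias estimate =", round(boot - median(scores), 2))
```

    The bias is large for a group only when the resampled median often lands on a neighbouring category, which is exactly the situation where many tied values sit around the middle.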

    I guess the confusion arises when we say 'is there a better way to calculate the median'. In a sense, I don't have the median anymore, I have a new statistic (median + bias for want of a better term). But for me, median + bias does a better job of summarising the data than does the median (or mean).

    I guess maybe the OP just worded it badly when he said 'The median is 3.7'. Maybe if it was given a different name, such as the 'interpolated median', it might offend a little less.
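    Since the name 'interpolated median' has come up, here is a sketch of one common definition, assuming each rating k stands for the interval from k - 0.5 to k + 0.5 and that responses are spread evenly within it. This is an assumption on my part about what is meant; it is not necessarily how the OP arrived at 3.7, and the function name is hypothetical.

```python
from statistics import median

def interpolated_median(scores, width=1.0):
    """Interpolated median for ordinal scores, treating each score as an interval."""
    m = median(scores)
    n_below = sum(1 for s in scores if s < m)   # responses strictly below the median
    n_equal = sum(1 for s in scores if s == m)  # responses tied at the median
    if n_equal == 0:
        # Median fell between two categories (even n); nothing to interpolate.
        return float(m)
    lower = m - width / 2  # lower edge of the median category
    return lower + width * (len(scores) / 2 - n_below) / n_equal

# Ratings copied from the post above (1-5 scale).
video = [3, 3, 4, 2, 4, 4, 3, 4, 2, 3, 4, 3, 4, 4, 2, 3, 4, 3, 4, 4, 4, 3, 4, 4, 3]
presentation = [4, 4, 3, 5, 4, 4, 4, 4, 4, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
                5, 5, 4, 4, 4, 4, 5, 5, 5, 5, 5, 4]

print("video", round(interpolated_median(video), 2))
print("presentation", round(interpolated_median(presentation), 2))
```

    Python's standard library also has `statistics.median_grouped`, which applies the same grouped-data idea and should give the same figures for data like this.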
    Last edited by SiBorg; 11-19-2011 at 07:02 PM.

  3. #18
    noetsi

    The point I was making was practical (as is nearly always the case). It was that whatever the basic stats texts say (and they use the traditional definition of the median) is reality to virtually all data analysts outside academia (about 99.99999 percent, I would guess, plus or minus .0001), and thus to policy makers, even if that is not accurate.

    In statistics and math, reality is absolute. In most other fields it's socially defined, and the common definitions are what get used.

    Next we move into existentialism or logical positivism....

  4. #19
    SiBorg

    I can see the point that you are making and I would agree that in many fields in mathematics there are absolute rules. But you can't say that for statistics. As soon as I was taught at school that there were at least 3 ways of measuring the average (mean, median and mode) I developed an instant dislike for stats since there was no longer an 'answer' to the question. I therefore went on to study maths with mechanics instead where there was always 'an answer'.

    It is only now that I've had to learn to grasp stats for my job, and that I've got older and learned that life is full of nuances, that I've finally come to appreciate that there isn't always a right answer, and that is where some of the beauty of statistics lies.

    Since statistics is about choice of what test to perform, and you can perform any test you like if you are trying to convince policy makers, I can't see how you can argue for a 'reality' when it comes to statistics. This isn't quantum physics, this is a subjective field. There may be absolute rules for generating the results of each individual test, but each test has a set of assumptions and the choice of test is subjective and can only be guided by rules.

    I am interested in presenting my data in the fairest way possible and often go to great lengths to try to find what I feel is the best way to summarise my data. But ultimately you can perform any test you like. I was going to take the mean originally until someone asked me to find the standard error of the median. The only method I could find was the bootstrapping one, and when I implemented it, this interesting thing popped up which was the bias. So, I had stumbled across a better way of analysing my data and implemented it.

    I have shared it here for anyone else who might have the same issues that I had. But, ultimately, for anyone who doesn't care and wants to follow the standard statistics texts, there are loads of them out there and that's entirely their choice.

    This forum, on the other hand, is a great place for those really interested in what their data does (or does not) show. Statistics will always be a subjective field and that is why it is so open to misuse and misinterpretation.
    Last edited by SiBorg; 11-20-2011 at 07:22 AM.

  5. #20
    Jake

    Originally posted by SiBorg77:
    You could take the mean of the data since the mean is not affected by tied values in the middle but this is technically 'wrong' in this case as Likert data is ordinal and the mean shouldn't be used.
    I've heard this kind of sentiment before and never really understood it. Why do you view the mean as being "technically wrong" in this case, and what is it about the median that fixes this problem?
    “In God we trust. All others must bring data.”
    ~W. Edwards Deming

  6. #21
    SiBorg

    The median and nonparametric statistical tests (such as the Mann-Whitney U) rely only on ranking the data, so they can cope with ordinal data. The problem with the mean (and its associated statistics) is that it assumes, for example, that the difference between 1 and 2 on a Likert scale is equal to the difference between 4 and 5, and we can't be sure of this.
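    To make the ranking point concrete, here is a small sketch using the two groups from earlier in the thread. It computes the Mann-Whitney U statistic by direct pair counting (ties count one half) and then relabels the five categories with a made-up, order-preserving but unevenly spaced scale. The `stretch` mapping and the function name are purely hypothetical illustrations.

```python
from statistics import mean, median

def mann_whitney_u(xs, ys):
    """U for ys versus xs: each pair with y > x counts 1, each tie counts 0.5."""
    return sum(1.0 if y > x else 0.5 if y == x else 0.0 for x in xs for y in ys)

# Ratings copied from earlier in the thread (1-5 scale).
video = [3, 3, 4, 2, 4, 4, 3, 4, 2, 3, 4, 3, 4, 4, 2, 3, 4, 3, 4, 4, 4, 3, 4, 4, 3]
presentation = [4, 4, 3, 5, 4, 4, 4, 4, 4, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
                5, 5, 4, 4, 4, 4, 5, 5, 5, 5, 5, 4]

# Hypothetical relabelling: same order of categories, different spacing.
stretch = {1: 1, 2: 2, 3: 4, 4: 8, 5: 16}
video_s = [stretch[v] for v in video]
presentation_s = [stretch[v] for v in presentation]

# Rank-based quantities only depend on the order, so they carry over unchanged.
print("U original: ", mann_whitney_u(video, presentation))
print("U relabelled:", mann_whitney_u(video_s, presentation_s))
print("median maps through:", median(video_s) == stretch[median(video)])

# The difference in means depends on the spacing chosen for the labels.
print("mean difference original: ", round(mean(presentation) - mean(video), 2))
print("mean difference relabelled:", round(mean(presentation_s) - mean(video_s), 2))
```

    U is identical under both labellings, while the difference in means changes with the spacing, which is the assumption being questioned for Likert data.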

  7. #22
    noetsi

    I think statisticians believe there is an absolute reality (some even here call it truth); they just disagree among themselves about what it is. In many other academic fields, say political science, and in nearly all practical operations, this is a meaningless concept, because reality is not absolute: one's view of right and wrong, wise and unwise, and what benefits you defines it.
