When to use the median and when the mean?

#1
Good morning,

I have an assignment in which I need to specify, with examples, the situations in which the mean is the best measure of central tendency and those in which the median is (I also need the mode, but I know that one).

I read that the median is better when there is an outlier, and that makes sense: if I look at people's salaries and happen to sample Bill Gates, I will get a huge mean. By this logic, why and when should I use the mean? Isn't it better to use the median all the time? I mean, if there are no outliers, why use the mean?
 

tokai

New Member
#2
The median is a better measure of central tendency when there are outliers in the data. The mean is vulnerable to outliers; that is, it can be pulled in the direction of the outlier. So for your income example, imagine you have a sample of 50 individuals whose yearly salaries are reported. If 49 people have a yearly salary between $50,000 and $60,000 and Bill Gates (who happens to be sampled) has a yearly salary of $1.1 billion, your mean is going to be pulled heavily upward by the outlying salary. The median therefore gives you a more reliable measure of central tendency, because it is barely affected by a single outlier.

hope this clears things up.
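
To put rough numbers on the salary example, here is a minimal Python sketch. The 49 "ordinary" salaries are made up (drawn uniformly between $50,000 and $60,000 with a fixed seed), so only the orders of magnitude matter, not the exact figures.

```python
import random
from statistics import mean, median

random.seed(0)  # fixed seed so the made-up salaries are reproducible

# Hypothetical data: 49 yearly salaries between $50,000 and $60,000
salaries = [random.uniform(50_000, 60_000) for _ in range(49)]

# The same sample with one extreme salary of $1.1 billion added
with_outlier = salaries + [1_100_000_000]

print("without outlier: mean =", round(mean(salaries)),
      "median =", round(median(salaries)))
print("with outlier:    mean =", round(mean(with_outlier)),
      "median =", round(median(with_outlier)))
```

The mean jumps into the tens of millions, while the median stays inside the $50,000 to $60,000 range.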
 

Dason

Ambassador to the humans
#3
I think it is because, unless the population distribution is perfectly symmetrical or its size is reasonably large, the median cannot indicate the center point. Consider the small asymmetries in a small population as micro-outliers that can keep the median from working perfectly, unless the number of micro-outliers with positive and negative effects on the location of the median gets very high (so that they neutralize each other) or drops to zero. However, those micro-outliers don't affect the mean, because in calculating the mean all of the values are actually summed up, whereas in calculating the median we only check which value is in the middle.
You don't define what you mean by micro-outliers, but your whole post just feels wrong. The median, as tokai points out, is better when there are outliers because the outliers don't affect it as much.
 

noetsi

Fortran must die
#4
If your data are affected by non-normality (skew, outliers, etc.), the median is commonly a better measure of central tendency. But there are better options than that: winsorized means are commonly suggested, as is simply transforming your data to deal with skew, outliers, etc.
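
For readers who have not met a winsorized mean, here is a rough sketch of one simple variant. The function name `winsorized_mean`, the parameter `k`, and the data are made up for illustration; statistical packages typically winsorize by a proportion rather than a count.

```python
from statistics import mean

def winsorized_mean(values, k=1):
    """Mean after pulling the k smallest and k largest values in to
    the nearest value that is kept (a simple count-based winsorization)."""
    data = sorted(values)
    n = len(data)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= 2*k < len(values)")
    low, high = data[k], data[n - k - 1]
    # Clip every value into the interval [low, high]
    clipped = [min(max(x, low), high) for x in data]
    return mean(clipped)

data = [12, 14, 15, 15, 16, 18, 250]  # hypothetical sample with one outlier

print("plain mean:     ", mean(data))
print("winsorized mean:", winsorized_mean(data, k=1))
```

Unlike trimming, winsorizing keeps the sample size the same: the extreme values are replaced rather than dropped.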
 
#5
thanks guys.

So yes, the median is better when the data is skewed or has outliers, but when do I use the mean then? If the data is symmetric without outliers, the median and mean are almost equal, aren't they?

When do I use the mean, and why not the median?
 

Rhodo

New Member
#6
We also use the mean because of the following property: if it is subtracted from every number in the set and the differences are squared and summed, the result is the smallest sum of squares that any single reference value can give. This is crucial for the calculation of the variance and standard deviation.
 

bryangoodrich

Probably A Mammal
#7
You could technically take each value's squared distance from the median and operate on that value. What meaning or use it has, maybe smarter people than myself will know! But the mean has nice properties, no doubt.
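
The comparison is easy to try numerically. A small sketch with made-up data (the helper name `sum_sq_about` is hypothetical):

```python
from statistics import mean, median

def sum_sq_about(values, center):
    """Sum of squared distances of the values from a chosen center."""
    return sum((x - center) ** 2 for x in values)

data = [2, 3, 5, 8, 22]  # hypothetical, right-skewed sample

ss_mean = sum_sq_about(data, mean(data))
ss_median = sum_sq_about(data, median(data))

print("sum of squares about the mean:  ", ss_mean)
print("sum of squares about the median:", ss_median)
```

For this sample the sum about the median comes out larger than the sum about the mean.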
 

Rhodo

New Member
#8
I thought about that too, but I'm also not sure there would be a point. I don't think I know enough at this point to really speculate; perhaps someone else could!
 

Dason

Ambassador to the humans
#10
I'm not sure I agree that the mean relies on an assumption of normality. There are many cases where using the mean is better than using the median and the data isn't normal.
 

gianmarco

TS Contributor
#11
Dason,
I was relying on what I read in a book (author R. Wilcox). I am here to widen my knowledge and to compare my views with those of others.
Thanks for providing fuel for further speculations.

Gm
 

Jake

Cookie Scientist
#12
I lean toward Dason's view. It's not obvious at all why taking a mean implies normality. Although I can't formally prove it, it seems intuitively the case that the mean should be an efficient estimator for any symmetrical distribution.
 
#14
thanks everyone, the discussion is interesting.

So I understand from you that if I compute the deviations x - mean and x - median, square them, sum them and divide by n, I will get a smaller number for the mean?

thanks again
 

BGM

TS Contributor
#16
If the population has a symmetric distribution and the population mean exists, then the mean equals the population median, and both the sample mean and the sample median are consistent estimators.

In that situation, one advantage of the sample mean over the sample median is that it is often a more efficient estimator; for normally distributed data, for example, it has the smaller sampling variance. Of course, the sample median is more robust, as mentioned by many people above.
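
One way to see what efficiency looks like in practice is a small simulation. This is only a sketch under assumed settings: standard normal data, and a sample size `n` and repetition count `reps` chosen arbitrarily. It draws many samples, records the sample mean and sample median of each, and compares how much the two estimates vary from sample to sample.

```python
import random
from statistics import mean, median, pvariance

random.seed(42)  # fixed seed for reproducibility

n = 25       # size of each sample (arbitrary choice)
reps = 5000  # number of simulated samples (arbitrary choice)

sample_means = []
sample_medians = []
for _ in range(reps):
    # Standard normal data: the true mean and true median are both 0
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    sample_means.append(mean(sample))
    sample_medians.append(median(sample))

var_mean = pvariance(sample_means)
var_median = pvariance(sample_medians)

print("variance of the sample means:  ", round(var_mean, 4))
print("variance of the sample medians:", round(var_median, 4))
```

Both estimators centre on 0, but the sample medians scatter more widely than the sample means here. With heavy-tailed data the ranking can reverse, so this illustrates the normal case only.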
 

Dason

Ambassador to the humans
#17
Yes - one property that the mean has is that it gives the smallest sum of squared deviations. The proof isn't difficult.
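
For reference, a sketch of the standard argument. The notation is assumed here and is not from the thread: $x_1, \dots, x_n$ are the data, $\bar{x}$ is their mean, and $c$ is any other candidate centre (the median, for instance).

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - c)^2
  &= \sum_{i=1}^{n} \bigl( (x_i - \bar{x}) + (\bar{x} - c) \bigr)^2 \\
  &= \sum_{i=1}^{n} (x_i - \bar{x})^2
     + 2(\bar{x} - c) \sum_{i=1}^{n} (x_i - \bar{x})
     + n(\bar{x} - c)^2 \\
  &= \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - c)^2
  \;\ge\; \sum_{i=1}^{n} (x_i - \bar{x})^2 .
\end{align*}
```

The middle term vanishes because the deviations from the mean sum to zero. Equality holds only when $c = \bar{x}$, so any other centre, including the median whenever it differs from the mean, gives a strictly larger sum; dividing by $n$ does not change the comparison.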
 

Rhodo

New Member
#18
When you say the sample mean is a more efficient estimator, what exactly do you mean?

 

Dason

Ambassador to the humans
#20
Ah. See most people talk about the median being more robust to outliers because we don't typically care if the stuff in the middle is shifted slightly. But consider the following case:

Dataset 1: 1, 2, 3, 4, 5
Dataset 2: 1, 2, 3, 4, 10000000

The median in both cases is 3. The mean in the first case is 3 and the mean in the second case is... 2000002. Quite a difference. That one outlier significantly changes the mean but the median isn't changed in this case.
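
The same point can be checked in a few lines of Python. The sketch below keeps the first four values fixed and lets the last value grow; the intermediate values in `last_values` are made up, and only 5 and 10000000 come from the two datasets above.

```python
from statistics import mean, median

base = [1, 2, 3, 4]
last_values = [5, 100, 10_000, 10_000_000]  # 5 and 10_000_000 are from the post

results = []
for last in last_values:
    data = base + [last]
    results.append((last, median(data), mean(data)))

for last, med, avg in results:
    print(f"last value = {last:>10}: median = {med}, mean = {avg}")
```

The median stays at 3 on every row, while the mean follows the last value upward.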