# When to use the median and when the mean?

#### Yankel

##### New Member
Good morning,

I have an assignment where I need to specify, with examples, in which situations the mean is the best measure of central tendency and in which the median is (I also need the mode, but I know that one).

I read that when there is an outlier the median is better, and that makes sense: if I look at people's salaries and happen to sample Bill Gates, I will get a huge mean. By that logic, why and when should I use the mean? Isn't it better to use the median all the time? I mean, if there are no outliers, why use the mean?

#### tokai

##### New Member
The median is a better measure of central tendency when there are outliers in the data. The mean is vulnerable to outliers, that is to say, it gets pulled in the direction of the outlier. So for your income example, imagine a sample of 50 individuals who report their yearly salary. If 49 people have a yearly salary between \$50,000 and \$60,000 and Bill Gates (who happens to be sampled) has a yearly salary of 1.1 billion, your mean is going to be dragged far upwards by that one outlying salary. The median will give you a more reliable measure of central tendency because it is barely affected by the outlier.
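To put numbers on the salary example, here is a minimal sketch in Python. The 49 salaries are made-up values spaced evenly from 50,000 upwards (an assumption purely for illustration), with one 1.1 billion salary added.

```python
from statistics import mean, median

# Hypothetical data: 49 salaries from 50,000 to 59,600 in steps of 200
salaries = [50_000 + 200 * i for i in range(49)]

# Same sample with one extreme earner added (the 1.1 billion outlier)
with_outlier = salaries + [1_100_000_000]

print("without outlier:", mean(salaries), median(salaries))
print("with outlier:   ", mean(with_outlier), median(with_outlier))
```

Without the outlier the mean and median coincide. With it, the mean jumps to roughly 22 million while the median moves by only 100.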

hope this clears things up.

#### Dason

> I think because unless our population distribution is perfectly symmetrical, or its size is reasonably large, the median cannot indicate the center point. Consider the small asymmetries in a small population as micro-outliers which can keep the median from working perfectly, unless the number of those micro-outliers with positive and negative effects on the location of the median gets very high (neutralizing each other) or goes to zero. However, those micro-outliers don't affect the mean, because in calculating the mean all of the values are actually summed up, whereas in calculating the median we only check which value is in the middle.

You don't define what you mean by micro-outliers, but your whole post just feels wrong. The median, as tokai points out, is better when there are outliers because the outliers don't affect it as much.

#### noetsi

##### Fortran must die
If your data is affected by non-normality (be that skew, outliers, etc.), the median is commonly a better measure of central tendency than the mean. But there are better options still: winsorized means are commonly suggested, as is simply transforming your data to deal with the skew or outliers.
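For anyone who hasn't met a winsorized mean, here is a rough sketch of one common version: the `k` smallest and `k` largest values are replaced by their nearest remaining neighbours before averaging. The helper name `winsorized_mean` and the sample data are made up for illustration, and real analyses usually pick the trimming amount as a percentage rather than a count.

```python
from statistics import mean

def winsorized_mean(values, k=1):
    """Mean after clamping the k lowest and k highest values
    to their nearest remaining neighbours (illustrative helper)."""
    xs = sorted(values)
    if 2 * k >= len(xs):
        raise ValueError("k is too large for this sample")
    if k > 0:
        xs[:k] = [xs[k]] * k          # pull the low tail up
        xs[-k:] = [xs[-k - 1]] * k    # pull the high tail down
    return mean(xs)

data = [2, 4, 5, 6, 7, 8, 9, 250]     # made-up sample with one outlier
print(mean(data))                     # plain mean, dragged up by 250
print(winsorized_mean(data, k=1))     # 250 is clamped to 9, 2 to 4
```

Unlike the median, this still uses every observation, but no single extreme value can dominate the result.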

#### Yankel

##### New Member
thanks guys.

So yes, the median is better when the data is skewed or has outliers, but when do I use the mean then? If the data is symmetric without outliers, the median and mean are almost equal, aren't they?

When do I use the mean, and why not the median?

#### Rhodo

##### New Member
We also use the mean because of this property: if you subtract it from every number in the set, square the differences and sum them up, the total is smaller than what you would get by subtracting any other value. That minimum is the least sum of squares, and it is crucial for the calculation of the variance and standard deviation.

#### bryangoodrich

##### Probably A Mammal
You could technically take each value's squared distance from the median and operate on that value. What meaning or use it has, maybe smarter people than myself will know! But the mean has nice properties, no doubt.

#### Rhodo

##### New Member
I thought about that too, but I'm also not sure if there would be a point. I don't think I know enough at this point to really speculate; perhaps someone else could!

#### gianmarco

##### TS Contributor
Hi,
just to add a link to an earlier discussion on a similar topic.

Regards
Gm

#### Dason

I'm not sure I agree that the mean relies on an assumption of normality. There are many cases where using the mean is better than using the median and the data isn't normal.

#### gianmarco

##### TS Contributor
Dason,
I was relying upon what I read in a book (author R. Wilcox). I am here to widen my knowledge and to compare my views with those of others.
Thanks for providing fuel for further speculations.

Gm

#### Jake

I lean toward Dason's view. It's not obvious at all why taking a mean implies normality. Although I can't formally prove it, it seems intuitively the case that the mean should be an efficient estimator for any symmetrical distribution.

#### Yankel

##### New Member
thanks everyone, the discussion is interesting.

So I understand from you that if I calculate (x - mean) and (x - median) for every value, then square each one, sum them and divide by n, I will get a smaller number for the mean?

thanks again

#### Rhodo

##### New Member
no problem. and yes, using the mean will give you the lowest possible sum of squares for that set of numbers.
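A quick check in Python on a small made-up sample, following the recipe from the question (subtract, square, sum, divide by n). The function name `mean_squared_deviation` is just a label chosen for this sketch.

```python
from statistics import mean, median

def mean_squared_deviation(values, center):
    """Average of the squared differences between each value and `center`."""
    return sum((x - center) ** 2 for x in values) / len(values)

data = [1, 2, 3, 4, 10]  # made-up, right-skewed sample

around_mean = mean_squared_deviation(data, mean(data))      # center = 4
around_median = mean_squared_deviation(data, median(data))  # center = 3
print(around_mean, around_median)
```

Here the value around the mean is 10.0 and around the median 11.0. The two are equal only when the mean and median coincide; otherwise the mean always gives the smaller number.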

#### BGM

##### TS Contributor
If the population distribution is symmetric and the population mean exists, then it is equal to the population median, and both the sample mean and the sample median are consistent estimators of it.

In that situation, one advantage of the sample mean over the sample median is that it is often the more efficient estimator, meaning it has the smaller sampling variance (this holds for normal data, for instance, though not for every symmetric distribution). Of course, the sample median is more robust, as many people above have mentioned.
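As a rough illustration of what "more efficient" means in practice, the simulation below draws many samples from a standard normal distribution and compares how much the sample mean and the sample median vary from sample to sample. The sample size, number of repetitions and seed are arbitrary choices for this sketch, and the conclusion is specific to normal data.

```python
import random
from statistics import mean, median, pvariance

random.seed(0)  # fixed seed so the run is repeatable

def simulate(estimator, n=25, reps=2000):
    """Variance of `estimator` across repeated standard-normal samples."""
    estimates = [
        estimator([random.gauss(0, 1) for _ in range(n)])
        for _ in range(reps)
    ]
    return pvariance(estimates)

var_mean = simulate(mean)      # spread of the sample mean
var_median = simulate(median)  # spread of the sample median
print(var_mean, var_median)
```

The sample median should come out noticeably more variable than the sample mean, so for the same amount of normal data the mean pins down the center more tightly.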

#### Dason

Yes - one property that the mean has is that it gives the smallest sum of squared deviations. The proof isn't difficult.
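A sketch of the usual argument, writing $\bar{x}$ for the sample mean and $c$ for any other candidate center (the median, for example):

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - c)^2
  &= \sum_{i=1}^{n} \bigl( (x_i - \bar{x}) + (\bar{x} - c) \bigr)^2 \\
  &= \sum_{i=1}^{n} (x_i - \bar{x})^2
     + 2(\bar{x} - c) \sum_{i=1}^{n} (x_i - \bar{x})
     + n(\bar{x} - c)^2 \\
  &= \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - c)^2
  \;\ge\; \sum_{i=1}^{n} (x_i - \bar{x})^2 .
\end{align*}
```

The middle term vanishes because the deviations from the mean sum to zero, and equality holds only when $c = \bar{x}$.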

#### Rhodo

##### New Member
When you say the sample mean is a more efficient estimator, what exactly do you mean?
