MAD = 0

#1
Hello,

I'm trying to use Median Absolute Deviation as a way of determining outliers using this article as a guide.

http://eurekastatistics.com/using-the-median-absolute-deviation-to-find-outliers

However, I'm running into the MAD = 0 problem. Am I not able to use MAD to identify outliers if over 50% of my dataset is equal to the median? I wasn't able to interpret the R code the author wrote when he addressed this topic in his article. I'm attempting to do this calculation in SQL.
 
#4
@CowboyBear and @Miner, thanks for the replies.

What I'm actually trying to do is establish what an acceptable range of values should be for my data set. Then use that as a baseline to determine if I have outliers in other data sets going forward. Then I was planning on using 1,2,3 etc. MADs away from the median to identify potential outliers. The problem is, most of my values are the same as the median of the data set. Therefore most of my "absolute value of the deviations from the median" are 0, which is making my MAD = 0. If my MAD is equal to zero, am I still able to use this method?
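
To make the problem concrete, here is a minimal Python sketch (not the article's R code and not SQL; the sample values are made up purely for illustration). It assumes the usual definition of MAD as the median of the absolute deviations from the median, and shows how MAD collapses to zero once more than half of the values equal the median.

```python
from statistics import median

def mad(values):
    """Median absolute deviation: the median of |x - median(values)|."""
    med = median(values)
    return median(abs(x - med) for x in values)

# Hypothetical sample: 6 of the 9 values are equal to the median (10).
sample = [10, 10, 10, 10, 10, 10, 9, 12, 25]

print(median(sample))  # 10
print(mad(sample))     # 0, because most absolute deviations are 0
```

With MAD = 0, dividing a deviation by MAD is undefined, which is why "number of MADs from the median" stops working as a graded score for data like this.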
 
#5
If MAD (median absolute deviation) is equal to zero, then that means that more than half of the values are equal to the median; in the extreme case you have no deviations at all and all the values are the same.

If most of the deviations are zero, then you can still use it to identify deviating values (you can call them outliers if you want to, but that does not mean that there is something 'wrong' with these values).

Maybe you have a strange measurement device that gives the same value most of the time and only sometimes a different value. An example is when values are rounded to the same value most of the time, or a Likert item with values 1, 2, 3, 4, 5 where you get "3" almost always.
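
One possible way to read "you can still use it" is sketched below in Python. This is only an illustration under stated assumptions, not an established rule: the function name `mads_from_median`, the made-up Likert responses, and the cutoff of 3 MADs are all hypothetical. When MAD is zero, the sketch treats any value that differs from the median as infinitely many MADs away, so every non-median value gets flagged.

```python
import math
from statistics import median

def mads_from_median(values):
    """Distance of each value from the median, in units of MAD."""
    med = median(values)
    mad = median(abs(x - med) for x in values)
    scores = []
    for x in values:
        dev = abs(x - med)
        if mad > 0:
            scores.append(dev / mad)
        else:
            # Assumption: with MAD == 0, any nonzero deviation counts as
            # infinitely far away; values equal to the median score 0.
            scores.append(0.0 if dev == 0 else math.inf)
    return scores

# Hypothetical Likert item where "3" is the answer almost always.
likert = [3, 3, 3, 3, 3, 3, 3, 2, 5]
flagged = [x for x, s in zip(likert, mads_from_median(likert)) if s > 3]
print(flagged)  # [2, 5]
```

Note that in the MAD = 0 case the cutoff no longer matters: a value is flagged simply because it differs from the median, which says it deviates, not that anything is wrong with it.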
 

CB

Super Moderator
#6
"What I'm actually trying to do is establish what an acceptable range of values should be for my data set. Then use that as a baseline to determine if I have outliers in other data sets going forward."

What decision will you make based on knowing that a particular value is "unacceptable"? Very few statistical methods directly assume an absence of outliers. Outliers can result in violations of other distributional assumptions, but imo in that case it's better to find a model that makes more flexible distributional assumptions than to just delete valid pieces of data...