TASK / DATA DESCRIPTION

I need to determine whether data points in a Test Population are outliers, where an outlier is defined as a data point that falls outside a confidence interval determined from a Control Population. The interval may need to describe two separate variables (one a date, the other a $ value), but should generally cover 68.2% of the data.

Neither data set is normally distributed and both contain a very long right tail.

MY UNDERSTANDING OF BEST APPROACH

Because of the large presence of outlier data, I think that I need to use “robust” descriptors for my data (which I understand have higher breakdown points for outliers). I’m pretty sure that the median is a much better choice than the mean or a weighted mean.

Regarding setting up my confidence intervals, I have read that the Median Absolute Deviation (MAD) is used in such circumstances, and it does in fact provide more useful descriptions of my data when applied. It does not, however, help me understand what % of the data should be described by +/- 1 MAD.
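To make the +/- 1 MAD question concrete, here is a minimal sketch that measures the coverage empirically. The lognormal sample is only a hypothetical stand-in for the Control Population, and all variable names are illustrative.

```python
import random
import statistics

# Hypothetical stand-in for the Control Population: a right-skewed lognormal sample.
rng = random.Random(0)
control = [rng.lognormvariate(3.0, 1.0) for _ in range(1000)]

med = statistics.median(control)

# MAD: the median of the absolute deviations from the median.
abs_dev = [abs(x - med) for x in control]
mad = statistics.median(abs_dev)

# Share of control points that lie within median +/- 1 MAD.
coverage = sum(d <= mad for d in abs_dev) / len(control)
print(f"median={med:.2f}  MAD={mad:.2f}  coverage={coverage:.3f}")
```

Because the MAD is itself a median of the absolute deviations, roughly half of the data falls within +/- 1 MAD of the median whatever the shape of the distribution. The common 1.4826 scaling that makes MAD comparable to a standard deviation assumes normality, which does not hold for data like mine.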

I have read elsewhere on this website that I can also use the IQR (interquartile range). My best guess is that I’d find the bounds that enclose 68.2% of the Control Population data (in the way the IQR bounds the middle 50%) and that I should ensure that 34.1% of the data points lie on either side of the median. After this, I should apply the interval to my Test Population data to identify outliers.
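A minimal sketch of this percentile idea, assuming a single numeric variable: take the 15.9th and 84.1st percentiles of the Control Population, so that 34.1% of the data lies on each side of the median, then flag Test Population points outside those bounds. Both samples below are hypothetical lognormal stand-ins, and the names are illustrative.

```python
import random
import statistics

# Hypothetical stand-ins for the two populations (right-skewed).
rng = random.Random(1)
control = [rng.lognormvariate(3.0, 1.0) for _ in range(1000)]
test = [rng.lognormvariate(3.2, 1.1) for _ in range(200)]

# 999 cut points in steps of 0.1%; cuts[i] is the (i + 1) / 10 th percentile.
cuts = statistics.quantiles(control, n=1000, method="inclusive")
lower, upper = cuts[158], cuts[840]  # 15.9th and 84.1st percentiles

# Sanity check: share of the control data inside the interval.
control_coverage = sum(lower <= x <= upper for x in control) / len(control)

# Test points outside the control interval are flagged as outliers.
outliers = [x for x in test if x < lower or x > upper]
print(f"interval=({lower:.2f}, {upper:.2f})  control coverage={control_coverage:.3f}")
print(f"{len(outliers)} of {len(test)} test points flagged")
```

The resulting interval is not symmetric around the median, which suits a long right tail. The sketch treats one variable at a time; handling the date and $ value jointly would need a separate decision that is not covered here.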

Does this sound right, or is there a formula/application of MAD that I should also consider? I've also seen applications of cumulative binomial probability, which looked interesting.
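I am not certain this is how the cumulative binomial probability is meant to be applied here, but one possible reading is sketched below: if the Test Population behaved like the Control Population, each test point would fall outside a 68.2% interval with probability 0.318, so the count of points outside would follow a binomial distribution. The counts used are hypothetical, and `binom_tail` is an illustrative helper rather than a library function.

```python
from math import comb


def binom_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_test = 200            # hypothetical number of Test Population points
n_outside = 80          # hypothetical count falling outside the control interval
p_outside = 1 - 0.682   # expected share outside if Test behaves like Control

# Probability of seeing at least this many points outside the interval by chance.
p_value = binom_tail(n_outside, n_test, p_outside)
print(f"P(at least {n_outside} of {n_test} outside) = {p_value:.4f}")
```

A small probability would suggest that the Test Population as a whole has more points outside the interval than the Control Population would predict. It says nothing about whether any individual point is an outlier.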

Many thanks!!!