Hey everyone,
I have a data set that is log-normally distributed and I want to detect outliers. So far I have used a threshold approach:
vector contains the values of interest.
outlier detected if:
value > median(vector) + 2 * (median absolute deviation(vector) / 0.6745)
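To make the rule concrete, here is a minimal Python sketch of what I am computing. The function name mad_threshold, the parameter k, and the sample values are only made up for illustration; the factor 2 and the constant 0.6745 are the ones from the rule above.

```python
from statistics import median

def mad_threshold(vector, k=2.0):
    """Upper cutoff: median + k * MAD / 0.6745 (hypothetical helper)."""
    med = median(vector)
    # Raw median absolute deviation around the median
    mad = median([abs(v - med) for v in vector])
    # Dividing by 0.6745 rescales the MAD to estimate a standard
    # deviation, which is only valid if the data are normal
    return med + k * mad / 0.6745

# Made-up illustration data, not my real measurements
values = [1.0, 2.0, 3.0, 4.0, 100.0]
cutoff = mad_threshold(values)
outliers = [v for v in values if v > cutoff]
```

Note that this only flags values on the upper side, which is what I have been doing so far.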
My problem is: is this even applicable? I came across a statistics site that lists the method above among methods that assume normality, which is not my case.
If that is so, does anyone have suggestions for how to transform the data into a normally distributed set? A log transformation does not seem to fit in this case, as I would then also be dividing by a median based on log values (log / log).
I would of course also accept a different outlier detection method if somebody has suggestions.
Any help is greatly appreciated, many thanks in advance.
Rene