(Reasonable) way to simulate outliers?

spunky

Super Moderator
#1
hi!

so i've been wanting to tackle a simulation project on Spearman's rho, and i think it would be reasonable to include an "outliers" condition since outliers are one of the reasons why people prefer Spearman's rho (or Kendall's tau) over the regular Pearson's r.

the problem i'm facing is that outliers really aren't my area of expertise, so i'm looking around to see what people do to simulate them. so far, i've found two types of strategies:

1) people sample from some regular distribution (say a normal distribution) and then randomly select a proportion of the datapoints to be inflated by some constant factor (like doubling or tripling the values) (i've put a rough sketch of this right after the list)

2) Tukey apparently proposed using mixtures of normal distributions for robustness studies. so you get one general population (from which you draw, i dunno, 80%-90% of your sample) and the rest come from another one with a larger mean/variance to simulate the 'outliers' (second sketch below)
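
here's roughly what i mean by (1), just a minimal sketch in python assuming numpy/scipy (the 10% outlier rate, the factor of 3, and the rest of the numbers are made up, not from any particular study):

```python
# rough sketch of strategy (1): inflate a random proportion of points by a constant
import numpy as np
from scipy import stats

rng = np.random.default_rng(123)

n, rho = 200, 0.5           # sample size and 'true' correlation (placeholder values)
prop_out, factor = 0.10, 3  # 10% of the points get tripled

# bivariate normal sample with correlation rho
cov = [[1.0, rho], [rho, 1.0]]
xy = rng.multivariate_normal([0.0, 0.0], cov, size=n)

# randomly select a proportion of the rows and inflate them by the constant factor
idx = rng.choice(n, size=int(prop_out * n), replace=False)
xy[idx] *= factor

pearson = stats.pearsonr(xy[:, 0], xy[:, 1])[0]
spearman = stats.spearmanr(xy[:, 0], xy[:, 1])[0]
print(f"pearson r = {pearson:.3f}, spearman rho = {spearman:.3f}")
```

and roughly what i understand (2) to be, again just a sketch where the contamination rate and the variance inflation are values i picked for illustration:

```python
# rough sketch of strategy (2): Tukey-style contaminated normal mixture
import numpy as np
from scipy import stats

rng = np.random.default_rng(456)

n, rho = 200, 0.5   # sample size and correlation of the 'clean' population
eps = 0.10          # contamination proportion (10% of points)
scale = 10.0        # the contaminating population has a much larger spread

cov = np.array([[1.0, rho], [rho, 1.0]])

# flag which observations come from the contaminating population
is_outlier = rng.random(n) < eps

clean = rng.multivariate_normal([0.0, 0.0], cov, size=n)
contam = rng.multivariate_normal([0.0, 0.0], (scale ** 2) * cov, size=n)

# mixture: each observation comes from one population or the other
xy = np.where(is_outlier[:, None], contam, clean)

print(f"pearson r = {stats.pearsonr(xy[:, 0], xy[:, 1])[0]:.3f}")
print(f"spearman rho = {stats.spearmanr(xy[:, 0], xy[:, 1])[0]:.3f}")
```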

i'm just wondering whether these are the only two 'general' ways to go about it? and are they reasonable at all?

i'm really a fish out of water when it comes to simulating this stuff and i just wanna make sure i'm not missing something obvious.
 

noetsi

Fortran must die
#2
As you know I know little about simulation, but why can't you just add points that are 2 (or 3) standard deviations from the mean to the sample? Obviously the more points you add, and the further they are from the original data, the more they will shift the mean and standard deviation, so they may stop being outliers - but I would think that would always be an issue when adding points to a known distribution.
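
Something like this is roughly what I have in mind (just a quick python sketch with made-up numbers, so take it with a grain of salt):

```python
# rough sketch: take a normal sample and tack on a few points sitting about
# 3 standard deviations above the original mean (all numbers are made up)
import numpy as np

rng = np.random.default_rng(789)

x = rng.normal(loc=50, scale=10, size=100)   # the 'clean' sample
n_out = 5                                    # how many outliers to add
outliers = x.mean() + 3 * x.std() + rng.normal(0, 1, size=n_out)

x_with_outliers = np.concatenate([x, outliers])

# the added points shift the mean/sd, so with too many they may stop being 'outlying'
print(f"before: mean={x.mean():.2f}, sd={x.std():.2f}")
print(f"after:  mean={x_with_outliers.mean():.2f}, sd={x_with_outliers.std():.2f}")
```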

Dason should laugh himself silly when he sees this suggestion :(

Well for one thing, if you are going to use ordinal data you won't even be able to generate a mean, will you? :p