
Thread: (Reasonable) way to simulate outliers?

  1. #1
    TS Contributor
    spunky
    vancouver, canada


    so i've been wanting to tackle a project related to Spearman's rho through simulation, and i think it would be reasonable to include an "outliers" condition, since robustness to outliers is one of the reasons why people prefer Spearman's rho (or Kendall's tau) over the regular Pearson's r.

    the problem i'm facing is that outliers are really not my area of expertise, so i'm looking around to see what people do to simulate them. so far, i've found two types of strategies:

    1) people sample from some regular distribution (say a normal distribution) and then randomly select a proportion of the datapoints to be inflated by some constant factor (like they double the values, or triple them)

    2) tukey apparently proposed using mixtures of normal distributions for robustness studies. so you get one general population (from which you draw, i dunno, 80%-90% of your sample) and the rest come from another one that has a larger mean and/or variance to simulate the 'outliers'.
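
A rough sketch of the two strategies above, in Python with NumPy. The function names (`inflate_outliers`, `contaminated_normal`) and the default values for the proportion, factor, shift, and scale are made up for illustration and are not taken from any particular paper. The mixture version assumes a standard normal for the 'general' population.

```python
import numpy as np

def inflate_outliers(x, prop=0.1, factor=3.0, rng=None):
    """Strategy 1: multiply a randomly chosen proportion of values by a constant."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x, dtype=float)           # work on a copy
    n_out = int(round(prop * x.size))      # number of points to inflate
    idx = rng.choice(x.size, size=n_out, replace=False)
    x[idx] *= factor
    return x, idx

def contaminated_normal(n, prop=0.1, shift=0.0, scale=3.0, rng=None):
    """Strategy 2: draw from (1 - prop) * N(0, 1) + prop * N(shift, scale^2)."""
    rng = np.random.default_rng() if rng is None else rng
    is_out = rng.random(n) < prop          # which draws come from the 'outlier' component
    x = rng.normal(0.0, 1.0, size=n)
    x[is_out] = rng.normal(shift, scale, size=int(is_out.sum()))
    return x, is_out
```

One thing the first version makes visible: multiplying a value that sits close to zero by a constant leaves it close to zero, so not every "inflated" point ends up looking like an outlier. In the mixture version the number of contaminated points is random from sample to sample, while in the inflation version it is fixed.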

    i'm just wondering whether these two would be the only two 'general' ways to go about it? are they reasonable at all?

    i'm really a fish out of water when it comes to simulating this stuff and i just wanna make sure i'm not missing something obvious.

  2. #2
    Fortran must die
    noetsi

    As you know, I know little about simulation, but why can't you just add points that are 2 (or 3) standard deviations from the mean to the sample? Obviously, the more points you add, and the further they are from the original data, the more they will change the mean and standard deviation, so the added points may no longer count as outliers. But I would think that would always be an issue when adding points to a known distribution.
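
The point about the added values shifting the mean and standard deviation can be checked with a small sketch like the one below (Python with NumPy). The helper name `z_of_added_points` is hypothetical. It places `m` copies of a point `k` sample standard deviations above the original mean, then reports how many standard deviations that point is from the mean once everything is recomputed on the enlarged sample.

```python
import numpy as np

def z_of_added_points(x, k=3.0, m=1):
    """Add m points at mean + k*SD of x; return their z-score in the enlarged sample."""
    x = np.asarray(x, dtype=float)
    point = x.mean() + k * x.std(ddof=1)          # k SDs above the original mean
    x_new = np.concatenate([x, np.full(m, point)])
    # Recompute mean and SD with the added points included
    return (point - x_new.mean()) / x_new.std(ddof=1)

rng = np.random.default_rng(1)
sample = rng.normal(size=50)
z_one = z_of_added_points(sample, k=3.0, m=1)     # one added point
z_ten = z_of_added_points(sample, k=3.0, m=10)    # ten added points
```

With a sample of 50, a single added point already ends up less than 3 standard deviations from the new mean, and ten added points end up much closer still, which is the masking issue described above.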

    Dason should laugh himself silly when he sees this suggestion.

    Well, for one thing, if you are going to use ordinal data, you won't even be able to compute a mean, will you?
