
Thread: Leverage/outliers in large data sets

  1. #1
    noetsi

    Discussions of leverage and outliers, and of their impact on regression, typically deal with individual points. I commonly have 10-30 thousand data points, so it's unlikely that any one point will have a large impact. But I have many outliers, and jointly a set of such points might influence the results.

    So how do you tell whether a set of outliers, rather than a single one, is influencing the results (ideally in terms of leverage)?
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  2. #2

    Re: Leverage/outliers in large data sets

    In this case I would say that looking at your data graphically might be a better option than relying on packages to calculate outliers for you.

    Or you can use a statistical test that provides critical cutoff values for outliers (references from Stack Overflow): Lund, R. E. (1975), "Tables for an Approximate Test for Outliers in Linear Models", Technometrics, vol. 17, no. 4, pp. 473-476; and Prescott, P. (1975), "An Approximate Test for Outliers in Linear Models", Technometrics, vol. 17, no. 1, pp. 129-132.
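
    As a rough illustration of the quantity such a test works with, here is a minimal sketch that computes internally studentized residuals for a one-predictor least-squares fit. It assumes, as I understand the cited approach, that the largest absolute studentized residual is what gets compared against a tabulated critical value; no critical value is hard-coded here, so it would have to come from the tables themselves. The function name and the toy data are made up for the example.

```python
import math

def studentized_residuals(x, y):
    """Internally studentized residuals for a one-predictor OLS fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    s2 = sum(e * e for e in resid) / (n - 2)
    # Leverage of each point: h_i = 1/n + (x_i - x_bar)^2 / Sxx.
    lev = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e / math.sqrt(s2 * (1.0 - h)) for e, h in zip(resid, lev)]

# Toy data (hypothetical): an exact line with one gross error planted at x = 5.
x = [float(i) for i in range(1, 11)]
y = [2.0 * xi + 1.0 for xi in x]
y[4] += 8.0
r = studentized_residuals(x, y)
worst = max(range(len(r)), key=lambda i: abs(r[i]))
print(worst, round(r[worst], 3))
```

    Note that this looks at one point at a time, so on its own it does not answer the question about a group of points acting jointly.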

  3. #3
    hlsmith

    Re: Leverage/outliers in large data sets

    Per my own thoughts, which may mirror Iken's links, you might temporarily remove the observations in the upper or lower ?tile and see whether there is an effect (?tile = whatever percentile you decide to use). Also, with the graphical approach, you may be able to color-code the observations in these ?tiles in your graph; if they are way out on the fringe, you can understand them better.
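
    The trim-and-refit idea can be sketched in a few lines. This is only an illustration under stated assumptions: a single predictor, trimming done on the predictor (the post does not say which variable to trim on), and a 5% cutoff picked arbitrarily. The helper names `fit_line` and `trim_tails` and the data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope

def trim_tails(x, y, frac):
    """Drop the rows whose x falls in the lowest or highest `frac` share."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    k = int(len(x) * frac)
    keep = order[k:len(order) - k]
    return [x[i] for i in keep], [y[i] for i in keep]

# Toy data (hypothetical): 100 points on y = 3 + 0.5 x, plus a cluster of
# five far-out x values whose y does not follow the line.
x = [float(i) for i in range(100)] + [200.0, 201.0, 202.0, 203.0, 204.0]
y = [3.0 + 0.5 * xi for xi in x[:100]] + [0.0] * 5

full_intercept, full_slope = fit_line(x, y)
x_trim, y_trim = trim_tails(x, y, 0.05)
trim_intercept, trim_slope = fit_line(x_trim, y_trim)
print(full_slope, trim_slope)
```

    If the coefficients barely move after trimming, the fringe group is probably not driving the fit; a large shift, as in this toy case, suggests the group is jointly influential.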

  4. #4
    Miner

    Re: Leverage/outliers in large data sets


    Are you able to attribute these outliers to an assignable cause that would allow you to legitimately remove them? Have you considered robust regression methods?
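
    To make the robust regression suggestion concrete, here is a minimal sketch of one such method: a Huber M-estimate fitted by iteratively reweighted least squares, for a single predictor. This is one choice among several robust methods, not necessarily the one meant above. The tuning constant 1.345, the MAD-based scale, the iteration count, and the function names are all assumptions made for the example.

```python
import statistics

def weighted_line(x, y, w):
    """Weighted least squares for y = intercept + slope * x."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def huber_line(x, y, c=1.345, iterations=50):
    """Huber M-estimate of a line via iteratively reweighted least squares."""
    w = [1.0] * len(x)
    for _ in range(iterations):
        intercept, slope = weighted_line(x, y, w)
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute deviation, rescaled for normal errors.
        med = statistics.median(resid)
        scale = statistics.median(abs(e - med) for e in resid) / 0.6745
        if scale == 0.0:
            break
        # Huber weights: 1 inside the threshold, shrinking with |residual| outside.
        w = [1.0 if abs(e) <= c * scale else c * scale / abs(e) for e in resid]
    return intercept, slope

# Toy data (hypothetical): y = 1 + 2 x with small noise, and five points
# (10% of the sample) shifted upward by 100.
x = [float(i) for i in range(50)]
y = [1.0 + 2.0 * xi + ((7 * i) % 5 - 2) * 0.1 for i, xi in enumerate(x)]
for i in range(20, 25):
    y[i] += 100.0

ols_intercept, ols_slope = weighted_line(x, y, [1.0] * len(x))
rob_intercept, rob_slope = huber_line(x, y)
print(ols_intercept, ols_slope)
print(rob_intercept, rob_slope)
```

    One caveat: Huber weights depend only on the size of the residual, so they guard against outliers in the response but do not by themselves downweight high-leverage points in the predictors.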
