
Thread: What kind of distribution am I dealing with (and how can I identify outliers?)?

  1. #1

    I'm looking at a bunch of data involving how long it takes for an item to get from point A to point B. I have roughly 39,000 items and the associated time for each.

    The problem (or maybe it's not a problem?) is that the data are heavily right-skewed. For example, about 1,000 items reached point B on the same day they left point A, about 7,500 took one day, and roughly 9,000 took 2 days (over 50% took 5 days or less). But some items took 100, 200, 300, or even 600+ days to get from point A to point B.

    What kind of distribution am I dealing with here, and how can I identify outliers in such a population?

  2. #2
    CowboyBear (TS Contributor)

    Hi Slim, welcome.

    Unfortunately your post was automatically picked up by our spam filter for some reason; I've released it now. Sorry about that.

    To answer your question:
    There are distributions for continuous variables that are bounded to be non-negative (the gamma and log-normal are common examples). However, with a sample of 39,000, conventional parametric tests are robust enough to non-normal errors that this is largely a non-issue. What specifically are you intending to do with the data, though?
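
To illustrate the robustness point, here is a minimal simulation sketch. It is not based on the poster's data: the log-normal parameters, the sample size of 2,000 (kept small so it runs quickly), and the function names are all assumptions made for illustration. It compares the skewness of raw right-skewed values with the skewness of means computed from repeated samples.

```python
import random
import statistics


def skewness(values):
    """Population-form skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def simulate_mean_skew(sample_size=2000, replicates=300, seed=0):
    """Return (skewness of one raw sample, skewness of the sample means).

    The log-normal(1, 1) generator is only a stand-in for heavily
    right-skewed, non-negative data; it is an assumption, not a fitted model.
    """
    rng = random.Random(seed)

    def draw_sample():
        return [rng.lognormvariate(1.0, 1.0) for _ in range(sample_size)]

    raw = draw_sample()
    means = [statistics.fmean(draw_sample()) for _ in range(replicates)]
    return skewness(raw), skewness(means)


raw_skew, mean_skew = simulate_mean_skew()
print(f"skewness of raw values:   {raw_skew:.2f}")
print(f"skewness of sample means: {mean_skew:.2f}")
```

Under these assumptions the sample means should come out far closer to symmetric than the raw values, and that is the sense in which tests on means tolerate non-normal data at large sample sizes.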

    Re. outliers: You could still look at cases more than some threshold number of SDs above the mean, but personally I don't recommend subjectively deleting outliers unless it's clear that a case represents a genuine measurement or recording mistake.
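
A minimal sketch of that threshold approach, for flagging cases to inspect rather than delete. The transit times below are made up to mimic the shape described in the question, and the function name and the default of 3 SDs are assumptions, not a recommended cutoff.

```python
import statistics


def flag_high_values(values, num_sds=3.0):
    """Return (cutoff, cases above cutoff), where cutoff = mean + num_sds * SD."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    cutoff = mean + num_sds * sd
    return cutoff, [v for v in values if v > cutoff]


# Hypothetical transit times in days: mostly short, with a long right tail.
transit_days = (
    [0] * 10 + [1] * 75 + [2] * 90 + [3] * 60 + [5] * 40 + [10] * 20
    + [100, 200, 300, 600]
)

cutoff, flagged = flag_high_values(transit_days)
print(f"cutoff: {cutoff:.1f} days")
print(f"flagged for review: {flagged}")
```

Note that with heavily skewed data the extreme values themselves inflate the SD, so the cutoff ends up high. The flagged cases are candidates to check for recording mistakes, not automatic deletions.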
    Matt aka CB | twitter.com/matthewmatix
