Thread: Do I even need a statistical method??

  1. #1

    Do I even need a statistical method??



    After rethinking my problem of finding outliers in some GPS/SONAR data, I started to wonder whether I need a statistical test for outliers at all. What I am trying to do is identify potentially erroneous readings and remove them from the data.

    If I were to remove a valid data point, the impact on the resulting chart would be virtually nothing (I would be removing one of thousands of data points). So I wonder if all I need to do is simply identify suspicious data and toss it out. If it was bad, I will markedly improve my map; if it was good, I will hardly degrade the map at all.

    I will try a test of this over the weekend to see what happens if I remove some valid data.
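
    A quick numerical check of that reasoning, as a rough sketch only. The 2000 synthetic depths, the 500 ft bad reading, and the use of the overall mean depth as a stand-in for "the map" are all assumptions made for illustration, not anything from the real survey.

```python
import random
import statistics

random.seed(0)

# Hypothetical soundings: 2000 valid depths (feet) around 20 ft, plus one bad reading.
valid = [random.gauss(20.0, 3.0) for _ in range(2000)]
bad_reading = 500.0
with_bad = valid + [bad_reading]

mean_with_bad = statistics.fmean(with_bad)
mean_clean = statistics.fmean(valid)            # bad reading removed
mean_minus_valid = statistics.fmean(valid[1:])  # one valid reading removed instead

# How far the summary moves in each case.
shift_from_bad = abs(mean_with_bad - mean_clean)
shift_from_valid = abs(mean_clean - mean_minus_valid)
print(f"removing the bad reading moves the mean by {shift_from_bad:.4f} ft")
print(f"removing one valid reading moves it by {shift_from_valid:.4f} ft")
```

    With numbers like these, dropping a good reading should move the summary far less than dropping the bad one, which is the asymmetry the argument relies on.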

    Any thoughts?

    jerry

  2. #2
    JohnM (TS Contributor)
    In line with my usual pragmatic viewpoint, I agree....

  3. #3
    quark (Admin, Canada)
    Jerry,

    I don't think it's a problem to toss out the extreme outliers and form a new dataset, as long as you keep most (>99%) of the data points. If there's any problem with the map you can always go back and analyze the original data.

    Tossing out outliers would be a no-no if you are doing biomedical studies, especially clinical trials, where sample size is usually small, and drug interactions, however rare, are critical. Your study should be much more robust.

  4. #4
    Thanks guys,

    I think I'll try this over the weekend:

    Take one of the data sets that I have been using for testing (which contains an extreme outlier), remove the outlier, and remake the map. Then remove three other data points at random and remake the map again. Then I'll post the three maps so that we can compare the effect of removing a known outlier with the effect of removing valid data points from the set.

    I know that removing the outlier from that set will have a dramatic effect; we'll see about removing some good data. This file is a relatively small data set (~2000 points, if I recall correctly, taken over roughly a one-mile area), so any negative effects should show up nicely.
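
    The planned comparison can be sketched on synthetic data. Everything here is an assumption for illustration: the unit-square survey area, the sloping bottom, the made-up 500 ft bad return, and a crude "map" built from the mean depth per grid cell (real contouring software would interpolate differently).

```python
import random

def grid_means(points, n=10):
    """Mean depth per cell of an n-by-n grid over the unit square."""
    sums, counts = {}, {}
    for x, y, depth in points:
        cell = (min(int(x * n), n - 1), min(int(y * n), n - 1))
        sums[cell] = sums.get(cell, 0.0) + depth
        counts[cell] = counts.get(cell, 0) + 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

def max_change(map_a, map_b):
    """Largest absolute change over the cells present in both maps."""
    return max(abs(map_a[c] - map_b[c]) for c in map_a if c in map_b)

rng = random.Random(1)

# Synthetic soundings: gently sloping bottom (10 to 15 ft) with small noise.
clean = []
for _ in range(2000):
    x, y = rng.random(), rng.random()
    clean.append((x, y, 10.0 + 5.0 * x + rng.gauss(0.0, 0.2)))

outlier = (0.52, 0.48, 500.0)  # made-up bad sonar return
original = clean + [outlier]

# Remove three valid points chosen at random.
dropped = set(rng.sample(range(len(clean)), 3))
thinned = [p for i, p in enumerate(clean) if i not in dropped]

map_original = grid_means(original)
map_clean = grid_means(clean)
map_thinned = grid_means(thinned)

effect_outlier = max_change(map_original, map_clean)
effect_random = max_change(map_clean, map_thinned)
print(f"largest cell change from removing the outlier: {effect_outlier:.2f} ft")
print(f"largest cell change from removing 3 valid points: {effect_random:.2f} ft")
```

    The three dictionaries play the role of the three maps: original, outlier removed, and outlier plus three valid points removed.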

    cheers
    jerry

  5. #5

    results of test

    Hi,

    So I tested the effect of removing the suspicious data points and then replotted.

    The original problems created by the bad data were cleaned up nicely!

    So I went ahead and did some trimming of the data: I removed all measurements with depth values below 2 feet, since the depths from 2 feet to the shoreline do not really add any utility to the map's purpose.
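
    For anyone wanting to repeat the shallow-water trim, a minimal sketch. The `(x, y, depth_ft)` tuple layout and the function name `drop_shallow` are hypothetical; the actual file format is not given in this thread.

```python
def drop_shallow(soundings, min_depth_ft=2.0):
    """Keep soundings at or deeper than min_depth_ft.

    Each sounding is assumed to be an (x, y, depth_ft) tuple.
    """
    return [s for s in soundings if s[2] >= min_depth_ft]

# Tiny made-up example: two readings are shallower than 2 ft.
soundings = [(0.0, 0.0, 0.5), (1.0, 0.0, 1.9), (2.0, 0.0, 2.0), (3.0, 0.0, 14.2)]
trimmed = drop_shallow(soundings)
print(len(soundings), "->", len(trimmed))
```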

    The result: no significant change except for a "smoothing" effect on the shallowest contour line, which is good.

    So then I trimmed some more: I removed 20% of the data using a convenience selection process, dropping every fifth data point from the list while it was sorted by depth. Thus I should have removed roughly the same share of points at all depth ranges. The result: some smoothing and almost no loss of definition of the contours.
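
    The every-fifth-point trim can be sketched as below, again assuming hypothetical `(x, y, depth_ft)` tuples. Sorting by depth first is what spreads the removals across the whole depth range.

```python
def thin_every_kth(soundings, k=5):
    """Sort by depth, then drop every k-th sounding (the 5th, 10th, ... for k=5)."""
    ordered = sorted(soundings, key=lambda s: s[2])
    return [s for i, s in enumerate(ordered, start=1) if i % k != 0]

# Made-up data: 100 soundings with depths 1..100 ft.
soundings = [(float(i), 0.0, float(i)) for i in range(1, 101)]
kept = thin_every_kth(soundings)
removed_depths = sorted(set(s[2] for s in soundings) - set(s[2] for s in kept))
print(len(kept), "kept; first removed depths:", removed_depths[:3])
```

    Note that the result comes back sorted by depth, so anything that depends on the original track order would need the list re-sorted afterwards.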

    This data set contained just short of 5000 data points at the start and ended up with about 3000 points, covering an area of about 4 square miles.

    I think I am satisfied that simply removing any and all suspicious data will not harm the overall utility of the map, so I am going to work on a method for detecting suspicious data.
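
    One possible starting point for such a detection method, offered as a sketch rather than a recommendation: flag any sounding whose depth is far from the median depth of its nearest neighbours. The neighbour count `k`, the `threshold`, the 0.1 ft floor on the spread, and the `(x, y, depth_ft)` layout are all assumptions that would need tuning against real data.

```python
import math
import statistics

def flag_suspicious(soundings, k=8, threshold=5.0):
    """Return indices of soundings whose depth disagrees with their k nearest neighbours.

    Brute-force neighbour search, so this is O(n^2 log n); fine for a few
    thousand points, slow beyond that.
    """
    flagged = []
    for i, (x, y, depth) in enumerate(soundings):
        neighbours = sorted(
            (math.hypot(x - ox, y - oy), od)
            for j, (ox, oy, od) in enumerate(soundings)
            if j != i
        )[:k]
        depths = [od for _, od in neighbours]
        med = statistics.median(depths)
        mad = statistics.median(abs(d - med) for d in depths)
        # 1.4826 * MAD approximates a standard deviation for normal data;
        # the floor keeps a perfectly flat bottom from flagging everything.
        scale = max(1.4826 * mad, 0.1)
        if abs(depth - med) / scale > threshold:
            flagged.append(i)
    return flagged

# Made-up 10 x 10 grid on a gentle slope, with one bad reading at (5, 5).
grid = [(float(x), float(y), 10.0 + 0.1 * x) for x in range(10) for y in range(10)]
grid[55] = (5.0, 5.0, 200.0)
flagged = flag_suspicious(grid)
print("flagged indices:", flagged)
```

    Using the median and MAD instead of the mean and standard deviation means one bad neighbour does not drag a good reading into the flagged list.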

    Any input is most welcome.

    cheers
    jerry

  6. #6
    quark (Admin)
    Jerry,

    I think removing data systematically is OK. You can also remove data at random. As long as your map is useful, the less computation the better. Just my .02.
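
    Random removal could look something like this sketch. The function name, the `keep_fraction` parameter, and the tuple layout are hypothetical.

```python
import random

def thin_at_random(soundings, keep_fraction=0.8, seed=None):
    """Keep a random subset of the soundings, preserving their original order."""
    rng = random.Random(seed)
    n_keep = round(len(soundings) * keep_fraction)
    keep = sorted(rng.sample(range(len(soundings)), n_keep))
    return [soundings[i] for i in keep]

# Made-up data: 100 soundings along a line.
soundings = [(float(i), 0.0, 10.0 + 0.05 * i) for i in range(100)]
subset = thin_at_random(soundings, keep_fraction=0.8, seed=42)
print(len(subset), "of", len(soundings), "kept")
```

    Passing a fixed `seed` makes the thinning repeatable, which helps when comparing maps made from the same subset.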

  7. #7
    Quark,

    I found a textbook with a method for detecting outliers in multidimensional data. The method given is basically the one I had in mind to try anyway (although fully developed), so I am going to follow my idea through and see which points it locates in the data set that I just worked with. There is one section of the data where I am worried about taking out useful and needed information, so we'll see whether that data makes it through or not.
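
    The textbook method is not named here, so the following is only a guess at the general family: squared Mahalanobis distance from the sample mean is one common way to score outliers in multidimensional data. The helper names, the `(x, y, depth)` rows, and the synthetic data are assumptions, and this is a sketch, not the method from the book.

```python
import random

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for r in reversed(range(size)):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (aug[r][size] - known) / aug[r][r]
    return x

def mahalanobis_sq(rows):
    """Squared Mahalanobis distance of each row from the sample mean."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    centred = [[r[j] - mean[j] for j in range(dim)] for r in rows]
    # Sample covariance matrix (divides by n - 1).
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(dim)]
           for a in range(dim)]
    # c^T cov^{-1} c, computed by solving cov @ z = c instead of inverting cov.
    return [sum(v * z for v, z in zip(c, solve(cov, c))) for c in centred]

rng = random.Random(2)
rows = []
for _ in range(200):
    x, y = rng.random(), rng.random()
    rows.append((x, y, 10.0 + 5.0 * x + rng.gauss(0.0, 0.2)))
rows.append((0.5, 0.5, 60.0))  # made-up bad reading, stored at index 200

d2 = mahalanobis_sq(rows)
worst = max(range(len(rows)), key=d2.__getitem__)
print("most extreme row:", worst, "with squared distance", round(d2[worst], 1))
```

    Two caveats with this kind of global score: the mean and covariance are themselves pulled by the outliers they are meant to find, and a reading that is unusual only relative to its neighbours (such as a steep but real drop-off) can be scored very differently than a local comparison would score it.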

    thanks again for the interest.

    jerry
    Last edited by jerryb; 12-08-2005 at 09:41 AM.

  8. #8
    quark (Admin)

    Great. Please keep us posted on your progress.
