Hi,
I just joined this group and hope to find some help. I'll begin with a brief introduction of myself, then my question:
I am a professor of mathematics and physics at a two-year college and an avid fisherman. I have been working on a personal research project that involves making hydrographic maps from GPS/sonar data using consumer equipment. My statistics background is limited to what was needed for a BS in math and an MS in science education, so I know the basics, can execute the mathematics once I have the right method at hand, and can usually spot when I don't have the right method.
So, now the question:
The data consists of latitude, longitude, and depth of water. These can be thought of as (x, y, z) triplets, where the water depth is always recorded as a positive real number and the lat and long are in a geographic unit known as mercator meters (integers), though I don't think that will matter. Lat and long are independent and depth is dependent.
The difficulty I need to solve is how to detect outliers in the depth values collected by the sonar unit. On occasion the sonar records a "bad" data point; this typically happens when there is, for example, a sunken tree. The sonar records the depth of the top of the tree rather than the depth of the lake bottom the tree is resting on, so the depth reading can be off by 20 feet or more. In a small data set I can simply plot all of the data, see where the outliers are influencing the contours, weed them out of the data, and get a good map. The problem with that method is that once I have this working the way I want, the data sets will likely grow to millions of points and manual cleanup will no longer be practical.
Also, the data is collected by driving a boat around the lake, recording GPS and sonar readings along the boat's path.
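For concreteness, here is the rough kind of thing I have been toying with (just a sketch for discussion, not a worked-out method; the search radius, minimum-neighbor count, and threshold below are placeholder values, not tuned): compare each sounding's depth to the median depth of its neighbors within some radius, and flag points that deviate from that median by more than a few times the local spread. Robust statistics like the median should resist the very outliers we are hunting:

```python
import math
from collections import defaultdict

def flag_depth_outliers(points, radius=25.0, min_neighbors=5, mad_factor=3.0):
    """Flag (x, y, depth) soundings whose depth deviates strongly from
    the local median depth of nearby points.

    points: list of (x, y, depth) tuples, x/y in meters.
    Returns a list of booleans, True where the point looks like an outlier.
    """
    # Bucket points into a square grid with cell size = radius, so each
    # neighbor lookup only has to scan the 3x3 block of surrounding cells.
    grid = defaultdict(list)
    for i, (x, y, z) in enumerate(points):
        grid[(int(x // radius), int(y // radius))].append(i)

    flags = [False] * len(points)
    for i, (x, y, z) in enumerate(points):
        cx, cy = int(x // radius), int(y // radius)
        neighbor_depths = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), []):
                    if j == i:
                        continue
                    xj, yj, zj = points[j]
                    if math.hypot(xj - x, yj - y) <= radius:
                        neighbor_depths.append(zj)
        if len(neighbor_depths) < min_neighbors:
            continue  # too sparse to judge; leave unflagged
        neighbor_depths.sort()
        med = neighbor_depths[len(neighbor_depths) // 2]
        # Median absolute deviation (MAD) as a robust estimate of local spread.
        devs = sorted(abs(d - med) for d in neighbor_depths)
        mad = devs[len(devs) // 2]
        spread = max(mad, 0.5)  # floor (feet) so a perfectly flat bottom
                                # doesn't flag ordinary sonar noise
        if abs(z - med) > mad_factor * spread:
            flags[i] = True
    return flags
```

The grid bucketing keeps the neighbor search roughly linear in the number of points, which matters once the data sets hit millions of soundings; a KD-tree would do the same job. A sunken tree that covers many adjacent soundings would shift the local median itself, so this simple version only catches isolated bad pings.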
So, does anyone have any insight into this type of outlier detection? I have so far been unable to find anyone who has published on this type of problem, though my search has not been exhaustive. I do have a book coming on interlibrary loan which might contain something useful.
I do have an idea that I would post for discussion if no one else has any.
Thanks for any help,
jerry