
Thread: p-value for nearest neighbor classification

  1. #1

    p-value for nearest neighbor classification




    Hi all,

    For the past few days, I have been trying to compute a p-value for some experiment I ran recently, and I simply cannot figure it out.

    The experiment is very simple:
    - Given a population of 2000 individuals, each belonging to one of N different classes
    - Given a distance metric between them
    - I use leave-one-out to estimate the prediction accuracy of a nearest-neighbor classifier (i.e. for each individual, I predict its class as the class of the closest other individual)
    - This gives me a % of correct predictions.
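
    The leave-one-out nearest-neighbor procedure in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `loo_nn_accuracy`, the precomputed distance matrix `dist`, and the toy data are made up for the example and are not the original experiment.

```python
def loo_nn_accuracy(dist, labels):
    """Leave-one-out 1-nearest-neighbor accuracy.

    dist   -- square matrix (list of lists) of pairwise distances (assumed precomputed)
    labels -- class label of each individual, in the same order as dist
    """
    n = len(labels)
    correct = 0
    for i in range(n):
        # Closest *other* individual: i itself is left out.
        nearest = min((k for k in range(n) if k != i), key=lambda k: dist[i][k])
        if labels[nearest] == labels[i]:
            correct += 1
    return correct / n


# Hypothetical toy data: six points on a line, two well-separated classes.
points = [0.0, 0.1, 0.3, 5.0, 5.2, 5.3]
labels = ["a", "a", "a", "b", "b", "b"]
dist = [[abs(p - q) for q in points] for p in points]

accuracy = loo_nn_accuracy(dist, labels)
print(accuracy)
```

    Ties in distance are broken here by taking the first closest individual, which is an arbitrary choice for the sketch.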

    Now, what I can't figure out is whether I should compute a p-value for each individual prediction or one for the whole experiment.

    My first guess is that:
    - my "null hypothesis" is that the distance metric and the class of the individuals are not related
    - my "test statistic" is the % of correct predictions
    - so the p-value is the probability, under the null hypothesis, of getting a % of correct predictions at least as high as the one I observed.

    Is this correct?

    Thanks in advance for any help I can get on this!
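
    One way to turn the guess in this post into a single p-value for the whole experiment is a label-permutation test: shuffle the class labels many times, recompute the leave-one-out accuracy each time, and count how often the shuffled accuracy reaches the observed one. The sketch below assumes a precomputed distance matrix; the function names (`nearest_neighbours`, `nn_accuracy`, `permutation_p_value`), the number of permutations, and the toy data are hypothetical choices for illustration only.

```python
import random


def nearest_neighbours(dist):
    """Index of the closest other individual for each row of the distance matrix."""
    n = len(dist)
    return [min((k for k in range(n) if k != i), key=lambda k: dist[i][k])
            for i in range(n)]


def nn_accuracy(nearest, labels):
    """Fraction of individuals whose nearest neighbor has the same label."""
    return sum(labels[i] == labels[j] for i, j in enumerate(nearest)) / len(labels)


def permutation_p_value(dist, labels, n_perm=999, seed=0):
    # Shuffling labels does not change distances, so neighbors are found once.
    nearest = nearest_neighbours(dist)
    observed = nn_accuracy(nearest, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # null hypothesis: labels unrelated to distances
        if nn_accuracy(nearest, shuffled) >= observed:
            hits += 1
    # Add-one correction: the observed labelling counts as one permutation.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: two well-separated clusters of five points each.
points = [0.0, 0.1, 0.3, 0.6, 1.0, 10.0, 10.1, 10.3, 10.6, 11.0]
labels = ["a"] * 5 + ["b"] * 5
dist = [[abs(p - q) for q in points] for p in points]

observed, p_value = permutation_p_value(dist, labels)
print(observed, p_value)
```

    Shuffling keeps the class sizes fixed, so the comparison is against random labellings with the same class proportions rather than against a fixed 1/N guess rate.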

  2. #2
    bugman (Super Moderator)

    Re: p-value for nearest neighbor classification

    You can obtain p-values for each group (I cannot see the benefit of doing it for each individual). Using R, you can get this with a wlad test in the vegan package; you can also run a SIMPROF (similarity profile) test, which will do it too, using the package "clustsig". If you don't use R but have access to PRIMER, that programme will also do SIMPROF.
    The earth is round: P<0.05

  3. #3

    Re: p-value for nearest neighbor classification

    Thanks for the response!

    I wanted to give this a try, but I cannot find a method called "wlad" in the vegan package. I'm assuming the package you referred to is this one, right? http://cran.r-project.org/web/packages/vegan/vegan.pdf
    Last edited by popolon; 11-23-2013 at 09:52 PM.

  4. #4

    Re: p-value for nearest neighbor classification

    Wait! I guess you meant the "Wald test", right? ( http://en.wikipedia.org/wiki/Wald_test ).

    So, if I understand correctly, what you are suggesting is that for each of my N classes I use a Wald test to compare the "distance" between elements of that class with the distance to elements of different classes. That actually makes sense, since the Wald statistic can be compared against a chi-square distribution, and thus I can get a p-value for each of my classes.
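
    The thread does not spell out the exact form of this per-class test, so the following is only one possible reading, sketched under loud assumptions: for a given class, compare the mean within-class distance with the mean distance to other classes, form a Wald-type statistic from the difference and its estimated variance, and look it up against a chi-square distribution with 1 degree of freedom. The function name `wald_test_class` and the toy data are hypothetical. Pairwise distances share individuals and are therefore not independent, so the resulting p-value should be treated as a rough approximation.

```python
import math
from statistics import mean, variance


def wald_test_class(dist, labels, target):
    """Wald-type comparison of within-class vs between-class distances (assumed form)."""
    n = len(labels)
    members = [i for i in range(n) if labels[i] == target]
    others = [i for i in range(n) if labels[i] != target]
    # Each unordered pair inside the class is counted once.
    within = [dist[i][j] for a, i in enumerate(members) for j in members[a + 1:]]
    between = [dist[i][j] for i in members for j in others]
    diff = mean(between) - mean(within)
    # Variance of the difference, treating distances as independent (an approximation).
    var_diff = variance(within) / len(within) + variance(between) / len(between)
    w = diff * diff / var_diff
    # Upper tail of a chi-square with 1 df: P(X > w) = erfc(sqrt(w / 2)).
    p = math.erfc(math.sqrt(w / 2))
    return w, p


# Hypothetical toy data: two well-separated clusters of five points each.
points = [0.0, 0.1, 0.3, 0.6, 1.0, 10.0, 10.1, 10.3, 10.6, 11.0]
labels = ["a"] * 5 + ["b"] * 5
dist = [[abs(p - q) for q in points] for p in points]

w_stat, p_val = wald_test_class(dist, labels, "a")
print(w_stat, p_val)
```

    Calling the function once per class label would give one p-value per class, which matches the per-group suggestion earlier in the thread.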

  5. #5
    bugman (Super Moderator)

    Re: p-value for nearest neighbor classification


    Yep, that's what I meant. Sometimes I have fat fingers on the keyboard when I type in haste. But yes, I think you are on the right track.

  6. The Following User Says Thank You to bugman For This Useful Post:

    popolon (11-24-2013)
