# Thread: p-value for nearest neighbor classification

1. ## p-value for nearest neighbor classification

Hi all,

For the past few days, I have been trying to compute a p-value for some experiment I ran recently, and I simply cannot figure it out.

The experiment is very simple:
- Given a population of 2000 individuals, each of which belongs to one of N different classes
- Given a distance metric between them
- I use a leave-one-out method to compute the prediction accuracy I can obtain using a nearest-neighbor method (i.e. given one individual I predict its class based on the class of the closest individual)
- This gives me a % of correct predictions.
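
For concreteness, the leave-one-out procedure above can be sketched in Python roughly as follows. This is only an illustration under assumptions: the function name `loo_nn_accuracy`, the use of NumPy, a precomputed distance matrix as input, and the toy data are all hypothetical and not taken from the actual experiment.

```python
import numpy as np

def loo_nn_accuracy(dist, labels):
    """Leave-one-out 1-nearest-neighbor accuracy from a precomputed distance matrix."""
    d = np.array(dist, dtype=float)   # copy, so the caller's matrix is left untouched
    np.fill_diagonal(d, np.inf)       # an individual may not be its own neighbor
    nearest = d.argmin(axis=1)        # index of the closest *other* individual
    labels = np.asarray(labels)
    return float(np.mean(labels[nearest] == labels))

# Toy data (made up): six 1-D points in two well-separated groups
points = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
labels = np.array([0, 0, 0, 1, 1, 1])
dist = np.abs(points[:, None] - points[None, :])

acc = loo_nn_accuracy(dist, labels)
print(f"fraction of correct predictions: {acc:.2f}")
```

Setting the diagonal to infinity is what makes this "leave-one-out": each individual is predicted from everyone except itself.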

Now, what I can't figure out is whether I should compute a p-value for each individual prediction or one for the whole experiment.

My first guess is that:
- my "null hypothesis" is that the distance metric and the class of the individuals are not related
- my "test statistic" is the % of correct predictions
- so the p-value is the probability of getting a % of correct predictions as high as the one I have by pure chance.
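
One possible way to estimate that probability, assuming a label-permutation test is an acceptable reading of the null hypothesis above, is to shuffle the class labels many times and count how often the shuffled data reaches the observed accuracy. The sketch below is hypothetical: `permutation_p_value`, its parameters `n_perm` and `seed`, and the toy data are illustrative names and values, not part of the original experiment.

```python
import numpy as np

def permutation_p_value(dist, labels, n_perm=999, seed=0):
    """Permutation p-value for leave-one-out 1-NN accuracy (one-sided, 'as high or higher')."""
    labels = np.asarray(labels)
    d = np.array(dist, dtype=float)
    np.fill_diagonal(d, np.inf)       # exclude each individual from its own neighbor search
    nearest = d.argmin(axis=1)        # neighbors depend only on distances, so compute them once

    def accuracy(lab):
        return float(np.mean(lab[nearest] == lab))

    observed = accuracy(labels)
    rng = np.random.default_rng(seed)
    # Shuffling labels breaks any link between distance and class (the null hypothesis)
    hits = sum(accuracy(rng.permutation(labels)) >= observed for _ in range(n_perm))
    # The +1 counts the observed labelling itself, so the estimate is never exactly zero
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value

# Toy data (made up): two well-separated 2-D clusters of 20 points each
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(10, 1, (20, 2))])
labels = np.repeat([0, 1], 20)
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

observed, p_value = permutation_p_value(dist, labels)
print(observed, p_value)
```

This yields a single p-value for the whole experiment rather than one per prediction; the smallest value it can report is 1 / (n_perm + 1).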

Is this correct?

Thanks in advance for any help I can get on this!

2. ## Re: p-value for nearest neighbor classification

You can obtain p-values for each group (I cannot see the benefit of doing it for each individual). In R you can get this with a wlad test in the vegan package, or you can run a SIMPROF (similarity profile) test, which will also do it, using the package "clustsig". If you don't use R but have access to PRIMER, that programme will also do SIMPROF.

3. ## Re: p-value for nearest neighbor classification

Thanks for the response!

I wanted to give this a try, but I cannot see the method "wlad" in the vegan package. I'm assuming this is the package you referred to, right? http://cran.r-project.org/web/packages/vegan/vegan.pdf

4. ## Re: p-value for nearest neighbor classification

Wait! I guess you meant the "Wald test", right? ( http://en.wikipedia.org/wiki/Wald_test ).

So, if I understand right, what you are suggesting is that for each of my N classes I use a Wald test to compare the "distance" between elements of that class and the distance to elements of different classes. That actually makes sense, since the Wald test can be compared against a Chi-square distribution, and thus I can get a p-value for each of my classes.

5. ## Re: p-value for nearest neighbor classification

Yep, that's what I meant. Sometimes I have fat fingers on the keyboard when I type in haste. But yes, I think you are on the right track.

6. ## The Following User Says Thank You to bugman For This Useful Post:

popolon (11-24-2013)
