# Thread: Can I calculate a p-value here?

1. ## Can I calculate a p-value here?

Hey everyone!

Ok so I ran a biochemistry experiment the other day and I want to calculate a p-value for my data. The experiment is pretty complex, but what it boils down to is that I have a solution containing about a thousand proteins. Some of these proteins are modified and some aren't. I am trying to determine which proteins in this mixture are modified, but the problem is that the modification tends to fall off, so I can't always detect it.

I detect these proteins by purifying out the ones containing the modification, so ideally I should end up with two solutions for each sample: one containing only the modified proteins (let's call this solution A) and one containing only unmodified proteins (solution B). While most of the modified proteins are found only in solution A, some show up in both A and B because the modification falls off. Based on previous experiments I have a list of proteins that are confirmed to be modified, along with their percent distribution between the two solutions (ranging from 100% in A to 45% in A).

In case it's not clear, what I mean by percent distribution is that any given modified protein will usually not be seen exclusively in either solution, but rather be distributed between the two. For instance, Protein X may be 95% in solution A and 5% in solution B.

My question is: if I use the percent distributions of the confirmed proteins as my true data set, can I calculate a p-value for the other, non-confirmed proteins based on how their percent distribution between solutions A and B compares to that data set?

If the answer is yes, do you guys know how I might go about calculating it?

So to give an example of what I want to calculate, let's assume I have 5 proteins (A-E) that I know for a fact are modified and one unknown protein (X) that may or may not be. The percent distributions of the confirmed proteins are:

A = 100% in solution A
B = 100% in solution A
C = 95% in solution A
D = 88% in solution A
E = 75% in solution A

If protein X has a percent distribution of 79% in solution A, can I calculate the probability that X is in fact a modified protein based on A-E?

Thanks again.

2. So I thought it over and I have an idea but it seems too simple. What I'm thinking is that I can calculate a probability based on where the unknown protein lies in the list of known proteins. So for the above example, protein X falls between known proteins D and E so the probability that X may be modified is:

(# known proteins with lower % in A) / (Total known proteins)

= 1/5 = 0.2

Is this valid? It seems too easy.
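For reference, the rank-based calculation above could be sketched in Python like this. It is only a minimal sketch using the example values from the first post; the function name `empirical_lower_fraction` is made up for illustration. Keep in mind that the number it returns is the fraction of confirmed proteins with a lower percentage in solution A than X, which is not necessarily the same thing as the probability that X is modified.

```python
def empirical_lower_fraction(known_pct_in_a, unknown_pct_in_a):
    """Fraction of confirmed proteins whose % in solution A is strictly
    below the unknown protein's % in solution A."""
    if not known_pct_in_a:
        raise ValueError("need at least one confirmed protein")
    lower = sum(1 for pct in known_pct_in_a if pct < unknown_pct_in_a)
    return lower / len(known_pct_in_a)


# Example values from the first post (percent found in solution A).
known = {"A": 100, "B": 100, "C": 95, "D": 88, "E": 75}

fraction = empirical_lower_fraction(list(known.values()), 79)
print(fraction)  # 0.2, i.e. only protein E lies below X
```

With only five confirmed proteins the result can only move in steps of 0.2, so the full list of confirmed proteins would be needed for anything finer.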

3. So after a little more thought, I'm thinking that I can plot the known data in a histogram and compute a kernel density estimate from it. Problem is that I have no idea where to start with the latter part (the kernel density estimate). Does anyone know a good resource for this?
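As a starting point, a Gaussian kernel density estimate can be written with nothing but the standard library. The sketch below is an illustration under several assumptions, not a recommended analysis: it uses the five example values from the first post, a Gaussian kernel, and the normal-reference rule of thumb for the bandwidth (h = 1.06 · s · n^(-1/5)). The names `rule_of_thumb_bandwidth`, `make_gaussian_kde` and `kde_cdf` are hypothetical. A Gaussian kernel also puts some density above 100%, which is impossible for a percentage, so a real analysis would probably need to handle that boundary.

```python
import math


def rule_of_thumb_bandwidth(values):
    """Normal-reference bandwidth: 1.06 * sample std * n^(-1/5)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return 1.06 * math.sqrt(variance) * n ** (-1 / 5)


def make_gaussian_kde(values, bandwidth):
    """Return a function x -> estimated density at x."""
    norm = len(values) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        # Sum one Gaussian bump centred on each observed value.
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm

    return density


def kde_cdf(values, bandwidth, x):
    """Estimated probability mass at or below x (average of Gaussian CDFs)."""
    scale = bandwidth * math.sqrt(2)
    return sum(0.5 * (1 + math.erf((x - v) / scale)) for v in values) / len(values)


# Percent in solution A for the confirmed proteins A-E from the first post.
known_pct = [100, 100, 95, 88, 75]

h = rule_of_thumb_bandwidth(known_pct)
density = make_gaussian_kde(known_pct, h)

print("bandwidth:", h)
print("density at 79%:", density(79))
print("estimated mass at or below 79%:", kde_cdf(known_pct, h, 79))
```

The last value plays the same role as the rank-based fraction from the previous post, but smoothed, so it is no longer limited to steps of 1/5. With this few data points the result depends heavily on the bandwidth choice.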
