Ok, so I ran a biochemistry experiment the other day and I want to calculate a p-value for my data. The experiment is pretty complex, but what it boils down to is that I have a solution containing about a thousand proteins. Some of these proteins are modified and some aren't. I am trying to determine which proteins in this mixture are modified, but since the modification tends to fall off, I can't always detect it. I detect these proteins by purifying out the ones that carry the modification, so ideally I should end up with two solutions for each sample: one containing only the modified proteins (let's call this solution A) and one containing only unmodified proteins (solution B). While most of the modified proteins end up only in solution A, some modified proteins show up in both A and B because the modification falls off. Based on previous experiments, I have a list of proteins that are confirmed to be modified, along with their percent distribution between the two solutions (the values range from 100% in A down to 45% in A).
In case it's not clear, what I mean by percent distribution is that any given modified protein will usually not be seen exclusively in one solution, but will be distributed between the two. For instance, Protein X may be 95% in solution A and 5% in solution B.
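As a minimal sketch of how such a percent distribution could be computed, assuming you have some abundance measurement for the same protein in each solution (for example a mass-spec intensity, which is an assumption here, not something stated above), the %A value is just the A share of the total. The function name `percent_in_a` is hypothetical.

```python
def percent_in_a(amount_a, amount_b):
    """Percent of a protein's total signal found in solution A.

    amount_a, amount_b: abundance of the same protein in solution A
    (modified fraction) and solution B (unmodified fraction), in the
    same units. Using intensities is an assumption; any quantity
    proportional to the amount of protein works.
    """
    total = amount_a + amount_b
    if total <= 0:
        raise ValueError("protein not detected in either solution")
    return 100.0 * amount_a / total


# Protein X from the example: 95 units in A, 5 units in B.
print(percent_in_a(95, 5))  # 95.0
```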
My question is: if I use the percent distributions of the confirmed proteins as my reference ("true") data set, can I calculate a p-value for each non-confirmed protein based on how its percent distribution between solutions A and B compares to that reference set?
If the answer is yes, do you know how I might go about calculating it?
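One common way to frame this, sketched here under the assumption that a lower %A is the direction that argues against modification, is a one-sided empirical (rank-based) p-value. Let r_1, ..., r_n be the %A values of the n confirmed modified proteins and x the %A value of the unknown protein. Under the null hypothesis that the unknown protein is modified, i.e. x is one more draw from the same distribution as the r_i:

```latex
p = \frac{1 + \#\{\, i : r_i \le x \,\}}{n + 1}
```

A small p means that few confirmed modified proteins sit as far toward solution B as the unknown one does. The +1 terms count x itself as a draw from the null distribution, so p is never exactly zero.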
So to give an example of what I want to calculate, let's assume I have 5 proteins (A-E) that I know for a fact are modified and one unknown protein (X) that may or may not be. The percent distributions of the confirmed proteins are:
A = 100% in solution A
B = 100% in solution A
C = 95% in solution A
D = 88% in solution A
E = 75% in solution A
If protein X has a percent distribution of 79% in solution A, can I calculate the probability that X is in fact a modified protein based on A-E?
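Here is a minimal sketch applying a one-sided empirical p-value to the numbers above. It assumes the null hypothesis "X is modified" and treats a lower %A as the evidence against it; `empirical_p_value` is a hypothetical helper, not a library function.

```python
def empirical_p_value(reference, x):
    """One-sided (lower-tail) empirical p-value for an unknown protein.

    reference: %A values of proteins confirmed to be modified.
    x: %A value of the unknown protein.
    Assumed null: x comes from the same distribution as the reference.
    """
    n = len(reference)
    if n == 0:
        raise ValueError("need at least one confirmed protein")
    # Count confirmed proteins that sit at or below x (as far toward B).
    k = sum(1 for r in reference if r <= x)
    # +1 in numerator and denominator counts x itself as a null draw.
    return (k + 1) / (n + 1)


confirmed = {"A": 100, "B": 100, "C": 95, "D": 88, "E": 75}
p = empirical_p_value(list(confirmed.values()), 79)
print(f"p = {p:.3f}")  # p = 0.333
```

Two caveats follow directly from the arithmetic. With only five reference proteins the smallest possible p is 1/6 (about 0.167), so you need at least 19 confirmed proteins before a p of 0.05 is even reachable. Also, this p-value measures how unusual X looks compared with known modified proteins; it is not the probability that X is modified. Getting that probability would also require the %A distribution of proteins known to be unmodified, plus a prior, combined via Bayes' rule.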