Is it possible to calculate a binomial distribution with a non-constant p?

jonnybgood
08-25-2010, 11:05 AM
Here's the actual problem I'm faced with. Suppose a segment of DNA has 100 mutations (SNPs), each occurring at its own frequency, and the frequency of any given mutation also differs between 2 populations. The expected number of mutations occurring in the segment is therefore different in each population, and using this difference I can predict which population the segment originates from. I need to determine the intersection/low point between the 2 frequency curves, so that I can say: to the left of this point the segment is assigned to Pop 1, to the right of this point it's assigned to Pop 2. I've managed to do this by generating thousands of simulated curves with RND, but this increases the run time by half an hour, which is unacceptable. That's when I started looking into calculating the curve directly. From what I've read, the binomial distribution is clearly what I want, except for one thing: it assumes p is constant. In my problem, every p is different. Is this possible to calculate? [I have a feeling it isn't]

PS: I've glanced over the Beta-binomial distribution, but it seems to involve a random p. In my example, I have a known value for p for each of the k trials/events, and I need to use those exact p values.

BGM
08-25-2010, 11:40 AM

So say you have a mixture of binomial random variables (subpopulations).

Do you already know the exact values of each parameter?

Do you want to find a classification region so that you can decide which
subpopulation a group of samples belongs to, based on the observed sample count?

squareandrare
08-25-2010, 12:58 PM
It's hard to tell if this will help you (as I can't really understand what you're trying to do), but the beta-binomial model is useful for mixing binomial distributions with different proportion parameters.

Edit: oops... I should have read to the end of the post.

jonnybgood
08-25-2010, 11:45 PM
I did a poor job of explaining myself. Suppose 100 events/trials, each with a different (and known) frequency of success (p). I want to calculate the probabilities of the 100 trials resulting in 0 total successful trials, 1 total successful trial, 2, 3, ... 99, and 100 total successful trials.

BGM
08-26-2010, 12:04 AM
Oh, then you are calculating the probability mass function of the sum of 100
Bernoulli trials with different success probabilities. I am not sure there is any
way to simplify the calculation. (I am not familiar with the numerical algorithms.)

Basically, you need to calculate

\Pr\{X = 0\} = \prod_{i=1}^{100} (1-p_i)

\Pr\{X = 1\} = \sum_{j=1}^{100} p_j \prod_{i=1,i \neq j}^{100} (1-p_i)

and so on.
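
For reference, the general term follows the same pattern, assuming the 100 trials are independent. Here A is just an illustrative name for the set of trials that succeed; it is not notation used elsewhere in this thread:

\Pr\{X = k\} = \sum_{A \subseteq \{1,\dots,100\},\ |A| = k} \ \prod_{j \in A} p_j \prod_{i \notin A} (1-p_i)

The sum runs over all \binom{100}{k} subsets of size k, which is why evaluating it directly becomes expensive for values of k near the middle of the range.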

There may be some algorithm that reduces the computational burden of
the multiplication and summation.
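
One standard way to cut the work down, assuming the trials are independent, is to add the trials one at a time and update the whole distribution at each step: if f_n(k) is the probability of k successes among the first n trials, then f_n(k) = (1 - p_n) f_{n-1}(k) + p_n f_{n-1}(k-1). For 100 trials this takes on the order of 100^2 multiplications rather than a sum over subsets. Below is a minimal Python sketch of that idea. The function names `poisson_binomial_pmf` and `first_crossover` are made up for illustration, and the crossover rule (the first count at which the second population's curve is higher) is only one possible reading of the "intersection point" described in the original question.

```python
def poisson_binomial_pmf(probs):
    """Return [P(X=0), ..., P(X=n)] where X is the number of successes
    among n independent trials with success probabilities `probs`."""
    pmf = [1.0]  # with zero trials, zero successes is certain
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this trial fails: count stays at k
            nxt[k + 1] += mass * p      # this trial succeeds: count becomes k + 1
        pmf = nxt
    return pmf


def first_crossover(pmf_1, pmf_2):
    """Return the smallest count k at which pmf_2[k] > pmf_1[k], or None.

    Assumption (not from the thread): population 2 has the larger expected
    count, so counts below k are assigned to Pop 1 and counts from k
    upward to Pop 2.
    """
    for k, (a, b) in enumerate(zip(pmf_1, pmf_2)):
        if b > a:
            return k
    return None


if __name__ == "__main__":
    # Made-up frequencies for three SNPs in two populations.
    pop1 = poisson_binomial_pmf([0.1, 0.2, 0.3])
    pop2 = poisson_binomial_pmf([0.6, 0.7, 0.8])
    print(pop1)
    print(pop2)
    print(first_crossover(pop1, pop2))
```

With all p equal, the result reduces to the ordinary binomial distribution, which is a convenient sanity check before running it on the real 100 frequencies.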