Probability distribution function for extracting b balls out of a urn.

Radu

New Member
#1
Hello everybody,

First of all I would like to say that I am not a student trying to get his homework solved. Rather I am a software developer trying to implement a game that has a probabilistic side... In case it makes any difference :D.

The abstraction of the problem is pretty simple and hope someone more seasoned in statistics can point me in the right direction:

Imagine an urn with N balls, of which B balls are white and N-B are black. I draw M balls out of the urn with replacement, however, every time I get a white ball i paint it black (or replace the white ball with a black one). I want to express the probability that I have exactly X white balls remaining in the urn (obviously 0 <= X <= B) after the M trials.

If it makes any difference B is small (<100), N is very large (N>10^5) and M/N is in (0, 1). So limits could be applied if needed..

After a bit of wiki reading the problem seems very close to a Hypergeometric distribution but in my case the probabilities of drawing a white/black balls does not change in the same way.
In my case the probability for drawing white ball goes: B/N -> (B-1)/N -> ... -> 1/N -> 0 when drawing a white ball, and remains constant when drawing a black one. In the example of the hypergeometric distribution B/N -> (B-1)/(N-1) -> ... -> 0/(N-B) if I draw white and B/N -> B/(N-1) -> ... -> B/(N-M) if i draw black. But experimentally I see the probability distribution I'm looking for has a shape similar to a hypergeometric probability distribution..

Remembering my high school days I could calculate the average number of balls as a function of M/N and verified this by simulation. Also through simulation I plotted the probability distribution, and created an empirical model that i use now. But it could be way cooler to have some sort of a algebraic formula.

I also tried expressing the problem as a bayesian network where i can compute probabilities faster than by simulating. It works quite well but does not give me an algebraic function...

Another approach I think is feasible is a ``brute" counting all possible draw sequences. Each ball can be consider a digit in base N and the sequence of M draws results in a number in base N with M digits => I have ~N^M possible draw sequences. Imagine also the digits given to the white balls are numbered 1 to B. The problem is that I can not count all the numbers between [1, N^M) that have any X digits in the set of the first B digits, but not any of the rest of the (B-X) digits. Do you think this is a better approach?

Needles to say, any advice is greatly appreciated.

Thank you,
Radu
 

BGM

TS Contributor
#2
You have an accurate description of the problem, and it can be described as a Markov chain, with the transition matrix \( P \) is a bidiagonal matrix.

Now the distribution after \( M \) draws is completely described by \( P^M \)
 

Radu

New Member
#3
Sorry for the late answer, I'm new and I though I will get a "heads up" email when I get a reply to the post. It did not happen also it seems there is no quick way to find your posts. Anyway..

Your advice is very good, I forgot that I can do that with Markov chains. I got one step further to an analytical expression for the probability distribution: I have now a formula for q*P^M (q - initial state [1 0 ... 0]) but it involves sums, products (from 1..M) and I do not know how to calculate the limit or come up with an expression that is non-recursive.
My plan right now is two fold:
- do a similar calculation for the urn extraction without replacement problem and try to find similarities. My scenario can be abstracted to an urn extraction with partial replacement problem (replace black only) and perhaps I can find similarities.
- try to prove that there is no algebraic formula (I'm trying to reduce the problem of calculating a member of q*P^M to calculating the ways of partitioning an integer)

I will be back with the actual formulas once I write everything nicely in latex and do the comparison with hypergeometric distribution.

Thank you very much,
Radu
 
#4
I would try to compute the probability as the ratio of the number of favorable cases to the total number of possible cases.
Let's start from the total number of possible cases. It is equal to the number of combinations with repetition of M objects from N:
(N+M-1)! / [ (N-1)! * M! ]
Now, denote by
L=B-X
the number of white balls that you need to paint in order to achieve your goal of X white balls remaining in the urn.
The combinations with repetition that allow to achieve your goal can be built in two steps:

STEP 1: choose a combination without repetition of L white balls from B.
STEP 2: choose a combination with repetition of M-L balls from an urn containing the L white balls you have already chosen and the N-B black balls.

So, if you set
N1=L+N-B
M1=M-L
the total number of favorable cases is
{(B)! / [ (B-L)! * L! ]} * {(N1+M1-1)! / [ (N1-1)! * M1! ]}
 
#5
You have an accurate description of the problem, and it can be described as a Markov chain, with the transition matrix \( P \) is a bidiagonal matrix.

Now the distribution after \( M \) draws is completely described by \( P^M \)
I am afraid transition probabilities are state dependent, which complicates this approach quite a bit. See my previous post for an alternative.