1. ## Probability of duplicates

Suppose I have a bag of N balls numbered from 1 to N. If I draw K balls at random with replacement, how do I calculate the expected number duplicates?

Thanks

2. ## Re: Probability of duplicates

First you have to give a more precise meaning of duplicate first - do you mean the number that appear more than once? (i.e. not only twice, also including the case with 3, 4, ..., up to times)

Let be the number of appearances of the number

Let be the indicator of the number appear more than once, (i.e. if and only if ; otherwise = 0)

Then the total number of duplicated numbers is

Although are dependent, the linearity of expectation is not affected:

Note that

Finally you multiply this by (as the expectation of each is the same), you obtain the required answer.

3. ## The Following User Says Thank You to BGM For This Useful Post:

Jennifer Murphy (10-29-2015)

4. ## Re: Probability of duplicates

I am defining "duplicate" to mean drawing a ball that has already been drawn. I am interested in calculating the expected number of such duplicates.

I don't care how many times each ball has been drawn, just how many times a ball has been drawn that had been drawn previously.

Suppose I have 3 balls in my bag: 1, 2, & 3. Here are some sample draws and my definition of the number of duplicates:
Code:
Ball   #Dups
3       0
2       0
3       1
3       2
2       3
1       3 All 3 balls have now been drawn  1       3
3       4
2       5
1       6

Once all 3 balls have been drawn, every draw is a duplicate.
If I implemented it correctly, your formula seems to be calculating something else. Here's the results I get for the bag of 3 balls:
Code:
 N    K   F(N,K)
3    1    0.00
3    2    0.11
3    3    0.26
3    4    0.41
3    5    0.54
3    6    0.65
3    7    0.74
3    8    0.80
3    9    0.86
3   10    0.90

When I draw the first ball, the expected number of duplicates is zero, since there is only 1 ball.

When I draw the second ball, the expected number of duplicates is 1/3 or 0.3333. There is 1 chance that the new ball is the same as the first and 2 chances that it is different.

When I draw the third ball, it gets more complicated. I calculate the expected number of duplicates as 5/9 = 0.5555. If the second ball was a duplicate, there 1 chance that the third is as well. If not, there are 2 chances for each of the two scenarios.

After that, it gets a lot more complicated.

I ran a little simulation to check these values. Here's the data. The P(Dup) column is the probability of that draw being a duplicate. The Cum column is the cumulative probabilities, which is the estimated number of my kind of duplicates after that number of draws.
Code:
 N    K   P(Dup)   Cum
3    1   0.0000  0.0000
3    2   0.3328  0.3328
3    3   0.5555  0.8883
3    4   0.7048  1.5931
3    5   0.8019  2.3950
3    6   0.8675  3.2625
3    7   0.9127  4.1752
3    8   0.9240  5.1172
3    9   0.9607  6.0778
3   10   0.9741  7.0520

This agrees with my first two calculations.