+ Reply to Thread
Results 1 to 3 of 3

Thread: Probability of duplicates

  1. #1

    Probability of duplicates




    Suppose I have a bag of N balls numbered from 1 to N. If I draw K balls at random with replacement, how do I calculate the expected number duplicates?

    Thanks

  2. #2
    TS Contributor
    Points: 22,410, Level: 93
    Level completed: 6%, Points required for next Level: 940

    Posts
    3,020
    Thanks
    12
    Thanked 565 Times in 537 Posts

    Re: Probability of duplicates

    First you have to give a more precise meaning of duplicate first - do you mean the number that appear more than once? (i.e. not only twice, also including the case with 3, 4, ..., up to k times)

    Let Y_i \sim \text{Binomial}\left(k, \frac {1} {n}\right) be the number of appearances of the number i

    Let X_i be the indicator of the number i appear more than once, i = 1, 2, \ldots, n (i.e. X_i = 1 if and only if Y_i > 1; otherwise X_i = 0)

    Then the total number of duplicated numbers is \sum_{i=1}^n X_i

    Although X_i are dependent, the linearity of expectation is not affected:

    E\left[\sum_{i=1}^n X_i\right] = \sum_{i=1}^n E[X_i]

    Note that

    E[X_i] = \Pr\{X_i = 1\} = \Pr\{Y_i > 1\}

    = 1 - \Pr\{Y_i = 0\} - \Pr\{Y_i = 1\}

    = 1 - \binom {k} {0} \left(\frac {1} {n}\right)^0 \left(\frac {n - 1} {n}\right)^k - \binom {k} {1} \left(\frac {1} {n}\right)^1 \left(\frac {n - 1} {n}\right)^{k-1}

    = 1 - \frac {(n - 1)^{k-1}(n - 1 + k)} {n^k}

    Finally you multiply this by n (as the expectation of each X_i is the same), you obtain the required answer.

  3. The Following User Says Thank You to BGM For This Useful Post:

    Jennifer Murphy (10-29-2015)

  4. #3
    Points: 3,571, Level: 37
    Level completed: 48%, Points required for next Level: 79
    Jennifer Murphy's Avatar
    Posts
    37
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Re: Probability of duplicates


    I am defining "duplicate" to mean drawing a ball that has already been drawn. I am interested in calculating the expected number of such duplicates.

    I don't care how many times each ball has been drawn, just how many times a ball has been drawn that had been drawn previously.

    Suppose I have 3 balls in my bag: 1, 2, & 3. Here are some sample draws and my definition of the number of duplicates:
    Code: 
    Ball   #Dups
     3       0
     2       0
     3       1
     3       2
     2       3
     1       3 All 3 balls have now been drawn  1       3
     3       4
     2       5
     1       6
    
    Once all 3 balls have been drawn, every draw is a duplicate.
    If I implemented it correctly, your formula seems to be calculating something else. Here's the results I get for the bag of 3 balls:
    Code: 
     N    K   F(N,K)
      3    1    0.00
      3    2    0.11
      3    3    0.26
      3    4    0.41
      3    5    0.54
      3    6    0.65
      3    7    0.74
      3    8    0.80
      3    9    0.86
      3   10    0.90   
    
    When I draw the first ball, the expected number of duplicates is zero, since there is only 1 ball.

    When I draw the second ball, the expected number of duplicates is 1/3 or 0.3333. There is 1 chance that the new ball is the same as the first and 2 chances that it is different.

    When I draw the third ball, it gets more complicated. I calculate the expected number of duplicates as 5/9 = 0.5555. If the second ball was a duplicate, there 1 chance that the third is as well. If not, there are 2 chances for each of the two scenarios.

    After that, it gets a lot more complicated.

    I ran a little simulation to check these values. Here's the data. The P(Dup) column is the probability of that draw being a duplicate. The Cum column is the cumulative probabilities, which is the estimated number of my kind of duplicates after that number of draws.
    Code: 
     N    K   P(Dup)   Cum
      3    1   0.0000  0.0000
      3    2   0.3328  0.3328
      3    3   0.5555  0.8883
      3    4   0.7048  1.5931
      3    5   0.8019  2.3950
      3    6   0.8675  3.2625
      3    7   0.9127  4.1752
      3    8   0.9240  5.1172
      3    9   0.9607  6.0778
      3   10   0.9741  7.0520
    
    This agrees with my first two calculations.

    Can your formula be adapted for

+ Reply to Thread

           




Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts






Advertise on Talk Stats