+ Reply to Thread
Results 1 to 7 of 7

Thread: Sampling with replacement - probability problem

  1. #1
    Points: 21, Level: 1
    Level completed: 41%, Points required for next Level: 29

    Posts
    2
    Thanks
    4
    Thanked 0 Times in 0 Posts

    Sampling with replacement - probability problem




    Hi all,

    (this is my first post, hopefully I am not breaking any rules).

    I have the following problem that I cannot solve:

    I have a set of 'n' different elements. I sample 'k' elements at random, then I put them back in. Then I sample again 'k' elements at random, and I put them back in. And so on. I repeat the operation 'r' times. What is the probability that I picked each of the 'n' elements at least once?
    I just need a formula that gives the probability as a function of 'n', 'k' and 'r'. If you can prove it, it's a welcome bonus.
    Thanks.

  2. #2
    Omega Contributor
    Points: 38,374, Level: 100
    Level completed: 0%, Points required for next Level: 0
    hlsmith's Avatar
    Location
    Not Ames, IA
    Posts
    6,998
    Thanks
    398
    Thanked 1,186 Times in 1,147 Posts

    Re: Sampling with replacement - probability problem

    So to dumb it down, you have 10 numbered balls, you grab a sample less then the total number of balls repeatedly. After you select a single ball, you are not returning it to the bag until after the sample is full at k, making it ineligible to be in a single sample more than once?


    So you can't have 1,2,2,2,4,10, since 2 was not returned during that sample selection.
    Stop cowardice, ban guns!

  3. The Following User Says Thank You to hlsmith For This Useful Post:

    Bep75 (04-17-2016)

  4. #3
    TS Contributor
    Points: 6,786, Level: 54
    Level completed: 18%, Points required for next Level: 164

    Location
    Sweden
    Posts
    524
    Thanks
    44
    Thanked 112 Times in 100 Posts

    Re: Sampling with replacement - probability problem

    I would try to find the complement of the event. What is the probability of not sampling each element at least once?

  5. The Following User Says Thank You to Englund For This Useful Post:

    Bep75 (04-17-2016)

  6. #4
    TS Contributor
    Points: 12,227, Level: 72
    Level completed: 45%, Points required for next Level: 223
    rogojel's Avatar
    Location
    I work in Europe, live in Hungary
    Posts
    1,470
    Thanks
    160
    Thanked 332 Times in 312 Posts

    Re: Sampling with replacement - probability problem

    Hi,
    the sort of brute force approach would be to imagine n slots where we are randomly putting a total of kr stones into them. Each slot can have between 0 and kr stones, the only constraint being that the sum of stones across all n slots must always be kr.

    So the probability of having each slot occupied would be (number of ways kr can be particioned into n integers, without any of them being 0)/ (number of ways kr can be partitioned into n integers with some possibly being 0)

    Partitioning an integer into a given number of sub-sums is a pretty heavy combinatorial problem if I recall correctly

    On thinking this over - the probability can be expressed in a formula like:

    (p(kr,n) - Sum(p(kr, n-i))/p(kr,n) which is

    1-Sum_i(p(kr, n-i)/p(kr,n)

    where p(N,r) is the nzmber of ways N can be partitioned into r terms. https://en.m.wikipedia.org/wiki/Part...diate_function


    Regards

  7. The Following User Says Thank You to rogojel For This Useful Post:

    Bep75 (04-17-2016)

  8. #5
    TS Contributor
    Points: 22,410, Level: 93
    Level completed: 6%, Points required for next Level: 940

    Posts
    3,020
    Thanks
    12
    Thanked 565 Times in 537 Posts

    Re: Sampling with replacement - probability problem

    See if I correctly understand the model.

    The probability that a particular group of m elements are not selected in a particular operation is

    p_m = \frac {\displaystyle \binom {n-m} {k} \binom {m} {0}} {\displaystyle \binom {n} {k}} = \frac {(n - m)!(n-k)!} {n!(n-m-k)!} 
= \prod_{j=1}^m \frac {n-m-k+j} {n-m+j} = \prod_{j=1}^m \left(1 - \frac {k} {n-m+j}\right)

    So if they are not selected in all r operations, then the probability will be p_m^r

    As what Englund hinted, let A_i be the event that the i-th element is selected at least once out of the r operations, i = 1, 2, \ldots, n. The required probability is

    P\left(\bigcap_{i=1}^n A_i\right)

    = 1 - P\left(\bigcup_{i=1}^n A_i^c\right)

    = 1 - \sum_{m=1}^{n-k}(-1)^{m-1}\binom {n} {m} p_m^r

    where the second inequality is the de Morgan's law and the last equality is given by inclusion-exclusion principle.

  9. The Following User Says Thank You to BGM For This Useful Post:

    Bep75 (04-17-2016)

  10. #6
    Points: 21, Level: 1
    Level completed: 41%, Points required for next Level: 29

    Posts
    2
    Thanks
    4
    Thanked 0 Times in 0 Posts

    Re: Sampling with replacement - probability problem

    Hi all,

    thank you for your replies, very much appreciated.
    I'll give you some background to this problem, to see if we got it right.

    There is a popular blog (in Italy), which posts satire jokes about the news. It is crowd sourcing: users send jokes through a forum, and the supposedly best ones are periodically posted on the main page of the blog. Normally a post consists of 25-30 jokes, but sometimes a piece of news is so popular that more jokes that that are selected. This would make the post bulky, and the reader would have to keep on scrolling down.
    So the blog's administrator thought he had a 'brilliant' idea: in those cases, he selects say 40-50 jokes (n) but he only posts 8 of them (k), randomly selected. Every time the reader refreshes the page, 8 jokes are again selected at random among those 40-50. So I think this is a sampling with replacement situation.
    I'd like to show that administrator that his 'brilliant' idea is actually a dumb one, and that if he used a sampling without replacement algorithm, everyone would be happier.
    But to do that I need the formula I asked for.
    I'll try BGM's solution to see how many times (r) one has to refresh the page to have a 95% probability (p) of having read all the jokes at least once. I guess the F5 key will be so worn-out that you can see New Zealand through it.

    Thanks again!

  11. #7
    TS Contributor
    Points: 12,227, Level: 72
    Level completed: 45%, Points required for next Level: 223
    rogojel's Avatar
    Location
    I work in Europe, live in Hungary
    Posts
    1,470
    Thanks
    160
    Thanked 332 Times in 312 Posts

    Re: Sampling with replacement - probability problem


    hi,
    this is then a rather simple proposition:you only need to simulate the selection a few
    thousand times and show him the results. In my opinion that would be by far more convincing then any formula - and much easier to do.

    regards

+ Reply to Thread

           




Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts






Advertise on Talk Stats