# Thread: Sampling with replacement - probability problem

1. ## Sampling with replacement - probability problem

Hi all,

(this is my first post, hopefully I am not breaking any rules).

I have the following problem that I cannot solve:

I have a set of 'n' different elements. I sample 'k' elements at random, then I put them back in. Then I sample again 'k' elements at random, and I put them back in. And so on. I repeat the operation 'r' times. What is the probability that I picked each of the 'n' elements at least once?
I just need a formula that gives the probability as a function of 'n', 'k' and 'r'. If you can prove it, it's a welcome bonus.
Thanks.

2. ## Re: Sampling with replacement - probability problem

So to dumb it down, you have 10 numbered balls, you grab a sample less then the total number of balls repeatedly. After you select a single ball, you are not returning it to the bag until after the sample is full at k, making it ineligible to be in a single sample more than once?

So you can't have 1,2,2,2,4,10, since 2 was not returned during that sample selection.

3. ## The Following User Says Thank You to hlsmith For This Useful Post:

Bep75 (04-17-2016)

4. ## Re: Sampling with replacement - probability problem

I would try to find the complement of the event. What is the probability of not sampling each element at least once?

5. ## The Following User Says Thank You to Englund For This Useful Post:

Bep75 (04-17-2016)

6. ## Re: Sampling with replacement - probability problem

Hi,
the sort of brute force approach would be to imagine n slots where we are randomly putting a total of kr stones into them. Each slot can have between 0 and kr stones, the only constraint being that the sum of stones across all n slots must always be kr.

So the probability of having each slot occupied would be (number of ways kr can be particioned into n integers, without any of them being 0)/ (number of ways kr can be partitioned into n integers with some possibly being 0)

Partitioning an integer into a given number of sub-sums is a pretty heavy combinatorial problem if I recall correctly

On thinking this over - the probability can be expressed in a formula like:

(p(kr,n) - Sum(p(kr, n-i))/p(kr,n) which is

1-Sum_i(p(kr, n-i)/p(kr,n)

where p(N,r) is the nzmber of ways N can be partitioned into r terms. https://en.m.wikipedia.org/wiki/Part...diate_function

Regards

7. ## The Following User Says Thank You to rogojel For This Useful Post:

Bep75 (04-17-2016)

8. ## Re: Sampling with replacement - probability problem

See if I correctly understand the model.

The probability that a particular group of elements are not selected in a particular operation is

So if they are not selected in all operations, then the probability will be

As what Englund hinted, let be the event that the -th element is selected at least once out of the operations, . The required probability is

where the second inequality is the de Morgan's law and the last equality is given by inclusion-exclusion principle.

9. ## The Following User Says Thank You to BGM For This Useful Post:

Bep75 (04-17-2016)

10. ## Re: Sampling with replacement - probability problem

Hi all,

thank you for your replies, very much appreciated.
I'll give you some background to this problem, to see if we got it right.

There is a popular blog (in Italy), which posts satire jokes about the news. It is crowd sourcing: users send jokes through a forum, and the supposedly best ones are periodically posted on the main page of the blog. Normally a post consists of 25-30 jokes, but sometimes a piece of news is so popular that more jokes that that are selected. This would make the post bulky, and the reader would have to keep on scrolling down.
So the blog's administrator thought he had a 'brilliant' idea: in those cases, he selects say 40-50 jokes (n) but he only posts 8 of them (k), randomly selected. Every time the reader refreshes the page, 8 jokes are again selected at random among those 40-50. So I think this is a sampling with replacement situation.
I'd like to show that administrator that his 'brilliant' idea is actually a dumb one, and that if he used a sampling without replacement algorithm, everyone would be happier.
But to do that I need the formula I asked for.
I'll try BGM's solution to see how many times (r) one has to refresh the page to have a 95% probability (p) of having read all the jokes at least once. I guess the F5 key will be so worn-out that you can see New Zealand through it.

Thanks again!

11. ## Re: Sampling with replacement - probability problem

hi,
this is then a rather simple proposition:you only need to simulate the selection a few
thousand times and show him the results. In my opinion that would be by far more convincing then any formula - and much easier to do.

regards

 Tweet

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts