Hi all,

Thank you for your replies, they are very much appreciated.

I'll give you some background to this problem, to see if we got it right.

There is a popular blog (in Italy) that posts satirical jokes about the news. It is crowdsourced: users send jokes through a forum, and the supposedly best ones are periodically posted on the main page of the blog. Normally a post consists of 25-30 jokes, but sometimes a piece of news is so popular that more jokes than that are selected. This would make the post bulky, and the reader would have to keep scrolling down.

So the blog's administrator thought he had a 'brilliant' idea: in those cases, he selects, say, 40-50 jokes (n) but posts only 8 of them (k), chosen at random. Every time the reader refreshes the page, 8 jokes are again selected at random from those 40-50, independently of what was shown before. So I think this is a sampling-with-replacement situation across refreshes.

I'd like to show the administrator that his 'brilliant' idea is actually a dumb one, and that if he used a sampling-without-replacement algorithm, everyone would be happier.

But to do that I need the formula I asked for.

I'll try BGM's solution to see how many times (r) one has to refresh the page to have a 95% probability (p) of having read all the jokes at least once. I guess the F5 key will be so worn out that you can see New Zealand through it.
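
For anyone who wants to play with the numbers, here is a minimal sketch of how such a calculation could look. It is not necessarily BGM's formula. It assumes that each refresh shows k distinct jokes drawn uniformly from the n available and that refreshes are independent, and it uses inclusion-exclusion over the set of jokes that are never shown. The function names prob_all_seen and refreshes_needed are made up for this example.

```python
from fractions import Fraction
from math import comb


def prob_all_seen(n, k, r):
    """Probability that r >= 1 independent refreshes, each showing k distinct
    jokes drawn uniformly from n, have together shown every joke at least once.

    Assumption (not confirmed by the blog): the k jokes on one page are
    distinct and every k-subset is equally likely.
    """
    total = comb(n, k)
    p = Fraction(0)
    # Inclusion-exclusion over j = number of jokes that are never shown.
    # For j > n - k a single page cannot avoid j jokes, so those terms vanish.
    for j in range(n - k + 1):
        # Chance that one refresh avoids a fixed set of j jokes.
        miss = Fraction(comb(n - j, k), total)
        p += (-1) ** j * comb(n, j) * miss ** r
    return p


def refreshes_needed(n, k, target=0.95):
    """Smallest r such that prob_all_seen(n, k, r) >= target."""
    r = 1
    while prob_all_seen(n, k, r) < target:
        r += 1
    return r


if __name__ == "__main__":
    # Example values taken from the post: 40-50 jokes selected, 8 shown.
    for n in (40, 50):
        print(n, refreshes_needed(n, 8))
```

Exact fractions are used because the alternating sum loses precision in floating point when n is around 50. For comparison, if the page cycled through a shuffled list without replacement, ceil(n/k) refreshes would always be enough (7 refreshes for n = 50, k = 8).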

Thanks again!