It sounds like you're looking for a sample size calculator for proportions.
I'm creating an excel spreadsheet for our office pool. If you're not familiar with the College Bowl Pick'em game, the rules are as follows...
There are 41 bowl games. Each participant picks the winner of each game and assigns Confidence Points to each game. These Confidence Points are from 1~41 and you can't duplicate. If your pick wins the game, you get the number of Confidence Points assigned to that game...if your pick loses, you get 0. So, you put 41 points on your most confident pick, and 1 on your least confident pick. Most points at the end wins. Same rules as ESPN.com's contest that they do every year.
I've run this pool the last few years, but it always bugged me how we'd get halfway through and if you were in the middle of the pack, or even dead last, you didn't know if you had a shot at winning, and if so, what that chance was. I've decided to fix it this year. My goal is to calculate the percentage chance of each participant winning the pool after each game is played. Ideally, I would take all the possible outcomes of all 41 games (2^41...or about 2.2 TRILLION), see the number outcomes where each participant won, and give percentage chance of winning that way. This first calculation would of course give even percentage for each participant since no games had been played. After the first game, the number of possible outcomes would be 2^40, so I'd re-run the numbers. After the second game, it would be 2^39 and so on. Sounds easy, but excel can only handle so much data. I'm sure the formulas I'm using could be improved upon, but in the end, to use this approach and have excel run smoothly, I could only do about 13 games,so (2^13), 8,192 outcomes. Well, that's not good enough. I want to know after every game, including the earlier ones.
I then decided to calculate the probability with a different model...instead of trying to create a chart of all the possible outcomes, what if I randomly generated, say, 20,000 different outcomes...and take the percentage chance of each participant winning. I did that and looked at the numbers vs the numbers with the "all outcomes" model. I looked at the percentages with 13 games left, and the "20,000 model" did a pretty good job vs the "all outcomes" model. It was with 1-2% for the most part. Obviously as you get down to only a handful of games left, then it's basically 100% accurate. The main issue is the more outcomes I generate, the more excel slows down.
All that said...here is my question! If I use the second model, how many random outcomes do I need to generate in order to obtain...let's say...within 0.5% accuracy? I don't know how many participants I'll have...let's assume 20. It seems that after each game is played, I would need less and less number of outcomes. After all, I don't need to flip a coin 20,000 times to tell me the odds are 50/50. At a minimum, I would like to be able to get the percentages after the first day of bowl season, in which 5 games are played. Oh, also, not sure if it matters, but I added functionality to the spreadsheet that allows the user to input odds for each game, since each game isn't really 50/50. Not sure if that affects anything.
Thanks in advance!
It sounds like you're looking for a sample size calculator for proportions.
I don't have emotions and sometimes that makes me very sad.
Dason,
I did some googling around and found a few online sample size calculators but I'm not sure how to make it work for my application. Can you elaborate? Thanks.
Tweet |