# Thread: Calculating the Probability of a Subset

1. ## Calculating the Probability of a Subset

I'm a newbie and not stats savvy, so help and guidance are appreciated. If I use incorrect terms, please forgive and correct me.

The data set population is 6530 contests, of which 3791 are wins, 2731 are losses and 8
are ties. The win probability is .5812 (Note: ties are excluded from the number of contests
for purposes of calculation, so 3791 / 6522).

Of the 6530 contests, 6376 were Regular Season contests, of which 3698 were wins,
2670 losses and 8 ties.

The remaining 154 contests were Post Season, of which 93 were wins and 61 losses.

So my question is: what is the correct method of calculating the probability of each subset? Simple, right? Thanks in advance,

Thumper

2. ## Re: Calculating the Probability of a Subset

Probabilities are just the number of events of interest divided by the total number of opportunities for an event. So if you have a subset you just use the new total sample number as the denominator and the number of events per subset as the numerator.

I believe it is really as simple as that.
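
As a minimal sketch of the rule above, the snippet below divides wins by decided contests (wins plus losses, with ties left out as the original post describes). The counts come from the first post; the dictionary layout and the function name `win_probability` are illustrative assumptions, not anything the posters used.

```python
# Win/loss counts taken from the first post; ties are left out of the denominator.
subsets = {
    "Regular": {"wins": 3698, "losses": 2670},
    "Post": {"wins": 93, "losses": 61},
}


def win_probability(wins, losses):
    """Observed win proportion: wins divided by decided contests."""
    return wins / (wins + losses)


# Proportion for each subset.
results = {name: win_probability(**counts) for name, counts in subsets.items()}

# Proportion for the whole population, built by summing the subsets.
total_wins = sum(c["wins"] for c in subsets.values())
total_losses = sum(c["losses"] for c in subsets.values())
results["Population"] = win_probability(total_wins, total_losses)

for name, p in results.items():
    print(f"{name:<10} {p:.4f}")
```

Each value is only the observed proportion for that subset; it says nothing yet about how much that proportion could move with more data.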

3. ## Re: Calculating the Probability of a Subset

Originally Posted by hlsmith
Probabilities are just the number of events of interest divided by the total number of opportunities for an event. So if you have a subset you just use the new total sample number as the denominator and the number of events per subset as the numerator.

I believe it is really as simple as that.
So based on that, the following would be true:

| Season | Wins | Contests (excl. ties) | Probability |
|---|---|---|---|
| Regular | 3698 | 6368 | 0.58071608 |
| Post | 93 | 154 | 0.603896104 |
| Population | 3791 | 6522 | 0.581263416 |

Here is another subset from the same population: there were 16 games played at a line between -18 and -26, and all were wins. The calculated probability based on your reply is 1.00, but I would not assume that 100% of all contests at that line would result in a favorable outcome.

So maybe what I'm really asking is how to adjust the probability for statistical significance. If that makes sense, how do I proceed?

Thumper

4. ## Re: Calculating the Probability of a Subset

Excellent question! What you now need to incorporate are confidence intervals around your probabilities. The typical one used is the 95% CI. You also need to update your probabilities when new information becomes available. The observed rate may be 100% now, but eventually they will lose as the number of contests in the subset grows.

5. ## Re: Calculating the Probability of a Subset

Originally Posted by hlsmith
Excellent question! What you now need to incorporate are confidence intervals around your probabilities. The typical one used is the 95% CI. You also need to update your probabilities when new information becomes available. The observed rate may be 100% now, but eventually they will lose as the number of contests in the subset grows.
Let me start out by saying thanks for the help. Okay I understand what you are saying. So a quick Google search leads me to a simple calculator of CI. (http://www.gifted.uconn.edu/siegle/r...calculator.htm)

The inputs needed are;

Confidence Level: Input 95% or 99%
Sample Size: (Is this the subset Wins value?)
Population: (Is this the Total Population of 6530, or the population of the subset: 154 for the Post Season or 6368 for the Regular Season contests?)
Percentage: (I assume this is the probability of wins, but is it the Total Population's 58.13%, or the subset probability of 60.39% or 58.07% respectively?)
Confidence Interval: Output or the +/- percentage

My next question: on the same page is another calculator titled "Determine Sample Size." I assume, and hope you will correct me if I'm not interpreting this correctly, that if the sample size is not equal to or greater than this calculator's output, then the sample would not be considered statistically significant?

Then my final question is: what are the actual formulas for both of these calculators, so I can implement them in my Excel worksheet?

Thanks for the help, Thumper
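
The thread does not say which formulas the linked calculators use. A hedged sketch of the standard textbook versions is below: the margin of error for a proportion, the sample size needed for a target margin, and an optional finite population correction for each. The function names `margin_of_error` and `required_sample_size` are hypothetical, and on this reading "sample size" means the number of decided contests in the subset, not the number of wins.

```python
import math


def margin_of_error(p, n, population=None, z=1.96):
    """Half-width of a normal-approximation interval for a proportion.

    p: observed proportion, n: sample size, z: 1.96 for 95% (2.576 for 99%).
    If population is given, the finite population correction is applied.
    """
    moe = z * math.sqrt(p * (1 - p) / n)
    if population is not None:
        moe *= math.sqrt((population - n) / (population - 1))
    return moe


def required_sample_size(margin, p=0.5, population=None, z=1.96):
    """Sample size needed to reach a target margin of error.

    p=0.5 is the most conservative (largest) choice. If population is
    given, the result is shrunk by the finite population correction.
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


# Post Season subset: 93 wins in 154 decided contests.
print(round(margin_of_error(93 / 154, 154), 4))
# Sample size for a +/-5% margin at 95%, with and without a known population.
print(required_sample_size(0.05), required_sample_size(0.05, population=6530))
```

Note that a sample-size figure of this kind describes how precise the estimate is at a chosen margin; it is not by itself a test of statistical significance.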

6. ## Re: Calculating the Probability of a Subset

Ignore sample size for now. I apologize for the quick email before. You need to look up the binomial confidence interval. Binomial means each event has two outcomes (yes or no). The formula is something similar to p +/- 2 * sqrt(p * (1 - p) / n), where n is the number of contests.

Good luck.
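
A minimal sketch of the simple (Wald) binomial interval, assuming the usual 1.96 multiplier for 95% in place of the rounded 2. The function name `wald_interval` is illustrative. The counts are the Post Season subset and the 16-for-16 subset from earlier in the thread.

```python
import math


def wald_interval(wins, n, z=1.96):
    """Normal-approximation interval: p +/- z * sqrt(p * (1 - p) / n)."""
    p = wins / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clip to [0, 1] because a probability cannot leave that range.
    return max(0.0, p - half_width), min(1.0, p + half_width)


post = wald_interval(93, 154)    # Post Season: 93 wins in 154 contests
perfect = wald_interval(16, 16)  # 16 wins in 16 contests

print(post)
print(perfect)
```

When every contest is a win, p * (1 - p) is zero, so this interval collapses to a single point at 1.0. That is a known weakness of the simple formula with small or lopsided samples.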

8. ## Re: Calculating the Probability of a Subset

Originally Posted by hlsmith
Ignore sample size for now. I apologize for the quick email before. You need to look up the binomial confidence interval. Binomial means each event has two outcomes (yes or no). The formula is something similar to p +/- 2 * sqrt(p * (1 - p) / n), where n is the number of contests.

Good luck.
I really appreciate your guidance. So I've done some reading and research on the subject of Binomial CI. Is the Adjusted Wald CI adequate for my project? I found a pretty simple implementation for Excel at http://www.measuringux.com/adjustedwald.htm and the downloadable file is at http://www.measuringux.com/adjustedwald.xls . If you agree then can we go back to the question of sample size or statistical significance?

So with a population of 6530 and 3791 wins, let's say there are multiple subsets under consideration: Regular Season, 6368 with 3698 wins, and Post Season, 154 with 93 wins. Adjusted Wald does not take into consideration that the subsets are from a larger population of data. Another subset, mentioned previously, is the Line between -18 and -26, where there were 16 contests and all were wins. Can a calculation of Sample Size contextualize the subset against the whole population? Or is there another way to get there?

Let's face it: on its face, a 100% probability looks attractive, but when you consider the low sample size some caution needs to be employed, hence the wager would be reduced appropriately.

Thanks again, Thumper
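
A hedged sketch of the Adjusted Wald interval in its common form (also called Agresti-Coull), which adds z^2 / 2 pseudo-successes and z^2 pseudo-trials before applying the simple formula. This is not taken from the linked spreadsheet, and the function name `adjusted_wald` is illustrative.

```python
import math


def adjusted_wald(wins, n, z=1.96):
    """Adjusted Wald (Agresti-Coull) interval for a binomial proportion."""
    n_adj = n + z ** 2                   # adjusted number of trials
    p_adj = (wins + z ** 2 / 2) / n_adj  # adjusted proportion
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clip to [0, 1] because a probability cannot leave that range.
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


small = adjusted_wald(16, 16)  # Line between -18 and -26: 16 wins in 16 contests
post = adjusted_wald(93, 154)  # Post Season: 93 wins in 154 contests

print(small)
print(post)
```

For the 16-for-16 subset the lower bound lands in the high 0.70s rather than at 1.0, which puts a number on the caution described above. The interval depends only on the subset's own wins and contests; it does not use the size of the larger population.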

9. ## Re: Calculating the Probability of a Subset

hlsmith, thanks so much for your assistance. So here is where I'm at:

1. Using Jenks Natural Breaks to group / classify the data
2. Creating a logical set of subsets, some of which we previously discussed: Line by week, Regular and Post Season, contests within Division and Conference, day of week, time of day (afternoon or evening), etc.
3. In the process of implementing the Adjusted Wald CI and a calculation of sample size with the known-population enhancement.
4. Developing a matrix or decision tree to evaluate future contests that will feed into a Kelly Criterion wager strategy.
So does this all make sense so far? But this is where I run into at least a mental block.

The formula for Kelly is f* = (bp - q) / b, where:

b = the decimal odds - 1
p = the probability of success
q = the probability of failure (1 - p)
f* = the fraction of the current bankroll to wager

So while I can determine the CI and essentially validate it with sample size, the Kelly Criterion uses only the probability in determining the wager. Going back to one of the subset examples previously provided, the Line between -18 and -26, where there were 16 contests and all were wins, results in a 100% probability. I actually modified this example to broaden the class to between -15 and -26, where there were 20 contests and 16 wins. According to the Sample Size output, 20 is an adequate sample size. The problem is that Kelly suggests wagering 60% of the available bankroll. Way too much in my view. Suggestions?

Thumper
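
A minimal sketch of the Kelly formula above applied to the 16-wins-in-20 example. The thread does not state the odds; decimal odds of 2.0 (b = 1) are an assumption, chosen because that is the value at which p = 0.80 gives the 60% figure. The second calculation is only an illustration, not a recommendation: it feeds Kelly the lower bound of a 95% Adjusted Wald interval instead of the raw proportion. The function names are hypothetical.

```python
import math


def kelly_fraction(p, decimal_odds):
    """Kelly stake f* = (b * p - q) / b, with b = decimal odds - 1 and q = 1 - p."""
    b = decimal_odds - 1
    q = 1 - p
    return (b * p - q) / b


def adjusted_wald_lower(wins, n, z=1.96):
    """Lower bound of the Adjusted Wald (Agresti-Coull) interval."""
    n_adj = n + z ** 2
    p_adj = (wins + z ** 2 / 2) / n_adj
    return max(0.0, p_adj - z * math.sqrt(p_adj * (1 - p_adj) / n_adj))


wins, contests = 16, 20
assumed_odds = 2.0  # assumption: even-money decimal odds, not stated in the thread

# Kelly on the raw observed proportion (0.80).
raw_stake = kelly_fraction(wins / contests, assumed_odds)

# Kelly on the conservative lower bound of the interval.
cautious_p = adjusted_wald_lower(wins, contests)
cautious_stake = kelly_fraction(cautious_p, assumed_odds)

print(f"raw p = {wins / contests:.3f}, stake = {raw_stake:.3f}")
print(f"lower-bound p = {cautious_p:.3f}, stake = {cautious_stake:.3f}")
```

Under these assumptions, the lower-bound probability produces a much smaller stake than the raw proportion, because with only 20 contests the interval around 0.80 is wide.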
