
Thread: Calculating the Probability of a Subset

  1. #1
    Points: 22, Level: 1
    Level completed: 43%, Points required for next Level: 28

    Posts
    5
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Calculating the Probability of a Subset




    I'm a newbie and not stats-savvy, so help and guidance are appreciated. If I use incorrect terms, please forgive and correct me.

    The data set population is 6530 contests, of which 3791 were wins, 2731 losses and 8
    ties. The win probability is .5812 (Note: ties are excluded from the number of
    contests for purposes of calculation, so 3791 / 6522)

    Of the 6530 contests, 6376 were Regular Season contests, of which 3698 were wins,
    2670 losses and 8 ties.

    The remaining 154 contests were Post Season, of which 93 were wins and 61 losses.

    So my question is: what is the correct method of calculating the probability of each subset? Simple, right? Thanks in advance,

    Thumper

  2. #2
    Omega Contributor
    Points: 38,396, Level: 100
    Level completed: 0%, Points required for next Level: 0
    hlsmith's Avatar
    Location
    Not Ames, IA
    Posts
    7,001
    Thanks
    398
    Thanked 1,186 Times in 1,147 Posts

    Re: Calculating the Probability of a Subset

    Probabilities are just the number of events of interest divided by the total number of opportunities for an event. So if you have a subset you just use the new total sample number as the denominator and the number of events per subset as the numerator.


    I believe it is really as simple as that.
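As a minimal sketch of that rule, the snippet below divides the wins in each subset by the number of decided contests in that subset. The counts come from the first post; the function name `win_probability` and the choice to drop ties from the denominator (as the original poster does) are assumptions made for illustration.

```python
def win_probability(wins, losses):
    """Share of decided contests (ties excluded) that were wins."""
    decided = wins + losses
    if decided == 0:
        raise ValueError("no decided contests in this subset")
    return wins / decided


# (wins, losses) per subset, taken from the first post; ties are left out.
subsets = {
    "Regular": (3698, 2670),
    "Post": (93, 61),
    "Population": (3791, 2731),
}

probabilities = {
    name: win_probability(wins, losses)
    for name, (wins, losses) in subsets.items()
}

for name, p in probabilities.items():
    print(f"{name:<10} {p:.4f}")
```

Each subset simply gets its own numerator and denominator; nothing from the full population enters the calculation.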

  3. #3

    Re: Calculating the Probability of a Subset

    Quote Originally Posted by hlsmith View Post
    Probabilities are just the number of events of interest divided by the total number of opportunities for an event. So if you have a subset you just use the new total sample number as the denominator and the number of events per subset as the numerator.


    I believe it is really as simple as that.
    So based on that, the following would be true:

    SEASON      WINS  CONTESTS  PROBABILITY
    Regular     3698  6368      0.58071608
    Post          93   154      0.603896104
    Population  3791  6522      0.581263416

    Here is another subset from the same population: there were 16 games played at a line between -18 and -26, and all were wins. The calculated probability based on your reply is 1.00, but I would not assume that 100% of all contests at that line would result in a favorable outcome.

    So maybe what I'm really asking is how to factor in an adjustment to the probability for statistical significance. If that makes sense, how do I proceed?

    Thumper

  4. #4
    hlsmith's Avatar

    Re: Calculating the Probability of a Subset

    Excellent question! What you now need to incorporate are confidence intervals around your probabilities. The typical one used is the 95% CI. You also need to update your probabilities when new information becomes available. The observed rate may be 100% now, but a loss will eventually occur as the number of contests in the subset grows.

  5. #5

    Re: Calculating the Probability of a Subset

    Quote Originally Posted by hlsmith View Post
    Excellent question! What you now need to incorporate are confidence intervals around your probabilities. The typical one used is the 95% CI. You also need to update your probabilities when new information becomes available. The observed rate may be 100% now, but a loss will eventually occur as the number of contests in the subset grows.
    Let me start out by saying thanks for the help. Okay I understand what you are saying. So a quick Google search leads me to a simple calculator of CI. (http://www.gifted.uconn.edu/siegle/r...calculator.htm)

    The inputs needed are:

    Confidence Level: Input 95% or 99%
    Sample Size: (Is this the subset Wins value?)
    Population: (Is this the Total Population of 6530, or the population of the subset: 154 for the Post Season or 6368 for the Regular Season contests?)
    Percentage: (I assume this is the Probability of Wins, but is it the Total Population value of 58.12% or the subset Probability of 60.39% or 58.07% respectively?)
    Confidence Interval: Output or the +/- percentage

    My next question: on the same page is another calculator titled "Determine Sample Size." I assume, and hope you will correct me if I'm not interpreting this correctly, that if the sample size is not equal to or greater than this calculator's output, then the sample would not be considered statistically significant?

    Then my final question is: what are the actual formulas for both of these calculators, so I can implement them in my Excel worksheet?
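For reference, calculators of this kind are commonly built on the normal-approximation margin of error with a finite population correction. The sketch below assumes that is what the linked page does, which has not been verified. The names `margin_of_error` and `required_sample_size` are hypothetical. Treating a subset as the "sample" and the full data set as the "population" is also an assumption, since the subsets were not drawn at random.

```python
import math

Z_95 = 1.96  # normal critical value for a 95% confidence level


def margin_of_error(p, n, population=None, z=Z_95):
    """Half-width of the normal-approximation interval for a proportion."""
    se = math.sqrt(p * (1 - p) / n)
    if population is not None and population > 1:
        # Finite population correction: shrinks the error as n nears N.
        se *= math.sqrt((population - n) / (population - 1))
    return z * se


def required_sample_size(margin, p=0.5, population=None, z=Z_95):
    """Sample size needed to reach a target margin of error."""
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


# Post Season subset (93 wins in 154 contests) against 6522 decided contests.
post_margin = margin_of_error(93 / 154, 154, population=6522)
needed = required_sample_size(0.05, p=0.5, population=6522)
print(f"Post Season: +/- {post_margin:.3f}; n needed for +/- 0.05: {needed}")
```

Both functions translate directly to worksheet formulas, since they use only square roots and basic arithmetic.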

    Thanks for the help, Thumper

  6. #6
    hlsmith's Avatar

    Re: Calculating the Probability of a Subset

    Ignore sample size for now. I apologize for the quick email before. You need to look up the binomial confidence interval. Binomial means each event has two outcomes (yes or no). The formula is something similar to p +/- 2 * sqrt(p * (1 - p) / n), where n is the number of contests in the subset.

    Good luck.
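A minimal sketch of that interval (usually called the Wald interval) follows, using 1.96 rather than the rounded 2 for a 95% level. The function name `wald_interval` is hypothetical, and the counts are the ones quoted earlier in the thread.

```python
import math


def wald_interval(wins, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a win rate."""
    p = wins / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clip to [0, 1] because a probability cannot leave that range.
    return max(0.0, p - half_width), min(1.0, p + half_width)


post_low, post_high = wald_interval(93, 154)    # Post Season subset
sweep_low, sweep_high = wald_interval(16, 16)   # 16 wins in 16 contests

print(f"Post Season: {post_low:.3f} to {post_high:.3f}")
print(f"16 of 16:    {sweep_low:.3f} to {sweep_high:.3f}")
```

Note that when every contest is a win, p * (1 - p) is zero and the interval collapses to a single point at 1.0, so this simple form says nothing useful about the 16-for-16 subset.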

  7. The Following User Says Thank You to hlsmith For This Useful Post:

    dofdear (09-27-2015)

  8. #7

    Re: Calculating the Probability of a Subset

    Quote Originally Posted by hlsmith View Post
    Ignore sample size for now. I apologize for the quick email before. You need to look up the binomial confidence interval. Binomial means each event has two outcomes (yes or no). The formula is something similar to p +/- 2 * sqrt(p * (1 - p) / n), where n is the number of contests in the subset.

    Good luck.
    I really appreciate your guidance. So I've done some reading and research on the subject of Binomial CI. Is the Adjusted Wald CI adequate for my project? I found a pretty simple implementation for Excel at http://www.measuringux.com/adjustedwald.htm and the downloadable file is at http://www.measuringux.com/adjustedwald.xls . If you agree then can we go back to the question of sample size or statistical significance?

    So with a population of 6530 and 3791 wins, let's say there are multiple subsets under consideration: Regular Season, 6368 with 3698 wins, and Post Season, 154 with 93 wins. Adjusted Wald does not take into consideration that the subsets are from a larger population of data. Another subset previously mentioned is the Line between -18 and -26, where there were 16 contests and all were wins. Can the calculation of Sample Size contextualize the subset against the set population? Or is there another way to get there?

    Let's face it: on its face, a 100% probability looks attractive, but when you consider the low sample size some caution needs to be employed, hence the wager would be reduced appropriately.
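To make the caution concrete, here is a sketch of the Adjusted Wald interval in its usual textbook form (also known as Agresti-Coull): add z^2 / 2 successes and z^2 trials before applying the Wald formula. Whether the linked spreadsheet computes it exactly this way is an assumption, and `adjusted_wald_interval` is a hypothetical name.

```python
import math


def adjusted_wald_interval(wins, n, z=1.96):
    """Adjusted Wald (Agresti-Coull) interval for a binomial proportion."""
    n_adj = n + z ** 2                   # add z^2 pseudo-trials
    p_adj = (wins + z ** 2 / 2) / n_adj  # add z^2 / 2 pseudo-wins
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


sweep_low, sweep_high = adjusted_wald_interval(16, 16)
post_low, post_high = adjusted_wald_interval(93, 154)

print(f"16 of 16:    {sweep_low:.3f} to {sweep_high:.3f}")
print(f"Post Season: {post_low:.3f} to {post_high:.3f}")
```

For the 16-for-16 subset the lower bound lands well below 1.0, which matches the intuition that 16 contests are too few to support a 100% estimate. The interval depends only on the subset's own wins and contests, not on the size of the full data set.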

    Thanks again, Thumper

  9. #8

    Re: Calculating the Probability of a Subset


    hlsmith, thanks so much for your assistance. So here is where I'm at:

    1. Using Jenks Natural Breaks to group / classify the data
    2. Creating a logical set of subsets, some of which we previously discussed. These include Line by week, Regular and Post Season, contests within Division and Conference, day of week and time of day (afternoon or evening), etc.
    3. In the process of implementing the Adjusted Wald CI and the sample size calculation with the known-population enhancement.
    4. Developing a matrix or decision tree to evaluate future contests that will feed into a Kelly Criterion wager strategy.
    So does this all make sense so far? But this is where I run into at least a mental block.

    The formula for Kelly is f* = (bp - q) / b, where:

    b = the decimal odds - 1
    p = the probability of success
    q = the probability of failure (1 - p)
    f* = the fraction of the current bankroll to wager
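The formula can be sketched in a few lines. The function name `kelly_fraction` is hypothetical, and the decimal odds of 2.0 (even money) in the example are an assumed value chosen only for illustration.

```python
def kelly_fraction(p, decimal_odds):
    """Kelly fraction f* = (b*p - q) / b, with b = decimal odds - 1."""
    b = decimal_odds - 1
    if b <= 0:
        raise ValueError("decimal odds must be greater than 1")
    q = 1 - p
    return (b * p - q) / b


# Assumed example: 80% win probability at even money (decimal odds 2.0).
even_money = kelly_fraction(0.80, 2.0)
print(f"Kelly fraction: {even_money:.2f}")
```

A negative result means the bet has no edge at those odds, so in practice the wager would be zero.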

    So while I can determine the CI and essentially validate it with sample size, the Kelly Criterion uses only the probability in determining the wager. Going back to one of the subset examples previously provided, the Line between -18 and -26, where there were 16 contests and all were wins, results in a 100% probability. I actually modified this example to broaden the class to between -15 and -26, where there were 20 contests and 16 wins. According to the Sample Size output, 20 is an adequate sample size. The problem is that Kelly is suggesting a wager of 60% of the available bankroll. Way too much in my view. Suggestions?
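One possible way to temper the wager, offered as an assumption rather than a method established in this thread, is to feed Kelly the lower bound of the confidence interval instead of the raw 16 / 20 win rate. The 60% figure is consistent with even-money odds (b = 1), so the sketch assumes decimal odds of 2.0. The names `adjusted_wald_lower` and `kelly_fraction` are hypothetical.

```python
import math


def adjusted_wald_lower(wins, n, z=1.96):
    """Lower bound of the Adjusted Wald (Agresti-Coull) interval."""
    n_adj = n + z ** 2
    p_adj = (wins + z ** 2 / 2) / n_adj
    return max(0.0, p_adj - z * math.sqrt(p_adj * (1 - p_adj) / n_adj))


def kelly_fraction(p, decimal_odds):
    """Kelly fraction f* = (b*p - q) / b, with b = decimal odds - 1."""
    b = decimal_odds - 1
    return (b * p - (1 - p)) / b


wins, contests, odds = 16, 20, 2.0  # odds of 2.0 are an assumed value

naive = kelly_fraction(wins / contests, odds)
p_low = adjusted_wald_lower(wins, contests)
conservative = max(0.0, kelly_fraction(p_low, odds))  # never bet a negative edge

print(f"raw p = {wins / contests:.2f} -> wager {naive:.1%} of bankroll")
print(f"lower bound p = {p_low:.3f} -> wager {conservative:.1%} of bankroll")
```

With only 20 contests the lower bound sits far below 0.80, so the suggested wager drops sharply. Scaling the result further (for example betting a fixed fraction of the Kelly amount) is another commonly used safeguard.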

    Thumper
