EDIT: Please see my last post for my latest revelation/question regarding this problem!
Hi all, first time poster here...
I'm creating a probability table for a board game. My attempt at solving this problem is below and would like others' opinion. I'm not a statistician so I've clawed through online references to arrive at my draft solution. Though with the more research I do, the more I think I may be using the wrong formula. Yet intuitively it feels right. As I'd like to share my product with the gaming community I want to be sure I've got this right!
The Problem:
The game has a mechanism that requires a player to draw three tokens from a bag. Of the three tokens that are drawn, two must be the same. What is the probability? Without going into extraneous detail of the mechanism itself, here's more info:
- The bag can have "X" number of tokens in it at the start of the draw (between 6 and 11 tokens, though I assume that doesn't matter here)
- There are two categories of tokens inside the bag. For the purpose of this question, let's call them "Hit" and "Miss" tokens. The mix ratio of Hit-Miss tokens varies during the game, for each player. For example, for any given draw, Player A may have 2 Hit & 4 Miss tokens in their bag, while Player B may have 4 Hit & 3 Miss tokens.
- The game mechanism calls for a player to draw three tokens from their bag (always three). Of the three tokens drawn, EDIT ADD: at least "Y" tokens must be "Hits" in order to succeed. The game will specify "Y" (either 1,2, or 3). As it's normally two hits that must be drawn for success, I'll use two for this question.
The goal is to build a probability table as a reference for the players so they can determine their chances of success before deciding to draw. As it can be either one hit, or two hits, or three that's needed for success, I assume the easiest way to present the data is to have a separate table for each of the three conditions.
My Draft Solution:
Parameters:
- Calculate the probability of scoring EDIT ADD: at least two Hits from THREE tokens drawn (i.e., am not concerned about displaying the probability of scoring two Hits from the first two drawn).
Given:
- "X" = total number of tokens in the bag at the start of the draw (i.e., "Hit" + "Miss")
- "Y" = total number of "Hit" tokens in the bag at the start of the draw
- Possible successful outcomes are: (a)Hit-Miss-Hit or, (b)Miss-Hit-Hit
Example used (most typical at the start of the game):
"X" = 7
"Y" = 3
For this example, My Draft Solution is: .372 (The probability of scoring two hits out of three draws is 37.2%
My Draft Method:
I found two concepts on the web that I sort of bastardized into my own formula. For each possible outcome, I used what I understand to be the conditional probability formula: P(A and B) = P(A) x P(B|A). And as there are two possible outcomes, I also used the "A or B" formula: P(A or B) = P(A) + P(B) - P(A and B), where in this case, P(A and B) = 0.
So this is what I came up with (using a simple format I created for Excel, except I use cell references in place of the "X" & "Y" below. Again "X" is the # of tokens in the bag, and "Y" is the number of favorable tokens):
=(("Y"/"X")*(("Y"-1)/("X"-2)))+((("Y")/("X"-1))*(("Y"-1)/("X"-2)))
My logic, using this example of 7 tokens in the bag, of which 3 are "Hits" and "4" are misses, follows. Thusly I write this mix of tokens inside the bag as, "7 : 3: 4".
Remember, there are two possibilities for success, (a) & (b):
For Success Possibility (a) (drawing Hit-Miss-Hit)
At the start of draw one, the token mix in the bag is 7 : 3: 4 ... thusly I calculate the probability of getting a "Hit" as 3/7 =.429
At the start of draw two, the mix is 6 : 2 : 4 ... a "Miss" is drawn.
At the start of draw three, the mix is now 5 : 2 : 3 ... I calculate the probability of getting a "Hit" as 2/5=.4
Thusly for Success Possibility (a), I multiply .429*.4 to get a .172 probability of drawing two "Hit" tokens.
For Success Possibility (b) (drawing Miss-Hit-Hit)
At the start of draw one, the token mix in the bag is 7 : 3: 4 ... a "Miss" is drawn.
At the start of draw two, the mix is 6 : 3 : 3 ... I calculate the probability of getting a "Hit" as 3/6 =.5
At the start of draw three, the mix is 5 : 2 : 3 ... the probability of getting a "Hit" is 2/5=.4
Thusly for Success Possibility (b), I multiply .5*.4 to get a .2 probability of drawing two "Hit" tokens.
As success is an "either (a) or (b)" condition I applied the formula: P(A or B) = P(A) + P(B) - P(A and B), where again in this case, P(A and B) = 0. So I ended up with adding .172 and .2 to arrive at a final solution of .372.
Did I arrive at the correct solution?
If so, I can easily transform the above method into an Excel table (may not be very elegant behind the scenes), but will get the job done!
On the other hand, I have some doubt if I'm using the correct method, as with more research I did last night, I came across formulas that require calculating the total number of possible combinations of drawing seven tokens form the bag over the total number of possible combinations of how all the three "Hit" tokens in the bag could end up...something like this:
7!
7!-3!
or something like that... and then running with a formula from there. I came across formulas such as:
P(B|A)=P(A and B)/P(A) or,
P(B|A)=P(B|A)P(A)/P(B|A)P(A)+P(B|Ac)P(Ac)
As I intuitively think I already arrived at the correct solution, my head started to hurt when I came across the above factorial and probability equations. I wonder if they're the way I'm supposed to go? Or perhaps I did go that way without realizing it? If my draft solution is wrong and or uses the wrong method, then I really appreciate any input!
Thanks in advance!
Barry
Last edited by bgm1961; 08-07-2014 at 07:01 PM.
bgm1961 (08-06-2014)
Thank you for your reply! And yes, you understand the problem correctly. I realized afterward that I was perhaps mistaken in neglecting Hit-Hit-Miss as a third possible outcome, so I appreciate that you pointed that out. I discounted it because I saw it as a separate problem (i.e., scoring two "Hits" from only two draws). Now that I'll go back and add the H-H-M outcome to the other two, of course I'll get a different solution. I wonder if it'll be correct using my method.
Speaking of "my method", thank you for your hypergeometric distribution formula!
And yes, "Elementary but complicated" is my middle name! As I'm not a mathematician by any stretch, I admit that I'll have to do some homework to learn how to employ your formula. For instance, are the x and y in your formula the same as the "x" and "y" I use in my problem represent the total tokens and favorable tokens in the bag?
Thanks again!
EDIT: OK, I did some homework on the formula so I answered the above question. I see that you already adjusted the formula to use the variables and parameters from my example. Thanks! Now let me figure out how to run it and I'll post my result here.
NEXT DAY EDIT: Per BGM's recommendation I spent some time last night learning a little about the Hypergeometric Distribution function, and about pmf and cdf. I'm posting some comments and a follow-up question below.
Last edited by bgm1961; 08-07-2014 at 11:33 AM. Reason: Update with new info
BGM, thank you again for the Hypergeometric Distribution formula!
FOLLOW-UP QUESTION:
You recommended that I use the probability mass function result. Now that I have some idea of what that means, I wonder if I shouldn't be using the Upper Cumulative result instead? Using my problem as an example... as I understand the pmf it returns the probability of drawing exactly two "hit" tokens from the bag. Wouldn't I be more interested in learning the probability of drawing at least two tokens, since I need at least two?
Upon learning how to use the formula you presented, I first worked-it out by hand. Then with several online calculators. Then I found the HYPGEOM.DIST function in Excel. The answer was the same (.343). I was really excited that I could simply use the function built into Excel to build my table. BUT THEN, I noticed one of the results returned by an online calculator actually matched the result I got from my original example method (.372)! That result was the Upper Cumulative of the hypergeometric dist... it was described as the, "probability of drawing 2 successes or more from a sample of 3". Wait... this is what I want, isn't it? (I edited my original post to convey this).
Sadly, the HYPGEOM.DIST function in Excel doesn't appear to return the upper cumulative result. I thought my method which I asked about in my original post would work, as it certainly did for my sample problem (Population size=7, Successes in lot=3, Sample size=3, Successes in sample=2). But as soon as I changed the successes in lot from 3 to 4, it broke (my method returned a .743, while the online calc returned a .629)! It got even worse when I changed the population size to 6 and successes in lot to 4... my method returned a 1.10! The online calc returned a .8... which of course sounds more like it. So I've proven to myself that my method, as I presented it in the original post, is broken.
BOTTOM LINE: So I'll stick with the Hypergeometric Dist function, but would like an opinion on using the upper cumulative result instead of the pmf. And are there any short-cuts on using Excel to calculate that?
New: See Edit below.
Thanks again!
bgm1961
EDIT: Several hours later and I think I figured out the math! By studying the results (and especially the chart) returned by the online calculator it dawned on me that the Hypergeometric Dist function returns a distribution [chart]. The PMF is simply the result of the specific sampling I asked for. So, the Upper Cumulative result is the same as summing the pmf results for each successful sampling within the sample range. I.e., As I stated above, I believe that what I want is the upper cumulative result, not the pmf. So for the problem I presented in my original post, I'm interested in the probability of drawing at least two "Hit" tokens (vice exactly) from a sampling of three tokens drawn. For this problem, the hypergeometric pmf returns .343, and the upper cumulative is .372. But with my lack of mathematical skills, I couldn't figure out how to calculate the upper cumulative in Excel. So what I discovered is that I can also run the hypergeometric dist function, except adding one to the successful samples... so I plug-in 3 instead of 2 for "x". The pmf now returned is .0286. So I add that to .343 = .372, and get the same as the upper cumulative! And I know Excel has several cumulative functions, so I'll mess around with those.
Sorry for my simpleton-style naïve babbling, but thanks to BGM throwing that formula at me (of which I didn't have a clue what it was or what it meant). As you can tell, I'm not a mathematical student. Still, because of his help, I've taught myself how to use the hypergeometric dist function (just barely though, I understand) and have my problem solved! I'd still be stuck in "elementary but complicated" if it wasn't for this forum.
NEXT (and last) DAY EDIT: This is what I use in Excel to provide the Hypergeometric Dist Upper Cululative (x=successes in sample, n=sample size, M=successes in population, N=population size):
=HYPGEOM.DIST(x,n,M,N,FALSE)+(1-HYPGEOM.DIST(x,n,M,N,TRUE)) .
Now I can build that probability table!
Thanks, BGM for your help!
Last edited by bgm1961; 08-08-2014 at 05:24 PM.
Tweet |