Bayes Theorem / Contingency Tables


JohnM
11-09-2005, 11:40 AM
In this example, I will walk you through the classic Bayes' Theorem question concerning diagnostic tests. Rather than having you memorize a bunch of formulas related to Bayes' Theorem and conditional probability, I will try to help you understand what is going on by using a 2x2 contingency table.

Usually, in this type of problem, you are given a few different probabilities for events, and you are asked to fill in the missing piece(s). One strategy for tackling this is to memorize the various formulas for conditional probabilities. Alternatively, we can simply start by setting the population size to a very large number, say 1,000,000.

Using this population size and the probabilities given by the problem, we can begin to fill in some boxes in the 2x2 Contingency Table, and merely use addition and subtraction to fill in the missing numbers. Then, whatever probabilities are asked of us in the problem, we can easily determine them by picking out the right numbers from the table and doing division.

Here is an example:

(1) A certain disease occurs in the population at a rate of 0.02%
(2) If a person has the disease, the probability that the test for this disease will turn out positive is 98%
(3) If a person does not have the disease, the probability that the test for this disease will turn out positive is 3%

Let A = the event that someone in the population has the disease
then A' = the event that someone in the population does not have the disease

Let B = the event that the diagnostic test turns out positive
then B' = the event that the diagnostic test turns out negative

OK, now, let's say the size of the population is 1,000,000
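From (1) above, P(A) = 0.02%, so the number of people with the disease is (0.0002 * 1000000) = 200
then the number of people without the disease is 1000000 - 200 = 999800
From (2) above, P(B|A) = 98%, so the number who have the disease and test positive is (0.98 * 200) = 196
From (3) above, P(B|A') = 3%, so the number who do not have the disease but test positive is (0.03 * 999800) = 29994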
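
The arithmetic above can be checked with a short Python sketch. The variable names are illustrative only, and Fraction is used here simply so the percentages multiply out to exact whole numbers instead of floating-point approximations.

```python
from fractions import Fraction

population = 1_000_000

# Rates given in (1), (2) and (3), written as exact fractions
p_disease = Fraction(2, 10_000)           # P(A)    = 0.02%
p_pos_given_disease = Fraction(98, 100)   # P(B|A)  = 98%
p_pos_given_healthy = Fraction(3, 100)    # P(B|A') = 3%

diseased = p_disease * population                 # people with the disease
healthy = population - diseased                   # people without the disease
true_positives = p_pos_given_disease * diseased   # have the disease, test positive
false_positives = p_pos_given_healthy * healthy   # no disease, test positive

print(diseased, healthy, true_positives, false_positives)
# prints: 200 999800 196 29994
```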

Now, let's start plugging values into our contingency table.

The table is laid out as follows, with the boxes numbered (1) through (9):

            A       A'      Total
B           (1)     (2)     (3)
B'          (4)     (5)     (6)
Total       (7)     (8)     (9)

I've entered 1,000,000 in the lower right-hand corner, box (9), since it is the total of our population that has been tested for the disease.

Box (7) is simply the total number of people in the population who have the disease, or 200. Box (1), the people who have the disease and test positive (A and B), is 196. Box (2), the people who do not have the disease but test positive (A' and B), is 29,994.

The easiest way to progress through this problem is to fill in box (8), which is (9)-(7) = 999,800. Then fill in the remaining boxes, keeping in mind that each row must add up to its marginal value (far right-hand column) and each column must add up to its marginal value (bottom row). The completed table:

            A       A'        Total
B           196     29,994    30,190
B'          4       969,806   969,810
Total       200     999,800   1,000,000
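
The fill-in step can be sketched in Python as nothing more than addition and subtraction. The `cell` dictionary and its keys are just an illustrative way of mirroring the box numbers (1) through (9); the order of the steps is one reasonable choice, not the only one.

```python
# Starting cells from the walkthrough: boxes (1), (2), (7) and (9)
cell = {1: 196, 2: 29_994, 7: 200, 9: 1_000_000}

cell[8] = cell[9] - cell[7]   # no disease           = grand total - disease
cell[3] = cell[1] + cell[2]   # all positive tests   = row B total
cell[4] = cell[7] - cell[1]   # disease, negative    = column A total - box (1)
cell[5] = cell[8] - cell[2]   # no disease, negative = column A' total - box (2)
cell[6] = cell[4] + cell[5]   # all negative tests   = row B' total

# Sanity check: the two row totals must add up to the grand total
assert cell[3] + cell[6] == cell[9]

for label, boxes in (("B", (1, 2, 3)), ("B'", (4, 5, 6)), ("Total", (7, 8, 9))):
    print(f"{label:<6}" + "".join(f"{cell[b]:>12,}" for b in boxes))
```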

Now that the table is completely filled in, we can answer any question posed to us regarding a simple probability or a conditional probability - and all we need to do is some division.

Simple or "Marginal" Probabilities

P(A) = probability that a person in the population has the disease
we already know this from the given information, but we can compute it by dividing cell (7) by cell (9): 200/1000000 = 0.0002 or 0.02%

P(B) = probability that the diagnostic test turns out positive, regardless of whether the person has the disease or not
= cell (3) / cell (9) = 30190 / 1000000 = 0.0302 or 3.02%
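
As a quick check, the two marginal probabilities can be computed from the table totals in Python. The variable names are illustrative. Note that P(B) comes out as 0.03019, which rounds to the 0.0302 quoted above.

```python
# Totals from the completed table: disease (7), positive test (3), population (9)
has_disease, tests_positive, population = 200, 30_190, 1_000_000

p_a = has_disease / population      # P(A): marginal probability of disease
p_b = tests_positive / population   # P(B): marginal probability of a positive test

print(f"P(A) = {p_a:.4%}")
print(f"P(B) = {p_b:.4%}")
```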

Conditional Probabilities

P(B|A) = given that the person has the disease, the probability that the test will turn out positive
= cell (1) / cell (7) = 196/200 = 0.98 or 98%
also known as the Sensitivity of the test

P(B|A') = given that the person does not have the disease, the probability that the test will turn out positive
= cell (2) / cell (8) = 29994/999800 = 0.03 or 3%
also known as the False Positive Rate

P(B'|A) = given that the person has the disease, the probability that the test will turn out negative
= cell (4) / cell (7) = 4/200 = 0.02 or 2%
also known as the False Negative Rate

P(B'|A') = given that the person does not have the disease, the probability that the test will turn out negative
= cell (5) / cell (8) = 969806/999800 = 0.97 or 97%
also known as the Specificity of the test

P(A|B) = given that the test turned out positive, the probability that the person has the disease
= cell (1) / cell (3) = 196/30190 = 0.0065 or 0.65%
also known as the Positive Predictive Value (PPV) of the test

P(A'|B) = given that the test turned out positive, the probability that the person does not have the disease
= cell (2) / cell (3) = 29994/30190 = 0.9935 or 99.35%

P(A|B') = given that the test turned out negative, the probability that the person has the disease
= cell (4) / cell (6) = 4/969810 = 0.000004 or 0.0004%

P(A'|B') = given that the test turned out negative, the probability that the person does not have the disease
= cell (5) / cell (6) = 969806/969810 = 0.999996 or virtually 100%
also known as the Negative Predictive Value (NPV) of the test
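
For anyone who wants to verify these divisions, here is a minimal Python sketch that computes the four named quantities from the table and then cross-checks the PPV against Bayes' Theorem using only the three given rates. The `counts` dictionary and the `total` helper are hypothetical names introduced for this illustration.

```python
# Completed 2x2 table, keyed by (test result, disease status)
counts = {
    ("B", "A"): 196,   ("B", "A'"): 29_994,
    ("B'", "A"): 4,    ("B'", "A'"): 969_806,
}

def total(test=None, disease=None):
    """Sum the cells matching the given test result and/or disease status."""
    return sum(n for (t, d), n in counts.items()
               if test in (None, t) and disease in (None, d))

sensitivity = counts["B", "A"] / total(disease="A")      # P(B|A)
specificity = counts["B'", "A'"] / total(disease="A'")   # P(B'|A')
ppv = counts["B", "A"] / total(test="B")                 # P(A|B)
npv = counts["B'", "A'"] / total(test="B'")              # P(A'|B')

# Cross-check the PPV with Bayes' Theorem, using only the given rates:
# P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|A')P(A')]
p_a, p_b_given_a, p_b_given_not_a = 0.0002, 0.98, 0.03
ppv_bayes = (p_b_given_a * p_a) / (
    p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
)

print(f"sensitivity={sensitivity:.4f} specificity={specificity:.4f}")
print(f"PPV={ppv:.4f} NPV={npv:.6f} PPV via Bayes={ppv_bayes:.4f}")
```

The table and the formula agree because the table is just the formula's numerator and denominator scaled up by the population size.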

zmogggggg
10-25-2008, 11:56 AM
Out of curiosity, are these conditionals referred to by these names often in industry?

JohnM
10-25-2008, 02:43 PM
It's my understanding that they are, especially in the biomedical fields.

zmogggggg
10-25-2008, 02:46 PM
Interesting to know thanks :)

JohnM
10-25-2008, 06:01 PM
Here's an example: http://www.musc.edu/dc/icrebm/sensitivity.html

zmogggggg
10-26-2008, 01:17 PM
excellent example thanks
