# Determining the probability that a sample is accurate

#### Headbanger

##### New Member
I recently read an article concerning a new HIV vaccine developed in Thailand. The article can be found here http://www.popsci.com/scitech/artic...ial-deploys-first-ever-successful-hiv-vaccine

Here is the bottom line information as far as I can figure out:

The experiment was done with two groups: A control group who took a placebo, and a group that took the vaccine.

Both groups consisted of 8,201 individuals (half of 16,402).

Over the course of 3 years, 74 individuals from the control group contracted HIV, whereas 51 individuals from the vaccine group contracted HIV.

My questions are as follows -

Is it possible to mathematically determine the probability that the difference in numbers is purely an anomaly?

Is the difference in results between the vaccine group and the control group large enough that a reasonable person should expect the vaccine to be effective against HIV?

#### BioStatMatt

##### TS Contributor
Headbanger,

Yes, statistics help us determine whether an observed phenomenon occurs by chance, or because of some other factor, for example, treatment with a vaccine. There are several statistical methods that are used to answer the very question you pose. I will mention one that is quite common in biomedical experiments and clinical trials. It centers around the odds-ratio.

Let's say we estimate the probability that an individual in the placebo group develops HIV as
p1 = n11/n12 = 74/8201
and the corresponding probability for the vaccine group as
p2 = n21/n22 = 51/8201
Then the odds for the placebo group is
o1 = p1 / (1-p1)
and the odds for the vaccine group is
o2 = p2 / (1-p2)
In this case o2 = 0.00626, and o1 = 0.00911. We say that the "odds" of developing HIV, given that you are in the placebo group, is 0.00911. The odds-ratio is just that, the ratio of the two odds, or
o1 / o2 = 1.455. We say that the "odds" of developing HIV is 1.455 times larger in the placebo group than in the vaccine group.
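
For anyone who wants to reproduce the arithmetic above, here is a minimal Python sketch. The variable and function names (`placebo_cases`, `odds`, and so on) are my own choices for illustration; the counts are the ones quoted in the original post.

```python
# Counts quoted in the thread: HIV infections out of 8,201 participants per group.
placebo_cases, placebo_total = 74, 8201
vaccine_cases, vaccine_total = 51, 8201


def odds(cases, total):
    """Odds of the event, p / (1 - p), with p estimated as cases / total."""
    p = cases / total
    return p / (1 - p)


o1 = odds(placebo_cases, placebo_total)  # odds in the placebo group
o2 = odds(vaccine_cases, vaccine_total)  # odds in the vaccine group
odds_ratio = o1 / o2

print(f"o1 = {o1:.5f}, o2 = {o2:.5f}, odds ratio = {odds_ratio:.3f}")
# prints: o1 = 0.00911, o2 = 0.00626, odds ratio = 1.455
```

Note that p / (1 - p) is the same as (number infected) / (number not infected), so o1 = 74/8127 and o2 = 51/8150.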

Of course, this is just a summary measure for what you already know. To test if this value is "statistically significant", or not completely due to chance, we can apply a statistical test to the odds-ratio, or the log(odds-ratio).

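Under the hypothesis that there is no difference in the probability of developing HIV between the two groups, we would expect the odds-ratio to be close to 1, and the log(odds-ratio) to be close to 0. In more advanced statistics, we show that the probability distribution of the log(odds-ratio) under this hypothesis is approximately normal (Gaussian) with mean m=0 and standard deviation sd=sqrt(1/a + 1/b + 1/c + 1/d), where a, b, c and d are the four cell counts of the 2x2 table: infected and not infected in the placebo group (74 and 8201-74 = 8127), and infected and not infected in the vaccine group (51 and 8201-51 = 8150). Here sd = sqrt(1/74 + 1/8127 + 1/51 + 1/8150) = 0.1827. Note that the denominators are the numbers who did not contract HIV, not the group totals, although with so few infections the difference is negligible.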
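
To put a number on "how likely is this by chance", one common option is a z-test on the log(odds-ratio), often called a Wald test. The sketch below is only an illustration under the normal approximation described above; it is not necessarily the analysis the trial investigators used, and the names `a`, `b`, `c`, `d` for the 2x2 cell counts are my own.

```python
import math

# 2x2 table built from the counts quoted in the thread.
a, b = 74, 8201 - 74  # placebo group: infected, not infected
c, d = 51, 8201 - 51  # vaccine group: infected, not infected

# log(odds-ratio); (a/b) / (c/d) simplifies to (a*d) / (b*c).
log_or = math.log((a * d) / (b * c))

# Approximate standard deviation of log(odds-ratio): one term per table cell.
se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

# Under "no difference", log_or / se is approximately standard normal.
z = log_or / se

# Two-sided p-value: probability of a |z| at least this large by chance alone.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"log(OR) = {log_or:.3f}, sd = {se:.4f}, z = {z:.2f}, p = {p_value:.3f}")
```

With these counts the two-sided p-value comes out at about 0.04, i.e. just under the conventional 0.05 cut-off.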

With this information, we can construct a confidence interval for log(odds-ratio) or the odds-ratio, or conduct a statistical test. The 95% confidence interval for odds-ratio in this example is

( exp( log(odds-ratio) - 1.96*sd ), exp( log(odds-ratio) + 1.96*sd ) )
=( exp( log(1.455) - 1.96*0.1827 ), exp( log(1.455) + 1.96*0.1827 ) )
=( exp( 0.375 - 0.358 ), exp( 0.375 + 0.358 ) )
=( 1.017, 2.081 )

Hence, we are 95% confident that the "true" odds-ratio lies within these bounds. Since the lower bound is greater than 1, we can conclude, with 95% confidence, that observing this odds-ratio was NOT entirely due to chance.
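
If you want to check the interval yourself, here is a small Python sketch of the same calculation. The function name `odds_ratio_ci` and its arguments are hypothetical, chosen for this example; it takes the four cell counts of the 2x2 table and assumes the normal approximation for log(odds-ratio) is reasonable, which it should be with counts this large.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate confidence interval from a 2x2 table.

    a, b: events and non-events in group 1 (here: placebo)
    c, d: events and non-events in group 2 (here: vaccine)
    z:    normal quantile; 1.96 gives an approximate 95% interval
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # Build the interval on the log scale, then transform back with exp.
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Counts quoted in the thread: 74 of 8,201 (placebo) and 51 of 8,201 (vaccine).
or_, lower, upper = odds_ratio_ci(74, 8201 - 74, 51, 8201 - 51)
print(f"odds ratio = {or_:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The interval runs from roughly 1.02 to 2.08. The lower end sits only just above 1, so the evidence for an effect is there but is not overwhelming.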

Hope this gets you on the right path.
BioStatMatt