# Probability based on Poisson distribution

##### New Member
I have a question about a biological problem. In a tube, I have concentrated DNA that I will dilute out in new tubes testing different dilutions. I want to determine the probability of obtaining 1 DNA molecule per new tube. I was always told that using a dilution that results in 20-30% of the tubes to be DNA positive and the rest to be DNA negative gives you a 95% probability that each tube contains only 1 DNA molecule. I was trying to calculate this myself but I am having a hard time.
I assumed that U = 0.3 and x = 1
P = (e ^ -0.3) * (0.3 ^ 1) / 1! = 0.22
This is not the 95% I thought it should be..any ideas? Thank you so much!

#### GretaGarbo

##### Human
Is the situation such that you have diluted the original concentration many times (maybe 1000 times) so that you now have, say 10 test tubes, and 2 or 3 of then have at least 1 DNA, and the others 7 or 8 have no DNA? (Maybe the DNA causes a change in colour so that you can notice that there is DNA there).

Of course if you have 10 or 100 test tubes matters for the certainty/uncertainty.

Also please note that "at least one" means "one or more". So that P(Y>=1) = 1 - P(Y=0).

##### New Member
Correct, say 10 test tubes, and 2 or 3 of then have at least 1 DNA, and the others 7 or 8 have no DNA. However, it is the probability of one molecule only (not at least one molecule).

#### GretaGarbo

##### Human
Correct, say 10 test tubes
Skip the hypothetical talk. Tell us about how many test tube they usually have. That is what the 95% probability is based on.

How do you know that there are any DNA in the tube? Is it because they duplicates many times and you can e.g. se a change in colour? Then it is about "at least 1".

#### GretaGarbo

##### Human
It seems to be true that for n = 10 test tubes and 3 has got 1 or more colour change and 7 has none, that the likelihood is about 95% that the lambda value (the parameter in the Poisson distribution) is less than 1.

(But maybe we have lost the original poster.)

##### New Member
Thank you so much Greta. How did you arrive at that 95% value? Yes N=10 and 3 have DNA (color change) while 7 have none. I am trying to understand how the calculation was performed.

#### GretaGarbo

##### Human
Well I just made an R program to estimate the lambda parameter in the Poisson distribution. You need to install the R program and run the code. I hope that the explanations in the code is sufficient.

Maybe it can have some interest for others about a likelihood interval.

Code:
# This is an R program

# formulation of the problem
# if you have n=10 test tubes and get changed colour in 3 of 10, ie y=3
# and no changed colour in n-y = 10-3 = 7

# When there is a change in colour there can be 1 or more DNA molecules in the test tube
# compute the lambda from a Poisson distribution

# P(Y = y) = lamb^y*exp(-lamb)/factorial(y)
# P(Y = 0) = exp(-lamb)
# P(Y >=1) = 1- P(Y = 0) = 1 - exp(-lamb)
#
# probability for 7 no-change-of-colour, ie 7 with Y=0
# P(Y = 0)^7 = (exp(-lamb))^7
#
# probability for 3 change-of-colour,
# ie (P(Y >=1))^3  = (1 - exp(-lamb))^3

# probability for 3 change-of-colour AND 7 no-change-of-colour
#  (1 - exp(-lamb))^3 * (exp(-lamb))^7
#

# give a sequence of lambda values, here called lamb_sekv:
lamb_sekv <- seq(from=0.0, to=2, by= 0.025)
lamb_sekv

# compute the Likelihood (called "like") for each of lambd_sekv:
like <- ((1 - exp(-lamb_sekv))^3) * ((exp(-lamb_sekv))^7)

# plot the liklihood values for the lambda values
plot(lamb_sekv, like, type ="l", xlim = c(0, 1.5))
abline(v=0.30, col="red")
abline(v=0.35, col="blue")
# also: draw red line at the intuitive value of 0.30 = 3 of 10
# also draw a blue line for what looks like the max likelihood

# look at it on narrow scale:
plot(lamb_sekv, like, type ="l", xlim = c(0.15, 0.50))
abline(v=0.30, col="red")
abline(v=0.35, col="blue")

# print out the different lambda values and the likelihood
cbind(lamb_sekv,  like)

# compute the relative likelihood, so that the curve will have a max=1
# 2.222396e-03 is the mx value of the variable "like"
like_rel <- like/(2.222396e-03)

# Plot the relative likelihood
plot(lamb_sekv, like_rel, type ="l", xlim = c(0, 1.5))
abline(v=0.30, col="red")
abline(v=0.35, col="blue")
abline(h=0.15, col="black")

# the black line shows where the relative likelihood is 0.15
# the relative likelihood at 0.15 has got a value under 1 on lamba
# this is called a "likelihood interval", not a "confidence interval"
# but it can be interpreted as confidence interval
# explanations in Pawitan "In all likelihood"

# which values for lamb_sekv is larger than 0.15
cbind(lamb_sekv,  like_rel)

# the 0.15 (or more exact 0.146607) comes from (the likelihood ratio test) :
# -2 log(like) = 3.84
# log(like)    = -3.84/2, like=exp(-3.84/2)
# exp(-3.84/2)
#  0.146607

# so the estimated Poisson intensity (ie the lambda)
# is somewhat less than 1
# the lambda is estimated 0.35 and between 0.08 to 0.93

It is interesting that this method is similar to what the microbiologist call the "most probabel number (MPN)" (an ancient title). Nowdays it would be called a maximum likelihood estimate from the Poisson distribution. This method was actually mentioned in R A Fishers "Foundations of Mathematical Statistics" in 1922.