Confidence intervals for proportions: approximating a discrete distribution with a co

trinker

ggplot2orBust
#1
On this website, http://onlinestatbook.com/2/estimation/proportion_ci.html, I saw the following quote about calculating a CI for a proportion:

To correct for the fact that we are approximating a discrete distribution with a continuous distribution (the normal distribution), we subtract 0.5/N from the lower limit and add 0.5/N to the upper limit of the interval.
Giving: \(p \pm Z_{.95}\sqrt{\frac{p(1-p)}{N}} \pm \frac{.5}{N}\)

Where (it appears) \(N\) is the sample size.

Another website:

http://stattrek.com/estimation/confidence-interval-proportion.aspx?Tutorial=AP

has the approximation simply as:

\(p \pm Z_{.95}\sqrt{\frac{p(1-p)}{n}}\)

Where \(n\) is the sample size.
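To compare the two formulas side by side, here is a minimal R sketch. The helper name `prop_ci` and its arguments are made up for illustration (they come from neither website), and it assumes \(Z_{.95}\) means the two-sided 95% critical value, i.e. `qnorm(0.975)`.

```r
# Hypothetical helper: normal-approximation CI for a proportion,
# optionally widened by the 0.5/n continuity correction.
prop_ci <- function(x, conf = 0.95, correct = FALSE) {
  n <- length(x)                      # sample size
  p <- mean(x)                        # sample proportion of 1s
  z <- qnorm(1 - (1 - conf) / 2)      # assumed meaning of Z_.95
  half <- z * sqrt(p * (1 - p) / n)   # plain half-width
  if (correct) half <- half + 0.5 / n # add the continuity correction
  c(lower = p - half, upper = p + half)
}

# Made-up data: 60 ones and 40 zeros
x <- c(rep(1, 60), rep(0, 40))
prop_ci(x)                  # second formula (no correction)
prop_ci(x, correct = TRUE)  # first formula (with correction)
```

The corrected interval is always exactly 0.5/n wider on each side, so the two only differ noticeably when n is small.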

The first approach can give limits larger than 1 or smaller than 0. Here the vector is all 1s, though it was possible to have drawn a zero. The first formula above gives the following CI:

Code:
set.seed(10)
x <- sample(0:1, 100, TRUE, c(.001, .999))

[0.995, 1.005]
This seems bad (> 1). Do we indeed need to:

correct for the fact that we are approximating a discrete distribution with a continuous distribution
With...

\(\pm \frac{.5}{N}\)
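For what it's worth, the interval above can be checked by hand, assuming the sample really is 100 ones (so \(p = 1\) and \(N = 100\)). The standard-error term vanishes whatever value is used for \(Z_{.95}\), and only the correction is left:

\(1 \pm Z_{.95}\sqrt{\frac{1(1-1)}{100}} \pm \frac{.5}{100} = 1 \pm 0 \pm 0.005 = [0.995, 1.005]\)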
 

JesperHP

TS Contributor
#2

In this instance you have specifically chosen probabilities (0.001 for a zero, 0.999 for a one) that are extreme compared to the sample size of 100. The expected number of successes is 99.9, which rounds up to 100. When approximating a binomial, the Poisson distribution is better for "very small" p and the normal works best for p around 0.5. The way I see it, by choosing such an extreme p you create a problem for the normal approximation that relates more to approximating a bounded distribution taking values in 0, 1, ..., 100 with an unbounded one on (-inf, inf) than to approximating something discrete with a continuous distribution. The continuity correction does not solve the first type of problem, and the fact that this problem exists is not news.
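A rough way to see the boundary problem in R: take the normal distribution that approximates Binomial(n, p) and ask how much of its probability falls above the largest possible count. The function name `mass_above_n` is hypothetical, and using n + 0.5 as the cutoff is just one reasonable convention.

```r
# Normal approximation to Binomial(n, p): mean n*p, sd sqrt(n*p*(1-p)).
n <- 100

# Probability the approximating normal assigns to values above n + 0.5,
# i.e. to counts that cannot occur in a sample of size n.
mass_above_n <- function(p) {
  mu <- n * p
  sigma <- sqrt(n * p * (1 - p))
  pnorm(n + 0.5, mean = mu, sd = sigma, lower.tail = FALSE)
}

mass_above_n(0.5)    # essentially zero
mass_above_n(0.999)  # a few percent sits beyond the boundary
```

With p near 0.5 the impossible region is about ten standard deviations away, whereas with p = 0.999 it is less than two, which is why the interval spills past 1.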

So to answer the question, I guess you could do a simulation study on samples where you would actually use a normal approximation in the first place, that is, where the approximation problem is the one the correction is intended to fix.
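Such a simulation study could be sketched along these lines. The settings (n = 100, p = 0.5, 10000 replications, the seed) are arbitrary choices for illustration, and `qnorm(0.975)` is assumed as the critical value for a 95% interval.

```r
set.seed(1)                 # arbitrary seed, only for reproducibility
n <- 100                    # sample size
p <- 0.5                    # true proportion, where the normal works well
reps <- 10000               # number of simulated samples
z <- qnorm(0.975)           # assumed 95% critical value

k <- rbinom(reps, n, p)     # simulated success counts
phat <- k / n               # sample proportions
half <- z * sqrt(phat * (1 - phat) / n)  # uncorrected half-width

# Share of intervals that contain the true p
cover_plain     <- mean(abs(phat - p) <= half)
cover_corrected <- mean(abs(phat - p) <= half + 0.5 / n)

c(plain = cover_plain, corrected = cover_corrected)
```

Since the corrected interval always contains the uncorrected one, its coverage can never be lower. The interesting part is how close each gets to the nominal 95% as n and p vary.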

As an alternative you could simply invent the Trinker continuity approximation: if you get a limit below 0 you round it up to 0, and if you get a limit above 1 you round it down to 1 :)
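In R that truncation is a one-liner with `pmax` and `pmin`. This is only a sketch of the tongue-in-cheek suggestion above, not a recommended interval. The limits are typed in from the first post.

```r
# Limits reported in the first post
ci <- c(lower = 0.995, upper = 1.005)

# Clip to the valid range of a proportion: raise anything below 0 to 0,
# then lower anything above 1 to 1.
clipped <- pmin(pmax(ci, 0), 1)
clipped
```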
 

hlsmith

Less is more. Stay pure. Stay poor.
#3

I believe I recall seeing this correction before, but I think Jesper hit the nail on the head.