95% confidence interval using standard error of a proportion

#1
Hi, first of all, apologies: my stats knowledge is limited.

I am looking at linear binomial regression.

I have a study with a yes/no outcome, so it is binomial. Out of 50 cases I get 49 yeses, so the estimated probability of success (p) is 98%. I want to work out the 95% confidence interval for this probability of success. My tutor has said that I should use the standard error of a proportion:

[image: standard error of a proportion, SE = sqrt( p(1 - p) / n )]

Using this formula gives me a standard error of 0.019799.

I believe I then add the standard error to, and subtract it from, p to get the confidence interval, which in this case is 96-100%.

However, is that a 95% confidence interval? I read that the calculation above gives a 68% CI and that I need to multiply the SE by 1.96. But if I do that, the upper limit of the interval would be above 100%, which is impossible?
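A quick check of the arithmetic in Python (a sketch; 1.0 and 1.96 are the usual normal multipliers for roughly 68% and 95%). It confirms that the ±1.96 SE interval runs past 1, a known weakness of this normal-approximation (Wald) interval when p is close to 0 or 1.

```python
from math import sqrt

k, n = 49, 50
p_hat = k / n                               # 0.98
se = sqrt(p_hat * (1 - p_hat) / n)          # standard error of a proportion, ~0.019799

for label, z in [("~68%", 1.0), ("95%", 1.96)]:
    lo, hi = p_hat - z * se, p_hat + z * se
    print(f"{label}: [{lo:.4f}, {hi:.4f}]")
# ~68%: [0.9602, 0.9998]
# 95%: [0.9412, 1.0188]   <- upper limit above 1
```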

Any help would be most appreciated.
 
#3
Hi, thank you so much for getting back to me.

I wondered if you could help me with a worked example.


I presume you are referring to the following formula when you suggest using a beta-distribution confidence interval:
[image: beta-distribution confidence interval formula, not preserved]

From the original information:

Alpha = 0.05
K = 49
n = 50
Betainv = not sure how you know or calculate this?
I'm guessing the (,) means multiply.

Again thank you so much for your help.
 
#4
Betainv is the inverse of the cumulative distribution function (the quantile function).
The beta distribution has two shape parameters (alpha and beta), so you need to calculate the inverse cumulative distribution function, Betainv(0.975, alpha, beta).
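A sketch of Betainv in plain Python, assuming the missing image shows the usual Clopper-Pearson form: lower = Betainv(alpha/2, k, n-k+1) and upper = Betainv(1-alpha/2, k+1, n-k). So the shape parameters are not simply successes and failures. For integer shape parameters the beta CDF equals a binomial tail, I_x(a, b) = P(Bin(a+b-1, x) >= a), so bisection on that tail gives the inverse. The function names are hypothetical; in Excel the equivalent call would be BETA.INV, and in SciPy scipy.stats.beta.ppf.

```python
from math import comb

def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) for positive integers a, b, via I_x(a, b) = P(Bin(a+b-1, x) >= a)."""
    m = a + b - 1
    return sum(comb(m, j) * x**j * (1 - x)**(m - j) for j in range(a, m + 1))

def beta_inv_int(q, a, b, tol=1e-12):
    """Inverse CDF (quantile) of Beta(a, b) by bisection; the CDF increases in x."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    # Edge cases k = 0 and k = n are fixed at 0 and 1 by convention.
    lower = 0.0 if k == 0 else beta_inv_int(alpha / 2, k, n - k + 1)
    upper = 1.0 if k == n else beta_inv_int(1 - alpha / 2, k + 1, n - k)
    return lower, upper

lower, upper = clopper_pearson(49, 50)
print(f"95% CI for 49/50: [{lower:.4f}, {upper:.4f}]")
```

For 49/50 the upper shape parameters are (50, 1), whose CDF is just x^50, so the upper bound is 0.975^(1/50), just below 1.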
 

obh

#5
Hi,

I think the sample size of 50 is small enough to use the binomial distribution directly instead of an approximating distribution.
Since the distribution is discrete, the confidence level won't be exactly the required level.

On the other hand, maybe a sample size of 50 is big enough to use the normal distribution even with such a high proportion?
It would be interesting to run a simulation to check.
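A minimal Monte Carlo sketch of that check, assuming a true proportion of 0.98 (close to the 49/50 in #1) and n = 50, with p = 0.5 for contrast. It counts how often the normal-approximation interval p ± 1.96 SE contains the true value. The seed, repetition count and function names are arbitrary choices for illustration.

```python
import random
from math import sqrt

def wald_interval(k, n, z=1.96):
    """Normal-approximation interval p_hat +/- z * SE."""
    p_hat = k / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

def simulated_coverage(p_true, n, reps=10000, seed=1):
    """Fraction of simulated samples whose Wald interval contains p_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        k = sum(rng.random() < p_true for _ in range(n))  # one binomial draw
        lo, hi = wald_interval(k, n)
        if lo <= p_true <= hi:
            hits += 1
    return hits / reps

coverage_98 = simulated_coverage(0.98, 50)
coverage_50 = simulated_coverage(0.50, 50)
print(f"p = 0.98: {coverage_98:.3f}   p = 0.50: {coverage_50:.3f}")
```

At p = 0.98 any sample with 50 successes gives SE = 0, so the interval collapses to [1, 1] and misses; that alone happens about 0.98^50 ≈ 36% of the time, so the coverage comes out far below 95%.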
 
#6
Hi, thank you. Sorry, this is still a bit beyond me. Can I just check: when calculating Betainv, is α = the number of successes in n trials and β = the number of failures in n trials?
That would be straightforward to calculate in Excel.

So where does this formula come into play?
[image: formula, not preserved]
 
#7
I would recommend using the exact method. Do it in Excel.

=1-BINOM.DIST(48,50,0.90,TRUE)

This formula calculates the probability of getting from 0 to 48 successes in 50 trials if the probability on each trial is 0.9. This is then subtracted from 1 to give the probability of getting *more* than 48 successes (since you had more, namely 49).

You then fool around with the 0.90 proportion until you find a value that gives exactly 0.05 probability of getting more than 48/50. That value of p is the lower bound of your one-tailed 95% CI. I would do this one-tailed: the upper bound is basically 1.

For example, using 0.9 I get 0.033. So, if the true proportion was 0.9, I would get more than 48 only 3.3% of the time. If the true proportion was 0.9, you would get an observed value of 48 or less 1 - 0.033 or 96.7% of the time. By definition, 0.9 is the lower bound of a one-tailed 0.967 CI.

You want the value of p at which data like yours (more than 48/50) would occur only 5% of the time; you can then be (sort of) 95% sure the true value is no smaller than it. That's how CIs are usually interpreted (it isn't technically correct, of course).

But of course, you want a 0.95 CI. Somewhere around 0.91. Play with it!
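The trial-and-error search can be automated with bisection. A minimal sketch in plain Python (function names are hypothetical): upper_tail(49, 50, p) is the same quantity as the Excel formula =1-BINOM.DIST(48,50,p,TRUE). This is the one-tailed 95% bound described here; a two-sided 95% interval would put 0.025 in each tail instead.

```python
from math import comb

def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), i.e. Excel's =1-BINOM.DIST(k-1, n, p, TRUE)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def one_sided_lower_bound(k, n, alpha=0.05, tol=1e-10):
    """Value of p at which P(X >= k) equals alpha, found by bisection.
    P(X >= k) increases with p, so the search narrows in on a single value."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if upper_tail(k, n, mid) < alpha:
            lo = mid  # tail probability too small: the bound lies higher
        else:
            hi = mid  # tail probability too large: the bound lies lower
    return (lo + hi) / 2

print(round(upper_tail(49, 50, 0.90), 4))       # the 0.90 trial value
print(round(one_sided_lower_bound(49, 50), 4))  # the "somewhere around 0.91" value
```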
 
#9
Hi, I've had a chance to look through your comment and again it's really helpful.

In this test I agree that with 49 successes out of 50 the upper bound is basically 1. However, if I had a similar test with, say, 38 successes out of 50, how would I go about finding the upper bound?

Can you explain a bit more why this is only "sort of 95% sure" and "technically not correct"?
 
#10
Hi,

If you want to use a binomial distribution:

p=38/50=0.76

P( x > 44 ) = 0.0106530. 44/50 = 0.88
P( x > 43 ) = 0.0279590. Too big: bigger than 0.025 (0.05/2).

P( x < 33 ) = 0.0384254. Too big.
P( x < 32 ) = 0.0190879. 32/50 = 0.64

So the confidence interval, including the edges, is [0.64, 0.88].
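A sketch reproducing this calculation in plain Python: p is fixed at the estimate 38/50 = 0.76 and the interval edges are read off the binomial tails, with alpha/2 = 0.025 in each. Function and variable names are hypothetical. This "plug-in" interval is not the same thing as the Clopper-Pearson interval.

```python
from math import comb

def pmf(j, n, p):
    """P(X = j) for X ~ Binomial(n, p)."""
    return comb(n, j) * p**j * (1 - p)**(n - j)

def plug_in_interval(k, n, alpha=0.05):
    """Binomial-quantile interval with p fixed at the estimate k/n."""
    p_hat = k / n
    probs = [pmf(j, n, p_hat) for j in range(n + 1)]
    # Upper edge: smallest u with P(X > u) <= alpha/2
    hi_edge = next(u for u in range(n + 1) if sum(probs[u + 1:]) <= alpha / 2)
    # Lower edge: largest m with P(X < m) <= alpha/2
    lo_edge = max(m for m in range(n + 1) if sum(probs[:m]) <= alpha / 2)
    return lo_edge / n, hi_edge / n

probs = [pmf(j, 50, 38 / 50) for j in range(51)]
print(f"P(X > 44) = {sum(probs[45:]):.7f}")
print(f"P(X > 43) = {sum(probs[44:]):.7f}")
print(f"P(X < 33) = {sum(probs[:33]):.7f}")
print(f"P(X < 32) = {sum(probs[:32]):.7f}")
print(plug_in_interval(38, 50))
```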

But I'm not sure if this is the best method. http://users.stat.ufl.edu/~aa/articles/agresti_coull_1998.pdf

Generally, I would expect the exact method to be more accurate than any approximation.
I can't say the linked article is very clear to me.

You could argue that since it is a discrete distribution, it makes sense for the CI to take only discrete values.
The problem I see with using the binomial distribution is that the discrete values are based on the estimated probability, so the values between the discrete results of the binomial are also possible values.
So, because the results are discrete, the binomial CI is wider, and its actual confidence level is higher than the required confidence level.

The method called exact (that checkthebias mentioned) is actually Clopper-Pearson, based on the beta distribution, which as I understand it is also an approximation?
I checked in R (library(Hmisc); there are other options too) and didn't see the binomial distribution offered as an option, only normal, Clopper-Pearson and Wilson.
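One way to see how Clopper-Pearson relates to the binomial: its bounds can be found without any beta function, by inverting the binomial tails directly. This is the trial-and-error search from #7, applied to both tails with alpha/2 = 0.025 each. The beta-quantile formula is an algebraic identity for these same bounds. A sketch in plain Python (names hypothetical), which also gives the upper bound asked about in #9 for 38/50:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def solve_increasing(f, target, tol=1e-10):
    """Bisection on [0, 1] for an increasing function f: returns p with f(p) ~ target."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson_by_tails(k, n, alpha=0.05):
    # Lower bound: P(X >= k | p) = alpha/2 (this tail increases with p).
    lower = 0.0 if k == 0 else solve_increasing(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: P(X <= k | p) = alpha/2, i.e. P(X > k | p) = 1 - alpha/2 (increasing in p).
    upper = 1.0 if k == n else solve_increasing(lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper

for k in (38, 49):
    lo, hi = clopper_pearson_by_tails(k, 50)
    print(f"{k}/50: [{lo:.4f}, {hi:.4f}]")
```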

In https://www.rdocumentation.org/packages/Hmisc/versions/4.2-0/topics/binconf they write:
"Following Agresti and Coull, the Wilson interval is to be preferred and so is the default."
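For comparison, a sketch of the textbook Wilson score interval in plain Python. The formula is standard, but I have not checked it digit for digit against Hmisc's binconf output. Unlike p ± 1.96 SE, it always stays inside [0, 1].

```python
from math import sqrt

def wilson_interval(k, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion k/n."""
    p_hat = k / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return centre - half_width, centre + half_width

for k in (49, 38):
    lo, hi = wilson_interval(k, 50)
    print(f"{k}/50: [{lo:.4f}, {hi:.4f}]")
```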

So what is the most accurate method for a confidence interval? @Miner @Dason

Why not the binomial? Is it for the reason I wrote above?
 