Confidence Intervals for Proportions -- Two ways to estimate population value?

#1
Hi everyone,

I'm wondering if anyone could help shed some light on why there appear to be two different ways that different textbooks show to calculate confidence intervals for proportions.

Within the formula I know, there is P, which is the population parameter/value we are trying to estimate using the sample statistic/value.

The way I am familiar with is to always use 0.5 as the value for P (the population parameter), as by doing so, the numerator will always end up having a value of 0.25 -- because we are multiplying P by ( 1 - P ).

This is good, because if we were to use any value other than 0.5, the expression would decrease in value (i.e. be less than 0.25). Setting P at 0.5 ensures that the expression P( 1-P ) is at its maximum possible value and therefore that the interval we construct is at its maximum width. This is the most conservative possible solution to the dilemma posed by having to assign a value to P in the estimation equation.
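As a quick numerical check of that claim, here is a minimal sketch that evaluates P( 1-P ) on a grid of candidate values. The function name `variance_term` and the grid spacing are made up for illustration.

```python
def variance_term(p: float) -> float:
    """Return p * (1 - p), the quantity under the square root in the formula."""
    return p * (1 - p)

# Candidate proportions 0.00, 0.01, ..., 1.00 (grid spacing is an arbitrary choice).
grid = [i / 100 for i in range(101)]

# The grid value at which p * (1 - p) is largest.
best_p = max(grid, key=variance_term)
print(best_p, variance_term(best_p))

# A few values on either side of 0.5 for comparison.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(variance_term(p), 4))
```

The expression is symmetric around 0.5, so values such as 0.3 and 0.7 give the same, smaller result.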

There are some other textbooks, however, that use the sample value (Ps) in place of P in the equation. This is not how I learned it, so this seems odd to me, because we don't know whether the sample value accurately represents the population value -- though we do know, from the sampling distribution, that it is likely to be close to the population value (the sampling distribution mean).
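Written out, the two versions would look roughly like this, assuming the usual large-sample formula. The symbols N (sample size) and Z (the critical value, e.g. 1.96 for 95% confidence) are assumptions about the notation and are not copied from either textbook.

```latex
\text{Using 0.5: } \quad P_s \pm Z \sqrt{\frac{0.5\,(1 - 0.5)}{N}}
\qquad\qquad
\text{Using the sample value: } \quad P_s \pm Z \sqrt{\frac{P_s\,(1 - P_s)}{N}}
```

The only difference is the value placed under the square root; the centre of the interval is the sample value in both cases.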

Why are there two different ways of calculating a confidence interval for proportions? And which is better?

Thank you to anyone who is able to help, and I apologize if this question is a repeat and has been asked already.

Thanks,
Frodo
 
#3
Hi Greta,

Thank you very much for your response. Perhaps you could identify the relevant section of the Wikipedia page for me? I'm finding the page very confusing. I recognize the first formula the page presents, but it loses me from there. And I am not sure where it addresses my query about the two different ways of estimating the interval, using either P-mu or Ps, as I've highlighted.

Thanks again so much.

Best,
Frodo
 

Miner

TS Contributor
#4
The second formula that you cited (always use 0.5) is probably a worst case "a rule of thumb" that the textbook author decided to use to simplify the math. I have never run across it before, and would not attempt to justify it.
 
#5
Hi Miner,

Thanks for your input. That's interesting - most textbooks I have looked at use 0.5 for the estimate of the population value, since we don't know the population value. Others I have looked at, though, use an equation where the sample value is used in its place, as I mentioned.

I still wonder why there is the difference in methods - why some texts would use one over the other. The explanation that it is to simplify the math doesn't strike me as the reason, as the math is really quite simple either way.

Best,
Frodo
 

CowboyBear

Super Moderator
#6
Yeah it's a very standard procedure. The margin of error in a political poll is invariably based on the assumption of a proportion of 0.5.
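For illustration, a minimal sketch of that kind of poll margin of error. It assumes a simple random sample and a 95% confidence level (z = 1.96); the function name and the sample sizes are hypothetical.

```python
import math

def poll_margin_of_error(n: int, z: float = 1.96) -> float:
    """Worst-case margin of error, assuming a true proportion of 0.5."""
    return z * math.sqrt(0.5 * 0.5 / n)

# Hypothetical poll sizes.
for n in (400, 1000, 2500):
    print(n, round(100 * poll_margin_of_error(n), 1), "percentage points")
```

Under these assumptions the three sample sizes give roughly 4.9, 3.1 and 2.0 percentage points.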
 
#7
Sometimes the 0.5 value is used in the design phase, before you have any data. It is just used to get a rough idea of the length of the confidence interval for different sample sizes. You want a short confidence interval, but you must be able to afford the large sample size it requires.
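A rough sketch of that design-phase calculation, assuming the normal-approximation formula and a 95% level (z = 1.96). The function name and the target margins are made up for illustration.

```python
import math

def sample_size_for_margin(margin: float, z: float = 1.96, p: float = 0.5) -> int:
    """Smallest n for which z * sqrt(p * (1 - p) / n) is at most `margin`."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

# Hypothetical target half-widths for the interval.
for margin in (0.05, 0.04, 0.03):
    print(margin, sample_size_for_margin(margin))
```

Because the required n grows with 1 / margin squared, narrowing the interval a little raises the required sample size a lot, which is the trade-off against cost.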

But once you get the data, the estimated proportion should be used. That is the whole point with the investigation. You use the data not only to get a point estimate, but also to get an interval estimate.
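To make the difference concrete, here is a minimal sketch of the simple normal-approximation interval computed both ways. The data (120 out of 400) and the names `proportion_interval` and `p_for_se` are hypothetical, and z = 1.96 assumes a 95% level.

```python
import math
from typing import Optional, Tuple

def proportion_interval(successes: int, n: int, z: float = 1.96,
                        p_for_se: Optional[float] = None) -> Tuple[float, float]:
    """Normal-approximation interval: p_hat +/- z * sqrt(p * (1 - p) / n).

    By default p is the sample proportion; pass p_for_se=0.5 to get
    the version that always uses 0.5 under the square root.
    """
    p_hat = successes / n
    p = p_hat if p_for_se is None else p_for_se
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p_hat - half_width, p_hat + half_width

# Hypothetical data: 120 "yes" answers out of 400 respondents.
print(proportion_interval(120, 400))                # sample proportion under the root
print(proportion_interval(120, 400, p_for_se=0.5))  # 0.5 under the root
```

Both intervals are centred on the sample proportion of 0.30; the 0.5 version is simply wider than the data call for.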

Look at the Wikipedia link I gave above. None of the intervals there use the 0.5 value. If the textbook says to use the 0.5 value, it is just wrong.
 

CowboyBear

Super Moderator
#8
Yep, sorry Greta - that'll teach me to reply quickly to stuff without checking the topic carefully. Using the 0.5 assumption sort of makes sense when reporting a general margin of error for a poll (e.g., a political poll where there are actually proportions for several parties being estimated), but doesn't make any sense when calculating a confidence interval for a specific estimated proportion.
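A small sketch of why the 0.5 figure works as a general margin of error for a multi-party poll: it is an upper bound on the margin for every individual party. The party shares and sample size are invented, and a 95% level (z = 1.96) with simple random sampling is assumed.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation margin of error for a proportion p with sample size n."""
    return z * math.sqrt(p * (1 - p) / n)

n = 1000  # hypothetical number of respondents
shares = {"Party A": 0.45, "Party B": 0.35, "Party C": 0.15, "Party D": 0.05}

# The single figure a poll would report, based on a proportion of 0.5.
general = margin_of_error(0.5, n)

for party, share in shares.items():
    print(party, round(margin_of_error(share, n), 3), "<=", round(general, 3))
```

The further a party's share is from 0.5, the more the general figure overstates that party's own margin.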
 
#9
Wow; that is so surprising to me. That's too bad that so many widely used textbooks would incorrectly show students how to estimate a proportion. Even more so, I'm really surprised that the professors I've had would use those textbooks! I've included an example here for you to see:





I, and many other students I know, have been taught to construct interval estimates for proportions this way. I always thought we calculated the margin of error this way because it is based on the standard deviation of the sampling distribution.

Thanks for your input and for letting me know the correct way to calculate them.

Best,
Frodo