# Thread: Calculating variance when proportions are not of population

1. ## Calculating variance when proportions are not of population

I've been racking my brains on this and can't find a solution that gives me that warm fuzzy feeling inside and I was hoping that somebody here could help. Hopefully this hypothetical situation will help explain what I am going for:

Say I collect some consumer purchase data where I ask 500 people where they bought their power drill and how many power drills they purchased. I ask the quantity because I want stores that tend to sell in bulk to be properly represented in the data. Suppose with all the multiple purchases the number of drills purchased is around 700. Of the drills purchased, 140 were purchased at retailer A (so 20% of the drills were bought there). If I wanted to calculate the standard deviation of this proportion, how would I go about doing it?

I know that for a population proportion the SD = sqrt(P*(1-P)/n) where P would be the proportion who purchased at retailer A and n would be the number of people surveyed... but the proportion I am working with is of the number of drills... not of the number of people sampled. So how do I calculate the variance of a proportion when it is not a proportion of the population, but a proportion of a specific variable (drill purchases in this case)?

Is there a simple solution to this? Any help would be appreciated!

2. ## Re: Calculating variance when proportions are not of population

Hi potterbro,

If I understood correctly, your sample size would be 700, the number of drills purchased, since you are interested in the purchase rather than the consumer. Because of this, your observational unit is not the person but the purchase itself, so the sample size is the total number of purchases. Just consider that the proportion was calculated as (Purchases in Store A / n), where n is the total number of drills bought. With that in mind you shouldn't have problems with the analysis. Note that you can also analyze other variables measured on people; just remember to identify the unit you are measuring on each occasion.

3. ## The Following User Says Thank You to terzi For This Useful Post:

potterbro (09-25-2012)

4. ## Re: Calculating variance when proportions are not of population

Thanks Terzi!

Sorry it took so long to get back to this... I got pulled into something else and only now had the chance to look at it again.

Something still doesn't feel right about just treating the number of drills as my sample size. Let me give a more extreme example that may explain my predicament. Suppose instead of drills I asked about photo prints. Say I ask 500 people where they had their photos printed and how many photos they had printed. Each person could have any number of photos printed, and the numbers can get huge. Suppose that the total number of photos printed by these 500 people is 100,000, and that 20,000 of them were printed at retailer A (20%). If I were to calculate the standard deviation using the regular old proportion formula, SD = sqrt(P*(1-P)/n) with n = 100,000, I would get a value of 0.13%. Now, if each person instead bought only one photo and 20% were purchased at retailer A, the standard deviation with n = 500 would be 1.8%... over ten times higher. In both scenarios I asked only 500 people. Should I really expect the scenario where people are entering huge quantities to have less variation than when they enter small quantities? It seems to me like the standard deviation is being artificially shrunk because of the large quantities involved.
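The two figures quoted above can be checked with a short sketch. The function name `proportion_se` is made up for illustration; it only evaluates sqrt(P*(1-P)/n) for the two values of n being compared.

```python
import math

def proportion_se(p: float, n: int) -> float:
    """Evaluate sqrt(p * (1 - p) / n), the simple binomial formula."""
    return math.sqrt(p * (1 - p) / n)

p = 0.20
se_photos = proportion_se(p, 100_000)  # treating each photo as an observation
se_people = proportion_se(p, 500)      # treating each person as an observation

print(f"n = 100,000 photos: {se_photos:.2%}")
print(f"n = 500 people:     {se_people:.2%}")
print(f"ratio: {se_people / se_photos:.1f}")
```

The ratio between the two depends only on the sample sizes: it is sqrt(100,000 / 500) = sqrt(200), roughly 14.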

Thoughts?

5. ## Re: Calculating variance when proportions are not of population

Does anybody have any thoughts on this? I'm just curious whether there is an established methodology for dealing with the variance or standard deviation of a proportion when it is a proportion not of the number of people sampled but of some unit of their answers... whether it is the percentage of purchases made at retailer A or the percentage of dollars spent at retailer A.

Thanks!
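One established approach from survey sampling is to treat the share of units as a ratio estimator, with the person as the sampling unit, and to approximate its standard error by Taylor linearization. The sketch below assumes a simple random sample of people and ignores any finite population correction. The function name `ratio_se` and the toy data are invented for illustration and are not from the survey discussed in this thread.

```python
import math

def ratio_se(y, x):
    """Ratio R = sum(y) / sum(x) and its linearized standard error.

    y[i]: units person i bought at retailer A
    x[i]: total units person i bought
    Assumes a simple random sample of people (no finite population correction).
    """
    n = len(x)
    total_x = sum(x)
    r = sum(y) / total_x
    # Linearized residuals: how far each person departs from the overall ratio.
    sq_resid = sum((yi - r * xi) ** 2 for yi, xi in zip(y, x))
    se = math.sqrt(n / (n - 1) * sq_resid) / total_x
    return r, se

# Toy data: 500 people, 100 of whom buy everything at retailer A.
one_each = ratio_se([1] * 100 + [0] * 400, [1] * 500)    # one unit per person
bulk = ratio_se([200] * 100 + [0] * 400, [200] * 500)    # 200 units per person
print(one_each, bulk)
```

In this toy setup both calls give a ratio of 0.2 with the same standard error, about 1.8%, because multiplying every person's quantity by 200 adds no independent information. With real data the quantities vary from person to person, and the result depends on that variation.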

6. ## Re: Calculating variance when proportions are not of population

Hi again potterbro,

There's a problem with your line of thought. The formula you used, SD = sqrt(P*(1-P)/n), is not the standard deviation of a variable; it is the standard error of a proportion, which is different. The standard error is based on the sampling distribution of a statistic and is always affected by your sample size (a bigger sample size will reduce standard errors). The variance of the underlying yes/no variable is estimated simply as P*(1-P). Note that the standard error is what you use to build confidence intervals, and the variance is a component in the formula for the standard error.
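In symbols, for a single yes/no observation X with success probability P and n independent observations, the two quantities are related as follows (the notation \hat{P} for the sample proportion is introduced here and is not from the posts above):

```latex
\operatorname{Var}(X) = P(1-P), \qquad
\operatorname{SE}(\hat{P}) = \sqrt{\frac{\operatorname{Var}(X)}{n}} = \sqrt{\frac{P(1-P)}{n}}
```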

I hope this helps you

7. ## Re: Calculating variance when proportions are not of population

Thank you so much for your help!

Just one last thing that might help me move where I need to go... is the variance of a weighted proportion (where the individual data is weighted) also P*(1-P)?

8. ## Re: Calculating variance when proportions are not of population

No, the variance of the proportion must be corrected when using weights. Almost any statistical package has an option for including weights so the correction can be done automatically.
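As a rough illustration of one such correction, Kish's approximation replaces n with an effective sample size, n_eff = (sum of weights)^2 / (sum of squared weights). This is a sketch under that assumption and is not necessarily what any given package does; the function names and the example weights are hypothetical.

```python
import math

def effective_sample_size(weights):
    """Kish's approximate effective sample size for a weighted sample."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

def weighted_proportion_se(values, weights):
    """Weighted proportion of 0/1 values and its approximate standard error."""
    p = sum(w * v for v, w in zip(values, weights)) / sum(weights)
    n_eff = effective_sample_size(weights)
    return p, math.sqrt(p * (1 - p) / n_eff)

# Made-up data: five respondents, 1 = bought at retailer A.
values = [1, 0, 0, 1, 0]
weights = [1.0, 2.0, 1.0, 0.5, 1.5]
p, se = weighted_proportion_se(values, weights)
print(p, se, effective_sample_size(weights))
```

With equal weights n_eff equals n and the formula reduces to the unweighted standard error; unequal weights make n_eff smaller than n, which widens the standard error.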
