
Thread: Calculating variance when proportions are not of population

  1. #1
    Points: 229, Level: 4
    Level completed: 58%, Points required for next Level: 21

    Posts
    8
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Calculating variance when proportions are not of population



    I've been racking my brains on this and can't find a solution that gives me that warm fuzzy feeling inside and I was hoping that somebody here could help. Hopefully this hypothetical situation will help explain what I am going for:

    Say I collect some consumer purchase data where I ask 500 people where they bought their power drill and how many power drills they purchased. I ask the quantity because I want stores that tend to sell in bulk to be properly represented in the data. Suppose with all the multiple purchases the number of drills purchased is around 700. Of the drills purchased, 140 were purchased at retailer A (so 20% of the drills were bought there). If I wanted to calculate the standard deviation of this proportion, how would I go about doing it?

    I know that for a population proportion the SD = sqrt(P*(1-P)/n) where P would be the proportion who purchased at retailer A and n would be the number of people surveyed... but the proportion I am working with is of the number of drills... not of the number of people sampled. So how do I calculate the variance of a proportion when it is not a proportion of the population, but a proportion of a specific variable (drill purchases in this case)?

    Is there a simple solution to this? Any help would be appreciated!

    Thanks in advance!

  2. #2
    TS Contributor
    Points: 3,913, Level: 39
    Level completed: 76%, Points required for next Level: 37
    terzi's Avatar
    Location
    Poza Rica, Mexico
    Posts
    378
    Thanks
    2
    Thanked 24 Times in 24 Posts

    Re: Calculating variance when proportions are not of population

    Hi potterbro,

    If I got it right, then your sample size would be 700, which is the number of drills purchased, since you are not interested in the consumer but in the purchase. Because of this, your observational unit is not the person but the purchase itself, so the sample size is the total number of purchases. Just consider that the proportion was calculated as (Purchases in Store A / n), where n is the total number of drills bought. With that in mind you shouldn't have problems with the analysis. Notice that you can also analyze other variables measured on people; just remember to identify the unit you are measuring on each occasion.
    Statisticians are engaged in an exhausting but exhilarating struggle with the biggest challenge that philosophy makes to science: how do we translate information into knowledge

  3. The Following User Says Thank You to terzi For This Useful Post:

    potterbro (09-25-2012)

  4. #3

    Re: Calculating variance when proportions are not of population

    Thanks Terzi!

    Sorry it took so long to get back to this issue... I got pulled into something else and only now had the chance to look into it again.

    Something still doesn't feel right about just treating the number of drills as my sample size. Let me give a more extreme example that may explain my predicament. Suppose instead of drills I asked about photo prints. Say I ask 500 people where they had their photos printed and how many photos they had printed. Each person could have any number of photos printed, and the numbers can get huge. Suppose that the total number of photos purchased by these 500 people is 100,000, and that 20,000 of them were purchased at retailer A (20%). If I were to calculate the standard deviation using the regular old proportion formula, SD = sqrt(P*(1-P)/n), with n = 100,000, I would get a value of 0.13%. Now, if each person instead bought only one photo and 20% were purchased at retailer A, the standard deviation would be 1.8%... over ten times higher. In both scenarios I asked only 500 people. Should I really expect the scenario where people are entering huge quantities to have less variation than when they enter small quantities? It seems to me like the standard deviation is being artificially shrunk because of the large quantities involved.
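
    The two figures quoted above can be reproduced with a short sketch. The function name `naive_se` is made up for illustration; it simply applies sqrt(P*(1-P)/n), taking n first as the number of photos and then as the number of respondents.

```python
import math

def naive_se(p: float, n: int) -> float:
    """Apply sqrt(p*(1-p)/n), treating each of the n units as independent."""
    return math.sqrt(p * (1 - p) / n)

p = 0.20
se_photos = naive_se(p, 100_000)  # n = number of photos
se_people = naive_se(p, 500)      # n = number of respondents

print(f"n = 100,000 photos: {se_photos:.2%}")
print(f"n = 500 people:     {se_people:.2%}")
print(f"ratio:              {se_people / se_photos:.1f}")
```

    The ratio between the two is sqrt(100,000 / 500), roughly 14, which is the shrinkage the post is questioning.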

    Thoughts?

  5. #4

    Re: Calculating variance when proportions are not of population

    Does anybody have any thoughts on this? I'm just curious if there is an established methodology for dealing with the variance or standard deviation of a proportion when it is a proportion not of the number of people sampled but of some unit of their answers... whether it is the percentage of purchases made at retailer A or the percentage of dollars made at retailer A.
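
    One commonly used approach for this kind of quantity, offered here as a sketch rather than as an answer given in the thread, is to treat the share as a ratio of two per-person totals (units bought at retailer A over units bought anywhere) and to approximate its standard error by Taylor linearization, with the respondent as the sampling unit. The function `ratio_se` and the toy data are hypothetical, and the sketch assumes a simple random sample of people with no finite-population correction.

```python
import math

def ratio_se(y, x):
    """Share R = sum(y)/sum(x) and its linearized standard error.

    y[i]: units person i bought at retailer A (hypothetical data)
    x[i]: units person i bought in total
    Assumes a simple random sample of people, no finite-population correction.
    """
    n = len(x)
    r = sum(y) / sum(x)
    x_bar = sum(x) / n
    # Residual of each person's retailer-A count around r times their total.
    resid = [yi - r * xi for yi, xi in zip(y, x)]
    s2 = sum(e * e for e in resid) / (n - 1)
    se = math.sqrt(s2 / n) / x_bar
    return r, se

# Toy data: six respondents, each buying everything at a single store.
total = [1, 1, 2, 1, 10, 5]
at_a = [1, 0, 0, 0, 0, 5]
share, se = ratio_se(at_a, total)
print(f"share = {share:.3f}, SE = {se:.3f}")
```

    When every person buys exactly one unit, this expression reduces to sqrt(P*(1-P)/(n-1)) with n the number of people, so it agrees closely with the familiar formula in that case and only departs from it when quantities vary across respondents.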

    Thanks!

  6. #5
    TS Contributor
    terzi's Avatar

    Re: Calculating variance when proportions are not of population

    Hi again potterbro,

    There's a problem with your line of thought. The formula you used, SD = sqrt(P*(1-P)/n), is not the formula for the standard deviation of a variable; it is the standard error of a proportion, which is different. The standard error is based on the sampling distribution of a statistic and is always affected by your sample size (a bigger sample size will reduce standard errors). The variance of a proportion is estimated simply by the expression P*(1-P). Notice that the standard error is what you use to build confidence intervals, and the variance is a component of the formula used to obtain the standard error.
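
    To make the distinction concrete, here is a small sketch using the 20% figure from the drill example. The variable names and the choice of n = 700 (the number of drills) are for illustration only, and the interval uses the usual normal approximation with z = 1.96.

```python
import math

p = 0.20   # proportion bought at retailer A
n = 700    # sample size, here taken as the number of drills

variance = p * (1 - p)               # variance term; does not depend on n
std_error = math.sqrt(variance / n)  # standard error; shrinks as n grows

# Approximate 95% confidence interval (normal approximation).
z = 1.96
low, high = p - z * std_error, p + z * std_error

print(f"variance       = {variance:.4f}")
print(f"standard error = {std_error:.4f}")
print(f"95% CI         = ({low:.3f}, {high:.3f})")
```

    Changing n alters the standard error and the interval but leaves the variance term at 0.16.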

    I hope this helps you

  7. #6

    Re: Calculating variance when proportions are not of population

    Thank you so much for your help!

    Just one last thing that might help me move where I need to go... is the variance of a weighted proportion (where the individual data is weighted) also P*(1-P)?

  8. #7
    TS Contributor
    terzi's Avatar

    Re: Calculating variance when proportions are not of population


    No, the variance of the proportion must be corrected when using weights. Almost any statistical package has an option for including weights so the correction can be done automatically.
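
    As one illustration of such a correction (not necessarily what any particular package does), a common approximation replaces n with Kish's effective sample size, (sum of weights)^2 / (sum of squared weights). The function name and the toy data below are hypothetical, and more complex survey designs would need a design-based variance estimator instead.

```python
import math

def weighted_proportion_se(flags, weights):
    """Weighted proportion and an approximate SE via Kish's effective sample size.

    flags[i]   : 1 if respondent i chose retailer A, else 0 (hypothetical data)
    weights[i] : respondent i's survey weight
    """
    w_sum = sum(weights)
    p = sum(w * f for w, f in zip(weights, flags)) / w_sum
    # Effective sample size; equals len(weights) when all weights are equal.
    n_eff = w_sum ** 2 / sum(w * w for w in weights)
    se = math.sqrt(p * (1 - p) / n_eff)
    return p, n_eff, se

flags = [1, 0, 0, 1, 0, 0, 0, 0]
weights = [2.0, 1.0, 1.0, 0.5, 1.5, 1.0, 0.5, 0.5]
p, n_eff, se = weighted_proportion_se(flags, weights)
print(f"p = {p:.3f}, effective n = {n_eff:.2f}, SE = {se:.3f}")
```

    Unequal weights make the effective sample size smaller than the number of respondents, so the standard error comes out larger than the unweighted formula would suggest.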
