Thread: Sample mean differs from population mean due to weighting

  1. #1

    Hi folks,

    The disclaimer first: I'm not primarily trained in statistics, so this problem might sound very naive to the experienced statisticians. Your expertise is what I'm looking for and any help is highly appreciated.


    Now the problem:
    I am supposed to calculate total chocolate consumption in a population. Further, I need to calculate per capita chocolate consumption in the overall population, per capita chocolate consumption in rural and urban areas of the population, per capita chocolate consumption by age groups and by income groups.

    A primary survey was conducted in a representative random sample of the population wherein annual chocolate consumption data was collected. The sample was created using a two-tier stratification. Tier I was by geographical region and Tier II was by rural and urban within each geographical region. This is illustrated as follows:

    Sample size: 1200 persons
    Number of regions: 4
    Sample size per region: 300

    Within each region the sample was further subdivided into rural and urban based on the population mix of that region.

    Once the survey data was received, aggregations were done as follows:

    Total consumption (T) = sum of consumption in each region (T1 + T2 + T3 + T4)

    Consumption in a region = consumption in the region's urban stratum + consumption in the region's rural stratum [e.g. T1 = Tu1 + Tr1, etc.]

    Consumption in a region's urban/rural stratum = (sample consumption in the stratum / sample size of the stratum) * population of the stratum
    [e.g. Tu1 = Cu1/Su1*Pu1; Tr1 = Cr1/Sr1*Pr1]


    Now that we have the total consumption, calculating overall per capita consumption is fairly simple: Y = T/P, where P is the total population (the sum of the populations of all strata).

    Per capita consumption by rural/urban stratum is calculated as:
    Yu = (Tu1 + Tu2 + Tu3 + Tu4) / (Pu1 + Pu2 + Pu3 + Pu4)

    Yr = (Tr1 + Tr2 + Tr3 + Tr4) / (Pr1 + Pr2 + Pr3 + Pr4)
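
    The aggregation described above can be sketched in a few lines of Python. This is only an illustration of the formulas T = C/S*P and Y = T/P: the `Stratum` class, the `per_capita` helper and all the numbers are made up for the example (only two regions are shown) and are not the survey data.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    region: int
    area: str             # "urban" or "rural"
    sample_total: float   # C: consumption summed over the sampled persons
    sample_size: int      # S: number of sampled persons
    population: int       # P: population of the stratum

    @property
    def estimated_total(self) -> float:
        # T = C / S * P (sample mean scaled up to the stratum population)
        return self.sample_total / self.sample_size * self.population


# Hypothetical figures, for illustration only.
strata = [
    Stratum(1, "urban", 600.0, 200, 1_000_000),
    Stratum(1, "rural", 100.0, 100, 500_000),
    Stratum(2, "urban", 150.0, 100, 2_000_000),
    Stratum(2, "rural", 100.0, 200, 4_000_000),
]


def per_capita(subset):
    """Estimated total consumption divided by the population it covers."""
    total = sum(s.estimated_total for s in subset)
    population = sum(s.population for s in subset)
    return total / population


overall = per_capita(strata)                                   # Y
urban = per_capita([s for s in strata if s.area == "urban"])   # Yu
rural = per_capita([s for s in strata if s.area == "rural"])   # Yr
print(overall, urban, rural)
```

    Because Yu and Yr are built from the same scaled-up totals as Y, the overall figure always lies between them.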

    The problem is that when I try to calculate per capita consumption by income group and by age group, I don't have data on the distribution of the population by income and age. Hence I resort to the crude method of using the unweighted mean of consumption in the sample for these calculations. This leads to problems: e.g. if consumption of chocolates is far higher in one of the geographical regions than in all the others, the overall per capita consumption falls outside the range of per capita consumption by age.

    For example, in my data, the results are as follows:

    Age group    Per capita chocolate consumption
    0 - 7        2.75
    8 - 12       4.07
    13 - 19      4.86
    20 - 35      7.42
    36 - 45      7.65
    46 - 60      8.58
    Above 60     10.88

    whereas in the overall stratified sample, the per capita consumption is only 1.7 units.

    A similar discrepancy is also observed in the distribution by income group.
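
    A small sketch of how this kind of gap can arise. The data below are entirely hypothetical (two strata "A" and "B" with equal sample sizes but very different populations, and two age groups); the point is only that unweighted group means and a population-weighted overall mean are not computed on the same footing.

```python
from collections import defaultdict

# Hypothetical sample records: (stratum, age_group, consumption).
sample = [
    ("A", "young", 8.0), ("A", "young", 10.0), ("A", "old", 12.0), ("A", "old", 14.0),
    ("B", "young", 0.5), ("B", "young", 1.0), ("B", "old", 1.5), ("B", "old", 2.0),
]
# Hypothetical stratum populations: A is small, B is large.
population = {"A": 100_000, "B": 10_000_000}

# Crude method: unweighted sample mean within each age group.
by_group = defaultdict(list)
for stratum, group, value in sample:
    by_group[group].append(value)
unweighted = {g: sum(v) / len(v) for g, v in by_group.items()}

# Stratified method: each stratum's sample mean scaled by its population.
by_stratum = defaultdict(list)
for stratum, group, value in sample:
    by_stratum[stratum].append(value)
total = sum(sum(v) / len(v) * population[s] for s, v in by_stratum.items())
overall = total / sum(population.values())

print(unweighted)          # group means from the raw sample
print(round(overall, 2))   # population-weighted overall mean
```

    Here the unweighted group means are 4.875 and 7.375, while the weighted overall mean is about 1.35, because the high-consumption stratum A makes up half of the sample but only about 1% of the population.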



    I understand that the source of the problem is the non-availability of age and income distribution data for the various geographical and rural/urban strata. However, I hope a statistical solution to this problem exists.

    I would greatly appreciate it if anyone could advise on this and point me to some resources I can refer to.

    Thanks,
    Sumeet

  2. #2

    Strange things can happen with averages, like Simpson's paradox, but in this case the difference seems much too large. Can you explain where the age data came from, and how you calculated the averages?

  3. #3

    Hi katxt,
    Thank you for your time and effort.

    The age data came from the primary survey (interviews with random households) I had conducted.

    Calculation of averages

    Total consumption was calculated as a sum of consumption in various strata. Per capita consumption in the overall sample (which yielded a result of 1.7 units) was calculated by dividing this total consumption by total population (which is the sum of population of all strata).

    Per capita consumption within each age bracket was calculated as a simple mean from the sample (e.g. the sum of consumption reported by 8-12 year olds divided by the number of sampled persons in that age bracket). I know this is not the ideal way to do it, but I don't have the age distribution of the overall population. The solution to my problem could perhaps lie here, if you could suggest a more accurate way to find the average consumption within the different age brackets.

    I have explained the sample design and the calculations in more detail in the original post above.

    Looking forward to your views.

    Regards,
    rsindore

  4. #4

    I assume that they are random samples of 300 out of strata of size P1, P2, P3, and P4. So the samples can be scaled up into reasonable estimates of the full stratum values by multiplying by Pi/300.
    If we look at one group, say 20 to 35, we will have n1, n2, ... sampled people in each stratum, with chocolate totals of T1, T2, ... for that group.
    Our best estimate of the average consumption of this group over the entire population is (total consumed by the group)/(total number in the group)
    = (T1*P1/300 + T2*P2/300 + ... )/(n1*P1/300 + n2*P2/300 + ... ).
    The 300s all cancel in this case, so we have (T1*P1 + T2*P2 + ... )/(n1*P1 + n2*P2 + ... ) for the average for the group.
    For the overall average consumption, take T1, T2, ... as the totals over each full sample of n = 300, so the overall average is (T1*P1 + T2*P2 + ... )/(300*(P1 + P2 + ... )).
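
    A minimal Python sketch of the group calculation above. The function name `weighted_group_mean` and all figures are hypothetical. The `sample_sizes` argument is an added assumption so that the same code also covers strata with unequal sample sizes; with 300 everywhere it reduces to the simplified formula where the 300s cancel.

```python
def weighted_group_mean(group_totals, group_counts, populations, sample_sizes):
    """Estimate a group's per capita consumption across strata.

    group_totals[i]: consumption summed over the group's members sampled in stratum i (T_i)
    group_counts[i]: number of the group's members sampled in stratum i (n_i)
    populations[i]:  population of stratum i (P_i)
    sample_sizes[i]: total sample size of stratum i (300 in the post above)
    """
    # Scale both the consumption and the head count up to the population.
    est_consumption = sum(
        t * p / s for t, p, s in zip(group_totals, populations, sample_sizes)
    )
    est_headcount = sum(
        n * p / s for n, p, s in zip(group_counts, populations, sample_sizes)
    )
    return est_consumption / est_headcount


# Hypothetical figures for one age group in four strata.
populations = [2_000_000, 500_000, 4_000_000, 1_000_000]
sample_sizes = [300, 300, 300, 300]
group_counts = [60, 90, 50, 80]            # n_i
group_totals = [90.0, 900.0, 50.0, 400.0]  # T_i

weighted = weighted_group_mean(group_totals, group_counts, populations, sample_sizes)
unweighted = sum(group_totals) / sum(group_counts)
print(round(weighted, 2), round(unweighted, 2))
```

    With these made-up numbers the weighted estimate is well below the unweighted sample mean, because the stratum with the highest consumption has the smallest population.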

  5. The Following User Says Thank You to katxt For This Useful Post:

    rsindore (09-19-2017)

  6. #5

    Understood and implemented.
    This is indeed the solution I was looking for. Thanks again.

    In case I need to communicate this to someone briefly, is there a specific term for this kind of calculation? E.g. "The workaround to this problem was found using the _______ method."

  7. #6

    Perhaps you could call it "weighted means".

  8. The Following User Says Thank You to katxt For This Useful Post:

    rsindore (09-27-2017)
