Thread: Bemused by student question.

  1. #1

    Location
    New Zealand

    We have taken a simple random sample to estimate the mean of a population.
    A student asked me: "The numbers near the middle of our sample will clearly be nearer the population mean than the numbers at the extremes of the sample. So why don't we calculate a weighted mean, giving more weight to the numbers in the middle? Wouldn't that give a more accurate estimate?"
    I don't think so, but why not?

  2. #2
    TS Contributor
    rogojel's Avatar
    Location
    I work in Europe, live in Hungary

    hi,
    I had the same question a while back and could not give a good answer. Today I think the answer is that the premise is false: the population mean is uniformly distributed across the confidence interval, i.e. it could be anywhere, with the same probability.

    regards

  3. #3
    TS Contributor
    gianmarco's Avatar
    Location
    Italy

    The first reply that comes to my mind would be another question in turn: what would be the threshold to be used to bracket that "middle" group?
    http://cainarchaeology.weebly.com/

  4. #4
    Junes's Avatar
    Location
    Netherlands

    If we assume the distribution is symmetrical, then it doesn't really matter for the expected value how we weight the extremes vs. the middle, as long as the weights are symmetrical too (because highs and lows balance each other out).

    If the distribution is not symmetrical, then weighting the middle numbers more heavily will produce a biased estimate of the mean (too low for a right-skewed distribution and too high for a left-skewed one). The "middle" of a distribution is not necessarily the same as its mean. It's basically the same as with median vs. mean.
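
    The bias argument above can be checked with a small simulation. This is only a sketch: the triangular weights, the helper name middle_weighted_mean, the sample size and the choice of an exponential population are all assumptions made for illustration, not anything proposed in the thread.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical helper: weighted mean of the sorted sample, with weights
# that peak at the middle order statistic and fall off towards the extremes.
middle_weighted_mean <- function(x) {
  n <- length(x)
  w <- pmin(seq_len(n), rev(seq_len(n)))  # 1, 2, ..., peak, ..., 2, 1
  sum(w * sort(x)) / sum(w)
}

n_sims <- 5000
n <- 25

# Symmetric population: standard normal, true mean 0
sym <- replicate(n_sims, middle_weighted_mean(rnorm(n)))

# Right-skewed population: exponential with rate 1, true mean 1
skew <- replicate(n_sims, middle_weighted_mean(rexp(n)))

bias_sym <- mean(sym) - 0
bias_skew <- mean(skew) - 1

print(c(symmetric = bias_sym, right_skewed = bias_skew))
```

    With these settings the bias for the symmetric case should be close to zero, while the right-skewed case should come out clearly negative.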
    Last edited by Junes; 10-23-2016 at 01:28 PM.

  5. #5
    TS Contributor
    rogojel's Avatar
    Location
    I work in Europe, live in Hungary

    Oops,
    I just simulated 100,000 samples of 100 from a standard normal distribution and calculated the distance between the lower confidence limit and the true mean. I expected a uniform distribution, but it is a nice normal one, so the population mean is indeed close to the middle of the confidence interval. The distribution of the distance between the true mean and the lower confidence limit is normal with a mean of 2 (I used the formula cl = mean - 2*std) and a standard deviation of 0.17.

    So, to gianmarco's question, it would be possible to have pretty good limits to where the true mean is located INSIDE the confidence interval. This seems pretty strange to me. Am I missing something?

    Code: 
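    SimLength <- 100000
    
    # Distance from the lower limit to the true mean (0), one value per sample
    res <- numeric(length = SimLength)
    for (i in 1:SimLength) {
      v <- rnorm(100)          # sample of 100 from N(0, 1)
      stdv <- sd(v)
      meanv <- mean(v)
      cl <- meanv - 2 * stdv   # lower limit; uses sd(v), not sd(v)/sqrt(100)
      res[i] <- -cl            # true mean (0) minus lower limit
    }
    
    hist(res)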

  6. #6
    Junes's Avatar
    Location
    Netherlands

    Quote Originally Posted by rogojel
    So, to gianmarco's question, it would be possible to have pretty good limits to where the true mean is located INSIDE the confidence interval. This seems pretty strange to me. Am I missing something?
    I'm not sure if I understand what you mean, but let me try. A confidence interval procedure gives you an interval that will contain the population parameter in x% of samples. If you make x smaller, you get a narrower CI. With a normal distribution, the sampling distribution of the mean will also be normal (but note that you need to divide stdv by sqrt(100) = 10 to get the standard error, not that it matters much for the overall picture).

    However, within the frequentist framework the population mean is assumed to be fixed. So a single CI either contains it or it doesn't; there's no probability involved. If what we're looking at is repeated sampling from the same normal distribution, then sure, you can talk about the probabilities of different samples (but not of the population mean). But in reality we deal with just one sample.
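
    The repeated-sampling view can be sketched in R using the standard error rather than the raw standard deviation. The variable names, the number of simulations and the "position" measure (0 = lower limit, 1 = upper limit) are assumptions chosen for this illustration.

```r
set.seed(2)  # arbitrary seed, only for reproducibility

n_sims <- 10000
n <- 100
true_mean <- 0

covered <- logical(n_sims)
position <- numeric(n_sims)  # 0 = true mean at lower limit, 1 = at upper limit

for (i in seq_len(n_sims)) {
  v <- rnorm(n, mean = true_mean)
  se <- sd(v) / sqrt(n)          # standard error of the mean
  lower <- mean(v) - 2 * se
  upper <- mean(v) + 2 * se
  covered[i] <- lower <= true_mean && true_mean <= upper
  position[i] <- (true_mean - lower) / (upper - lower)
}

coverage <- mean(covered)                                # share of intervals containing the true mean
middle_half <- mean(position > 0.25 & position < 0.75)   # true mean in the central half of the interval

print(c(coverage = coverage, middle_half = middle_half))
```

    Across repeated samples the intervals should cover the true mean roughly 95% of the time, and the true mean should land in the central half of the interval more often than not. Both are statements about the procedure, not about any single interval.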

    Of course, there is the Bayesian credible interval, which I think offers more answers to the kind of questions you want. This, for instance:

    Quote Originally Posted by rogojel
    I think the answer is that the premise is false: the population mean is uniformly distributed across the confidence interval, i.e. it could be anywhere, with the same probability.
    is very Bayesian and hence not compatible with the confidence interval, I think (a population mean doesn't have a distribution in the frequentist framework).

  7. The Following User Says Thank You to Junes For This Useful Post:

    rogojel (10-23-2016)

  8. #7
    TS Contributor
    rogojel's Avatar
    Location
    I work in Europe, live in Hungary

    hi Junes,
    I think you misunderstand what I did. The experiment was to take samples from a population where the true mean is known, then to calculate the sample mean, the sample standard deviation and the position of the true mean inside the confidence interval. This is of course impossible in a real sampling situation, but the goal of the experiment was to verify my statement that the true mean can be anywhere in the confidence interval with equal probability. This turned out to be wrong; the original statement of the question seems to be true. That is, the true value is generally closer to the center of the confidence interval than to the edge, and this has no Bayesian flavor to it.

    BTW my formulation really was a bit Bayesian, but I intended it as shorthand for the frequentist one: that the limits of the confidence intervals are distributed in such a way that the midpoint of the interval is usually close to the one true mean value.
    Regards

  9. #8
    Junes's Avatar
    Location
    Netherlands

    Ah, thanks! Now I understand it better.

    Yeah, I think it depends on the original distribution and the sample size, but in typical situations I think that's correct.

  10. #9
    Human
    GretaGarbo's Avatar

    Quote Originally Posted by katxt View Post
    The numbers near the middle of our sample will clearly be nearer the population mean than the numbers at the extremes of the sample.
    If you give zero weight to all observations except the middle one, then that will be the median. The median throws away all information except the middle observation, so of course that comes at a cost. The variance of the median divided by the variance of the mean (the inverse of the relative efficiency) will be about 3.14/2 = 1.57 for large samples from the normal distribution. So the median will have about 57% larger variance.
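
    The variance ratio quoted above can be checked by simulation. This is a rough sketch; the sample size of 101, the number of simulations and the variable names are arbitrary choices made for this example.

```r
set.seed(3)  # arbitrary seed, only for reproducibility

n_sims <- 20000
n <- 101   # odd, so the median is a single observation

# One simulated sample per row
samples <- matrix(rnorm(n_sims * n), nrow = n_sims)
means <- rowMeans(samples)
medians <- apply(samples, 1, median)

# Variance of the median relative to the variance of the mean
ratio <- var(medians) / var(means)
print(ratio)
print(pi / 2)   # large-sample value for normal data
```

    The simulated ratio should land near pi/2, typically a little below it for a finite sample size.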

    But the median will be more robust, i.e. less sensitive to outliers.

    Quote Originally Posted by katxt View Post
    So why don't we calculate a weighted mean, giving more weight to the numbers in the middle? Wouldn't that give a more accurate estimate?
    You can, for example, throw away the 5% largest and the 5% smallest values. That is called a trimmed mean. It will be more robust while still using most of the information in the sample. (But then the distribution theory, which is based on the assumption of exact normality, will not be valid.)

    An R example:
    Code: 
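    set.seed(384)
    x <- rnorm(20)   # 20 draws from a standard normal
    x
    mean(x)          # ordinary mean
    # [1] -0.2905987
    # trim = 0.10 drops the lowest 10% and the highest 10% of the values
    # (2 from each end here) before averaging
    mean(x, trim = 0.10)
    # [1] -0.2822947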
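
    To see the robustness trade-off in terms of sampling variance, here is a sketch that compares the mean and the 10% trimmed mean on normal data and on heavier-tailed data. The helper compare_var, the t distribution with 3 degrees of freedom and the simulation sizes are assumptions picked for illustration.

```r
set.seed(4)  # arbitrary seed, only for reproducibility

n_sims <- 5000
n <- 20

# Hypothetical helper: simulated sampling variance of the mean and of the
# 10% trimmed mean, for samples drawn by the generator function `rgen`.
compare_var <- function(rgen) {
  est <- replicate(n_sims, {
    x <- rgen(n)
    c(mean = mean(x), trimmed = mean(x, trim = 0.10))
  })
  apply(est, 1, var)   # one variance per estimator
}

normal_var <- compare_var(rnorm)
heavy_var <- compare_var(function(n) rt(n, df = 3))  # heavier tails than normal

print(rbind(normal = normal_var, heavy_tailed = heavy_var))
```

    On normal data the trimmed mean should show a slightly larger variance than the plain mean; on the heavy-tailed data the ordering should reverse.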

    But still, the sample mean is a sufficient statistic for the mean of the normal distribution (and for several other distributions in the exponential family). The mean is the best linear unbiased estimator under the standard assumptions.

  11. The Following 2 Users Say Thank You to GretaGarbo For This Useful Post:

    Junes (10-23-2016), victorxstc (10-23-2016)

  12. #10

    Location
    New Zealand

    Thanks for all the comments. I have thought about this some more and I'm starting to get a glimmer.
    Here’s a simple scenario. We will limit our weighting to the median alone.
    Let’s take a sample of n from a normal distribution with sd = s.
    We boost our sample by weighting the middle number on the grounds that it is closer to the mean than the extremes are. In effect we are adding some number of medians (say k) to the original sample and finding the mean of the new lot of n + k numbers. The se of the mean is s/sqrt(n). The se of the median is about 1.25 × s/sqrt(n) (for normal data).
    The revised mean = n/(n+k) × sample mean + k/(n+k) × sample median
    The se of the new mean then lies somewhere between that of the mean and that of the median. If we have no medians, k = 0, the se is s/sqrt(n). If we have a very large number of medians, say k = 1000000, the se approaches 1.25 × s/sqrt(n). So any medians at all increase the se and make the estimate less accurate.
    I don’t know this for a fact, but I presume that the se of any particular order statistic is greater than that of the mean. If so, no weighting scheme is going to improve things.
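
    The scenario above can be simulated directly. For normal data the sample mean is independent of the difference (median - mean), which is why the se of the revised mean should only grow as k grows. The sketch below assumes n = 25, s = 1 and a handful of k values; the helper se_revised is made up for this example.

```r
set.seed(5)  # arbitrary seed, only for reproducibility

n_sims <- 20000
n <- 25
s <- 1

# One simulated sample per row
samples <- matrix(rnorm(n_sims * n, sd = s), nrow = n_sims)
means <- rowMeans(samples)
medians <- apply(samples, 1, median)

# Hypothetical helper: simulated se of the revised mean when k copies of
# the median are added to the sample.
se_revised <- function(k) {
  sd(n / (n + k) * means + k / (n + k) * medians)
}

k_values <- c(0, 5, 25, 1e6)
se_by_k <- sapply(k_values, se_revised)
names(se_by_k) <- k_values

print(se_by_k)
print(c(mean = s / sqrt(n), median_approx = 1.25 * s / sqrt(n)))
```

    The simulated se should start near s/sqrt(n) at k = 0 and rise towards the se of the median as k gets large.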
    kat
