# Thread: Bemused by student question.

1. ## Bemused by student question.

We have taken a simple random sample to estimate the mean of a population.
A student asked me: "The numbers near the middle of our sample will clearly be nearer the population mean than the numbers at the extremes of the sample. So why don't we calculate a weighted mean, giving more weight to the numbers in the middle? Wouldn't that give a more accurate estimate?"
I don't think so, but why not?

2. ## Re: Bemused by student question.

hi,
I had the same question a while back and could not give a good answer. Today, I think, the answer is that the premise is false: the population mean is uniformly distributed across the confidence interval, i.e. it could be anywhere in it with the same probability.

regards

3. ## Re: Bemused by student question.

The first reply that comes to my mind would be another question in turn: what threshold would we use to bracket that "middle" group?

4. ## Re: Bemused by student question.

If we assume the distribution is symmetrical, then it doesn't really matter how we weight the extremes vs. the middle (because highs and lows balance each other out).

If the distribution is not symmetrical, then weighting the middle numbers more heavily will produce a biased estimate of the mean (too low with a right tail, too high with a left tail). The "middle" of a distribution is not necessarily the same as its mean. It's basically the same issue as with the median vs. the mean.
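To make this concrete, here is a small simulation (my own sketch, not from the original post) that uses a heavily trimmed mean as a stand-in for "weighting the middle": it stays unbiased for a symmetric distribution but is biased for a skewed one. The sample sizes and distributions are illustrative assumptions.

Code:
```r
set.seed(1)
nsim <- 20000
n <- 50

# Each column: (plain mean, 25%-trimmed mean) for one sample
sym  <- replicate(nsim, {x <- rnorm(n);          c(mean(x), mean(x, trim = 0.25))})
skew <- replicate(nsim, {x <- rexp(n, rate = 1); c(mean(x), mean(x, trim = 0.25))})

rowMeans(sym)   # both are close to the true mean 0
rowMeans(skew)  # plain mean is close to 1; trimmed mean falls well below 1
```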

5. ## Re: Bemused by student question.

Oops,
I just simulated 100 000 samples of 100 from a standard normal distribution and calculated the distance between the lower confidence limit and the true mean. I expected a uniform distribution, but it is a nice normal one, so the population mean is indeed close to the middle of the confidence interval. The distribution of the distance between the true mean and the lower confidence limit is normal with a mean of 2 (I used the formula cl = mean - 2*std) and a standard deviation of 0.17.

So, to gianmarco's question, it would be possible to set pretty good limits on where the true mean is located INSIDE the confidence interval. This seems pretty strange to me; am I missing something?

Code:
``````# Simulate the position of the true mean (here 0) relative to the
# lower confidence limit cl = mean - 2*sd
SimLength <- 100000

res <- numeric(length = SimLength)
for (i in 1:SimLength) {
  v <- rnorm(100)          # sample of 100 from a standard normal
  stdv <- sd(v)
  meanv <- mean(v)
  cl <- meanv - 2 * stdv   # lower confidence limit
  res[i] <- -cl            # distance from the true mean (0) down to cl
}

hist(res)``````

6. ## Re: Bemused by student question.

Originally Posted by rogojel
So, to gianmarco's question, it would be possible to set pretty good limits on where the true mean is located INSIDE the confidence interval. This seems pretty strange to me; am I missing something?
I'm not sure if I understand what you mean, but let me try. A confidence interval procedure gives you an interval that contains the population parameter in x% of samples. If you make x smaller, you get a narrower interval. With a normal distribution, the sampling distribution of the mean is also normal (but note that you need to divide stdv by sqrt(100) = 10 to get the standard error, not that it matters much for the overall picture).

However, within the frequentist framework the population mean is assumed to be fixed. So a single CI either contains it or it doesn't; there is no probability attached to one particular interval. If what we're looking at is repeated sampling from the same normal distribution, then sure, you can talk about probabilities attached to the different samples (not to the population mean). But in reality we deal with just one sample.
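To illustrate this repeated-sampling reading (my own sketch, not from the post), one can simulate many samples from a known population and count how often the usual t-interval covers the fixed true mean:

Code:
```r
set.seed(42)
nsim <- 10000
n <- 100

covered <- replicate(nsim, {
  x  <- rnorm(n)                                # true mean is 0
  se <- sd(x) / sqrt(n)                         # standard error of the sample mean
  ci <- mean(x) + c(-1, 1) * qt(0.975, n - 1) * se
  ci[1] <= 0 && 0 <= ci[2]                      # does this interval cover the true mean?
})

mean(covered)   # close to 0.95
```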

Of course, there is the Bayesian credible interval, which I think offers more answers to the kind of questions you want. This, for instance:

Originally Posted by rogojel
I think, the answer is that the premise is false: the population mean is uniformly distributed across the confidence interval, i.e. it could be anywhere in it with the same probability.
is very Bayesian and hence not compatible with the confidence interval, I think (a population mean doesn't have a distribution in the frequentist framework).

7. ## The Following User Says Thank You to Junes For This Useful Post:

rogojel (10-23-2016)

8. ## Re: Bemused by student question.

hi Junes,
I think you misunderstood what I did. The experiment was to take samples from a population where the true mean is known, then to calculate the sample mean, the sample standard deviation and the position of the true mean inside the confidence interval. This is of course impossible in a real sampling situation, but the goal of the experiment was to test my statement that the true mean can be anywhere in the confidence interval with equal probability. That statement turned out to be wrong; the original claim in the question seems to be true. That is, the true value is generally closer to the center of the confidence interval than to the edges, and this has no Bayesian flavor to it.

BTW my formulation was indeed a bit Bayesian, but I intended it as shorthand for the frequentist one: that the limits of the confidence intervals are distributed in such a way that the midpoint of the interval is always close to the one true mean value.
Regards

9. ## Re: Bemused by student question.

Ah, thanks! Now I understand it better.

Yeah, I think it depends on the original distribution and the sample size, but in typical situations I think that's correct.

10. ## Re: Bemused by student question.

Originally Posted by katxt
The numbers near the middle of our sample will clearly be nearer the population mean than the numbers at the extremes of the sample.
If you give zero weight to all observations except the middle one, then that estimator is the median. The median throws away all information except the middle value, so of course that comes at a cost. The relative efficiency, the variance of the median divided by the variance of the mean, is about pi/2 ≈ 1.57 for the normal distribution. So the median has roughly 57% larger variance than the mean.

But the median will be more robust, i.e. less sensitive to outliers.
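A quick simulation (my own sketch, with illustrative sample sizes) reproduces the pi/2 figure:

Code:
```r
set.seed(7)
nsim <- 20000
n <- 100

# Each column: (sample mean, sample median) for one normal sample
est <- replicate(nsim, {x <- rnorm(n); c(mean(x), median(x))})

var(est[2, ]) / var(est[1, ])   # close to pi/2, about 1.57
```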

Originally Posted by katxt
So why don't we calculate a weighted mean, giving more weight to the numbers in the middle? Wouldn't that give a more accurate estimate?
You can, for example, throw away a fraction of the largest and smallest values, say 10% at each end. That is called a trimmed mean. It will be more robust while still using most of the information in the sample. (But then the distribution theory, which is based on the assumption of exact normality, will no longer be valid.)

An R example:
Code:
``````set.seed(384)
x <- rnorm(20)
x
mean(x)
# [1] -0.2905987
mean(x, trim = 0.10)
# [1] -0.2822947``````

But still, the mean is a sufficient statistic for the normal distribution (and for many other distributions in the exponential family). The mean is the best linear unbiased estimator under the standard assumptions.

11. ## The Following 2 Users Say Thank You to GretaGarbo For This Useful Post:

Junes (10-23-2016), victorxstc (10-23-2016)

12. ## Re: Bemused by student question.

Here's a simple scenario. We will limit our weighting to the median alone.
Let's take a sample of n from a normal distribution with sd = s.
We boost our sample by weighting the middle number, on the grounds that it is closer to the mean than the extremes are. In effect we are adding some number of medians (say k) to the original sample and finding the mean of the new lot of n + k numbers. The se of the mean is s/sqrt(n). The se of the median is about 1.25*s/sqrt(n) (for normal data).
The revised mean = n/(n+k) * sample mean + k/(n+k) * sample median.
It is now clear that the se of the new mean lies somewhere between that of the mean and that of the median. If we have no medians (k = 0), the se is s/sqrt(n). If we have a very large number of medians (say k = 1000000), the se approaches 1.25*s/sqrt(n). So adding any medians at all increases the se and makes the estimate less accurate.
I don't know this for a fact, but I presume that the se of any particular order statistic is greater than that of the mean. If so, no weighting scheme is going to improve things.
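The argument above can be checked numerically. In this sketch (my own, with illustrative n and k), mixing medians into the mean visibly inflates the standard error for normal data:

Code:
```r
set.seed(11)
nsim <- 20000
n <- 100
k <- 50   # illustrative number of added "medians"

# Each column: (plain mean, revised mean) for one sample
est <- replicate(nsim, {
  x <- rnorm(n)
  c(mean(x), (n * mean(x) + k * median(x)) / (n + k))
})

sd(est[1, ])   # se of the plain mean, about 1/sqrt(100) = 0.1
sd(est[2, ])   # se of the revised mean: larger
```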
kat
