Why does sample std dev underestimate population std dev?
The subject line is too short for me to word out my question properly:
Why does (square root of unbiased estimator for population variance) underestimate population standard deviation?
The Wikipedia page "Unbiased estimation of standard deviation" says that "it follows from Jensen's inequality that the square root of the sample variance is an underestimate".
I do know that for the concave square root function, Jensen's inequality says that the square root of the mean >= the mean of the square root.
So, how do we conclude that the square root of the sample variance underestimates population standard deviation?
Since we know from Jensen's inequality that the square root of the mean >= the mean of the square root, does "square root of sample variance" somehow relate to "mean of the square root", while "population standard deviation" somehow relates to "square root of the mean"? How so? This does not make any sense to me.
Re: Why does sample std dev underestimate population std dev?
Dason, thank you! Now I understand the mathematical proof. I would, however, like some clarification on the practical side:
If I were to conduct experiments and obtain multiple samplings from a population, would first calculating s = sqrt(s^2) for each sampling and then averaging those values be equivalent to E[sqrt(s^2)]?
Also, would first calculating the unbiased estimate of the population variance over all samplings, by averaging the s^2 values, and then taking the square root be equivalent to sqrt(E[s^2])?
If so, then one would perform the second case and always get the unbiased estimator for the population standard deviation, since sqrt(E[s^2]) = sigma?! Something has to be wrong somewhere.
Re: Why does sample std dev underestimate population std dev?
BGM, thanks, I understand the mathematical proof now. Can you also comment on practical approaches to getting the unbiased estimator for population standard deviation?
Re: Why does sample std dev underestimate population std dev?
Actually, this bias is not a big concern because consistency is much more important.
As shown in the wiki article, you can correct the bias by multiplying s by a constant that depends on the sample size n. E.g. if the random sample is normal, this constant tends to 1 as n -> infinity, and therefore s is asymptotically unbiased.
When the sample size is small, neither the biased nor the unbiased estimator is reliable.
When the sample size is large, if the biased estimator is consistent (asymptotically unbiased), then it makes practically no difference from the unbiased estimator in terms of bias.
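To make the size of the correction concrete, here is a minimal sketch, assuming normally distributed data. It computes the factor usually written c4(n), for which E[s] = c4(n) * sigma, and compares it with a simulated average of s. The function names (c4, sample_variance, mean_sample_sd), the seed and the trial counts are my own choices for illustration, not anything from the posts above.

```python
import math
import random


def c4(n: int) -> float:
    """Normal-theory factor with E[s] = c4(n) * sigma, for sample size n >= 2."""
    # lgamma is used instead of gamma so that large n does not overflow.
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


def sample_variance(xs) -> float:
    """Unbiased sample variance s^2 (n - 1 in the denominator)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def mean_sample_sd(n: int, sigma: float, trials: int, rng: random.Random) -> float:
    """Average of s = sqrt(s^2) over many simulated normal samples of size n."""
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        total += math.sqrt(sample_variance(sample))
    return total / trials


rng = random.Random(0)  # fixed seed so the run is repeatable
sigma = 2.0
for n in (2, 5, 30):
    avg_s = mean_sample_sd(n, sigma, 20000, rng)
    # Columns: n, theoretical c4(n), simulated E[s]/sigma, bias-corrected average
    print(n, round(c4(n), 4), round(avg_s / sigma, 4), round(avg_s / c4(n), 4))
```

The simulated ratio E[s]/sigma should sit close to c4(n), well below 1 for n = 2 and approaching 1 as n grows, which is the sense in which the bias stops mattering for large samples. For non-normal data the factor would be different, so dividing by c4(n) is only a normal-theory correction.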
Re: Why does sample std dev underestimate population std dev?
BGM: I agree that practically, the bias might not be important.
Dragan: Thanks for the correction factor.
Now, I would just like to know, for peace of mind, how the conclusion came about that s = sqrt( sum_i (x_i - xbar)^2 / (n - 1) ) is less than the true population standard deviation. Can someone explain, starting from this expression, how s is considered a "mean of square root", so that Jensen's inequality can be invoked? If expectation is to be used in the explanation, can you please start from this expression and show me how you get to the expectation?
Re: Why does sample std dev underestimate population std dev?
I think I understand now.
The crucial bit was realizing that in the math we are dealing with random variables, which have distributions characterized by means and variances. Hence, s^2 and s each have their own distribution, with its own mean and variance.
s^2 is from a distribution where the mean is sigma^2. s is from a distribution where the mean is <= sigma.
When we draw samples from a population and calculate the sample variance, the value we obtain is a realized variate from the s^2 distribution. When we take the square root of this number, we get a realized variate from the s distribution. Since the mean of this distribution is less than or equal to the population std dev, the number we calculated would tend to underestimate the pop std dev, and thus a correction is (theoretically) required.
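The same conclusion can be reached without naming Jensen's inequality, using only the fact that a variance cannot be negative. This is a short sketch, writing s^2 for the unbiased sample variance, s for its square root, and sigma for the population standard deviation:

```latex
\begin{aligned}
\operatorname{Var}(s) &= E[s^2] - \bigl(E[s]\bigr)^2 \;\ge\; 0 \\
\Longrightarrow\quad \bigl(E[s]\bigr)^2 &\le E[s^2] = \sigma^2 \\
\Longrightarrow\quad E[s] &\le \sigma
\end{aligned}
```

Equality would require Var(s) = 0, i.e. s taking the same value in every sample, so for any population with real sampling variability the inequality is strict.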
Re: Why does sample std dev underestimate population std dev?
Johnc is asking an excellent question here. The confusion arises not from any misunderstanding of Jensen's Inequality, which is straightforward. Confusion arises in applying the math to the problem.
The definition of s^2 looks like an expectation itself: apart from the n - 1 denominator, it is an average over i. So at first glance, it appears that when we take its square root, we are taking "the square root of the average", which is greater than the average of the square root. If the average of the square root were what we really wanted, sqrt(s^2) would be an over-estimate, exactly the opposite conclusion!!
The crucial point is this: In the response "E[sqrt(s^2)] <= sqrt(E[s^2])", what is the expectation E being taken over? It is not over the i (the samples). This is what Johnc is getting at when he describes "multiple experiments". Of course, s^2 does obey a distribution, as he points out in his last post, a distribution whose mean is sigma^2. We would like to know sigma, the square root of its mean, but when we simply take sqrt(s^2), what we get, on average (that's the expectation!), is less.
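The "multiple experiments" picture can be checked numerically. Below is a rough sketch, assuming normal data with a known sigma so that the answer can be compared against the truth. It repeats the experiment many times and compares the two procedures: the average of sqrt(s^2) versus the square root of the average s^2. The helper names and the chosen n, seed and number of experiments are hypothetical, picked only for illustration.

```python
import math
import random


def sample_variance(xs) -> float:
    """Unbiased sample variance s^2 (n - 1 in the denominator)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def compare_procedures(n: int, sigma: float, experiments: int, rng: random.Random):
    """Return (average of sqrt(s^2), sqrt of average s^2) over repeated experiments."""
    variances = [
        sample_variance([rng.gauss(0.0, sigma) for _ in range(n)])
        for _ in range(experiments)
    ]
    # Procedure 1: square root inside, expectation (average over experiments) outside.
    mean_of_roots = sum(math.sqrt(v) for v in variances) / experiments
    # Procedure 2: expectation (average over experiments) inside, square root outside.
    root_of_mean = math.sqrt(sum(variances) / experiments)
    return mean_of_roots, root_of_mean


rng = random.Random(42)  # fixed seed so the run is repeatable
mean_of_roots, root_of_mean = compare_procedures(n=5, sigma=1.0, experiments=50000, rng=rng)
print("average of sqrt(s^2):", round(mean_of_roots, 3))
print("sqrt of average s^2: ", round(root_of_mean, 3))
```

With sigma = 1, the first number should land noticeably below 1 and the second very close to 1. The averaging here runs over experiments, not over the observations i within one sample. With a single sample there is only one s^2, so the second procedure collapses to sqrt(s^2) and the bias comes back. Even with several experiments, the square root of the averaged s^2 is still slightly biased low, but that bias shrinks as the number of experiments grows.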