Why does sample std dev underestimate population std dev?

johnc

New Member
#1
The subject line is too short for me to phrase my question properly:

Why does the square root of the unbiased estimator of the population variance underestimate the population standard deviation?

Referring to the Wikipedia page Unbiased estimation of standard deviation: it says that "it follows from Jensen's inequality that the square root of the sample variance is an underestimate".

I do know that, for the concave square root function, Jensen's inequality says that the square root of the mean > the mean of the square root.

So, how do we conclude that the square root of the sample variance underestimates population standard deviation?

Since we know from Jensen's inequality that the square root of the mean > the mean of the square root, does "square root of sample variance" somehow relate to "mean of the square root", while "population standard deviation" somehow relates to "square root of the mean"? How so? This does not make sense to me.
 

Dason

Ambassador to the humans
#2
\(E[s^2] = \sigma^2\)

The square root is concave, so by Jensen's inequality, for any positive random variable \(X\) we have
\(E[\sqrt{X}] \leq \sqrt{E[X]}\)

If we take \(X = s^2\), then this says that the expected value of our sample estimate of the standard deviation satisfies

\(E[\sqrt{s^2}] \leq \sqrt{E[s^2]} = \sqrt{\sigma^2} = \sigma\)
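
If it helps to see this numerically, here is a minimal Monte Carlo sketch (my own illustration, not a standard recipe; the normal population, \( \sigma = 2 \), and \( n = 5 \) are arbitrary choices):

[CODE]
import numpy as np

# Draw many samples of size n from a N(0, sigma^2) population and
# compare the average of s = sqrt(s^2) against sigma itself.
rng = np.random.default_rng(0)
sigma, n, n_reps = 2.0, 5, 200_000

x = rng.normal(0.0, sigma, size=(n_reps, n))
s2 = x.var(axis=1, ddof=1)  # unbiased variance estimate, one per sample

print(s2.mean())            # ~ 4.0 = sigma^2: s^2 is unbiased
print(np.sqrt(s2).mean())   # ~ 1.88 < 2.0 = sigma: s is biased low
[/CODE]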
 

johnc

New Member
#3
Dason, thank you! Now I understand the mathematical proof. I would, however, like some clarification on the practical side:


If I were to conduct experiments and obtain multiple samplings from a population, would first calculating [TEX] s_j=\sqrt{\frac{1}{n-1}\sum_{i=1}^n{(x_{i,j}-\overline{x}_j)^2}}[/TEX] for each sampling j, followed by calculating [TEX]\overline{s}=\frac{1}{N}\sum_{j=1}^N{s_j}[/TEX], be equivalent to [TEX] E[\sqrt{s^2}][/TEX]?


Also then, would first calculating the unbiased estimate of the population variance over all samplings by [TEX] \overline{s^2}=\frac{1}{N}\sum_{j=1}^N{\frac{1}{n-1}\sum_{i=1}^n{(x_{i,j}-\overline{x}_j)^2}}[/TEX], followed by taking the square root [TEX] \sqrt{\overline{s^2}}[/TEX], be equivalent to [TEX] \sqrt{E[s^2]}[/TEX]?


If so, then one would just use the second approach and always get an unbiased estimator of the population standard deviation, since [TEX] \sqrt{E[s^2]}= \sqrt{\sigma^2}=\sigma[/TEX]?! Something has to be wrong somewhere.
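
Numerically, the two schemes look like this (a sketch with arbitrary numbers: normal population, sigma = 2, n = 5, N = 100,000 samplings):

[CODE]
import numpy as np

rng = np.random.default_rng(1)
sigma, n, N = 2.0, 5, 100_000

x = rng.normal(0.0, sigma, size=(N, n))
s = np.sqrt(x.var(axis=1, ddof=1))  # s_j for each sampling j

print(s.mean())                # scheme 1: s-bar ~ 1.88, well below sigma
print(np.sqrt((s**2).mean()))  # scheme 2: ~ 2.0 when N is this large
[/CODE]

(Scheme 2 looks nearly unbiased here only because N is huge; for finite N, Jensen's inequality applies once more to [TEX] \sqrt{\overline{s^2}}[/TEX], and in practice one usually has a single sampling.)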
 

BGM

TS Contributor
#4
By the Law of Large Numbers,

\( \bar{S} = \frac{1}{N} \sum_{j=1}^N S_j \to E[S] \) as \( N \to \infty \)

Also

\( E[\bar{S}] = \frac{1}{N} \sum_{j=1}^N E[S_j] = E[S] < \sqrt{E[S^2]} = \sigma \)

so \( \bar{S} \) is a biased estimator of \( \sigma \).
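
A small sketch of this convergence (again with arbitrary numbers: normal population, \( \sigma = 2 \), \( n = 5 \)): as \( N \) grows, \( \bar{S} \) settles at \( E[S] \), which sits below \( \sigma \):

[CODE]
import numpy as np

rng = np.random.default_rng(2)
sigma, n = 2.0, 5

for N in (10, 1_000, 100_000):
    x = rng.normal(0.0, sigma, size=(N, n))
    s = np.sqrt(x.var(axis=1, ddof=1))
    print(N, s.mean())  # converges to E[S] ~ 1.88, not to sigma = 2
[/CODE]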
 

johnc

New Member
#5
BGM, thanks, I understand the mathematical proof now. Can you also comment on practical approaches to getting an unbiased estimator of the population standard deviation?
 

BGM

TS Contributor
#6
Actually, this bias is not a big concern because consistency is much more important.

As shown in the wiki article, you may correct the bias by multiplying by a constant, i.e. choosing \( c \) such that \( E[cS] = \sigma \). E.g., if the random sample is normal, this constant \( c \to 1 \) as \( n \to \infty \), and therefore \( S \) itself is asymptotically unbiased.

When the sample size is small, neither the biased nor the unbiased estimator is reliable.
When the sample size is large, if the biased estimator is consistent (asymptotically unbiased), then it makes essentially no difference relative to the unbiased estimator in terms of bias.
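
For the normal case, this constant is conventionally written through \( c_4(n) = \sqrt{\frac{2}{n-1}} \, \Gamma(n/2) / \Gamma((n-1)/2) \), with \( E[S] = c_4(n)\,\sigma \) (this is the formula in the Wikipedia article), so \( c = 1/c_4(n) \). A small sketch showing \( c \to 1 \):

[CODE]
from math import exp, lgamma, sqrt

def c4(n: int) -> float:
    """E[S] = c4(n) * sigma for a normal sample of size n."""
    return sqrt(2.0 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

for n in (2, 5, 10, 30, 100, 1000):
    print(n, 1 / c4(n))  # the correcting constant c; tends to 1 as n grows
[/CODE]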
 

Dragan

Super Moderator
#7
Also, if you assume a normal population, you can get an (approximately) unbiased estimate of the population standard deviation as:

\( E\left [ \left ( 1+\frac{1}{4(n-1)} \right ) s \right ]\approx \sigma \)
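
A quick numerical check (my own sketch; the n values are arbitrary) of how close this factor comes to the exact normal-theory constant \( 1/c_4(n) \) from the previous post:

[CODE]
from math import exp, lgamma, sqrt

def c4(n: int) -> float:
    """E[S] = c4(n) * sigma for a normal sample of size n."""
    return sqrt(2.0 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

for n in (5, 10, 30, 100):
    exact = 1 / c4(n)               # exact unbiasing factor
    approx = 1 + 1 / (4 * (n - 1))  # the approximation above
    print(n, round(exact, 5), round(approx, 5))
[/CODE]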
 

johnc

New Member
#8
BGM: I agree that practically, the bias might not be important.
Dragan: Thanks for the correction factor.

Now, just for peace of mind, I would like to know how the conclusion that [TEX] s=\sqrt{\frac{1}{n-1}\sum_{i=1}^n{(x_i-\overline{x})^2}}[/TEX] tends to be smaller than the true population standard deviation came about. Can someone explain, starting from this expression, how [TEX] s[/TEX] is considered a "mean of a square root" and thus how Jensen's inequality can be invoked? If expectation is to be used in the explanation, can you please start from this expression and show me how you get to the expectation?

Thank you!
 

Dason

Ambassador to the humans
#9
I thought my response earlier took care of that? Anyhow, I'll add in a few more details.

Let \(s^2 = \frac{1}{n-1}\sum_{i=1}^n(x_i - \bar{x})^2\)

Then we know that \(E[s^2] = \sigma^2\)

We take our estimate of the standard deviation to be the square root of the estimator of the variance:

\(s = \sqrt{\frac{1}{n-1}\sum_{i=1}^n(x_i - \bar{x})^2}\)

The square root is concave, so by Jensen's inequality, for any positive random variable \(X\) we have
\(E[\sqrt{X}] \leq \sqrt{E[X]}\)

If we take \(X = s^2\) then this says

The expected value of our sample estimate of the standard deviation satisfies
\(E[s] = E[\sqrt{s^2}] \leq \sqrt{E[s^2]} = \sqrt{\sigma^2} = \sigma\), where the inequality follows from Jensen's inequality.
 
johnc

New Member
#10
I think I understand now.

The crucial bit was realizing that in the math we are dealing with random variables which have distributions characterized by means and variances. Hence, [TEX] s^2[/TEX] and [TEX] s=\sqrt{s^2}[/TEX] each has its own distribution with its own mean and variance.

[TEX] s^2[/TEX] is from a distribution where the mean is [TEX] E[s^2]=\sigma^2[/TEX].
[TEX] s=\sqrt{s^2}[/TEX] is from a distribution where the mean is [TEX] E[\sqrt{s^2}]\leq\sqrt{E[s^2]}=\sigma[/TEX].

When we draw samples from a population and calculate the sample variance, the value we obtain is a realized variate from the [TEX] s^2[/TEX] distribution. When we take the square root of this number, we get a realized variate from the [TEX] s[/TEX] distribution. Since the mean of this distribution is less than or equal to the population std dev, the number we calculated tends to underestimate the pop std dev, and thus a correction is (theoretically) required.

Please tell me if my above reasoning is correct!
 

SJP

New Member
#11
Johnc is asking an excellent question here. The confusion arises not from any misunderstanding of Jensen's Inequality, which is straightforward. Confusion arises in applying the math to the problem.

The definition of s^2 looks like an expectation itself: it is an average over i. So at first glance, it appears that when we take its square root, we are taking "the square root of the average", which is greater than "the average of the square root"; since the latter is what we really want, sqrt(s^2) would seem to be an over-estimate, exactly the opposite conclusion!

The crucial point is this: In the response "E[sqrt(s^2)] <= sqrt(E[s^2])", what is the expectation E being taken over? It is not over i (the observations within one sample); it is over repeated samplings. This is what Johnc is getting at when he describes "multiple experiments". Of course, s^2 does obey a distribution, as he points out in his last post, a distribution whose mean is sigma^2. We would like to know sigma, the square root of that mean, but when we simply take sqrt(s^2), what we get, on average (that's the expectation!), is less.