Illustrating Variance of Sample Mean in Very Small Population

#1
I'm working through some stats background I never got in high school to make sure I fully understand some concepts at a fundamental level. While exploring the central limit theorem and trying to understand why Var(X¯) = σ^2/n, I hit some trouble. I'm working with concrete numbers in a super small population to see it in action, but when I calculate the variance of the sample mean in the way that's more intuitive to me, it does not give the same result as the sigma-squared-over-n calculation.

Let's say we have a very small population of {2, 4, 9}. We could imagine those are ages of three children. So first, the population stats:
μ = (2+4+9)/3 = 5
σ^2 =
((2-5)^2 + (4-5)^2 + (9-5)^2) / 3
= ((-3)^2 + (-1)^2 + 4^2) / 3
= (9 + 1 + 16) / 3
= 8.67
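
The population figures above can be double-checked with a minimal Python sketch. The variable names are just illustrative, and `pvariance` is the standard-library function that divides by N rather than N - 1:

```python
from statistics import fmean, pvariance

# The three ages from the example
population = [2, 4, 9]

mu = fmean(population)          # population mean
sigma2 = pvariance(population)  # population variance (divisor N, not N - 1)

print(mu, round(sigma2, 2))
```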

This part is clear. But now, let's say I take a sample of n=2 from that population. There are only 3 possible samples:
  • {2, 4}
  • {2, 9}
  • {4, 9}
If I'm looking for Var(X¯), I assume I would find the mean of each sample (which is X¯), then find the variance of those values. These are the means:
  • {2, 4}; mean = 3
  • {2, 9}; mean = 5.5
  • {4, 9}; mean = 6.5
That is, X¯ = {3, 5.5, 6.5}. And the average among those, or the E(X¯), is 5, which I understand is the same as μ.

This is where things stop making sense. If I apply my "normal" approach to variance among these sample means, I get this:

Var(X¯) =
((3-5)^2 + (5.5-5)^2 + (6.5-5)^2) / 3
= ((-2)^2 + (0.5)^2 + (1.5)^2) / 3
= ( 4 + 0.25 + 2.25) / 3
= 2.17
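
As a sanity check on the arithmetic above, here is a small Python sketch that enumerates the three samples and takes the variance of their means. It assumes each sample is an unordered pair with no child picked twice, which is what `itertools.combinations` produces; the variable names are illustrative:

```python
from itertools import combinations
from statistics import fmean, pvariance

population = [2, 4, 9]
n = 2

# Every unordered sample of size n with no repeated element
samples = list(combinations(population, n))

# Mean of each sample, then the variance across those means (divisor = number of samples)
sample_means = [fmean(s) for s in samples]
var_of_means = pvariance(sample_means)

print(samples)
print(sample_means, round(var_of_means, 2))
```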

BUT if I use the formula σ^2/n, the result is 8.67/2 = 4.33. That's twice the result I got when applying my "normal" approach to variance. What is wrong in my math?

Any guidance would be VERY appreciated!
 

Dason

Ambassador to the humans
#2
The σ^2/n rule assumes the observations in your sample are independent draws of a random variable. For a finite population, this means it assumes you're sampling *with replacement*. If you go through your exercise again but allow the same value to be drawn more than once when creating your samples, you'll get 9 possible (ordered) samples, and things should work when you use this sample space. When the population is large enough, it doesn't really matter if you sample without replacement, but the typical rule of thumb is that if you're sampling more than 5% of the population without replacement, you need to use a finite population correction factor.
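
Here is a rough Python sketch of both points, using the {2, 4, 9} population from the first post. It enumerates the 9 ordered samples you get with replacement, and then applies the usual finite population correction, (N - n)/(N - 1), to σ^2/n for the without-replacement case. The variable names are just for illustration:

```python
from itertools import product
from statistics import fmean, pvariance

population = [2, 4, 9]
N, n = len(population), 2
sigma2 = pvariance(population)  # population variance, divisor N

# Sampling WITH replacement: all ordered pairs, including (2, 2), (4, 4), (9, 9)
means_with_repl = [fmean(s) for s in product(population, repeat=n)]
var_with_repl = pvariance(means_with_repl)

# Sampling WITHOUT replacement: sigma^2 / n scaled by the finite population correction
fpc = (N - n) / (N - 1)
var_without_repl = sigma2 / n * fpc

print(len(means_with_repl))        # number of equally likely samples
print(round(var_with_repl, 2))     # compare with sigma^2 / n
print(round(var_without_repl, 2))  # compare with the 3-sample calculation
```

With N = 3 and n = 2 the correction factor is 1/2, which accounts for the factor of two between the two results in the question.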