Biased and Unbiased Variance

#1
Hello everyone,

I did a binomial experiment: flip 10 coins and count the total number of heads. This was repeated 200 times. However, at the end of the 200 tries, I noticed that the biased sample variance is actually closer to the population variance. Why is this so? Isn't the unbiased variance supposed to be closer to the population variance? The population variance is 2.5, while the biased variance is 2.7 and the unbiased variance is 2.77.

Also, if instead of 10 coins, we flip 5 coins each time, what will be the difference theoretically?
 

Dason

Ambassador to the humans
#2
Keep in mind that you're dealing with a random variable, so there is variation from one run of the experiment to the next. I wrote some code to simulate this process:

Code:
experiment <- function(n, k = 10){
  # Flip a coin k times and count the number of heads, do that n times
  vals <- rbinom(n, k, 0.5)
  # var() divides by n - 1, so this gives the unbiased estimate of variance
  sampvar <- var(vals)
  # The sample size is n (the number of repetitions), not k, so we
  # need to multiply by (n-1)/n to get the biased estimate
  popvar <- sampvar * (n-1)/n
  # Return the results
  result <- c(unbiased = sampvar, biased = popvar)
  return(result)
}

# Do the experiment where you flip a coin 10 times and repeat that 200 times...
# But do that whole process 1000 times and record the results
test <- replicate(1000, experiment(200, 10))

# > summary(test["unbiased",])
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
# 1.789   2.329   2.494   2.500   2.660   3.552 
# Every biased value is exactly 199/200 = 0.995 times its unbiased value,
# so summary(test["biased",]) is the summary above scaled by 0.995
# (its mean is about 2.49).
The results are at the bottom. So on average if you do this experiment the unbiased result will be 2.5 and the biased result will be about 2.49 (2.5 * 199/200). With 200 repetitions the two estimates differ by only 0.5%, which is tiny compared with the run-to-run spread you can see in the summary (roughly 1.8 to 3.6).
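On the question about flipping 5 coins instead of 10: the number of heads in k fair flips is binomial, so its variance is k * p * (1 - p). Here is a minimal sketch of that comparison, with a rough simulation check. The helper name binom_var, the seed, and the 1000 replications are just my choices for illustration.

```r
# Theoretical variance of the number of heads in k flips: k * p * (1 - p)
binom_var <- function(k, p = 0.5) k * p * (1 - p)

theory <- c(k10 = binom_var(10), k5 = binom_var(5))
print(theory)   # 2.5 for 10 coins, 1.25 for 5 coins

# Rough check by simulation: average the unbiased sample variance (var())
# over many experiments of 200 repetitions each
set.seed(42)  # arbitrary seed, only so the run is repeatable
sim <- c(
  k10 = mean(replicate(1000, var(rbinom(200, 10, 0.5)))),
  k5  = mean(replicate(1000, var(rbinom(200, 5, 0.5))))
)
print(round(sim, 2))
```

So with 5 coins the population variance halves to 1.25, and the averaged sample variances should land close to the theoretical values in both cases.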
 
#3
We did this experiment manually; we really did flip the coins 200 times. We are required to explain our results. Is there any reason why the biased variance may be more accurate?
 

Dason

Ambassador to the humans
#4
Because it just so happened that this time it ended up being closer. On average the biased estimator underestimates the true population variance, but that doesn't mean it will always be worse than the unbiased version. It's like flipping a fair coin 4 times and getting 4 heads, then flipping a biased coin (with a 75% chance of heads) 4 times and getting 2 heads and 2 tails. That doesn't mean the biased coin is typically a better estimator of the fair coin's true probability of heads; it just happened to be better this time. In practice we don't know the quantities we are trying to estimate, so when choosing estimators we tend to choose ones that are right on average (which the unbiased estimator of the variance is).
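If you want to see how often this happens, here is a rough sketch that repeats the 200-repetition experiment many times and counts how often the biased estimate lands closer to the true variance of 2.5. The variable names, the seed, and the 2000 replications are my own assumptions, not part of your experiment.

```r
set.seed(1)  # arbitrary seed, only so the run is repeatable

n <- 200                    # repetitions in one experiment
k <- 10                     # coins flipped per repetition
true_var <- k * 0.5 * 0.5   # binomial variance k * p * (1 - p) = 2.5

# For each simulated experiment, record whether the biased estimate
# is closer to the true variance than the unbiased one
closer_biased <- replicate(2000, {
  vals <- rbinom(n, k, 0.5)
  unbiased <- var(vals)               # divides by n - 1
  biased <- unbiased * (n - 1) / n    # divides by n
  abs(biased - true_var) < abs(unbiased - true_var)
})

prop_biased_closer <- mean(closer_biased)
print(prop_biased_closer)
```

The biased estimate is always slightly smaller than the unbiased one, so it wins whenever the unbiased estimate overshoots 2.5 by more than a hair. I would expect that to happen in a sizeable share of runs, not far from half, so a single experiment where the biased version is closer is nothing unusual.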