Central limit theorem: effect of number of samples vs. sample size

lken

New Member
#1
Hey guys,

For an assignment I've been asked to investigate the CLT when sampling from different distributions using different sample sizes. To do this in R, I've created histograms for the binomial, Poisson, and uniform distributions, each with 10, 100, and 1000 samples of sizes 10, 20, 50, and 100 (with long-winded code that I am also going to try to fix up at a later date).

As I understand, as sample size increases, the distribution of means approximates a normal distribution. I can see this clearly in my histograms when I increase the number of samples (or the number of replicates of 10, 20, 50, and 100 that I take), but not when I increase the sample sizes. Increasing the sample sizes from 10 to 50, for example, does not seem to bring in the tails of the distribution or approximate the normal distribution any further. My guess was that increasing both sample size and replicates helped increase the tendency towards meeting the CLT. Am I missing something?

Thanks,
lken
 

Dragan

Super Moderator
#2
Well, no, you're not really missing anything.

However, understand that how quickly the sample means converge to the normal approximation depends on the underlying distribution being sampled from, for any particular sample size.
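
To make that concrete, here is a rough sketch in R (not your assignment code, just an illustration). It uses the skewness of the sample means as a crude measure of how far they are from normal, since a normal distribution has skewness 0. The distribution parameters (binomial with size 10 and prob 0.1, Poisson with lambda 0.5), the 10000 replicates, the seed, and the helper names skewness() and sample_means() are all my own assumptions, so swap in whatever your assignment uses.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Sample skewness (third standardized moment); 0 for a symmetric distribution.
skewness <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}

# Draw `reps` samples of size `n` with generator `rdist` and return their means.
sample_means <- function(rdist, n, reps) {
  replicate(reps, mean(rdist(n)))
}

# Assumed parameters, not taken from the thread.
dists <- list(
  uniform  = function(n) runif(n),
  binomial = function(n) rbinom(n, size = 10, prob = 0.1),
  poisson  = function(n) rpois(n, lambda = 0.5)
)
sizes <- c(10, 20, 50, 100)  # sample sizes from the original question
reps  <- 10000               # number of replicates (assumed)

# One column per distribution, one row per sample size.
skew_table <- sapply(dists, function(rdist) {
  sapply(sizes, function(n) skewness(sample_means(rdist, n, reps)))
})
rownames(skew_table) <- paste0("n=", sizes)
print(round(skew_table, 2))
```

Reading down a column, the skewness of the means should shrink roughly like 1/sqrt(n) for the two skewed distributions and stay near 0 for the uniform. Changing reps should only make the estimates (and the histograms) less noisy; it does not change the shape of the distribution the means come from.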

Keep in mind that if I draw 2 million pseudo-random deviates from a Student t-distribution with 25 degrees of freedom, the histogram will look like a normal distribution, but the most commonly used tests for normality (e.g., Kolmogorov-Smirnov, Shapiro-Wilk, Anderson-Darling) will reject the null hypothesis of normality at the 0.05 level.
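
If you want to try the t-distribution example yourself, something along these lines should work. It is only a sketch under my own assumptions: I use the Kolmogorov-Smirnov test because shapiro.test() in R only accepts up to 5000 observations, I test against a normal with the same mean (0) and standard deviation as the t-distribution rather than estimating them from the data, and the seed and the comparison subsample of 200 are arbitrary choices.

```r
set.seed(1)  # arbitrary seed for reproducibility

df    <- 25
n_big <- 2e6                   # 2 million draws, as in the example above
sd_t  <- sqrt(df / (df - 2))   # true standard deviation of a t variable with 25 df

x <- rt(n_big, df = df)

# Null hypothesis: x comes from a normal with mean 0 and the same sd as t(25).
big <- ks.test(x, "pnorm", mean = 0, sd = sd_t)

# The same test on only the first 200 draws has far less power (assumed subsample size).
small <- ks.test(x[1:200], "pnorm", mean = 0, sd = sd_t)

cat("n =", n_big, " D =", signif(big$statistic, 3),
    " p =", signif(big$p.value, 3), "\n")
cat("n = 200  D =", signif(small$statistic, 3),
    " p =", signif(small$p.value, 3), "\n")

# The histogram of all draws still looks bell-shaped to the eye.
hist(x, breaks = 200, main = "2 million draws from t(25)")
```

The point is that with a sample this large the test has enough power to detect the slightly heavier tails of the t-distribution, even though no histogram would show the difference.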
 

lken

New Member
#3
So if the underlying distribution is more normal (e.g. a uniform distribution), the means will approximate a normal distribution faster than if I sample from a Poisson distribution. So taking replicates of a small sample size from a Poisson distribution may not approximate normal, and increasing that sample size to 50 or 100 may still not be large enough to approximate normal until you replicate it a large number of times. I think it's becoming a bit less vague. Thanks for your help!