I was always impressed that the central limit theorem applies even if the population is not normally distributed, and then surprised that to use it I have to test whether it is.
But remember that the central limit theorem is an asymptotic theorem, which means its properties are only guaranteed to hold as n goes to infinity. So yeah, in theory, if you have an infinite sample (or a very large one) the properties of the theorem kick in and you can use the normal distribution as a reference to obtain p-values. But, for practical purposes, it is hard to tell how big n has to be before these asymptotic properties kick in and, in general, that changes from case to case.
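For the sample mean of i.i.d. data with mean $\mu$ and finite variance $\sigma^2$, the standard statement is

$$\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty,$$

and the theorem is silent about how good the normal approximation is at any finite n.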
For instance, take your case of the one-sample z-test and run a small simulation with a small sample size. For a population coming from a standard normal distribution, where the null hypothesis is true, you see something like:
Code:
library(BSDA)  # provides z.test()

pval <- double(10000)
for (i in 1:10000) {
  a <- rnorm(20, mean = 0, sd = 1)   # n = 20 draws from N(0, 1)
  pval[i] <- z.test(a, mu = 0, sigma.x = 1)$p.value
}
sum(pval < .05) / 10000              # empirical Type I error rate
[1] 0.0502
So the empirical Type I error rate is off from the nominal 5% by just .0002, which is very small. So we're good here.
Now try the same scenario, but switch the samples from a normal distribution to a chi-square distribution with 1 degree of freedom (so, very skewed). A chi-square distribution with 1 degree of freedom has mean 1 and variance 2, which is why the code below uses mu=1 and sigma.x=sqrt(2):
Code:
pval <- double(10000)
for (i in 1:10000) {
  a <- rchisq(20, df = 1)            # n = 20 draws from a skewed population
  pval[i] <- z.test(a, mu = 1, sigma.x = sqrt(2))$p.value
}
sum(pval < .05) / 10000              # empirical Type I error rate
[1] 0.0444
Uhm... what do we see here? The Type I error rate should be 5%, but it is now 4.44%. I mean, it's not horrible, but it is still *not* 5%. However, if we bump n from 20 to, say, 100, see what happens to both tests:
Code:
pval <- double(10000)
for (i in 1:10000) {
  a <- rnorm(100, mean = 0, sd = 1)  # same normal setup, now n = 100
  pval[i] <- z.test(a, mu = 0, sigma.x = 1)$p.value
}
sum(pval < .05) / 10000
[1] 0.0498
Code:
pval <- double(10000)
for (i in 1:10000) {
  a <- rchisq(100, df = 1)           # skewed population, now n = 100
  pval[i] <- z.test(a, mu = 1, sigma.x = sqrt(2))$p.value
}
sum(pval < .05) / 10000
[1] 0.0501
There is obviously going to be some variability in these empirical rejection rates (they are simulated, after all). But you can at least see that by the time you reach a sample of n=100, the properties of the central limit theorem (for the case of the z-test) have kicked in and we can ignore the shape of the distribution the data come from. More complicated tests or statistical methods will require larger and larger samples before the central limit theorem kicks in.
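Just to quantify that variability: each empirical rate above is a proportion based on 10,000 independent replications, so we can compute its Monte Carlo standard error directly (a quick back-of-the-envelope check of my own, assuming the true rejection rate equals the nominal 5%):

Code:
# Monte Carlo standard error of an estimated rejection rate from
# B = 10000 replications, assuming the true rate is the nominal 5%
B <- 10000
p <- 0.05
mc_se <- sqrt(p * (1 - p) / B)
mc_se
# [1] 0.002179449

# approximate 95% band around the nominal rate
p + c(-1, 1) * 1.96 * mc_se
# [1] 0.04572828 0.05427172

The n=100 results (0.0498 and 0.0501) sit comfortably inside that band, while the 0.0444 from the skewed case at n=20 falls outside it, so that discrepancy is not just simulation noise.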
But many books/sites then also say that, in order to perform a one-sample z-test, one assumption is that the population is normally distributed (for example here).
Yeah, and a lot of textbooks aimed at methodology courses (particularly in the social sciences, which is the area I come from) are notorious for relying on procedures that perhaps made sense back in the 1970s, or they just prefer a cookbook approach to statistical analysis without engaging in any critical thinking. It shouldn't come as a surprise, then, that after years of questionable statistical practice, psychology finds itself in the midst of its own replicability crisis.