I often see that for parametric tests one assumption is that the population should be normally distributed. However, if I understand correctly, the central limit theorem shows that the distribution of the sample means (the sampling distribution) is normal, irrespective of the distribution of the variable itself. If this is correct, why should I then still test for normality?
Thanks in advance for your reply.
Thanks for the quick reply Dragan, but I'm not really sure what you meant, or perhaps you misunderstood my question.
I understand parametric tests are preferred over non-parametric tests (such as the ones that use ranking), but perhaps to make my question a bit clearer, let's simply use a one-sample z-test as an example. I understand the sample size has to be reasonably large (often said n > 30) in order for the central limit theorem to apply. But many books/sites then also say that in order to perform a one-sample z-test, one assumption is that the population is normally distributed (for example here), and therefore one might want to test for this (for example with a Shapiro-Wilk test). What I don't understand is where this assumption comes from?
Picture a skewed random variable and its calculated standard error. Now suppose you want to conduct a two-sided hypothesis test: how trustworthy are the symmetric 95% confidence intervals (in covering the true value upon repeated sampling from the population)? Could one come to false conclusions based on using the constructed symmetric confidence intervals? This is just a generic version of how non-normality can cause issues.
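A quick simulation along these lines (my own sketch, not from the post above) makes the concern concrete: draw small samples from a skewed population with known sd and see how the misses of a symmetric 95% z-interval split between the two tails.

```r
# Skewed population: Exp(1) has mean 1, sd 1, strong right skew.
# With a known sd, the symmetric z-interval's misses should split
# 2.5% / 2.5% between tails if everything behaved normally.
set.seed(1)
n <- 10; reps <- 100000
true_mean <- 1
miss_low <- miss_high <- 0
for (i in 1:reps) {
  xbar <- mean(rexp(n, rate = 1))
  half <- 1.96 * 1 / sqrt(n)     # half-width, using the known sd = 1
  if (true_mean < xbar - half) miss_high <- miss_high + 1  # interval lies above the truth
  if (true_mean > xbar + half) miss_low  <- miss_low  + 1  # interval lies below the truth
}
c(miss_low, miss_high) / reps    # the two tail error rates are far from equal
```

The total coverage can still come out near 95%, but the misses pile up in one tail, which is exactly why a one-sided conclusion from a symmetric interval can mislead with skewed data at small n.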
Thanks hlsmith. Indeed, I forgot that the standard deviation of the sampling distribution (the standard error) is often estimated using the sample standard deviation. However, in the case of a true z-test, where you somehow actually know the population standard deviation, would a test for normality still be needed? I was always impressed that the central limit theorem applies even if the population is not normally distributed, and then surprised that to use it I have to test whether it is.
But remember that the central limit theorem is an asymptotic theorem, which means its properties are only guaranteed to hold as n goes to infinity. So yeah, in theory, if you have an infinite sample (or a very large one) the properties of the theorem will kick in and you can use the normal distribution as a reference to obtain p-values. But, for practical purposes, it is hard to tell how big n has to be before these asymptotic properties kick in, and, in general, this changes from case to case.
For instance, take your case of the one-sample z-test and run a small simulation with a small sample size:
For a population coming from a standard normal distribution where the null hypothesis is true, you see something like:
Code:
library(BSDA)
pval <- double(10000)
for (i in 1:10000){
  a <- rnorm(20, mean = 0, sd = 1)
  pval[i] <- z.test(a, mu = 0, sigma.x = 1)$p.val
}
sum(pval < .05)/10000
[1] 0.0502

So the nominal Type 1 error rate of 5% is just off by .0002, which is very small. So we're good here.
Now try the same scenario, but switching our samples from a normal distribution to a chi-square distribution with 1 degree of freedom (so very skewed):
Code:
pval <- double(10000)
for (i in 1:10000){
  a <- rchisq(20, df = 1)
  pval[i] <- z.test(a, mu = 1, sigma.x = sqrt(2))$p.val
}
sum(pval < .05)/10000
[1] 0.0444

Uhm... what do we see here? Where the Type 1 error rate should be 5%, it is now 4.44%. I mean, it's not horrible, but it is still *not* 5%. However, if we bump n from 20 to, say, 100, see what happens to both tests:
Code:
pval <- double(10000)
for (i in 1:10000){
  a <- rnorm(100, mean = 0, sd = 1)
  pval[i] <- z.test(a, mu = 0, sigma.x = 1)$p.val
}
sum(pval < .05)/10000
[1] 0.0498
Code:
pval <- double(10000)
for (i in 1:10000){
  a <- rchisq(100, df = 1)
  pval[i] <- z.test(a, mu = 1, sigma.x = sqrt(2))$p.val
}
sum(pval < .05)/10000
[1] 0.0501

I mean, there is obviously going to be some variability in the empirical p-values (they're simulated, of course). But you can at least see that by the time you reach a sample of n = 100, the properties of the central limit theorem (for the case of the z-test) have kicked in and we can ignore the distributional shape the data comes from. More complicated tests or statistical methods will require larger and larger samples before the central limit theorem kicks in.
Yeah, and a lot of textbooks aimed at methodology courses (particularly in the social sciences, which is the area I come from) are notorious for relying on procedures that perhaps made sense back in the 1970s, or just prefer a cookbook approach to statistical analysis without engaging in any critical thinking. It shouldn't come as a surprise, then, that after years of questionable statistical practice, psychology is finding itself in the midst of its own crisis of replicability.
for all your psychometric needs! https://psychometroscar.wordpress.com/about/
Thanks spunky for the elaboration. I think I'm getting there.
One small thing: with the normal distribution you mention it's off by 'only .0002', but for the chi-square you mention .0444; I guess you meant it is off by .0056, still more than the .0002 but not by as much. Anyway, I get what you're saying, and thanks for that simulation.
So if I understand correctly, in essence, if the sample size were large enough there would indeed be no need to test for normality, but since 'large enough' is a vague limit, it's better to simply test for it. I also came across this site but will have to read it more carefully.
Wow, touched a nerve? Thanks for that article link, will definitely go through it.
Yeah, something like that. I mean, I really don't think testing for normality is that big a deal. The normal distribution is more of a mathematical framework to work with. Like the people on Cross Validated said, if you rely exclusively on tests of normality, you'll find out very quickly that pretty much nothing is normally distributed. But we know that already, because we're working with real data from the real world. The interesting question is more along the lines of how much you can violate an assumption and still get away with reasonable conclusions. I feel that, for most practical purposes, it is useful to think of the assumptions as a frame of reference and then engage in some critical thinking (like with simulations) to see whether or not you can do or test whatever it is you're doing or testing.
It is kind of a big deal right now, but I like it, because it shines the spotlight on those of us who do statistics in the social sciences... and I'm all about that spotlight, because money doesn't grow on trees and I need a paycheque.
I think Spunky has a good point. There's a ton of trash published and bad practices perpetuated, even by highly regarded researchers and journals. The problem is everywhere: psychology, medicine, public health, marketing, epidemiology, sociology, nursing...the list goes on. Part of the problem is that people who don't know what they're doing can publish papers or textbooks and the rest of the field doesn't know better or assumes that published means correct. People treat statistics as a set of calculations, too, just as they often think of mathematics as crunching numbers.
One of the most common problems I see with normality testing is that people don't know when it's appropriate and they often misinterpret the results. For example, they incorrectly conclude the data come from a normal distribution because the test was not significant.
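To illustrate that misinterpretation with a sketch of my own (not from the post above): the same clearly non-normal population "passes" a Shapiro-Wilk test most of the time at a small n, and is overwhelmingly rejected at a large n. A non-significant result is absence of evidence, not evidence of normality.

```r
# Exp(1) data are clearly non-normal (strongly right-skewed).
set.seed(1)

# At n = 10 the test often lacks the power to detect the skew,
# so a sizeable share of samples come out non-significant.
pass_rate <- mean(replicate(2000, shapiro.test(rexp(10))$p.value > .05))
pass_rate

# Same population at n = 5000: the skew is now easily detected.
shapiro.test(rexp(5000))$p.value
```

So the test answers "did I collect enough evidence against normality?", not "are the data normal?", and at large n it will flag even trivial deviations that are harmless for the analysis.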