
Thread: Why normality assumption needed?

  1. #1

    Why normality assumption needed?




    I often see that one assumption of parametric tests is that the population should be normally distributed. However, if I understand correctly, the central limit theorem shows that the distribution of the sample means (the sampling distribution) has a normal distribution, irrespective of the distribution of the variable itself. If this is correct, why should I still test for normality?

    Thanks in advance for your reply.

  2. #2
    Dragan's Avatar

    Re: Why normality assumption needed?

    Because of small sample size(s). For example, rank-based tests in the context of ANOVA can be much more powerful than the parametric ANOVA F-test, especially when sample size(s) are small. That said, it is as much a concern for power as it is for Type I error.

  3. #3

    Re: Why normality assumption needed?

    Thanks for the quick reply, Dragan, but I'm not really sure what you meant, or perhaps you misunderstood my question.

    I understand parametric tests are preferred over non-parametric tests (such as the ones that use ranking), but to make my question a bit clearer, let's simply use a one-sample z-test as an example. I understand the sample size has to be reasonably large (often said to be n > 30) for the central limit theorem to apply. But many books/sites then also say that one assumption of a one-sample z-test is that the population is normally distributed (for example here), and therefore one might want to test for this (for example with a Shapiro-Wilk test). What I don't understand is where this assumption comes from.

  4. #4
    hlsmith's Avatar

    Re: Why normality assumption needed?

    Picture a skewed random variable and its calculated standard error. Now suppose you want to conduct a two-sided hypothesis test: how trustworthy would 95% confidence intervals be (in terms of including the true parameter value upon repeated sampling from the population)? Could one come to false conclusions by using the constructed symmetrical confidence intervals? This is just a generic illustration of how non-normality can cause problems.
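
    A minimal sketch of that coverage question in base R. The specifics are assumptions for illustration and not from the thread: an exponential population with rate 1 stands in for "a skewed random variable", the sample size is 15, and the interval is the usual symmetric t-interval returned by t.test().

```r
# Sketch: how often does the symmetric 95% t-interval cover the true mean?
# Assumptions (illustrative only): n = 15, exponential population for the skewed case.
set.seed(1)
reps <- 10000
n <- 15

# TRUE if the 95% interval from t.test() contains the true mean mu
covers <- function(x, mu) {
  ci <- t.test(x)$conf.int   # mean +/- t * s / sqrt(n)
  ci[1] <= mu && mu <= ci[2]
}

# Normal population with true mean 0
cover_normal <- mean(replicate(reps, covers(rnorm(n), mu = 0)))

# Skewed population: exponential with rate 1, true mean 1
cover_skewed <- mean(replicate(reps, covers(rexp(n), mu = 1)))

c(normal = cover_normal, skewed = cover_skewed)
```

    Under these assumptions the normal case should land close to the nominal 95%, while the skewed case tends to fall noticeably short of it, which is the kind of false confidence the question above is pointing at.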

  5. #5

    Re: Why normality assumption needed?

    Thanks hlsmith. Indeed, I forgot that the standard deviation of the sampling distribution (the standard error) is often estimated using the sample standard deviation. However, in the case of a true z-test, where you actually somehow know the population standard deviation, would a test for normality still be needed? I was always impressed that the central limit theorem applies even if the population is not normally distributed, and then surprised that to use it I have to test whether it is.

  6. #6
    spunky's Avatar

    Re: Why normality assumption needed?

    Quote Originally Posted by blubblub View Post
    I was always impressed by the central limit theory to apply even if the population is not normally distributed, and then surprised that to use it I have to test if it is.
    But remember that the central limit theorem is an asymptotic theorem, which means its properties are only supposed to hold as n goes to infinity. So yeah, in theory, if you have an infinite sample (or a very large one) the properties of the theorem will kick in and you can use the normal distribution as a reference to obtain p-values. But, for practical purposes, it is hard to tell how big n has to be before these asymptotic properties kick in and, in general, this changes from case to case.

    For instance, take your case of the one-sample z-test and run a small simulation with a small sample size:

    For a population coming from a standard normal distribution where the null hypothesis is true, you can see something like:

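    Code: 
    library(BSDA)  # provides z.test()
    
    # p-values from 10,000 simulated one-sample z-tests
    pval <- double(10000)
    
    for (i in 1:10000) {
      # n = 20 draws from a standard normal, so H0 (mu = 0) is true
      a <- rnorm(20, mean = 0, sd = 1)
      pval[i] <- z.test(a, mu = 0, sigma.x = 1)$p.value
    }
    
    # empirical Type I error rate at alpha = .05
    mean(pval < .05)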
    
    [1] 0.0502
    So the nominal Type 1 error rate of 5% is just off by .0002, which is very small. So we're good here.

    Try the same scenario but we're switching our samples from a normal distribution to a chi-square distribution with 1 degree of freedom (so very skewed):

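    Code: 
    pval <- double(10000)
    
    for (i in 1:10000) {
      # n = 20 draws from a chi-square with 1 df (mean 1, sd sqrt(2)), so H0 (mu = 1) is true
      a <- rchisq(20, df = 1)
      pval[i] <- z.test(a, mu = 1, sigma.x = sqrt(2))$p.value
    }
    
    # empirical Type I error rate at alpha = .05
    mean(pval < .05)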
    
    [1] 0.0444
    Uhm... what do we see here? When the Type 1 error rate should be 5%, it is now 4.44%. I mean, it's not horrible, but it is still *not* 5%. However, if we bump n from 20 to, say, 100, see what happens to both tests:

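    Code: 
    pval <- double(10000)
    
    for (i in 1:10000) {
      # same normal scenario as before, now with n = 100
      a <- rnorm(100, mean = 0, sd = 1)
      pval[i] <- z.test(a, mu = 0, sigma.x = 1)$p.value
    }
    
    # empirical Type I error rate at alpha = .05
    mean(pval < .05)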
    [1] 0.0498
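    Code: 
    pval <- double(10000)
    
    for (i in 1:10000) {
      # same chi-square(1) scenario as before, now with n = 100
      a <- rchisq(100, df = 1)
      pval[i] <- z.test(a, mu = 1, sigma.x = sqrt(2))$p.value
    }
    
    # empirical Type I error rate at alpha = .05
    mean(pval < .05)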
    [1] 0.0501
    I mean, there is obviously going to be some variability in the empirical rejection rates (they're simulated, of course). But you can at least see that by the time you reach a sample of n=100, the properties of the central limit theorem (for the case of the z-test) have kicked in and we can ignore the shape of the distribution the data come from. More complicated tests or statistical methods will generally require larger samples before the central limit theorem kicks in.
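
    To put a number on that simulation variability, here is a rough sketch that repeats the chi-square scenario over several sample sizes and reports the Monte Carlo standard error of a simulated proportion near .05. The helper z_pvalue() is a hypothetical base-R stand-in for BSDA's z.test() (so the sketch runs without the package), and the grid of sample sizes and the seed are arbitrary choices, not anything from the runs above.

```r
# Hypothetical base-R stand-in for BSDA::z.test():
# two-sided one-sample z-test p-value with known population sd.
z_pvalue <- function(x, mu, sigma) {
  z <- (mean(x) - mu) / (sigma / sqrt(length(x)))
  2 * pnorm(-abs(z))
}

set.seed(1)
reps <- 10000
sizes <- c(10, 20, 50, 100, 500)   # arbitrary grid of sample sizes

# Empirical Type I error rate for chi-square(1) data (mean 1, sd sqrt(2))
reject_rate <- sapply(sizes, function(n) {
  pval <- replicate(reps, z_pvalue(rchisq(n, df = 1), mu = 1, sigma = sqrt(2)))
  mean(pval < 0.05)
})

# Monte Carlo standard error of a simulated proportion when the true rate is 0.05
mc_se <- sqrt(0.05 * 0.95 / reps)

data.frame(n = sizes, reject_rate = reject_rate, mc_se = mc_se)
```

    With 10,000 replications the standard error is about .0022, so a gap of .0002 is well inside the noise, while a gap of .0056 is roughly 2.6 standard errors. Reading each rejection rate against that yardstick is more informative than comparing it to .05 digit by digit.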

    Quote Originally Posted by blubblub View Post
    But many books/sites then also say in order to perform a one-sample z-test one assumption is that the population is normally distributed (for example here)
    Yeah, and a lot of textbooks aimed at methodology courses (particularly in the social sciences, which is the area I come from) are notorious for relying on procedures that perhaps made sense back in the 1970s, or just prefer a cookbook approach to statistical analysis without engaging in any critical thinking. It shouldn't come as a surprise then that, after years of questionable statistical practice, psychology is finding itself in the midst of its own crisis of replicability.
    for all your psychometric needs! https://psychometroscar.wordpress.com/about/

  7. #7

    Re: Why normality assumption needed?

    Thanks spunky for the elaboration. I think I'm getting there.

    Small thing is:
    Quote Originally Posted by spunky View Post
    ....
    For a population coming from a standard normal distribution where the null hypothesis is true, you can see something like:
    ...
    So the nominal Type 1 error rate of 5% is just off by .0002, which is very small. So we're good here.

    Try the same scenario but we're switching our samples from a normal distribution to a chi-square distribution with 1 degree of freedom (so very skewed):

    [1] 0.0444

    Uhm... what do we see here? When the Type 1 error rate should be 5%, it is now 4.44%. I mean, it's not horrible, but it is still *not* 5%. ...

    ......
    For the normal distribution you mention a difference of 'only .0002', but for the chi-square you quote the rate .0444 itself. I guess you meant a difference of .0056, which is still more than .0002 but not as much. Anyway, I get what you're saying, and thanks for that simulation.

    So if I understand correctly, in essence, if a sample size were large enough there would indeed be no need to test for normality, but since 'large enough' is a vague limit, it's better to simply test for it. I also came across this site but will have to read it more carefully.

    Quote Originally Posted by spunky View Post
    Yeah, and a lot of textbooks aimed at methodology courses (particularly in the social sciences, which is the area I come from) are notorious for relying on procedures that perhaps made sense back in the 1970s, or just prefer a cookbook approach to statistical analysis without engaging in any critical thinking. It shouldn't come as a surprise then that, after years of questionable statistical practice, psychology is finding itself in the midst of its own crisis of replicability.
    Wow, touched a nerve? Thanks for that article link, will definitely go through it.

  8. #8
    spunky's Avatar

    Re: Why normality assumption needed?

    Quote Originally Posted by blubblub View Post
    Thanks spunky for the elaboration. I think I'm getting there

    Small thing is:


    For the normal distribution you mention a difference of 'only .0002', but for the chi-square you quote the rate .0444 itself. I guess you meant a difference of .0056, which is still more than .0002 but not as much. Anyway, I get what you're saying, and thanks for that simulation.

    So if I understand correctly, in essence, if a sample size were large enough there would indeed be no need to test for normality, but since 'large enough' is a vague limit, it's better to simply test for it. I also came across this site but will have to read it more carefully.
    Yeah, something like that. I mean, I really don't think testing for normality is that big of a deal. The normal distribution is more of a mathematical framework to work with. Like the people on Cross Validated said, if you rely exclusively on tests of normality, you'll find out very quickly that pretty much nothing is normally distributed. But we know that already because we're working with real data from the real world. The more interesting question is how much you can violate an assumption and still get away with reasonable conclusions. I feel that, for practical purposes, it is useful to think about the assumptions as a frame of reference and then engage in some critical thinking (like with simulations) to see whether or not you can do or test whatever it is you're doing or testing.
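
    One way to see the "nothing is normally distributed" point with a simulation, as a rough sketch under assumptions of my own choosing: the data come from a t distribution with 5 degrees of freedom (symmetric, heavier tails than the normal), n is 5000 (the largest sample shapiro.test() accepts), and the test of interest is a one-sample t-test of the true mean.

```r
# Sketch: at large n a normality test flags the departure almost every time,
# while the test on the mean can still hold its nominal Type I error rate.
# Assumptions (illustrative only): t(5) data, n = 5000, 1000 replications.
set.seed(1)
reps <- 1000
n <- 5000   # shapiro.test() accepts between 3 and 5000 observations

sim <- replicate(reps, {
  x <- rt(n, df = 5)   # true mean is 0, so H0: mu = 0 holds
  c(shapiro = shapiro.test(x)$p.value,
    ttest   = t.test(x, mu = 0)$p.value)
})

reject_normality <- mean(sim["shapiro", ] < 0.05)   # share of runs flagged non-normal
type1_mean_test  <- mean(sim["ttest", ] < 0.05)     # Type I error rate of the mean test

c(reject_normality = reject_normality, type1_mean_test = type1_mean_test)
```

    In this setup the normality test should reject in essentially every run even though the test on the mean behaves about as advertised, so a significant normality test by itself says little about whether the analysis you care about is in trouble.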



    Quote Originally Posted by blubblub View Post
    Wow, touched a nerve? Thanks for that article link, will definitely go through it.
    It is kind of a big deal right now, but I like it because it shines the spotlight on those of us who do statistics in the social sciences... and I'm all about that spotlight because money doesn't grow on trees and I need a paycheque.

  9. #9

    Re: Why normality assumption needed?


    Quote Originally Posted by blubblub View Post
    Wow, touched a nerve?
    I think Spunky has a good point. There's a ton of trash published and bad practices perpetuated, even by highly regarded researchers and journals. The problem is everywhere: psychology, medicine, public health, marketing, epidemiology, sociology, nursing...the list goes on. Part of the problem is that people who don't know what they're doing can publish papers or textbooks and the rest of the field doesn't know better or assumes that published means correct. People treat statistics as a set of calculations, too, just as they often think of mathematics as crunching numbers.

    One of the most common problems I see with normality testing is that people don't know when it's appropriate and they often misinterpret the results. For example, they incorrectly conclude the data come from a normal distribution because the test was not significant.
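
    A small sketch of that misinterpretation, under assumptions chosen only for illustration: samples of size 10 are drawn from an exponential distribution, so the data are never normal, and we count how often the Shapiro-Wilk test nevertheless comes back non-significant.

```r
# Sketch: a non-significant normality test is not evidence of normality.
# Assumptions (illustrative only): exponential data, n = 10, alpha = .05.
set.seed(1)
reps <- 10000
n <- 10

# The population is exponential, so the null hypothesis of normality is false in every run
p_sw <- replicate(reps, shapiro.test(rexp(n))$p.value)

# Share of runs in which the test fails to reject normality
not_significant <- mean(p_sw >= 0.05)
not_significant
```

    With samples this small the test misses the non-normality in a large share of runs, so "not significant" here mostly reflects low power and not a normal population.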
