
Thread: Normal distribution of sample or population?

  1. #1

    Question Normal distribution of sample or population?




    Hello,

    for using parametric tests, is it required that the sample data are normally distributed or is it sufficient to know from other similar types of experiments that the population data are normally distributed?
    In samples, which are obviously smaller than the population, a few extreme values may ruin the normal distribution but the population as a whole can still have a normal distribution.

  2. #2
    TS Contributor
    rogojel's Avatar
    Location
    I work in Europe, live in Hungary

    Re: Normal distribution of sample or population?

    It depends on what you want to do - e.g. the t-test is quite robust against deviations from normality. If you do an ANOVA, only the residuals have to be normal .. etc
    regards

  3. #3

    Re: Normal distribution of sample or population?

    Originally posted by rogojel:
    "It depends on what you want to do - e.g. the t-test is quite robust against deviations from normality."

    I have already heard that several times about the t-test. I just don't know what "quite robust" means exactly. How do I know when the deviation is OK and when it is too much? Is there a rule of thumb for this?

    Originally posted by rogojel:
    "If you do an ANOVA, only the residuals have to be normal .. etc"

    Actually, t-tests (standard paired/unpaired, unpaired with Welch correction) and ANOVA (mainly one-way) as well as (post-hoc) multiple comparison tests are what I am interested in.
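
    One hedged way to put a number on "quite robust" is to simulate it: draw both groups from the same non-normal population, so the null hypothesis is true, and count how often the t-test rejects at the 5% level. The sketch below assumes NumPy and SciPy; the three populations, the sample sizes and the helper name `type1_rate` are illustrative choices, not a recommendation from this thread.

```python
import numpy as np
from scipy import stats

def type1_rate(sampler, n, reps=5000, alpha=0.05, seed=0):
    """Fraction of two-sample t-tests that reject when both groups share one population."""
    rng = np.random.default_rng(seed)
    # Draw all samples at once: reps rows, n observations per group
    a = sampler(rng, (reps, n))
    b = sampler(rng, (reps, n))
    # Student t-test along each row; H0 is true by construction
    p = stats.ttest_ind(a, b, axis=1).pvalue
    return float(np.mean(p < alpha))

# Illustrative populations (assumed for this sketch)
samplers = {
    "normal": lambda rng, size: rng.normal(size=size),
    "uniform": lambda rng, size: rng.uniform(size=size),
    "exponential": lambda rng, size: rng.exponential(size=size),
}

results = {(name, n): type1_rate(s, n)
           for name, s in samplers.items() for n in (5, 15, 30)}
for (name, n), rate in results.items():
    print(f"{name:12s} n={n:2d}  rejection rate = {rate:.3f}")
```

    A rejection rate close to the nominal 0.05 suggests the deviation is tolerable for that design. Swapping in a population that resembles your own data (and your own group sizes) gives a more relevant answer than any general rule of thumb.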

  4. #4
    TS Contributor

    Re: Normal distribution of sample or population?

    Originally posted by rogojel:
    "It depends on what you want to do - e.g. the t-test is quite robust against deviations from normality. If you do an ANOVA, only the residuals have to be normal .. etc"

    I believe that with ANOVA, the Y values in each group need to be normally distributed with a common variance, rather than the error term (ANOVA being a generalization of the t-test). The error term needs a normal distribution in the context of ordinary least squares regression. The part that confuses me a bit is that a simple regression using only a qualitative variable is equivalent to an ANOVA with that same independent variable as a factor. However, I think the context and the goal of the inference can maybe help with this "discrepancy". At one point, I believe I heard that a normal distribution of errors implies a normal distribution of Y values, though I could be mistaken. I had always learned ANOVA assumptions with respect to the DV and regression assumptions with respect to the error term.
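
    The equivalence mentioned above can be checked numerically. The sketch below (NumPy and SciPy assumed, data simulated, group means chosen arbitrarily) fits a one-way ANOVA and an ordinary least squares regression on dummy variables to the same data and compares the F statistics. Since the model is Y = group mean + error, normal errors with a common variance mean that Y is normal within each group, which is why the two ways of stating the assumption coincide.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Three hypothetical groups with different means and a common variance
groups = [rng.normal(loc=m, scale=1.0, size=12) for m in (0.0, 0.5, 1.5)]

# One-way ANOVA
f_anova, p_anova = stats.f_oneway(*groups)

# Same model as OLS regression on an intercept plus two dummy variables
y = np.concatenate(groups)
labels = np.repeat(np.arange(3), 12)
X = np.column_stack([np.ones_like(y), labels == 1, labels == 2]).astype(float)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

sse = np.sum(resid**2)               # residual sum of squares
sst = np.sum((y - y.mean())**2)      # total sum of squares
df_model, df_resid = X.shape[1] - 1, len(y) - X.shape[1]
f_reg = ((sst - sse) / df_model) / (sse / df_resid)

print("F from ANOVA:     ", f_anova)
print("F from regression:", f_reg)
```

    The fitted values of the regression are simply the group means, so the regression residuals are the same numbers as the within-group deviations used by the ANOVA.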

  5. #5
    TS Contributor
    Karabiner's Avatar
    Location
    FC Schalke 04, Germany

    Re: Normal distribution of sample or population?

    "for using parametric tests, is it required that the sample data are normally distributed or is it sufficient to know from other similar types of experiments that the population data are normally distributed?"

    For parametric tests, it is not the unconditional data that should be sampled from a normally distributed population. Rather, the residuals of the model (e.g. from a regression equation or an ANOVA model) should be a sample from a normally distributed population.

    But even this assumption is needed only for small samples. If n > 30, according to the central limit theorem the test statistics are not biased, even if the residuals are from a non-normal population.
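
    The difference between the unconditional data and the residuals can be illustrated with simulated data. In the sketch below (NumPy and SciPy assumed; the two groups and their means are made up), the errors are exactly normal, yet the pooled outcome is bimodal because the group means differ. A Shapiro-Wilk test is applied to both as one possible check.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Two hypothetical groups: normal errors, but well-separated means
g1 = rng.normal(loc=0.0, scale=1.0, size=200)
g2 = rng.normal(loc=6.0, scale=1.0, size=200)

# Unconditional data: both groups pooled together (bimodal)
pooled = np.concatenate([g1, g2])
# Residuals of the group-means model: each value minus its own group mean
resid = np.concatenate([g1 - g1.mean(), g2 - g2.mean()])

p_pooled = stats.shapiro(pooled).pvalue
p_resid = stats.shapiro(resid).pvalue
print(f"Shapiro-Wilk p, pooled data: {p_pooled:.2g}")
print(f"Shapiro-Wilk p, residuals:   {p_resid:.2g}")
```

    Testing the pooled outcome would flag strong non-normality here even though the model assumption holds, which is why the check belongs on the residuals.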

    HTH

    Karabiner
    »Jetzt kann mich der Führer mal am Arsch lecken.« (Ernst Kuzorra, 1941)

  6. #6
    TS Contributor
    Miner's Avatar
    Location
    Greater Milwaukee area

    Re: Normal distribution of sample or population?

    Originally posted by ondansetron:
    "I believe that with ANOVA, the Y values in each group need to be normally distributed with a common variance, rather than the error term (ANOVA being a generalization of the t-test). The error term needs a normal distribution in the context of ordinary least squares regression. The part that confuses me a bit is that a simple regression using only a qualitative variable is equivalent to an ANOVA with that same independent variable as a factor. However, I think the context and the goal of the inference can maybe help with this "discrepancy". At one point, I believe I heard that a normal distribution of errors implies a normal distribution of Y values, though I could be mistaken. I had always learned ANOVA assumptions with respect to the DV and regression assumptions with respect to the error term."

    The following is a very clear and understandable explanation: Checking the Normality Assumption for an ANOVA Model

  7. #7
    TS Contributor

    Re: Normal distribution of sample or population?

    Originally posted by Miner:
    "The following is a very clear and understandable explanation: Checking the Normality Assumption for an ANOVA Model"

    Glad to see I wasn't far off and the explanation helps link them! It seems like one of those obvious things when you think of the assumption and how it plays out, which is essentially what is done in that article. Thank you. The one bone I pick with the explanation is that errors and residuals are not the same thing, at least as I was taught in my stats courses and in the statistics books I have read. The error is a theoretical quantity that is unobservable, while the residual is an observable sample estimate of the error of prediction. Again, that's how I was taught by a couple of statisticians and different books, although it's a somewhat minor point. What are your thoughts on that?

    Originally posted by Karabiner:
    "But even this assumption is needed only for small samples. If n > 30, according to the central limit theorem the test statistics are not biased, even if the residuals are from a non-normal population."

    It's helpful to keep in mind that "small" and 30 are not exactly hard lines: much larger samples may be needed for data drawn from increasingly non-normal distributions. Thirty may be more than plenty for a slightly non-normal distribution, but 10000 or more may be required for something from a multimodal, heavily skewed distribution.
    Last edited by ondansetron; 11-10-2017 at 04:17 PM.
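
    A rough way to see why no single n works for every population: for independent draws, the skewness of the sample mean equals the population skewness divided by sqrt(n). The sketch below uses only the standard library and asks how large n must be before the mean's skewness drops below a threshold. The threshold of 0.1 and the example populations are arbitrary assumptions for illustration, and skewness of the mean is only a proxy for how well a t-based test behaves.

```python
import math

def n_for_mean_skewness(pop_skew, target=0.1):
    """Smallest n at which the sample mean's skewness, pop_skew / sqrt(n), is <= target."""
    return math.ceil((pop_skew / target) ** 2)

def lognormal_skewness(sigma):
    """Skewness of a lognormal distribution with log-scale standard deviation sigma."""
    w = math.exp(sigma**2)
    return (w + 2) * math.sqrt(w - 1)

# Illustrative populations (assumed for this sketch)
populations = {
    "exponential": 2.0,
    "lognormal, sigma=1": lognormal_skewness(1.0),
    "lognormal, sigma=1.5": lognormal_skewness(1.5),
}
for name, skew in populations.items():
    n_needed = n_for_mean_skewness(skew)
    print(f"{name:22s} skewness={skew:7.2f}  n needed={n_needed:,}")
```

    Under this particular criterion the exponential population needs n in the hundreds, while the more heavily skewed lognormal needs well over 10,000, so the required sample size depends strongly on the shape of the population.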

  8. #8
    TS Contributor
    Miner's Avatar
    Location
    Greater Milwaukee area

    Re: Normal distribution of sample or population?

    Originally posted by ondansetron:
    "The one bone I pick with the explanation is that errors and residuals are not the same thing, at least as I was taught in my stats courses and in the statistics books I have read. The error is a theoretical quantity that is unobservable, while the residual is an observable sample estimate of the error of prediction. Again, that's how I was taught by a couple of statisticians and different books, although it's a somewhat minor point. What are your thoughts on that?"

    I think the confusion arises because you have both residuals and errors. See the attached images. You have a table of residuals, which are the differences between the observed and predicted values. But you also have the ANOVA table, which has an error term that is an aggregate of the residuals. The two are definitely related, but as you said, they are different.
    Attached images: ANOVA.jpg, Residuals.jpg
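
    The relationship between errors, residuals and the ANOVA error term can be sketched with simulated data, where the true errors are known (in real data they are not). The example assumes NumPy; the group means, error SD and group sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical one-way layout: 3 groups, 8 observations each, true error SD = 2
true_means = np.array([10.0, 12.0, 15.0])
labels = np.repeat(np.arange(3), 8)
errors = rng.normal(scale=2.0, size=labels.size)   # unobservable in real data
y = true_means[labels] + errors

# Residuals: observed minus fitted, where the fitted value is the group's sample mean
group_means = np.array([y[labels == g].mean() for g in range(3)])
residuals = y - group_means[labels]

# The "Error" row of the ANOVA table aggregates the residuals
ss_error = np.sum(residuals**2)
df_error = y.size - 3               # N minus number of groups
ms_error = ss_error / df_error      # estimate of the error variance

print("largest |error - residual|:", np.max(np.abs(errors - residuals)))
print("SS error:", ss_error, " df:", df_error, " MS error:", ms_error)
```

    The residuals differ from the true errors because the group means are estimated, and the mean square error in the ANOVA table is the single number that summarizes all residuals.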

  9. #9
    TS Contributor
    Karabiner's Avatar
    Location
    FC Schalke 04, Germany

    Re: Normal distribution of sample or population?

    Originally posted by ondansetron:
    "It's helpful to keep in mind that "small" and 30 are not exactly hard lines: much larger samples may be needed for data drawn from increasingly non-normal distributions. Thirty may be more than plenty for a slightly non-normal distribution, but 10000 or more may be required for something from a multimodal, heavily skewed distribution."

    This statement is a bit surprising. What do you mean by "is required" - required for what? The calculated test statistics are not (much) affected by non-normality of the residuals if n > 30.

    With kind regards

    Karabiner

  10. #10
    Devorador de queso
    Dason's Avatar
    Location
    Tampa, FL

    Re: Normal distribution of sample or population?

    They're saying there is nothing magical about n=30 and depending on the characteristics of the population you may need larger sample sizes to get the distribution of the test statistic to be approximately normal.
    I don't have emotions and sometimes that makes me very sad.

  11. #11
    TS Contributor
    Karabiner's Avatar
    Location
    FC Schalke 04, Germany

    Re: Normal distribution of sample or population?

    No, nothing magical about 30. I have not yet seen any simulation that required more than n = 30 to 40 or so to deliver approximately normally distributed test statistics, even with markedly non-normal distributions of the residuals, e.g. uniform, extremely skewed, bimodal. But anyway. My problem is the "10000 or more" notion, which really is surprising (at least for me).

    With kind regards

    Karabiner

  12. #12
    Devorador de queso
    Dason's Avatar
    Location
    Tampa, FL

    Re: Normal distribution of sample or population?

    Yeah that might be a bit extreme... Or it may not depending on what context you're talking about. Extremely rare events modeled using logistic regression? We need very large sample sizes to get that to work well and for any inferences based on normal theory to be valid.

  13. #13
    TS Contributor

    Re: Normal distribution of sample or population?

    Dason explained what I meant; however, the 10,000 number was more of an illustrative figure. That said, for something drawn from a Cauchy distribution, no sample size is large enough to give a roughly normal sampling distribution of the mean, as far as I am aware. It is a good example of a case where the CLT does not apply.
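
    The Cauchy case can be seen in a small simulation (NumPy assumed; the helper name and the sample sizes are illustrative). The mean of n independent standard Cauchy draws is itself standard Cauchy, so the spread of the sample means should not shrink as n grows, unlike for normal data.

```python
import numpy as np

rng = np.random.default_rng(4)

def iqr_of_sample_means(sampler, n, reps=2000):
    """Interquartile range of `reps` sample means, each from a sample of size n."""
    means = sampler((reps, n)).mean(axis=1)
    q1, q3 = np.percentile(means, [25, 75])
    return q3 - q1

results = {}
for n in (10, 1000):
    iqr_normal = iqr_of_sample_means(rng.standard_normal, n)
    iqr_cauchy = iqr_of_sample_means(rng.standard_cauchy, n)
    results[n] = (iqr_normal, iqr_cauchy)
    print(f"n={n:5d}  IQR of means: normal={iqr_normal:.3f}  Cauchy={iqr_cauchy:.3f}")
```

    For the normal population the interquartile range of the means shrinks roughly like 1/sqrt(n); for the Cauchy population it stays near 2, the interquartile range of a single standard Cauchy draw.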


  15. #14
    TS Contributor
    Karabiner's Avatar
    Location
    FC Schalke 04, Germany

    Re: Normal distribution of sample or population?

    Hard cases make bad law...

  16. #15
    TS Contributor
    rogojel's Avatar
    Location
    I work in Europe, live in Hungary

    Re: Normal distribution of sample or population?


    Originally posted by ondansetron:
    "Dason explained what I meant; however, the 10,000 number was more of an illustrative figure. That said, for something drawn from a Cauchy distribution, no sample size is large enough to give a roughly normal sampling distribution of the mean, as far as I am aware. It is a good example of a case where the CLT does not apply."

    Luckily we do not get such variables often in real life - but it is a good reminder not to sample ratios of variables.
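
    The link between ratios and the Cauchy distribution can be checked directly: the ratio of two independent standard normal variables follows a standard Cauchy distribution. The sketch below (NumPy and SciPy assumed; sample size and quantile levels are arbitrary) compares empirical quantiles of such a ratio with the theoretical Cauchy quantiles.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
# Ratio of two independent standard normal variables (both centered at zero)
ratio = rng.standard_normal(100_000) / rng.standard_normal(100_000)

# Compare empirical quantiles of the ratio with standard Cauchy quantiles
probs = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
empirical = np.quantile(ratio, probs)
theoretical = stats.cauchy.ppf(probs)
for p, e, t in zip(probs, empirical, theoretical):
    print(f"p={p:.2f}  ratio={e:7.3f}  Cauchy={t:7.3f}")
```

    Real ratios rarely have a denominator centered exactly at zero, but a denominator that can come close to zero already produces very heavy tails.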
