
Thread: 2 samples t-test: normal or not?

  1. #1
    Hi.
    For a 2-sample t-test, I'm checking the assumptions and getting contradictory (at least to me) results regarding normality between the histograms, Q-Q plots, and tables.
    Looking at the histograms they seem normal, looking at the plots one of the samples is normal, but looking at the p-values none of them are normal.
    All the results I got are attached in this post.
    If they are not normal, I should proceed with a non-parametric test, correct? Even if one of the samples is normal?
    How is the size of N important here?
    Thanks!

  2. #2
    Both samples need to be normal for the small-sample parametric hypothesis tests to be valid. If, on the other hand, the samples are dependent, then what matters is the values (residuals) obtained by taking the difference between the two variables for each observation. If you have a "large" sample, then you can rely on asymptotic results and go ahead with parametric hypothesis testing. I hope this clarifies things a little bit.

  3. #3
    noetsi

    Many tests for normality have weak power, so they often fail to reject the null even when it should be rejected. The Q-Q plot is arguably the best way to assess normality.
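For anyone who wants to see what a normal Q-Q plot actually compares, here is a minimal sketch in plain Python. It is not the SAS implementation: the function name `normal_qq_points`, the plotting positions (i - 0.5) / n, and the sample values are all assumptions made for illustration. The reference normal uses the sample mean and standard deviation, in the spirit of the MU=EST SIGMA=EST options in the SAS code posted later in this thread.

```python
from statistics import NormalDist, fmean, stdev

def normal_qq_points(sample):
    """Return (theoretical, observed) pairs for a normal Q-Q plot."""
    n = len(sample)
    observed = sorted(sample)
    # Reference normal fitted with the sample mean and standard deviation.
    reference = NormalDist(fmean(observed), stdev(observed))
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1).
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    theoretical = [reference.inv_cdf(p) for p in probs]
    return list(zip(theoretical, observed))

# Made-up values; the last one is a deliberate outlier in the upper tail.
sample = [12.1, 9.8, 10.4, 11.3, 10.0, 9.5, 10.9, 25.0]
points = normal_qq_points(sample)
for theo, obs in points:
    print(f"theoretical={theo:7.2f}  observed={obs:7.2f}")
```

If the data were close to normal, the pairs would fall near the line observed = theoretical. Here the largest observation sits well above its theoretical quantile, which is the kind of tail departure a Q-Q plot makes visible.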
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  4. #4
    hlsmith

    How did you get SAS to kick out that figure with the normal line? Can you share the basic code?
    Stop cowardice, ban guns!

  5. #5
    Originally Posted by hlsmith:
    How did you get SAS to kick out that figure with the normal line? Can you share the basic code?
    Hi. Do you mean the normal curve in the histogram?

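    PROC UNIVARIATE DATA = … ;
    VAR … ;
    CLASS … ;
    /* Q-Q plot against a normal with mean and SD estimated from the data */
    QQPLOT … / NORMAL(MU=EST SIGMA=EST);
    /* Histogram with the fitted normal curve overlaid */
    HISTOGRAM … / NORMAL;
    RUN;

    /* Bar chart of group means with 95% confidence limits */
    PROC GCHART DATA = … ;
    HBAR … / TYPE=MEAN SUMVAR=…
    FREQLABEL='…' MEANLABEL='…'
    ERRORBAR=BARS CLM=95 NOFRAME;
    /* CLM: confidence level for the error bars */
    RUN;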

  6. #6
    Originally Posted by Englund:
    Both samples need to be normal for the small-sample parametric hypothesis tests to be valid. If, on the other hand, the samples are dependent, then what matters is the values (residuals) obtained by taking the difference between the two variables for each observation. If you have a "large" sample, then you can rely on asymptotic results and go ahead with parametric hypothesis testing. I hope this clarifies things a little bit.
    Thanks. The samples are independent. I'm assuming the samples are not normal and the variances are unequal (you can look at the results attached in this reply -- new example I'm using -- and check if I'm right) and proceeding to a non-parametric test (Wilcoxon).
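A side note on the unequal variances: on their own they do not rule out a t-test, because Welch's version drops the equal-variance assumption (SAS PROC TTEST reports it as the Satterthwaite method). It still relies on normality or a reasonably large sample. The sketch below shows the arithmetic only; the function name `welch_t` and the data are made up for illustration and are not taken from the attached results.

```python
import math

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    # Unbiased sample variances (divide by n - 1).
    var_a = sum((x - mean_a) ** 2 for x in a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (nb - 1)
    # Each group's contribution to the variance of the mean difference.
    v_a, v_b = var_a / na, var_b / nb
    t = (mean_a - mean_b) / math.sqrt(v_a + v_b)
    df = (v_a + v_b) ** 2 / (v_a ** 2 / (na - 1) + v_b ** 2 / (nb - 1))
    return t, df

group_a = [1, 2, 3, 4, 5]      # made-up values, variance 2.5
group_b = [2, 4, 6, 8, 10]     # made-up values, variance 10
t, df = welch_t(group_a, group_b)
print(f"t = {t:.3f}, df = {df:.2f}")  # prints: t = -1.897, df = 5.88
```

The degrees of freedom come out below the pooled-test value of 8, which is how the Welch approach pays for not assuming equal variances.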

  7. #7
    But my question is (from attached file results2.doc in the previous reply post): are the samples normal or not?

  8. #8
    Originally Posted by deltango:
    Looking at the histograms they seem normal, looking at the plots one of the samples is normal, but looking at the p-values none of them are normal.
    The histogram for FE college looks good enough, but the histogram for Sixth Form College does not look normal.
    Your sample of Sixth Form College is not normal.
    As for FE college, I would say it looks good enough for a t-test but you may want to explore some more, as the Q-Q plot indicates the tails are off.
    All things are known because we want to believe in them.

  9. #9
    noetsi

    Normality primarily influences the statistical tests (the p-values). But if you have very non-normal data, and especially outliers in the tails, you should ask why this is occurring. Often that, like outlier analysis, tells you something important about your data that you would otherwise miss.

  10. #10
    Originally Posted by Mean Joe:
    The histogram for FE college looks good enough, but the histogram for Sixth Form College does not look normal.
    Your sample of Sixth Form College is not normal.
    As for FE college, I would say it looks good enough for a t-test but you may want to explore some more, as the Q-Q plot indicates the tails are off.
    Yes, thank you. Everyone is giving me great insights, thank you all!
    So having one non-normal sample means that I cannot compare the two samples' means with a t-test and should use a non-parametric test instead, right?

  11. #11
    Yes, try a non-parametric test. It should confirm that the means are not equal (mean of FE college = 190, mean of Sixth Form college = 555).

  12. #12
    Karabiner

    Unfortunately, there are no non-parametric tests for the equality of
    means. One can only compare rank sums (Wilcoxon) or medians
    (Median test).
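To make the point about rank sums concrete, here is a small sketch in plain Python. The helper names `midranks` and `wilcoxon_rank_sums` and the data are assumptions for illustration only. The two made-up groups have exactly the same mean, yet their rank sums differ clearly, which is what the Wilcoxon test looks at.

```python
def midranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def wilcoxon_rank_sums(a, b):
    """Sum of pooled ranks for each of the two groups."""
    ranks = midranks(list(a) + list(b))
    return sum(ranks[:len(a)]), sum(ranks[len(a):])

# Made-up data: both groups have mean 10, but group b holds most of the higher ranks.
a = [1, 2, 3, 4, 40]
b = [9, 10, 10, 10, 11]
print(sum(a) / len(a), sum(b) / len(b))  # prints: 10.0 10.0
print(wilcoxon_rank_sums(a, b))          # prints: (20.0, 35.0)
```

So a rank-based test can separate two groups whose means are identical, and it can equally fail to separate groups whose means differ because of a few extreme values.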

    What bothers me with regard to the normality assumption: in case of
    small samples there's the necessary assumption that both samples are
    drawn from normally distributed populations. The problems with tests
    of significance regarding normality are well known (e.g. lack of power
    in the small samples case; or lack of indication whether a "significant"
    violation of the assumption is serious), but why is it recommended
    to use graphical methods? A Q-Q plot can show us whether there is a
    marked deviation from normality in the sample, but how do we
    know whether this deviation indicates non-normality in the population?
    Any references or explanations for this?

    With kind regards

    K.

  13. #13
    noetsi

    It makes sense that there is no test of means among non-parametric methods, since they don't assume interval data.

    I have never seen the issue of population normality addressed. In practice it is impossible to ever know what the population distribution is so I am not sure what value it would be to know this. More generally, all the assumptions of statistics pertain, as far as I know, to the sample and not to the population. I assume that if the sample meets the assumptions of a given method then you can use it for analysis regardless of the population. You are conducting analysis on the sample, not the population, even if ultimately you hope the results pertain to the population. So why would it matter if the assumptions pertain to the population for analysis?

    But it is interesting that I have never seen population assumptions such as normality addressed. Having said all this, it occurs to me that I run regression, for example, on populations, and test for assumptions like equal error variance or multicollinearity. Is that not required when you have the population?

  14. #14
    Originally Posted by noetsi:
    I assume that if the sample meets the assumptions of a given method then you can use it for analysis regardless of the population. You are conducting analysis on the sample, not the population, even if ultimately you hope the results pertain to the population. So why would it matter if the assumptions pertain to the population for analysis?

    But it is interesting that I have never seen population assumptions such as normality addressed.
    I would like to disagree. It is the population characteristics that are of interest, not the sample characteristics. We only use sample data because that is usually the only thing we have at hand. In the classical linear regression model, for example, we only care whether \varepsilon_j \forall j, j=1,2,...,N is normal. We only use the estimated residuals to evaluate whether \varepsilon \sim N seems plausible.

    If the sample data themselves were of interest, why would we perform hypothesis tests at all? Wouldn't it be enough to just check whether all the sample moments exactly match the moments of a normally distributed variable?

  15. #15
    Karabiner

    Originally Posted by noetsi:
    I have never seen the issue of population normality addressed.
    But we deal with the idea of statistically testing normality very often here.
    And testing is about making statements about the population.
    Originally Posted by noetsi:
    In practice it is impossible to ever know what the population distribution is so I am not sure what value it would be to know this.
    What we are ultimately interested in are the sampling distributions of the
    test statistics, in order to perform the statistical tests. So, usually we not only are
    uninterested in the data distribution within the sample, but we even are uninterested
    in normality of the data distribution within the population. But AFAIK in case of a small
    sample we need the assumption of normality in the population from which the sample
    was drawn, in order to make correct statements about the sampling distribution of the
    calculated test statistic. With larger samples, the central limit theorem applies.
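A rough simulation sketch of that last point, in plain Python. The exponential population, the sample sizes 5 and 200, the seed, and the function names are all assumptions chosen for illustration. It looks at the sample mean, the building block of the t statistic, and measures how skewed its sampling distribution is when the population itself is strongly right-skewed.

```python
import random

def skewness(values):
    """Third central moment divided by the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def simulated_means(sample_size, replications, rng):
    """Means of repeated samples drawn from an exponential population."""
    return [
        sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
        for _ in range(replications)
    ]

rng = random.Random(2024)  # fixed seed so the run is repeatable
skew_small = skewness(simulated_means(5, 3000, rng))
skew_large = skewness(simulated_means(200, 3000, rng))
print(f"skewness of sample means, n=5:   {skew_small:.2f}")
print(f"skewness of sample means, n=200: {skew_large:.2f}")
```

In theory the skewness of the mean of n exponential draws is 2/sqrt(n), so roughly 0.89 for n=5 and 0.14 for n=200. With the small sample the sampling distribution is still clearly non-normal, which is why the population assumption matters there. With the larger sample it is already much closer to symmetric.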

    Now, because the distribution within the sample is not of concern, or only to the
    degree it can be used to infer statements about the distribution in the population,
    my question was, how or why graphical methods, based just on sample data,
    can be used to make such inferences.

    With kind regards

    K.
