Checking the normality assumption of the t-test

jjx

New Member
#1
This may be a stupid question, but I would be very thankful if someone could clarify it.

The question is: if the normality assumption behind the t-test is "the sample mean is normal", why can one decide whether this assumption is satisfied by checking the "normality of the data" at all?

Sample size > 30: according to the central limit theorem, "the sample mean is normal" regardless of the data distribution.

Sample size < 30: the normality test says the data are normal => the sample mean is normal??
Sample size < 30: the normality test says the data are not normal => the sample mean is not normal??

(And doesn't the normality test itself have low power when the sample size is small?)
 

jjx

New Member
#2
PS: I am currently dealing with a paired sample of size 15. According to a normality test the data are non-normal, but I don't have a strong reason to think the population should not be normal...

Maybe it is just safer to use a non-parametric alternative, but the question above is still bothering me.

I would really appreciate it if someone could help.
 

fed2

Active Member
#3
It's not so much about the normality of the data as about the 'speed of convergence' of the mean to normality. The more normal the data, the faster the convergence (see the central limit theorem).
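One rough way to see this "speed of convergence" is to simulate sample means for several sample sizes and look at how skewed their distribution still is. The sketch below is only an illustration: the three distributions, the sample sizes, the seed, and the helper names `skewness` and `mean_skewness` are all assumptions chosen for the example, not anything from the thread.

```r
# Sketch: how quickly does the sample mean approach normality?
set.seed(1)  # arbitrary seed, for reproducibility only

# Sample skewness (0 for a symmetric distribution such as the normal)
skewness <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}

# Skewness of the simulated distribution of the mean of n draws from rdist
mean_skewness <- function(rdist, n, nsim = 20000) {
  means <- replicate(nsim, mean(rdist(n)))
  skewness(means)
}

sizes <- c(5, 15, 30, 100)
result <- data.frame(
  n           = sizes,
  normal      = sapply(sizes, function(n) mean_skewness(rnorm, n)),
  uniform     = sapply(sizes, function(n) mean_skewness(runif, n)),
  exponential = sapply(sizes, function(n) mean_skewness(rexp, n))
)
print(round(result, 2))
```

For symmetric data (normal, uniform) the skewness of the mean should hover around 0 at every n, while for exponential data it shrinks only like 2/sqrt(n), so it is still clearly visible at n = 15.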
 

Dason

Ambassador to the humans
#4
So it's not actually true that the assumption is that the sample mean is normally distributed. The assumption really is that the conditional distribution of all the data is normal. In practice this assumption isn't too important, especially with a large enough sample size, but the assumption is what it is.
 

jjx

New Member
#6
Hi fed2, thank you for your reply.

fed2 said: "the more normal the data the faster the convergence ref the central limit theorem."
I am not sure I understood what you mean by "faster". Does it mean that the sample mean becomes normally distributed at a smaller sample size?

Maybe I should rephrase my question: in practice, particularly for a small sample size (e.g. 15), how should one check the normality assumption? Based on a normality test of the sampled data, or based on the nature of the measured variable (e.g. treating the population of body weights or heights as normal)? And why?

Thank you so much!
 

jjx

New Member
#7
Hi Dason, thank you.

("The conditional distribution of all the data": maybe I first need to understand exactly what is meant by that...)

Dason said: "In practice this assumption isn't too important especially with a large enough sample size but the assumption is what it is."
I am currently dealing with a small sample (size 15, because the data are hard to collect). How should I check the normality assumption? Based on a normality test of the sampled data, or based on the nature of the measured variable (e.g. treating the population of body weights or heights as normal)? And why?

Thanks a lot!
 

obh

Well-Known Member
#8
Hi jjx

A sample size of 15 may be too small to check for normality; the power of the test may be too weak.
A test for normality may reject normality only when the data's deviation from normality is very strong.

So if you have a good reason to assume normality from other knowledge or from other researchers, you should assume normality.
Otherwise you may not be able to rely on normality.

For example:
An F(8,8) distribution is definitely not normal, and the CLT effect with only 15 observations is usually not strong enough.

But the Shapiro-Wilk (SW) test doesn't reject normality for this sample at a significance level of 0.05:

Code:
> shapiro.test(rf(15,8,8))

    Shapiro-Wilk normality test

data:  rf(15, 8, 8)
W = 0.89254, p-value = 0.07327
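A single random draw like the one above can go either way, so a simulation gives a fairer picture of the test's power. The following is only a sketch: the seed, the number of repetitions, and the variable names are arbitrary assumptions, not part of the original post. It estimates how often the Shapiro-Wilk test rejects at the 0.05 level for samples of 15 from F(8,8), with normal samples as a reference.

```r
set.seed(123)   # arbitrary seed (assumption, not in the original post)
nsim  <- 5000   # number of simulated samples (arbitrary)
n     <- 15
alpha <- 0.05

# p-values of the Shapiro-Wilk test for samples that are truly F(8, 8)
p_f <- replicate(nsim, shapiro.test(rf(n, 8, 8))$p.value)
# ... and for samples that are truly normal, as a reference
p_norm <- replicate(nsim, shapiro.test(rnorm(n))$p.value)

power_f   <- mean(p_f < alpha)     # share of non-normal samples flagged
size_norm <- mean(p_norm < alpha)  # should be close to alpha

cat("Rejection rate, F(8,8) data:", power_f, "\n")
cat("Rejection rate, normal data:", size_norm, "\n")
```

Whatever falls short of 1 in the first rate is the share of clearly non-normal samples that the test fails to flag at this sample size.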
 

jjx

New Member
#10
fed2 said: "seriously though you should just run t-test unless the data is obscenely non-normal."
Thank you fed2.
In my case, it is a paired sample of 15 pairs. When I check the normality of the differences between pairs, the histogram and Q-Q plot do not look very normal, but various numerical normality tests cannot reject normality (which may also be due to the small sample size).
 

jjx

New Member
#11
obh said: "So if you have a good reason to assume normality from other knowledge/researchers you should assume normality. Otherwise you may not rely on the normality."
Thank you obh, I have a question on this point (maybe due to my misunderstanding of the normality assumption).
If the assumption is the normality of the population that the data are collected from, then another study with a large, normally distributed sample might suggest that the normality assumption is met in this small-sample case. But the normality assumption is not about the normality of the population but about the sample mean for sample size n, right? Then what other knowledge can suggest that the normality assumption is met in this small-sample case?
Thank you! (Forgive me: without systematic knowledge of statistics, I find the information from different sources so contradictory.)
 

jjx

New Member
#12
Thank you all for your help!

I came across this paper and it cleared all my confusion! Maybe helpful for someone who has similar problems.

https://doi.org/10.1152/advan.00064.2017

I also learned that I can use bootstrapping to estimate the sampling distribution of a sample statistic, e.g. the sample mean.
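A minimal sketch of that bootstrap idea for a paired design might look like the following. The vector `d` is placeholder data invented for the example (it is not the data from this thread), and the seed and number of resamples are arbitrary assumptions.

```r
set.seed(42)  # arbitrary seed
# Hypothetical paired differences (placeholder data, NOT from the thread)
d <- rexp(15) - 0.8

B <- 10000  # number of bootstrap resamples (arbitrary)
# Resample the 15 differences with replacement and recompute the mean
boot_means <- replicate(B, mean(sample(d, replace = TRUE)))

# Percentile bootstrap 95% interval for the mean difference
ci <- quantile(boot_means, c(0.025, 0.975))
print(ci)
```

Calling `hist(boot_means)` or `qqnorm(boot_means)` afterwards shows how close to normal the estimated distribution of the mean looks for a given data set.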
 

obh

Well-Known Member
#14
jjx said: "Thank you obh, I have a question on this point (maybe due to my misunderstanding of the normality assumption). If the assumption is the normality of the population that the data are collected from, then another study with a large, normally distributed sample might suggest that the normality assumption is met in this small-sample case. But the normality assumption is not about the normality of the population but about the sample mean for sample size n, right? Then what other knowledge can suggest that the normality assumption is met in this small-sample case?"
Hi jjx,

The normality assumption is for the differences, but if both groups' distributions are (jointly) normal, then the distribution of the differences will also be normal.

Anyway, the t-test is robust to non-normality (since it is based on the average, to which the CLT applies).
If the data are reasonably symmetrical, then the average may be close enough to normal even for a sample size of 15.

For example, even with a sample size of 10, the average of a uniform distribution is approximately normally distributed.

Other tests may be more sensitive to non-normality; for example, the F test is based on the ratio of the variances.
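A quick way to check both claims is to simulate data for which the null hypothesis is true and count false positives. This is only a sketch under assumed settings: the choice of uniform data on (-1, 1), exponential data for the F test, the sample sizes, the seed, and the number of repetitions are illustrative, not from the thread.

```r
set.seed(2021)  # arbitrary seed
nsim  <- 5000   # number of simulated data sets (arbitrary)
alpha <- 0.05

# One-sample t-test on uniform data with true mean 0 and n = 10
t_rate <- mean(replicate(nsim,
  t.test(runif(10, min = -1, max = 1), mu = 0)$p.value < alpha))

# F test of equal variances on two exponential samples (n = 15 each)
# that really do have the same variance
f_rate <- mean(replicate(nsim,
  var.test(rexp(15), rexp(15))$p.value < alpha))

cat("t-test false-positive rate (uniform, n = 10):", t_rate, "\n")
cat("F-test false-positive rate (exponential, n = 15):", f_rate, "\n")
```

If the t-test is robust here, `t_rate` should stay close to 0.05, while a sensitive F test shows up as an `f_rate` noticeably above 0.05.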