Why 30 as sample size?

#1
I've read that 30 is a good rule of thumb in most cases for an adequate sample size. Most places say that this is just an agreed-upon approximation of an infinite normal distribution.

Can anyone explain well why 30 is this agreed-upon number?

Thanks very much...
 

Miner

TS Contributor
#2
An adequate sample size for what?

It depends very much on what you are trying to accomplish. For a hypothesis test (2-sample t-test), a sample size of 5 might be sufficient to detect a delta of 3 x the std dev, while detecting a delta of 0.1 x the std dev might require well over 100. If you only want to assess the population mean, 30 might be sufficient, depending on how tight a confidence interval is required.
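
A rough sketch of that kind of calculation (assuming Python with scipy available), using the standard normal-approximation formula for two-sample t-test sample size; an exact t-based power calculation gives somewhat larger numbers when n is very small:

```python
# Sketch: approximate sample size per group for a two-sample t-test,
# using the usual normal-approximation formula
#   n ~= 2 * (z_{1-alpha/2} + z_{power})^2 / d^2
# where d is the difference in means expressed in standard-deviation units.
from scipy.stats import norm

alpha, power = 0.05, 0.80
z_alpha = norm.ppf(1 - alpha / 2)
z_power = norm.ppf(power)

for d in (3.0, 1.0, 0.5, 0.1):
    n_per_group = 2 * (z_alpha + z_power) ** 2 / d ** 2
    print(f"delta = {d} x sd  ->  roughly {n_per_group:.1f} per group")
```

The approximation understates n when the answer is tiny (an exact t-based calculation gives a handful per group for the 3-sd case), but the message is the same: the required n is driven by the effect size you want to detect, not by any magic number.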
 

hlsmith

Less is more. Stay pure. Stay poor.
#3
There is a general rule some people use for the normality assumption, which is a sample size of 30. As alluded to, this may not be related to power, just possibly normality of the residuals. I'm unsure, but I would assume it came about via simulations.
 

Dason

Ambassador to the humans
#5
hlsmith said:
There is a general rule some people use for the normality assumption, which is a sample size of 30. As alluded to, this may not be related to power, just possibly normality of the residuals. I'm unsure, but I would assume it came about via simulations.
It doesn't have to do with the normality of the residuals. Taking a larger and larger sample size won't change the distribution of the residuals. But that's alright, because when we talk about using a sample size of at least 30, what we're trying to justify is that the sampling distribution of the sample mean is approximately normal. Now, the closer the error terms are to normality, the smaller the sample you need to get approximate normality of the sampling distribution, but for most reasonable variables a sample size of about 30 is good enough to achieve approximate normality.
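
A quick simulation sketch of that point (assuming numpy and scipy; the exponential distribution is just an arbitrary skewed stand-in for a "reasonable" variable):

```python
# Sketch: sampling distribution of the mean for a skewed population (exponential)
# at n = 5 vs n = 30. The raw observations stay skewed no matter what, but the
# distribution of the sample means gets much closer to normal as n grows.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_reps = 100_000

print(f"population skewness: {stats.skew(rng.exponential(size=n_reps)):.2f}")  # about 2
for n in (5, 30):
    means = rng.exponential(size=(n_reps, n)).mean(axis=1)
    print(f"n = {n:2d}: skewness of the sample means = {stats.skew(means):.2f}")
```

A normal distribution has skewness 0, so the falling skewness of the sample means (while the observations themselves stay just as skewed) is exactly the distinction Dason is making.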
 

bugman

Super Moderator
#6
Hi Dason,

welcome to talkstats :welcome:

Please only post once in a thread. Double posting is venomous raptor behaviour and will not be tolerated.
:)
 

Mean Joe

TS Contributor
#7
From Googling, it seems that the argument focuses on comparing the pdf (probability density function) of the t-distribution with that of the normal distribution. Certainly, you can use limits to show that the t pdf tends toward the normal pdf. But why n=30?

This page says "at values of ν [the degrees of freedom] as small as 10 or 12, the graphs of [t pdf] are nearly indistinguishable from graphs of the standard normal probability density function, and by the time ν is as large as 29 or 30, results using the t-distribution agree with results from the standard normal distribution to within a percentage point or two, and so statisticians tend to use the standard normal probability tables in place of t-tables whenever the value of ν is larger than 29 or 30."

So they use a benchmark of "a percentage point or two" difference in pdf?

At n=30, the maximum difference in CDF between the t and the normal is about 0.005 (as seen here).

So n=30 is good. Are you wondering why not n=10?
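
For what it's worth, here is a small sketch (assuming scipy) that reproduces that kind of comparison for a few degrees of freedom:

```python
# Sketch: maximum absolute difference between the t CDF and the standard normal CDF
# for several degrees of freedom -- the sort of comparison behind the
# "percentage point or two" / 0.005 figures quoted above.
import numpy as np
from scipy.stats import norm, t

x = np.linspace(-6, 6, 10_001)
for df in (5, 10, 30, 100):
    max_diff = np.max(np.abs(t.cdf(x, df) - norm.cdf(x)))
    print(f"df = {df:3d}: max |t CDF - normal CDF| = {max_diff:.4f}")
```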
 

noetsi

No cake for spunky
#8
30 is, I believe, the point at which, because of the central limit theorem, the sampling distribution of the mean is approximately normal even though the population is not.

I always thought this was determined empirically rather than theoretically.

Dason once posted a joke about this topic: a professor telling a student that at about 30 samples the data was equivalent to infinity :p The problem I have always had with this issue is that it suggests you have 30 separate samples, when in fact you almost never will; you simply have that many cases in one sample. As trinker noted, the true issue is commonly not normality but statistical power, although they seem like different issues to me.
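
That empirical check is easy to run yourself; here is a small sketch (assuming numpy and scipy, with an exponential population as an arbitrary skewed example and a nominal 5% one-sample t-test):

```python
# Sketch: empirical type I error rate of a nominal 5% one-sample t-test when the
# population is skewed (exponential with mean 1), at several sample sizes.
# The null hypothesis (mean = 1) is true, so an exact test would reject ~5% of the time.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_reps = 20_000

for n in (5, 15, 30, 100):
    samples = rng.exponential(scale=1.0, size=(n_reps, n))
    p_values = stats.ttest_1samp(samples, popmean=1.0, axis=1).pvalue
    print(f"n = {n:3d}: rejection rate = {np.mean(p_values < 0.05):.3f}")
```

How close the rejection rate gets to 0.05 by n = 30 depends on how skewed the population is, which is really the same caveat Dason gave about "most reasonable variables".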