assuming a variable is/variables are normally distributed

#1
I've read, and have also been told on here, that if the sample size is > 30 I can assume the variable is normally distributed. I'm wondering whether 30 isn't a bit small for that, though, and I have another question as well.

Say I wanted to determine what the perfect weight for a basketball player of a certain height is (e.g. 6'3).

Say I had a big enough sample size for this (obviously I don't, so I couldn't pinpoint it to an exact height, and might have to run a regression using height as a factor or something similar).

Could I simply look at lots of players of that height and assume all other factors (such as speed, arm length relative to body height, hand-eye coordination, ...) are normally distributed? Then, if all of those variables were statistically independent of weight, I could look only at players who are 6'3 and judge at what weight they play most efficiently (using something like a basketball efficiency score), and so determine what weight is best for a player of that height.

I know this wouldn't be possible in practice, since quickness, for example, isn't statistically independent of body weight. But technically, would it work that way if all the variables that have to be considered were statistically independent of weight and I had a sample size bigger than 30?

I realize my thread has digressed a bit from the original title, and I have 2 questions:

a) Would such an approach be possible (if all the assumptions I mentioned above were true)?

b) Would looking at 31 players really be enough? (I'm really wondering whether the assumption that a variable is normally distributed if there are more than 30 cases is a bit "optimistic" and thus a bad proxy.)

I guess there's a reason why 30 is said to be the critical value when it comes to assuming normal distribution. Could somebody please try to explain this to me? Thanks!
 

JohnM

TS Contributor
#2
"I've read and also been told on here that if the sample size is > 30 I can assume the variable is normally distributed. I'm wondering if 30 isn't a bit small for that, though, plus have another question."

Actually, this isn't true.

If the sample size is > 30, you can be reasonably sure that the means of samples of that size will be approximately normally distributed. That says nothing about the distribution of the individual data points.

What happens is this: if a population is normal, then the distribution of sample means will be normal as well. If a population is not normal, then the distribution of sample means will approach a normal distribution as the sample size increases. Once it hits around n = 30, it will usually be "close enough" to normal to begin assuming normality.
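A quick simulation can make this concrete. This is only a minimal sketch: the exponential population, the 20,000 repetitions, and the helper names `skewness` and `simulate_sample_means` are assumptions picked for illustration, not anything from the original question. It measures how lopsided the distribution of sample means is for a few sample sizes; a skewness near 0 means a symmetric, more normal-looking shape.

```python
import random
import statistics


def skewness(values):
    """Third standardized moment; close to 0 for a symmetric (e.g. normal) shape."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def simulate_sample_means(n, draws=20000, seed=1):
    """Draw `draws` samples of size n from an exponential population; return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(draws)
    ]


# Skewness of the distribution of sample means for increasing sample sizes.
skew_by_n = {n: skewness(simulate_sample_means(n)) for n in (1, 5, 30)}
for n, s in skew_by_n.items():
    print(f"n = {n:2d}: skewness of sample means = {s:.2f}")
```

For an exponential population the theoretical skewness of the sample mean is 2 / sqrt(n), so the printed values should shrink toward 0 as n grows. The individual data points stay just as skewed as before; only the means become more normal.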
 

Xenu

New Member
#3
"I guess there's a reason why 30 is said to be the critical value when it comes to assuming normal distribution. Could somebody please try to explain this to me? Thanks!"

Actually, there's really no valid reason to view 30 as some sort of magic number. How many observations you need for the sample mean to be approximately normally distributed depends on the data. That said, for most 'everyday' situations, 30 observations is a reasonable rule of thumb, and it will probably work well for estimating the mean height of basketball players, for example.
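To illustrate "depends on the data", here is a rough sketch that holds n at 30 and varies the population instead. The three populations (uniform, exponential, and a lognormal with sigma = 1.5), the repetition count, and the helper names are assumptions chosen for illustration only.

```python
import random
import statistics


def skewness(values):
    """Third standardized moment; close to 0 for a symmetric (e.g. normal) shape."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def skew_of_means(draw_one, n=30, draws=20000, seed=2):
    """Skewness of the means of `draws` samples of size n, each value drawn via draw_one(rng)."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([draw_one(rng) for _ in range(n)])
        for _ in range(draws)
    ]
    return skewness(means)


# Hypothetical populations, from symmetric to heavily right-skewed.
populations = {
    "uniform": lambda rng: rng.random(),
    "exponential": lambda rng: rng.expovariate(1.0),
    "lognormal (sigma=1.5)": lambda rng: rng.lognormvariate(0.0, 1.5),
}

skew_at_30 = {name: skew_of_means(draw) for name, draw in populations.items()}
for name, s in skew_at_30.items():
    print(f"{name:22s} skewness of means at n = 30: {s:.2f}")
```

With a symmetric population the means already look symmetric at n = 30, while for the heavily skewed lognormal population they should still be clearly lopsided, so 30 would not be enough there.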
 