# population normally distributed, sample not, parametric test allowed?

#### BBD

##### New Member

Hi guys,

Imagine I know that my variable of interest (say blood glucose) is normally distributed in the population. But I have a small sample and after testing it I find that in the sample blood glucose is not normally distributed. Am I still allowed to do a parametric test (which requires a normal distribution of the sample data)?

I found a couple of things:
"Whatever the population looks like --normal, skewed, bimodal, whatever--a sample of individuals will display the same characteristics." This means that I know that my sample is SUPPOSED to be normally distributed if the population is normally distributed, but testing for a normal distribution says otherwise.
http://www.jerrydallal.com/LHSP/meandist.htm

"If the population data are normally distributed, then we say that the
random variable Y_i follows a normal distribution. In this case the
estimator Ybar also follows a normal distribution."
-- Principles of Econometrics, Carter Hill et al., third edition, p. 507

Q1) Am I allowed to do a parametric test based on the fact that the variable is SUPPOSED to be normally distributed in my sample, or am I not allowed to do a parametric test based on the fact that testing points out that my sample is not normally distributed?

Q2) Now how do I know if the population is normally distributed? Imagine there is a large epidemiological study done with a large n and they found their sample is normally distributed. Can I conclude based on their findings that the population is likely normally distributed?

If you need me to elaborate on anything, please let me know. Thanks in advance.


#### krytellan

##### New Member
First, whether or not you can do a parametric test depends on which test you are wanting to do and what form of non-normality your data takes. Some tests handle some forms of non-normality better.

Secondly, I am not speaking from book knowledge here, just trying to apply logic. It would seem that if a population is known to be normal (which, FYI, hardly ever happens unless the variable is a standard score) and the sample you have is not normal, then you simply have a sample that differs from the population. I would think that is absolutely possible. Just because a coinflip is 50% doesn't mean that in 20 flips you will have 10 heads and 10 tails. In a million flips, though, the proportion of heads should be very close to 50%.
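To put rough numbers on the coin-flip point, here is a minimal Python sketch using only the standard library. The function names and the seed are made up for illustration. It computes the exact chance of a perfect 50/50 split and simulates how the share of heads behaves as the number of flips grows.

```python
import math
import random


def prob_exact_heads(n_flips, n_heads, p=0.5):
    """Exact binomial probability of seeing n_heads heads in n_flips flips."""
    return math.comb(n_flips, n_heads) * p**n_heads * (1 - p) ** (n_flips - n_heads)


def heads_fraction(n_flips, rng):
    """Simulate n_flips fair coin flips and return the observed share of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


if __name__ == "__main__":
    rng = random.Random(42)  # arbitrary fixed seed so the run is repeatable
    print("P(exactly 10 heads in 20 flips):", round(prob_exact_heads(20, 10), 4))
    print("P(exactly 500 heads in 1000 flips):", round(prob_exact_heads(1000, 500), 4))
    for n in (20, 1_000, 100_000):
        print(n, "flips -> share of heads:", heads_fraction(n, rng))
```

An exact 10/10 split in 20 flips has a probability of only about 0.18, and an exact split becomes less likely, not more, as the number of flips grows. What settles down with more flips is the proportion of heads, which is the sense in which a small sample can look quite different from the population it came from.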

#### BBD

##### New Member

> Just because a coinflip is 50% doesn't mean that in 20 flips you will have 10 heads and 10 tails. In a million you might have a 50/50 split though.

Yes, I understand that the data in the sample itself might not be normally distributed. However, the question basically is: when you know the population is normally distributed, does that matter for the assumption of a parametric test that the sample needs to be normally distributed?

I'll elaborate a little bit more to clear up what I exactly need to know:

Imagine I want to do a pilot study. Based on a low sample size, it is likely that a test for normal distribution finds that my sample is not normally distributed. Now imagine that I know that the population from which the sample is taken (say the elderly defined as 65+ years) is normally distributed.

"If the population data are normally distributed, then we say that the
random variable Y_i follows a normal distribution. In this case the
estimator Ybar also follows a normal distribution."
-- Principles of Econometrics, Carter Hill et al., third edition, p. 507

Given that, we know that blood glucose in the sample SHOULD follow a normal distribution. So even though the test calculates that the sample is unlikely to come from a normal distribution, we KNOW it comes from a normal distribution (right?).

Thus, based on the sample data alone (as calculated by a test for normality), it is unlikely that the sample comes from a normal distribution, but we already know it does, because it is drawn from a population that is normally distributed. Therefore what the test says doesn't really matter, because the test assesses how likely it is that the sample is normally distributed, and we already know that it is.

Does that make sense, or am I totally missing something?

Because I have a follow-up question. How do you know that the population is normally distributed? What if there is a study on the same population with a much larger sample size, and it found that the large sample was normally distributed? Based on that, can I conclude that it is likely that the population is normally distributed?

#### Mr5

##### New Member
I agree with the previous two comments. A number of parametric tests are robust enough to handle it. I think that if you're worried about it, you simply state the skewness (or whatever) in your reporting.

Disclaimer: I'm a novice.

#### Dason

What exactly are you trying to do that requires 'the sample to be normally distributed'? I can't think of anything off the top of my head that has that specific requirement.

#### CowboyBear

##### Super Moderator
Hmm. Isn't it the sampling distribution of the particular statistic we're interested in that really matters, rather than the distribution of sample data?

Presumably then the sampling distribution depends on the distribution of datapoints in the population rather than in the sample...

#### Dason

> Hmm. Isn't it the sampling distribution of the particular statistic we're interested in that really matters, rather than the distribution of sample data?
>
> Presumably then the sampling distribution depends on the distribution of datapoints in the population rather than in the sample...

That's exactly why I was interested in what they were doing, because I was fairly certain that it was the sampling distribution they really were interested in.
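The sampling-distribution point can be checked with a small simulation. This is only a sketch under stated assumptions: the population values (mean 5.5, standard deviation 0.8), the sample size of 10, and the function name are hypothetical choices for illustration, not figures from this thread. It draws many small samples from a normal population and looks at how the sample means behave.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, n_reps, rng):
    """Draw n_reps samples of size n from Normal(mu, sigma); return each sample's mean."""
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(n_reps)
    ]


if __name__ == "__main__":
    mu, sigma, n = 5.5, 0.8, 10  # hypothetical population values and pilot sample size
    rng = random.Random(0)       # arbitrary fixed seed so the run is repeatable
    means = simulate_sample_means(mu, sigma, n, n_reps=5000, rng=rng)
    print("mean of the sample means:", statistics.fmean(means))
    print("sd of the sample means:  ", statistics.stdev(means))
    print("theory, sigma / sqrt(n): ", sigma / n**0.5)
```

Any single sample of 10 may look lumpy or skewed, yet the means across repeated samples cluster around the population mean with a spread close to sigma / sqrt(n). That distribution of the statistic is what a test on the mean relies on.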

#### DancerTiffy

##### New Member
The way I see it, the Central Limit Theorem states that samples can be drawn from any population, normally distributed or not, and the means of those samples will be approximately normally distributed, provided the sample size is large enough.
As a rule of thumb, if sample sizes are >30 you can use normal distribution Z tables; otherwise use the t-distribution when the std. dev. is not known.
In short, the population does not have to be normally distributed in order to use sample statistics.
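A quick way to see the Central Limit Theorem at work is to sample from a clearly skewed population and watch the skewness of the sample means shrink as the sample size grows. The sketch below uses only the Python standard library. The exponential population, the sample sizes, and the function names are illustrative assumptions.

```python
import random
import statistics


def sample_skewness(values):
    """Moment-based skewness: about 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


def means_from_skewed_population(n, n_reps, rng):
    """Means of n_reps samples of size n from an exponential (right-skewed) population."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(n_reps)
    ]


if __name__ == "__main__":
    rng = random.Random(1)  # arbitrary fixed seed so the run is repeatable
    for n in (2, 10, 50):
        means = means_from_skewed_population(n, n_reps=4000, rng=rng)
        print("n =", n, "-> skewness of sample means:", round(sample_skewness(means), 2))
```

The skewness of the means drops toward 0 as n increases, but it is still noticeable for small n. That is why the theorem on its own does not rescue a small pilot sample from a strongly non-normal population.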