Distribution of a subset of the complete sample

ampws

New Member
Dear all,

My question can be easily misunderstood, so please allow me to briefly explain the experimental conditions.

I have measured a variable in 120 individuals in total, 60 of one genotype and 60 of another. Within each genotype, the distribution of the complete sample is non-normal. Thus, to compare the two genotypes I have either used non-parametric tests or transformed the data to be approximately normal within each genotype and then used parametric tests. However, in 20 of the 120 individuals (10 of each genotype), after measuring the initial response, I added a drug and measured again. To compare the means before and after the addition of the drug I should use a test for related measurements, such as repeated-measures ANOVA, using only the values from those 20 individuals.

My question is: should I check for normality again in those 20 values, or assume that they have a non-normal distribution like the complete sample?

If these 20 values have a normal distribution, is it valid to use parametric tests only for these 20 individuals, despite the fact that the larger sample of 120 is clearly non-normal?

Thank you very much in advance.

terzi

TS Contributor
Hi ampws,

If I understood correctly, you have a single variable measured in 120 individuals, split into two groups. You then tested whether there was a difference between those two groups. Now, for only 20 individuals, you added a drug, measured again, and want to test for differences between the initial state and the "after" state. I wouldn't rely on parametric tests that are based on normality assumptions in this case, first because of the small sample size and also because of the non-normality found in the whole dataset. A safer approach would be a non-parametric test for paired data, the counterpart of a paired t-test, such as Wilcoxon's signed-rank test. I'd recommend that over parametric tests, even if the subsample is approximately normally distributed.
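For anyone who wants to try this, here is a minimal sketch in Python using SciPy. The data are simulated and purely hypothetical (the names `before`, `after` and `diff` are made up for illustration), and the sketch ignores genotype. It runs a Shapiro-Wilk check on the paired differences, since in a paired comparison it is the differences rather than the raw values that a paired t-test assumes to be normal, and then runs the Wilcoxon signed-rank test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical data: 20 individuals measured before and after the drug.
# Skewed (log-normal) values stand in for the non-normal variable in the thread.
before = rng.lognormal(mean=1.0, sigma=0.5, size=20)
after = before * rng.lognormal(mean=0.3, sigma=0.2, size=20)

# In a paired design the quantity of interest is the per-individual difference.
diff = after - before

# Shapiro-Wilk on the differences. With n = 20 the test has little power,
# so a non-significant result is only weak evidence of normality.
shapiro_stat, shapiro_p = stats.shapiro(diff)

# Wilcoxon signed-rank test on the paired measurements (two-sided by default).
wilcoxon_res = stats.wilcoxon(after, before)

print(f"Shapiro-Wilk on differences: W={shapiro_stat:.3f}, p={shapiro_p:.3f}")
print(f"Wilcoxon signed-rank: statistic={wilcoxon_res.statistic:.1f}, "
      f"p={wilcoxon_res.pvalue:.4f}")
```

With real data you would replace the two simulated arrays with your own measurements, keeping the same individual at the same position in both arrays.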

Good luck

ampws

New Member
Thank you very much terzi for your answer. I am most grateful. One thing I still need to clarify though:

"I wouldn't rely on parametric tests that are based on normality assumptions for this case, first because of the sample size and also because of the non-normality found in the whole dataset."
Even if the sample size were larger, would you still prefer to use non-parametric tests on the subsample because of the non-normality found in the whole dataset?

Again, thank you very much!

hlsmith

Less is more. Stay pure. Stay poor.
I would say 'yes', that is what they were alluding to, since the source data had questionable normality. Wilcoxon's signed-rank test should be a good fit for these data.

It was not clear to me whether you gave the drug to randomly chosen individuals. If not, how were they selected?