Thread: Distribution of a subset of the complete sample

  1. #1
    Dear all,

    My question can be easily misunderstood, so please allow me to briefly explain the experimental conditions.

    I have measured a variable in 120 individuals in total, 60 of one genotype and 60 of another. Within each genotype, the distribution of the complete sample is non-normal. So, to compare the genotype means, I have either used non-parametric tests or transformed the data to be approximately normal within each genotype and then used parametric tests. However, in 20 of the 120 individuals (10 of one genotype and 10 of the other), after measuring the initial response, I added a drug and measured again. To compare the means before and after adding the drug, I should use a test for related measurements, such as repeated-measures ANOVA, using only the values from those 20 individuals.

    My question is: should I check for normality again in those 20 values, or assume that they have a non-normal distribution like the complete sample?

    If these 20 values have a normal distribution, is it valid to use parametric tests only for these 20 individuals, despite the fact that the larger sample of 120 is clearly non-normal?

    Thank you very much in advance.

  2. #2
    terzi

    Hi ampws,

    If I understood correctly, you have a single variable measured in 120 individuals, split into two groups, and you tested whether the two groups differ. Now, for only 20 individuals, you added a drug, measured again, and want to test for a difference between the initial state and the "after" state. I wouldn't rely on parametric tests based on normality assumptions in this case, first because of the small sample size and also because of the non-normality found in the whole dataset. A safer approach is a non-parametric test for paired data, the counterpart of the paired t-test, such as the Wilcoxon signed-rank test. I'd recommend that over parametric tests, even if the subsample looks approximately normal.
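As a rough illustration of the paired test suggested above, here is a minimal Python sketch using `scipy.stats.wilcoxon`. The data are simulated, not the poster's: the log-normal shape, the size of the drug effect, and the helper name `paired_signed_rank` are all assumptions made for the example. Note that the test works on the within-individual differences (after minus before), so the pairing must be kept intact.

```python
import numpy as np
from scipy import stats


def paired_signed_rank(before, after):
    """Two-sided Wilcoxon signed-rank test on paired measurements."""
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    if before.shape != after.shape:
        raise ValueError("before and after must have one value per individual")
    # scipy ranks the paired differences (before - after) internally
    return stats.wilcoxon(before, after)


# Hypothetical data: 20 individuals with a skewed (log-normal) baseline
rng = np.random.default_rng(0)
before = rng.lognormal(mean=1.0, sigma=0.6, size=20)
# Hypothetical drug effect: a noisy multiplicative increase
after = before * rng.lognormal(mean=0.3, sigma=0.2, size=20)

result = paired_signed_rank(before, after)
print(f"W = {result.statistic:.1f}, p = {result.pvalue:.4g}")
```

With real data, `before` and `after` would be the two measurements for the same 20 individuals in the same order. Whether to run the test on all 20 together or separately per genotype depends on the question being asked.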

    Good luck

  3. The Following User Says Thank You to terzi For This Useful Post:

    ampws (05-23-2013)

  4. #3
    Thank you very much terzi for your answer. I am most grateful. One thing I still need to clarify though:

    Quote originally posted by terzi:

    I wouldn't rely on parametric tests that are based on normality assumptions for this case, first because of the sample size and also because of the non-normality found in the whole dataset.

    Even if the sample size were larger, would you still prefer non-parametric tests for the subsample because of the non-normality found in the whole dataset?

    Again, thank you very much!

  5. #4
    hlsmith

    I would say yes, that is what terzi was alluding to, since the normality of the full sample was questionable. Wilcoxon's signed-rank test should be a good fit for these data.

    It was not clear to me whether the individuals who received the drug were chosen at random. If not, how were they selected?

  6. The Following User Says Thank You to hlsmith For This Useful Post:

    ampws (05-27-2013)

  7. #5
    Thank you hlsmith for your reply.

    The individuals that were given the drug were randomly selected.

    Before posting this question I also thought that the larger sample is more trustworthy and determines what to expect from random sub-samples. To my understanding, that is the very essence of statistics: a random sample should have a distribution similar to that of the whole population. Otherwise, we would always need to measure every individual in the population. So if a random sample looks different from the population, this is either because the sample is too small, or because the population is not really one population but a mixture of at least two. In our experimental setup we have no reason to believe that there are two or more populations of individuals. It would of course be best to check this somehow, but that gets complicated, and a quick screening of all the values does not suggest a mixture.
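To make the small-sample point concrete, here is a rough simulation sketch in Python (NumPy and SciPy assumed). The log-normal population, its parameters, and the function name `rejection_rate` are illustrative assumptions, not the data from this thread. It repeatedly draws random samples of 10 and of 60 from the same skewed population and counts how often the Shapiro-Wilk test rejects normality.

```python
import numpy as np
from scipy import stats


def rejection_rate(n, n_sims=2000, alpha=0.05, seed=0):
    """Fraction of simulated samples of size n for which Shapiro-Wilk
    rejects normality, when the population is in fact log-normal."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        # Hypothetical skewed population (assumption for illustration)
        sample = rng.lognormal(mean=0.0, sigma=0.5, size=n)
        _, p_value = stats.shapiro(sample)
        rejections += int(p_value < alpha)
    return rejections / n_sims


small = rejection_rate(10)   # subsample size per genotype
large = rejection_rate(60)   # full sample size per genotype
print(f"n=10: normality rejected in {small:.0%} of samples")
print(f"n=60: normality rejected in {large:.0%} of samples")
```

If the samples of 10 are rejected much less often than the samples of 60, then a subsample that "looks normal" says little about the population; the test may simply lack the power to detect the skew at that size.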

    In any case, I think now I am convinced about what I should do. Thank you both very much for your replies.
    Last edited by ampws; 05-27-2013 at 02:31 AM.
