Re: Multiple samples of different size and not normally distributed: how to compare/s
Okay, now we have established that you have a number of independent variables, i.e. experimental factors A, B, and C (and perhaps more).
You also have a response variable D, which measures “the quality or other characteristics”. Let us call it Y1 instead, and suppose it is “speed of program execution in seconds”. You might also have other attributes, such as Y2: “how easy the user interface is for the consumer to understand”, Y3: “how easy it is to enter data into the software”, and further dependent variables.
Then you have several dependent variables Y1, Y2, Y3 and three experimental factors A, B, and C.
However, you have run many experiments, and not all of them should be evaluated together.
Suppose you had run a bacterial growth experiment (Y1) on sausage with experimental factors A (temperature), B (saltiness), and C (pH). You could simply analyze that and evaluate the result. But suppose you had also run experiments on bacterial growth (Y1) in apple juice with the same factors A, B, and C.
Then you cannot pool these two experiments, because they are two different biological systems, even though they share the same Y1 and A, B, C. They have to be evaluated separately.
In your case, you need to split your experiments into separate groups, like the “sausage” and “apple” groups.
Then run an analysis of variance (ANOVA) on each group, with A, B, C as the factors and one dependent variable at a time: first Y1, then another ANOVA with Y2, and so on.
When you are done with the “sausage” group, start on the “apples”.
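For readers who work in Python, here is a minimal sketch of this split-then-analyze procedure, assuming NumPy and SciPy are available. The data are simulated, and the column names (`system`, `A`, `B`, `C`, `Y1`, `Y2`) and the helpers `design_matrix` and `main_effects_anova` are hypothetical. It fits a main-effects-only model (no interactions) and tests each factor by dropping it and comparing residual sums of squares, which also works when the groups have different sizes.

```python
import numpy as np
from scipy import stats


def design_matrix(factors, exclude=None):
    """Intercept plus dummy (treatment) coding for each categorical factor.

    factors: dict mapping factor name -> 1-D array of levels.
    exclude: optional factor name to leave out (for the reduced model).
    """
    n = len(next(iter(factors.values())))
    cols = [np.ones(n)]
    for name, levels in factors.items():
        if name == exclude:
            continue
        levels = np.asarray(levels)
        for lev in np.unique(levels)[1:]:  # first level is the reference
            cols.append((levels == lev).astype(float))
    return np.column_stack(cols)


def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def main_effects_anova(y, factors):
    """Drop-one F test for each factor in a main-effects-only model."""
    y = np.asarray(y, dtype=float)
    X_full = design_matrix(factors)
    rank_full = np.linalg.matrix_rank(X_full)
    rss_full = rss(X_full, y)
    df_resid = len(y) - rank_full
    table = {}
    for name in factors:
        X_red = design_matrix(factors, exclude=name)
        df_factor = rank_full - np.linalg.matrix_rank(X_red)
        ss = rss(X_red, y) - rss_full  # extra SS explained by this factor
        F = (ss / df_factor) / (rss_full / df_resid)
        table[name] = {"df": df_factor, "F": F, "p": stats.f.sf(F, df_factor, df_resid)}
    return table


# Hypothetical data: 24 sausage runs and 24 apple-juice runs,
# a 3 x 2 x 2 factorial replicated twice within each system.
rng = np.random.default_rng(1)
i = np.arange(48)
data = {
    "system": np.where(i < 24, "sausage", "apple"),
    "A": np.array([5, 20, 35])[i % 3],          # temperature levels (hypothetical)
    "B": np.array(["low", "high"])[i % 2],      # saltiness levels (hypothetical)
    "C": np.array([4.5, 6.5])[(i // 2) % 2],    # pH levels (hypothetical)
}
data["Y1"] = 0.1 * data["A"] + (data["B"] == "high") * 1.0 + rng.normal(0, 1, 48)
data["Y2"] = rng.normal(5, 1, 48)

# One ANOVA per system and per response variable, never pooled.
results = {}
for system in ("sausage", "apple"):
    mask = data["system"] == system
    factors = {k: data[k][mask] for k in ("A", "B", "C")}
    for response in ("Y1", "Y2"):
        results[(system, response)] = main_effects_anova(data[response][mask], factors)

for (system, response), table in results.items():
    for factor, row in table.items():
        print(f"{system:8s} {response} {factor}: df={row['df']}, F={row['F']:.2f}, p={row['p']:.4f}")
```

With real measurements you would load your own table instead of simulating one. The statsmodels `ols` formula interface together with `anova_lm(..., typ=2)` should give the same F tests for a main-effects model and can also include interaction terms.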
A, B, C don’t need to be normally distributed, as has been said on this site a million times (search for “residuals normal”).
It is the dependent variable Y1 that, conditionally on A, B, C, needs to be normally distributed or with some other known distribution. (Another way to say this is that the residuals should be normal.)
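A small sketch of what “the residuals should be normal” means in practice, assuming NumPy and SciPy and simulated data. Factor A is deliberately lopsided (most runs at the lowest level), which does not matter; the Shapiro-Wilk test is applied to the residuals of the fitted model, not to A and not to the raw Y1.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 60
# Hypothetical factor A: skewed design, most runs at the low temperature.
A = rng.choice([5, 20, 35], size=n, p=[0.7, 0.2, 0.1])
# Y1 depends on A and has normal errors by construction.
y = 2.0 + 0.1 * A + rng.normal(0, 1, n)

# Dummy-code A (first level as reference) and fit by least squares.
levels = np.unique(A)
X = np.column_stack([np.ones(n)] + [(A == lev).astype(float) for lev in levels[1:]])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# Check normality of the residuals, i.e. of Y1 conditional on A.
w_stat, p_value = stats.shapiro(residuals)
print(f"Shapiro-Wilk on residuals: W={w_stat:.3f}, p={p_value:.3f}")
```

Testing the raw Y1 instead would mix the effect of A into the distribution, so a perfectly good model could look non-normal.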
Ignore Duncan’s test! It is invalid anyway. It dates from the 1950s, when there was no clear idea of what multiple inference meant. Tukey’s HSD is good, but I think you should ignore multiple inference altogether; it will just confuse you. Use the standard significance tests from the ANOVA: if a p-value is less than 0.05, that effect is statistically significant at the 5% level.
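The standard ANOVA significance test for a single factor can be reproduced as follows, assuming SciPy; the three groups of Y1 values are simulated. `scipy.stats.f_oneway` returns the F statistic and its p-value, and that p-value is the upper-tail probability of the F distribution with k - 1 and N - k degrees of freedom (k groups, N observations), here 2 and 21.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Hypothetical Y1 measurements at the three levels of factor A (8 runs each).
low = rng.normal(10.0, 1.0, 8)
mid = rng.normal(10.5, 1.0, 8)
high = rng.normal(12.0, 1.0, 8)

result = stats.f_oneway(low, mid, high)

# Same p-value from the F distribution: df_between = 3 - 1, df_within = 24 - 3.
p_manual = stats.f.sf(result.statistic, 2, 21)

alpha = 0.05
significant = result.pvalue < alpha
print(f"F={result.statistic:.2f}, p={result.pvalue:.4f}, significant at 5%: {significant}")
```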
[Multiple-inference corrections are used to different degrees in different fields. In advanced epidemiology they are not used at all, so by ignoring them you are in good company.]
You should check whether Y1 and Y2 are approximately normally distributed (strictly, their residuals, as above). If not, try a transformation, such as log(Y1) or sqrt(Y2) (square root).
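A sketch of the transform-and-recheck loop, assuming NumPy and SciPy and simulated data: Y1 gets multiplicative (log-normal) errors, so its residuals are right-skewed on the raw scale and normal after taking logs, while Y2 is simulated as counts, where a square root is the usual choice. The helper `residuals_one_factor` is hypothetical and uses a single factor to keep the example short.

```python
import numpy as np
from scipy import stats


def residuals_one_factor(y, groups):
    """Residuals from a one-factor model: each value minus its group mean."""
    y = np.asarray(y, dtype=float)
    fitted = np.empty_like(y)
    for g in np.unique(groups):
        fitted[groups == g] = y[groups == g].mean()
    return y - fitted


rng = np.random.default_rng(4)
groups = np.repeat(["5C", "20C", "35C"], 20)       # hypothetical temperature levels
log_means = np.repeat([1.0, 2.0, 3.0], 20)

# Y1: multiplicative errors, skewed on the raw scale.
y1 = np.exp(log_means + rng.normal(0, 0.8, 60))
p_raw = stats.shapiro(residuals_one_factor(y1, groups)).pvalue
p_log = stats.shapiro(residuals_one_factor(np.log(y1), groups)).pvalue

# Y2: hypothetical count data, square-root transformed.
y2 = rng.poisson(np.repeat([3.0, 6.0, 9.0], 20))
p_sqrt = stats.shapiro(residuals_one_factor(np.sqrt(y2), groups)).pvalue

print(f"Y1 raw p={p_raw:.4f}, log(Y1) p={p_log:.4f}, sqrt(Y2) p={p_sqrt:.4f}")
```

Note that log(Y) requires strictly positive values, whereas sqrt also accepts zeros, which is one reason it is preferred for count data.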
Sorry for writing so long.