# Thread: Comparing two groups - best test to use?

1. ## Re: Comparing two groups - best test to use?

P.S.
Originally Posted by gianmarco
As for the median test:
"(...) We suggest that the median test be "retired" from routine use and recommend alternative rank tests that have superior power over a relatively large family of symmetric distributions."
But aren't the distributions in the present study skewed rather than symmetric?

3. ## Re: Comparing two groups - best test to use?

Originally Posted by AbnormallyDistributed
What are the possible explanations for the differences in results?
'Notches can be added to the boxes. These are defined as +/-1.58*IQR/sqrt(n) which gives roughly 95% confidence that two medians are different.' (pasted from http://boxplot.tyerslab.com)

I think the answer lies in that 'roughly'.
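To make the quoted definition concrete, here is a minimal sketch of how such notches could be computed and compared. The function names are my own, and the quartile rule (Python's "inclusive" method) is an assumption; the boxplot tool may use a slightly different quartile definition, which is one more reason the result is only approximate.

```python
import math
import statistics


def notch_interval(values):
    """Return (lower, upper) notch limits: median +/- 1.58 * IQR / sqrt(n)."""
    # Quartile definition is an assumption; plotting tools may use hinges instead.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    median = statistics.median(values)
    half_width = 1.58 * (q3 - q1) / math.sqrt(len(values))
    return median - half_width, median + half_width


def notches_overlap(group_a, group_b):
    """True if the two notch intervals overlap (no rough evidence of a difference)."""
    low_a, high_a = notch_interval(group_a)
    low_b, high_b = notch_interval(group_b)
    return low_a <= high_b and low_b <= high_a
```

Non-overlapping notches are only a rough visual check, so they need not agree with a formal test on the same data.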

5. ## Re: Comparing two groups - best test to use?

If the hospital admission length increases gradually with age, then I think that it would be more powerful to use a linear regression type of model.
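As a rough illustration of that idea, a simple least-squares line of admission length on age could be fitted as below. This is only a sketch: the function name and the variable names are hypothetical, and a real analysis of skewed duration data would probably model the log of the length or use a more suitable error distribution.

```python
import statistics


def fit_line(ages, lengths):
    """Ordinary least squares fit of lengths on ages; returns (slope, intercept)."""
    mean_age = statistics.fmean(ages)
    mean_length = statistics.fmean(lengths)
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (y - mean_length) for a, y in zip(ages, lengths))
    slope = sxy / sxx  # change in admission length per year of age
    intercept = mean_length - slope * mean_age
    return slope, intercept
```

Using age as a continuous predictor keeps the information that is thrown away when patients are split into two age groups.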

7. ## Re: Comparing two groups - best test to use?

It seems like we have lost the original poster. I had some comments about the dependent variable, but anyway...

9. ## Re: Comparing two groups - best test to use?

Hi,

Thanks everyone. Apologies, I was unwell.

10. ## Re: Comparing two groups - best test to use?

With duration data like "hospital admission lengths" it is common to use the exponential distribution, the gamma distribution, the Weibull distribution or the lognormal distribution.

For duration data one often uses survival analysis.

If you really want the median you can use the lognormal distribution. If the durations are lognormal, their logarithms are normally distributed, and in the normal distribution the (population) mean and the (population) median are the same. So the mean of the logged data is also an estimate of the median on the log scale. If x_50 is that estimate, then exp(x_50) will be an estimate of the median in the original scale.
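A minimal sketch of that back-transformation, assuming the durations are strictly positive and roughly lognormal (the function name is made up for illustration):

```python
import math
import statistics


def lognormal_median_estimate(durations):
    """Estimate the median as exp(mean(log(x))), assuming lognormal data."""
    logs = [math.log(x) for x in durations]  # requires every x > 0
    x_50 = statistics.fmean(logs)  # mean = median on the log scale
    return math.exp(x_50)  # back-transform to the original scale
```

This is the geometric mean of the data, so it estimates the median only to the extent that the lognormal assumption holds.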

11. ## Re: Comparing two groups - best test to use?

Yet another view about the issue under discussion.
One of the main problems stressed in the very first post was that the 2 distributions are not normally distributed; therefore, the OP was considering using the medians as measures of central tendency, and comparing them.

I reread 2 publications in which the permutation t-test is actually used in a context which seems very similar to the one described by the OP. In summary, while the t-test should be avoided when distributions are strongly skewed and when the sample sizes of the two groups are strongly unbalanced, the permutation t-test may be put to work instead.

The two publications (cited below; only first author is cited) use the same dataset to make that point:
-Moore, Introduction to the Practice of Statistics (LINK)
-Chihara, Mathematical Statistics with Resampling and R (LINK)

"The permutation test is useful even if we plan to use the two-sample t test. Rather than relying on Normal quantile plots of the two samples and the central limit theorem, we can directly check the Normality of the sampling distribution by looking at the permutation distribution. Permutation tests provide a “gold standard” for assessing two-sample t tests. If the two P-values differ considerably, it usually indicates that the conditions for the two-sample t don’t hold for these data. Because permutation tests give accurate P-values even when the sampling distribution is skewed, they are often used when accuracy is very important."
from Moore, Introduction to the Practice of Statistics
I happened to find in R the dataset they both use to illustrate the advantage of the permutation t-test. It is contained in the package "resample"; the dataset name is "Verizon".
In the dataset, the sample sizes of the two groups are strongly unbalanced, and the observations in the two groups are far from normal.

The t-test performed in R indicates that there is no significant difference at the 0.05 level:
Code:
```
t = 1.9834, df = 22.346, p-value = 0.05975
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.3618588 16.5568985
sample estimates:
mean of x mean of y
16.509130  8.411611
```
In contrast to the regular t-test, the p-value obtained by the permutation t-test (using 5000 permutations) is 0.016.
As pointed out by Moore, the strong skewness of the permuted distribution of mean differences implies that t tests will be inaccurate (the wording is Moore's).
The density plot of the permuted distribution of mean differences is attached (it is the output of a function I am currently working on).

In summary:
if the non-normality and skewness of the data were the prime reason why the OP decided to use the median, then on the basis of the above literature and example (s)he may want to stay with the t-test. But (s)he may want to use the permutation t-test, since this seems to outperform the "regular" t-test when its assumption(s) are not met. (S)he could also compare the two tests, examine the permuted distribution of the mean differences, and opt for the permuted version if the permuted distribution (being skewed) points to the regular t-test being inaccurate (sensu Moore above).
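For anyone who wants to see the mechanics, the permutation procedure can be sketched roughly as follows. This is not the function I am working on nor the code from the "resample" package; the function name, the seed, and the two-sided rule based on the absolute difference are my own assumptions (with a skewed permutation distribution, some authors double the smaller one-sided p-value instead).

```python
import random
import statistics


def permutation_test_mean_diff(x, y, n_perm=5000, seed=0):
    """Two-sample permutation test on the difference in means.

    Returns (observed_difference, two_sided_p_value).
    """
    rng = random.Random(seed)  # fixed seed only for reproducibility
    observed = statistics.fmean(x) - statistics.fmean(y)
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_perm):
        # Randomly reassign the pooled observations to the two groups.
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n_x]) - statistics.fmean(pooled[n_x:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add 1 to numerator and denominator so the p-value is never exactly 0.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value
```

Collecting the `diff` values in a list instead of only counting them would give the permuted distribution of mean differences, whose skewness can then be inspected as described above.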

Hope this helps.
Gm