AbnormallyDistributed (09-04-2016), GretaGarbo (08-30-2016)
'Notches can be added to the boxes. These are defined as +/-1.58*IQR/sqrt(n) which gives roughly 95% confidence that two medians are different.' (pasted from http://boxplot.tyerslab.com)
I think the answer lies in that 'roughly'.
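To make the quoted formula concrete, here is a minimal sketch in plain Python that computes the notch limits as median +/- 1.58*IQR/sqrt(n) and checks whether two notches overlap. The function names are my own, and the quartile convention (the "inclusive" method) is an assumption; plotting packages may compute the quartiles or hinges slightly differently, so the numbers need not match a given boxplot exactly.

```python
import math
import statistics


def notch_interval(values):
    """Return (lower, median, upper) with limits median +/- 1.58 * IQR / sqrt(n)."""
    n = len(values)
    # Quartiles under the "inclusive" convention (an assumption, see above).
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = 1.58 * (q3 - q1) / math.sqrt(n)
    return med - half_width, med, med + half_width


def notches_overlap(a, b):
    """True if the notch intervals of the two samples overlap."""
    lo_a, _, hi_a = notch_interval(a)
    lo_b, _, hi_b = notch_interval(b)
    return lo_a <= hi_b and lo_b <= hi_a
```

Non-overlapping notches are only a rough visual guide that the medians differ, which is the "roughly" in the quote.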
http://cainarchaeology.weebly.com/
AbnormallyDistributed (09-04-2016)
If the hospital admission length increases gradually with age, then I think that it would be more powerful to use a linear regression type of model.
AbnormallyDistributed (09-04-2016)
It seems like we have lost the original poster. I had some comments about the dependent variable, but ....anyway....
AbnormallyDistributed (09-04-2016)
Hi,
Thanks everyone. Apologies, I was unwell.
AD.
With duration data like "hospital admission lengths" it is common to use the exponential distribution, the gamma distribution, the Weibull distribution or the lognormal distribution.
For duration data one often uses survival analysis.
If you really want the median you can use the lognormal distribution: if the durations are lognormal, their logarithms are normally distributed. In the normal distribution the (population) mean and the (population) median are the same, so the mean is also an estimate of the median. If x_50 is an estimate of the median of the log-transformed data, then exp(x_50) will be an estimate of the median in the original scale.
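As a small illustration of the lognormal idea, the sketch below takes logs, averages them, and back-transforms with exp. It assumes the durations are strictly positive and roughly lognormal; the function name is hypothetical and not from any package.

```python
import math
import statistics


def lognormal_median_estimate(durations):
    """Estimate the median on the original scale, assuming lognormal data.

    The mean of the logs estimates the median on the log scale,
    and exp() carries that estimate back to the original scale.
    """
    logs = [math.log(x) for x in durations]  # requires all x > 0
    return math.exp(statistics.fmean(logs))
```

This is the geometric mean of the data, so it will be pulled much less by a few very long admissions than the ordinary mean is.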
Yet another view about the issue under discussion.
One of the main problems stressed in the very first post was that the 2 distributions are not normal; therefore, the OP was considering using the medians as measures of central tendency, and comparing them.
I re-read 2 publications in which the permutation t-test is used in a context that seems very similar to the one described by the OP. In summary, while the t-test should be avoided when distributions are strongly skewed and when there is a strong imbalance in sample size between the two groups, the permutation t-test may be put to work instead.
The two publications (cited below; only first author is cited) use the same dataset to make that point:
-Moore, Introduction to the Practice of Statistics (LINK)
-Chihara, Mathematical Statistics with Resampling and R (LINK)
I happened to find in R the dataset they both use to illustrate the advantage of permuted t-test. It is contained in the package "resample"; the dataset name is "Verizon".
"The permutation test is useful even if we plan to use the two-sample t test. Rather than relying on Normal quantile plots of the two samples and the central limit theorem, we can directly check the Normality of the sampling distribution by looking at the permutation distribution. Permutation tests provide a “gold standard” for assessing two-sample t tests. If the two P-values differ considerably, it usually indicates that the conditions for the two-sample t don’t hold for these data. Because permutation tests give accurate P-values even when the sampling distribution is skewed, they are often used when accuracy is very important."
from Moore, Introduction to the Practice of Statistics
In the dataset, there is a strong imbalance in sample size, and the observations in the two groups are far from normal.
The t-test performed in R indicates that there is no significant difference at the 0.05 level:
Code:
t = 1.9834, df = 22.346, p-value = 0.05975
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.3618588 16.5568985
sample estimates:
mean of x mean of y
16.509130  8.411611
Unlike the regular t-test, the permutation t-test (using 5000 permutations) gives a p-value of 0.016.
As pointed out by Moore, the strong skewness of the permuted distribution of mean differences implies that t tests will be inaccurate.
The density plot of the permuted distribution of mean differences is attached (it is the output of a function I am currently working on).
In summary:
If the non-normality and skewness of the data were the prime reason why the OP decided to use the median, then on the basis of the above literature and example (s)he may want to stay with the t-test. However, (s)he may want to use the permutation t-test, since this seems to outperform the "regular" t-test when its assumptions are not met. (S)he could also compare the two tests, examine the permuted distribution of the mean differences, and opt for the permuted version if that distribution (being skewed) points to the regular t-test being inaccurate (sensu Moore above).
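For anyone who wants to try the idea outside R, here is a minimal sketch of a two-sample permutation test in plain Python. It is not the function mentioned above and not the "resample" package: the name, the test statistic (difference in means rather than the t statistic), the two-sided definition of "extreme" and the +1 correction are all my own assumptions, so it should not be expected to reproduce the p-value reported above.

```python
import random
import statistics


def permutation_test_mean_diff(x, y, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed difference, p-value, list of permuted differences).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = statistics.fmean(x) - statistics.fmean(y)
    pooled = list(x) + list(y)
    n_x = len(x)
    diffs = []
    for _ in range(n_perm):
        # Reassign group labels at random, keeping the group sizes fixed.
        rng.shuffle(pooled)
        diffs.append(statistics.fmean(pooled[:n_x]) - statistics.fmean(pooled[n_x:]))
    # Count permuted differences at least as large in absolute value.
    extreme = sum(abs(d) >= abs(observed) for d in diffs)
    # The +1 counts the observed split itself, so p is never exactly 0.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value, diffs
```

The returned list of permuted differences can be passed to any histogram or density plot to check for the kind of skewness discussed above.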
Hope this helps.
Gm
http://cainarchaeology.weebly.com/