coming20 (10-14-2011)
Hello, all. I'm far from a statistics expert, so please excuse any inaccuracies.
I'm trying to compare two sample distributions in terms of their means. I know they do not follow the Normal distribution. (How do I know? I ran several normality tests, in fact I've lost count, such as Kolmogorov-Smirnov/Lilliefors, Shapiro-Wilk and Anderson-Darling (this one said that only my first one was normal), using MATLAB files I found on the web.) To the best of my understanding the two populations are not paired. I don't just want to test an equality hypothesis against a plain "not equal" alternative (as, to my understanding, the Mann-Whitney U test does). I want to reject the null hypothesis in favor of a directional alternative (e.g. μ1 < μ2), because I really want to show that the 1st distribution has a lower mean (or even median) than the 2nd one.
In MATLAB I've seen that the two-sample t-test does this, using a tail option that defines the alternative hypothesis. However, both of its forms assume that the two distributions are Normal (one also assumes equal standard deviations).
So my question is: is there such a test? Since I don't know what distributions my samples follow, I think I must use a non-parametric test. However, there doesn't seem to be one out there. I've been advised to look at permutation tests, which would check how many times a sample value falls below a mean or something like that, but as I said, this sounds too difficult to me unless I'm explicitly guided.
To elaborate further, my random variable is the distance D of some specific nodes of a graph to their nearest border (in edge hops). [D takes values from 0 to 1, since it is normalized by the radius.] I have split the graphs into two sets, say 'positive' and 'negative', according to the result of an algorithm run over the set of graphs. I expected that the positive set would have most of its nodes near the graph border, i.e. that more of its D values would be small compared to the 'negative' set. A histogram visually confirms this. I didn't know of any test to compare such a characteristic (although I think I've read something similar somewhere). Then I also took the average of these distances D for each graph and made two new positive and negative sets of the average D distances.
I noticed from their histograms that these two distributions looked Normal. Later I found out that, based on the central limit theorem (or something like that), the averages of sample observations from any distribution follow a Normal distribution for a large sample size. (Nevertheless, my average D distributions failed the normality tests.) I also noticed that these bell-like shapes were centered at different values, so I wanted to establish this statistically and, ultimately, to verify statistically that the positive distribution of average D has a lower mean than the 'negative' one.
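A minimal sketch of the central limit theorem effect described above, on purely synthetic data (the exponential distribution, the sample sizes and the helper names are assumptions for illustration, not the D data from this post). It compares the skewness of raw skewed values with the skewness of averages of 50 such values; the averages come out much closer to symmetric, though not exactly Normal. With samples in the thousands, normality tests can reject even for small leftover departures like this.

```python
import random
import statistics


def sample_means(draw, sample_size, n_samples, rng):
    """Return n_samples averages, each taken over sample_size draws."""
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


def skewness(values):
    """Third standardized moment; close to 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)


def draw(rng):
    """Hypothetical stand-in distribution: exponential, strongly right-skewed."""
    return rng.expovariate(1.0)


rng = random.Random(0)
raw = [draw(rng) for _ in range(5000)]
means = sample_means(draw, sample_size=50, n_samples=5000, rng=rng)

print("skewness of raw values:", round(skewness(raw), 2))
print("skewness of averages:  ", round(skewness(means), 2))
```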
And to sum up, somewhere on Wikipedia there's a list of tests where it says:
Name: Two-sample unpooled t-test
Assumptions: (Normal populations or n1 + n2 > 40) and independent observations and σ1 ≠ σ2 and (σ1 and σ2 unknown)
Now, if it's true that the normality assumption can be relaxed for a large sample size, I think I'm done. I have already run my t-tests (two-sided and one-tailed), and they confirmed my speculation: the null hypothesis μ1 = μ2 was rejected against the alternatives μ1 < μ2 and μ1 ≠ μ2, but could NOT be rejected against the alternative μ1 > μ2, for both the D and the average D distributions! I've also come across a comment somewhere that the normality assumption is stated in books because of the very small sample examples they always use (and those are the sizes the t-test is supposed to apply to, I think).
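For readers who want to see what the unpooled (Welch) test with a one-sided alternative computes, here is a minimal sketch using only the Python standard library. It assumes samples large enough that the test statistic can be compared against the standard Normal distribution instead of Student's t, which is the large-sample relaxation discussed above. The function name `welch_one_sided` and the Beta-distributed stand-in data are made up for illustration.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def welch_one_sided(x, y):
    """Test H0: mean(x) = mean(y) against H1: mean(x) < mean(y).

    Returns (statistic, p_value). The p-value uses a Normal approximation,
    which is only reasonable for large samples.
    """
    # Unpooled standard error: each sample keeps its own variance.
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    stat = (fmean(x) - fmean(y)) / se
    # Lower-tail probability: a small p supports mean(x) < mean(y).
    return stat, NormalDist().cdf(stat)


# Synthetic stand-ins on [0, 1]; NOT the data from this thread.
rng = random.Random(1)
pos = [rng.betavariate(2, 5) for _ in range(2000)]  # true mean 2/7
neg = [rng.betavariate(2, 3) for _ in range(500)]   # true mean 2/5

stat, p_less = welch_one_sided(pos, neg)
_, p_reversed = welch_one_sided(neg, pos)
print("statistic:", round(stat, 2))
print("p for H1 pos < neg:", p_less)
print("p for H1 neg < pos:", p_reversed)
```

The reversed call shows the pattern reported above: when one direction is strongly supported, the test in the opposite direction gives a p-value near 1 and cannot reject.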
So:
1) Any ideas about my original distributions, and about an implemented statistic showing that pos has more low values than neg, and vice versa?
2) Is there any non-parametric test comparing the means of samples from unknown distributions (therefore unknown variance etc.), as the tailed t-test does?
3) Is the normality assumption indeed relaxed for a large sample? Does this weaken the test's accuracy? (Note that my D distributions for positive and negative graphs have sizes 209826 and 11588 respectively, and my average D distributions have sizes 14958 and 1070 respectively.)
Oh! And by the way, I know there are transformations that might make my data follow the Normal distribution. (I guess splitting a stairs-like distribution, such as my 'positive' one, into two opposite stairs might do the job, but I am completely inexperienced, and there doesn't seem to be such an easy answer for the 'negative' D distribution. I also really don't have time to experiment right now.)
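On question 1, one descriptive quantity that captures "pos tends to have lower values than neg" is the estimated probability that a random value from one sample falls below a random value from the other, counting ties as half. This is the Mann-Whitney U statistic divided by n1·n2. The sketch below is an illustration under assumptions (the function name `prob_x_below_y` is made up); it is not a test by itself and gives no p-value. For an actual one-sided rank test, some implementations accept a direction, for example `scipy.stats.mannwhitneyu(x, y, alternative='less')`.

```python
from bisect import bisect_left, bisect_right


def prob_x_below_y(x, y):
    """Estimate P(X < Y) + 0.5 * P(X = Y) over all (x, y) pairs.

    A value of 0.5 means neither sample tends to be lower; values above 0.5
    mean x tends to be lower than y.
    """
    xs = sorted(x)
    total = 0.0
    for v in y:
        below = bisect_left(xs, v)          # x values strictly below v
        ties = bisect_right(xs, v) - below  # x values equal to v
        total += below + 0.5 * ties
    return total / (len(x) * len(y))


# Tiny made-up example: 7 of the 9 pairs (ties counted as half) favor x.
print(prob_x_below_y([1, 2, 3], [2, 3, 4]))
```

Sorting once and using binary search keeps this at roughly O((n1 + n2) log n1), which matters for samples of the sizes mentioned in question 3.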
Thank you all in advance!
P.S. Here are the two D distributions (for 'pos' and 'neg' graphs) as well as the corresponding average D distributions
D of pos
D of neg
average D of pos
average D of neg
Last edited by moudatsos; 09-29-2008 at 01:01 PM.
coming20 (10-14-2011)
I have the same problem and desperately need a solution.
Experts, please help~
Not clear what you mean by "same problem". Perhaps you could describe your problem/study (e.g. topic, objective, study design, sample size, measurements taken).
Regards
K.
I think permutation tests are actually quite a bit easier to understand conceptually than our typical parametric framework. I like the approach. It comes with its own set of problems, but one of the nice things about a permutation test is that it is (almost) always a valid test.
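To make the idea concrete, here is a minimal sketch of a one-sided permutation test on the difference in means, using only the Python standard library. The function name, the number of permutations and the example values are assumptions for illustration. The logic: if the group labels did not matter, shuffling them should produce differences as extreme as the observed one fairly often; the p-value is the fraction of shuffles that do. Strictly speaking, the null hypothesis here is that the two samples come from the same distribution, not only that their means are equal.

```python
import random
from statistics import fmean


def permutation_test_less(x, y, n_perm=2000, seed=0):
    """One-sided permutation test for H1: mean(x) < mean(y).

    Returns an approximate p-value based on n_perm random relabelings.
    """
    rng = random.Random(seed)
    observed = fmean(x) - fmean(y)
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of group labels
        diff = fmean(pooled[:n_x]) - fmean(pooled[n_x:])
        if diff <= observed:  # at least as extreme in the H1 direction
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Made-up values on [0, 1]; NOT the data from this thread.
low = [0.10, 0.20, 0.15, 0.30, 0.25, 0.12, 0.18, 0.22]
high = [0.60, 0.55, 0.70, 0.65, 0.80, 0.75, 0.58, 0.62]

p_low_below_high = permutation_test_less(low, high)
p_high_below_low = permutation_test_less(high, low)
print(p_low_below_high, p_high_below_low)
```

For very large samples each shuffle is expensive, so in practice one might run fewer permutations or shuffle an array library's buffer instead of a Python list.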