
Thread: Test statistic concerning mean comparison of UNKNOWN sample distributions?

  1. #1

    Hello, all. I'm far from a statistics expert, so please excuse any inaccuracies.

    I'm trying to compare, in terms of their means, two samples whose distributions I know are not Normal. (How do I know? I ran several normality tests, in fact I've lost count: Kolmogorov-Smirnov/Lilliefors, Shapiro-Wilk and Anderson-Darling, the last of which said that only my first sample was normal, all using MATLAB files I found on the web.) To the best of my understanding, the two populations are not paired. I don't just want to reject an equality hypothesis in favor of a plain inequality (as the Mann-Whitney U test does). I want to reject the null hypothesis in favor of a directional alternative (e.g. μ1 < μ2), because I really want to show that the 1st distribution has a lower mean (or even median) than the 2nd one.

    In MATLAB I've seen that the two-sample t-test does this, using a tail option that defines the alternative hypothesis. However, both of its forms assume that the two distributions are Normal (one also assumes equal standard deviations).

    So my question is: is there such a test? Since I don't know what distributions my samples follow, I think I must use a non-parametric test. However, there doesn't seem to be one out there. I've been advised to look into permutation tests, which would check how many times a sample value would be lower than a mean or something like that, but as I said, this sounds too difficult for me unless I'm explicitly guided.

    To elaborate, my random variable is the distance D from certain nodes of a graph to their nearest border (in edge hops). [D takes values from 0 to 1, since it is normalized by the radius.] I have split the graphs into two sets, 'positive' and 'negative', according to the result of an algorithm run over the set of graphs. I expected the positive set to have most of its nodes near the graph border, i.e. more of its D values would be small compared to the 'negative' set. A histogram visually confirms this. I didn't know of any test for comparing such a characteristic (although I think I've read something similar somewhere). Then I also took the average of these distances D within each graph and made two new sets, positive and negative, of the average D distances.
    I noticed from their histograms that these two distributions looked Normal. Later I found out that, by the central limit theorem (or something like that), averages of sample observations from any distribution follow an approximately normal distribution for a large sample size. (Nevertheless, my average D distributions failed the normality tests.) I also noticed that these bell-like shapes were centered at different values, so I wanted to assert this statistically and, ultimately, verify that the 'positive' distribution of average D has a lower mean than the 'negative' one.
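
The central limit theorem effect described above can be checked with a small simulation. This is only a minimal sketch in Python on made-up data, not the data from this thread: the skewed values `rng.random() ** 3`, the number of graphs, the nodes per graph and the helper names are all assumptions chosen for illustration. It compares the skewness of the raw values with the skewness of the per-graph averages.

```python
import random
import statistics


def skewness(xs):
    """Sample skewness: third standardized moment (0 for a symmetric shape)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def simulate_averages(n_graphs, nodes_per_graph, rng):
    """Return (raw values, per-graph averages) for skewed values in [0, 1]."""
    raw, averages = [], []
    for _ in range(n_graphs):
        # rng.random() ** 3 piles values up near 0: a made-up stand-in for D
        graph = [rng.random() ** 3 for _ in range(nodes_per_graph)]
        raw.extend(graph)
        averages.append(statistics.fmean(graph))
    return raw, averages


rng = random.Random(0)  # fixed seed so the run is repeatable
raw, averages = simulate_averages(n_graphs=2000, nodes_per_graph=50, rng=rng)
print("skewness of raw values:        ", round(skewness(raw), 3))
print("skewness of per-graph averages:", round(skewness(averages), 3))
```

The averages should come out much less skewed than the raw values, which is the bell-like shape seen in the histograms of average D.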

    To sum up, somewhere on Wikipedia there's a list of tests where it says:
    Name: Two-sample unpooled t-test
    Assumptions: (Normal populations or n1 + n2 > 40) and independent observations and σ1 ≠ σ2 and (σ1 and σ2 unknown)
    Now, if it's true that the normality assumption can be relaxed for a large sample size, I think I'm done. I have already run the t-test with each tail setting and it confirmed my expectations: the null hypothesis μ1 = μ2 was rejected only for the alternative hypotheses μ1 < μ2 and μ1 ≠ μ2, but could NOT be rejected for the alternative hypothesis μ1 > μ2, for both the D and the average D distributions! I've also come across a comment somewhere that the normality assumption appears in books because of the very small sample examples they always use (which, I think, are the sizes the t-test is supposed to apply to).
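
A one-sided, unpooled (Welch-style) comparison of two means can be sketched as follows. This is a minimal Python illustration under stated assumptions, not MATLAB's implementation: the function name `welch_one_sided` and the `pos`/`neg` samples are made up, and the p-value uses a standard normal reference distribution, which is only reasonable for large samples.

```python
import math
import random
import statistics


def welch_one_sided(x, y):
    """Test H0: mean(x) = mean(y) against H1: mean(x) < mean(y).

    Returns (statistic, p_value). Variances are not pooled, and the
    p-value comes from a standard normal distribution, which is a
    large-sample approximation to the t reference distribution.
    """
    nx, ny = len(x), len(y)
    se = math.sqrt(statistics.variance(x) / nx + statistics.variance(y) / ny)
    stat = (statistics.fmean(x) - statistics.fmean(y)) / se
    # Left tail: the p-value is small when mean(x) is well below mean(y)
    p_value = statistics.NormalDist().cdf(stat)
    return stat, p_value


# Made-up, non-normal samples with unequal sizes (assumed for illustration)
rng = random.Random(1)
pos = [rng.random() ** 2 for _ in range(5000)]  # skewed toward 0
neg = [rng.random() for _ in range(1000)]       # uniform on [0, 1]

stat, p = welch_one_sided(pos, neg)
print("statistic:", round(stat, 2), "one-sided p-value:", p)
```

Swapping the arguments tests the opposite direction, and in this made-up example that direction is not rejected.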

    So:
    1) Any ideas about my original distributions, and an implemented statistic showing that 'pos' has more low values than 'neg' and vice versa?
    2) Is there any non-parametric test comparing the means of samples from unknown distributions (therefore unknown variance etc.), as the tailed t-test does?
    3) Is the normality assumption indeed relaxed in the case of a large sample? Does this weaken the test's accuracy? (Note that my D distributions for positive and negative graphs have sizes 209826 and 11588 respectively, and my average D distributions have sizes 14958 and 1070 respectively.)

    Oh, and by the way, I know there are transformations that might make my data normal. (I guess splitting a stairs-like distribution, such as my 'positive' one, into two opposite stairs might do the job, but I am completely inexperienced, and there doesn't seem to be such an easy answer for the 'negative' D distribution. I also really don't have time to experiment right now.)

    Thank you all in advance!

    P.S. Here are the two D distributions (for 'pos' and 'neg' graphs) as well as the corresponding average D distributions:

    [Figures: D of pos; D of neg; average D of pos; average D of neg]
    Last edited by moudatsos; 09-29-2008 at 01:01 PM.


  3. #2

    Re: Test statistic concerning mean comparison of UNKNOWN sample distributions?

    I have the same problem and desperately need a solution.

    Experts, please help~

  4. #3
    Karabiner

    Re: Test statistic concerning mean comparison of UNKNOWN sample distributions?

    It's not clear what you mean by "same problem". Perhaps you could describe your problem/study (e.g. topic, objective,
    study design, sample size, measurements taken).

    Regards

    K.

  5. #4
    Dason

    Re: Test statistic concerning mean comparison of UNKNOWN sample distributions?


    I think permutation tests are actually quite a bit easier to understand conceptually than our typical parametric framework. I like the approach. A different set of problems goes along with it, but one of the nice things about a permutation test is that it is (almost) always a valid type of test.
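
To make the idea concrete, a one-sided permutation test on the difference in means can be sketched as below. This is a minimal Python illustration under stated assumptions: the function name `permutation_test_less`, the number of permutations, and the `pos`/`neg` samples are made up. Strictly speaking, the null hypothesis it tests is that the two samples come from the same distribution, with the difference in means used as the test statistic.

```python
import random
import statistics


def permutation_test_less(x, y, n_permutations=2000, seed=0):
    """One-sided permutation test of H1: mean(x) < mean(y).

    Group labels are shuffled many times; the p-value is the share of
    shuffles whose mean difference is at least as negative as the
    observed one. Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)
    observed = statistics.fmean(x) - statistics.fmean(y)
    pooled = list(x) + list(y)
    nx = len(x)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of values to the two groups
        diff = statistics.fmean(pooled[:nx]) - statistics.fmean(pooled[nx:])
        if diff <= observed:
            count += 1
    # The add-one correction keeps the estimated p-value strictly above zero
    return observed, (count + 1) / (n_permutations + 1)


# Made-up samples (assumed for illustration only)
rng = random.Random(2)
pos = [rng.random() ** 2 for _ in range(200)]  # skewed toward 0
neg = [rng.random() for _ in range(100)]       # uniform on [0, 1]

observed, p_value = permutation_test_less(pos, neg)
print("observed difference:", round(observed, 3), "p-value:", round(p_value, 4))
```

With samples as large as those in the first post, each shuffle is expensive in pure Python, so a vectorized or subsampled version would probably be needed in practice.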
