
Thread: A fundamental question!

#1 victorxstc

    A fundamental question!

First, sorry for my first-grade English! If you are OK with it, please read on.

Second, sorry that my question is so basic, but sometimes I see very easy questions which are, surprisingly, kindly answered by the community, so I'm taking my chance to learn something here.

Can someone tell me what the threshold is for something to count as multiple comparison testing? Based on my experience with some statisticians and my reading of medical/dental articles, I first thought that to be a "multiple comparison" case, a setup should follow the pattern of an ANOVA (or its non-parametric alternatives). Later a friend told me that every design in which more than one test is performed is a case of multiple comparisons, and should be adjusted accordingly.

I have a basic but important question: is that true? I have read lots of articles in which more than one test (sometimes many more) has been performed, but without any multiple comparison corrections (except those applied by the post-hoc tests). In fact, I have seen multiple comparison corrections only in ANOVA-like designs.

Let's assume that running more than one test increases the chance of obtaining a type I error. My fundamental question is: in what context should the number of tests be counted? An ANOVA? A study? All studies by a researcher? All studies in a day? Or all studies ever? And how can someone decide where this limit lies? My English and mathematics don't let me give a precise scientific explanation, but my common sense still insists that this multiple comparison thing is not fully valid.

OK, I have read some basic articles with the convincing message that increasing the number of tests really does increase the number of P values < 0.05 obtained by chance. But is the relevant count the number of tests in a study (e.g., Bonferroni's correction)? Or the number of tests in one part of a study (for example, if we have 10 Friedman tests in a study, should we correct the type I error for each Friedman separately, or for all the possible pairwise comparisons together? The difference could be huge)? Or the number of tests in all studies? Or in what?

Perhaps we might consider the whole body of research as one single study attempting to understand the world (the sample being a sophisticated composite of all the small samples). If that is true, then the almost infinite number of statistical tests that have been and are being performed could disrupt everything we are trying to elucidate from all these statistics (I mean, all those P values < 0.05 could actually be the result of millions of tests being done, and thus have happened by chance; so we should work with alphas < 10^-1000, for example, rather than 0.05).

My point may seem pointless, but my question is exactly "how can we decide whether it is pointless or not?" What is the logic behind this multiple-comparison decision? You might think this is basic, but I could link to articles published in accredited journals in which the multiple comparison correction has been applied only to one specific test within a much larger setup; for example, to the pairwise tests within the only significant Friedman test (out of 6 non-significant Friedman tests, as well as several other significant and non-significant tests such as Spearman correlations and chi-squares).
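To illustrate what those articles describe, here is a minimal simulation sketch in Python (the data are simulated, not from any real study): every null hypothesis is true, so each single test rejects about 5% of the time, yet the chance that at least one of m tests comes out "significant" grows quickly with m.

Code:
# Minimal sketch with simulated null data: every null is true, yet the
# chance of at least one p < 0.05 grows with the number of tests m.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sims, n_obs, alpha = 2000, 30, 0.05

for m in (1, 5, 10, 50):
    any_sig = 0
    for _ in range(n_sims):
        # m independent one-sample t-tests on pure noise (true mean is 0)
        samples = rng.normal(0.0, 1.0, size=(m, n_obs))
        pvals = stats.ttest_1samp(samples, 0.0, axis=1).pvalue
        any_sig += (pvals < alpha).any()
    print(f"m={m:3d}: P(at least one p < 0.05) ~ {any_sig / n_sims:.2f}")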

    Thank you very much for reading, and so very much for, well, discussing/replying.

#2 CowboyBear

    Re: A fundamental question!

Quote Originally Posted by victorxstc:
    Second, sorry that my question is so basic
I don't think this is a basic question at all.

Quote Originally Posted by victorxstc:
Let's assume that running more than one test increases the chance of obtaining a type I error. My fundamental question is: in what context should the number of tests be counted? An ANOVA? A study? All studies by a researcher? All studies in a day? Or all studies ever?
...or why not all the studies in a particular issue of a particular journal? Or all articles written by members of TalkStats? I don't think there's a good answer to this, really. I think the general decision rule applied by many social science researchers is probably "correct for multiple tests when SPSS prompts me to". I think I've heard suggestions that this friendly fellow may be less susceptible to this particular problem, though...


#3 victorxstc

    Re: A fundamental question!

Thanks for the further examples and the conditional probability hint to search for, and also for the relief that I haven't gone crazy!

Quote Originally Posted by CowboyBear:
I think the general decision rule applied by many social science researchers is probably "correct for multiple tests when SPSS prompts me to"
Then it seems acceptable to limit this type of correction to ANOVA-like tests only (not all the tests within a study). And then each correction should be applied independently to the post hocs of each Kruskal-Wallis/Friedman/etc. [again, such a relief!].

#4 victorxstc

    Re: A fundamental question!

I have compared 4 means (± SD) with 10 constant values (0 to 9) using one-sample t-tests. The computed P values change consistently; no random outliers have appeared. This makes me doubt that the problem of multiple comparisons is a real problem. According to the descriptions, if it really existed, then out of the 40 P values resulting from my 40 tests, some should randomly fall outside the range of the other calculated Ps. But I see the P values steadily getting larger and larger.

Can anyone kindly explain how and why the problem of multiple comparisons didn't happen here?
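For reference, here is a minimal sketch of the experiment (hypothetical mean and SD, with an assumed n of 30): every test reuses the same fixed sample summary, so each p-value is a deterministic function of the constant being tested, and the sequence comes out perfectly smooth.

Code:
# Minimal sketch (hypothetical mean/SD, assumed n = 30): one-sample t-test
# p-values against a sweep of constants vary smoothly, because every test
# reuses exactly the same fixed sample.
import numpy as np
from scipy import stats

mean, sd, n = 15.0, 2.2, 30               # assumed summary statistics
for mu0 in range(10):                     # constants 0..9
    t = (mean - mu0) / (sd / np.sqrt(n))  # one-sample t statistic
    p = 2 * stats.t.sf(abs(t), df=n - 1)  # two-sided p-value
    print(f"mu0 = {mu0}: p = {p:.3g}")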

#5 noetsi

    Re: A fundamental question!

I am not sure this is pertinent to what you are asking, but familywise error (and thus the chance of a type I error) increases as you use the same data for multiple tests.
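To put a number on that: for m independent tests each run at level alpha, the familywise error rate is 1 - (1 - alpha)^m. (Tests on the same data are dependent, so this is only the independent-test benchmark.) A quick check of the arithmetic:

Code:
# Familywise error rate for m independent tests, each at alpha = 0.05.
alpha = 0.05
for m in (1, 2, 5, 10, 20, 100):
    print(f"m = {m:3d}: FWER = {1 - (1 - alpha) ** m:.3f}")
# e.g. m = 1 gives 0.050, m = 10 gives 0.401, m = 100 gives 0.994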


#6 victorxstc

    Re: A fundamental question!

What if we calculate only one P value based on some data? For example, I have a mean of 15 ± 2.2. I can compare it with a constant value, 6, and record the P value.

Then I compare the same mean with 100,000 constant values (from -50,000 to +49,999) and record the 100,000 P values.

According to what I cited here (first post), the first examination is an example of a single comparison, but the second is a multiple comparison.

Then I can go and fetch, from the second examination (the multiple tests), the specific P value for the comparison between my mean and the value 6. I guess the P values from the first and second examinations would be exactly the same.

But the problem of multiple comparisons says the threshold for a type I error should become smaller in the second examination.
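A minimal sketch of this guess (with a hypothetical sample size of n = 30): a p-value is a deterministic function of the data and the null value, so the p-value for the comparison against 6 comes out identical whether it is computed alone or as one of the 100,000. What a multiple comparison correction changes is not any individual p-value but the threshold applied to the family of tests being interpreted.

Code:
# Minimal sketch (mean 15, SD 2.2, assumed n = 30): the p-value for the
# comparison against 6 is identical whether computed alone or as one of
# 100,000 comparisons; the batch changes nothing about that number.
import numpy as np
from scipy import stats

mean, sd, n = 15.0, 2.2, 30

# Single comparison against 6.
p_alone = 2 * stats.t.sf(abs((mean - 6) / (sd / np.sqrt(n))), df=n - 1)

# The same comparison inside a batch of 100,000 constants.
mus = np.arange(-50000, 50000)
ts = (mean - mus) / (sd / np.sqrt(n))
ps = 2 * stats.t.sf(np.abs(ts), df=n - 1)

print(p_alone, ps[mus == 6][0])  # identical values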

#7 victorxstc

    Re: A fundamental question!

What if there are multiple comparisons over time?

For example, what if I have some data and run some tests on it, then delete the results, revise my work, and do some other tests on the same data; then delete those and run yet other ones? If I repeat this procedure 100 times, is the chance of getting a false positive higher at the 100th test?

Is it a multiple comparison? Common sense tells me it is, since I think there is no difference between 100 tests performed simultaneously on a dataset and 100 tests performed one by one on the same dataset.

    ----

And if there is a higher possibility of getting a false positive at the 100th test, why not at the 1st test? Aren't they all part of a single family of multiple comparisons? If so, how does nature know that I am going to test my data 100 times, so that it can give me a higher chance of a false positive at the first test too?

    I wish there were some good answers.

#8 noetsi

    Re: A fundamental question!

As I understand familywise error, the issue is not whether you compare anything over time, but calculating multiple statistics with the same data. But that raises a question I have no answer for: it is not uncommon to generate multiple t-tests, F values, etc. in a multiple regression run, but as far as I know you don't apply familywise corrections to deal with that.

Honestly, if you are using ANOVA or t-tests, I would consider using a post hoc test like Tukey's HSD that automatically corrects for familywise error.
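For concreteness, a minimal sketch of Tukey's HSD using Python's statsmodels (the three groups here are made up for illustration); the procedure adjusts all pairwise mean comparisons for familywise error at once:

Code:
# Minimal sketch: Tukey's HSD on three simulated groups.
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(loc, 1.0, 20) for loc in (0.0, 0.2, 1.0)])
groups = np.repeat(["A", "B", "C"], 20)

# One call covers all 3 pairwise comparisons, familywise-adjusted.
print(pairwise_tukeyhsd(values, groups, alpha=0.05).summary())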


#9 victorxstc

    Re: A fundamental question!

Thanks noetsi for the kind answers.

Quote Originally Posted by noetsi:
But that raises a question I have no answer for: it is not uncommon to generate multiple t-tests, F values, etc. in a multiple regression run, but as far as I know you don't apply familywise corrections to deal with that.
Until now I thought the problem of multiple comparisons was already corrected for by the statistical package when calculating P values in a regression analysis. I didn't know it was affected too.

Quote Originally Posted by noetsi:
As I understand familywise error, the issue is not whether you compare anything over time, but calculating multiple statistics with the same data.
OK. By "over time" I meant "repeating" [the tests on one single dataset], but a repetition that takes some time to finish. Otherwise, all multiple comparisons are performed over time, since our statistical software does the processing serially, one test at a time (in milliseconds instead of days). So what's the difference?

Or let's talk about the Bonferroni correction after a Friedman test. Assume we want to test all the subgroups involved in a Friedman test with Wilcoxon tests, and we have 100 pairwise comparisons. According to the formula, we should adjust the level of significance to 0.05/100.

Let's assume we don't have a PC and want to calculate all the Wilcoxon P values manually. My question is: how does nature know, at the very beginning of our tests, that "there are 100 pairwise tests (on a single dataset) to come, so the false positive rate should be increased 100x"? Does it know this after we run the 100th test? Or does it know it from the start? And what happens to the false positive rate if we get tired and stop calculating P values after 20 tests?

If nature decided to give us 100x the type I error the moment we decided to run 100 tests, and we stop after the 20th test, then nature gets fooled! Alternatively, if nature figures out that we are running 100 tests by counting them as we go, it won't be fooled when we stop in the middle; but then another problem emerges: at the first test it would think we have only one test, and would not give us a higher chance of a type I error; at the second test it would say "OK, this dataset has been tested twice, so I will double the researcher's type I error rate"; and at the third test it would increase the error rate further. If so, the order of the tests becomes important, and none of them would have a uniform type I error probability.

And the computer, too, calculates all these multiple comparisons serially; there is no difference between our slow method and its fast one. So how can nature understand that SPSS is going to run 1000 repeated comparisons when only the first test has been performed and 999 others are still to come in a microsecond? (A microsecond for SPSS, or hours for us; it doesn't matter how we feel about these time scales. The point is that multiple comparisons are only ever done serially, over time.)
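A quick simulation sketch (made-up null data) bears on this: nothing adjusts itself. Each individual test keeps its nominal false positive rate no matter how many other tests are run before or after it; what a 0.05/100 threshold controls is the researcher's chance of at least one false positive among the tests actually interpreted, and that guarantee survives stopping after 20 of the 100 planned tests (it just becomes conservative).

Code:
# Minimal sketch (simulated null data): the first test's false positive
# rate is ~0.05 whether 0 or 99 further tests follow it; Bonferroni at
# 0.05/100 keeps the familywise rate below 0.05 even if we stop at 20.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n_obs = 2000, 30

first_fp = any_fp_20 = 0
for _ in range(n_sims):
    # 100 planned one-sample t-tests on pure noise (all nulls true).
    pvals = stats.ttest_1samp(rng.normal(size=(100, n_obs)), 0.0, axis=1).pvalue
    first_fp += pvals[0] < 0.05                   # the first, unadjusted test
    any_fp_20 += (pvals[:20] < 0.05 / 100).any()  # stop after 20 of 100

print(f"first test at 0.05:   {first_fp / n_sims:.3f}")   # ~0.05 regardless
print(f"20 tests at 0.05/100: {any_fp_20 / n_sims:.3f}")  # well below 0.05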

When I dig into the details, the problem of multiple comparisons appears more and more confusing, and somehow ridiculous, to me.

#10 noetsi

    Re: A fundamental question!

The points you raise, which I think are important ones, are beyond my expertise; I am confused by these types of issues as well. That is why I suggest Tukey's HSD: the software calculates it for you and the test is well accepted, so it apparently deals adequately with your issues (whatever the correct answers are).


#11 victorxstc

    Re: A fundamental question!

Thanks noetsi. Tukey's test has its own limitations as well. It can fix the problem of multiple comparisons within each ANOVA setup; but if there are other tests analyzing the same data (for example, other ANOVAs, or other types of tests), Tukey ignores them all, even though, according to the rule, all of those "other" tests are involved in the multiple comparison problem too.

If I am on the right track, it appears to me that this multiple comparison thing is more of a cliché (maybe even a wrong one) requested by journals than something really scientific.

#12 victorxstc

    Re: A fundamental question!

I have set the level of significance at 0.01, or sometimes 0.001, in many of my articles to address the multiple comparison problem, since most of the studies have at least 4 or 5 separate tests on the same data. However, the reviewers have criticized this as definitely wrong and were as surprised as if they had seen an alien! This is another source of doubt for me about the multiple comparison issue.

#13 Dason

    Re: A fundamental question!

Just arbitrarily lowering the alpha probably isn't the best route. You could at least justify it by using Bonferroni. But you should mention that explicitly.
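For example, a minimal sketch with Python's statsmodels (the p-values here are made up for illustration):

Code:
# Minimal sketch: Bonferroni adjustment of five hypothetical raw p-values.
from statsmodels.stats.multitest import multipletests

pvals = [0.003, 0.012, 0.040, 0.210, 0.470]
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")
print(p_adj)   # each raw p multiplied by 5, capped at 1.0
print(reject)  # True where the adjusted p is still below 0.05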


#14 victorxstc

    Re: A fundamental question!

Thanks Dason. I use Bonferroni to lower the alpha (not arbitrarily), but yes, they don't know that, and it should be mentioned. I hope it works this time.

#15 Dason

    Re: A fundamental question!


I would be worried about a reviewer who has never heard of a Bonferroni correction...
