
Thread: bonferroni correction in multivariate regression

#1 rogojel:

    hi,
I just read something that made me think about this: the p-values we calculate in a multiple regression have no adjustment for multiple testing (like a Bonferroni correction), right?

Does this mean that the more independent variables I have, the more probable it is that I get false positives, i.e. factors that are significant at the 0.05 level but are in fact not related to my DV in any way?

    regards
    rogojel
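For what it's worth, here is a minimal simulation sketch of exactly this worry: regress a pure-noise outcome on k pure-noise predictors and count how often at least one coefficient comes out "significant" at 0.05. The setup (statsmodels OLS, sample sizes, seed) is my own illustrative assumption, not anything from the thread.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n, k, alpha, n_sims = 200, 20, 0.05, 1000

hits = 0
for _ in range(n_sims):
    X = sm.add_constant(rng.standard_normal((n, k)))
    y = rng.standard_normal(n)          # outcome unrelated to every predictor
    p = sm.OLS(y, X).fit().pvalues[1:]  # per-coefficient p-values, intercept skipped
    hits += (p < alpha).any()

print(f"P(at least one false positive) ~ {hits / n_sims:.2f}")
# with 20 independent null predictors this lands near 1 - 0.95**20 ~ 0.64
```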

#2 hlsmith:

Each variable will have its own t-test or Wald test (whatever) set at 0.05. Multiple comparisons are usually an issue when conducting pairwise comparisons for subgroups. You should have an independent rationale for each independent variable, and an independent variable is not likely to be a subgroup of another variable.

However, I wonder how this comes into play with large dummy-coded categorical variables.
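One hedged way to handle the dummy-coded case is a single joint Wald/F test for the whole categorical term, rather than eyeballing each dummy's own t-test. A small sketch with made-up data (the variable names and the 5-level grouping are illustrative assumptions):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "y": rng.standard_normal(300),
    "x": rng.standard_normal(300),
    "group": rng.choice(list("ABCDE"), size=300),  # a 5-level categorical
})

fit = smf.ols("y ~ x + C(group)", data=df).fit()
print(fit.pvalues)            # one t-test per coefficient, i.e. each dummy separately
print(fit.wald_test_terms())  # one joint test per term, C(group) tested as a whole
```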


#3 CowboyBear:

The more independent variables you have, the more likely it is that you get at least one false positive, if the true slopes are actually all zero.

    But the true slopes in real life are almost never zero anyway....


#4 rogojel:

    hi,
The problem is I have no knowledge of which coefficients should be zero, so basically the statement stays true: the more parameters I have in a model, the less certain I can be that a significant parameter is NOT a false positive.

CowboyBear - I think it is not correct to say "if the true slopes are actually all zero"; we should say "of all those parameters that are actually zero", with the added remark that we do not know which parameters belong to that set.

    regards
    rogojel

#5 rogojel:

Quote Originally Posted by hlsmith
Each variable will have its own t-test or Wald test (whatever) set at 0.05. Multiple comparisons are usually an issue when conducting pairwise comparisons for subgroups.

However, I wonder how this comes into play with large dummy-coded categorical variables.
    hi hlsmith,
    I think the problem is the same for dummy and continuous variables. IIRC we have a large number of tests, each with a false alarm probability of 5%. So, the probability that some of the significant results are false alarms will increase with the number of tests.

    The question I have now is, are the tests for the different parameters really independent? Or maybe there is some subtle dependence there that reduces the probability of false alarms somehow?

This could have nice implications for data mining, because it implies that regression will be increasingly iffy when applied to data with a large number of variables, quite apart from the collinearity issues.

    regards
    rogojel
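On the independence question: the per-coefficient tests are generally not independent. The coefficient estimates have a covariance matrix, and it is only diagonal when the predictors are orthogonal. A small sketch (simulated data; the specific correlation structure is an assumption for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 500
x1 = rng.standard_normal(n)
x2 = 0.8 * x1 + 0.6 * rng.standard_normal(n)  # x2 correlated with x1
y = rng.standard_normal(n)                    # pure-noise outcome

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()
cov = fit.cov_params()                        # covariance matrix of the beta-hats
corr = cov[1, 2] / np.sqrt(cov[1, 1] * cov[2, 2])
print(f"corr(beta1_hat, beta2_hat) ~ {corr:.2f}")  # clearly nonzero here
```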

#6 GretaGarbo:

Yes, as the number of tests increases, the risk of making an error somewhere increases.

That is especially difficult in genetic testing, where they can be doing 10,000 or even millions of tests. One way to deal with that is with "false discovery rates".
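For instance, statsmodels can apply the Benjamini-Hochberg false-discovery-rate procedure directly. A sketch with simulated stand-in p-values (the 950-nulls-plus-50-effects mix is my assumption, chosen to mimic a many-tests setting):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(3)
# 950 true nulls (uniform p-values) plus 50 genuine effects (tiny p-values)
pvals = np.concatenate([rng.uniform(size=950), rng.uniform(0, 0.001, size=50)])

reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(f"raw rejections at 0.05 : {(pvals < 0.05).sum()}")
print(f"BH rejections at 5% FDR: {reject.sum()}")
```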


#7 hlsmith:

I liked CB's comment: the variables you include are presumed to be associated with the outcome, and the worry about spuriously letting in 1 in 20 variables only applies to variables that are truly unrelated - which shouldn't be the case, since your variables are presumed associated. I may have messed that up a little.

#8 rogojel:

Oh, I understand CB's comment better now, but I still do not agree. The point is, as per the null hypothesis I do assume that all variables are unrelated to my DV - so I do not see how I can argue that they are somehow related.

    regards
    rogojel

#9 hlsmith:

That is just your null hypothesis; it almost always postulates no relationship. However, if we only tested unrelated variables it would be insanity. So there is a rationale for testing each variable.

    I do not know how they treat this in data mining.

#10 rogojel:

Imagine a situation where we have, say, 50 variables relating to a Y. After a multiple regression you find 5 whose coefficients significantly differ from zero. What should our conclusion be?

It seems to me that we cannot say that the 5 parameters are linked to Y - and definitely not with 95% confidence. Interestingly, our confidence in the result should be higher if we had only tested 25 variables...

    regards
    rogojel
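Some rough numbers for this 50-variable scenario, treating the 50 tests as independent (an assumption) and, for the worst case, all true coefficients as zero:

```python
from scipy.stats import binom

n_tests, alpha = 50, 0.05
# expected count of "significant" results if every true coefficient were zero
print(f"expected false positives: {n_tests * alpha:.1f}")              # 2.5
# chance of seeing 5 or more significant results under that global null
print(f"P(>= 5 significant)     : {binom.sf(4, n_tests, alpha):.3f}")  # about 0.10
```

So finding 5 hits among 50 null tests is not even that surprising, which is the sense in which the 95% per-test confidence does not carry over to the set of hits.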

#11 CowboyBear:

Quote Originally Posted by rogojel
The problem is I have no knowledge of which coefficients should be zero
Answer: Almost certainly none of them. The true parameter values could each be anywhere from -\infty to \infty. Why do you think a value of exactly zero is so plausible that we should be so conservative about avoiding rejecting it?

Unless there is a specific substantive or theoretical rationale to expect a slope parameter to be exactly zero, a null hypothesis of a zero slope is no more plausible than, say, a hypothesis that it is exactly equal to 0.75423.

    And in most research, there is not any substantive or theoretical reason to expect that the true slope parameters are exactly zero. Unless we are dabbling in tests of precognition or something!

If you have a specific theory that implies that the true parameters should be exactly zero, then maybe standard null hypothesis tests could be useful. Aside from that, it's probably better to ignore them entirely and instead report estimates that tell us which parameter values are actually most likely. E.g., confidence intervals or Bayesian posterior probability distributions. If necessary, posterior probability distributions can also tell us how certain we can be that the true slope is in a particular direction, which is often more useful than knowing that it is not zero.
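A small sketch of what reporting intervals instead of zero-or-not verdicts looks like in practice (simulated data; the effect sizes 0.5, 0.1 and 0 are my assumptions, and statsmodels' `conf_int` is used for the intervals):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 200
X = rng.standard_normal((n, 3))
# two real (but different-sized) effects and one truly-zero slope
y = 0.5 * X[:, 0] + 0.1 * X[:, 1] + rng.standard_normal(n)

fit = sm.OLS(y, sm.add_constant(X)).fit()
print(fit.conf_int(alpha=0.05))  # 95% interval per coefficient: magnitude and
                                 # direction, not just a zero/nonzero verdict
```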

#12 GretaGarbo:

    The original question was about p-values and error rates.

I know that C-Bear does not like significance tests, but if the discussion is done with confidence intervals, then maybe it will be more tolerable for him.

(But a confidence interval is the region that cannot be rejected by a significance test. All parameter values outside of the confidence interval can be rejected by a significance test. There is a correspondence theorem linking tests and confidence intervals.)


Regardless of whether the parameter is zero or not, the confidence interval will cover the true parameter in, say, 95% of cases. If we have, say, three statistically independent explanatory variables in a regression model, then the probability that all three confidence intervals cover their true parameters is multiplicative (due to the independence): 0.95*0.95*0.95 ≈ 0.86. If you do 10 tests you will have 0.95^10 ≈ 0.60, so there would be an error rate of about 1 - 0.60 = 0.40.
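In code form, this is just the arithmetic above, plus the Bonferroni-adjusted per-test level that would restore a 5% familywise rate (a sketch of the standard formulas, nothing more):

```python
alpha = 0.05
for m in (3, 10):
    cover_all = (1 - alpha) ** m  # all m independent 95% intervals cover
    print(f"m={m:2d}: P(all cover)={cover_all:.2f}, "
          f"familywise error={1 - cover_all:.2f}, "
          f"Bonferroni per-test alpha={alpha / m:.4f}")
# m= 3: P(all cover)=0.86, familywise error=0.14, per-test alpha=0.0167
# m=10: P(all cover)=0.60, familywise error=0.40, per-test alpha=0.0050
```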

    Yes, the error rate would increase with the number of tests (which the original question was about).

The other paper I linked to about false discovery rates (FDR) is maybe heavy for a fast read. This Efron paper talks about FDR (short summary on page 2).


Quote Originally Posted by CowboyBear
Unless there is a specific substantive or theoretical rationale to expect a slope parameter to be exactly zero, a null hypothesis of a zero slope is no more plausible than, say, a hypothesis that it is exactly equal to 0.75423.
It seems here that C-Bear wants to come out as a Bayesian.

Although I guess that most frequentists would not find it meaningful to discuss the "plausibility" (that word is almost "probability", isn't it?) of a parameter value. But to say that it is as "plausible"/probable that the parameter value is exactly equal to 0.75423 as that it is zero, that is a controversial statement. Fisher rejected the idea that in the case of no knowledge the different parameter values would have equal probability, i.e. that the prior would be flat. In some cases "we simply don't know", as John Maynard Keynes put it (who was also a great statistician).

Quote Originally Posted by CowboyBear
    And in most research, there is not any substantive or theoretical reason to expect that the true slope parameters are exactly zero.
    I would say that in most cases we have convictions that there are lots of things that are completely irrelevant. (How does the children's ice cream in Australia influence the well-being of the people in a hospital in Scotland?)

Quote Originally Posted by CowboyBear
it's probably better to ignore them entirely and instead report estimates that tell us which parameter values are actually most likely...
So he wants to suggest a likelihood interval? Is he a Fisherian?

Quote Originally Posted by CowboyBear
    .... E.g., confidence intervals...
    But that is a Neyman-Pearson frequentist concept. Is that what he is?


Quote Originally Posted by CowboyBear
    ...or Bayesian posterior probability distributions. If necessary, posterior probability distributions can also tell us how certain we can be that the true slope is in a particular direction, which is often more useful than knowing that it is not zero.
    So he is a Bayesian after all.


#13 Karabiner:

The original question is quite interesting. We perform significance tests all the time, so for the sake of discussion let us assume that they are meaningful, in that they test possibly "true" or nearly-true null hypotheses. So, if we have a multifactorial ANOVA or a multiple regression, is there a built-in mechanism which prevents alpha inflation caused by the multiple significance tests concerning the respective factors/predictors? I simply don't know. If we have a regression with 4 uncorrelated predictors, why don't we have to adjust the significance level for the tests of the regression coefficients, whereas, if we just performed 4 correlations or 4 single regressions, we could at least discuss whether adjustment is necessary?

    With kind regards

    K.
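A quick simulation sketch of exactly this comparison (assumed setup: four independent standard-normal predictors, all true slopes zero, statsmodels OLS) suggests there is no built-in protection in the joint model; both approaches inflate the familywise rate similarly:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n, k, alpha, sims = 200, 4, 0.05, 2000
joint = separate = 0
for _ in range(sims):
    X = rng.standard_normal((n, k))
    y = rng.standard_normal(n)  # all true slopes are zero
    # all four predictors tested inside one multiple regression
    p_joint = sm.OLS(y, sm.add_constant(X)).fit().pvalues[1:]
    # each predictor tested in its own simple regression
    p_sep = [sm.OLS(y, sm.add_constant(X[:, [j]])).fit().pvalues[1]
             for j in range(k)]
    joint += (p_joint < alpha).any()
    separate += (np.array(p_sep) < alpha).any()

# in my runs both land near 1 - 0.95**4 ~ 0.19
print(f"FWER, one multiple regression : {joint / sims:.2f}")
print(f"FWER, four single regressions : {separate / sims:.2f}")
```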

#14 CowboyBear:

Quote Originally Posted by GretaGarbo
It seems here that C-Bear wants to come out as a Bayesian.
Well sure, I guess I do edge that way. But I don't think you have to be a Bayesian to be skeptical of the plausibility of point null hypotheses in most settings. There are Bayesians who test point null hypotheses, and frequentists who don't!

Quote Originally Posted by GretaGarbo
Although I guess that most frequentists would not find it meaningful to discuss the "plausibility" (that word is almost "probability", isn't it?) of a parameter value.
    They wouldn't want to say "probability", sure! But surely frequentists are allowed to have beliefs about things? And surely if we go to all this trouble to test null hypotheses and avoid rejecting them without lots of evidence, there must be some implicit idea that the null hypothesis is something that is plausible? That we have some reasonable belief that it could be true? (Even if we don't use the word "probability" to describe our beliefs?)

Quote Originally Posted by GretaGarbo
I would say that in most cases we have convictions that there are lots of things that are completely irrelevant. (How does the children's ice cream in Australia influence the well-being of the people in a hospital in Scotland?)
    Ok, so we could specify a null hypothesis that sales of ice cream to children in Australia have exactly no effect on the well-being of people in hospitals in Scotland. Now what if there is a news story about booming ice cream sales in the hot Australian summer on the BBC, and one person in a hospital in Scotland feels momentarily cheered by the thought of tasty ice cream? That would be all it takes for that null hypothesis, taken literally, to be false. The two variables are practically irrelevant to one another, but that isn't the same as an exactly zero relation.

Quote Originally Posted by GretaGarbo
Yes, the error rate would increase with the number of tests (which the original question was about).
Yeah, sorry, I didn't mean to make the discussion too off topic. What you're saying here is intuitively obvious: the more tests we do, the bigger the chance that at least one will give the wrong answer. But that wasn't quite what rogojel asked about. He asked about a specific kind of error: false positives (Type 1 errors). So the argument goes like this:

    If at least some (>1) of the multiple null hypotheses being tested are true
    Then testing these hypotheses will lead to a probability of making at least one incorrect rejection that is higher than the nominal alpha level.

    Firstly, the implication is technically true, but the whole argument is irrelevant unless the antecedent is true too. So we need to think critically about whether that could plausibly be the case.

    Secondly, the fact that we are focusing only on this kind of error - and not the inflated Type 2 error rate that would result from doing an "adjustment" - suggests to me again an implicit assumption that most of the null hypotheses being tested are actually true.

I feel like we tend to have this implicit idea that until demonstrated otherwise, we should assume that variables are exactly unrelated; that this is a good "default" belief; that it's what we should assume when we are ignorant (i.e., when we don't have data yet). And I think that idea is worth questioning!


#15 rogojel:

    hi,
sorry for the very late reaction; my internet connection is pretty much useless these days.

If I understand correctly what CowboyBear says, then I would disagree - the assumption that most of the coefficients are exactly zero (that is, that they have no effect on my DV) seems reasonable to me. It seems equivalent to Occam's razor, or to the sparsity-of-effects principle from the DoE crowd.

I guess it would be a way more complex universe in which large numbers of factors had small but ultimately measurable effects on any DV.

    regards
    rogojel
