Thread: bonferroni correction in multivariate regression

1. bonferroni correction in multivariate regression

hi,

Does this mean that the more independent variables I have, the more probable it is that I get false positives, i.e. factors that are significant at the 0.05 level but are in fact not related to my DV in any way?

regards
rogojel

2. Re: bonferroni correction in multivariate regression

Each variable will have its own t-test or Wald test (whichever applies) set at 0.05. Multiple comparisons are usually an issue when conducting pairwise comparisons for subgroups. You should have an independent rationale for each independent variable, and one variable is not likely a subgroup of another.

However, I wonder how this comes into play with large dummy coded categorical variables.

3. The Following User Says Thank You to hlsmith For This Useful Post:

rogojel (11-07-2014)

4. Re: bonferroni correction in multivariate regression

The more independent variables you have, the more likely it is that you get at least one false positive, if the true slopes are actually all zero.

But the true slopes in real life are almost never zero anyway....
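CowboyBear's claim is easy to check with a quick simulation. The sketch below (Python; assumes numpy and scipy are available, and all names are invented for illustration) regresses a pure-noise response on k pure-noise predictors and counts how often at least one slope comes out "significant" at the 0.05 level:

```python
import numpy as np
from scipy import stats

def fwer_null_regression(n=100, k=10, alpha=0.05, sims=1000, seed=0):
    """Estimate the family-wise probability of at least one false positive
    in a multiple regression where every true slope is zero."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(sims):
        X = rng.standard_normal((n, k))
        y = rng.standard_normal(n)            # y is pure noise: all true slopes are 0
        Xd = np.column_stack([np.ones(n), X])  # add an intercept column
        beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
        resid = y - Xd @ beta
        df = n - k - 1
        sigma2 = resid @ resid / df
        se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xd.T @ Xd)))
        t = beta[1:] / se[1:]                 # slope t-statistics, skipping the intercept
        p = 2 * stats.t.sf(np.abs(t), df)     # two-sided p-values
        hits += (p < alpha).any()             # did any coefficient look "significant"?
    return hits / sims

print(fwer_null_regression())  # roughly 1 - 0.95**10, i.e. around 0.40
```

With k = 10 the estimated family-wise rate lands near 1 - 0.95^10 ≈ 0.40, which also bears on the later question of whether the per-coefficient tests behave roughly independently.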

5. The Following User Says Thank You to CowboyBear For This Useful Post:

rogojel (11-07-2014)

6. Re: bonferroni correction in multivariate regression

hi,
The problem is that I have no knowledge of which coefficients should be zero, so basically the statement stays true: the more parameters I have in a model, the less certain I can be that a significant parameter is NOT a false positive.

CowboyBear - I think it is not quite right to say "the true slopes are actually all zero"; we should say "of all those parameters that are actually zero", with the added remark that we do not know which parameters belong to that set.

regards
rogojel

7. Re: bonferroni correction in multivariate regression

Originally Posted by hlsmith
Each variable will have its own t-test or Wald test (whichever applies) set at 0.05. Multiple comparisons are usually an issue when conducting pairwise comparisons for subgroups.

However, I wonder how this comes into play with large dummy coded categorical variables.
hi hlsmith,
I think the problem is the same for dummy and continuous variables. IIRC, we have a large number of tests, each with a false-alarm probability of 5%, so the probability that some of the significant results are false alarms will increase with the number of tests.

The question I have now is, are the tests for the different parameters really independent? Or maybe there is some subtle dependence there that reduces the probability of false alarms somehow?

This could have interesting implications for data mining, because it implies that regression will be increasingly unreliable when applied to data with a large number of variables, quite apart from the collinearity issues.

regards
rogojel

8. Re: bonferroni correction in multivariate regression

Yes, as the number of tests increases, the risk of making an error somewhere increases.

That is especially difficult in the genetic testing area, where they can be doing 10,000 or even millions of tests. One way to deal with that is with "false discovery rates".
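For anyone who has not met false discovery rates before: the standard entry point is the Benjamini-Hochberg step-up procedure, which controls the expected proportion of false discoveries rather than the chance of any single one. A minimal sketch in Python/numpy (the p-values at the end are invented purely for illustration):

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: return a boolean mask of
    discoveries, controlling the false discovery rate at level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # reject everything up to the largest i with p_(i) <= (i/m) * q
    below = ranked <= (np.arange(1, m + 1) / m) * q
    reject = np.zeros(m, dtype=bool)
    if below.any():
        cutoff = below.nonzero()[0].max()
        reject[order[: cutoff + 1]] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.21, 0.5]
print(benjamini_hochberg(pvals))  # only the two smallest survive
```

On these ten p-values only the two smallest are kept, even though five of them fall below the raw 0.05 cutoff.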

9. The Following User Says Thank You to GretaGarbo For This Useful Post:

rogojel (11-07-2014)

10. Re: bonferroni correction in multivariate regression

I liked CB's comment that all included variables are presumed to be associated, yet even so you may accidentally let about 1 in 20 unrelated variables slide in spuriously - which shouldn't happen if your variables really are all associated, as presumed. I may have messed that up a little.

11. Re: bonferroni correction in multivariate regression

Oh, I understand CB's comment better now, but I still do not agree. The point is, I do assume, as per the null hypothesis, that all variables are unrelated to my DV - so I do not see how I can argue that they are somehow related.

regards
rogojel

12. Re: bonferroni correction in multivariate regression

That is just your null hypothesis; it almost always postulates no relationship. However, if we only ever tested unrelated variables it would be insanity. So there is a rationale behind testing each variable.

I do not know how they treat this in data mining.

13. Re: bonferroni correction in multivariate regression

Imagine a situation where we have 50 variables, say, relating to a Y. After a multiple regression you find 5 whose coefficients significantly differ from zero. What should be our conclusion?

It seems to me that we cannot say that the 5 parameters are linked to Y - and definitely not with 95% confidence. Interestingly, our confidence in the result would be higher if we had only tested 25 variables...

regards
rogojel
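rogojel's 50-variable scenario can be made concrete with a back-of-the-envelope calculation (Python; the numbers just mirror the example above):

```python
from math import comb

alpha, m = 0.05, 50
expected_fp = m * alpha             # expected false positives if all 50 nulls are true
p_any = 1 - (1 - alpha) ** m        # chance of at least one false positive
# chance of 5 or more "significant" results by luck alone (binomial upper tail)
p_5_or_more = sum(comb(m, k) * alpha**k * (1 - alpha)**(m - k)
                  for k in range(5, m + 1))
print(expected_fp, round(p_any, 3), round(p_5_or_more, 3))
```

Under the global null this gives about 2.5 expected false positives, roughly a 92% chance of at least one, and roughly a 10% chance of seeing 5 or more - so 5 significant coefficients out of 50 is weaker evidence than it first looks.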

14. Re: bonferroni correction in multivariate regression

Originally Posted by rogojel
The problem is I have no knowledge of which coefficients should be zero
Answer: almost certainly none of them. The true parameter values could each range anywhere from minus infinity to plus infinity. Why do you think a value of exactly zero is so plausible that we should be so conservative about avoiding rejecting it?

Unless there is a specific substantive or theoretical rationale to expect a slope parameter to be exactly zero, a null hypothesis of a zero slope is no more plausible than, say, an hypothesis that it is exactly equal to 0.75423.

And in most research, there is not any substantive or theoretical reason to expect that the true slope parameters are exactly zero. Unless we are dabbling in tests of precognition or something!

If you have a specific theory that implies the true parameters should be exactly zero, then maybe standard null hypothesis tests could be useful. Aside from that, it's probably better to ignore them entirely and instead report estimates that tell us which parameter values are actually most likely, e.g. confidence intervals or Bayesian posterior probability distributions. If necessary, posterior probability distributions can also tell us how certain we can be that the true slope is in a particular direction, which is often more useful than knowing that it is not zero.

15. Re: bonferroni correction in multivariate regression

The original question was about p-values and error rates.

I know that C-Bear does not like significance tests, but if the discussion is done with confidence intervals, then maybe it will be more tolerable for him.

(But a confidence interval is the region that cannot be rejected by a significance test: all parameter values outside the confidence interval can be rejected by one. There is a correspondence theorem linking tests and confidence intervals.)

Regardless of whether the parameter is zero or not, the confidence interval will cover the true parameter in, say, 95% of cases. If we have, say, three statistically independent explanatory variables in a regression model, then the probability that all three confidence intervals cover their true parameters is multiplicative (due to the independence): 0.95*0.95*0.95 = 0.86. If you do 10 tests you will have 0.95^10 ≈ 0.60, so the error rate would be 1 - 0.60 = 0.40.
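The multiplicative calculation generalises directly. A few lines of Python (a sketch that, like the argument above, assumes the tests are independent) tabulate the family-wise error rate together with the Sidak and Bonferroni per-test levels that restore a 0.05 overall rate:

```python
# family-wise error rate for m independent tests, each at alpha = 0.05,
# plus the Sidak and Bonferroni per-test levels that restore 0.05 overall
alpha = 0.05
for m in (3, 10, 50):
    fwer = 1 - (1 - alpha) ** m          # chance of at least one false positive
    sidak = 1 - (1 - alpha) ** (1 / m)   # exact under independence
    bonferroni = alpha / m               # simpler, slightly conservative
    print(f"m={m}: FWER={fwer:.2f}, Sidak={sidak:.4f}, Bonferroni={bonferroni:.4f}")
```

For 3, 10 and 50 tests the family-wise rate grows from about 0.14 to 0.40 to 0.92, which is exactly the inflation the thread is worried about.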

Yes, the error rate would increase with the number of tests (which the original question was about).

The other paper I linked to, about false discovery rates (FDR), is maybe heavy for a fast reading. This Efron paper talks about FDR (brief summary on page 2).

Originally Posted by CowboyBear
Unless there is a specific substantive or theoretical rationale to expect a slope parameter to be exactly zero, a null hypothesis of a zero slope is no more plausible than, say, an hypothesis that it is exactly equal to 0.75423.
It seems here that C-Bear wants to come out as a Bayesian.

Although I guess that most frequentists would not find it meaningful to discuss the "plausibility" (that word is almost "probability", isn't it?) of a parameter value. But to say that it is as plausible/probable that the parameter value "is exactly equal to 0.75423" as that it is zero - that is a controversial statement. Fisher rejected the idea that, in the case of no knowledge, the different parameter values would have equal probability - that the prior would be flat. In some cases "we simply don't know", as John Maynard Keynes put it (who was also a great statistician).

Originally Posted by CowboyBear
And in most research, there is not any substantive or theoretical reason to expect that the true slope parameters are exactly zero.
I would say that in most cases we have convictions that there are lots of things that are completely irrelevant. (How does the children's ice cream in Australia influence the well-being of the people in a hospital in Scotland?)

Originally Posted by CowboyBear
it's probably better to ignore them entirely and instead reports tests that tell us which parameter values are actually most likely......
So he wants to suggest a likelihood interval? Is he a Fisherian?

Originally Posted by CowboyBear
.... E.g., confidence intervals...
But that is a Neyman-Pearson frequentist concept. Is that what he is?

Originally Posted by CowboyBear
...or Bayesian posterior probability distributions. If necessary, posterior probability distributions can also tell us how certain we can be that the true slope is in a particular direction, which is often more useful than knowing that it is not zero.
So he is a Bayesian after all.

16. Re: bonferroni correction in multivariate regression

The original question is quite interesting. We perform significance tests
all the time, so for the sake of discussion let us assume that they are
meaningful, in that they test possibly "true" or nearly-true null hypotheses.
So, if we have a multifactorial ANOVA or a multiple regression, is there
a built-in mechanism which prevents alpha inflation, caused by the multiple
significance tests concerning the respective factors/predictors? I
simply don't know. If we have a regression with 4 uncorrelated predictors,
why don't we have to adjust the significance level for the tests of
the regression coefficients, whereas, if we just performed 4 separate
correlations or 4 single regressions, we could at least discuss whether
an adjustment is warranted?

With kind regards

K.

17. Re: bonferroni correction in multivariate regression

Originally Posted by GretaGarbo
It seems here that C-Bear want to come out as a Bayesian.
Well sure, I guess I do edge that way! But I don't think you have to be a Bayesian to be skeptical of the plausibility of point null hypotheses in most settings. There are Bayesians who test point null hypotheses, and frequentists who don't!

Although I guess that most frequentists would not find it meaningful to discuss the "plausibility" (that word is almost "probability", isn't it?) of a parameter value.
They wouldn't want to say "probability", sure! But surely frequentists are allowed to have beliefs about things? And surely if we go to all this trouble to test null hypotheses and avoid rejecting them without lots of evidence, there must be some implicit idea that the null hypothesis is something that is plausible? That we have some reasonable belief that it could be true? (Even if we don't use the word "probability" to describe our beliefs?)

I would say that in most cases we have convictions that there are lots of things that are completely irrelevant. (How does the children's ice cream in Australia influence the well-being of the people in a hospital in Scotland?)
Ok, so we could specify a null hypothesis that sales of ice cream to children in Australia have exactly no effect on the well-being of people in hospitals in Scotland. Now what if there is a news story about booming ice cream sales in the hot Australian summer on the BBC, and one person in a hospital in Scotland feels momentarily cheered by the thought of tasty ice cream? That would be all it takes for that null hypothesis, taken literally, to be false. The two variables are practically irrelevant to one another, but that isn't the same as an exactly zero relation.

Yes, the error rate would increase with the number of tests (which the original question was about).
Yeah, sorry, I didn't mean to take the discussion too far off topic. What you're saying here is intuitively obvious: the more tests we do, the bigger the chance that at least one will give the wrong answer. But that wasn't quite what rogojel asked about. He asked about a specific kind of error, false positives (Type 1 errors). So the argument goes like this:

If more than one of the multiple null hypotheses being tested are true,
then testing these hypotheses will lead to a probability of making at least one incorrect rejection that is higher than the nominal alpha level.

Firstly, the implication is technically true, but the whole argument is irrelevant unless the antecedent is true too. So we need to think critically about whether that could plausibly be the case.

Secondly, the fact that we are focusing only on this kind of error - and not the inflated Type 2 error rate that would result from doing an "adjustment" - suggests to me again an implicit assumption that most of the null hypotheses being tested are actually true.

I feel like we tend to have this implicit idea that, until demonstrated otherwise, we should assume that variables are exactly unrelated; that this is a good "default" belief; that it's what we should assume when we are ignorant (i.e., when we don't have data yet). And I think that idea is worth questioning!

18. The Following User Says Thank You to CowboyBear For This Useful Post:

rogojel (11-19-2014)

19. Re: bonferroni correction in multivariate regression

hi,
sorry for the very late reaction, my internet connection is pretty much useless these days.

If I understand correctly what CowboyBear says, then I would disagree - the assumption that most of the coefficients are exactly zero (that is, they have no effect on my DV) seems reasonable to me. That seems equivalent to Occam's razor, or the sparsity-of-effects principle from the DoE crowd.

I guess it would be a far more complex universe in which large numbers of factors had small but ultimately measurable effects on any DV.

regards
rogojel
