# bonferroni correction in multivariate regression

#### rogojel

##### TS Contributor
hi,
I just read something that made me think about this: the p-values we calculate in a multiple regression have no adjustment for multiple testing (like a Bonferroni correction), right?

Does this mean that the more independent variables I have, the more probable it is that I get false positives - factors that are significant at the 0.05 level but are in fact not related to my DV in any way?

regards
rogojel
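
To see the worry concretely, here is a small simulation sketch (my own construction, with made-up settings): regress a pure-noise y on k predictors that are truly unrelated to it, and count how often the slope tests come out "significant" at the 0.05 level.

```python
# Sketch: OLS of noise y on k irrelevant predictors (all names/settings mine).
import numpy as np

rng = np.random.default_rng(0)

def count_false_positives(n=500, k=20, t_crit=1.96, reps=200):
    """Return (per-test false-positive rate, familywise false-positive rate)."""
    hits = 0      # significant slopes over all replications
    any_hit = 0   # replications with at least one significant slope
    for _ in range(reps):
        X = rng.standard_normal((n, k))
        y = rng.standard_normal(n)              # unrelated to every column of X
        Xd = np.column_stack([np.ones(n), X])   # add intercept
        beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
        resid = y - Xd @ beta
        sigma2 = resid @ resid / (n - k - 1)
        se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xd.T @ Xd)))
        t = np.abs(beta[1:] / se[1:])           # slope t-statistics (skip intercept)
        hits += int((t > t_crit).sum())         # 1.96 ~ two-sided 5% cutoff, large n
        any_hit += int((t > t_crit).any())
    return hits / (reps * k), any_hit / reps

per_test, familywise = count_false_positives()
print(per_test)    # stays near 0.05 per coefficient
print(familywise)  # much larger - near 1 - 0.95**20 if the tests were independent
```

Each individual test keeps roughly its 5% false-alarm rate, but the chance that at least one of the 20 slopes looks significant should land around 1 - 0.95^20 ≈ 0.64.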

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Each variable will have its own t-test or Wald test (whatever) set at 0.05. Multiple comparisons are usually an issue when conducting pairwise comparisons for subgroups. You should have an independent rationale for each independent variable, and it is not likely a subgroup of another variable.

However, I wonder how this comes into play with large dummy coded categorical variables.

#### CB

##### Super Moderator
The more independent variables you have, the more likely it is that you get at least one false positive - if the true slopes are actually all zero.

But the true slopes in real life are almost never zero anyway....

#### rogojel

##### TS Contributor
hi,
The problem is, I have no knowledge of which coefficients should be zero, so basically the statement stays true: the more parameters I have in a model, the less certain I can be that a given significant parameter is NOT a false positive.

CowboyBear - I think it is not correct to say "the true slopes are actually all zero"; we should say "of all those parameters that are actually zero", with the added remark that we do not know which parameters belong to that set.

regards
rogojel

#### rogojel

##### TS Contributor
> Each variable will have its own t-test or Wald test (whatever) set at 0.05. Multiple comparisons are usually an issue when conducting pairwise comparisons for subgroups.
>
> However, I wonder how this comes into play with large dummy coded categorical variables.

hi hlsmith,
I think the problem is the same for dummy and continuous variables. IIRC, we have a large number of tests, each with a false-alarm probability of 5%. So the probability that some of the significant results are false alarms will increase with the number of tests.

The question I have now is, are the tests for the different parameters really independent? Or maybe there is some subtle dependence there that reduces the probability of false alarms somehow?

This could have interesting implications for data mining, because it implies that regression will be increasingly iffy when applied to data with a large number of variables, quite apart from the collinearity issues.

regards
rogojel
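
On the independence question, here is a small sketch (my own construction): under the null, the per-coefficient tests are roughly independent when the predictors are orthogonal, but as soon as two predictors are correlated, their slope estimates - and hence their t-statistics - become correlated at about minus the predictor correlation.

```python
# Sketch: correlated predictors make the per-coefficient tests dependent.
import numpy as np

rng = np.random.default_rng(1)
n, reps, rho = 200, 1000, 0.9
t1, t2 = [], []
for _ in range(reps):
    x1 = rng.standard_normal(n)
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)  # corr(x1, x2) ~ rho
    y = rng.standard_normal(n)                                    # unrelated to both
    X = np.column_stack([np.ones(n), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    se = np.sqrt(resid @ resid / (n - 3) * np.diag(np.linalg.inv(X.T @ X)))
    t1.append(beta[1] / se[1])
    t2.append(beta[2] / se[2])

r = np.corrcoef(t1, t2)[0, 1]
print(r)  # strongly negative (near -rho), so the two tests are not independent
```

So the dependence is real, but it changes the joint behaviour of the tests rather than making false alarms disappear.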

#### GretaGarbo

##### Human
Yes, as the number of tests increases the risk of somewhere making an error increases.

That is especially difficult in the genetics testing area, where they can be doing 10,000 or even millions of tests. One way to deal with that is with "false discovery rates".
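
For what it's worth, the Benjamini-Hochberg procedure behind "false discovery rates" is short enough to sketch (the helper name and example p-values below are my own):

```python
# Benjamini-Hochberg: flag the "discoveries" at FDR level q.
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m    # q*i/m for the i-th smallest p
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()          # largest i with p_(i) <= q*i/m
        reject[order[:k + 1]] = True            # reject everything up to that rank
    return reject

print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.20, 0.74]))
# only the two smallest p-values survive at q = 0.05
```

Unlike Bonferroni, this controls the expected *proportion* of false discoveries among the rejections, which is why it scales to thousands of tests.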

#### hlsmith

##### Less is more. Stay pure. Stay poor.
I liked CB's comment that all included variables are presumed to be associated with the outcome; given that, you may still accidentally slide in about 1/20 of any truly unrelated variables as spuriously significant - though if every variable is there for a good reason, few of them should be truly unrelated in the first place. I may have messed that up a little

#### rogojel

##### TS Contributor
Oh, I understand CB's comment better now, but I still do not agree. The point is, I do assume, as per the null hypothesis, that all variables are unrelated to my DV - so I do not see how I can argue that they are somehow related.

regards
rogojel

#### hlsmith

##### Less is more. Stay pure. Stay poor.
That is just your null hypothesis; it almost always postulates no relationship. However, if we only ever tested unrelated variables, it would be insanity. So there is a rationale for testing each variable.

I do not know how they treat this in data mining.

#### rogojel

##### TS Contributor
Imagine a situation where we have, say, 50 variables relating to a Y. After a multiple regression you find 5 whose coefficients differ significantly from zero. What should our conclusion be?

It seems to me that we cannot say that the 5 parameters are linked to Y - and definitely not with 95% confidence. Interestingly, our confidence in the result should be higher if we had only tested 25 variables...

regards
rogojel

#### CB

##### Super Moderator
> The problem is, I have no knowledge of which coefficients should be zero

Answer: almost certainly none of them. The true parameter values could each range anywhere from $$-\infty$$ to $$\infty$$. Why do you think a value of exactly zero is so plausible that we should be so conservative about avoiding rejecting it?

Unless there is a specific substantive or theoretical rationale to expect a slope parameter to be exactly zero, a null hypothesis of a zero slope is no more plausible than, say, an hypothesis that it is exactly equal to 0.75423.

And in most research, there is not any substantive or theoretical reason to expect that the true slope parameters are exactly zero. Unless we are dabbling in tests of precognition or something! If you have a specific theory that implies that the true parameters should be exactly zero, then maybe standard null hypothesis tests could be useful. Aside from that, it's probably better to ignore them entirely and instead report estimates that tell us which parameter values are actually most likely, e.g., confidence intervals or Bayesian posterior probability distributions. If necessary, posterior probability distributions can also tell us how certain we can be that the true slope is in a particular direction, which is often more useful than knowing that it is not zero.

#### GretaGarbo

##### Human
The original question was about p-values and error rates.

I know that C-Bear does not like significance tests, but if the discussion is done with confidence intervals, then maybe it will be more tolerable for him.

(But a confidence interval is the region that cannot be rejected by a significance test: all parameter values outside the confidence interval can be rejected by a significance test. There is a correspondence theorem linking tests and confidence intervals.)

Regardless of whether the parameter is zero or not, the confidence interval will cover the true parameter in, say, 95% of cases. If we have, say, three statistically independent explanatory variables in a regression model, then the probability that all three confidence intervals cover their true parameters is multiplicative (due to the independence): 0.95*0.95*0.95 = 0.86. If you do 10 tests you will have 0.95^10 = 0.60, so the error rate would be (1 - 0.60) = 0.40.

Yes, the error rate would increase with the number of tests (which the original question was about).

The other paper I linked to, about false discovery rates (FDR), is maybe heavy for a fast read. This Efron paper talks about FDR (slight summary on page 2).

> Unless there is a specific substantive or theoretical rationale to expect a slope parameter to be exactly zero, a null hypothesis of a zero slope is no more plausible than, say, an hypothesis that it is exactly equal to 0.75423.

It seems here that C-Bear wants to come out as a Bayesian.

Although I guess that most frequentists would not find it meaningful to discuss the "plausibility" (that word is almost "probability", isn't it?) of a parameter value. But to say that it is as "plausible"/probable that the parameter value "is exactly equal to 0.75423" as that it is zero - that is a controversial statement. Fisher rejected the idea that, in the case of no knowledge, the different parameter values would have equal probability - that the prior would be flat. In some cases "we simply don't know", as John Maynard Keynes put it (who was also a great statistician).

> And in most research, there is not any substantive or theoretical reason to expect that the true slope parameters are exactly zero.

I would say that in most cases we have convictions that there are lots of things that are completely irrelevant. (How does children's ice cream in Australia influence the well-being of the people in a hospital in Scotland?)

> it's probably better to ignore them entirely and instead report estimates that tell us which parameter values are actually most likely......

So he wants to suggest a likelihood interval? Is he a Fisherian?

> .... E.g., confidence intervals...

But that is a Neyman-Pearson frequentist concept. Is that what he is?

> ...or Bayesian posterior probability distributions. If necessary, posterior probability distributions can also tell us how certain we can be that the true slope is in a particular direction, which is often more useful than knowing that it is not zero.

So he is a Bayesian after all.

#### Karabiner

##### TS Contributor
The original question is quite interesting. We perform significance tests
all the time, so for the sake of discussion let us assume that they are
meaningful, in that they test possibly "true" or nearly-true Null hypotheses.
So, if we have a multifactorial ANOVA or a multiple regression, is there
a built-in mechanism which prevents alpha inflation, caused by multiple
significance tests concerning the respective factors/predictors? I
simply don't know. If we have a regression with 4 uncorrelated predictors,
why don't we have to adjust the significance level for the tests of
the regression coefficients, whereas, if we just performed 4 separate
correlations or 4 single regressions, we could at least discuss whether
an alpha adjustment would be appropriate?

With kind regards

K.

#### CB

##### Super Moderator
> It seems here that C-Bear wants to come out as a Bayesian.

Well sure, I guess I do edge that way. But I don't think you have to be a Bayesian to be skeptical of the plausibility of point null hypotheses in most settings. There are Bayesians who test point null hypotheses, and frequentists who don't!

> Although I guess that most frequentists would not find it meaningful to discuss the "plausibility" (that word is almost "probability", isn't it?) of a parameter value.

They wouldn't want to say "probability", sure! But surely frequentists are allowed to have beliefs about things? And surely, if we go to all this trouble to test null hypotheses and avoid rejecting them without lots of evidence, there must be some implicit idea that the null hypothesis is something plausible? That we have some reasonable belief that it could be true? (Even if we don't use the word "probability" to describe our beliefs?)

> I would say that in most cases we have convictions that there are lots of things that are completely irrelevant. (How does the children's ice cream in Australia influence the well-being of the people in a hospital in Scotland?)

Ok, so we could specify a null hypothesis that sales of ice cream to children in Australia have exactly no effect on the well-being of people in hospitals in Scotland. Now what if there is a news story on the BBC about booming ice cream sales in the hot Australian summer, and one person in a hospital in Scotland feels momentarily cheered by the thought of tasty ice cream? That would be all it takes for that null hypothesis, taken literally, to be false. The two variables are practically irrelevant to one another, but that isn't the same as an exactly zero relation.

> Yes, the error rate would increase with the number of tests (which the original question was about).

Yeah, sorry, I didn't mean to take the discussion too far off topic. What you're saying here is intuitively obvious: the more tests we do, the bigger the chance that at least one will give the wrong answer. But that wasn't quite what rogojel asked about. He asked about a specific kind of error - false positives (Type I errors). So the argument goes like this:

If more than one of the multiple null hypotheses being tested is actually true,
then testing these hypotheses will lead to a probability of making at least one incorrect rejection that is higher than the nominal alpha level.

Firstly, the implication is technically true, but the whole argument is irrelevant unless the antecedent is true too. So we need to think critically about whether that could plausibly be the case.

Secondly, the fact that we are focusing only on this kind of error - and not on the inflated Type II error rate that would result from doing an "adjustment" - suggests to me again an implicit assumption that most of the null hypotheses being tested are actually true.

I feel like we tend to have this implicit idea that until demonstrated otherwise, we should assume that variables are exactly unrelated; that this is a good "default" belief; that it's what we should assume when we are ignorant (i.e., when we don't have data yet). And I think that idea is worth questioning!

#### rogojel

##### TS Contributor
hi,
sorry for the very late reaction, my internet connection is pretty much useless these days.

If I understand correctly what CowboyBear says, then I would disagree - the assumption that most of the coefficients are exactly zero (that is, they have no effect on my DV) seems reasonable to me. It seems equivalent to Occam's razor, or the paucity-of-effects principle from the DoE crowd.

I guess it would be a much more complex universe in which large numbers of factors had small but ultimately measurable effects on any DV.

regards
rogojel

#### CB

##### Super Moderator
> If I understand correctly what CowboyBear says, then I would disagree - the assumption that most of the coefficients are exactly zero (that is, they have no effect on my DV) seems reasonable to me. It seems equivalent to Occam's razor, or the paucity-of-effects principle from the DoE crowd.

Interesting, I've never heard of the paucity-of-effects principle. Could you expand?

Occam's razor says we should prefer the simpler of two explanations that are equally good at explaining the same set of observations. It is not a guarantee that the world itself is simple, so I don't think it's directly relevant here. You could say that the idea that parameters often tend to be zero is a theory, but that theory would by definition not explain actual observations as well as a theory allowing the parameters to vary, so again the razor is of limited relevance.

If you'd like an empirical demonstration of my point, see this article: Empirical statistics: IV. Illustrating Meehl's sixth law of soft psychology: everything correlates with everything

The authors take a haphazard grab-bag of 135 educational and biographical variables and examine their associations in 2058 subjects. Despite being conceptually unrelated in many cases, each variable had a significant correlation with about 41% of the other variables. (Not a regression approach, but you get the idea.)

#### Injektilo

##### New Member
> If I understand correctly what CowboyBear says, then I would disagree - the assumption that most of the coefficients are exactly zero (that is, they have no effect on my DV) seems reasonable to me. It seems equivalent to Occam's razor, or the paucity-of-effects principle from the DoE crowd.

Well, remember how significance tests are applied: we either reject the null hypothesis or we fail to reject it. Note that failing to reject a null hypothesis does not equate to accepting it.

#### rogojel

##### TS Contributor
hi,
The paucity of effects means that when we design a screening experiment, we can assume that most of the factors will be inert, that is, they will have no effect on the studied outcome. E.g. if we have 10 factors, then we can investigate 1024 different effects (main effects and the various interactions between the factors) - and we very definitely expect most of those to have a coefficient that is exactly zero.
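
Just to check that count (a throwaway sketch): with 10 two-level factors there is one model term per subset of factors - the intercept, 10 main effects, 45 two-way interactions, and so on - which is 2^10 = 1024 terms in total.

```python
# Number of model terms for 10 factors: one per subset of factors.
from math import comb

terms = [comb(10, j) for j in range(11)]  # intercept, mains, 2-way, ..., 10-way
print(terms)       # [1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1]
print(sum(terms))  # 1024 == 2**10
```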

I see Occam's razor in a somewhat less restrictive way. My Latin fails me, but IIRC the original statement says something like "you should not multiply the causes without need". That would fit the view that one should aim for the model with the smallest number of parameters that avoids a serious deterioration of model quality, the latter being defined in some objective way, e.g. predictive RMSE or such.

@Injektilo - I do not see the practical difference. If I cannot reject the null hypothesis, that means, in practice, that I have no basis, for example, to request a new investment in the plant for some machine that will control that factor. If I can show the null hypothesis to be false, I have the needed arguments to request such an investment.

regards
rogojel