When to control for multiple comparisons?

I have a test that I have given people, and it looks at two particular variables. Each variable has two item types: an item is either nice or mean, for example, or passionate or dispassionate.

I wanted to look at whether responses to nice and mean items were differentially affected between two time points, and also whether passionate and dispassionate items were differentially affected. Each participant takes the test twice - once at each time point.

I am looking at both rating and response time for each item.

So, overall, from what I understand the best method of doing this is to perform a two-way repeated-measures ANOVA for each of the variables examined. This leads to four different two-way repeated-measures ANOVAs.

  1. time vs. nice-mean - ratings
  2. time vs. nice-mean - response times
  3. time vs. passionate-dispassionate - ratings
  4. time vs. passionate-dispassionate - response times

Do I need to correct for multiple comparisons, and if so, how would I go about doing that?
Hi Rogojel!

I was wondering if you could give further elaboration as to how the Benjamini-Hochberg procedure can be implemented in this scenario?
I tried the Wikipedia link and another site, but I still don't quite understand it. :)


This is how I understand the procedure:
1. Calculate the p-values and rank them, smallest first.
2. Define a rejection rate (alpha, generally 0.05).
3. Calculate the cutoffs alpha(k) = (k/4)*alpha, so alpha(1) = (1/4)*alpha, alpha(2) = (2/4)*alpha, etc. (4 because you have 4 independent tests to check).
4. Find the largest rank K for which p(K) <= (K/4)*alpha.
5. Reject the null hypothesis for all tests with rank k <= K.
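The steps above can be sketched in Python. This is only a minimal illustration; the p-values here are made-up placeholders, not results from the actual study:

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of hypotheses rejected by the BH step-up procedure."""
    m = len(p_values)
    # Rank the p-values smallest-first, remembering their original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p(k) <= (k/m) * alpha.
    largest_k = 0
    for k, idx in enumerate(order, start=1):
        if p_values[idx] <= (k / m) * alpha:
            largest_k = k
    # Reject every hypothesis ranked at or below that k.
    return sorted(order[:largest_k])

p = [0.004, 0.030, 0.020, 0.250]  # hypothetical p-values for the four ANOVAs
print(benjamini_hochberg(p))  # → [0, 1, 2]
```

Note that the third-ranked p-value is rejected here even though it sits close to its cutoff; BH works from the largest qualifying rank downward, which is what makes it less conservative than Bonferroni.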

I hope this helps

Thank you very much for your reply!
It helps a bit, but I'm still somewhat confused.

I was wondering how I'd implement that specifically in the context of two-way repeated-measures ANOVAs? Each ANOVA gives two p-values for the two main effects, plus a third p-value for the interaction effect.
So, all together, twelve p-values from these four hypothesis tests?

Also, would you possibly be able to give an example of what you mean? I don't quite understand 'alpha(1) = (1/4)*alpha, alpha(2) = (2/4)*alpha, etc.', or the comparison of '(K/4)*alpha' with 'p(K)'.
If it helps, my alpha level is .05.


You have 12 p-values, so step one is to rank them in ascending order. Then pick an alpha - that would be 0.05. Then create the series of cutoffs alpha*k/m: 0.05*1/12, 0.05*2/12, 0.05*3/12, ..., 0.05*12/12 = 0.05. Then proceed as described above.
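As a quick worked example, the twelve cutoffs for alpha = 0.05 can be generated in a couple of lines (a sketch only; each ranked p-value would then be compared against its cutoff):

```python
# BH cutoff series for twelve tests at alpha = 0.05:
# the k-th smallest p-value is compared against alpha * k / m.
alpha, m = 0.05, 12
cutoffs = [alpha * k / m for k in range(1, m + 1)]
print([round(c, 4) for c in cutoffs])
```

The series starts at 0.05/12 ≈ 0.0042 for the smallest p-value and rises in equal steps to the full 0.05 for the largest.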



I may differ on this slightly. I'm not a user of repeated-measures ANOVA, but I would guess you don't actually care about the main effects, and that the interaction is what the hypothesis is based on? If so, just rank-order the interaction p-values and adjust them per whichever adjustment you choose.

What do you mean by rating and response time? Or, just what is response time - the time point (T1 or T2)?

Also, are these two hypotheses (nice-mean vs. passionate-dispassionate) related? If for some reason they are truly independent, I might not consider the false discovery rate to be in question, given that they are two different tests. I typically apply a correction only when I have pairwise comparisons, not when I have different hypotheses that are not related.

I believe the BH adjustment just goes: rank the p-values, then multiply each one by the number of tests and divide by its rank, so the k-th smallest p-value becomes p(k)*m/k.
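One common way to implement BH *adjusted* p-values (rather than comparing raw p-values to cutoffs) is sketched below. The input p-values are hypothetical, and the running minimum from the largest rank downward is the standard trick that keeps the adjusted values monotone:

```python
# Sketch of BH adjusted p-values: each ranked p-value is multiplied by m
# (the number of tests) and divided by its rank; a running minimum taken
# from the largest rank downward keeps the adjustments monotone.
def bh_adjusted(p_values):
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for k in range(m, 0, -1):  # walk from the largest rank down
        idx = order[k - 1]
        running_min = min(running_min, p_values[idx] * m / k)
        adjusted[idx] = running_min
    return adjusted

print(bh_adjusted([0.01, 0.04, 0.03, 0.005]))
```

An adjusted p-value can then be compared directly against alpha, which many people find easier to report than a table of cutoffs.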
Just wanted to weigh in here, but isn't the Bonferroni correction where you divide your alpha level by the number of comparisons you are making? For example, if you are making 4 comparisons at alpha = .05, it would be .05/4 = .0125. So each comparison would need a p-value of no more than .0125 in order for the error rates not to compound above the full .05 value.
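That arithmetic is easy to check directly (a trivial sketch with the four made-up p-values from earlier):

```python
# Bonferroni: divide the overall alpha by the number of comparisons,
# then require every individual p-value to fall below that threshold.
alpha, m = 0.05, 4
threshold = alpha / m  # 0.05 / 4 = 0.0125
p = [0.004, 0.030, 0.020, 0.250]  # hypothetical p-values
rejected = [i for i, pv in enumerate(p) if pv <= threshold]
print(threshold, rejected)
```

With these numbers only the first test survives Bonferroni, whereas BH at the same alpha would reject three of the four - which is exactly the conservativeness being discussed in this thread.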
Hi Hilsmith!

I'm looking at the mean score participants give those particular items (so how they rate them), and then how long it takes them to input that rating (so the response time, or reaction time).

And yes! I'm interested primarily in the interaction effects - I'm also reporting the main effects, but they are more of a side note and not of any interest to the current study.

Each of the variables is independent of the others, I believe? I'm not using the same variable twice in any of the analyses, at least. That is why I was unsure as to whether I would need to correct for multiple comparisons.

And to Gdaem:

From what I understand that is correct, but Bonferroni is very conservative, which is not good when you have a low-powered study - the Type II error rate is then likely to become inflated, which was something I was concerned about.

Additionally I was unsure whether the corrections would be necessary, or to what extent they would be, if all the data was from the same test, but looking at separate variables.


The BH procedure is valid for independent tests. IMO it is a good compromise between doing no correction (way too risky) and the Bonferroni (way too conservative).



No, I agree with everything above. That is how the Bonferroni correction works, and it is overly conservative but easy for people to understand. The tests may or may not be clearly independent, since the score you give and the time it takes to give it seem related to me. If I am giving someone a high rating, it may only take a moment to contemplate, etc. Have you examined the relationship between response time and rating?

If you are solely interested in the interactions then the main effects are irrelevant and probably ignored for reporting and correction. Use plots to convey the interactions and the readers can make their own interpretation on the conditional main effects when interaction is in play.

I have used the BH, but I think it is a little sneaky in that big effects stay in and you get to slide some small effects in as well. The decision is up to you and it comes down to whether you think they are independent or not and how cautious you would like to be with your errors.

rogojel is right in that you can always change your level of significance if you want to be cautious; it just helps convey a greater level of confidence. Plus, it is the effect size that you should care about, not the p-values, so you need to correct your confidence intervals in line with whatever you apply to your p-values, since p-values alone are difficult for the reader to interpret in context. Sorry for any typos, I am lazy and writing quickly.