Using Binomial Cumulative Distribution Function to Analyse Repeated Measures Data

Hi everyone!

I ran an experiment in which a group of N=42 participants performed a task under three different sound conditions. My data meet all the requirements for a RANOVA, which confirmed that performance, i.e. the number of errors in the task, differs significantly between the three sound conditions. Bonferroni-adjusted post-hoc tests showed a significant difference for one pair of sound conditions.

At this point I could probably just end my analysis, but I wondered whether it makes sense to additionally evaluate under which condition each participant performed best. This gives me counts for each sound condition, e.g. 23 participants performed best under the condition Silence (see Figure 1). Cases in which a participant achieved the lowest number of errors under multiple sound conditions were added to the count for each of those conditions. Assuming that the sound condition does not influence the results at all, I would expect these counts to be uniformly distributed across all three conditions? In this case, it should be possible to calculate the probability P(X ≤ k) of obtaining k or fewer counts X for a single category using the binomial cumulative distribution function as

P(X \le k) = \sum_{i=0}^{k} \binom{n}{i} p^{i} (1-p)^{n-i}

with p=1/3 and n being equal to the sum of counts of all three conditions? Similarly, one could calculate the probability of obtaining k or more counts for a single condition. Figure 1 shows the lower of those two calculated values, P(X ≤ k) and P(X ≥ k), for the observed counts for each condition.
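As a minimal sketch of the two tail probabilities described above (assuming only the Python standard library; the function names and the example values n=30, k=5 are illustrative and not taken from the experiment):

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def binom_tails(k, n, p):
    """Return (P(X <= k), P(X >= k)) for X ~ Binomial(n, p)."""
    lower = binom_cdf(k, n, p)
    upper = 1.0 - binom_cdf(k - 1, n, p)  # P(X >= k) = 1 - P(X <= k - 1)
    return lower, upper

# Illustrative values only: n = 30 observations, k = 5 in one category.
lower, upper = binom_tails(5, 30, 1 / 3)
print(min(lower, upper))  # the smaller of the two tails, as plotted per condition
```

Note that the two tails overlap at X = k, so they sum to slightly more than 1.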

Is it correct to assume that if this probability for the observed counts of a specific condition is small, e.g. P < .05, this sound condition possibly influenced the participants' performance, or am I thinking a bit too simple here? If this is a valid approach, does it give me any meaningful insights in addition to my RANOVA results? I have already spent some time researching this but couldn't find much, which probably means that it is a bad idea. Can someone explain to me why?

Thanks for your help!



Well-Known Member
am I thinking a bit too simple here?
Maybe a bit too complicated.
It isn't clear to me what it is that you are proposing, or where the chart numbers came from, although I'm sure it is very clear to you.
Perhaps an actual numerical example would help.
Yes, you're right! Let me try to clarify:

42 participants individually performed a cognitive test under three different sound conditions (Close, Far, Silence). To avoid practice or fatigue effects, the order of these conditions was fully balanced across all participants.

This cognitive test has multiple measures. For now I focus only on the number of (commission) errors, which looks like this (rows = individual participant results, columns = conditions):


The corresponding means and confidence intervals look like this:


I performed a RANOVA and post-hoc tests on this data, which showed that there is a significant difference between the conditions Far and Silence. So far so good.

Now I thought it would be interesting to additionally analyse under which condition each participant performed best, i.e. achieved the lowest number of errors, so I simply searched for the minimum of each row in my data (marked green in the following table). I then counted how many participants achieved their personal best performance in each condition, i.e. I added up the number of green cells for each column in the table. At this point I'm not sure how to handle cases where participants performed best under two conditions. For now I just counted them toward both conditions. From this I can state that 23 participants performed best under Silence, 18 performed best under Far and 11 performed best under Close. These values correspond to the bars in the plot from my first post.
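The counting step can be sketched as follows. The error table here is hypothetical (four made-up participants, not the real data), and `count_bests` is an illustrative helper name; ties are counted toward every tied condition, as described above.

```python
# Hypothetical error counts: rows = participants, columns = Close, Far, Silence.
conditions = ["Close", "Far", "Silence"]
errors = [
    [5, 3, 3],  # tie: best under both Far and Silence
    [2, 4, 6],
    [7, 7, 1],
    [4, 2, 5],
]

def count_bests(rows):
    """Count, per column, how often that column holds the row minimum."""
    counts = [0] * len(rows[0])
    for row in rows:
        best = min(row)
        for j, value in enumerate(row):
            if value == best:  # every tied column gets a count
                counts[j] += 1
    return counts

best_counts = dict(zip(conditions, count_bests(errors)))
print(best_counts)
```

Because of the tie in the first row, the counts add up to 5 although there are only 4 participants, which is the same effect that makes the real counts sum to 52 rather than 42.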


I think this is an interesting observation in addition to my RANOVA results. The question is just how to test whether it is actually statistically significant. The approach I described before assumes that, if the sound condition does not influence the participants at all, the condition under which they performed best should be random. Maybe one can think of this as each participant rolling a three-sided die which decides under which condition they achieve the lowest number of errors. If this assumption is true, it should be possible to use the binomial cumulative distribution function to calculate the probability of obtaining the observed number of participants with minimum errors, or more/fewer, for each condition.

So for example, 11 participants performed best under the condition Close. The total number of observations is the sum of the observed "best performances" for all three conditions, so 23 + 18 + 11 = 52. Assuming a random uniform distribution, the probability for each participant to perform best under any one condition is 1/3. This means that the probability of observing 11 or fewer "best performances" for one condition would be
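The expression at this point did not survive in the thread. Written out from the numbers in the post (n = 52, p = 1/3, k = 11), it is presumably the binomial CDF below; the value 0.039 is the one reported in the post itself, not recomputed here.

```latex
P(X \le 11) = \sum_{i=0}^{11} \binom{52}{i} \left(\tfrac{1}{3}\right)^{i} \left(\tfrac{2}{3}\right)^{52-i} \approx 0.039
```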


So the probability of obtaining 11 or fewer "best performances", as observed for condition Close, is only 0.039. This is so low that one could possibly conclude that the "best performances" are not uniformly distributed across the different conditions, which maybe indicates that the sound condition affected the performance?

This still seems a bit confusing but I hope that you get what I'm trying to do here?


Well-Known Member
OK. This is what I think you are getting at. Lots more people have had their best performance in silence. You want to show statistically that this is unlikely to be a fluke. You also note the fact that some people have their best performance in two lines.
Let's forget this "double best" problem for a minute and assume that everybody has only one best condition. Recode the data using 1 for the best, 0 for the other two in a row. Now you have 42 responses, one for each person. The classic approach to this would be to say that if things are random you would expect a total of 14 in each column, and do a chi square goodness of fit test. Easy to do, interpret and explain. The cumulative binomial is not needed or appropriate.
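A minimal sketch of that goodness-of-fit test, assuming hypothetical column totals (10, 12, 20) that sum to 42. With three categories the test has 2 degrees of freedom, and for exactly 2 degrees of freedom the chi-square upper-tail probability has the closed form exp(-x/2), so no statistics library is needed; the function names are illustrative.

```python
from math import exp

def chi_square_uniform(counts):
    """Pearson X2 statistic against equal expected counts in every category."""
    expected = sum(counts) / len(counts)  # 14 when 42 responses fall in 3 columns
    return sum((c - expected) ** 2 / expected for c in counts)

def chi2_upper_tail_df2(x):
    """P(chi-square with 2 df >= x); this closed form holds only for df = 2."""
    return exp(-x / 2)

counts = [10, 12, 20]  # hypothetical column totals, one "best" per participant
x2 = chi_square_uniform(counts)
p_value = chi2_upper_tail_df2(x2)
print(x2, p_value)
```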
Now for the double best problem. One suggestion is to give the two bests in a row 1/2 win each. Each row totals 1 and your column totals now add to 42. You can go through the mechanics of the goodness of fit test and find the X2 score for the data. However, the problem is that you can't compare your X2 score with the chi square distribution as before, because chi square wasn't designed to be used with fractional counts.
But, you can use a Monte Carlo simulation to generate your own distribution which takes care of the problem.
Start with your 0 1 data including the 1/2s where appropriate. Total each column. Find sum((column total - 14)^2). This is your test statistic.
Now jumble the three numbers in each of the 42 rows. Recompute the column totals and record the value of X2. Repeat 1000 times. See where your data X2 fits into the generated distribution of X2's. Get a p value.
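The Monte Carlo procedure can be sketched like this, using only the standard library. The 0 / 1 / 0.5 table is hypothetical (42 made-up rows, four of them with a shared best), the function names are illustrative, and the statistic is the unscaled sum of squared deviations described above.

```python
import random

def column_totals(rows):
    return [sum(col) for col in zip(*rows)]

def statistic(rows):
    """sum((column total - expected)^2); expected is 14 for 42 rows, 3 columns."""
    expected = len(rows) / len(rows[0])
    return sum((t - expected) ** 2 for t in column_totals(rows))

def monte_carlo_p(rows, reps=1000, seed=0):
    """Shuffle the values within each row and compare with the observed statistic."""
    rng = random.Random(seed)
    observed = statistic(rows)
    hits = 0
    for _ in range(reps):
        shuffled = [rng.sample(row, len(row)) for row in rows]
        # Small tolerance so that equal statistics are not lost to float rounding.
        if statistic(shuffled) >= observed - 1e-9:
            hits += 1
    return observed, hits / reps

# Hypothetical data: columns = Close, Far, Silence; each row sums to 1.
rows = ([[0, 0, 1]] * 20 + [[0, 1, 0]] * 10 + [[1, 0, 0]] * 8
        + [[0, 0.5, 0.5]] * 4)
observed, p_value = monte_carlo_p(rows, reps=1000)
print(observed, p_value)
```

Fixing the seed only makes the sketch reproducible; with a different seed the estimated p value will move around a little.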
Alternatively, you could be conservative. For each double best, remove the one that is in the extreme group. Then do the goodness of fit test on the 42 remaining ones.


New Member
Great, thanks! I actually thought about a goodness of fit test before but wasn't sure how to handle the "double best" problem so your suggestion really helped!

This approach gives me an X2 score of 4.32 for my data. After 1000 repetitions of randomly permuting the numbers in each row, I got an X2 distribution with 72 values that are greater than or equal to my data's X2 score. That means my p value would be 72/1000 = 0.072, correct?

If I go with 1000 Monte Carlo repetitions, my p value varies between 0.05 and 0.1 when repeating the whole process, so I just have to increase the number of Monte Carlo repetitions until my p value stays more or less constant, right?
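One way to gauge how many repetitions are needed, as a rough sketch: each repetition either reaches the observed X2 or it does not, so the estimated p value behaves like a binomial proportion with standard error sqrt(p(1-p)/B) for B repetitions. The value 0.072 is the estimate from the post; the function name is illustrative.

```python
from math import sqrt

def mc_p_value_se(p_hat, reps):
    """Approximate standard error of a Monte Carlo p value from `reps` draws."""
    return sqrt(p_hat * (1 - p_hat) / reps)

for reps in (1000, 10000, 100000):
    se = mc_p_value_se(0.072, reps)
    # Roughly 95% of repeated runs should land within about 2 standard errors.
    print(reps, round(se, 4), round(0.072 - 2 * se, 3), round(0.072 + 2 * se, 3))
```

Under this approximation, cutting the run-to-run spread by a factor of 10 takes about 100 times as many repetitions.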