M1-4 through F3-6 are trials on which participants made a binary response (choosing one of two stimuli presented on the screen). M and F denote M(ale) and F(emale) stimuli, and the numbers 1 through 6 identify the stimuli that were presented on those trials. These numbers correspond to a conceptual variable (i.e., type of stimulus), so M1-4 and F1-4 are conceptually equivalent.

I am interested in comparing response tendencies in a subset of these trials. For instance, I want to see whether participants chose a certain face (#4) more often on M1-4 trials than on M2-4 trials. I want to make several other comparisons of this kind, e.g. M1-4 vs. M3-4, F1-5 vs. F2-5, etc. I only want to compare WITHIN the M and F types of trials, not across them.

It would also be good to see whether participants' gender (in the last row above) has anything to do with these response tendencies.

I previously analyzed these data with McNemar's tests corrected for multiple comparisons (participant gender was not tested in that version), but I received reviews asking me to run a "binomial mixed effect" model instead. I have no knowledge of this type of model. I read around a bit and I think it is a type of generalized linear mixed model (GLMM). I found the relevant texts confusing and heavy. I am especially unsure which of my variables should be random effects and which should be fixed effects.

I have some knowledge of R, and I think glmer (a function in the lme4 package) is appropriate for this. If I can get a bit of help with how to build this model for my data, I think I can pull it off. Any pointers, accessible references, sample code, or a paper/exercise/video that does something similar would be highly appreciated.
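
To make the question concrete, here is a tentative sketch of what such a model might look like. It is not a confirmed specification, and it rests on several assumptions: the data are made up, the column names (`participant`, `gender`, `chose4`, `trial`) are hypothetical, each participant is assumed to contribute one 0/1 response per trial type, and treating trial type and gender as fixed effects with a random intercept per participant is only a guess at a reasonable starting point.

```r
# Hypothetical wide-format data: one row per participant, one 0/1 column per
# trial type (1 = chose face #4). All names and values are invented.
set.seed(1)
n <- 40
wide <- data.frame(
  participant = factor(seq_len(n)),
  gender      = factor(sample(c("F", "M"), n, replace = TRUE)),
  M1_4        = rbinom(n, 1, 0.6),
  M2_4        = rbinom(n, 1, 0.4),
  M3_4        = rbinom(n, 1, 0.5)
)

# Reshape to long format: one row per participant x trial type.
trial_cols <- c("M1_4", "M2_4", "M3_4")
long <- reshape(
  wide,
  direction = "long",
  varying   = trial_cols,
  v.names   = "chose4",
  timevar   = "trial",
  times     = trial_cols,
  idvar     = "participant"
)
long$trial <- factor(long$trial, levels = trial_cols)

# Assumed model: trial type and participant gender as fixed effects,
# a random intercept for each participant, binomial (logit) link.
# Fitted only if lme4 is installed. With purely random toy data the
# random-intercept variance may be estimated as zero (a "singular fit").
if (requireNamespace("lme4", quietly = TRUE)) {
  fit <- lme4::glmer(
    chose4 ~ trial * gender + (1 | participant),
    data   = long,
    family = binomial
  )
  print(summary(fit))
}
```

In this sketch only the M trials are included, since the comparisons of interest are within M or within F; a separate model of the same form could be fitted to the F trials.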

Thank you!