Interrater Reliability of Dichotomous Variables


Disclaimer: Not psychological research, I know, but this type of statistics is used more often in psychology than in medicine, so I'm asking for help here.

Background: I had 4 radiologists read CT scans of ~50 patients to assess whether each of a series of 'signs' was present or not. After a washout period, they read the scans again. Almost all the variables are dichotomous (Present/Absent). The data for each variable ends up looking like this (columns are rater A-D, reading 1 or 2):
     A1 A2 B1 B2 C1 C2 D1 D2
Pt1   0  1  1  1  1  0  0  0
Pt2   1  1  0  1  1  1  1  0
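(For reference, each column pair like A1/A2 is one rater's two readings of the same patients, so the intrarater part is just a Cohen's kappa per rater. A minimal pure-Python sketch, with hypothetical data and my own function name:)

```python
def cohen_kappa(x, y):
    """Cohen's kappa for two ratings of the same subjects."""
    n = len(x)
    # observed agreement: proportion of subjects rated identically
    p_o = sum(a == b for a, b in zip(x, y)) / n
    # chance agreement from each rating's marginal proportions
    cats = set(x) | set(y)
    p_e = sum((x.count(c) / n) * (y.count(c) / n) for c in cats)
    return (p_o - p_e) / (1 - p_e)  # undefined if p_e == 1

# hypothetical example: rater A's two readings of five patients
a1 = [0, 1, 1, 0, 1]
a2 = [0, 1, 0, 0, 1]
kappa_a = cohen_kappa(a1, a2)
```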

Calculating intrarater reliability was easy enough, but for interrater I'm having some difficulty. I don't just want to run a Fleiss' kappa, since (with the data laid out as above) it would treat the 8 columns as 8 independent 'raters'. But given that the data are dichotomous, I'm also not sure what my other options are! As an example, for my first variable I calculated Fleiss' kappa and then the interrater reliability coefficient with WinPepi (I normally use Stata, though), to see how they compare:
Fleiss' kappa: 0.32
Interrater coefficient (random raters): 0.30
Interrater coefficient (fixed raters): 0.39
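For concreteness, here's roughly what the Fleiss calculation does with my data, sketched in pure Python: pool all 8 columns as if they were independent raters, tally absent/present counts per patient, and apply the standard Fleiss formula (function name is my own; the two example rows above are used as toy data):

```python
def fleiss_kappa(counts):
    """Fleiss' kappa from a subjects x categories table of rating counts."""
    N = len(counts)       # number of subjects
    n = sum(counts[0])    # ratings per subject (assumed constant)
    # mean observed agreement across subjects
    p_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1))
                for row in counts) / N
    # chance agreement from pooled category proportions
    totals = [sum(col) for col in zip(*counts)]
    p_e = sum((t / (N * n)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)

# the two example rows above, pooling all 8 columns as 'raters'
ratings = [[0, 1, 1, 1, 1, 0, 0, 0],
           [1, 1, 0, 1, 1, 1, 1, 0]]
counts = [[row.count(0), row.count(1)] for row in ratings]  # absent/present tallies
kappa = fleiss_kappa(counts)
```

Which is exactly my worry: this treats a rater's second reading as a new rater, so within-rater correlation is ignored.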

What would be the best way to assess interrater reliability when each rater provides two ratings and the variables are dichotomous? (It doesn't have to be one of the methods above.)