Cohen's kappa or Fleiss kappa or both?

Hi all,

I am trying to compare two instruments, A and B, against a gold standard. The measurement outcome is dichotomous.

Two different raters, R1 and R2, each use both instruments A and B to rate every subject, so the data look something like:

ID, A_R1, A_R2, B_R1, B_R2, Gold standard
1, 1, 1, 1, 0, 1
2, 1, 0, 0, 0, 1

I think I can use Cohen's kappa to calculate agreement between the gold standard and
A_R1, A_R2, B_R1 and B_R2 separately, but I am not sure whether that is the most appropriate approach.

1) Can I use Fleiss' kappa to calculate inter-rater agreement for, e.g., A_R1, A_R2 and the gold standard instead? Does it matter that one of the "raters" is the gold standard?
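To make question 1 concrete: Fleiss' kappa would treat the gold standard as just one more rater. Here is a minimal sketch in plain Python of how I understand the calculation (the counts are made-up toy data, not my real ratings):

```python
def fleiss_kappa(table):
    """Fleiss' kappa from a subjects-by-categories table of rating counts.

    table[i][j] = number of raters who assigned subject i to category j;
    every row must sum to the same number of raters k.
    """
    n = len(table)        # number of subjects
    k = sum(table[0])     # raters per subject
    total = n * k
    # mean per-subject agreement P_i across all rater pairs
    p_bar = sum((sum(c * c for c in row) - k) / (k * (k - 1))
                for row in table) / n
    # chance agreement from the overall category proportions
    props = [sum(row[j] for row in table) / total
             for j in range(len(table[0]))]
    p_e = sum(p * p for p in props)
    return (p_bar - p_e) / (1 - p_e)

# rows = subjects, columns = (# of "0" ratings, # of "1" ratings)
# from three "raters": A_R1, A_R2 and the gold standard (made-up data)
counts = [[0, 3], [1, 2], [3, 0], [1, 2], [2, 1]]
print(fleiss_kappa(counts))  # ~0.196 for this toy data
```

But I am not sure whether including the gold standard as a rater like this is statistically sensible, which is what I am asking.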

2) Or can I use Cohen's kappa after combining the ratings, e.g.

ID, A, Gold standard
1, 1, 1
1, 1, 1
2, 1, 1
2, 0, 1
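For what it's worth, here is how I would compute Cohen's kappa on pooled pairs like those above in plain Python (a minimal sketch with made-up binary ratings, not my real data):

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters' labels (any hashable categories)."""
    assert len(r1) == len(r2)
    n = len(r1)
    # observed agreement: fraction of subjects where the two raters match
    po = sum(a == b for a, b in zip(r1, r2)) / n
    # chance agreement: product of each rater's marginal proportions per category
    c1, c2 = Counter(r1), Counter(r2)
    pe = sum(c1[cat] * c2[cat] for cat in c1.keys() | c2.keys()) / n**2
    return (po - pe) / (1 - pe)

# pooled instrument-A ratings vs. the gold standard (made-up example)
a    = [1, 1, 0, 1, 0, 1]
gold = [1, 0, 0, 1, 1, 1]
print(cohens_kappa(a, gold))  # ~0.25 for this toy data
```

My worry with pooling is that the two rows per subject are not independent, which may be exactly the problem with this approach.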

I would be most grateful for any advice on how I can proceed. Thank you in advance for your responses :)


If I am remembering right, Cohen's kappa gives you the individual pairwise comparisons, while Fleiss' kappa is an overall/general measure that can handle multiple raters. I am not sure grouping all three together will tell you much unless you can also tease out the direct comparisons.