Inter/Intra-class correlation between graders, large dataset

I have a set of 100 images obtained from a machine called an OCT. It's similar to an ultrasound, and I use it to take pictures of various skin cancers. The main goal is to give a diagnosis by looking at the OCT image. To do so, I picked 8 'features' that I can observe by OCT. Each feature or group of features may favor one diagnosis over another. I would like to assess the reliability of these features and ideally keep only the ones that are the most reliable for reaching the correct diagnosis.

I uploaded each image in a quiz format with multiple-choice answers corresponding to the features. Each question shows an image, and graders are asked to select all (or none) of the features according to their observation. All the features are dichotomous (present/absent). I had 3 separate graders look at my images, so now I have 100 images, each with 8 yes/no answers from each grader.

I want to assess which answer is the most consistent among all graders and across all images. I have attached a sample question for you at the end.
At this point in my study, there is no right or wrong answer. For example, I just want to check whether different people can consistently see the presence of hyper-reflective nodules or loss of DEJ in various types of images.

The only way I have found to perform this type of analysis is a two-way random, single-rater ICC analysis. The problem is that my data table would have 3 rows (one per grader) and 800 columns (100 images x 8 features), which sounds a bit unreasonable. Before I proceed that way, I suspect there is a better and more efficient way to do this. I am using SPSS.
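
To make the layout question concrete, here is a rough sketch in Python (not SPSS, which is what I actually use). It fills the tables with random placeholder answers, not my real data. It contrasts the single 3 x 800 table with the alternative of one 100 x 3 table per feature, then computes the share of images on which all three graders agree. That share is only a crude descriptive check, not a chance-corrected reliability statistic like the ICC, and every function and variable name here is made up for illustration.

```python
import random

N_IMAGES, N_FEATURES, N_GRADERS = 100, 8, 3


def make_placeholder_ratings(seed=0):
    """ratings[g][i][f] is 1 if grader g marked feature f as present in image i.

    The values are random placeholders, NOT real study data.
    """
    rng = random.Random(seed)
    return [[[rng.randint(0, 1) for _ in range(N_FEATURES)]
             for _ in range(N_IMAGES)]
            for _ in range(N_GRADERS)]


def wide_layout(ratings):
    """One row per grader, one column per image/feature pair (3 x 800)."""
    return [[answer for image in grader for answer in image]
            for grader in ratings]


def per_feature_tables(ratings):
    """One table per feature, each with 100 rows (images) x 3 columns (graders)."""
    return [[[ratings[g][i][f] for g in range(N_GRADERS)]
             for i in range(N_IMAGES)]
            for f in range(N_FEATURES)]


def unanimous_share(table):
    """Fraction of rows (images) on which every grader gave the same answer."""
    return sum(1 for row in table if len(set(row)) == 1) / len(table)


ratings = make_placeholder_ratings()
wide = wide_layout(ratings)
tables = per_feature_tables(ratings)

print(len(wide), len(wide[0]))                          # 3 800
print(len(tables), len(tables[0]), len(tables[0][0]))   # 8 100 3

for f, table in enumerate(tables, start=1):
    print(f"feature {f}: all graders agree on {unanimous_share(table):.0%} of images")
```

With images as rows and graders as columns, each feature gets its own small table, which seems closer to the "which feature is most consistent" question than one very wide table. Whether that is the right setup for an ICC on yes/no answers is exactly what I am unsure about.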

I'm quite a rookie at stats, so any input will be much appreciated!
Cheers!
 
