Inter-rater reliability - what analysis do I use?

During an intensive 5-day group therapy camp, counselors were paired 1:1 with campers (25 campers, 25 counselors). We collected data on each camper's ability to respond verbally to questions during the group morning meeting and closing meeting. If the camper was unable to respond, the counselor applied the treatment protocol to obtain a verbal response.

- Each counselor live-coded their camper during morning meeting and closing meeting for responses to three pre-identified questions each day. Each response was tallied as either a noise or a verbal response. We also tallied the number of prompts used to obtain a verbal response, whether the counselor had to do a side bar to practice, and any spontaneous speech.

- We also video recorded the questions and responses just in case. Good thing we did, because sometimes the counselors forgot to tally and needed to be reminded.

- Since each camper was coded by their counselor, and counselors rotated midweek, one rater coded a given child on Monday and Tuesday, a different rater coded that child on Wednesday, and a third rater coded them on Thursday and Friday (C1, C1, C2, C3, C3).

- Originally, I planned to compare live coding against video coding and obtain an inter-rater reliability coefficient. But since multiple live raters coded multiple children, can I still do that?

- I can have 2-3 video raters. Should I use video raters only and obtain an inter-rater reliability coefficient among just those raters? Is there anything I can do with the live coding? Can I compare the live coding data with the video coding data to justify that future studies should code from video recordings? (The live coding did not go smoothly; we witnessed that during the week.)

- Is it possible to just throw out the live coding data and state that it was inconsistent, or do I need to demonstrate that it varied from the recorded data (and therefore run an analysis) to justify going with the video recorded data instead?
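
To make the live-vs-video comparison I have in mind concrete, here is a minimal sketch of one possible agreement statistic, Cohen's kappa, computed between the live code and a single video rater's code for the same responses. The function name `cohen_kappa` and the eight response codes below are made up purely for illustration; they are not our camp data. I am assuming each response gets exactly one code ("noise" or "verbal") from each source.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same items (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same non-empty set of items.")
    n = len(rater_a)

    # Observed agreement: share of items given the same code by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: from each rater's marginal code frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both raters used one identical code throughout; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# ILLUSTRATIVE codes only (not real data): one entry per coded response.
live = ["verbal", "verbal", "noise", "noise", "verbal", "noise", "verbal", "verbal"]
video = ["verbal", "noise", "noise", "noise", "verbal", "noise", "verbal", "verbal"]

kappa = cohen_kappa(live, video)
print(round(kappa, 2))
```

This sketch pools all responses and treats them as independent, which ignores that responses are nested within campers and that the live rater changed across days. Whether that pooling is defensible is part of what I am asking.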

Thanks for any help you can provide! I attached a document that shows what coding data we have.