Number of subjects vs. raters for Fleiss' kappa: urgent help requested

#1
I would appreciate input. We are planning to test a new scoring tool that has 5 categories. We will send 3-4 video-recorded cases to ~40 raters and ask them to score each case with the new tool. Assuming a response rate of 50% (about 20 raters), is it OK to use such a small number of cases, i.e. 3-4 video recordings?
I do not want to send out too many cases, as that makes the raters lose interest in responding. If I am aiming for a moderately high kappa, i.e. > 0.6, how many raters do I need for a fixed number of subjects, i.e. 4 cases?
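
To make the setup concrete, here is a minimal sketch of how I understand Fleiss' kappa would be computed for this design. It assumes 4 cases, 20 responding raters, and 5 categories; the counts are made-up illustrative numbers, not real data, and the function name `fleiss_kappa` is just my own helper, not a library call.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa from a table where counts[i][j] is the number of
    raters who assigned subject i to category j."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must be rated by the same number of raters")
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement among rater pairs, per subject.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects      # mean observed agreement
    p_e = sum(p * p for p in p_j)      # agreement expected by chance
    # Undefined (division by zero) if all ratings fall in one category.
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical data: 4 video cases (rows), 5 categories (columns), 20 raters each.
example_counts = [
    [16, 3, 1, 0, 0],
    [1, 15, 3, 1, 0],
    [0, 2, 16, 2, 0],
    [0, 0, 2, 4, 14],
]
kappa = fleiss_kappa(example_counts)
print(f"Fleiss' kappa: {kappa:.3f}")
```

Note that the chance-agreement term is estimated from the category proportions pooled over only 4 rows here, which is part of why I am unsure whether so few cases is enough.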