Which inter-rater reliability test should I use?

Hi, I have rating data from a questionnaire completed by two raters. Based on a description, each rater assigns an object to a level within a category; within each category, a higher level means the object is more advanced in that category. Each rater gives 11 scores in total, one for each of the 11 categories in which the object's skill is rated. However, each category has a different number of possible responses: e.g. in category 1 the object can be assigned to one of 12 skill levels, whereas in category 2 it can only be assigned to one of 5 levels.
I was looking at the weighted kappa test, but I think my data violate its assumptions because the number of levels differs between categories. I was considering a weighted test because my data are ordinal.

Please, can someone advise how I should analyse this? One option would be to compute a weighted kappa for each category separately and then take the mean of those values as an overall agreement score, but I'm not sure whether that would be correct.
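To make the per-category idea concrete, here is a rough sketch of what I mean (plain Python; the `linear_weighted_kappa` helper and the example data are just my own illustration for this post, not from any library, and assume ratings are coded 0..k-1 with linear weights):

```python
from collections import Counter

def linear_weighted_kappa(r1, r2, k):
    """Cohen's linearly weighted kappa for two raters' ordinal ratings
    on levels 0..k-1 (k can differ per category)."""
    n = len(r1)
    obs = Counter(zip(r1, r2))          # observed joint rating counts
    m1, m2 = Counter(r1), Counter(r2)   # marginal counts per rater
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)            # linear disagreement weight
            num += w * obs.get((i, j), 0) / n   # weighted observed disagreement
            # chance-expected disagreement from the two raters' marginals
            den += w * (m1.get(i, 0) / n) * (m2.get(j, 0) / n)
    return 1.0 - num / den

# Made-up example: each category has its own number of levels k.
categories = {
    "cat1": {"k": 12, "rater1": [0, 3, 7, 11], "rater2": [1, 3, 8, 10]},
    "cat2": {"k": 5,  "rater1": [0, 2, 4, 4],  "rater2": [0, 1, 4, 3]},
}
kappas = {name: linear_weighted_kappa(c["rater1"], c["rater2"], c["k"])
          for name, c in categories.items()}
overall = sum(kappas.values()) / len(kappas)  # simple mean across categories
print(kappas, overall)
```

Since each kappa is computed within a single category, the differing numbers of levels never get mixed, and the overall score is just an unweighted mean of the 11 per-category kappas. Whether that mean is a defensible summary is exactly what I'm unsure about.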

Thank you!