Failure to calculate Kappa for a constant rater.

Omerikooo

New Member
I have a dataset with two columns (the first column holds the first measurement, the second column the second measurement), and I want to calculate intra-rater reliability as kappa in SPSS. It can't be calculated because the ratings are constant, so although I have a perfect match I can't get a valid kappa value. Are there any workarounds for this, or any other statistical tests?
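For reference, Cohen's kappa is kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed share of matching ratings and p_e is the agreement expected by chance from the two marginal distributions. When both columns contain nothing but the same category, p_e = 1 and the formula becomes 0/0, so there is no valid value even though agreement is perfect. The minimal Python sketch below is not SPSS's implementation, and the function name `cohens_kappa` is hypothetical; it returns `None` in that case. It also shows that if only one column is constant, kappa comes out as exactly 0 whatever the agreement, because p_o and p_e are then equal.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def cohens_kappa(first: Sequence[Hashable], second: Sequence[Hashable]) -> Optional[float]:
    """Cohen's kappa for two ratings of the same items (e.g. time 1 vs time 2).

    Returns None when chance agreement p_e is 1, i.e. both columns hold
    the same single category, because kappa is then 0/0.
    """
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(first)
    # Observed agreement: share of items rated identically both times.
    p_o = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement: sum over categories of the product of marginal counts.
    m1, m2 = Counter(first), Counter(second)
    expected = sum(m1[c] * m2[c] for c in m1)
    if expected == n * n:
        return None  # p_e == 1, so (p_o - p_e) / (1 - p_e) is undefined
    p_e = expected / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical binary ratings, not the poster's data.
print(cohens_kappa([1, 1, 1, 1, 1], [1, 1, 1, 1, 1]))  # None: constant, perfect match
print(cohens_kappa([1, 1, 0, 1, 0], [1, 1, 0, 1, 0]))  # 1.0: perfect match, both categories used
print(cohens_kappa([1, 1, 1, 1, 1], [1, 1, 0, 1, 1]))  # 0.0: one column constant
```

The integer comparison `expected == n * n` avoids testing a float for equality with 1.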

noetsi

No cake for spunky
I don't work with kappa, but if the data is constant and you can't calculate kappa for that, why would you want to do something the data won't support?

If you want to simulate the results, you can add a little bit to each data point so it is not constant.

Omerikooo

New Member

Actually, you are right. My main problem is that my intra-rater reliability is perfect, but due to intrinsic properties of the kappa statistic the coefficient can't be calculated.

Since my variables are categorical, adding values like 0.1 would turn them into interval data (e.g. 1.1), and in that case kappa can't be calculated.

My solution could be to present percent agreement (100% identical, 90% identical, etc.), but this is not a good way to report agreement.
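Percent agreement is simply the share of items that received the same rating at both time points. Unlike kappa it stays defined when the rater is constant, but it makes no correction for agreement expected by chance, which is one reason it is considered a weak summary. A minimal Python sketch, with a hypothetical function name:

```python
from typing import Hashable, Sequence


def percent_agreement(first: Sequence[Hashable], second: Sequence[Hashable]) -> float:
    """Percentage of items that received the same rating both times."""
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(a == b for a, b in zip(first, second))
    return 100.0 * matches / len(first)


# Hypothetical ratings: a constant rater, then ten items with one mismatch.
print(percent_agreement([1, 1, 1, 1], [1, 1, 1, 1]))  # 100.0
print(percent_agreement([1, 0, 1, 1, 0, 1, 1, 1, 1, 1],
                        [1, 0, 1, 1, 0, 1, 1, 1, 1, 0]))  # 90.0
```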

Another solution would be a different test that still works even if the rater gives constant ratings. Unfortunately, I failed to find one.

noetsi

No cake for spunky
I don't understand how intra-rater reliability, or anything in real data, can ever be perfect. You mean two raters agreed every time? That does not seem likely.

But if it is true, do you really need a test statistic for interrater reliability? You have two clones.

Omerikooo

New Member
Haha, you are right. Intra-rater reliability refers to measurements by one rater at different time points. My results mean that the rater gave the same values for the same variable at both time points. This is possible since the measurement is easy and the variable is binary.

I agree that no other test is needed for this, because it is perfect anyway.

In the end I solved the problem with another approach that is not related to statistics.

Thanks!

Karabiner

TS Contributor
Yes, kappa is generally useless, logically faulty and often plainly silly,
except under certain circumstances in some experimental studies,
since it can only be meaningfully used if both marginals are fixed.
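The point about marginals can be made concrete with a small worked example. Kappa's chance-agreement term p_e is built entirely from the row and column totals, so two tables with identical observed agreement can give very different kappas. The sketch below uses two invented 2 x 2 tables (hypothetical numbers, not data from this thread); both have 90% agreement, yet kappa is 0.8 with balanced marginals and about 0.44 with skewed ones. It illustrates one way the marginals drive kappa, not necessarily the full argument behind requiring them to be fixed.

```python
from typing import Sequence


def kappa_from_table(table: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa from a square k x k table (rows: rating 1, columns: rating 2)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_o = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement depends only on the marginal totals.
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Two invented tables, both with 90 of 100 items on the diagonal (90% agreement).
balanced = [[45, 5], [5, 45]]  # marginals 50/50
skewed = [[85, 5], [5, 5]]     # marginals 90/10
print(round(kappa_from_table(balanced), 3))  # 0.8
print(round(kappa_from_table(skewed), 3))    # 0.444
```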

noetsi

No cake for spunky
When I worked with interrater reliability many years ago, we were checking whether raters agreed with each other. What you are doing makes more sense in that regard.

Omerikooo

New Member
I agree. I had many problems using it.

What do you mean by "both marginals are fixed"?

hlsmith

Less is more. Stay pure. Stay poor.
So by constant you mean both raters agreed perfectly across the sample? How big was your sample? Is this issue related to not having a scale that can distinguish the states well enough?

If it can't work and you can't find a fix, just report a correlation coefficient with confidence intervals; that would convey the matching of values to your audience. Also, provide a data frame of example data so we know what you are working with. I would imagine some type of weighted accuracy value could be derived from a contingency table if you are working with discrete continuous values.

Omerikooo

New Member
There was only one rater, who rated at two different time points.

A correlation coefficient wouldn't work in my case, since I have a binary variable with only 0 and 1.

If this were some kind of continuous variable, I would use the ICC (intraclass correlation coefficient) and it would cause no problems.

In my example, one of the columns is all ones, and this causes the problem.
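For two 0/1 variables, the Pearson correlation reduces to the phi coefficient, which can be computed directly from the 2 x 2 table counts. It runs into the same wall here, though: a column of only ones has zero variance, one of the marginal totals in the denominator is 0, and phi is 0/0. A minimal Python sketch (the function name `phi_coefficient` is hypothetical) that returns `None` in that case:

```python
import math
from typing import Optional, Sequence


def phi_coefficient(x: Sequence[int], y: Sequence[int]) -> Optional[float]:
    """Phi coefficient (Pearson r for two 0/1 variables) from 2 x 2 counts.

    Returns None when either variable is constant, because a marginal
    total is then 0 and the denominator vanishes.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("need two non-empty 0/1 lists of equal length")
    pairs = list(zip(x, y))
    n11 = pairs.count((1, 1))
    n10 = pairs.count((1, 0))
    n01 = pairs.count((0, 1))
    n00 = pairs.count((0, 0))
    # Product of the four marginal totals: x=1, x=0, y=1, y=0.
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return None
    return (n11 * n00 - n10 * n01) / math.sqrt(denom)


print(phi_coefficient([1, 1, 1, 1], [1, 1, 1, 1]))  # None: both columns all ones
print(phi_coefficient([1, 1, 0, 0], [1, 1, 0, 0]))  # 1.0
```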

An easy approach would be to present only the percent match, but I don't like that solution.

Omerikooo

New Member
Wow! Good source indeed. I failed to find any correlation that handles the binary-binary case, but thanks anyway!

noetsi

No cake for spunky
Polychoric works well, or is at least recommended, with Likert data.