Test-Retest Reliability Coefficient

herwitz

New Member
Hi Folks,

Hopefully this is a relatively easy question. I collected data on a measure at two time points, and I would like to run a test-retest reliability check. Is a Pearson correlation coefficient the one to use, or are there other thoughts on this?

hlsmith

Less is more. Stay pure. Stay poor.
Kappa test, perhaps. Depends on your purpose.

herwitz

New Member
Kappa test, perhaps. Depends on your purpose.
In this case, participants completed a questionnaire with ten items. The items are 5-point Likert ratings.

hlsmith

Less is more. Stay pure. Stay poor.
Are you looking to examine change or reliability?

CB

Super Moderator
Are we talking about Cohen's kappa here? That's used mainly for qualitative (i.e. nominal) ratings. OP's data is at least ordinal, with 41 possible total scores (10 to 50). Kappa will count any difference between scores as a disagreement. E.g., if someone gets a score of 21 at time 1, and then 22 at time 2, this is recorded as a disagreement. Same for a score of 50 at time 1 and 10 at time 2. Ideally you'd prefer something here that takes into account the ordering of ratings.

You could do a weighted kappa, but probably the most conventional way to deal with this problem is to use a correlation (Spearman's or Pearson's, depending if you want to treat the data as ordinal or interval).
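A minimal sketch of the two options CB mentions (the data here is invented for illustration: 30 hypothetical respondents on a 10-item, 5-point questionnaire, so totals run 10–50). It contrasts a plain kappa, a linearly weighted kappa, and the Pearson/Spearman correlations:

```python
# Sketch with made-up data: plain vs. linearly weighted kappa vs. correlations
# for test-retest totals on a 10-item, 5-point questionnaire (totals 10-50).
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
time1 = rng.integers(10, 51, size=30)                           # time-1 totals
time2 = np.clip(time1 + rng.integers(-3, 4, size=30), 10, 50)   # small shifts at time 2

labels = np.arange(10, 51)  # all possible totals, so weights match score distance

# Plain kappa treats 21 vs 22 the same as 10 vs 50: both count as disagreements.
plain_kappa = cohen_kappa_score(time1, time2, labels=labels)
# Linear weights penalize a disagreement in proportion to its distance.
weighted_kappa = cohen_kappa_score(time1, time2, labels=labels, weights="linear")

r_pearson, _ = pearsonr(time1, time2)    # treating scores as interval
r_spearman, _ = spearmanr(time1, time2)  # treating scores as ordinal
```

Because exact matches are rare even when scores barely move, the plain kappa comes out low here while the weighted kappa and the correlations stay high, which is exactly the problem with an unweighted kappa on ordinal totals.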

Veda

New Member
Are we talking about Cohen's kappa here? That's used mainly for qualitative (i.e. nominal) ratings. OP's data is at least ordinal, with 41 possible total scores (10 to 50). Kappa will count any difference between scores as a disagreement. E.g., if someone gets a score of 21 at time 1, and then 22 at time 2, this is recorded as a disagreement. Same for a score of 50 at time 1 and 10 at time 2. Ideally you'd prefer something here that takes into account the ordering of ratings.

You could do a weighted kappa, but probably the most conventional way to deal with this problem is to use a correlation (Spearman's or Pearson's, depending if you want to treat the data as ordinal or interval).
Hi,

I have a related question. I had twenty subjects rate 200 words twice on a Likert scale, and I would like to examine test-retest reliability across the twenty subjects. So I computed, for each subject, the correlation between the first-time and second-time ratings, and then tested whether these twenty correlation coefficients significantly deviated from zero. Is that reasonable? Or are there more appropriate ways to examine test-retest reliability across subjects? Thanks!
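The procedure described above can be sketched as follows (the data and the noise model are invented; the Fisher z-transform before the t-test is a standard step when averaging or testing correlations, not something stated in the post):

```python
# Sketch of the per-subject approach: one correlation per subject between
# first and second ratings of the same 200 words, Fisher z-transformed,
# then a one-sample t-test of the transformed values against zero.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_subjects, n_words = 20, 200
rating1 = rng.integers(1, 6, size=(n_subjects, n_words))        # 5-point Likert
noise = rng.integers(-1, 2, size=(n_subjects, n_words))         # small retest shifts
rating2 = np.clip(rating1 + noise, 1, 5)

# One test-retest correlation per subject.
rs = np.array([stats.pearsonr(rating1[i], rating2[i])[0]
               for i in range(n_subjects)])
z = np.arctanh(rs)                 # Fisher z stabilizes the sampling variance
t, p = stats.ttest_1samp(z, 0.0)   # H0: the mean (transformed) correlation is zero
```

One caveat worth noting: with 200 words per subject, this null hypothesis (zero correlation) is very weak, so rejecting it says little about whether reliability is *good*, only that it is not absent.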

Veda

hlsmith

Less is more. Stay pure. Stay poor.
Back to the original post, CB is correct that a traditional kappa would be less than ideal, but Cohen's kappa with linear weights is also a good approach.

Veda,

So did you create 20 correlations? If so, perhaps something along the lines of Fleiss' multi-rater reliability (kappa) may be a useful alternative.