Test-Retest Reliability Coefficient

#1
Hi Folks,

Hopefully this is a relatively easy question. I collected data on a measure at two time points, and I would like to run a test-retest reliability check. Is a Pearson correlation coefficient the one to use, or are there other thoughts on this?

Many thanks for your time!
 

CB

Super Moderator
#7
Are we talking about Cohen's kappa here? That's used mainly for qualitative (i.e. nominal) ratings. OP's data is at least ordinal, with 50 possible final scores. Kappa will count any difference between scores as a disagreement. E.g., if someone gets a score of 21 at time 1, and then 22 at time 2, this is recorded as a disagreement. Same for a score of 50 at time 1 and 0 at time 2. Ideally you'd prefer something here that takes into account the ordering of ratings.

You could do a weighted kappa, but probably the most conventional way to deal with this problem is to use a correlation (Spearman's or Pearson's, depending on whether you want to treat the data as ordinal or interval).
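For example, here's a quick Python sketch with made-up time 1 / time 2 scores standing in for your data (just a sketch, not a full analysis):

import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical total scores (0-50) for ten participants at each time point.
time1 = np.array([21, 35, 42, 18, 50, 27, 33, 40, 15, 29])
time2 = np.array([22, 33, 45, 20, 48, 25, 36, 41, 17, 30])

r, p_r = pearsonr(time1, time2)        # treats scores as interval
rho, p_rho = spearmanr(time1, time2)   # treats scores as ordinal (ranks)

print(f"Pearson r = {r:.2f} (p = {p_r:.3f})")
print(f"Spearman rho = {rho:.2f} (p = {p_rho:.3f})")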
 

Veda

New Member
#8
Hi,

I have a relevant question. I had twenty subjects each rate 200 words on a Likert scale, twice. I would like to examine the test-retest reliability across the twenty subjects. So, I computed a correlation coefficient between the first-time ratings and the second-time ratings for each subject and tested whether these twenty correlation coefficients significantly deviated from zero. Is this reasonable? Or are there more appropriate ways to examine test-retest reliability across subjects? Thanks!
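In code, roughly what I did (with made-up Likert ratings standing in for my real data):

import numpy as np
from scipy.stats import pearsonr, ttest_1samp

rng = np.random.default_rng(0)
# Hypothetical ratings: 20 subjects x 200 words, 1-7 Likert scale, two sessions.
session1 = rng.integers(1, 8, size=(20, 200))
session2 = np.clip(session1 + rng.integers(-1, 2, size=(20, 200)), 1, 7)

# One test-retest correlation per subject.
r_per_subject = np.array([pearsonr(session1[i], session2[i])[0] for i in range(20)])

# Test whether the twenty correlations deviate from zero.
# (A Fisher z-transform, np.arctanh(r_per_subject), is sometimes applied first.)
t_stat, p_value = ttest_1samp(r_per_subject, 0)
print(r_per_subject.mean(), t_stat, p_value)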

Veda
 

hlsmith

Less is more. Stay pure. Stay poor.
#9
Back to the original post: CB is correct that a traditional kappa would be less than ideal, but a Cohen's kappa with linear weights is also a good approach.
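For example, a quick sketch with made-up time 1 / time 2 scores (scikit-learn's cohen_kappa_score takes a weights argument):

from sklearn.metrics import cohen_kappa_score

# Hypothetical total scores at the two time points.
time1 = [21, 35, 42, 18, 50, 27, 33, 40, 15, 29]
time2 = [22, 33, 45, 20, 48, 25, 36, 41, 17, 30]

# Unweighted kappa: any mismatch (21 vs 22, or 50 vs 0) counts as full disagreement.
kappa_plain = cohen_kappa_score(time1, time2)
# Linearly weighted kappa: the penalty grows with the distance between the scores.
kappa_linear = cohen_kappa_score(time1, time2, weights='linear')

print(kappa_plain, kappa_linear)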

Veda,

So did you create 20 correlations? If so, perhaps something along the lines of Fleiss' multi-rater reliability may be a useful alternative.
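If I'm remembering the right statistic, a rough sketch of Fleiss' kappa with statsmodels would look like this (purely hypothetical ratings; note that it treats the Likert categories as nominal):

import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(1)
# Hypothetical data: 200 words (rows) rated by 20 subjects (columns) on a 1-7 Likert scale.
ratings = rng.integers(1, 8, size=(200, 20))

# Convert the words-by-raters matrix into per-word category counts, then compute kappa.
table, categories = aggregate_raters(ratings)
print(fleiss_kappa(table, method='fleiss'))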