# Test-Retest Reliability Coefficient

#### herwitz

##### New Member
Hi Folks,

Hopefully this is a relatively easy question. I collected data on a measure at two time points, and I would like to run a test-retest reliability check. Is a Pearson correlation coefficient the one to use, or are there other thoughts on this?

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Kappa test, perhaps. Depends on your purpose.

#### herwitz

##### New Member
> Kappa test, perhaps. Depends on your purpose.
In this case, participants completed a questionnaire with ten items. The items are 5-point Likert ratings.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Are you looking to examine change or reliability?

#### hlsmith

##### Less is more. Stay pure. Stay poor.
You are looking at the kappa then.

#### CB

##### Super Moderator
Are we talking about Cohen's kappa here? That's used mainly for qualitative (i.e. nominal) ratings. OP's data is at least ordinal, with 50 possible final scores. Kappa will count any difference between scores as a disagreement. E.g., if someone gets a score of 21 at time 1, and then 22 at time 2, this is recorded as a disagreement. Same for a score of 50 at time 1 and 0 at time 2. Ideally you'd prefer something here that takes into account the ordering of ratings.

You could do a weighted kappa, but probably the most conventional way to deal with this problem is to use a correlation (Spearman's or Pearson's, depending on whether you want to treat the data as ordinal or interval).
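For the original ten-item questionnaire, both coefficients are one-liners in most stats packages; here is a minimal pure-Python sketch on paired total scores (the score values are made up for illustration):

```python
from math import sqrt

def pearson(x, y):
    # Pearson r: covariance divided by the product of the standard deviations.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(v):
    # 1-based ranks; tied values share the mean of their rank positions.
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(v):
        j = i
        while j + 1 < len(v) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    # Spearman's rho is just Pearson's r computed on the ranks.
    return pearson(ranks(x), ranks(y))

# Hypothetical total scores (10 items x 5-point Likert, so range 10-50)
time1 = [32, 41, 28, 45, 37, 30, 48, 25, 39, 34]
time2 = [30, 43, 27, 44, 38, 33, 47, 26, 37, 35]
print(round(pearson(time1, time2), 3))
print(round(spearman(time1, time2), 3))
```

Treating the totals as interval data gives Pearson's r; computing the same formula on ranks gives Spearman's rho, which only uses the ordering.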

#### Veda

##### New Member
> Are we talking about Cohen's kappa here? That's used mainly for qualitative (i.e. nominal) ratings. OP's data is at least ordinal, with 50 possible final scores. Kappa will count any difference between scores as a disagreement. E.g., if someone gets a score of 21 at time 1, and then 22 at time 2, this is recorded as a disagreement. Same for a score of 50 at time 1 and 0 at time 2. Ideally you'd prefer something here that takes into account the ordering of ratings.
>
> You could do a weighted kappa, but probably the most conventional way to deal with this problem is to use a correlation (Spearman's or Pearson's, depending on whether you want to treat the data as ordinal or interval).
Hi,

I have a related question. I had twenty subjects rate 200 words twice on a Likert scale, and I would like to examine test-retest reliability across the twenty subjects. So I computed a correlation coefficient between the first-time and second-time ratings for each subject and tested whether these twenty correlation coefficients deviated significantly from zero. Is this reasonable? Or are there more appropriate ways to examine test-retest reliability across subjects? Thanks!

Veda
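Veda's procedure (one r per subject, then a test that the set differs from zero) can be sketched as follows. The twenty r values below are made up, and the Fisher z-transform step is a common refinement that makes the sampling distribution of r approximately normal before averaging and testing:

```python
from math import atanh, tanh, sqrt

# Hypothetical per-subject test-retest correlations (one r per subject)
rs = [0.62, 0.55, 0.70, 0.48, 0.66, 0.59, 0.71, 0.52, 0.60, 0.57,
      0.64, 0.49, 0.68, 0.53, 0.61, 0.58, 0.65, 0.50, 0.63, 0.56]

# Fisher z-transform each r: z = atanh(r)
zs = [atanh(r) for r in rs]
n = len(zs)
mean_z = sum(zs) / n
sd_z = sqrt(sum((z - mean_z) ** 2 for z in zs) / (n - 1))

# One-sample t statistic for H0: mean z = 0 (compare to t with n-1 df)
t = mean_z / (sd_z / sqrt(n))

# Back-transform the mean z for an average correlation estimate
print(round(tanh(mean_z), 3), round(t, 2))
```

Averaging in z-space and back-transforming with tanh avoids the bias of averaging raw correlations directly.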

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Back to the original post: CB is correct that a traditional kappa would be less than ideal, but a Cohen's kappa with linear weights is also a good approach.
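Linear-weighted kappa penalizes disagreements in proportion to their distance on the scale, so a 21-vs-22 disagreement costs far less than 50-vs-0. A self-contained sketch (the ratings and category set are illustrative):

```python
def weighted_kappa(a, b, categories):
    # Cohen's kappa with linear weights: w_ij = |i - j| / (k - 1),
    # so the cost of a disagreement grows with its distance on the scale.
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(a)
    # Observed joint proportions
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[idx[x]][idx[y]] += 1 / n
    # Marginal proportions for the chance-expected disagreement
    pa = [sum(1 for x in a if x == c) / n for c in categories]
    pb = [sum(1 for y in b if y == c) / n for c in categories]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            num += w * obs[i][j]
            den += w * pa[i] * pb[j]
    return 1 - num / den

# Hypothetical 5-point Likert ratings at two time points
time1 = [1, 2, 2, 3, 4, 5, 3, 2, 4, 1]
time2 = [1, 2, 3, 3, 4, 4, 3, 2, 5, 2]
print(round(weighted_kappa(time1, time2, [1, 2, 3, 4, 5]), 3))
```

With linear weights an off-by-one rating contributes only a quarter of the disagreement that an extreme 1-vs-5 rating does on a 5-point scale.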

Veda,

So did you create 20 correlations? If so, perhaps something along the lines of Fleiss' multi-rater reliability may be a useful alternative.