Help with statistical analysis for physio research


I am an MSc student in physiotherapy. For my research study, I am looking at the agreement between a patient's questionnaire score and a physiotherapist's estimate of that same score. Basically, the patient completes a ten-part questionnaire, with each part scored 0-10. The physiotherapist sees the patient, blinded to the questionnaire responses, and then completes their own estimate of the same/similarly worded questionnaire (i.e. their estimate of what the patient scored).

Therefore, there will be two scores for each patient per question. For example, on question one the patient scored themselves a 5 and the physio scored them a 3, and so on.

There will be approximately 80 patients in the study and approximately 10-15 physiotherapists, so roughly 5-8 patients each.

What I am unsure of, because similar studies have used different methods, is whether to use an agreement measure such as the ICC or Bland-Altman analysis (if this counts as continuous data), or weighted kappa (if it is ordinal). Or should I simply use a correlation analysis, such as Pearson's or Spearman's, as some studies do? To me, an agreement measure makes more sense, as I want to know how close the physiotherapists are getting to the patient's score (the validated 'gold standard' measure). But I am not sure, so if anyone has any advice that would be amazing. Thank you.
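(To make the correlation-versus-agreement distinction concrete, here is a small sketch with made-up scores: a hypothetical physiotherapist who always rates exactly 2 points below the patient correlates perfectly with the patient, yet never actually agrees.)

```python
# Toy illustration (made-up data): correlation is not agreement.
import numpy as np
from scipy import stats

patient = np.array([2, 4, 5, 7, 9], dtype=float)
physio = patient - 2  # systematic bias: always 2 points low

r, _ = stats.pearsonr(patient, physio)
print(f"Pearson r = {r:.2f}")  # perfect correlation despite the bias
print(f"mean |difference| = {np.abs(patient - physio).mean():.2f}")
```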


TS Contributor
There are a number of ways you could analyse this:
  • One approach would be to calculate the delta between the patient's and the physiotherapist's scores, then perform a one-way ANOVA (or ANOM, depending on your null hypothesis) with physiotherapist as the grouping variable.
  • Or perform a repeated-measures ANOVA with physiotherapist as the between-groups variable and the two scores as the within-groups variable.
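A minimal sketch of the first (delta-then-ANOVA) approach, using made-up deltas for three hypothetical physiotherapists with five patients each:

```python
# Sketch: test whether the mean patient-minus-physio delta differs
# across physiotherapists, via a one-way ANOVA (illustrative data only).
from scipy import stats

deltas_by_physio = [
    [1.0, -1.0, 0.0, 2.0, 1.0],   # physio A: patient score - physio estimate
    [0.0, 1.0, -2.0, 0.0, 1.0],   # physio B
    [2.0, 1.0, 1.0, 0.0, -1.0],   # physio C
]

f_stat, p_value = stats.f_oneway(*deltas_by_physio)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```

A small p-value here would suggest the physiotherapists differ systematically in how far their estimates sit from the patients' own scores.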
What did you mean by "similar studies have used different methods"? Similar studies to your study?


TS Contributor
While this isn't my field, I would not favor the correlation approach. All that would demonstrate is that the ratings presumably move in the same general direction; it would say nothing about their accuracy. For that you would need an approach like the one I described previously, a Bland-Altman plot, or possibly Kendall's coefficient of concordance.
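For reference, a Bland-Altman plot is straightforward to produce by hand; here is a minimal sketch with made-up scores, plotting the mean of each pair against the difference, with the bias and 95% limits of agreement:

```python
# Minimal Bland-Altman sketch (illustrative data only).
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

patient = np.array([5, 7, 3, 8, 6, 4, 9, 2], dtype=float)
physio  = np.array([4, 6, 4, 7, 6, 3, 7, 3], dtype=float)

mean_pair = (patient + physio) / 2
diff = patient - physio
bias = diff.mean()
half_width = 1.96 * diff.std(ddof=1)  # 95% limits of agreement

fig, ax = plt.subplots()
ax.scatter(mean_pair, diff)
for y in (bias, bias - half_width, bias + half_width):
    ax.axhline(y, linestyle="--")
ax.set_xlabel("Mean of patient and physio score")
ax.set_ylabel("Patient minus physio difference")
fig.savefig("bland_altman.png")

print(f"bias = {bias:.2f}, LoA = ({bias - half_width:.2f}, {bias + half_width:.2f})")
```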


TS Contributor
Krippendorff's alpha would also be useful for looking at systematic agreement or disagreement among the ratings, and it has more desirable properties for investigating "agreement". But as @Miner said, simpler methods like a Bland-Altman plot can be useful, as can just looking at the median absolute difference (and other quantiles of the absolute differences) between ratings to see how much disagreement exists. A simple scatter plot of the ratings with the line of identity plotted for reference (slope of 1 through the origin) would also reveal general patterns, such as under- or over-assessment at lower, middle, or higher ratings.
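The simple quantile summary suggested above takes only a few lines; a sketch with made-up ratings:

```python
# Sketch: quantiles of the absolute patient-physio differences
# (illustrative data only) as a plain-language measure of disagreement.
import numpy as np

patient = np.array([5, 7, 3, 8, 6, 4, 9, 2], dtype=float)
physio  = np.array([4, 6, 4, 7, 6, 3, 7, 3], dtype=float)

abs_diff = np.abs(patient - physio)
q = np.quantile(abs_diff, [0.25, 0.5, 0.75, 0.9])
print("abs-diff quantiles (25/50/75/90%):", q)
```

The median (50% quantile) is the headline number: "half the physio estimates were within X points of the patient's own score."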

Here is a good paper (not the original) on Krippendorff's alpha: it discusses drawbacks of kappa and other measures, describes an implementation of Krippendorff's alpha in SAS, and first covers some basics of the agreement topic.

Klaus Krippendorff is the original author of the measure and has published some papers on it as well.