Ordinal paired evaluation - What test should we use?

#1
Dear all,

Instability in the knee after an anterior cruciate ligament (ACL) rupture is measured by means of the Lachman test (displacement) and the Pivot Shift test (rotation). Both tests are graded from 0 to III, with 0 being completely stable and III completely unstable.

We used those tests to evaluate the pre- and postoperative stability of patients who underwent ACL reconstruction surgery using two different surgical techniques.
So we have paired data (pre- and postoperative), an ordinal evaluation scale and a factor that separates our sample into two groups (technique). What statistical test should we use?

And what if the factor had more than two groups (imagine that we also wanted to compare results across patients of different races, for instance)?

And what if the preoperative stability was considered irrelevant? Meaning that we would only be comparing the postoperative results.

Thank you for your kind help.
 

Karabiner

TS Contributor
#2
So we have paired data (pre- and postoperative), an ordinal evaluation scale and a factor that separates our sample into two groups (technique). What statistical test should we use?
Well, you did not clearly state your research questions.

Would it be sufficient to categorize each patient as improved/unchanged/worse and compare this between groups
using a Chi² test? Or do you need something more sophisticated, or would this not deal with the research question?
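
To make the suggestion concrete, here is a minimal Python sketch of the improved/unchanged/worse comparison. It assumes the grades are coded 0-3 with lower meaning more stable; the patient records, the technique labels "A" and "B" and the helper names are all made up for illustration and do not come from any real data.

```python
import math

def change_category(pre, post):
    """Classify a patient by comparing pre- and postoperative grade (0-3, lower = more stable)."""
    if post < pre:
        return "improved"
    if post > pre:
        return "worse"
    return "unchanged"

def chi_square_2x3(table):
    """Pearson chi-square statistic and p-value for a 2 x 3 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # A 2 x 3 table has (2-1)*(3-1) = 2 degrees of freedom, and for 2 df
    # the upper tail of the chi-square distribution is exactly exp(-x/2).
    p_value = math.exp(-stat / 2)
    return stat, p_value

# Hypothetical (technique, pre, post) records, invented for illustration only.
patients = [("A", 3, 0), ("A", 2, 1), ("A", 2, 2), ("A", 1, 2), ("A", 3, 1), ("A", 2, 0),
            ("B", 3, 3), ("B", 2, 1), ("B", 1, 2), ("B", 2, 2), ("B", 3, 2), ("B", 1, 3)]

categories = ["improved", "unchanged", "worse"]
techniques = ["A", "B"]

# Cross-tabulate technique (rows) against change category (columns).
table = [[0] * len(categories) for _ in techniques]
for technique, pre, post in patients:
    row = techniques.index(technique)
    col = categories.index(change_category(pre, post))
    table[row][col] += 1

stat, p_value = chi_square_2x3(table)
print(table, round(stat, 3), round(p_value, 3))
```

The closed-form p-value only works because this table has 2 degrees of freedom; other table sizes would need a statistics library. With counts this small the Chi² approximation is unreliable anyway, so the toy data only shows the mechanics.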

How large is your sample size, by the way?

And what if the preoperative stability was considered irrelevant?
That would probably be a huge waste of statistical information. And you'd still have the problem of an ordinally scaled
dependent variable.

With kind regards

Karabiner
 
#3
Thanks a lot for your help, Karabiner
Would it be sufficient to categorize each patient as improved/unchanged/worse and compare this between groups
using a Chi² test? Or do you need something more sophisticated, or would this not deal with the research question?
Well, as we have an ordinal scale with 4 levels, I guess that we would have many categories to be compared (improves one level, improves two levels, improves three levels, stays the same, or worsens one, two or three levels). It would be very difficult to say which distribution of results is the best one. It would be difficult to say even with the improved/unchanged/worse approach. Is it better to have 90% improved and 8% unchanged, or 92% improved and 2% unchanged? It would be very subjective.

I thought about converting it to numbers (0-3) and performing a repeated-measures ANOVA, but I guess that it would not be correct to say (for instance) “the mean difference between the two techniques is 0.23 levels but it is not statistically significant (p=0.854)”. The step between categories might not be that straightforward.

I guess that this situation should be similar to dealing with Likert-scaled surveys. How do you deal with a situation where you have a “moderately satisfied” customer who, after some sort of change in the business, moves into the category “extremely satisfied”? I mean, there should be a way to get a clear idea of what is actually the best result. When you have continuous variables it is clear that the higher average is better (in general). Is there a similar option for ordinal variables?

How large is your sample size, by the way?
We are still collecting data, but we are aiming for 300.

That would probably be a huge waste of statistical information. And you'd still have the problem of an ordinally scaled
dependent variable.
You are right and, also, I'd like to learn how to do it properly.

Thank you!!!!
 

Karabiner

TS Contributor
#4
Well, as we have an ordinal scale with 4 levels, I guess that we would have many categories to be compared (improves one level, improves two levels,
Therefore, I suggested 3 outcomes: improved, not improved, worse.
Whether this is sufficient depends on the theoretical and/or practical
context, and on the research questions.
Is it better to have 90% improved and 8% unchanged, or 92% improved and 2% unchanged? It would be very subjective.
You asked for a comparison between the 2 groups. The first thing
would be to look at whether they differ at all. How to interpret the
outcome patterns would be the next step, IMO.
I thought about converting it to numbers (0-3)
If you take it seriously that "Grade 0 to III" is an ordinal scale,
then you cannot do that.
We are still collecting data, but we are aiming for 300.
So you could consider ordinal logistic regression of the follow-up
grade on group, additionally using the baseline grade as categorical
predictor. You could also add some other meaningful predictors
to the model.
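
For readers unfamiliar with the model, here is a small Python sketch of how a proportional-odds (cumulative logit) ordinal regression turns predictors into probabilities for the four grades. Every number in it (cut points, coefficients) is invented for illustration and is not an estimate from any data; the function and variable names are likewise hypothetical. In practice these quantities would be estimated by statistical software, for example `OrderedModel` in Python's statsmodels or `polr` in R's MASS package.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def grade_probabilities(cutpoints, eta):
    """P(grade = 0..3) under a cumulative-logit model: P(Y <= k) = logistic(cut_k - eta)."""
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for k in range(1, len(cumulative)):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs

# All values below are made up for illustration only.
cutpoints = [-0.5, 1.0, 2.5]   # thresholds between grades 0|1, 1|2, 2|3
beta_technique_b = 0.7         # log-odds shift toward worse grades for technique B
beta_baseline = {0: 0.0, 1: 0.4, 2: 0.9, 3: 1.5}   # baseline grade as categorical predictor

def predict(technique, baseline_grade):
    """Predicted follow-up grade probabilities for one technique and baseline grade."""
    eta = beta_baseline[baseline_grade]
    if technique == "B":
        eta += beta_technique_b
    return grade_probabilities(cutpoints, eta)

probs_a = predict("A", 2)
probs_b = predict("B", 2)
print([round(p, 3) for p in probs_a])
print([round(p, 3) for p in probs_b])

# The technique effect is summarized by one odds ratio that applies at every cut point.
print("odds ratio, A vs. B, of being at or below any given grade:",
      round(math.exp(beta_technique_b), 2))
```

The point of the sketch is that the group comparison comes down to a single odds ratio, exp(coefficient), with a confidence interval and p-value, while the grades are treated as ordered categories and never as numbers.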

With kind regards

Karabiner
 
#5
Again, thank you for your honest interest.

If you take it seriously that "Grade 0 to III" is an ordinal scale,
then you cannot do that.
I was pretty sure about that ;-)

So you could consider ordinal logistic regression of the follow-up
grade on group, additionally using the baseline grade as categorical
predictor. You could also add some other meaningful predictors
to the model.
I guess this could be a valid approach. Unfortunately, my statistical knowledge is very limited and I am not familiar with that kind of analysis. Nevertheless, I am sure that I will find some information on the internet or in a book. I will try to learn more about ordinal regression and I will come back to you, if you don't mind, if I need further help.

Thanks a lot for your kind help.
 
#6
Excuse me for asking you another question (Karabiner, if you could help I would be very thankful ;)). I have been reading a little bit about ordered logistic regression and, although I still don't understand it well, I have some doubts about its applicability in the case that I propose in this post.

If we were talking about continuous data, we would do a repeated measures ANOVA to obtain the mean difference between the preoperative and postoperative measurements of each group and the p-value that would indicate whether these differences are statistically significant. Then we could do pairwise comparisons.

Since we have an ordinal variable, let us suppose that we do the ordered logistic regression and use the preoperative value as the predictor variable. The problem I see is that, in a sufficiently large sample, the “mean” preoperative value should be the same in the two groups (or, at least, should not differ from a statistical point of view). So, does it make sense to make a model -of whatever type- using a variable as a predictor that should not be different between the different groups? It would be like trying to predict the probability of having lung cancer based on the height of two groups of 10,000 people. Since the mean heights between the two groups are likely to be the same, I don't think the model would have any validity.

I'm sorry if the question doesn't make sense, but my knowledge of statistics is quite limited, and I will appreciate your help to continue learning.

Thanks.
 

Karabiner

TS Contributor
#7
Since we have an ordinal variable, let us suppose that we do the ordered logistic regression and use the preoperative value as the predictor variable. The problem I see is that, in a sufficiently large sample, the “mean” preoperative value should be the same in the two groups (or, at least, should not differ from a statistical point of view).
Why do you think so? It was nowhere mentioned that allocation to groups (techniques) was randomized.
So pre-existing differences between the groups can be expected.
So, does it make sense to make a model -of whatever type- using a variable as a predictor that should not be different between the different groups?
In addition to accounting for pre-existing differences, using baseline values also decreases statistical noise and
therefore increases statistical power. If this were a randomized study and the sample size were large, then using only
the post-operative scores might perhaps be justified (but in the case of a randomized study, a statistician would
already be involved in the study, and we would not need to discuss this here, would we?).

With kind regards

Karabiner