Multiple comparison test, with ambiguous labels/ranks

In a clinical study, each patient's health condition is labeled as grade 1, 2, or 3. Some patients are labeled "Either 1 or 2" by the doctor, meaning the doctor considers both grade 1 and grade 2 valid descriptions of the patient's health condition.

When performing, say, a Kruskal-Wallis test, I don't want to treat such patients as a separate group (in addition to the original 3 groups). What I think makes sense is to run the K-W test multiple times, each time using either 1 or 2 as the rank, get the p-value for each run, and among all these runs use the one with the lowest p-value.

For example, if there is only one patient record labeled "Either 1 or 2", while all other patients are clearly labeled as 1, 2, or 3, then I can run the K-W test twice, get two p-values, and report the lower one.

However, as I have 50 patients with such an "ambiguous" label, I would need to run the K-W test 2^50 times (about 1.1 x 10^15 runs), which is not feasible.
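To make the brute-force idea concrete, here is a minimal sketch of the enumeration for a small number of ambiguous patients. It assumes the grades define the three groups and that some numeric measurement per patient is being compared; the measurement values, the `clear`/`ambiguous` layout, and the function names are all hypothetical. Because there are exactly three groups, the chi-square approximation has 2 degrees of freedom and its tail probability is simply exp(-H/2), so no statistics library is needed.

```python
import math
from itertools import product

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_3(groups):
    """Tie-corrected Kruskal-Wallis H and p-value for exactly three groups."""
    if len(groups) != 3 or not all(groups):
        raise ValueError("expected three non-empty groups")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum * rank_sum / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical")
    h /= correction
    return h, math.exp(-h / 2)  # chi-square tail with df = 2

def enumerate_p_values(clear, ambiguous):
    """Run the test once per assignment of grade 1 or 2 to each ambiguous patient."""
    p_values = []
    for assignment in product((1, 2), repeat=len(ambiguous)):
        groups = {grade: list(vals) for grade, vals in clear.items()}
        for grade, value in zip(assignment, ambiguous):
            groups[grade].append(value)
        _, p = kruskal_wallis_3([groups[1], groups[2], groups[3]])
        p_values.append(p)
    return p_values

# Hypothetical measurements, keyed by the doctor's grade.
clear = {1: [2.1, 2.4, 1.9], 2: [3.0, 3.3, 2.8], 3: [4.1, 4.6, 3.9]}
ambiguous = [2.6, 2.2]  # patients labeled "Either 1 or 2"
p_values = enumerate_p_values(clear, ambiguous)
print(len(p_values), min(p_values), max(p_values))
print(2 ** 50)  # 1125899906842624 runs for 50 ambiguous patients
```

With two ambiguous patients this runs 4 tests; each additional ambiguous patient doubles the count, which is why the approach stops being practical long before 50.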

Any suggestions?


Less is more. Stay pure. Stay poor.
Making it 2.5 would mean the intervals of the ordinal data are no longer equal. I'm not sure whether you could do this and still use a generalized linear model.

Also, if you are going the regrouping route, you may want to correct your level of significance and/or select the larger p-value to err on the side of caution.
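As a rough illustration of those two cautious options, the sketch below applies a Bonferroni-style correction (dividing the significance level by the number of runs) to the smallest p-value and, separately, checks the largest p-value against the uncorrected level. Bonferroni is just one possible correction and is my assumption here; the function name and the p-values are made up for the example.

```python
def conservative_summary(p_values, alpha=0.05):
    """Summarize repeated-test p-values in two conservative ways."""
    if not p_values:
        raise ValueError("need at least one p-value")
    corrected_alpha = alpha / len(p_values)  # Bonferroni-style threshold
    return {
        "corrected_alpha": corrected_alpha,
        # Smallest p-value judged against the corrected threshold.
        "min_p_significant": min(p_values) < corrected_alpha,
        # Largest p-value judged against the original threshold.
        "max_p": max(p_values),
        "max_p_significant": max(p_values) < alpha,
    }

# Made-up p-values from four hypothetical regrouping runs.
summary = conservative_summary([0.012, 0.030, 0.041, 0.020])
print(summary)
```

Note that with 2^N runs the corrected threshold shrinks very quickly, so the largest-p-value rule is likely the more usable of the two when N is large.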
Thanks for your reply.
Making it 2.5 would make sense when training a model, but my question is NOT about training a model, but about evaluating a model.
According to the doctor, predicting a patient with "Either 1 or 2" as "1" is considered a correct prediction.
According to the doctor, predicting a patient with "Either 1 or 2" as "2" is also considered a correct prediction.
Having separate labels such as "1.5" or "2.5" during evaluation does NOT capture the doctor's intention, though I totally agree that it makes sense during training.
The trivial solution is to run many evaluations: in one evaluation treat a "1.5" label as "1", and in another treat it as "2". But this approach does not work if I have many patient records labeled 1.5, as it would take 2^N evaluations for N such records.
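If the goal is only to score predictions under the doctor's criterion, the rule stated above (either grade counts as correct) can be checked in a single pass by storing each patient's acceptable grades as a set. This is a sketch of that scoring rule only, not a substitute for the K-W test; the function name and the example data are hypothetical.

```python
def set_valued_accuracy(predictions, allowed):
    """Fraction of predictions that fall inside each patient's set of acceptable grades."""
    if len(predictions) != len(allowed):
        raise ValueError("predictions and allowed must have the same length")
    if not predictions:
        raise ValueError("need at least one prediction")
    hits = sum(1 for pred, ok in zip(predictions, allowed) if pred in ok)
    return hits / len(predictions)

# Hypothetical data: {1, 2} encodes the doctor's "Either 1 or 2".
allowed = [{1}, {1, 2}, {2}, {1, 2}, {3}]
predictions = [1, 2, 3, 1, 3]
accuracy = set_valued_accuracy(predictions, allowed)
print(accuracy)  # 0.8
```

This costs one comparison per patient regardless of how many labels are ambiguous, so no 2^N enumeration is involved.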

Any suggestions?


Can you recode the values so 1 is 1 then 1 or 2 is 2, then 2 is 3, then 2 or 3 is 4, then 3 is 5, etc.?

Then interpret the results accordingly.
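A minimal sketch of that recoding, assuming the labels are stored as strings like the ones below (the exact label text is hypothetical and would need to match your data):

```python
# Map each doctor's label to a position on a 5-level ordinal scale.
RECODE = {
    "1": 1,
    "1 or 2": 2,
    "2": 3,
    "2 or 3": 4,
    "3": 5,
}

def recode(labels):
    """Translate raw labels to ordinal codes; unknown labels raise KeyError."""
    return [RECODE[label] for label in labels]

labels = ["1", "1 or 2", "2", "3", "1 or 2"]
codes = recode(labels)
print(codes)  # [1, 2, 3, 5, 2]
```

The codes only carry order, so they suit rank-based methods; the gaps between adjacent codes should not be read as equal distances.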