Multiple comparison test, with ambiguous labels/ranks

In a clinical study, each patient's health condition is labeled as grade 1, 2, or 3. Some patients are labeled as "Either 1 or 2" by the doctor, meaning the doctor considers both grade 1 and grade 2 valid descriptions of the patient's health condition.

When performing, say, a Kruskal-Wallis test, I don't want to treat such patients as a separate group (in addition to the original 3 groups). What I think makes sense is to run the K-W test multiple times, each time using either 1 or 2 as the rank for each such patient, get the p-value for each run, and use the lowest p-value among all these runs.

For example, if there is only one patient record that is "Either 1 or 2", while all other patients are clearly labeled as 1, 2, or 3, then I can run the K-W test twice, get two p-values, and report the lower one.

However, as I have 50 patients with such "ambiguous" labels, I would need to run the K-W test 2^50 times, which is computationally infeasible.
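To make the brute-force idea concrete, here is a minimal sketch for a small number of ambiguous patients. It assumes each patient has one numeric measurement (the thread does not say what is being compared across grades), and the function names are hypothetical. The K-W statistic is computed by hand with the usual tie correction and chi-square approximation; with exactly 3 groups there are 2 degrees of freedom, so the p-value is exp(-H/2). In practice a library routine such as scipy.stats.kruskal would normally be used instead.

```python
import math
from itertools import product


def kruskal_wallis_p(groups):
    """Kruskal-Wallis p-value for exactly 3 groups (chi-square approximation).

    Assumes every group is non-empty and the values are not all identical.
    """
    assert len(groups) == 3, "sketch only handles 3 groups (df = 2)"
    values = sorted(v for g in groups for v in g)
    n = len(values)

    # Assign average ranks to tied values and accumulate the tie correction.
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        t = j - i                              # size of this tie block
        rank_of[values[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j

    rank_sum_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    h /= 1 - tie_term / (n ** 3 - n)

    # Chi-square survival function with 2 degrees of freedom.
    return math.exp(-h / 2)


def p_value_range(clear, ambiguous):
    """Run the test for every 1-or-2 assignment of the ambiguous patients.

    clear: dict mapping grade (1, 2, 3) to a list of measurements.
    ambiguous: measurements of the "Either 1 or 2" patients.
    Returns (lowest p-value, highest p-value) over all 2**N assignments.
    """
    p_values = []
    for assignment in product((1, 2), repeat=len(ambiguous)):
        groups = {grade: list(vals) for grade, vals in clear.items()}
        for grade, value in zip(assignment, ambiguous):
            groups[grade].append(value)
        p_values.append(kruskal_wallis_p([groups[1], groups[2], groups[3]]))
    return min(p_values), max(p_values)


# Made-up measurements, purely for illustration.
clear = {1: [1.0, 2.0], 2: [4.0, 5.0], 3: [7.0, 8.0, 9.0]}
ambiguous = [3.0, 6.0]
lowest, highest = p_value_range(clear, ambiguous)
print(lowest, highest)
```

The loop runs 2^N tests, so it is only practical for a handful of ambiguous records. It also returns the highest p-value, since reporting only the lowest one amounts to picking the most favorable of many tests.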

Any suggestions?


Not a robit
Making it 2.5 would mean the intervals of the ordinal data are no longer equal. I'm not sure whether you could do this and still use a generalized linear model.

Also, if you go the regrouping route, you may want to correct your level of significance and/or select the larger p-value to err on the side of caution.
Thanks for your reply.
Making it 2.5 would make sense when training a model, but my question is NOT about training a model; it is about evaluating one.
According to the doctor, predicting a patient with "Either 1 or 2" as "1" is considered a correct prediction.
According to the doctor, predicting a patient with "Either 1 or 2" as "2" is also considered a correct prediction.
Having separate labels "1.5" "2.5" during evaluation does NOT capture the doctor's intention. I totally agree though that it makes sense during training.
The trivial solution is to run many evaluations, treating a "1.5" record as "1" in one evaluation and as "2" in another. But this approach does not work if I have many patient records labeled 1.5, as it would take 2^N evaluations.
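To illustrate the doctor's rule as stated above, here is a small sketch that stores each label as a set of acceptable grades and counts a prediction as correct when it falls in that set. The function name and the sample data are made up for illustration. It covers only a simple accuracy-style score, which can be computed in one pass without enumerating 2^N relabelings; it does not address a rank-based test such as K-W.

```python
def accuracy_with_ambiguous_labels(predictions, accepted):
    """Fraction of predictions that fall inside the accepted set of grades.

    predictions: list of predicted grades (ints).
    accepted: list of sets; {1, 2} encodes the doctor's "Either 1 or 2".
    """
    correct = sum(1 for pred, ok in zip(predictions, accepted) if pred in ok)
    return correct / len(predictions)


# Hypothetical records: two patients carry the ambiguous label.
accepted = [{1}, {1, 2}, {2}, {3}, {1, 2}]
predictions = [1, 2, 3, 3, 1]
print(accuracy_with_ambiguous_labels(predictions, accepted))  # 0.8
```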

Any suggestions?


Not a robit
Can you recode the values so 1 is 1 then 1 or 2 is 2, then 2 is 3, then 2 or 3 is 4, then 3 is 5, etc.?

Then interpret the results accordingly.
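A minimal sketch of that recoding, assuming the raw labels are stored as the strings shown below (the exact label strings and the names RECODE and recode are hypothetical):

```python
# Map each original or ambiguous label to its position on a 5-level ordinal scale.
RECODE = {
    "1": 1,
    "1 or 2": 2,
    "2": 3,
    "2 or 3": 4,
    "3": 5,
}


def recode(labels):
    """Translate raw label strings into the expanded ordinal codes."""
    return [RECODE[label] for label in labels]


print(recode(["1", "1 or 2", "2", "3"]))  # [1, 2, 3, 5]
```

Note that this treats "1 or 2" as its own level between 1 and 2, so any test run on the recoded values compares five groups rather than three.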