In a clinical study, I have each patient's health condition labeled as grade 1, 2, or 3. Some patients are labeled "Either 1 or 2" by the doctor, meaning the doctor considers both grade 1 and grade 2 valid descriptions of the patient's health condition.
When performing, say, a Kruskal-Wallis test, I don't want to treat such patients as a separate group (in addition to the original 3 groups). What I think makes sense is to run the K-W test multiple times, each time assigning either 1 or 2 as the grade of each such patient, get the p-value for each run, and among all these runs, use the one with the lowest p-value.
For example, if there is only one patient record labeled "Either 1 or 2", while all other patients are clearly labeled 1, 2, or 3, then I can run the K-W test twice, get two p-values, and report the lower one.
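To make the procedure concrete, here is a minimal sketch of what I have in mind, using only the Python standard library. The function names (`kruskal_wallis_3`, `min_p_over_assignments`), the use of `None` to mark an "Either 1 or 2" label, and the outcome values are all made up for illustration. The sketch assumes exactly three groups, so the chi-square approximation has 2 degrees of freedom and the p-value reduces to exp(-H/2). It also assumes every grade keeps at least one unambiguous patient, so no group is ever empty.

```python
import math
from itertools import product


def kruskal_wallis_3(groups):
    """Return (H, p) for exactly three groups, using the chi-square approximation."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    rank_of = {}   # value -> average rank (ties share the mean of their ranks)
    tie_term = 0   # sum of t^3 - t over tied blocks, for the tie correction
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += t**3 - t
        i = j
    rank_sums = [sum(rank_of[v] for v in g) for g in groups]
    h = 12 / (n * (n + 1)) * sum(r**2 / len(g) for r, g in zip(rank_sums, groups))
    h -= 3 * (n + 1)
    h /= 1 - tie_term / (n**3 - n)
    # With 3 groups, df = 2 and the chi-square survival function is exp(-h / 2).
    return h, math.exp(-h / 2)


def min_p_over_assignments(values, labels):
    """Try every 1-or-2 assignment for ambiguous labels (None); keep the lowest p."""
    ambiguous = [i for i, lab in enumerate(labels) if lab is None]
    best = None
    for choice in product((1, 2), repeat=len(ambiguous)):
        assigned = list(labels)
        for i, grade in zip(ambiguous, choice):
            assigned[i] = grade
        groups = [[v for v, lab in zip(values, assigned) if lab == g]
                  for g in (1, 2, 3)]
        _, p = kruskal_wallis_3(groups)
        if best is None or p < best[0]:
            best = (p, assigned)
    return best


# Hypothetical data: one outcome value per patient; the last patient is "Either 1 or 2".
values = [2.1, 3.4, 1.9, 4.8, 5.2, 4.1, 7.7, 8.3, 6.9, 3.0]
labels = [1, 1, 1, 2, 2, 2, 3, 3, 3, None]

best_p, best_labels = min_p_over_assignments(values, labels)
print(best_p, best_labels)
```

With one ambiguous patient the loop runs the test twice; with k ambiguous patients it runs 2^k times.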
However, as I have 50 patients with such an "ambiguous" label, I would need to run the K-W test 2^50 times (roughly 10^15 runs), which would be far too slow.
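As a rough back-of-the-envelope check on how slow that is, the snippet below counts the runs and converts them to wall-clock time. The one-millisecond cost per test is purely an assumption for illustration, not something I have measured.

```python
n_ambiguous = 50
runs = 2 ** n_ambiguous              # one K-W test per 1-or-2 assignment
seconds_per_run = 1e-3               # assumed cost of a single test (hypothetical)
years = runs * seconds_per_run / (365.25 * 24 * 3600)

print(f"{runs:,} runs, roughly {years:,.0f} years at 1 ms per run")
```

Even under that optimistic assumption, the total is on the order of tens of thousands of years.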
Any suggestions?
Thanks