Lumping categories of a categorical variable to make N sufficient

The issue is something like:

Six surgeons, A through F, each performed a surgery in which Y milliliters of fluid were infused. I need to find out whether the volume of fluid infused differs significantly by the performing surgeon. A did 100 surgeries and B did 21, but C, D, E and F did only 9, 9, 2 and 2, respectively. Can I lump C, D, E and F into a single group and test whether A, B and the combined mean of the rest differ significantly?

PS: This is part of a multiple linear regression. There are other predictors of the dependent variable Y (fluid, in mL).
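To make the proposed lumping concrete, here is a minimal Python sketch of collapsing the low-count surgeons into one level and dummy-coding the result for a regression. The surgery counts come from the question; the cutoff `MIN_N = 20`, the label "Other", and the helper names `lump` and `dummy_code` are assumptions made purely for illustration.

```python
from collections import Counter

# Surgery counts per surgeon, taken from the question.
counts = {"A": 100, "B": 21, "C": 9, "D": 9, "E": 2, "F": 2}

MIN_N = 20  # hypothetical cutoff, not from the question


def lump(surgeon, counts, min_n=MIN_N, other="Other"):
    """Keep the surgeon's own label, or return `other` if they did fewer than min_n surgeries."""
    return surgeon if counts[surgeon] >= min_n else other


def dummy_code(level, levels, reference):
    """Treatment (dummy) coding: one 0/1 indicator per non-reference level."""
    return [1 if level == lv else 0 for lv in levels if lv != reference]


# One row per surgery, labelled by the performing surgeon.
surgeons = [s for s, n in counts.items() for _ in range(n)]
lumped = [lump(s, counts) for s in surgeons]

lumped_counts = Counter(lumped)
levels = sorted(lumped_counts)  # the lumped factor levels

# Indicator columns for B and Other, with A as the reference level.
design_rows = [dummy_code(g, levels, reference="A") for g in lumped]

print(lumped_counts)
```

Note that in such a model the coefficient for "Other" would compare the pooled C to F surgeries with A, so surgeons with 9 cases weigh more than those with 2. That is not the same as averaging the four surgeons' individual means.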



Omega Contributor
Can you reasonably argue that they are all the same, and that there is no systematic reason (bias) why they performed fewer surgeries? What about experience? One of these surgeons might usually perform dozens but was on vacation, and would now be grouped with the more inexperienced doctors.

You can possibly do it, but you should check that they are comparable on the other variables of interest that could cause effect modification. Testing whether they are comparable may be difficult, since those tests will be underpowered.
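A rough illustration of the power problem, as a sketch only: assuming independent observations and a common standard deviation across surgeons (both assumptions, not facts from the thread), the standard error of a difference between two group means is the SD times sqrt(1/n1 + 1/n2). The helper name `se_factor` is made up for this example.

```python
import math


def se_factor(n1, n2):
    """Standard error of a difference in two group means, in units of the common SD.

    Assumes independent observations and equal variance in both groups.
    """
    return math.sqrt(1 / n1 + 1 / n2)


# Comparing two of the small groups with each other, e.g. C (n=9) vs E (n=2).
small_vs_small = se_factor(9, 2)

# Comparing A (n=100) with the lumped group C to F (n=22).
a_vs_lumped = se_factor(100, 22)

print(round(small_vs_small, 3), round(a_vs_lumped, 3))
```

Under those assumptions, the comparison among the small groups is more than three times noisier than the comparison the lumping is meant to enable, which is why checking that C to F really belong together is hard with these counts.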

Also, if you do this (the merger), can you defend your strategy to a reviewer?
Thanks for the thoughts. Yes, this is related to experience. The inexperienced would be inefficient in their use of the dependent variable (this isn't the exact example but a close analogy). And if you mean that those who performed fewer surgeries may be the inexperienced ones, I'm not sure: they may simply be visiting surgeons at this hospital who perform elsewhere.

As for defending the merger, I don't know what else to say other than it would increase my... well... N.