Yes, I agree there is no standard definition, and I will point out that, based on the data distribution alone, some values will be, say, two or three SDs out. I may hold off on saying 3 SDs is fairly common, since I only know my own field and because I think fields vary in what they consider extreme; some may want something as rare as a 1-in-a-million or 1-in-a-billion chance.
Agreed. If you are just looking for a value at, say, the 99th percentile, well, you are going to find one. On the other hand, looking for a value that comes from a different data generating process (DGP), or one that is erroneous, is a different problem.
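To put rough numbers on the "you are going to find one" point, here is a minimal sketch. It assumes the data are independent draws from a normal distribution, which is my assumption for illustration and not something anyone in this thread specified; the function names and the sample size of 1000 are made up for the example.

```python
import math

def two_sided_tail_prob(k: float) -> float:
    """P(|Z| > k) for a standard normal Z, via the complementary error function."""
    return math.erfc(k / math.sqrt(2))

def expected_flagged(n: int, k: float) -> float:
    """Expected number of observations more than k SDs from the mean in n normal draws."""
    return n * two_sided_tail_prob(k)

def prob_at_least_one(n: int, k: float) -> float:
    """Probability that at least one of n independent normal draws lies beyond k SDs."""
    return 1.0 - (1.0 - two_sided_tail_prob(k)) ** n

if __name__ == "__main__":
    n = 1000  # hypothetical sample size, chosen only for illustration
    for k in (2, 3):
        print(
            f"k={k}: tail prob={two_sided_tail_prob(k):.4f}, "
            f"expected beyond k SDs={expected_flagged(n, k):.1f}, "
            f"P(at least one)={prob_at_least_one(n, k):.3f}"
        )
```

Under those assumptions, a sample of that size is expected to contain a few values beyond 3 SDs purely by chance, so a cutoff alone says nothing about whether a value came from a different DGP or is an error.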
@Karabiner I was kind of thinking of the opposite scenario, where removing an observation that is faulty or derived from a different DGP makes two groups more comparable, or no longer different. But as alluded to, there are many definitions and rationales for sleuthing them out.