Can I remove these outliers?

Hi, is it acceptable if I remove the outliers with charges above 55k for this regression analysis? Or is there any other option to minimize their impact in the model?



Thank you


Less is more. Stay pure. Stay poor.
If they are real values and not errors, you need a very strong rationale to exclude them, since dropping them means you are now asking a different question. Side note: the Flint, Michigan water crisis could have been discovered if someone hadn't trimmed extreme values.


Fortran must die
Try to figure out why they are occurring. Generally you can only remove outliers if they are data errors. To me it looks like you have two sets of data, with the top group not really generated by the same process as the rest of the data.
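If you can find a variable that separates the two groups, one option is to keep every row and let the model account for the second process, then compare that fit against the pooled one. Here is a rough sketch of that check. Everything in it is made up for illustration: the data are simulated, and `x`, `group`, and `charges` are hypothetical names, not columns from the poster's dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data, invented for illustration only (not the poster's data).
n = 200
x = rng.uniform(18, 64, size=n)        # hypothetical predictor
group = rng.uniform(size=n) < 0.1      # hypothetical flag for the second process
charges = 2000 + 250 * x + 40000 * group + rng.normal(0, 1500, size=n)


def ols(X, y):
    """Least-squares coefficients for y ~ X (X already has an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


ones = np.ones(n)

# Model 1: one line through all points, ignoring the second process.
X_pooled = np.column_stack([ones, x])
coef_pooled = ols(X_pooled, charges)

# Model 2: same rows, plus an indicator that shifts the intercept for the flagged group.
X_indicator = np.column_stack([ones, x, group.astype(float)])
coef_indicator = ols(X_indicator, charges)

# Residual spread under each model; a large drop suggests the indicator
# explains the points that looked like outliers.
sd_pooled = np.std(charges - X_pooled @ coef_pooled)
sd_indicator = np.std(charges - X_indicator @ coef_indicator)

print("pooled coefficients:   ", coef_pooled)
print("indicator coefficients:", coef_indicator)
print("residual SD pooled vs indicator:", sd_pooled, sd_indicator)
```

No data are dropped here, so the question being asked stays the same. This only helps if the grouping variable is something you can justify from how the data were generated, not something picked after the fact to make the fit look better.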