Can I remove these outliers?

#1
Hi, is it acceptable to remove the outliers with charges above 55k for this regression analysis? Or is there another way to minimize their impact on the model?

Capture.PNG


Thank you
 

hlsmith

Less is more. Stay pure. Stay poor.
#2
If they are real values, not erroneous ones, you need a very strong rationale to exclude data, because once you drop them you are asking a different question. Side note: the Flint, Michigan water crisis could have been discovered if someone hadn't trimmed the extreme values.
 
#4
Try to figure out why they are occurring. Generally you should remove outliers only if they are data errors. To me it looks like you have two sets of data, with the upper group not really generated by the same process as the rest.
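
To make the "two processes" idea concrete, here is a minimal Python sketch on synthetic data (not the data from the original post). It assumes a hypothetical flag, `high`, that marks the second process; in real data you would have to find out which variable, if any, identifies the upper cluster. The numbers (slope of 250, shift of 30000, noise of 2000) are made up for illustration. The sketch compares one pooled fit against a separate fit per group and looks at the largest residual in each case.

```python
import random
from statistics import mean

random.seed(0)

# Synthetic data: two hypothetical processes with the same slope
# but very different levels. `high` is an assumed group flag.
rows = []
for _ in range(300):
    age = random.uniform(18, 64)
    high = random.random() < 0.1
    charge = 2000 + 250 * age + (30000 if high else 0) + random.gauss(0, 2000)
    rows.append((age, charge, high))


def ols(points):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


# One line for everything: the upper group shows up as "outliers".
b0, b1 = ols(rows)
pooled_resid = [c - (b0 + b1 * a) for a, c, _ in rows]

# One line per group: the same points are now fitted by their own model.
fits = {g: ols([r for r in rows if r[2] == g]) for g in (False, True)}
grouped_resid = [c - (fits[g][0] + fits[g][1] * a) for a, c, g in rows]

max_pooled = max(abs(r) for r in pooled_resid)
max_grouped = max(abs(r) for r in grouped_resid)

print(f"largest |residual|, pooled fit:    {max_pooled:,.0f}")
print(f"largest |residual|, per-group fit: {max_grouped:,.0f}")
```

If the largest residual shrinks a lot once the groups are modelled separately, the extreme points are better explained by a missing variable than by deleting them. If no such variable exists, a robust regression method is another option to look into instead of trimming.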