Investigating outliers: could data mining be a solution?


New Member
Hi everyone, this is my first post. :wave:

I am currently working on a project to investigate why certain providers are outliers relative to the dataset as a whole. I am finding it very tricky to find statistical methods that "investigate" outliers, as in show you why they are outliers. (Any info you can send about this would be helpful, especially methods that I could add to a data mining model :shakehead:)

I have found the outliers by comparing the data as a whole, using a box plot in SPSS.
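
For anyone reading along, here is a minimal sketch of the 1.5 × IQR fence rule that box plots commonly use to flag outliers. The scores are invented for illustration, and the quartile method here (Python's "inclusive" method) is an assumption; packages compute quartiles in slightly different ways, so the flagged points may not match SPSS exactly.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return the values outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical provider scores, invented purely for illustration.
scores = [62, 65, 66, 68, 70, 71, 73, 74, 75, 98]
flagged = tukey_outliers(scores)
print(flagged)
```

Note that this only tells you which points are far from the bulk of the data, not why they are.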

My first question, which I think I know the answer to (yes), is: should I be splitting the dataset into groups, looking at the outliers within each group, and investigating those instead?
The dataset contains confounding variables such as provider size, the region the providers are based in, and their subject combinations.
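
A rough sketch of what the group-wise approach could look like, assuming the same 1.5 × IQR fence rule applied separately within each group. The provider names, the "small"/"large" size groups, and the scores are all hypothetical. The point of the example is that a provider can look unremarkable against the whole dataset yet stand out within its own group.

```python
import statistics
from collections import defaultdict

def tukey_fences(values, k=1.5):
    """Return (lower, upper) box-plot fences for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def outliers_by_group(records, k=1.5):
    """records: iterable of (name, group, value). Flag outliers within each group."""
    groups = defaultdict(list)
    for name, group, value in records:
        groups[group].append((name, value))
    flagged = {}
    for group, members in groups.items():
        lower, upper = tukey_fences([v for _, v in members], k)
        flagged[group] = [name for name, v in members if v < lower or v > upper]
    return flagged

# Hypothetical data: (provider, size group, score).
records = [
    ("P1", "small", 50), ("P2", "small", 52), ("P3", "small", 54),
    ("P4", "small", 55), ("P5", "small", 57), ("P6", "small", 80),
    ("P7", "large", 78), ("P8", "large", 80), ("P9", "large", 82),
    ("P10", "large", 84), ("P11", "large", 85), ("P12", "large", 87),
]

# Fences on the pooled data versus fences within each size group.
lower, upper = tukey_fences([v for _, _, v in records])
overall = [name for name, _, v in records if v < lower or v > upper]
within = outliers_by_group(records)
print(overall, within)
```

Here P6 is not flagged on the pooled data, because the large providers stretch the overall spread, but it is flagged among the small providers.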

What do you guys think?



Less is more. Stay pure. Stay poor.
Describing your topic (content) might help us understand your concerns better. Have you examined the residuals, influence, and leverage yet?
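
To make those three terms concrete, here is a minimal sketch for a simple one-predictor linear regression, assuming ordinary least squares. It computes the residuals, the leverage (hat values), and Cook's distance as an influence measure. The function name and the data are hypothetical, made up for illustration only.

```python
def simple_regression_diagnostics(x, y):
    """Fit y = a + b*x by least squares; return residuals, leverage, Cook's distance."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Leverage: how far each x sits from the mean of x.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Residual variance estimate, used to scale Cook's distance.
    mse = sum(e ** 2 for e in residuals) / (n - p)
    cooks = [
        (e ** 2 / (p * mse)) * (h / (1 - h) ** 2)
        for e, h in zip(residuals, leverage)
    ]
    return residuals, leverage, cooks

# Hypothetical data: x could be provider size, y an outcome measure.
x = [1, 2, 3, 4, 5, 6, 7, 20]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.1, 14.2, 25.0]
residuals, leverage, cooks = simple_regression_diagnostics(x, y)
most_influential = max(range(len(x)), key=lambda i: cooks[i])
print(most_influential, round(leverage[most_influential], 3))
```

In this made-up example the last point has a modest residual but very high leverage, so it pulls the fitted line toward itself. That is the kind of case a box plot of the outcome alone would not reveal.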

Your problem also seems like a good candidate for mixed models (hierarchical modeling). Is this what you are using?