Leverage analysis for groups of outliers

noetsi

Fortran must die
There is a wide range of statistics that deal with the leverage of individual data points (that is, their impact on the regression line), such as DFBETA. In my case I normally have thousands of points, so it's unlikely a single point is going to move the line much. But small groups of points might.

So my question is: how do you determine whether a group of outlying points is influencing your analysis, and is there anything like DFBETA or Cook's D for a group of points? And is there anything like robust regression that addresses the influence of multiple points as opposed to individual ones?

hlsmith

Not a robit
If you have suspect values, you can just run the model with and without them to see if they "leverage" your results.
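A minimal sketch of that with-and-without comparison, assuming a simple one-predictor regression. The data, the helper names `fit_line` and `compare_with_without`, and the choice of suspect indices are all made up for illustration:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def compare_with_without(xs, ys, suspect):
    """Fit on all points, then again with the suspect indices removed."""
    suspect = set(suspect)
    keep = [i for i in range(len(xs)) if i not in suspect]
    full = fit_line(xs, ys)
    reduced = fit_line([xs[i] for i in keep], [ys[i] for i in keep])
    return full, reduced


# Made-up data: 200 points exactly on y = 1 + 2x,
# plus a group of 5 suspect points far below the line at high x.
xs = [i / 10 for i in range(200)] + [25.0] * 5
ys = [1 + 2 * x for x in xs[:200]] + [0.0] * 5

full, reduced = compare_with_without(xs, ys, range(200, 205))
print("slope with the group:   ", round(full[1], 3))
print("slope without the group:", round(reduced[1], 3))
```

With real data you would compare every coefficient of interest (and its standard error), not only the slope.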

noetsi

Fortran must die
Thanks. Believe it or not, no book I have read suggested that. They all talk about statistics like Cook's D or DFBETA.

ondansetron

TS Contributor
This is essentially how DFBETA is calculated, though. Conceptually, the model is run n separate times on n-1 observations, leaving out the ith observation each time until every observation has been left out once. Each set of coefficient estimates is then compared with the estimates from the full data, and the difference gives DFBETA. Unless you mean using it for a group, in which case I haven't seen that done (where the group is omitted and a group DFBETA is calculated), although it wouldn't surprise me if it has been. I would imagine careful selection of the group is required. Be sure to fully investigate each observation and have a justification for including it in the group you omit.
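The leave-one-out idea described above can be sketched roughly as follows, assuming a one-predictor model and reporting the raw change in the slope only. The scaled version, DFBETAS, additionally divides by a standard error, which is left out here. The data and the helper names `fit_slope` and `slope_change` are made up for illustration:

```python
def fit_slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def slope_change(xs, ys, dropped):
    """Full-data slope minus the slope refitted without the `dropped` indices."""
    dropped = set(dropped)
    kept_x = [x for i, x in enumerate(xs) if i not in dropped]
    kept_y = [y for i, y in enumerate(ys) if i not in dropped]
    return fit_slope(xs, ys) - fit_slope(kept_x, kept_y)


# Made-up data: ten points on y = 1 + 2x, with the last one pushed off the line.
xs = [float(i) for i in range(10)]
ys = [2 * x + 1 for x in xs]
ys[9] = 40.0

# One refit per observation, each time leaving out a single point.
loo = [slope_change(xs, ys, [i]) for i in range(len(xs))]
most_influential = max(range(len(xs)), key=lambda i: abs(loo[i]))
print("largest single-point change is at index", most_influential)
```

Because `slope_change` takes a collection of indices, passing several indices at once gives the analogous change for a whole group, which is the group version being asked about.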

noetsi

Fortran must die
I was referring to using these for a group. I run analyses on thousands of points, and in theory and in practice it's hard for me to imagine that one point would move the regression line much when you have 5000-plus points. I have never seen DFBETA applied to a set of outliers as opposed to a single one. I did not think it had that functionality.

ondansetron

TS Contributor
I can't say I've seen it done, either. However, I wonder if there are some papers where people figured out a way to group the observations and then calculate a group leverage statistic. As you said, it might not have that functionality, though.

hlsmith

Not a robit
I was just thinking of something like a Winsorized trim on residual-related values. I'm not sure there is a standardized test, but you could then look at relative changes in the estimates, or maybe AICc, etc.
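One way to read that suggestion, as a rough sketch only: fit once, drop a fixed fraction of the points with the largest absolute residuals, refit, and look at the relative change in the slope. This is trimming (dropping points) rather than Winsorizing in the strict sense (capping values). The 5% fraction, the data, and the helper names `fit_line` and `trim_by_residual` are assumptions made for illustration:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def trim_by_residual(xs, ys, fraction):
    """Drop the given fraction of points with the largest absolute residuals."""
    a, b = fit_line(xs, ys)
    resid = [abs(y - (a + b * x)) for x, y in zip(xs, ys)]
    n_drop = int(len(xs) * fraction)
    order = sorted(range(len(xs)), key=lambda i: resid[i])  # smallest first
    keep = sorted(order[: len(xs) - n_drop])
    return [xs[i] for i in keep], [ys[i] for i in keep]


# Made-up data: 100 points on y = 3 + 0.5x, with the last five shifted upward.
xs = [float(i) for i in range(100)]
ys = [0.5 * x + 3 for x in xs]
for i in range(95, 100):
    ys[i] += 60.0

_, slope_full = fit_line(xs, ys)
tx, ty = trim_by_residual(xs, ys, 0.05)
_, slope_trim = fit_line(tx, ty)
rel_change = (slope_trim - slope_full) / slope_full
print("relative change in slope after trimming:", round(rel_change, 3))
```

One caveat with screening on residuals: a group of high-leverage points can pull the fitted line toward itself and end up with small residuals, so a trim like this may miss exactly the points that matter most.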