# Thread: How to choose best robust regression model?

1. ## How to choose best robust regression model?

Hello.

There are several robust regression methods, such as LAR (a.k.a. LAV, LAD, L1-norm) regression, quantile regression, M-estimators, ... They are assumed to be especially appropriate for data that do not fulfill the 5 OLS conditions.

Most of the robust regression literature I have read argues abstractly, via the breakdown point, about which robust estimator should generally be preferred.

The rest of the robust regression literature I have read argues that the best robust estimator depends on the closest comparable theoretical distribution. E.g., the LAR estimator will most probably be the best robust estimator for an approximately Laplace distribution (although it has a worse breakdown point than the quantile estimator/M-estimator).

Question:
How do I choose the best robust regression model from multiple robust estimators for data that (graphically obviously) do not fulfill the OLS conditions?

2. ## Re: How to choose best robust regression model?

Which conditions are being violated?

3. ## Re: How to choose best robust regression model?

Originally Posted by Dason
Which conditions are being violated?
In robust regression problems, and especially in mine, the constant variance assumption is heavily violated, in combination with skewed residuals.

4. ## Re: How to choose best robust regression model?

hi,
maybe you could just use the generalized least squares with the appropriate variance structure? (package nlme in R)

regards
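
For readers following along, here is a minimal sketch of the GLS suggestion above. The data are simulated and hypothetical (not the original poster's), and the power-of-covariate variance function `varPower` is just one assumed choice of variance structure among those `nlme` offers.

```r
library(nlme)  # recommended package shipped with R

set.seed(1)
# Hypothetical data: the error standard deviation grows in proportion to x
n <- 200
x <- runif(n, 1, 10)
y <- 2 + 3 * x + rnorm(n, sd = 0.5 * x)
d <- data.frame(x = x, y = y)

# Assumed variance model: Var(e_i) = sigma^2 * |x_i|^(2 * delta),
# with delta estimated from the data together with the coefficients
fit_gls <- gls(y ~ x, data = d, weights = varPower(form = ~ x))

coef(fit_gls)                                              # regression coefficients
coef(fit_gls$modelStruct$varStruct, unconstrained = FALSE) # estimated delta
```

This addresses the non-constant variance only; it still assumes symmetric (Gaussian) errors, so skewed residuals are not handled by this step.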

5. ## Re: How to choose best robust regression model?

Originally Posted by rogojel
hi,
maybe you could just use the generalized least squares with the appropriate variance structure? (package nlme in R)

regards
I already have parameter estimates from LAR regression and quantile regression. Of course, further nlme parameter estimates may be interesting, too.

But my question is: which criterion shows me which robust regression model, or rather which set of estimates, is best? R^2 and correlation do not work on robust problems.

6. ## Re: How to choose best robust regression model?

Originally Posted by consuli
I already have parameter estimates from LAR regression and quantile regression. Of course, further nlme parameter estimates may be interesting, too.

But my question is: which criterion shows me which robust regression model, or rather which set of estimates, is best? R^2 and correlation do not work on robust problems.
Hi,
I would use cross validation.

regards

7. ## Re: How to choose best robust regression model?

But my question is: which criterion shows me which robust regression model, or rather which set of estimates, is best?

But what is the problem here? Do you simply need to switch distribution, e.g. to a gamma or log-normal distribution (skewed and heteroscedastic)?
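
A minimal sketch of what switching the distribution could look like in R, assuming a strictly positive, right-skewed response. The simulated data and the log link are illustrative assumptions, not taken from the original poster's problem.

```r
set.seed(2)
# Hypothetical data: positive response whose spread grows with its mean
n <- 300
x <- runif(n, 1, 10)
mu <- exp(0.5 + 0.2 * x)
y <- rgamma(n, shape = 4, rate = 4 / mu)  # mean mu, standard deviation mu / 2
d <- data.frame(x = x, y = y)

# Gamma GLM: variance proportional to the squared mean
fit_gamma <- glm(y ~ x, data = d, family = Gamma(link = "log"))

# Log-normal alternative: ordinary least squares on log(y)
fit_lognorm <- lm(log(y) ~ x, data = d)

coef(fit_gamma)
coef(fit_lognorm)
```

The two slopes are comparable here, but the intercepts are not: the Gamma GLM models the log of the mean, while the log-normal fit models the mean of the log.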

8. ## Re: How to choose best robust regression model?

Good point Greta. OP, can we see what this data looks like or the residuals? Thanks.

9. ## Re: How to choose best robust regression model?

Concluding from your answers: there is no generally accepted goodness-of-fit measure for robust regression problems, right? Even if your answer is "no", that will answer my question for the short term.

10. ## Re: How to choose best robust regression model?

Originally Posted by consuli
Concluding from your answers: there is no generally accepted goodness-of-fit measure for robust regression problems, right?
I think this is a fair statement even for "normal" OLS multiple regression.
I believe that using something like the RMSE measure with cross-validation is the least controversial way to pick a model.

regards
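
The cross-validation idea can be sketched as follows. This is a hedged illustration on simulated heavy-tailed data (hypothetical, not the thread's datasets), comparing OLS with a Huber M-estimator from `MASS`; the fold count `k = 5` and the names `fitters` and `cv_error` are assumptions made for the example.

```r
library(MASS)  # rlm(): M-estimator; recommended package shipped with R

set.seed(3)
# Hypothetical data with heavy-tailed noise
n <- 200
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rt(n, df = 2)
d <- data.frame(x = x, y = y)

# Candidate fitting functions; each takes a data frame and returns a model
fitters <- list(
  ols   = function(dat) lm(y ~ x, data = dat),
  m_est = function(dat) rlm(y ~ x, data = dat)
)

# k-fold cross-validation: fit on k - 1 folds, predict the held-out fold
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))
cv_error <- sapply(fitters, function(fit_fun) {
  pred <- numeric(n)
  for (j in seq_len(k)) {
    test <- fold == j
    model <- fit_fun(d[!test, ])
    pred[test] <- predict(model, newdata = d[test, ])
  }
  c(rmse = sqrt(mean((d$y - pred)^2)),
    mae  = mean(abs(d$y - pred)))
})
cv_error  # rows: error measure, columns: estimator
```

Further estimators can be added as list entries, for example `function(dat) quantreg::rq(y ~ x, tau = 0.5, data = dat)` for LAR. The mean absolute error is reported next to the RMSE because, with outliers in the held-out folds, the RMSE itself is dominated by a few large errors.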

11. ## Re: How to choose best robust regression model?

To make this discussion a little more fact-based, I have written a small R program that calculates OLS, GLM (with Gamma), LAR and quantile regression parameter estimates on two robust datasets from the package robustbase. It also calculates R^2, Pearson correlation, BIC and MSE.

Code:
```r
# Mean squared error, one value per column of the prediction matrix y2
mse <- function(y1, y2) {
  resid <- y1 - y2
  colSums(resid^2) / length(y1)
}

library("robustbase")  # datasets pension and salinity
library("robust")      # lmRob()
library("quantreg")    # rq()

str(get(data(pension)))
str(get(data(salinity)))

# Select robust data frame: column 1 = target, column 2 = predictor
df <- get(data(pension))[ , c(2, 1)]
# df <- get(data(salinity))[ , c(2, 4)]
names(df) <- c("target", "predictor")

plot(target ~ predictor, data = df, cex = .5, col = "blue",
     xlab = "predictor", ylab = "target")

lmmod       <- lm(target ~ predictor, data = df)
glmgammamod <- glm(target ~ predictor, data = df, family = Gamma(link = "identity"))
lmrobmod    <- lmRob(target ~ predictor, data = df)
rqmod       <- rq(target ~ predictor, data = df, tau = 0.5)

# Coefficients, one column per model
# (named columns avoid masking the functions lm() and rq())
coefs <- cbind(lm       = lmmod$coefficients,
               glmgamma = glmgammamod$coefficients,
               lmrob    = lmrobmod$coefficients,
               rq       = rqmod$coefficients)

# Calc Estimates: design matrix (intercept, predictor) times coefficients
predictors <- cbind(1, df$predictor)
est <- predictors %*% coefs

# Goodness of Fit
cor(df$target, est, method = "pearson")
# Comparison with R^2
summary(lmmod)$r.squared

mse(df$target, est)

BIC(lmmod)
BIC(glmgammamod)
# BIC(lmrobmod) BIC not available
# BIC(rqmod) BIC not plausible

# Bias Test
mean(df$target)
colMeans(est)

# Coefficients
coefs
```
With the following results, the same for both datasets:
R^2 and Pearson correlation are indifferent.
BIC is only available for OLS and GLM.
MSE always prefers the OLS solution (which, however, is not plausible, as these are special datasets in favour of robust regression).

I have also tested on other robust datasets. Always the same implausible results.
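
The MSE and correlation results reported above have a mechanical explanation. Among all lines in the same predictor, OLS is by definition the one that minimises the in-sample sum of squared residuals, so in-sample MSE cannot rank any other linear fit above it. Likewise, the Pearson correlation between the target and any fitted line with the same sign of slope equals the correlation between target and predictor. A minimal sketch, using the built-in `stackloss` data and a Huber M-estimator from `MASS` as stand-ins (both are assumptions for illustration, not the datasets or estimators used above):

```r
library(MASS)  # rlm(): Huber M-estimator; recommended package shipped with R

# stackloss is a built-in dataset often used in robust regression examples
fit_ols <- lm(stack.loss ~ Air.Flow, data = stackloss)
fit_rob <- rlm(stack.loss ~ Air.Flow, data = stackloss)

# In-sample error of both fits under squared and absolute loss
in_sample <- rbind(
  mse = c(ols   = mean(residuals(fit_ols)^2),
          m_est = mean(residuals(fit_rob)^2)),
  mae = c(ols   = mean(abs(residuals(fit_ols))),
          m_est = mean(abs(residuals(fit_rob))))
)
in_sample

# Correlation of the target with each set of fitted values
in_sample_cor <- c(ols   = cor(stackloss$stack.loss, fitted(fit_ols)),
                   m_est = cor(stackloss$stack.loss, fitted(fit_rob)))
in_sample_cor
```

The `mse` row always favours OLS (or ties), whatever the data look like, which is why an out-of-sample comparison or a loss other than the one an estimator optimises is needed to discriminate between the fits.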

12. ## The Following User Says Thank You to consuli For This Useful Post:

hlsmith (02-28-2017)

13. ## Re: How to choose best robust regression model?

I am aware that robust regression exists, but have not used it. I find it hard to believe that there aren't better resources for you. I will keep my eyes open in case I fortuitously stumble across something.

Could it be possible for you to simulate a dataset very close to yours, with known parameters and assumptions, that tests all the above approaches?

14. ## Re: How to choose best robust regression model?

Originally Posted by hlsmith
I find it hard to believe that there aren't better resources for you.

Originally Posted by hlsmith
Could it be possible for you to simulate a dataset very close to yours, with known parameters and assumptions, that tests all the above approaches?
I don't know how to do that. It would be helpful if (mathematical) guidance were provided on how to specify the increasing variance and skewness in the datasets and then how to simulate them. If a clear mathematical concept is laid out, I am pretty confident I can program it in R.
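
One possible way to lay this out, as a sketch under stated assumptions rather than a recommendation from the thread: take y_i = beta0 + beta1 * x_i + s(x_i) * e_i, where s(x) is an increasing standard deviation function and e_i is a standardised skewed error (mean 0, variance 1). Here e_i is a centred and scaled gamma variable with skewness 2 / sqrt(shape). The coefficient values, the linear form of s(x), and the gamma choice are all hypothetical.

```r
set.seed(4)
n <- 500
beta0 <- 1; beta1 <- 2   # known "true" coefficients (hypothetical values)
shape <- 2               # smaller shape gives stronger right skew

x <- runif(n, 1, 10)
# Standardised gamma noise: mean 0, variance 1, skewness 2 / sqrt(shape)
e <- (rgamma(n, shape = shape, rate = 1) - shape) / sqrt(shape)
# Error standard deviation increases linearly with x
s <- 0.2 + 0.3 * x
y <- beta0 + beta1 * x + s * e
sim <- data.frame(x = x, y = y)

fit <- lm(y ~ x, data = sim)
coef(fit)                # compare with beta0 and beta1
```

Because the errors have mean zero but a non-zero median, beta0 and beta1 describe the conditional mean here. The conditional median line is beta0 + beta1 * x + s(x) * median(e), so a median (LAR) regression targets a different intercept and slope; that should be kept in mind when comparing estimators against the "known" parameters.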

15. ## Re: How to choose best robust regression model?

A win-win. Check out this link on robust and quantile regression, which has a simulation example:

https://t.co/xlZpoeeLCX

16. ## Re: How to choose best robust regression model?

Thanks for the robust regression article from regression pope Fox.

As far as I could follow the article, it says nothing about a robust goodness-of-fit measure, nor about how to reproduce skewed residuals (which would be necessary to reproduce the robust regression datasets with known parameters, as you suggested).

However, it solved another problem I had. :-D
