Hello.
There are several robust regression methods, such as LAR (aka LAV, LAD, L1-norm) regression, quantile regression, M-estimators, ... They are assumed to be especially appropriate for data that do not fulfill the five OLS conditions.
Most of the robust regression literature I have read argues abstractly, in terms of the breakdown point, about which robust estimator should generally be preferred.
The rest of the robust regression literature I have read argues that the best robust estimator depends on the closest comparable theoretical distribution. E.g., the LAR estimator will most probably be the best robust estimator when the errors are approximately Laplace distributed (although it has a worse breakdown point than the quantile estimator / M-estimator).
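The link between the Laplace distribution and the LAR estimator can be written out explicitly. A short sketch, assuming a linear model with i.i.d. Laplace errors with scale b (the symbols are chosen here for illustration and are not from the literature cited above):

```latex
\[
y_i = x_i^\top \beta + \varepsilon_i, \qquad
f(\varepsilon_i) = \frac{1}{2b} \exp\!\left(-\frac{|\varepsilon_i|}{b}\right)
\]
\[
\ell(\beta, b) = -n \log(2b) - \frac{1}{b} \sum_{i=1}^{n} \left| y_i - x_i^\top \beta \right|
\quad\Longrightarrow\quad
\hat\beta_{\mathrm{ML}} = \arg\min_{\beta} \sum_{i=1}^{n} \left| y_i - x_i^\top \beta \right|
\]
```

So under Laplace errors the maximum likelihood estimator is exactly the LAR estimator, in the same way that OLS is the maximum likelihood estimator under normal errors.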
Question:
How do I choose the best robust regression model among several robust estimators for data that (graphically, obviously) do not fulfill the OLS conditions?
Last edited by consuli; 02-26-2017 at 07:26 AM.
Prediction is very difficult, especially about the future. (Niels Bohr)
Which conditions are being violated?
I don't have emotions and sometimes that makes me very sad.
hi,
maybe you could just use the generalized least squares with the appropriate variance structure? (package nlme in R)
regards
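A minimal sketch of what this suggestion could look like, assuming the error standard deviation grows as a power of the predictor. The data are simulated and all names and parameter values (dat, 0.3 * x, and so on) are hypothetical; varPower is only one of several variance structures that nlme offers.

```r
library(nlme)  # recommended package, shipped with standard R installations

set.seed(1)
n <- 200
x <- runif(n, 1, 10)
# Hypothetical data: the error standard deviation grows proportionally to x
y <- 2 + 0.5 * x + rnorm(n, sd = 0.3 * x)
dat <- data.frame(x = x, y = y)

# GLS with Var(e_i) = sigma^2 * |x_i|^(2 * delta); delta is estimated from the data
fit_gls <- gls(y ~ x, data = dat, weights = varPower(form = ~ x))
fit_ols <- gls(y ~ x, data = dat)  # same mean model, constant variance

summary(fit_gls)$tTable                                         # coefficient table
delta <- coef(fit_gls$modelStruct$varStruct, unconstrained = FALSE)
delta                                                           # estimated power
anova(fit_ols, fit_gls)                                         # compare the variance structures
```

Note that this models the heteroscedasticity but still assumes normal errors, so it does not address skewness or outliers by itself.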
I already have parameter estimates from LAR regression and quantile regression. Of course, further nlme parameter estimates may be interesting, too.
But my question is, what criterion shows me which robust regression model, or rather which set of estimates, is best? R^2 and correlation do not work on robust problems.
Prediction is very difficult, especially about the future. (Niels Bohr)
Quote: But my question is, what criterion shows me which robust regression model, or rather which set of estimates, is best?
When someone asks about "the best", one starts to think: best by what optimality criterion?
But what is the problem here? Do you simply need to switch distributions, e.g. to a gamma or log-normal distribution (skewed and heteroscedastic)?
Good point Greta. OP, can we see what this data looks like or the residuals? Thanks.
Stop cowardice, ban guns!
Concluding from your answers: there is no generally accepted goodness-of-fit measure for robust regression problems, right? Even if your answer is "no", that answers my question for the short term.
Prediction is very difficult, especially about the future. (Niels Bohr)
To make this discussion a little more fact-based, I have written a small R program that calculates OLS, GLM (with Gamma), LAR and quantile regression parameter estimates on two robust datasets from package robustbase. Further, it calculates R^2, Pearson correlation, BIC and MSE.
Code:
library("robustbase")
library("robust")
library("quantreg")

# Column-wise mean squared error of each column of predictions y2 against y1
mse <- function(y1, y2) {
  resid <- y1 - y2
  colSums(resid^2) / length(y1)
}

str(get(data(pension)))
str(get(data(salinity)))

# Select robust data frame: column 1 = target, column 2 = predictor
df <- get(data(pension))[ , c(2, 1)]
# df <- get(data(salinity))[ , c(2, 4)]
names(df) <- c("target", "predictor")

plot(target ~ predictor, data = df, cex = .5, col = "blue",
     xlab = "predictor", ylab = "target")

lmmod       <- lm(target ~ predictor, data = df)
glmgammamod <- glm(target ~ predictor, data = df, family = Gamma(link = "identity"))
lmrobmod    <- lmRob(target ~ predictor, data = df)      # robust fit from package robust
rqmod       <- rq(target ~ predictor, data = df, tau = 0.5)  # median regression, i.e. LAR

# Calc Estimates (named columns avoid masking the functions lm() and rq())
coefs <- cbind(lm       = lmmod$coefficients,
               glmgamma = glmgammamod$coefficients,
               lmrob    = lmrobmod$coefficients,
               rq       = rqmod$coefficients)
predictors <- cbind(1, df$predictor)
est <- predictors %*% coefs

# Goodness of Fit
cor(df$target, est, method = "pearson")
# Comparison with R^2
summary(lmmod)$r.squared
mse(df$target, est)
BIC(lmmod)
BIC(glmgammamod)
# BIC(lmrobmod)  BIC not available
# BIC(rqmod)     BIC not plausible

# Bias Test
mean(df$target)
colMeans(est)

# Coefficients
coefs
Both datasets give the same results:
R^2 and Pearson correlation are indifferent between the models.
BIC is only available for OLS and GLM.
MSE always prefers the OLS solution (which, however, is not plausible, as these are special datasets chosen to favour robust regression).
I have also tested on other robust datasets. Always the same implausible results.
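One reason the MSE result is unavoidable: among all straight-line fits, OLS minimises the in-sample MSE by construction, so no other linear estimator can beat it on the data it was fitted to. A possible alternative, offered here only as a sketch and not as an accepted standard, is to compare the models on held-out data with an absolute-error loss. The example below uses simulated, deliberately contaminated data and MASS::rlm (a Huber M-estimator) as a stand-in for the robust fits above; all names and parameter values are hypothetical.

```r
library(MASS)  # for rlm(); a recommended package shipped with R

set.seed(42)
n <- 300
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rnorm(n)
# Hypothetical contamination: 30 points with large x are shifted far downwards
bad <- sample(which(x > 7), 30)
y[bad] <- y[bad] - 40
dat <- data.frame(x = x, y = y)

# Random split into training and test data
test_idx <- sample(n, 100)
train <- dat[-test_idx, ]
test  <- dat[test_idx, ]

fits <- list(
  ols   = lm(y ~ x, data = train),
  huber = rlm(y ~ x, data = train)  # Huber M-estimator
)

# Out-of-sample absolute errors, one column per model
abs_err <- sapply(fits, function(m) abs(test$y - predict(m, newdata = test)))

gof <- rbind(
  mean_abs_err   = colMeans(abs_err),
  median_abs_err = apply(abs_err, 2, median)
)
gof
```

The median absolute error is less sensitive to outliers in the test data than the mean, which is why both are reported. Which loss is appropriate still depends on what "best" is supposed to mean for the problem at hand.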
Prediction is very difficult, especially about the future. (Niels Bohr)
hlsmith (02-28-2017)
I am aware that robust regression exists, but have not used it. I find it hard to believe that there aren't better resources for you. I will keep my eyes open in case I fortuitously stumble across something.
Would it be possible for you to simulate a dataset very close to yours, with known parameters and assumptions, to test all the above approaches?
Stop cowardice, ban guns!
Any helpful links are highly appreciated.
I don't know how to do that. It would be helpful if (mathematical) guidance were provided on how to simulate the datasets, especially on how to specify the increasing variance and skewness and then how to simulate them. If a clear mathematical concept is laid out, I am pretty confident I can program it in R.
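One possible concept, as a sketch under assumptions of my own choosing: generate y = b0 + b1 * x + sigma(x) * e, where sigma(x) = sigma0 * x^gamma makes the variance increase with x, and e is a standardised gamma variable (mean 0, variance 1, skewness 2 / sqrt(shape)). The function name simulate_skewed and all default values are hypothetical.

```r
# Simulate y = b0 + b1 * x + sigma(x) * e with known parameters
simulate_skewed <- function(n, b0 = 1, b1 = 2, sigma0 = 0.5, gamma = 1, shape = 2) {
  x <- runif(n, 1, 10)
  # Standardised gamma errors: mean 0, variance 1, skewness 2 / sqrt(shape)
  e <- (rgamma(n, shape = shape, scale = 1) - shape) / sqrt(shape)
  # Error standard deviation grows with x: sigma(x) = sigma0 * x^gamma
  y <- b0 + b1 * x + sigma0 * x^gamma * e
  data.frame(x = x, y = y)
}

set.seed(123)
sim <- simulate_skewed(5000)
fit <- lm(y ~ x, data = sim)
coef(fit)  # compare with the true values b0 = 1, b1 = 2

# Residual spread in three bands of x: should increase from left to right
resid_sd <- tapply(resid(fit), cut(sim$x, 3), sd)
resid_sd
```

The known parameters b0 and b1 describe the conditional mean. Because the errors are skewed, their median is not zero, so a median-based estimator such as LAR targets a different line; this has to be kept in mind when judging which estimator "recovers" the truth.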
Last edited by consuli; 03-01-2017 at 12:56 PM.
Prediction is very difficult, especially about the future. (Niels Bohr)
A win-win. Check out this link on robust and quantile regression, which has a simulation example:
https://t.co/xlZpoeeLCX
Stop cowardice, ban guns!
Thanks for the robust regression article from regression pope Fox.
As far as I could follow the article, it says nothing about a robust goodness-of-fit measure, nor about how to reproduce skewed residuals (which would be necessary to reproduce the robust regression datasets with known parameters, as you suggested).
However, it solved another problem I had. :-D
Prediction is very difficult, especially about the future. (Niels Bohr)