Robust regression techniques are valuable tools for analyzing data contaminated with influential observations. This article briefly reviews and describes 7 robust estimators for linear regression: popular ones (Huber M, Tukey's bisquare M and least absolute deviation, also called L1 or median regression), estimators that combine high breakdown and high efficiency [fast MM (Modified M-estimator), the fast τ-estimator and HBR (High breakdown rank-based)], and one designed to handle small samples [Distance-constrained maximum likelihood (DCML)]. We include the fast MM and fast τ-estimators because we use the fast and robust bootstrap (FRB) for MM and τ-estimators.
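As a brief illustration, several of these estimators are available in standard R packages. The package and data choices below are ours, not the paper's; fast τ, HBR and DCML require more specialized implementations that are not shown.

library(MASS)        # rlm(): M-estimation (Huber, Tukey bisquare)
library(quantreg)    # rq(): L1 / median regression
library(robustbase)  # lmrob(): fast MM-estimation

# Classic stack-loss data from base R, purely for illustration
fit_ols   <- lm(stack.loss ~ ., data = stackloss)                      # OLS baseline
fit_huber <- rlm(stack.loss ~ ., data = stackloss, psi = psi.huber)    # Huber M
fit_tukey <- rlm(stack.loss ~ ., data = stackloss, psi = psi.bisquare) # Tukey bisquare M
fit_l1    <- rq(stack.loss ~ ., tau = 0.5, data = stackloss)           # LAD / median regression
fit_mm    <- lmrob(stack.loss ~ ., data = stackloss)                   # fast MM-estimator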
Our objective is to compare predictive performance on a real data application using OLS (ordinary least squares) and to propose alternatives based on the 7 robust estimators. We also run simulations under various combinations of 4 factors: sample size, percentage of outliers, percentage of leverage points and number of covariates.
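As a hedged sketch of how such a factorial design could be set up in R (the abstract does not spell out the data-generating mechanism; the function name, contamination scheme and numeric settings below are our assumptions, apart from the values of n and p mentioned in the text):

# One simulation cell: clean Gaussian data, then a fraction of vertical
# outliers in y and of leverage points in X (all settings illustrative)
simulate_cell <- function(n, p, pct_out, pct_lev) {
  X <- matrix(rnorm(n * p), n, p)
  y <- drop(X %*% rep(1, p)) + rnorm(n)
  i_out <- seq_len(round(pct_out * n))          # rows turned into vertical outliers
  i_lev <- n + 1 - seq_len(round(pct_lev * n))  # rows turned into leverage points
  y[i_out] <- y[i_out] + 10
  X[i_lev, 1] <- X[i_lev, 1] + 10
  data.frame(y = y, X)
}

# Crossing the 4 factors of the design (outlier/leverage levels illustrative)
design <- expand.grid(n = c(50, 500, 5000), p = c(3, 10),
                      pct_out = c(0.05, 0.10), pct_lev = c(0, 0.10))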
Predictive performance is evaluated by cross-validation, with the mean squared error (MSE) as the criterion.
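A minimal sketch of this evaluation, assuming a generic fitting function and a data frame with response y (the helper cv_mse and its arguments are illustrative, not the paper's code):

# K-fold cross-validation estimate of the mean squared prediction error
cv_mse <- function(dat, fitter, K = 10) {
  folds <- sample(rep(seq_len(K), length.out = nrow(dat)))  # random fold labels
  fold_err <- sapply(seq_len(K), function(k) {
    fit  <- fitter(dat[folds != k, , drop = FALSE])                 # train on K-1 folds
    pred <- predict(fit, newdata = dat[folds == k, , drop = FALSE]) # predict held-out fold
    mean((dat$y[folds == k] - pred)^2)
  })
  mean(fold_err)
}

# Example, combining the (hypothetical) helpers above:
# cv_mse(simulate_cell(50, 3, 0.10, 0.10),
#        function(d) robustbase::lmrob(y ~ ., data = d))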
We use the R language for data analysis. On the real dataset, OLS provides the best prediction. DCML and the popular robust estimators give good predictive results as well, especially the Huber M-estimator.
In simulations involving 3 predictors and n=50, the results clearly favor fast MM, the fast τ-estimator and HBR, regardless of the proportion of outliers. DCML and Tukey M are also good estimators when n=50, especially when the percentage of outliers is small (5% and 10%). With 10 predictors, however, HBR, fast MM, fast τ and especially DCML give better results for n=50. HBR, fast MM and DCML provide better results for n=500. For n=5000, all the robust estimators give the same results, independently of the percentage of outliers.
If we vary the percentages of outliers and leverage points simultaneously, DCML, fast MM and HBR are good estimators for n=50 and p=3. For n=500, fast MM, fast τ and HBR provide better results.