COMPARING THE PREDICTIVE PERFORMANCE OF OLS AND 7 ROBUST LINEAR REGRESSION ESTIMATORS ON REAL AND SIMULATED DATASETS

Author(s):  
Sacha Varin

Robust regression techniques are relevant tools for investigating data contaminated with influential observations. The article briefly reviews and describes 7 robust estimators for linear regression, including popular ones (Huber M, Tukey’s bisquare M, and least absolute deviation, also called L1 or median regression), some that combine high breakdown and high efficiency [fast MM (Modified M-estimator), fast τ-estimator and HBR (high breakdown rank-based)], and one designed for small samples (distance-constrained maximum likelihood, DCML). We include the fast MM and fast τ-estimators because we use the fast-robust bootstrap (FRB) for MM and τ-estimators. Our objective is to compare the predictive performance of OLS (ordinary least squares) on a real data application and to propose alternatives using 7 different robust estimators. We also run simulations under various combinations of 4 factors: sample size, percentage of outliers, percentage of leverage points and number of covariates. The predictive performance is evaluated by cross-validation, minimizing the mean squared error (MSE). We use the R language for data analysis. In the real dataset, OLS provides the best prediction. DCML and the popular robust estimators give good predictive results as well, especially the Huber M-estimator. In simulations involving 3 predictors and n=50, the results clearly favor fast MM, fast τ and HBR whatever the proportion of outliers. DCML and Tukey M are also good estimators when n=50, especially when the percentage of outliers is small (5% and 10%). With 10 predictors, however, HBR, fast MM, fast τ and especially DCML give better results for n=50. HBR, fast MM and DCML provide better results for n=500. For n=5000 all the robust estimators give the same results independently of the percentage of outliers. If we vary the percentages of outliers and leverage points simultaneously, DCML, fast MM and HBR are good estimators for n=50 and p=3. For n=500, fast MM, fast τ and HBR provide better results.
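As a minimal illustration of the Huber M-estimation the abstract compares against OLS, the sketch below fits a straight line by iteratively reweighted least squares in plain Python. The toy data, tuning constant, and iteration count are illustrative assumptions, not the paper's setup.

```python
def wls_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ym - b * xm, b

def huber_fit(x, y, delta=1.345, iters=50):
    """Huber M-estimate via IRLS: quadratic loss for small residuals,
    linear for large ones; weight = min(1, delta / (|r| / s))."""
    w = [1.0] * len(x)
    for _ in range(iters):
        a, b = wls_line(x, y, w)
        r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # robust residual scale: median absolute deviation, rescaled
        s = sorted(abs(ri) for ri in r)[len(r) // 2] / 0.6745 or 1.0
        w = [min(1.0, delta / (abs(ri) / s)) if ri else 1.0 for ri in r]
    return a, b

# toy data on the line y = 2x + 1, with one gross outlier at x = 10
x = list(range(11))
y = [2 * xi + 1 for xi in x]
y[-1] = 60                                  # clean value would be 21
a_ols, b_ols = wls_line(x, y, [1.0] * 11)   # plain OLS, pulled by the outlier
a_hub, b_hub = huber_fit(x, y)              # stays near slope 2
```

The Huber weights leave well-fitting points untouched and shrink the influence of the contaminated point, which is why the robust slope stays close to the true value while OLS drifts.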

2014 ◽  
Vol 2014 ◽  
pp. 1-9
Author(s):  
Ali Erkoc ◽  
Esra Emiroglu ◽  
Kadri Ulas Akay

In mixture experiments, estimation of the parameters is generally based on ordinary least squares (OLS). However, in the presence of multicollinearity and outliers, OLS can result in very poor estimates. In this case, the effects of the combined outlier-multicollinearity problem can be reduced to a certain extent by using alternative approaches. One of these approaches is to use biased-robust regression techniques for the estimation of parameters. In this paper, we evaluate various ridge-type robust estimators in cases where there are multicollinearity and outliers in the analysis of mixture experiments. Also, for the selection of the biasing parameter, we use fraction of design space plots to evaluate the effect of the ridge-type robust estimators with respect to the scaled mean squared error of prediction. The suggested graphical approach is illustrated on the Hald cement data set.
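As a reference point for the ridge-type robust estimators discussed above, a common sketch of the two ingredients (notation assumed, not the authors' exact formulation) is:

```latex
% Ordinary ridge estimator with biasing parameter k > 0
\hat{\beta}_{\mathrm{ridge}}(k) = (X^{\top}X + kI)^{-1} X^{\top}y
% Ridge-type robust variants keep the same shrinkage form but
% substitute a robust (e.g. M-) estimate for the OLS ingredients:
\hat{\beta}_{R}(k) = (X^{\top}X + kI)^{-1} X^{\top}X \,\hat{\beta}_{M}
```

The biasing parameter $k$ trades variance (multicollinearity) against bias, while the robust inner estimate $\hat{\beta}_{M}$ guards against outliers.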


2021 ◽  
pp. 1-13
Author(s):  
Ahmed H. Youssef ◽  
Amr R. Kamel ◽  
Mohamed R. Abonazel

This paper proposed three robust estimators (M-estimation, S-estimation, and MM-estimation) for handling the problem of outlier values in seemingly unrelated regression equations (SURE) models. The SURE model is a multivariate regression case with a special assumption, namely correlation between the errors of the linear models: it considers multiple regression equations that are linked by contemporaneously correlated disturbances. Moreover, the effects of outliers may permeate through the system of equations, so the primary aim of SURE, which is to achieve efficiency in estimation, becomes questionable. The goal of robust regression is to develop methods that are resistant to the possibility that one or several unknown outliers may occur anywhere in the data. In this paper, we study and compare the performance of the robust estimators with the traditional non-robust (ordinary least squares and Zellner) estimators based on a real dataset of the Egyptian insurance market during the financial years from 1999 to 2018. In our study, we selected the three most important insurance companies in Egypt operating in the same field of insurance activity (personal and property insurance). The effect of some important indicators (exogenous variables) issued by insurance corporations on the net profit has been studied. The results showed that robust estimators greatly improved the efficiency of SURE estimation, and the best robust estimator is MM-estimation. Moreover, the selected exogenous variables in our study have a significant effect on the net profit in the Egyptian insurance market.
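A compact sketch of the SURE setup described above (notation assumed: $m$ equations, $T$ observations each, $\Sigma = (\sigma_{ij})$ the contemporaneous error covariance):

```latex
y_i = X_i \beta_i + u_i, \quad i = 1,\dots,m, \qquad
\mathrm{E}\!\left(u_i u_j^{\top}\right) = \sigma_{ij} I_T
% Zellner's (feasible) GLS estimator of the stacked system:
\hat{\beta}_{\mathrm{SUR}} =
  \left( X^{\top}(\hat{\Sigma}^{-1} \otimes I_T) X \right)^{-1}
  X^{\top}(\hat{\Sigma}^{-1} \otimes I_T)\, y
```

Because $\hat{\Sigma}$ is itself estimated from residuals, outliers in any one equation contaminate the whole system, which motivates the robust variants the paper proposes.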


2014 ◽  
Vol 71 (1) ◽  
Author(s):  
Bello Abdulkadir Rasheed ◽  
Robiah Adnan ◽  
Seyed Ehsan Saffari ◽  
Kafi Dano Pati

In a linear regression model, the ordinary least squares (OLS) method is considered the best method to estimate the regression parameters if the assumptions are met. However, if the data do not satisfy the underlying assumptions, the results will be misleading. The violation of the assumption of constant variance in least squares regression is caused by the presence of outliers and heteroscedasticity in the data. This assumption of constant variance (homoscedasticity) is very important in linear regression, under which the least squares estimators enjoy the property of minimum variance. Therefore, a robust regression method is required to handle the problem of outliers in the data. This research uses weighted least squares (WLS) techniques to estimate the regression coefficients when the assumption of constant error variance is violated. WLS estimation is the same as carrying out OLS on transformed variables. The WLS estimator, however, can easily be affected by outliers. To remedy this, we suggest a strong technique for the estimation of regression parameters in the presence of heteroscedasticity and outliers. Here we apply robust M-estimation using iteratively reweighted least squares (IRWLS) with the Huber and Tukey bisquare functions, together with the resistant least trimmed squares regression estimator, to estimate the model parameters for statewide crime data of the United States in 1993. The outcomes of the study indicate that the estimators obtained from the M-estimation techniques and the least trimmed squares method are more effective than those obtained from OLS.
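The claim above that WLS estimation equals OLS on transformed variables can be checked numerically. In this sketch (plain Python; the data and weights are illustrative assumptions), multiplying every observation, including the intercept column, by the square root of its weight and running plain OLS reproduces the weighted fit exactly.

```python
import math

def ols_2col(z0, z1, y):
    """OLS for y = a*z0 + b*z1 via the 2x2 normal equations."""
    s00 = sum(v * v for v in z0)
    s01 = sum(u * v for u, v in zip(z0, z1))
    s11 = sum(v * v for v in z1)
    s0y = sum(u * v for u, v in zip(z0, y))
    s1y = sum(u * v for u, v in zip(z1, y))
    det = s00 * s11 - s01 * s01
    return ((s11 * s0y - s01 * s1y) / det,
            (s00 * s1y - s01 * s0y) / det)

def wls(x, y, w):
    """WLS for y = a + b*x: minimise sum of w_i (y_i - a - b x_i)^2."""
    s00 = sum(w)
    s01 = sum(wi * xi for wi, xi in zip(w, x))
    s11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    s0y = sum(wi * yi for wi, yi in zip(w, y))
    s1y = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = s00 * s11 - s01 * s01
    return ((s11 * s0y - s01 * s1y) / det,
            (s00 * s1y - s01 * s0y) / det)

# illustrative heteroscedastic data; weights 1/x^2 (variance grows with x)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.3, 5.8, 8.9, 9.4]
w = [1.0 / xi ** 2 for xi in x]

a_w, b_w = wls(x, y, w)
# same fit by plain OLS on variables premultiplied by sqrt(w_i)
rw = [math.sqrt(wi) for wi in w]
a_t, b_t = ols_2col(rw,
                    [r * xi for r, xi in zip(rw, x)],
                    [r * yi for r, yi in zip(rw, y)])
# a_w == a_t and b_w == b_t up to floating-point rounding
```

Both routes solve the same weighted normal equations, which is why the transformation trick works; it also makes plain that any gross outlier still enters those equations, motivating the robust step the abstract takes next.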


2020 ◽  
Vol 2020 ◽  
pp. 1-24
Author(s):  
Adewale F. Lukman ◽  
Kayode Ayinde ◽  
B. M. Golam Kibria ◽  
Segun L. Jegede

The general linear regression model has been one of the most frequently used models over the years, with the ordinary least squares (OLS) estimator used to estimate its parameters. The problems of the OLS estimator in linear regression analysis include multicollinearity and outliers, which lead to unfavourable results. This study proposed a two-parameter ridge-type modified M-estimator (RTMME) based on the M-estimator to deal with the combined problem resulting from multicollinearity and outliers. Through theoretical proofs, Monte Carlo simulation, and a numerical example, the proposed estimator is shown to outperform the modified ridge-type estimator and some other existing estimators considered.


2014 ◽  
Vol 2014 ◽  
pp. 1-7
Author(s):  
Guikai Hu ◽  
Qingguo Li ◽  
Shenghua Yu

Under a balanced loss function, we derive the explicit formulae of the risk of the Stein-rule (SR) estimator, the positive-part Stein-rule (PSR) estimator, the feasible minimum mean squared error (FMMSE) estimator, and the adjusted feasible minimum mean squared error (AFMMSE) estimator in a linear regression model with multivariate t errors. The results show that the PSR estimator dominates the SR estimator under the balanced loss and multivariate t errors. Also, our numerical results show that these estimators dominate the ordinary least squares (OLS) estimator when the weight of precision of estimation is larger than about half, and vice versa. Furthermore, the AFMMSE estimator dominates the PSR estimator on certain occasions.
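For context, a commonly used Zellner-type balanced loss (a sketch with assumed notation; the weight $w$ is the "weight of precision of estimation" the abstract refers to) combines goodness of fit and precision of estimation:

```latex
L(\hat{\beta}, \beta) =
  w\,(y - X\hat{\beta})^{\top}(y - X\hat{\beta})
  \;+\; (1 - w)\,(\hat{\beta} - \beta)^{\top} X^{\top}X\,(\hat{\beta} - \beta),
  \qquad 0 \le w \le 1
```

At $w = 0$ this reduces to a pure precision-of-estimation (quadratic) loss, under which shrinkage estimators typically beat OLS; at $w = 1$ it rewards fit alone, which favors OLS, matching the "larger than about half" trade-off reported above.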


Author(s):  
Warha, Abdulhamid Audu ◽  
Yusuf Abbakar Muhammad ◽  
Akeyede, Imam

Linear regression measures the relationship between two or more variables, known as the dependent and independent variables. The classical least squares method for estimating regression models consists of minimising the sum of the squared residuals. Among the assumptions of the ordinary least squares (OLS) method is that there is no correlation (multicollinearity) between the independent variables. Violation of this assumption arises often in regression analysis and can lead to inefficiency of the least squares method. This study therefore determined the more efficient estimator between least absolute deviation (LAD) and weighted least squares (WLS) in multiple linear regression models at different levels of multicollinearity in the explanatory variables. Simulations were conducted using the R statistical software to investigate the performance of the two estimators under violation of the assumption of no multicollinearity. Their performances were compared at different sample sizes. Finite-sample criteria, namely mean absolute error, absolute bias and mean squared error, were used for comparing the methods. The best estimator was selected based on the minimum value of these criteria at a specified level of multicollinearity and sample size. The results showed that LAD was the best at the different levels of multicollinearity and is recommended as an alternative to OLS under this condition. The performances of the two estimators decreased as the level of multicollinearity increased.
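As a small illustration of the LAD estimator compared above, the sketch below exploits the fact that, for simple regression with distinct x-values, an optimal L1 line passes through at least two data points, so brute-force enumeration of point pairs finds it. The data are an illustrative assumption.

```python
from itertools import combinations

def lad_line(x, y):
    """Least absolute deviation fit of y = a + b*x by enumeration:
    try every line through two data points and keep the one with
    the smallest sum of absolute residuals."""
    best = None
    for (x1, y1), (x2, y2) in combinations(list(zip(x, y)), 2):
        if x1 == x2:
            continue
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        loss = sum(abs(yi - a - b * xi) for xi, yi in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]

# line y = 3x - 1 with one gross outlier at x = 4
x = [0, 1, 2, 3, 4]
y = [-1, 2, 5, 8, 31]        # clean value at x = 4 would be 11
a, b = lad_line(x, y)
# the L1 fit recovers a = -1, b = 3 exactly; OLS would be pulled up
```

The O(n^3) enumeration is only for exposition; practical LAD fits use linear programming, but the example shows concretely why the L1 criterion resists a single gross outlier.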


2019 ◽  
Vol 48 (3) ◽  
pp. 181-186
Author(s):  
R. LI ◽  
F. LI ◽  
J. W. HUANG

In this paper, detailed comparisons are given between the estimators that can be derived from the principal component two-parameter estimator, such as the ordinary least squares estimator, the principal components regression estimator, the ridge regression estimator, the Liu estimator, the r-k estimator and the r-d estimator, under the prediction mean squared error criterion. In addition, conditions for the superiority of the principal component two-parameter estimator over the others are obtained. Furthermore, a numerical study is conducted to compare these estimators under the same criterion.
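For readers unfamiliar with the building blocks named above, a sketch of the standard forms (assumed notation; $T_r$ collects the first $r$ eigenvectors of $X^{\top}X$, and $k > 0$, $0 < d < 1$ are biasing parameters):

```latex
\hat{\beta}_{\mathrm{ridge}}(k) = (X^{\top}X + kI)^{-1} X^{\top}y
\qquad
\hat{\beta}_{\mathrm{Liu}}(d) = (X^{\top}X + I)^{-1}\!\left(X^{\top}y + d\,\hat{\beta}_{\mathrm{OLS}}\right)
% r-k estimator: ridge shrinkage restricted to the subspace
% spanned by the first r principal components:
\hat{\beta}_{r,k} = T_r \left(T_r^{\top} X^{\top}X\, T_r + kI_r\right)^{-1} T_r^{\top} X^{\top} y
```

The r-d estimator replaces the ridge shrinkage inside the principal-component subspace with Liu-type shrinkage, and the two-parameter estimator nests all of these as special cases.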


2019 ◽  
Vol 8 (1) ◽  
pp. 81-92
Author(s):  
Dhea Kurnia Mubyarjati ◽  
Abdul Hoyyi ◽  
Hasbi Yasin

Multiple linear regression can be solved using ordinary least squares (OLS). Some classical assumptions must be fulfilled, namely normality, homoskedasticity, non-multicollinearity, and non-autocorrelation. However, violations of these assumptions can occur due to outliers, so the estimator obtained is biased and inefficient. In statistics, robust regression is one method that can be used to deal with outliers. Robust regression has several estimators; the one used in this research is the scale estimator (S-estimator). The case studied is fish production per district/city in Central Java in 2015-2016, which is influenced by the number of fishermen, number of vessels, number of trips, number of fishing units, and number of households/fishing companies. Estimation with ordinary least squares violates the assumptions of normality, non-autocorrelation and homoskedasticity; this occurs because there are outliers. Based on the t-test at the 5% significance level, it can be concluded that several predictor variables, namely the number of fishermen, the number of ships, the number of trips and the number of fishing units, have a significant effect on fish production. The predictor variables explain 88.006% of the variation in fish production, and the MSE value is 7109.519. A MATLAB GUI program for S-estimator robust regression was built to make the calculations easier for users. Keywords: Ordinary Least Squares (OLS), Outliers, Robust Regression, Fish Production, GUI Matlab.
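For context on the S-estimator family used above: S- and MM-estimators are typically built on Tukey's bisquare function. Its weight form is easy to state; the sketch below (an illustration, not the paper's implementation) uses c = 4.685, the usual tuning constant for 95% efficiency under normal errors.

```python
def bisquare_weight(r, c=4.685):
    """Tukey bisquare weight for a (scaled) residual r: smoothly
    downweights moderate residuals and gives zero weight, i.e. full
    rejection, to any residual beyond the tuning constant c."""
    u = r / c
    return (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0

# weights fall smoothly from 1 at r = 0 and hit 0 beyond |r| = c
w_center = bisquare_weight(0.0)   # full weight
w_mid = bisquare_weight(2.0)      # partially downweighted
w_out = bisquare_weight(6.0)      # rejected outright
```

This full-rejection property is what gives bisquare-based estimators such as the S-estimator their high breakdown point, in contrast to the Huber weight, which never reaches zero.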


2014 ◽  
Vol 29 (4) ◽  
pp. 317-327 ◽  
Author(s):  
Annalisa Orenti ◽  
Ettore Marubini

The ordinary least squares (OLS) method is routinely used to estimate the unknown concentration of nucleic acids in a given solution by means of calibration. However, when outliers are present it could appear sensible to resort to robust regression methods. We analyzed data from an External Quality Control program concerning quantitative real-time PCR and found that 24 laboratories out of 40 presented outliers, which occurred most frequently at the lowest concentrations. In this article we investigated and compared the performance of the OLS method, the least absolute deviation (LAD) method, and the biweight MM-estimator in real-time PCR calibration via a Monte Carlo simulation. Outliers were introduced by replacement contamination. When contamination was absent, the coverages of the OLS and MM-estimator intervals were acceptable and their widths small, whereas the LAD intervals had acceptable coverage at the expense of greater width. In the presence of contamination we observed a trade-off between width and coverage: the OLS performance worsened; the MM-estimator interval widths remained short, but this was associated with a reduction in coverage; and the LAD interval widths were consistently larger, with acceptable coverage at the nominal level.


2002 ◽  
Vol 18 (5) ◽  
pp. 1086-1098 ◽  
Author(s):  
Akio Namba

In this paper, we consider a linear regression model when relevant regressors are omitted. We derive the explicit formulae for the predictive mean squared errors (PMSEs) of the Stein-rule (SR) estimator, the positive-part Stein-rule (PSR) estimator, the minimum mean squared error (MMSE) estimator, and the adjusted minimum mean squared error (AMMSE) estimator. It is shown analytically that the PSR estimator dominates the SR estimator in terms of PMSE even when there are omitted relevant regressors. Also, our numerical results show that the PSR estimator and the AMMSE estimator have much smaller PMSEs than the ordinary least squares estimator even when the relevant regressors are omitted.
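The shrinkage estimators compared above have a compact standard form (a sketch with assumed notation: $b$ the OLS estimator, $e = y - Xb$ the residual vector, and $a \ge 0$ a fixed shrinkage constant):

```latex
% Stein-rule estimator: shrink OLS toward the origin
\hat{\beta}_{\mathrm{SR}} =
  \left( 1 - \frac{a\, e^{\top}e}{b^{\top} X^{\top}X\, b} \right) b
% Positive-part version truncates the shrinkage factor at zero,
% avoiding the sign reversal that can occur when the fit is poor:
\hat{\beta}_{\mathrm{PSR}} =
  \max\!\left( 0,\; 1 - \frac{a\, e^{\top}e}{b^{\top} X^{\top}X\, b} \right) b
```

The truncation is the source of the dominance result stated above: wherever the SR factor goes negative, the PSR estimator replaces a perversely sign-flipped estimate with zero, which can only reduce the predictive mean squared error.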

