Robust regression techniques are valuable tools for analyzing data contaminated with influential observations. This article briefly reviews and describes 7 robust estimators for linear regression: popular ones (Huber M, Tukey's bisquare M and least absolute deviation, also called L1 or median regression), estimators that combine high breakdown and high efficiency [fast MM (Modified M-estimator), the fast τ-estimator and HBR (High breakdown rank-based)], and one designed to handle small samples [Distance-constrained maximum likelihood (DCML)]. We include the fast MM and fast τ-estimators because we use the fast and robust bootstrap (FRB) for MM and τ-estimators.
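As a brief illustration, several of these estimators are available in standard R packages. The package and data choices below are ours, not the paper's; fast τ, HBR and DCML require more specialized implementations that are not shown.

library(MASS)        # rlm(): M-estimation (Huber, Tukey bisquare)
library(quantreg)    # rq(): L1 / median regression
library(robustbase)  # lmrob(): fast MM-estimation

# Classic stack-loss data from base R, purely for illustration
fit_ols   <- lm(stack.loss ~ ., data = stackloss)                      # OLS baseline
fit_huber <- rlm(stack.loss ~ ., data = stackloss, psi = psi.huber)    # Huber M
fit_tukey <- rlm(stack.loss ~ ., data = stackloss, psi = psi.bisquare) # Tukey bisquare M
fit_l1    <- rq(stack.loss ~ ., tau = 0.5, data = stackloss)           # LAD / median regression
fit_mm    <- lmrob(stack.loss ~ ., data = stackloss)                   # fast MM-estimator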
Our objective is to compare predictive performance on a real data application using OLS (ordinary least squares) and to propose alternatives based on the 7 robust estimators. We also run simulations under various combinations of 4 factors: sample size, percentage of outliers, percentage of leverage points and number of covariates.
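As a hedged sketch of how such a factorial design could be set up in R (the abstract does not spell out the data-generating mechanism; the function name, contamination scheme and numeric settings below are our assumptions, apart from the values of n and p mentioned in the text):

# One simulation cell: clean Gaussian data, then a fraction of vertical
# outliers in y and of leverage points in X (all settings illustrative)
simulate_cell <- function(n, p, pct_out, pct_lev) {
  X <- matrix(rnorm(n * p), n, p)
  y <- drop(X %*% rep(1, p)) + rnorm(n)
  i_out <- seq_len(round(pct_out * n))          # rows turned into vertical outliers
  i_lev <- n + 1 - seq_len(round(pct_lev * n))  # rows turned into leverage points
  y[i_out] <- y[i_out] + 10
  X[i_lev, 1] <- X[i_lev, 1] + 10
  data.frame(y = y, X)
}

# Crossing the 4 factors of the design (outlier/leverage levels illustrative)
design <- expand.grid(n = c(50, 500, 5000), p = c(3, 10),
                      pct_out = c(0.05, 0.10), pct_lev = c(0, 0.10))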
Predictive performance is evaluated by cross-validation, with the mean squared error (MSE) as the criterion.
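A minimal sketch of this evaluation, assuming a generic fitting function and a data frame with response y (the helper cv_mse and its arguments are illustrative, not the paper's code):

# K-fold cross-validation estimate of the mean squared prediction error
cv_mse <- function(dat, fitter, K = 10) {
  folds <- sample(rep(seq_len(K), length.out = nrow(dat)))  # random fold labels
  fold_err <- sapply(seq_len(K), function(k) {
    fit  <- fitter(dat[folds != k, , drop = FALSE])                 # train on K-1 folds
    pred <- predict(fit, newdata = dat[folds == k, , drop = FALSE]) # predict held-out fold
    mean((dat$y[folds == k] - pred)^2)
  })
  mean(fold_err)
}

# Example, combining the (hypothetical) helpers above:
# cv_mse(simulate_cell(50, 3, 0.10, 0.10),
#        function(d) robustbase::lmrob(y ~ ., data = d))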
We use the R language for data analysis. On the real dataset, OLS provides the best prediction. DCML and the popular robust estimators give good predictive results as well, especially the Huber M-estimator.
In simulations involving 3 predictors and n=50, the results clearly favor fast MM, the fast τ-estimator and HBR, regardless of the proportion of outliers. DCML and Tukey M are also good estimators when n=50, especially when the percentage of outliers is small (5% and 10%). With 10 predictors, however, HBR, fast MM, fast τ and especially DCML give better results for n=50. HBR, fast MM and DCML provide better results for n=500. For n=5000, all the robust estimators give the same results, independently of the percentage of outliers.
If we vary the percentages of outliers and leverage points simultaneously, DCML, fast MM and HBR are good estimators for n=50 and p=3. For n=500, fast MM, fast τ and HBR provide better results.