fsdaSAS: A Package for Robust Regression for Very Large Datasets Including the Batch Forward Search

Stats ◽  
2021 ◽  
Vol 4 (2) ◽  
pp. 327-347
Author(s):  
Francesca Torti ◽  
Aldo Corbellini ◽  
Anthony C. Atkinson

The forward search (FS) is a general method of robust data fitting that moves smoothly from very robust to maximum likelihood estimation. The regression procedures are included in the MATLAB toolbox FSDA. The work on a SAS version of the FS originates from the need for the analysis of large datasets expressed by law enforcement services operating in the European Union that use our SAS software for detecting data anomalies that may point to fraudulent customs returns. Specific to our SAS implementation, the fsdaSAS package, we describe the approximation used to provide fast analyses of large datasets using an FS which progresses through the inclusion of batches of observations, rather than progressing one observation at a time. We do, however, test for outliers one observation at a time. We demonstrate that our SAS implementation becomes appreciably faster than the MATLAB version as the sample size increases and is also able to analyse larger datasets. The series of fits provided by the FS leads to the adaptive data-dependent choice of maximally efficient robust estimates. This also allows the monitoring of residuals and parameter estimates for fits of differing robustness levels. We mention that our fsdaSAS also applies the idea of monitoring to several robust estimators for regression for a range of values of breakdown point or nominal efficiency, leading to adaptive values for these parameters. We have also provided a variety of plots linked through brushing. Further programmed analyses include the robust transformations of the response in regression. Our package also provides the SAS community with methods of monitoring robust estimators for multivariate data, including multivariate data transformations.
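
As a rough illustration of the batch idea described in this abstract, the sketch below runs a simplified forward search for a straight-line fit in plain Python. It is not the fsdaSAS or FSDA algorithm: the function names, the exhaustive two-point starting rule, the toy data, and the batch size are all assumptions made for the example, and the one-at-a-time outlier tests of the real procedure are omitted.

```python
from itertools import combinations
from statistics import median


def fit_line(points):
    """Ordinary least squares for y = a + b*x on a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b


def sq_residuals(points, a, b):
    """Squared residuals of every observation from the line a + b*x."""
    return [(y - a - b * x) ** 2 for x, y in points]


def initial_subset(points):
    """Hypothetical starting rule: the pair of points whose exact-fit line
    has the smallest median squared residual over all observations."""
    best = None
    for i, j in combinations(range(len(points)), 2):
        if points[i][0] == points[j][0]:
            continue  # a vertical pair cannot define a regression line
        a, b = fit_line([points[i], points[j]])
        crit = median(sq_residuals(points, a, b))
        if best is None or crit < best[0]:
            best = (crit, [i, j])
    return best[1]


def batch_forward_search(points, batch=1):
    """Grow the fitting subset by `batch` observations per step.

    At each step the model is fitted to the current subset, all n
    observations are ranked by squared residual, and the next subset is
    made of the observations with the smallest residuals.
    Returns a list of (subset size, intercept, slope) tuples.
    """
    n = len(points)
    subset = initial_subset(points)
    history = []
    while True:
        a, b = fit_line([points[i] for i in subset])
        history.append((len(subset), a, b))
        if len(subset) == n:
            break
        r2 = sq_residuals(points, a, b)
        m_next = min(n, len(subset) + batch)
        subset = sorted(range(n), key=lambda i: r2[i])[:m_next]
    return history


# Toy data (invented): ten points on y = 1 + 2x plus two gross outliers.
points = [(float(x), 1.0 + 2.0 * x) for x in range(10)]
points += [(10.0, -20.0), (11.0, -25.0)]

history = batch_forward_search(points, batch=3)
for m, a, b in history:
    print(f"m={m:2d}  intercept={a:8.3f}  slope={b:7.3f}")
```

With a batch size of 3 the subset grows through sizes 2, 5, 8, 11 and 12, and the fitted slope stays at the clean value until the contaminated points start to enter at the end of the search. Monitoring such a trajectory is the basic idea; larger batches trade resolution of the trajectory for fewer refits.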

Entropy ◽  
2020 ◽  
Vol 22 (4) ◽  
pp. 399 ◽  
Author(s):  
Marco Riani ◽  
Anthony C. Atkinson ◽  
Aldo Corbellini ◽  
Domenico Perrotta

Minimum density power divergence estimation provides a general framework for robust statistics, depending on a parameter α, which determines the robustness properties of the method. The usual estimation method is numerical minimization of the power divergence. The paper considers the special case of linear regression. We developed an alternative estimation procedure using the methods of S-estimation. The rho function so obtained is proportional to one minus a suitably scaled normal density raised to the power α. We used the theory of S-estimation to determine the asymptotic efficiency and breakdown point for this new form of S-estimation. Two sets of comparisons were made. In one, S power divergence is compared with other S-estimators using four distinct rho functions. Plots of efficiency against breakdown point show that the properties of S power divergence are close to those of Tukey’s biweight. The second set of comparisons is between S power divergence estimation and numerical minimization. Monitoring these two procedures in terms of breakdown point shows that the numerical minimization yields a procedure with larger robust residuals and a lower empirical breakdown point, thus providing an estimate of α leading to more efficient parameter estimates.
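
Read literally, the verbal description of the rho function can be written out as follows. With φ the standard normal density, φ(u)^α is proportional to exp(−αu²/2), which suggests the form below. The normalisation shown (ρ equal to 0 at the origin and tending to 1 for large |u|) is an assumption made for this note; the paper's exact scaling constant may differ.

```latex
\rho_{\alpha}(u) \;\propto\; 1 - \exp\!\left(-\frac{\alpha u^{2}}{2}\right),
\qquad \alpha > 0 .
```

Under this reading ρ is bounded, as S-estimation requires, and larger values of α make the function level off sooner, so that large residuals are downweighted more strongly.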


2015 ◽  
Vol 67 (Code Snippet 1) ◽  
Author(s):  
Marco Riani ◽  
Domenico Perrotta ◽  
Andrea Cerioli

Methodology ◽  
2005 ◽  
Vol 1 (2) ◽  
pp. 81-85 ◽  
Author(s):  
Stefan C. Schmukle ◽  
Jochen Hardt

Abstract. Incremental fit indices (IFIs) are regularly used when assessing the fit of structural equation models. IFIs are based on the comparison of the fit of a target model with that of a null model. For maximum-likelihood estimation, IFIs are usually computed by using the χ2 statistics of the maximum-likelihood fitting function (ML-χ2). However, LISREL recently changed the computation of IFIs. Since version 8.52, IFIs reported by LISREL are based on the χ2 statistics of the reweighted least squares fitting function (RLS-χ2). Although both functions lead to the same maximum-likelihood parameter estimates, the two χ2 statistics reach different values. Because these differences are especially large for null models, IFIs are affected in particular. Consequently, RLS-χ2 based IFIs in combination with conventional cut-off values explored for ML-χ2 based IFIs may lead to a wrong acceptance of models. We demonstrate this point by a confirmatory factor analysis in a sample of 2449 subjects.
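
To make the mechanism in this abstract concrete, the sketch below computes three widely used incremental fit indices (NFI, TLI, CFI) from the χ² statistics of a target and a null model, using their standard textbook definitions. All numbers are invented for illustration and are not taken from the paper; for simplicity only the null-model χ² is varied, whereas in practice the target-model χ² also differs somewhat between the ML and RLS fitting functions.

```python
def nfi(chi2_target, chi2_null):
    """Normed fit index: proportional reduction in chi-square vs. the null model."""
    return (chi2_null - chi2_target) / chi2_null


def tli(chi2_target, df_target, chi2_null, df_null):
    """Tucker-Lewis index (non-normed fit index), based on chi-square/df ratios."""
    ratio_null = chi2_null / df_null
    ratio_target = chi2_target / df_target
    return (ratio_null - ratio_target) / (ratio_null - 1.0)


def cfi(chi2_target, df_target, chi2_null, df_null):
    """Comparative fit index, based on estimated noncentrality (chi-square - df)."""
    d_target = max(chi2_target - df_target, 0.0)
    d_null = max(chi2_null - df_null, d_target)
    return 1.0 if d_null == 0 else 1.0 - d_target / d_null


# Hypothetical values: same target model, two different null-model chi-squares.
chi2_target, df_target = 180.0, 50
df_null = 66
chi2_null_ml = 1500.0   # assumed ML-based null chi-square
chi2_null_rls = 3000.0  # assumed (larger) RLS-based null chi-square

nfi_ml = nfi(chi2_target, chi2_null_ml)
nfi_rls = nfi(chi2_target, chi2_null_rls)
cfi_ml = cfi(chi2_target, df_target, chi2_null_ml, df_null)
cfi_rls = cfi(chi2_target, df_target, chi2_null_rls, df_null)

print(f"NFI  ML-null: {nfi_ml:.3f}   RLS-null: {nfi_rls:.3f}")
print(f"CFI  ML-null: {cfi_ml:.3f}   RLS-null: {cfi_rls:.3f}")
```

In this invented case the NFI moves from 0.88 to 0.94 purely because the null-model χ² is larger, so a model that would miss a conventional 0.90 cut-off under one computation would pass it under the other.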


Marketing ZFP ◽  
2019 ◽  
Vol 41 (4) ◽  
pp. 21-32
Author(s):  
Dirk Temme ◽  
Sarah Jensen

Missing values are ubiquitous in empirical marketing research. If missing data are not dealt with properly, this can lead to a loss of statistical power and distorted parameter estimates. While traditional approaches for handling missing data (e.g., listwise deletion) are still widely used, researchers can nowadays choose among various advanced techniques such as multiple imputation analysis or full-information maximum likelihood estimation. Due to the available software, using these modern missing data methods does not pose a major obstacle. Still, their application requires a sound understanding of the prerequisites and limitations of these methods as well as a deeper understanding of the processes that have led to missing values in an empirical study. This article is Part 1 and first introduces Rubin’s classical definition of missing data mechanisms and an alternative, variable-based taxonomy, which provides a graphical representation. Secondly, a selection of visualization tools available in different R packages for the description and exploration of missing data structures is presented.


2020 ◽  
Vol 196 ◽  
pp. 105777
Author(s):  
Jadson Jose Monteiro Oliveira ◽  
Robson Leonardo Ferreira Cordeiro

2021 ◽  
pp. 1-13
Author(s):  
Ahmed H. Youssef ◽  
Amr R. Kamel ◽  
Mohamed R. Abonazel

This paper proposes three robust estimators (M-estimation, S-estimation, and MM-estimation) for handling outliers in seemingly unrelated regression equations (SURE) models. The SURE model is a multivariate regression model with a special assumption: the errors of the individual linear equations are correlated, so the model consists of multiple regression equations linked by contemporaneously correlated disturbances. Because the effects of outliers may permeate through the system of equations, the primary aim of SURE, which is to achieve efficiency in estimation, becomes questionable when outliers are present. The goal of robust regression is to develop methods that are resistant to the possibility that one or several unknown outliers may occur anywhere in the data. In this paper, we study and compare the performance of robust estimators with the traditional non-robust (ordinary least squares and Zellner) estimators on a real dataset from the Egyptian insurance market covering the financial years 1999 to 2018. We selected the three most important insurance companies in Egypt operating in the same field of insurance activity (personal and property insurance) and studied the effect of some important indicators (exogenous variables) issued by insurance corporations on net profit. The results show that robust estimators greatly improve the efficiency of SURE estimation, and that the best robust estimator is MM-estimation. Moreover, the selected exogenous variables have a significant effect on net profit in the Egyptian insurance market.


Author(s):  
Noel Anderson ◽  
Benjamin E. Bagozzi ◽  
Ore Koren

Abstract. This article provides an accessible introduction to the phenomenon of monotone likelihood in duration modeling of political events. Monotone likelihood arises when covariate values are monotonic when ordered according to failure time, causing parameter estimates to diverge toward infinity. Within political science duration model applications, this problem leads to misinterpretation, model misspecification and omitted variable biases, among other issues. Using a combination of mathematical exposition, Monte Carlo simulations and empirical applications, this article illustrates the advantages of Firth's penalized maximum-likelihood estimation in resolving the methodological complications underlying monotone likelihood. The results identify the conditions under which monotone likelihood is most acute and provide guidance for political scientists applying duration modeling techniques in their empirical research.
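
For orientation, Firth's correction is usually written as a penalty added to the log-likelihood. The notation below is an assumption of this note rather than the article's own: ℓ(β) is the log (partial) likelihood and I(β) the corresponding Fisher information matrix.

```latex
\ell^{*}(\beta) \;=\; \ell(\beta) \;+\; \tfrac{1}{2}\,\log\bigl|\,I(\beta)\,\bigr|
```

The penalty term corresponds to a Jeffreys invariant prior. Because it falls away as the coefficients grow without bound, the penalized criterion attains its maximum at finite values of β even when the unpenalized likelihood is monotone.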


2018 ◽  
Vol 1 (1) ◽  
pp. 37
Author(s):  
Hasih Pratiwi ◽  
Yuliana Susanti ◽  
Sri Sulistijowati Handajani

Linear least-squares estimates can behave badly when the error distribution is not normal, particularly when the errors are heavy-tailed. One remedy is to remove influential observations from the least-squares fit. Another approach, robust regression, is to use a fitting criterion that is not as vulnerable as least squares to unusual data. The most common general method of robust regression is M-estimation. This class of estimators can be regarded as a generalization of maximum-likelihood estimation. In this paper we discuss a robust regression model for corn production using two popular estimators: the Huber estimator and the Tukey bisquare estimator.

Keywords: robust regression, M-estimation, Huber estimator, Tukey bisquare estimator
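
The two estimators named in this abstract differ only in how they weight standardized residuals, which the following plain-Python sketch tries to show with iteratively reweighted least squares for a straight line. The data are invented (they are not the corn production data), the function names are hypothetical, and the tuning constants 1.345 and 4.685 are the commonly quoted values for 95% efficiency under normal errors. The scale is re-estimated at each step as the median absolute residual divided by 0.6745, which is one common choice among several.

```python
from statistics import median


def huber_weight(u, k=1.345):
    """Huber weight: 1 inside [-k, k], decaying as k/|u| outside."""
    a = abs(u)
    return 1.0 if a <= k else k / a


def bisquare_weight(u, c=4.685):
    """Tukey bisquare weight: smooth descent to exactly 0 at |u| >= c."""
    return (1.0 - (u / c) ** 2) ** 2 if abs(u) < c else 0.0


def weighted_line(xs, ys, ws):
    """Weighted least squares for y = a + b*x."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def m_estimate(xs, ys, weight, iters=50, tol=1e-10):
    """M-estimate of a line by iteratively reweighted least squares,
    started from the ordinary least-squares fit."""
    a, b = weighted_line(xs, ys, [1.0] * len(xs))
    for _ in range(iters):
        res = [y - a - b * x for x, y in zip(xs, ys)]
        scale = median(abs(r) for r in res) / 0.6745
        if scale == 0:
            break  # exact fit for at least half the data; nothing to reweight
        ws = [weight(r / scale) for r in res]
        a_new, b_new = weighted_line(xs, ys, ws)
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b


# Invented data: y = 2 + 3x with small errors and one gross outlier at x = 10.
xs = [float(x) for x in range(1, 11)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.1, 28.0]
ys = [2.0 + 3.0 * x + e for x, e in zip(xs, noise)]

a_ols, b_ols = weighted_line(xs, ys, [1.0] * len(xs))
a_huber, b_huber = m_estimate(xs, ys, huber_weight)
a_bisq, b_bisq = m_estimate(xs, ys, bisquare_weight)

print(f"OLS      slope = {b_ols:.3f}")
print(f"Huber    slope = {b_huber:.3f}")
print(f"Bisquare slope = {b_bisq:.3f}")
```

The Huber weights never reach zero, so a gross outlier keeps a small, bounded influence, whereas the bisquare weights reject it completely once its standardized residual exceeds the cut-off. Because the bisquare criterion is non-convex, its result can depend on the starting fit; the least-squares start used here is a simplification.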

