Comparison of Robust Estimators’ Performance for Detecting Outliers in Multivariate Data

2021 ◽  
Vol 3 (2) ◽  
pp. 36-64
Author(s):  
Sharifah Sakinah Syed Abd Mutalib ◽  
Siti Zanariah Satari ◽  
Wan Nur Syahidah Wan Yusoff

In multivariate data, outliers are difficult to detect, especially as the dimension of the data increases. Mahalanobis distance (MD) is one of the classical methods for detecting outliers in multivariate data. However, the classical mean and covariance matrix used in MD suffer from masking and swamping effects if the data contain outliers. Because of this problem, many studies use a robust estimator in place of the classical estimator of the mean and covariance matrix. In this study, the performance of five robust estimators, namely Fast Minimum Covariance Determinant (FMCD), Minimum Vector Variance (MVV), Covariance Matrix Equality (CME), Index Set Equality (ISE), and Test on Covariance (TOC), is investigated and compared. FMCD has been widely used and is known as one of the best robust estimators. However, FMCD still falls short under certain conditions. MVV, CME, ISE, and TOC are modifications of FMCD: these four robust estimators improve the last step of the FMCD algorithm. Hence, the objective of this study is to assess how well these five estimators detect outliers in multivariate data, with particular attention to TOC, the most recent of them. Simulation studies are conducted for two outlier scenarios under various conditions. Three performance measures, pout, pmask, and pswamp, are used to evaluate the robust estimators. TOC gives better performance in pswamp for most conditions, and better results for pout and pmask for certain conditions.
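The masking effect described above can be illustrated with a small sketch. It is not the FMCD, MVV, CME, ISE, or TOC algorithm from the study: it only computes squared Mahalanobis distances with the classical mean and covariance, and compares them with distances based on estimates from the clean points alone, an "oracle" used here as a stand-in for what a robust estimator aims to recover. The function names, the 20% contamination, the outlier location, and the bivariate chi-square cutoff are all assumptions made for illustration.

```python
import numpy as np

def mahalanobis_sq(X, center, cov):
    """Squared Mahalanobis distance of each row of X from `center`."""
    diff = X - center
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)

def flag_outliers(X, center, cov, alpha=0.025):
    """Flag rows whose squared MD exceeds the chi-square (1 - alpha) quantile.

    Only p = 2 is handled, because the chi-square quantile with 2 degrees
    of freedom has the closed form -2 * ln(alpha).
    """
    if X.shape[1] != 2:
        raise ValueError("this sketch only supports bivariate data")
    cutoff = -2.0 * np.log(alpha)
    return mahalanobis_sq(X, center, cov) > cutoff

# Hypothetical data: 80 clean points and a tight cluster of 20 planted outliers.
rng = np.random.default_rng(0)
clean = rng.normal(size=(80, 2))
planted = np.array([8.0, 8.0]) + 0.1 * rng.normal(size=(20, 2))
X = np.vstack([clean, planted])
is_planted = np.arange(len(X)) >= 80

# Classical estimates use all points, so the outliers distort them.
classical = flag_outliers(X, X.mean(axis=0), np.cov(X, rowvar=False))
# Oracle estimates use only the clean points (not available in practice).
oracle = flag_outliers(X, clean.mean(axis=0), np.cov(clean, rowvar=False))

classical_hits = int(classical[is_planted].sum())
oracle_hits = int(oracle[is_planted].sum())
print("planted outliers flagged (classical, oracle):", classical_hits, oracle_hits)
```

With a contaminating cluster this large, the classical covariance is inflated along the direction of the outliers, so one would expect the classical rule to miss planted points that the oracle rule flags. That gap is the masking the abstract refers to.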

2019 ◽  
Vol 7 (4) ◽  
pp. 465-497
Author(s):  
Yaoyuan V Tan ◽  
Carol A C Flannagan ◽  
Michael R Elliott

Abstract Examples of “doubly robust” estimators for missing data include augmented inverse probability weighting (AIPWT) and penalized splines of propensity prediction (PSPP). Doubly robust estimators have the property that, if either the response propensity or the mean is modeled correctly, a consistent estimator of the population mean is obtained. However, doubly robust estimators can perform poorly when modest misspecification is present in both models. Here we consider extensions of the AIPWT and PSPP that use Bayesian additive regression trees (BART) to provide highly robust propensity and mean model estimation. We term these “robust-squared” in the sense that the propensity score, the means, or both can be estimated with minimal model misspecification, and applied to the doubly robust estimator. We consider their behavior via simulations where propensities and/or mean models are misspecified. We apply our proposed method to impute missing instantaneous velocity (delta-v) values from the 2014 National Automotive Sampling System Crashworthiness Data System dataset and missing Blood Alcohol Concentration values from the 2015 Fatality Analysis Reporting System dataset. We found that BART, applied to PSPP and AIPWT, provides a more robust estimate compared with PSPP and AIPWT.
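The double-robustness property stated above can be sketched for the simplest case, estimating a population mean when some outcomes are missing. The example uses the standard augmented inverse probability weighting form with plug-in propensities and mean predictions supplied by the caller. It does not use BART or PSPP, and the function name, the data-generating model, and the deliberately misspecified inputs are assumptions chosen for illustration.

```python
import numpy as np

def aipw_mean(y, r, pi_hat, m_hat):
    """AIPW estimate of E[Y] with missing outcomes.

    y      : outcomes (entries where r == 0 are ignored)
    r      : response indicator, 1 if y is observed
    pi_hat : estimated response propensities
    m_hat  : predictions from the outcome (mean) model
    """
    y, r, pi_hat, m_hat = (np.asarray(a, dtype=float) for a in (y, r, pi_hat, m_hat))
    # Residuals only contribute for respondents.
    resid = np.where(r == 1, y - m_hat, 0.0)
    return float(np.mean(m_hat + resid / pi_hat))

# Hypothetical simulation: the true mean of y is 2, response depends on x.
rng = np.random.default_rng(1)
n = 20_000
x = rng.normal(size=n)
y = 2.0 + x + rng.normal(size=n)
pi_true = 1.0 / (1.0 + np.exp(-x))
r = rng.binomial(1, pi_true)

complete_case = float(y[r == 1].mean())                   # ignores the missingness
dr_wrong_mean = aipw_mean(y, r, pi_true, np.zeros(n))     # correct propensity only
dr_wrong_prop = aipw_mean(y, r, np.full(n, 0.5), 2.0 + x) # correct mean model only
print(complete_case, dr_wrong_mean, dr_wrong_prop)
```

In this setup the complete-case mean is biased upward because respondents tend to have larger x, while each AIPW variant should land close to 2 as long as one of its two inputs is correct. The sketch says nothing about the case the abstract targets, where both models are modestly misspecified.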


1987 ◽  
Vol 12 (4) ◽  
pp. 339-368 ◽  
Author(s):  
Howard Wainer ◽  
David Thissen

No model is ever a perfect reflection of the data it is to summarize. There are always errors of fit. This is as true with modern item response theory (IRT) as with all other models. It is important to know to what extent the accuracy of measurement made with these models is influenced by misfit and what can be done to minimize the inaccuracy. First, a detailed general model was fit to data to provide the framework for a realistic simulation structure. Then three of the most commonly used IRT models were fit in this simulation. A variety of robust estimators of ability were used, and the accuracy and efficiency of each estimator were determined. With short tests, a simple model coupled with a robust estimator seemed to be the methodology of choice for describing the data. As test length increased, so too did the benefits of utilizing a more complex parameterization. An unexpected finding was that coupling robust estimators with a Bayesian prior yielded substantial shrinkage. Future work on ability estimation, especially for practical applications of adaptive testing, is required to “unshrink” ability estimates.


2017 ◽  
Vol 46 (3-4) ◽  
pp. 13-22 ◽  
Author(s):  
Alexander Dürre ◽  
Roland Fried ◽  
Daniel Vogel

We summarize properties of the spatial sign covariance matrix and especially consider the relationship between its eigenvalues and those of the shape matrix of an elliptical distribution. The explicit relationship known in the bivariate case was used to construct the spatial sign correlation coefficient, which is a non-parametric and robust estimator for the correlation coefficient within the elliptical model. We consider a multivariate generalization, which we call the multivariate spatial sign correlation matrix. A small simulation study indicates that the new estimator is very efficient under various elliptical distributions if the dimension is large. We furthermore derive its influence function under certain conditions which indicates that the multivariate spatial sign correlation becomes more sensitive to outliers as the dimension increases.
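As a rough illustration of the object discussed above, the spatial sign covariance matrix can be computed by projecting each centered observation onto the unit sphere and averaging the outer products. This sketch does not implement the spatial sign correlation transformation from the paper. The function name, the default coordinate-wise median as the center (the spatial median is another common choice), and the decision to drop observations that coincide with the center are assumptions.

```python
import numpy as np

def spatial_sign_covariance(X, center=None):
    """Average outer product of the spatial signs (x - center) / ||x - center||."""
    X = np.asarray(X, dtype=float)
    if center is None:
        # Assumption: coordinate-wise median as a simple robust location.
        center = np.median(X, axis=0)
    diff = X - center
    norms = np.linalg.norm(diff, axis=1)
    keep = norms > 0  # the spatial sign is undefined at the center itself
    signs = diff[keep] / norms[keep, None]
    return signs.T @ signs / len(signs)

# Hypothetical trivariate sample.
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
S = spatial_sign_covariance(X, center=np.zeros(3))

# Pushing one observation far out along its own direction leaves S unchanged,
# because only directions enter the estimator once the center is fixed.
X_far = X.copy()
X_far[0] *= 1000.0
S_far = spatial_sign_covariance(X_far, center=np.zeros(3))
print(np.trace(S), np.allclose(S, S_far))
```

Because every spatial sign has unit length, the matrix has trace 1, so its eigenvalues carry information about shape rather than scale. That is why the relationship to the eigenvalues of the shape matrix is the quantity of interest in the abstract.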


2020 ◽  
Author(s):  
Takuya Kawahara ◽  
Tomohiro Shinozaki ◽  
Yutaka Matsuyama

Abstract Background: In the presence of dependent censoring even after stratification of baseline covariates, the Kaplan–Meier estimator provides an inconsistent estimate of risk. To account for dependent censoring, time-varying covariates can be used along with two statistical methods: the inverse probability of censoring weighted (IPCW) Kaplan–Meier estimator and the parametric g-formula estimator. The consistency of the IPCW Kaplan–Meier estimator depends on the correctness of the model specification of the censoring hazard, whereas that of the parametric g-formula estimator depends on the correctness of the models for the event hazard and time-varying covariates. Methods: We combined the IPCW Kaplan–Meier estimator and the parametric g-formula estimator into a doubly robust estimator that can adjust for dependent censoring. The estimator is theoretically more robust to model misspecification than the IPCW Kaplan–Meier estimator and the parametric g-formula estimator. We conducted simulation studies with a time-varying covariate that affected both time-to-event and censoring under correct and incorrect models for censoring, event, and time-varying covariates. We applied our proposed estimator to data from a large clinical trial with censoring before the end of follow-up. Results: Simulation studies demonstrated that our proposed estimator is doubly robust, namely it is consistent if either the model for the IPCW Kaplan–Meier estimator or the models for the parametric g-formula estimator, but not necessarily both, is correctly specified. Simulation studies and data application demonstrated that our estimator can be more efficient than the IPCW Kaplan–Meier estimator. Conclusions: The proposed estimator is useful for estimation of risk if censoring is affected by time-varying risk factors.
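The weighting idea behind an IPCW Kaplan–Meier estimator can be sketched as a product-limit estimator in which each subject's contribution to the event counts and risk sets is multiplied by a weight. This is a deliberate simplification and not the authors' estimator: real IPCW weights are typically time-varying and estimated from a censoring hazard model, whereas here each subject has one fixed weight supplied by the caller. The function name and the toy data are assumptions.

```python
import numpy as np

def weighted_km(times, events, weights=None):
    """Weighted Kaplan-Meier survival curve.

    times   : follow-up time per subject
    events  : 1 if the event was observed, 0 if censored
    weights : positive per-subject weights (assumption: fixed over time);
              None gives the ordinary Kaplan-Meier estimator
    Returns the distinct event times and the survival estimate at each.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    w = np.ones_like(times) if weights is None else np.asarray(weights, dtype=float)

    event_times = np.unique(times[events == 1])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = w[times >= t].sum()                  # weighted risk set
        d = w[(times == t) & (events == 1)].sum()      # weighted events at t
        s *= 1.0 - d / at_risk
        surv.append(s)
    return event_times, np.array(surv)

# Toy data: four subjects, the second one is censored at time 2.
t_plain, s_plain = weighted_km([1, 2, 3, 4], [1, 0, 1, 1])
t_wtd, s_wtd = weighted_km([1, 2, 3, 4], [1, 0, 1, 1], weights=[1, 1, 2, 1])
print(t_plain, s_plain)
print(t_wtd, s_wtd)
```

With unit weights the function reduces to the ordinary Kaplan–Meier estimator, and risk at a given time is one minus the survival estimate. Upweighting a subject, as inverse probability of censoring weights do for subjects who resemble those lost to censoring, shifts the estimated curve accordingly.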


2005 ◽  
Vol 29 (2) ◽  
pp. 267-295 ◽  
Author(s):  
Saeid Habibi

A new method for state estimation, referred to as the Variable Structure Filter (VSF), has recently been proposed. The VSF is a model-based predictor-corrector method. It uses an internal model to provide an initial estimate of the states and subsequently refines this initial estimate by a corrective term that is a function of the system output and the upper bound of uncertainties. As such, the VSF can explicitly cater for uncertainties in its internal model. In this paper, a conceptual discussion of the VSF strategy and its performance in terms of stability, accuracy, and convergence is provided. The impact of modeling uncertainties on the performance of the VSF is discussed and quantified. The analysis is augmented by comparative simulation studies to further illustrate the concept.

