Detection of Outliers in Multivariate Data: A Method Based on Clustering and Robust Estimators

Compstat ◽  
2002 ◽  
pp. 291-296 ◽  
Author(s):  
Carla M. Santos-Pereira ◽  
Ana M. Pires

1995 ◽  
Vol 52 (2) ◽  
pp. 295-307 ◽  
Author(s):  
C. Caroni ◽  
P. Prescott

2021 ◽  
Vol 3 (1) ◽  
pp. 1-15
Author(s):  
Sharifah Sakinah Syed Abd Mutalib ◽  
Siti Zanariah Satari ◽  
Wan Nur Syahidah Wan Yusoff

Data in practice are often high-dimensional and multivariate in nature. Detecting outliers is a long-standing problem in multivariate analysis: it is difficult, and graphical inspection alone is not sufficient. This paper gives a brief, nontechnical review of outlier detection methods for multivariate data, namely the projection pursuit method, methods based on robust distance, and cluster analysis. The strengths and weaknesses of each method are briefly discussed.


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Saima Afzal ◽  
Ayesha Afzal ◽  
Muhammad Amin ◽  
Sehar Saleem ◽  
Nouman Ali ◽  
...  

Outlier detection is a challenging task, especially when outliers are defined by rare combinations of multiple variables. In this paper, we develop and evaluate a new method for the detection of outliers in multivariate data that relies on Principal Components Analysis (PCA) and three-sigma limits. The proposed approach employs PCA to perform dimension reduction by regenerating variables, i.e., fitted points from the original observations. Observations lying outside the three-sigma limits are identified as outliers. The method is applied to two real-life datasets and several artificially generated datasets. Its performance is compared with that of some existing methods using several evaluation criteria: the percentage of correct classification, precision, recall, and F-measure. On these criteria and datasets, the proposed method outperforms the existing methods. For the first real-life dataset, the F-measure is highest for the proposed method, at 0.6667, against 0.3333 and 0.4000 for the two existing approaches. Similarly, for the second real-life dataset, the F-measure is 0.8000 for the proposed approach and 0.5263 and 0.6315 for the two existing approaches. The simulation experiments also show that the performance of the proposed approach improves with increasing sample size.
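As a rough illustration of the idea in the abstract above, the following Python sketch applies three-sigma limits to principal component scores and then computes precision, recall, and F-measure. It is only one plausible reading, not the authors' procedure: the abstract does not spell out how the fitted points are built, and the function names (`pca_three_sigma`, `f_measure`) and the toy data are assumptions made for this example.

```python
import numpy as np

def pca_three_sigma(X, n_sigma=3.0):
    """Flag rows whose score on any principal component lies outside
    +/- n_sigma sample standard deviations (hypothetical helper)."""
    Xc = X - X.mean(axis=0)                 # centre the data
    cov = np.cov(Xc, rowvar=False)          # sample covariance matrix
    _, eigvecs = np.linalg.eigh(cov)        # principal axes (columns)
    scores = Xc @ eigvecs                   # component scores, zero mean
    sd = scores.std(axis=0, ddof=1)
    return (np.abs(scores) / sd > n_sigma).any(axis=1)

def f_measure(truth, pred):
    """Precision, recall and F-measure for boolean outlier labels."""
    tp = np.sum(truth & pred)
    precision = tp / pred.sum() if pred.sum() else 0.0
    recall = tp / truth.sum() if truth.sum() else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Toy data (assumed): 20 points close to the line y = x, plus one point
# whose coordinates are individually ordinary but jointly unusual.
i = np.arange(20)
clean = np.column_stack([i, i + 0.1 * (-1.0) ** i])
X = np.vstack([clean, [[5.0, 14.0]]])
truth = np.zeros(len(X), dtype=bool)
truth[20] = True

flags = pca_three_sigma(X)
precision, recall, f = f_measure(truth, flags)
print(np.flatnonzero(flags), precision, recall, f)
```

On this toy data the planted point is the only one flagged, because it is extreme along the second principal axis even though neither of its coordinates is extreme on its own. Note that the classical standard deviation is itself inflated by outliers, so a plain three-sigma rule of this kind can miss outliers in small or heavily contaminated samples.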


2021 ◽  
Vol 3 (2) ◽  
pp. 36-64
Author(s):  
Sharifah Sakinah Syed Abd Mutalib ◽  
Siti Zanariah Satari ◽  
Wan Nur Syahidah Wan Yusoff

In multivariate data, outliers are difficult to detect, especially as the dimension of the data increases. Mahalanobis distance (MD) is one of the classical methods for detecting outliers in multivariate data. However, the classical mean and covariance matrix used in MD suffer from masking and swamping effects if the data contain outliers. For this reason, many studies use robust estimators of the mean and covariance matrix instead of the classical ones. In this study, the performance of five robust estimators, namely Fast Minimum Covariance Determinant (FMCD), Minimum Vector Variance (MVV), Covariance Matrix Equality (CME), Index Set Equality (ISE), and Test on Covariance (TOC), is investigated and compared. FMCD is widely used and is regarded as one of the best robust estimators. However, FMCD still falls short under certain conditions. MVV, CME, ISE and TOC are modifications of FMCD that improve the last step of the FMCD algorithm. The objective of this study is therefore to assess how well these five estimators detect outliers in multivariate data, with particular attention to TOC, the most recent of them. Simulation studies are conducted for two outlier scenarios under various conditions. Three performance measures, pout, pmask and pswamp, are used to assess the robust estimators. TOC gives better performance in pswamp for most conditions, and better results for pout and pmask for certain conditions.
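The masking effect described above can be seen in a small Python sketch that compares squared Mahalanobis distances from the classical estimates with those from a simplified concentration-step (C-step) estimate. This is a single-start illustration of the C-step idea underlying FMCD, under stated assumptions: it is not the full FMCD algorithm, it is none of MVV, CME, ISE or TOC, and it applies no consistency or reweighting correction. The function names, the starting rule, and the toy data are all hypothetical.

```python
import math
import numpy as np

def squared_md(X, center, cov):
    """Squared Mahalanobis distance of every row of X."""
    diff = X - center
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)

def c_step_estimates(X, h, max_iter=50):
    """Iterate C-steps from one deterministic start (assumed rule:
    the h points closest to the coordinate-wise median)."""
    med = np.median(X, axis=0)
    subset = np.argsort(np.linalg.norm(X - med, axis=1))[:h]
    for _ in range(max_iter):
        center = X[subset].mean(axis=0)
        cov = np.cov(X[subset], rowvar=False)
        # keep the h observations with the smallest distances
        new_subset = np.argsort(squared_md(X, center, cov))[:h]
        if set(new_subset) == set(subset):
            break
        subset = new_subset
    return center, cov

# Toy data (assumed): 20 clean points around the origin and a tight
# cluster of 5 outliers near (8, 8).
k = np.arange(20)
angles = 2 * np.pi * k / 20
radii = np.where(k % 2 == 0, 0.5, 1.0)
clean = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
outliers = np.array([[8.0, 8.0], [8.2, 8.0], [8.0, 8.2], [8.2, 8.2], [8.1, 8.1]])
X = np.vstack([clean, outliers])
n, p = X.shape

# 0.975 chi-square quantile; the closed form below holds only for p = 2
cutoff = -2.0 * math.log(1 - 0.975)

classical_flags = squared_md(X, X.mean(axis=0), np.cov(X, rowvar=False)) > cutoff
h = (n + p + 1) // 2
robust_flags = squared_md(X, *c_step_estimates(X, h)) > cutoff

print("classical flags on outliers:", classical_flags[20:])
print("robust flags on outliers:   ", robust_flags[20:])
```

With the classical estimates, the outlying cluster pulls the mean towards itself and inflates the covariance, so none of the five planted points exceeds the cutoff. The C-step estimate is fitted to a clean subset and flags all five. Because no consistency correction is applied, the robust distances of clean points are somewhat inflated, so the chi-square cutoff is only approximate here.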


Stats ◽  
2021 ◽  
Vol 4 (2) ◽  
pp. 327-347
Author(s):  
Francesca Torti ◽  
Aldo Corbellini ◽  
Anthony C. Atkinson

The forward search (FS) is a general method of robust data fitting that moves smoothly from very robust to maximum likelihood estimation. The regression procedures are included in the MATLAB toolbox FSDA. The work on a SAS version of the FS originates from the need for the analysis of large datasets expressed by law enforcement services operating in the European Union that use our SAS software for detecting data anomalies that may point to fraudulent customs returns. Specific to our SAS implementation, the fsdaSAS package, we describe the approximation used to provide fast analyses of large datasets using an FS which progresses through the inclusion of batches of observations, rather than progressing one observation at a time. We do, however, test for outliers one observation at a time. We demonstrate that our SAS implementation becomes appreciably faster than the MATLAB version as the sample size increases and is also able to analyse larger datasets. The series of fits provided by the FS leads to the adaptive data-dependent choice of maximally efficient robust estimates. This also allows the monitoring of residuals and parameter estimates for fits of differing robustness levels. We mention that our fsdaSAS also applies the idea of monitoring to several robust estimators for regression for a range of values of breakdown point or nominal efficiency, leading to adaptive values for these parameters. We have also provided a variety of plots linked through brushing. Further programmed analyses include the robust transformations of the response in regression. Our package also provides the SAS community with methods of monitoring robust estimators for multivariate data, including multivariate data transformations.
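The difference between advancing one observation at a time and advancing in batches can be sketched in Python for simple linear regression. This is a toy illustration only, not the FSDA or fsdaSAS code: the function name `forward_search`, the `batch` parameter, the non-robust starting rule, and the data are assumptions, and the outlier tests mentioned in the abstract are not implemented.

```python
import numpy as np

def forward_search(x, y, m0=3, batch=1):
    """Grow the fitting subset from m0 to n observations, `batch` at a time.
    Returns, for each observation, the subset size at which it first entered."""
    n = len(y)
    A = np.column_stack([np.ones(n), x])          # intercept + slope design
    # Assumed start: the m0 smallest absolute residuals from a full-data
    # least-squares fit (real implementations use a robust start).
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    subset = np.argsort(np.abs(y - A @ beta))[:m0]
    entered_at = {int(i): m0 for i in subset}
    m = m0
    while m < n:
        # fit on the current subset, then rank ALL observations by residual
        beta, *_ = np.linalg.lstsq(A[subset], y[subset], rcond=None)
        m = min(m + batch, n)
        subset = np.argsort(np.abs(y - A @ beta))[:m]
        for i in subset:
            entered_at.setdefault(int(i), m)
    return entered_at

# Toy data (assumed): a straight line with tiny alternating noise and
# one gross outlier at index 5.
x = np.arange(11, dtype=float)
y = 1.0 + 2.0 * x + 0.05 * (-1.0) ** np.arange(11)
y[5] += 20.0

one_by_one = forward_search(x, y, m0=3, batch=1)
batched = forward_search(x, y, m0=3, batch=3)
print(one_by_one[5], batched[5])
```

In both runs the outlying observation joins the subset only at the final step, but the batched run needs fewer refits to get there. That trade-off between fewer fits and coarser monitoring is the approximation the abstract describes for large datasets.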


1968 ◽  
Author(s):  
Gerald H. Shure ◽  
Laurence I. Press ◽  
Miles S. Rogers
