Multivariate outlier detection in incomplete survey data: the epidemic algorithm and transformed rank correlations

The distribution of multivariate quantitative survey data is usually not normal. Skewed and semi-continuous distributions occur often. In addition, missing values and non-response are common. Altogether, this mix of problems makes multivariate outlier detection difficult. Examples of surveys where these problems occur are most business surveys and some household surveys, such as the European Union Statistics on Income and Living Conditions (SILC) survey. Several methods for multivariate outlier detection are collected in the R package modi. This paper gives an overview of modi and its functions for outlier detection and the corresponding imputation. The use of the methods is illustrated with a business survey dataset. The discussion covers pre- and post-processing to deal with skewness and zero-inflation, the advantages and disadvantages of the methods, and the choice of parameters.

Multivariate Outlier Detection in Postprocessing of Multi-temporal PS-InSAR Results using Deep Learning

Procedia Computer Science, 2021, Vol 181, pp. 1146-1153. DOI: 10.1016/j.procs.2021.01.326

Author(s): Pedro Aguiar, António Cunha, Matus Bakon, Antonio M. Ruiz-Armenteros, Joaquim J. Sousa

Keyword(s): Deep Learning, Outlier Detection, Multivariate Outlier Detection, Multi Temporal

Multivariate Outlier Detection With High-Breakdown Estimators

Journal of the American Statistical Association, 2010, Vol 105 (489), pp. 147-156. DOI: 10.1198/jasa.2009.tm09147. Cited by ~76

Author(s): Andrea Cerioli

Keyword(s): Outlier Detection, Multivariate Outlier Detection

A New Procedure of Clustering Based on Multivariate Outlier Detection

Journal of Data Science, 2021, Vol 11 (1), pp. 69-84. DOI: 10.6339/jds.201301_11(1).0005

Author(s): G. S. David Sam Jayakumar, Bejoy John Thomas

Keyword(s): Outlier Detection, Multivariate Outlier Detection

Multivariate Outlier Detection in Applied Data Analysis: Global, Local, Compositional and Cellwise Outliers

Mathematical Geosciences, 2020, Vol 52 (8), pp. 1049-1066. DOI: 10.1007/s11004-020-09861-6

Author(s): Peter Filzmoser, Mariella Gregorich

Keyword(s): Data Analysis, Outlier Detection, Spatial Information, Compositional Data, Subject Area, Data Matrix, Multivariate Outlier Detection, Data Formats, Atypical Cells, The Subject

Abstract: Outliers are encountered in all practical situations of data analysis, regardless of the discipline of application. However, the term outlier is not uniformly defined across all these fields, since the differentiation between regular and irregular behaviour is naturally embedded in the subject area under consideration. Generalized approaches for outlier identification have to be modified to allow the diligent search for potential outliers. Therefore, an overview of different techniques for multivariate outlier detection is presented within the scope of selected kinds of data frequently found in the field of geosciences. In particular, three common types of data in geological studies are explored: spatial, compositional and flat data. All of these formats motivate new outlier concepts, such as local outlyingness, where the spatial information of the data is used to define a neighbourhood structure. Another type is compositional data, which nicely illustrates the fact that some kinds of data require not only adaptations to standard outlier approaches, but also transformations of the data itself before conducting the outlier search. Finally, the very recently developed concept of cellwise outlyingness, typically used for high-dimensional data, allows one to identify atypical cells in a data matrix. In practice, the different data formats can be mixed, and it is demonstrated in various examples how to proceed in such situations.
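
The point about transforming compositional data before the outlier search can be illustrated with a minimal sketch. It is not taken from the paper: it assumes strictly positive parts, uses an isometric log-ratio (ilr) transform with one particular orthonormal basis, and scores observations with classical (non-robust) squared Mahalanobis distances purely to stay dependency-light. In practice a robust location and scatter estimate would usually replace the mean and covariance. The function names `ilr` and `mahalanobis_sq` and the simulated data are hypothetical.

```python
import numpy as np


def ilr(X):
    """Isometric log-ratio transform; rows are compositions with parts > 0."""
    X = np.asarray(X, dtype=float)
    D = X.shape[1]
    log_x = np.log(X)
    # Centred log-ratio: subtract the row-wise mean of the logs.
    clr = log_x - log_x.mean(axis=1, keepdims=True)
    # Orthonormal basis of the clr hyperplane (one possible choice), shape (D, D-1).
    V = np.zeros((D, D - 1))
    for j in range(1, D):
        V[:j, j - 1] = 1.0 / j
        V[j, j - 1] = -1.0
        V[:, j - 1] *= np.sqrt(j / (j + 1.0))
    return clr @ V


def mahalanobis_sq(Z):
    """Squared Mahalanobis distances using the classical mean and covariance."""
    Z = np.asarray(Z, dtype=float)
    diff = Z - Z.mean(axis=0)
    cov = np.atleast_2d(np.cov(Z, rowvar=False))
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)


# Simulated three-part compositions (assumed data, for illustration only).
rng = np.random.default_rng(0)
regular = np.array([0.5, 0.3, 0.2]) * np.exp(rng.normal(0.0, 0.1, size=(50, 3)))
regular /= regular.sum(axis=1, keepdims=True)
atypical = np.array([[0.02, 0.03, 0.95]])  # composition dominated by one part
X = np.vstack([regular, atypical])

d2 = mahalanobis_sq(ilr(X))
print("index with largest distance:", int(np.argmax(d2)))
```

The distances are computed in ilr coordinates rather than on the raw parts because raw compositions sum to a constant, which makes their covariance matrix singular and the Mahalanobis distance undefined.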
