Multivariate outlier detection in incomplete survey data: the epidemic algorithm and transformed rank correlations

Author(s): Cédric Béguin, Beat Hulliger

2016, Vol 45 (1), pp. 3-23
Author(s): Marc Bill, Beat Hulliger

The distribution of multivariate quantitative survey data is usually not normal. Skewed and semi-continuous distributions occur often. In addition, missing values and non-response are common. Altogether, this mix of problems makes multivariate outlier detection difficult. Examples of surveys where these problems occur are most business surveys and some household surveys, such as the European Union Statistics on Income and Living Conditions (EU-SILC). Several methods for multivariate outlier detection are collected in the R package modi. This paper gives an overview of modi and its functions for outlier detection and the corresponding imputation. The use of the methods is explained with a business survey dataset. The discussion covers pre- and post-processing to deal with skewness and zero-inflation, the advantages and disadvantages of the methods, and the choice of parameters.
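One building block of distance-based outlier detection in incomplete data can be sketched briefly. This is a minimal Python illustration under stated assumptions, not the modi implementation: it assumes a location vector and covariance matrix have already been estimated (robustly, in practice), computes each record's squared Mahalanobis distance on its observed variables only, and compares it with a chi-square quantile whose degrees of freedom equal the number of observed variables. The helper names `chi2_quantile` and `flag_incomplete` are hypothetical, and the quantile uses the Wilson-Hilferty approximation so the sketch needs only NumPy and the standard library.

```python
import math
from statistics import NormalDist

import numpy as np


def chi2_quantile(prob, df):
    """Hypothetical helper: Wilson-Hilferty approximation to the chi-square quantile."""
    z = NormalDist().inv_cdf(prob)
    h = 2.0 / (9.0 * df)
    return df * (1.0 - h + z * math.sqrt(h)) ** 3


def flag_incomplete(x, center, cov, prob=0.99):
    """Hypothetical helper: squared Mahalanobis distance of each row on its observed
    variables (NaN = missing), flagged when it exceeds the chi-square quantile
    with df = number of observed variables."""
    x = np.asarray(x, dtype=float)
    center = np.asarray(center, dtype=float)
    cov = np.asarray(cov, dtype=float)
    dist = np.full(len(x), np.nan)
    flags = np.zeros(len(x), dtype=bool)
    for i, row in enumerate(x):
        obs = ~np.isnan(row)
        k = int(obs.sum())
        if k == 0:
            continue  # fully missing record: no distance, not flagged
        diff = row[obs] - center[obs]
        # Use only the sub-covariance of the observed variables.
        d2 = float(diff @ np.linalg.solve(cov[np.ix_(obs, obs)], diff))
        dist[i] = d2
        flags[i] = d2 > chi2_quantile(prob, k)
    return dist, flags


# Assumed example: skewed positive values are log-transformed first; the center
# and covariance are placeholders standing in for robust estimates.
raw = np.array([[100.0, 50.0], [120.0, np.nan], [1e6, 40.0], [np.nan, np.nan]])
dist, flags = flag_incomplete(np.log(raw), center=np.log([110.0, 45.0]),
                              cov=np.diag([0.1, 0.1]))
```

Using the marginal distribution of the observed sub-vector keeps partially observed records in the search instead of discarding them; under normality that sub-vector's squared distance follows a chi-square law with as many degrees of freedom as observed variables.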


2021, Vol 181, pp. 1146-1153
Author(s): Pedro Aguiar, António Cunha, Matus Bakon, Antonio M. Ruiz-Armenteros, Joaquim J. Sousa

2021, Vol 11 (1), pp. 69-84
Author(s): G. S. David Sam Jayakumar, Bejoy John Thomas

2020, Vol 52 (8), pp. 1049-1066
Author(s): Peter Filzmoser, Mariella Gregorich

Outliers are encountered in all practical situations of data analysis, regardless of the discipline of application. However, the term outlier is not uniformly defined across all these fields since the differentiation between regular and irregular behaviour is naturally embedded in the subject area under consideration. Generalized approaches for outlier identification have to be modified to allow the diligent search for potential outliers. Therefore, an overview of different techniques for multivariate outlier detection is presented within the scope of selected kinds of data frequently found in the field of geosciences. In particular, three common types of data in geological studies are explored: spatial, compositional and flat data. All of these formats motivate new outlier concepts, such as local outlyingness, where the spatial information of the data is used to define a neighbourhood structure. Another type are compositional data, which nicely illustrate the fact that some kinds of data require not only adaptations to standard outlier approaches, but also transformations of the data itself before conducting the outlier search. Finally, the very recently developed concept of cellwise outlyingness, typically used for high-dimensional data, allows one to identify atypical cells in a data matrix. In practice, the different data formats can be mixed, and it is demonstrated in various examples how to proceed in such situations.
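The point that compositional data need a transformation before the outlier search can be made concrete with a short sketch. Assuming rows of strictly positive parts, it maps them to pivot (isometric log-ratio) coordinates, which depend only on ratios between parts, and then computes squared Mahalanobis distances there. For brevity it uses the classical mean and covariance; because these are themselves distorted by outliers, a robust estimator such as the MCD would take their place in an actual outlier search. The function names are illustrative, not taken from the paper.

```python
import numpy as np


def pivot_coordinates(x):
    """Map compositions (rows with D strictly positive parts) to D-1 pivot (ilr) coordinates."""
    logx = np.log(np.asarray(x, dtype=float))
    n, d = logx.shape
    z = np.empty((n, d - 1))
    for j in range(d - 1):
        # Mean of the remaining logs = log of the geometric mean of the later parts.
        rest = logx[:, j + 1:].mean(axis=1)
        z[:, j] = np.sqrt((d - j - 1) / (d - j)) * (logx[:, j] - rest)
    return z


def squared_mahalanobis(z):
    """Classical (non-robust) squared Mahalanobis distances of the rows of z."""
    diff = z - z.mean(axis=0)
    cov = np.atleast_2d(np.cov(z, rowvar=False))
    return np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)


rng = np.random.default_rng(0)
parts = rng.lognormal(size=(50, 4))
parts = parts / parts.sum(axis=1, keepdims=True)  # close each row to proportions
d2 = squared_mahalanobis(pivot_coordinates(parts))
```

Because the coordinates use only log-ratios, rescaling a composition (for example from proportions to percentages) leaves its coordinates, and hence its distance, unchanged.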


2020, Vol 36 (4), pp. 1272-1295
Author(s): Waldyn G. Martinez, Maria L. Weese, L. Allison Jones-Farmer
