Detecting multivariate outliers: Use a robust variant of the Mahalanobis distance

2018, Vol 74, pp. 150-156
Author(s): Christophe Leys, Olivier Klein, Yves Dominicy, Christophe Ley

2020, pp. 084456212093205
Author(s): Maher M. El-Masri, Fabrice I. Mowbray, Susan M. Fox-Wasylyshyn, David Kanters

The presence of statistical outliers is a shared concern in research. If ignored or improperly handled, outliers have the potential to distort parameter estimates and possibly compromise the validity of research findings. The purpose of this paper is to provide a conceptual and practical overview of multivariate outliers with a focus on common techniques used to identify and manage multivariate outliers. Specifically, this paper discusses the use of Mahalanobis distance and residual statistics as common multivariate outlier identification techniques. It also discusses the use of leverage and Cook’s distance as two common techniques to determine the influence that multivariate outliers may have on statistical models. Finally, this paper discusses techniques that are commonly used to handle influential multivariate outlier cases.
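Leverage and Cook's distance can be illustrated with a one-predictor least-squares fit. The sketch below is plain Python and assumes a simple linear model y = b0 + b1·x. The data are invented for illustration, and the cutoffs (leverage above 2p/n, Cook's distance above 4/n) are common rules of thumb, not thresholds taken from the paper. Leverage is the diagonal of the hat matrix, h_i = 1/n + (x_i - x̄)²/Sxx, and Cook's distance combines it with the residual: D_i = e_i² / (p·s²) · h_i / (1 - h_i)².

```python
def leverage_and_cooks(x, y):
    """Leverage and Cook's distance for an OLS fit of y = b0 + b1 * x."""
    n = len(x)
    p = 2  # fitted parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - p)  # residual mean square

    # Diagonal of the hat matrix for a model with one predictor.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    cooks = [(e ** 2 / (p * s2)) * h / (1 - h) ** 2
             for e, h in zip(residuals, leverage)]
    return leverage, cooks


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [3, 5, 7, 9, 11, 13, 15, 17, 19, 10]  # first nine lie on y = 2x + 1
lev, cooks = leverage_and_cooks(x, y)
n, p = len(x), 2
high_leverage = [i for i, h in enumerate(lev) if h > 2 * p / n]
influential = [i for i, d in enumerate(cooks) if d > 4 / n]
print(high_leverage, influential)  # [9] [9]
```

In this invented example the last case has high leverage (its x lies far from the others) and is influential (it drags the fitted slope well below 2). A case lying on the trend of the other points could have equally high leverage but a small residual, and hence a small Cook's distance; that is the distinction the two measures capture.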


Author(s): Hamid Ghorbani

While outlier detection methods are frequently used by statisticians analyzing univariate data, identifying outliers in multivariate data poses challenges that univariate data do not. In this paper, after a brief review of some tools for univariate outlier detection, the Mahalanobis distance, a well-known multivariate statistical distance, and its ability to detect multivariate outliers are discussed. As an application, the univariate and multivariate outliers of a real data set are detected using the R software environment for statistical computing.
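The contrast between univariate and multivariate screening can be sketched in a few lines of Python (the paper itself uses R). The sketch assumes bivariate data. It flags a case univariately when |z| > 3 on either variable and multivariately when its squared Mahalanobis distance D² = (x - x̄)ᵀ S⁻¹ (x - x̄) exceeds the 97.5% quantile of the chi-square distribution with p = 2 degrees of freedom. Both cutoffs are common conventions assumed here for illustration, and the data are invented. For p = 2 the chi-square CDF is 1 - exp(-x/2), so its quantile has a closed form.

```python
import math


def mean_and_cov_2d(points):
    """Sample mean vector and 2x2 sample covariance (n - 1 denominator)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq_2d(point, mean, cov):
    """Squared Mahalanobis distance (x - mean)^T S^-1 (x - mean) for p = 2."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Uses the closed-form inverse of a symmetric 2x2 matrix.
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def chi2_quantile_df2(q):
    """Chi-square quantile for 2 degrees of freedom: inverts 1 - exp(-x/2)."""
    return -2 * math.log(1 - q)


points = [(4, 4), (-4, -4), (3, 2), (-3, -2), (2, 3), (-2, -3), (1, 1),
          (-1, -1), (3, 4), (-3, -4), (4, 3), (-4, -3), (3, -3)]
mean, cov = mean_and_cov_2d(points)

# Univariate screen: |z| > 3 on either variable.
sd_x, sd_y = math.sqrt(cov[0]), math.sqrt(cov[2])
univariate = [i for i, (px, py) in enumerate(points)
              if abs(px - mean[0]) / sd_x > 3 or abs(py - mean[1]) / sd_y > 3]

# Multivariate screen: D^2 above the 97.5% chi-square(2) quantile (about 7.38).
cutoff = chi2_quantile_df2(0.975)
multivariate = [i for i, pt in enumerate(points)
                if mahalanobis_sq_2d(pt, mean, cov) > cutoff]
print(univariate, multivariate)  # [] [12]
```

The last point, (3, -3), is unremarkable on each variable taken separately, but it runs against the strong positive correlation of the other points, so only the Mahalanobis distance flags it.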


2016, Vol 34 (2)
Author(s): Peter Filzmoser

Three methods for the identification of multivariate outliers (Rousseeuw and Van Zomeren, 1990; Becker and Gather, 1999; Filzmoser et al., 2005) are compared. All three are based on the Mahalanobis distance, made resistant to outliers and model deviations through robust estimation of location and covariance. The comparison is made by means of a simulation study that considers not only multivariate normally distributed data but also heavy-tailed and asymmetric distributions. The simulations focus on low-dimensional (p = 5) and high-dimensional (p = 30) data.
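To see why robust estimation of location and covariance matters, the sketch below compares classical Mahalanobis distances with distances based on the Minimum Covariance Determinant (MCD) estimator. It is a toy under several assumptions: the MCD is computed exactly by brute force over every h-subset of 2-D data, which is feasible only for tiny samples (practical implementations of FAST-MCD, such as scikit-learn's `MinCovDet` or R's `robustbase::covMcd`, use concentration steps instead); it uses h = ⌊(n + p + 1)/2⌋; it applies the usual consistency factor α / P(χ²_{p+2} ≤ χ²_{p,α}) with α = h/n; and it skips the reweighting step. It is not presented as any of the three methods compared in the paper, and the data are invented.

```python
import math
from itertools import combinations


def mean_cov_2d(pts):
    """Sample mean and 2x2 sample covariance (sxx, sxy, syy), n - 1 denominator."""
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mx) ** 2 for p in pts) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in pts) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def maha_sq(pt, mean, cov):
    """Squared Mahalanobis distance of a 2-D point under (mean, cov)."""
    sxx, sxy, syy = cov
    dx, dy = pt[0] - mean[0], pt[1] - mean[1]
    det = sxx * syy - sxy * sxy
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def exact_mcd_2d(pts):
    """Raw MCD by brute force: the h-subset with the smallest covariance
    determinant, rescaled for consistency at the normal model."""
    n, p = len(pts), 2
    h = (n + p + 1) // 2
    best, best_det = None, math.inf
    for idx in combinations(range(n), h):
        _, (sxx, sxy, syy) = mean_cov_2d([pts[i] for i in idx])
        det = sxx * syy - sxy * sxy
        if 0 < det < best_det:  # skip singular (collinear) subsets
            best, best_det = idx, det
    mean, (sxx, sxy, syy) = mean_cov_2d([pts[i] for i in best])
    # Consistency factor alpha / P(chi2_{p+2} <= chi2_{p, alpha}); closed form for p = 2.
    alpha = h / n
    q = -2 * math.log(1 - alpha)             # chi-square(2) quantile at alpha
    f4 = 1 - math.exp(-q / 2) * (1 + q / 2)  # chi-square(4) CDF evaluated at q
    c = alpha / f4
    return mean, (c * sxx, c * sxy, c * syy)


clean = [(x, y) for x in (-1, 0, 1) for y in (-1.5, -0.5, 0.5, 1.5)]
outliers = [(10, 10), (10, 11), (11, 10), (11, 11)]  # 4 of 16 points (25%)
pts = clean + outliers
cutoff = -2 * math.log(1 - 0.975)  # chi-square(2) 97.5% quantile, about 7.38

m, s = mean_cov_2d(pts)
classical = [i for i, pt in enumerate(pts) if maha_sq(pt, m, s) > cutoff]
rm, rs = exact_mcd_2d(pts)
robust = [i for i, pt in enumerate(pts) if maha_sq(pt, rm, rs) > cutoff]
print(classical, robust)  # [] [12, 13, 14, 15]
```

With a tight cluster of four outliers, the classical mean and covariance are pulled toward the cluster and no point exceeds the cutoff (the masking effect), while the MCD-based distances flag all four.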


2012, Vol 57 (3), pp. 829-835
Author(s): Z. Głowacz, J. Kozik

The paper describes a procedure for automatically selecting the symptoms that accompany a break in the armature winding coils of a synchronous motor. This procedure, called feature selection, chooses from the full set of features describing the problem the subset that best distinguishes between healthy and damaged states. The amplitudes of the spectral components of the motor current signals are used as features. The full spectra of the current signals are treated as multidimensional feature spaces, and their subspaces are tested. Candidate subspaces are chosen with a genetic algorithm, and their quality is assessed with the Mahalanobis distance measure: the algorithm searches for the subspaces in which this distance is greatest. The algorithm is very efficient and, as confirmed by research, leads to good results. The proposed technique is also successfully applied in many other fields of science and technology, including medical diagnostics.
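The subset-scoring idea can be sketched as follows. The paper uses a genetic algorithm; purely for illustration, this sketch swaps in an exhaustive search over all k-feature subsets, which is only practical when the number of features is small. It also assumes a specific criterion, the squared Mahalanobis distance between the two class means under the pooled within-class covariance, Δ² = (m_h - m_d)ᵀ S_pooled⁻¹ (m_h - m_d). The helper names and the feature values (stand-ins for spectral amplitudes) are hypothetical.

```python
from itertools import combinations


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def class_separation(a, b, feats):
    """Squared Mahalanobis distance between the means of classes a and b on the
    features in feats, using the pooled within-class covariance."""
    def mean(rows):
        return [sum(r[f] for r in rows) / len(rows) for f in feats]
    ma, mb = mean(a), mean(b)
    k, dof = len(feats), len(a) + len(b) - 2
    cov = [[0.0] * k for _ in range(k)]
    for rows, m in ((a, ma), (b, mb)):
        for r in rows:
            dev = [r[f] - m[i] for i, f in enumerate(feats)]
            for i in range(k):
                for j in range(k):
                    cov[i][j] += dev[i] * dev[j] / dof
    diff = [u - v for u, v in zip(ma, mb)]
    w = solve(cov, diff)  # w = S_pooled^-1 (m_a - m_b)
    return sum(d * wi for d, wi in zip(diff, w))


def best_subset(a, b, n_features, k):
    """Exhaustive search over all k-feature subsets (a stand-in for the GA)."""
    return max(combinations(range(n_features), k),
               key=lambda feats: class_separation(a, b, feats))


# Rows are samples; columns are four hypothetical spectral amplitudes.
healthy = [[1, 1, 3, 2], [2, 1, 4, 2], [1, 2, 4, 2], [2, 2, 3, 4]]
damaged = [[5, 4, 3, 2], [6, 4, 4, 2], [5, 5, 4, 2], [6, 5, 3, 4]]
best = best_subset(healthy, damaged, n_features=4, k=2)
score = class_separation(healthy, damaged, best)
print(best, round(score, 3))  # (0, 1) 75.0
```

Features 0 and 1 are the ones whose class means differ, so the pair (0, 1) scores highest. A genetic algorithm optimizes the same kind of objective without enumerating every subset, which matters when the candidate features are all the components of a current spectrum.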


2016, Vol 140, pp. 213-233
Author(s): Patrick Hamill, Marco Giordano, Carolyne Ward, David Giles, Brent Holben
