scholarly journals Identification of Multivariate Outliers: A Performance Study

2016 ◽  
Vol 34 (2) ◽  
Author(s):  
Peter Filzmoser

Three methods for the identification of multivariate outliers (Rousseeuw and Van Zomeren, 1990; Becker and Gather, 1999; Filzmoser et al., 2005) are compared. They are based on the Mahalanobis distance that will be made resistant against outliers and model deviations by robust estimation of location and covariance. The comparison is made by means of a simulation study. Not only the case of multivariate normally distributed data, but also heavy tailed and asymmetric distributions will be considered. The simulations are focused on low dimensional (p = 5) and high dimensional (p = 30) data.

2019 ◽  
Vol 21 (6) ◽  
pp. 1904-1919 ◽  
Author(s):  
Riccardo De Bin ◽  
Anne-Laure Boulesteix ◽  
Axel Benner ◽  
Natalia Becker ◽  
Willi Sauerbrei

Abstract Data integration, i.e. the use of different sources of information for data analysis, is becoming one of the most important topics in modern statistics. Especially in, but not limited to, biomedical applications, a relevant issue is the combination of low-dimensional (e.g. clinical data) and high-dimensional (e.g. molecular data such as gene expressions) data sources in a prediction model. Not only the different characteristics of the data, but also the complex correlation structure within and between the two data sources, pose challenging issues. In this paper, we investigate these issues via simulations, providing some useful insight into strategies to combine low- and high-dimensional data in a regression prediction model. In particular, we focus on the effect of the correlation structure on the results, while accounting for the influence of our specific choices in the design of the simulation study.


2020 ◽  
Vol 245 ◽  
pp. 06018
Author(s):  
Ursula Laa

In physics we often encounter high-dimensional data, in the form of multivariate measurements or of models with multiple free parameters. The information encoded is increasingly explored using machine learning, but is not typically explored visually. The barrier tends to be visualising beyond 3D, but systematic approaches for this exist in the statistics literature. I use examples from particle and astrophysics to show how we can use the “grand tour” for such multidimensional visualisations, for example to explore grouping in high dimension and for visual identification of multivariate outliers. I then discuss the idea of projection pursuit, i.e. searching the high-dimensional space for “interesting” low dimensional projections, and illustrate how we can detect complex associations between multiple parameters.


2006 ◽  
Vol 11 (1) ◽  
pp. 12-24 ◽  
Author(s):  
Alexander von Eye

At the level of manifest categorical variables, a large number of coefficients and models for the examination of rater agreement has been proposed and used. The most popular of these is Cohen's κ. In this article, a new coefficient, κ s , is proposed as an alternative measure of rater agreement. Both κ and κ s allow researchers to determine whether agreement in groups of two or more raters is significantly beyond chance. Stouffer's z is used to test the null hypothesis that κ s = 0. The coefficient κ s allows one, in addition to evaluating rater agreement in a fashion parallel to κ, to (1) examine subsets of cells in agreement tables, (2) examine cells that indicate disagreement, (3) consider alternative chance models, (4) take covariates into account, and (5) compare independent samples. Results from a simulation study are reported, which suggest that (a) the four measures of rater agreement, Cohen's κ, Brennan and Prediger's κ n , raw agreement, and κ s are sensitive to the same data characteristics when evaluating rater agreement and (b) both the z-statistic for Cohen's κ and Stouffer's z for κ s are unimodally and symmetrically distributed, but slightly heavy-tailed. Examples use data from verbal processing and applicant selection.


2020 ◽  
Vol 10 (5) ◽  
pp. 1797 ◽  
Author(s):  
Mera Kartika Delimayanti ◽  
Bedy Purnama ◽  
Ngoc Giang Nguyen ◽  
Mohammad Reza Faisal ◽  
Kunti Robiatul Mahmudah ◽  
...  

Manual classification of sleep stage is a time-consuming but necessary step in the diagnosis and treatment of sleep disorders, and its automation has been an area of active study. The previous works have shown that low dimensional fast Fourier transform (FFT) features and many machine learning algorithms have been applied. In this paper, we demonstrate utilization of features extracted from EEG signals via FFT to improve the performance of automated sleep stage classification through machine learning methods. Unlike previous works using FFT, we incorporated thousands of FFT features in order to classify the sleep stages into 2–6 classes. Using the expanded version of Sleep-EDF dataset with 61 recordings, our method outperformed other state-of-the art methods. This result indicates that high dimensional FFT features in combination with a simple feature selection is effective for the improvement of automated sleep stage classification.


Entropy ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. 743
Author(s):  
Xi Liu ◽  
Shuhang Chen ◽  
Xiang Shen ◽  
Xiang Zhang ◽  
Yiwen Wang

Neural signal decoding is a critical technology in brain machine interface (BMI) to interpret movement intention from multi-neural activity collected from paralyzed patients. As a commonly-used decoding algorithm, the Kalman filter is often applied to derive the movement states from high-dimensional neural firing observation. However, its performance is limited and less effective for noisy nonlinear neural systems with high-dimensional measurements. In this paper, we propose a nonlinear maximum correntropy information filter, aiming at better state estimation in the filtering process for a noisy high-dimensional measurement system. We reconstruct the measurement model between the high-dimensional measurements and low-dimensional states using the neural network, and derive the state estimation using the correntropy criterion to cope with the non-Gaussian noise and eliminate large initial uncertainty. Moreover, analyses of convergence and robustness are given. The effectiveness of the proposed algorithm is evaluated by applying it on multiple segments of neural spiking data from two rats to interpret the movement states when the subjects perform a two-lever discrimination task. Our results demonstrate better and more robust state estimation performance when compared with other filters.


Author(s):  
Fumiya Akasaka ◽  
Kazuki Fujita ◽  
Yoshiki Shimomura

This paper proposes the PSS Business Case Map as a tool to support designers’ idea generation in PSS design. The map visualizes the similarities among PSS business cases in a two-dimensional diagram. To make the map, PSS business cases are first collected by conducting, for example, a literature survey. The collected business cases are then classified from multiple aspects that characterize each case such as its product type, service type, target customer, and so on. Based on the results of this classification, the similarities among the cases are calculated and visualized by using the Self-Organizing Map (SOM) technique. A SOM is a type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional) view from high-dimensional data. The visualization result is offered to designers in a form of a two-dimensional map, which is called the PSS Business Case Map. By using the map, designers can figure out the position of their current business and can acquire ideas for the servitization of their business.


Complexity ◽  
2003 ◽  
Vol 8 (4) ◽  
pp. 39-50 ◽  
Author(s):  
Stefan Häusler ◽  
Henry Markram ◽  
Wolfgang Maass

2021 ◽  
pp. 147387162110481
Author(s):  
Haijun Yu ◽  
Shengyang Li

Hyperspectral images (HSIs) have become increasingly prominent as they can maintain the subtle spectral differences of the imaged objects. Designing approaches and tools for analyzing HSIs presents a unique set of challenges due to their high-dimensional characteristics. An improved color visualization approach is proposed in this article to achieve communication between users and HSIs in the field of remote sensing. Under the real-time interactive control and color visualization, this approach can help users intuitively obtain the rich information hidden in original HSIs. Using the dimensionality reduction (DR) method based on band selection, high-dimensional HSIs are reduced to low-dimensional images. Through drop-down boxes, users can freely specify images that participate in the combination of RGB channels of the output image. Users can then interactively and independently set the fusion coefficient of each image within an interface based on concentric circles. At the same time, the output image will be calculated and visualized in real time, and the information it reflects will also be different. In this approach, channel combination and fusion coefficient setting are two independent processes, which allows users to interact more flexibly according to their needs. Furthermore, this approach is also applicable for interactive visualization of other types of multi-layer data.


Sign in / Sign up

Export Citation Format

Share Document