unsupervised outlier detection Latest Research Papers

Tax evasion risk management using a Hybrid Unsupervised Outlier Detection method

Expert Systems with Applications ◽

10.1016/j.eswa.2021.116409 ◽

2022 ◽

pp. 116409

Author(s):

Miloš Savić ◽

Jasna Atanasijević ◽

Dušan Jakovetić ◽

Nataša Krejić

Keyword(s):

Risk Management ◽

Outlier Detection ◽

Tax Evasion ◽

Detection Method ◽

Unsupervised Outlier Detection

Distance Based Joint Probability Density Estimation For Unsupervised Outlier Detection

10.1109/jeeit53412.2021.9634099 ◽

2021 ◽

Author(s):

Atiq ur Rehman ◽

Samir Brahim Belhaouari

Keyword(s):

Probability Density ◽

Outlier Detection ◽

Density Estimation ◽

Joint Probability ◽

Probability Density Estimation ◽

Joint Probability Density ◽

Unsupervised Outlier Detection

Unsupervised Outlier Detection: A Meta-Learning Algorithm Based on Feature Selection

Electronics ◽

10.3390/electronics10182236 ◽

2021 ◽

Vol 10 (18) ◽

pp. 2236

Author(s):

Vasilis Papastefanopoulos ◽

Pantelis Linardatos ◽

Sotiris Kotsiantis

Keyword(s):

Feature Selection ◽

Outlier Detection ◽

Learning Algorithm ◽

Ground Truth ◽

Detection Methods ◽

Selection Strategy ◽

Many Worlds ◽

Wide Range ◽

Meta Learning ◽

Unsupervised Outlier Detection

Outlier detection refers to the problem of identifying and, where appropriate, eliminating anomalous observations from data. Such anomalous observations can emerge for a variety of reasons, including human or mechanical errors, fraudulent behaviour, and environmental or systematic changes, occurring either naturally or purposefully. The accurate and timely detection of deviant observations allows potentially extensive problems, such as fraud or system failures, to be identified early, before they escalate. Several unsupervised outlier detection methods have been developed; however, there is no single best algorithm or family of algorithms, as each typically relies on one measure of ‘outlierness’, such as density or distance, and ignores the others. Moreover, in an unsupervised setting, the absence of ground-truth labels makes finding a single best algorithm impossible, even for a single given dataset. In this study, a new meta-learning algorithm for unsupervised outlier detection is introduced to mitigate this problem. The proposed algorithm, in a fully unsupervised manner, attempts not only to combine the best of many worlds from existing techniques through ensemble voting but also to mitigate their undesired shortcomings by employing an unsupervised feature selection strategy that identifies the most informative algorithms for a given dataset. The proposed methodology was evaluated extensively through experimentation, in which it was benchmarked against a wide range of commonly used outlier detection techniques. Results obtained on a variety of widely accepted datasets demonstrated its usefulness and state-of-the-art performance: it topped the Friedman ranking test for both the area under the receiver operating characteristic (ROC) curve and precision when averaged over five independent trials.
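The ensemble idea described above (score the data with several detectors that use different notions of outlierness, keep the informative ones, then vote) can be illustrated with a minimal NumPy sketch. This is not the authors' algorithm: the three detector functions, the rank normalisation, and the selection rule (keep detectors positively correlated with the consensus score) are hypothetical stand-ins chosen only to show the overall flow.

```python
import numpy as np

def knn_distance_score(X, k=5):
    # Distance-based outlierness: mean distance to the k nearest neighbours.
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    return np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)  # column 0 is the self-distance

def centroid_distance_score(X):
    # Global distance from the data centroid.
    return np.linalg.norm(X - X.mean(axis=0), axis=1)

def kde_score(X, bandwidth=1.0):
    # Density-based outlierness: negative log of a Gaussian kernel density estimate.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return -np.log(np.exp(-d2 / (2 * bandwidth ** 2)).mean(axis=1))

def rank_normalise(s):
    # Map raw scores to ranks in [0, 1] so detectors with different scales can vote.
    return np.argsort(np.argsort(s)) / (len(s) - 1)

def ensemble_scores(X, detectors, min_corr=0.0):
    S = np.column_stack([rank_normalise(f(X)) for f in detectors])
    consensus = S.mean(axis=1)
    # Hypothetical unsupervised selection rule: drop detectors that disagree with the consensus.
    corr = np.array([np.corrcoef(S[:, j], consensus)[0, 1] for j in range(S.shape[1])])
    keep = corr > min_corr
    if not keep.any():
        keep[:] = True
    # Ensemble vote: average the rank-normalised scores of the kept detectors.
    return S[:, keep].mean(axis=1), keep
```

For example, `ensemble_scores(X, [knn_distance_score, centroid_distance_score, kde_score])` returns one combined score per sample (higher means more outlying) and a mask of the detectors that were kept. Rank normalisation is used because raw scores from distance- and density-based detectors live on incomparable scales.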

Learning low-dimensional manifolds under the L0-norm constraint for unsupervised outlier detection

International Journal of Data Science and Analytics ◽

10.1007/s41060-021-00269-x ◽

2021 ◽

Author(s):

Yoshinao Ishii ◽

Satoshi Koide ◽

Keiichiro Hayakawa

Keyword(s):

Outlier Detection ◽

Data Matrix ◽

Detection Methods ◽

Detection Accuracy ◽

Error Matrix ◽

Nonlinear Features ◽

Norm Constraint ◽

Low Dimensional ◽

Artificial Datasets ◽

Unsupervised Outlier Detection

Unsupervised outlier detection without the need for clean data has attracted great attention because its low data collection costs make it suitable for real-world problems. Reconstruction-based methods are popular approaches for unsupervised outlier detection. These methods decompose a data matrix into low-dimensional manifolds and an error matrix; samples with a large error are then detected as outliers. To achieve high outlier detection accuracy when data are corrupted by large noise, a detection method should have two properties: (1) it should be able to decompose the data under an L0-norm constraint on the error matrix, and (2) it should be able to reflect the nonlinear features of the data in the manifolds. Despite significant efforts, no method with both of these properties exists. To address this issue, we propose a novel reconstruction-based method: “L0-norm constrained autoencoders (L0-AE).” L0-AE uses autoencoders to learn low-dimensional manifolds that capture the nonlinear features of the data, together with a novel optimization algorithm that can decompose the data under L0-norm constraints on the error matrix. The L0-AE algorithm provably guarantees the convergence of the optimization if the autoencoder is trained appropriately. The experimental results show that L0-AE is more robust, accurate and stable than other unsupervised outlier detection methods, not only on artificial datasets with corrupted samples but also on artificial datasets with well-known outlier distributions and on real datasets. Additionally, the results show that the accuracy of L0-AE is moderately robust to changes in the parameter of the constraint term, and that on real datasets L0-AE achieves higher accuracy than the baseline non-robustified method for most parameter values.
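The decomposition described above can be illustrated with a linear stand-in. The minimal sketch below replaces the autoencoder with PCA (so it does not capture nonlinear features) and assumes the L0 constraint is applied row-wise, i.e. the error matrix may have at most `k` non-zero rows (at most `k` outlying samples). It alternates between fitting the low-dimensional subspace to the unflagged samples and flagging the `k` samples with the largest reconstruction error. The function name and parameters are hypothetical; this is not the L0-AE optimization algorithm.

```python
import numpy as np

def l0_robust_pca_scores(X, rank=1, k=5, n_iter=20):
    """Block-coordinate descent on ||X - L - S||_F^2 where S has at most k non-zero rows.

    L is an affine rank-`rank` reconstruction (PCA stand-in for an autoencoder);
    S absorbs the residuals of the k flagged samples exactly.
    """
    n = X.shape[0]
    flagged = np.zeros(n, dtype=bool)
    for _ in range(n_iter):
        clean = X[~flagged]
        mu = clean.mean(axis=0)
        # Principal directions fitted only to samples whose error is not absorbed by S.
        _, _, Vt = np.linalg.svd(clean - mu, full_matrices=False)
        V = Vt[:rank].T
        recon = mu + (X - mu) @ V @ V.T
        err = np.linalg.norm(X - recon, axis=1)
        # Optimal S for fixed L: keep the k largest residual rows, zero elsewhere.
        new_flagged = np.zeros(n, dtype=bool)
        new_flagged[np.argsort(err)[-k:]] = True
        if np.array_equal(new_flagged, flagged):
            break
        flagged = new_flagged
    # err serves as the outlier score; flagged marks the rows of the sparse error matrix.
    return err, flagged
```

Both half-steps can only lower the objective, so the loop stops once the flagged set no longer changes (or after `n_iter` rounds). Swapping the SVD step for an autoencoder trained on the unflagged samples would move the sketch closer to the nonlinear setting the abstract describes.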

Unsupervised outlier detection in multidimensional data

Journal Of Big Data ◽

10.1186/s40537-021-00469-z ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Atiq ur Rehman ◽

Samir Brahim Belhaouari

Keyword(s):

State Of The Art ◽

Machine Learning Algorithms ◽

Multidimensional Data ◽

Statistical Techniques ◽

High Dimensions ◽

Comprehensive Performance ◽

Benchmark Datasets ◽

Detection Schemes ◽

Unsupervised Outlier Detection ◽

Better Than

Detection and removal of outliers in a dataset is a fundamental preprocessing task without which the analysis of the data can be misleading. Furthermore, the existence of anomalies in the data can heavily degrade the performance of machine learning algorithms. To detect the anomalies in a dataset in an unsupervised manner, several novel statistical techniques are proposed in this paper. The proposed techniques are based on statistical methods that consider data compactness and other properties. The newly proposed ideas are found to be efficient in terms of performance, ease of implementation, and computational complexity. Furthermore, two of the techniques presented in this paper transform the data into a unidimensional distance space to detect the outliers, so the techniques remain computationally inexpensive and feasible irrespective of the data's dimensionality. A comprehensive performance analysis of the proposed anomaly detection schemes is presented, and the newly proposed schemes are found to outperform state-of-the-art methods when tested on several benchmark datasets.
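The abstract does not spell out the proposed techniques, but the general idea of mapping multidimensional data to a unidimensional distance space can be shown with a minimal sketch. Here, as an assumption, each sample is reduced to its Euclidean distance from the coordinate-wise median, and a standard univariate rule (Tukey's upper fence) is applied to that distance vector. The function name and the choice of centre and rule are illustrative only, not the authors' method.

```python
import numpy as np

def distance_iqr_outliers(X, whisker=1.5):
    # Step 1: reduce every d-dimensional sample to a single number,
    # its Euclidean distance to the coordinate-wise median (O(n * d)).
    centre = np.median(X, axis=0)
    dist = np.linalg.norm(X - centre, axis=1)
    # Step 2: a univariate rule on the distance vector; only the upper fence
    # matters because small distances are never suspicious.
    q1, q3 = np.percentile(dist, [25, 75])
    upper = q3 + whisker * (q3 - q1)
    return dist, dist > upper
```

After the one-pass reduction, the outlier rule works on a vector of length `n` whatever the number of features, which is the source of the low cost in high dimensions that the abstract emphasises.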

Point-Denoise: Unsupervised outlier detection for 3D point clouds enhancement

Multimedia Tools and Applications ◽

10.1007/s11042-021-10924-x ◽

2021 ◽

Author(s):

Yousra Regaya ◽

Fodil Fadli ◽

Abbes Amira

Keyword(s):

Outlier Detection ◽

Point Clouds ◽

3D Point Clouds ◽

Unsupervised Outlier Detection

Benchmarking Unsupervised Outlier Detection with Realistic Synthetic Data

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3441453 ◽

2021 ◽

Vol 15 (4) ◽

pp. 1-20

Author(s):

Georg Steinbuss ◽

Klemens Böhm

Keyword(s):

Outlier Detection ◽

Synthetic Data ◽

Real Data ◽

Detection Methods ◽

Quality Of Data ◽

Benchmark Data ◽

Core Idea ◽

Generic Process ◽

Unsupervised Outlier Detection

Benchmarking unsupervised outlier detection is difficult. Outliers are rare, and existing benchmark data contain outliers with various and unknown characteristics. Fully synthetic data usually consist of outliers and regular instances with clear characteristics and thus, in principle, allow for a more meaningful evaluation of detection methods. Nonetheless, there have been only a few attempts to include synthetic data in benchmarks for outlier detection. This might be due to the imprecise notion of outliers or to the difficulty of achieving good coverage of different domains with synthetic data. In this work, we propose a generic process for generating datasets for such benchmarking. The core idea is to reconstruct regular instances from existing real-world benchmark data while generating outliers so that they exhibit insightful characteristics. We then describe three instantiations of this generic process that generate outliers with specific characteristics, such as local outliers. To validate our process, we perform a benchmark with state-of-the-art detection methods and carry out experiments to study the quality of the data reconstructed in this way. Besides showcasing the workflow, this confirms the usefulness of the proposed process. In particular, our process yields regular instances close to those from real data. In summary, we propose and validate a new and practical process for benchmarking unsupervised outlier detection.
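The core idea (reconstruct regular instances from real data, then generate outliers with a controlled characteristic) can be illustrated with a minimal sketch. As an assumption, a single multivariate Gaussian fitted to the real data stands in for whatever model the authors use. Local-style outliers are drawn from the same Gaussian with its covariance inflated by a hypothetical factor `alpha` and are kept only if they fall outside the 99th percentile of the regular instances' squared Mahalanobis distances. The function name, `alpha`, and the percentile threshold are illustrative choices, not the paper's instantiations.

```python
import numpy as np

def synthesize_benchmark(X_real, n_regular, n_outliers, alpha=5.0, seed=0):
    if alpha <= 1.0:
        raise ValueError("alpha must exceed 1 so outliers spread beyond the regular region")
    rng = np.random.default_rng(seed)
    d = X_real.shape[1]
    # "Reconstruct" regular instances: sample from a Gaussian fitted to the real data.
    mu = X_real.mean(axis=0)
    cov = np.cov(X_real, rowvar=False)
    regular = rng.multivariate_normal(mu, cov, size=n_regular)
    inv = np.linalg.inv(cov)

    def maha2(Z):
        # Squared Mahalanobis distance of each row of Z under the fitted Gaussian.
        diff = Z - mu
        return np.einsum("ij,jk,ik->i", diff, inv, diff)

    threshold = np.percentile(maha2(regular), 99)
    outliers = np.empty((0, d))
    while len(outliers) < n_outliers:
        # Same centre, inflated covariance; keep only candidates outside the regular region.
        cand = rng.multivariate_normal(mu, alpha * cov, size=n_outliers)
        outliers = np.vstack([outliers, cand[maha2(cand) > threshold]])
    outliers = outliers[:n_outliers]
    X = np.vstack([regular, outliers])
    y = np.r_[np.zeros(n_regular, dtype=int), np.ones(n_outliers, dtype=int)]
    return X, y
```

Because every generated point carries a label `y`, any detector can then be scored against known ground truth, which is exactly what real benchmark data with unknown outlier characteristics cannot offer.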

A Review on Outlier/Anomaly Detection in Time Series Data

ACM Computing Surveys ◽

10.1145/3444690 ◽

2021 ◽

Vol 54 (3) ◽

pp. 1-33

Author(s):

Ane Blázquez-García ◽

Angel Conde ◽

Usue Mori ◽

Jose A. Lozano

Keyword(s):

Time Series ◽

Outlier Detection ◽

Time Series Data ◽

State Of The Art ◽

Series Data ◽

Detection Techniques ◽

The Past ◽

Time Series Mining ◽

Detection Of Outliers ◽

Unsupervised Outlier Detection

Recent advances in technology have brought major breakthroughs in data collection, enabling large amounts of data to be gathered over time and thus generating time series. Mining these data has become an important task for researchers and practitioners in the past few years, including the detection of outliers or anomalies that may represent errors or events of interest. This review aims to provide a structured and comprehensive overview of the state of the art in unsupervised outlier detection techniques for time series. To this end, a taxonomy is presented based on the main aspects that characterize an outlier detection technique.
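As a concrete instance of the simplest case such a review covers, unsupervised detection of point outliers in a univariate series, the minimal sketch below compares each value with a robust local estimate: the median of a centred sliding window, scaled by the median absolute deviation (MAD). The window length and the 3.5 threshold are common rule-of-thumb defaults assumed for illustration; the function name is hypothetical and the technique is one generic option, not one singled out by the review.

```python
import numpy as np

def rolling_mad_outliers(x, window=7, threshold=3.5):
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centred on each time step")
    x = np.asarray(x, dtype=float)
    half = window // 2
    # Repeat edge values so every time step gets a full centred window.
    padded = np.pad(x, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)  # shape (len(x), window)
    med = np.median(windows, axis=1)
    mad = np.median(np.abs(windows - med[:, None]), axis=1)
    # 1.4826 * MAD estimates the standard deviation under Gaussian noise;
    # a tiny floor avoids division by zero on locally constant stretches.
    scale = np.maximum(1.4826 * mad, np.finfo(float).eps)
    score = np.abs(x - med) / scale
    return score, score > threshold
```

The median and MAD are used instead of the mean and standard deviation so that the outlier being tested barely influences the local estimate it is compared against. `sliding_window_view` requires NumPy 1.20 or later.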

Unsupervised outlier detection in heavy-ion collisions

Physica Scripta ◽

10.1088/1402-4896/abf214 ◽

2021 ◽

Author(s):

Punnatat Thaprasop ◽

Kai Zhou ◽

Jan Steinheimer ◽

Christoph Herold

Keyword(s):

Outlier Detection ◽

Heavy Ion Collisions ◽

Heavy Ion ◽

Ion Collisions ◽

Unsupervised Outlier Detection

Unsupervised Outlier Detection in Multidimensional Data

10.21203/rs.3.rs-250665/v1 ◽

2021 ◽

Author(s):

Atiq Rehman ◽

Samir Brahim Belhaouari

Keyword(s):

State Of The Art ◽

Machine Learning Algorithms ◽

Multidimensional Data ◽

High Dimensions ◽

Comprehensive Performance ◽

Benchmark Datasets ◽

Distance Vector ◽

Detection Schemes ◽

Unsupervised Outlier Detection ◽

Better Than

Detection and removal of outliers in a dataset is a fundamental preprocessing task without which the analysis of the data can be misleading. Furthermore, the existence of anomalies in the data can heavily degrade the performance of machine learning algorithms. To detect the anomalies in a dataset in an unsupervised manner, several novel statistical techniques are proposed in this paper. The proposed techniques are based on statistical methods that consider data compactness and other properties. The newly proposed ideas are found to be efficient in terms of performance, ease of implementation, and computational complexity. Furthermore, two of the techniques presented in this paper use only a one-dimensional distance vector to detect the outliers, so the techniques remain computationally inexpensive and feasible irrespective of the data's dimensionality. A comprehensive performance analysis of the proposed anomaly detection schemes is presented, and the newly proposed schemes are found to outperform state-of-the-art methods when tested on several benchmark datasets.
