detection of outliers
Recently Published Documents


TOTAL DOCUMENTS: 183 (FIVE YEARS 43)
H-INDEX: 19 (FIVE YEARS 3)

2022 ◽  
Vol 13 (1) ◽  
pp. 1-17
Author(s):  
Ankit Kumar ◽  
Abhishek Kumar ◽  
Ali Kashif Bashir ◽  
Mamoon Rashid ◽  
V. D. Ambeth Kumar ◽  
...  

Detection of outliers or anomalies is one of the vital issues in pattern-driven data mining. Outlier detection identifies the inconsistent behavior of individual objects. It is an important area of data mining with several applications, such as detecting credit card fraud, discovering hacking, and uncovering criminal activities. It is necessary to develop tools that uncover the critical information embedded in extensive data. This paper investigates a novel method for detecting cluster outliers in a multidimensional dataset, capable of identifying the clusters and outliers in datasets containing noise. The proposed method can detect the groups and outliers left by the clustering process, such as irregular sets of clusters (C) and outliers (O), to improve the results. The results obtained after applying the algorithm to the dataset improved in terms of several parameters. For the comparative analysis, the average accuracy and the average recall are computed. The average accuracy is 74.05% for the existing COID algorithm and 77.21% for the proposed algorithm. The average recall is 81.19% for the existing algorithm and 89.51% for the proposed algorithm, which shows that the proposed method performs better than the existing COID algorithm.


2021 ◽  
Vol 929 (1) ◽  
pp. 012022
Author(s):  
S A Imashev

The aim of this study is to present a method for detecting outliers in time series of the total intensity of the geomagnetic field using the Extended Isolation Forest algorithm. The method consists of three steps: 1) generation of additional features that take into account the regular daily variation and smooth behaviour of normal data, 2) detection of potential outliers based on an ensemble of extended isolation trees, and 3) subsequent refinement based on the difference between the outlier and its replacement with an interpolated value. Application of the method to yearly time series of the total geomagnetic field at the Ak-Suu and Kegety stations showed that the algorithm identifies both global and contextual outliers. The average classification metrics for the method are high: precision 94.3%, recall 93.9% and F-score 94.5%. The probabilities of errors of the first and second kind are comparable to those of similar algorithms used for detection of outliers in magnetograms of different sampling rates.
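The third step described above (refinement by comparing a candidate with its interpolated replacement) can be illustrated with a minimal Python sketch. This is not the author's implementation: the function name `refine_candidates`, the use of linear interpolation between the nearest unflagged neighbours, and the fixed `threshold` are all assumptions made for illustration, and the numbers are toy values rather than geomagnetic data.

```python
def refine_candidates(series, candidates, threshold):
    """Keep a candidate index only if its value differs from the linear
    interpolation of its nearest unflagged neighbours by more than
    `threshold` (a hypothetical, user-chosen tolerance)."""
    flagged = set(candidates)
    confirmed = []
    for i in sorted(flagged):
        # Nearest neighbours on each side that are not candidates themselves.
        left = next((j for j in range(i - 1, -1, -1) if j not in flagged), None)
        right = next((j for j in range(i + 1, len(series)) if j not in flagged), None)
        if left is None or right is None:
            continue  # no clean neighbour on one side; this sketch skips such points
        # Linear interpolation between the two clean neighbours.
        weight = (i - left) / (right - left)
        replacement = series[left] + weight * (series[right] - series[left])
        if abs(series[i] - replacement) > threshold:
            confirmed.append(i)
    return confirmed


# Toy series: index 3 is a spike, index 2 is a false alarm from an earlier step.
series = [10.0, 10.1, 10.2, 15.0, 10.4, 10.5, 10.6]
print(refine_candidates(series, candidates=[2, 3], threshold=1.0))  # → [3]
```

In this sketch the candidate at index 2 is dropped because it agrees with the interpolated value, while the spike at index 3 is kept.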


Author(s):  
Chunyan Liu ◽  
Daniel Jurich ◽  
Carol Morrison ◽  
Irina Grabovsky

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Saima Afzal ◽  
Ayesha Afzal ◽  
Muhammad Amin ◽  
Sehar Saleem ◽  
Nouman Ali ◽  
...  

Outlier detection is a challenging task, especially when outliers are defined by rare combinations of multiple variables. In this paper, we develop and evaluate a new method for the detection of outliers in multivariate data that relies on Principal Components Analysis (PCA) and three-sigma limits. The proposed approach employs PCA to effectively perform dimension reduction by regenerating variables, i.e., fitted points from the original observations. The observations lying outside the three-sigma limits are identified as outliers. The proposed method has been successfully applied to two real-life and several artificially generated datasets. Its performance is compared with that of some existing methods using different performance evaluation criteria, including the percentage of correct classification, precision, recall, and F-measure. The superiority of the proposed method is confirmed by these criteria and datasets. For the first real-life dataset, the F-measure is highest for the proposed method, at 0.6667, compared with 0.3333 and 0.4000 for the two existing approaches. Similarly, for the second real-life dataset, this measure is 0.8000 for the proposed approach and 0.5263 and 0.6315 for the two existing approaches. The simulation experiments also show that the performance of the proposed approach improves with increasing sample size.
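One way to combine PCA with three-sigma limits can be sketched in plain Python as follows. The abstract does not give the exact procedure, so this is an assumption-laden illustration rather than the authors' method: it projects the centered data onto the first principal component only (found by power iteration) and flags observations whose score lies more than three standard deviations from zero. The function names and the toy data are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (mean vector, unit vector of the first principal component),
    found by power iteration on the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - mean[k] for k in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # assumes the start vector is not orthogonal to the component
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def three_sigma_outliers(rows):
    """Indices of rows whose first-component score exceeds three sigma."""
    mean, v = first_principal_component(rows)
    # Scores of centered data have mean zero, so sigma needs no re-centering.
    scores = [sum((r[k] - mean[k]) * v[k] for k in range(len(v))) for r in rows]
    sigma = math.sqrt(sum(s * s for s in scores) / (len(scores) - 1))
    return [i for i, s in enumerate(scores) if abs(s) > 3 * sigma]


# Toy data: 20 points near the line y = 2x, plus one distant observation.
rows = [(float(i), 2.0 * i + (-1) ** i * 0.1) for i in range(20)]
rows.append((60.0, 120.0))
print(three_sigma_outliers(rows))  # → [20]
```

Because the mean and sigma here are computed from data that include the outlier, a single extreme point inflates the limits; robust estimates of location and scale would be a natural refinement, but that is outside what the abstract states.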


Author(s):  
Italo Epicoco ◽  
Catiuscia Melle ◽  
Massimo Cafaro ◽  
Marco Pulimeno

We present afqn (Approximate Fast Qn), a novel algorithm for approximate computation of the Qn scale estimator in a streaming setting, in the sliding window model. It is well known that computing the Qn estimator exactly may be too costly for some applications, and the problem is a fortiori exacerbated in the streaming setting, in which the time available to process incoming data stream items is short. In this paper we show how to efficiently and accurately approximate the Qn estimator. As an application, we show the use of afqn for fast detection of outliers in data streams. In particular, the outliers are detected in the sliding window model, with a simple check based on the Qn scale estimator. Extensive experimental results on synthetic and real datasets confirm the validity of our approach by showing up to three times faster updates per second. Our contributions are the following: (i) to the best of our knowledge, we present the first approximation algorithm for online computation of the Qn scale estimator in a streaming setting and in the sliding window model; (ii) we show how to take advantage of our UDDSketch algorithm for quantile estimation in order to quickly compute the Qn scale estimator; (iii) as an example of a possible application of the Qn scale estimator, we discuss how to detect outliers in an input data stream.
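For orientation, the exact Qn estimator that afqn approximates can be written naively in a few lines of Python. The sketch below is not afqn and does not use UDDSketch: it computes Qn exactly in quadratic time using the standard Rousseeuw and Croux definition (the k-th smallest pairwise distance, with h = n // 2 + 1 and k = h(h - 1) / 2, scaled by 2.2219 for Gaussian consistency). The sliding-window check, including the use of the window median, the `threshold` of 3.0, and the function name `stream_outliers`, is an assumption, since the abstract only mentions "a simple check".

```python
from collections import deque
from statistics import median

QN_CONSTANT = 2.2219  # consistency factor for Gaussian data


def qn_scale(values):
    """Exact Qn scale estimate; O(n^2) time and memory."""
    n = len(values)
    if n < 2:
        raise ValueError("Qn needs at least two values")
    diffs = sorted(abs(values[i] - values[j])
                   for i in range(n) for j in range(i + 1, n))
    h = n // 2 + 1
    k = h * (h - 1) // 2
    return QN_CONSTANT * diffs[k - 1]  # k-th order statistic, 1-based


def stream_outliers(stream, window_size=8, threshold=3.0):
    """Flag items far from the window median, measured in Qn units.
    The rule is a hypothetical example, not the paper's check."""
    window = deque(maxlen=window_size)
    flagged = []
    for index, x in enumerate(stream):
        if len(window) == window_size:
            scale = qn_scale(list(window))
            if scale > 0 and abs(x - median(window)) > threshold * scale:
                flagged.append(index)
        window.append(x)  # flagged items still enter the window in this sketch
    return flagged


print(stream_outliers([1, 2, 3, 4, 5, 4, 3, 2, 50, 3]))  # → [8]
```

Recomputing all pairwise distances for every new item is exactly the cost that motivates an approximate, sketch-based approach in the streaming setting.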


2021 ◽  
Vol 263 (3) ◽  
pp. 3833-3844
Author(s):  
Matthias Kreuzer ◽  
Alexander Schmidt ◽  
Walter Kellermann

In this paper, we address the challenging problem of detecting bearing faults from vibration signals. For this, several time- and frequency-domain features have been proposed. However, these features are usually evaluated on data originating from relatively simple scenarios, and a significant performance loss can be observed if more realistic scenarios are considered. To overcome this, we introduce Mel Frequency Cepstral Coefficients (MFCCs) and features extracted from the Amplitude Modulation Spectrogram (AMS) as features for the detection of bearing faults. Both AMS and MFCCs were originally introduced in the context of audio signal processing, but it is demonstrated that a significantly improved classification performance can be obtained using the proposed features. Furthermore, we address the data imbalance problem that prevails in bearing fault detection: typically, much more data from healthy bearings than from damaged bearings is available. Therefore, we propose to train a One-class SVM with data from healthy bearings only. Bearing faults are then classified by the detection of outliers. Our approach is evaluated with data measured in a highly challenging scenario comprising a state-of-the-art commuter railway engine which is supplied by an industrial power converter and attached to a gear and load.


2021 ◽  
Vol 3 (1) ◽  
pp. 1-15
Author(s):  
Sharifah Sakinah Syed Abd Mutalib ◽  
Siti Zanariah Satari ◽  
Wan Nur Syahidah Wan Yusoff

Data in practice are often of high dimension and multivariate in nature. Detection of outliers has been one of the problems in multivariate analysis. Detecting outliers in multivariate data is difficult, and graphical inspection alone is not sufficient. In this paper, outlier detection methods for multivariate data, namely the projection pursuit method, methods based on robust distance, and cluster analysis, are reviewed briefly and nontechnically. The strengths and weaknesses of each method are briefly discussed.


Author(s):  
Farid Zamani Che Rose ◽  
Mohd Tahir Ismail ◽  
Mohd Hanafi Tumin

Structural changes that occur due to outliers may reduce the accuracy of an estimated time series model, shifting the mean distribution and causing forecast failure. This study used a general-to-specific approach to detect outliers via the indicator saturation approach in the local level model framework. Focusing on impulse indicator saturation, the performance of the suggested approach was evaluated using Monte Carlo simulations. To tackle the issue of having more regressors than observations, this research utilized the split-half approach algorithm. We found that the impulse indicator saturation performance relies heavily on the size of the outlier, the location of the outlier, and the number of splits in the series examined. Detection of outliers using sequential and non-sequential algorithms is the most crucial issue in this study. The sequential searching algorithm outperformed the non-sequential searching algorithm in eliminating the non-significant indicators, based on potency and gauge. The outliers captured using impulse indicator saturation in the Financial Times Stock Exchange (FTSE) United States of America (USA) Shariah index correspond to the financial crisis in 2008-2009.

