Anomaly Pattern Detection in Streaming Data Based on the Transformation to Multiple Binary-Valued Data Streams

Abstract: Anomaly pattern detection in a data stream aims to detect the time point at which outliers begin to occur abnormally. Recently, a method for anomaly pattern detection was proposed that combines binary classification of outliers with statistical tests on the resulting data stream of binary labels (normal or outlier). It showed that an anomaly pattern can be detected accurately even when outlier detection performance is relatively low. However, because that method relies on binary classification of outliers, most well-known outlier detection methods, which output real-valued outlier scores, cannot be used directly. In this paper, we propose an anomaly pattern detection method for data streams that transforms real-valued outlier scores into multiple binary-valued data streams. The proposed method is tested on artificial and real data sets using three outlier detection methods: Isolation Forest (IF), autoencoder-based outlier detection, and Local Outlier Factor (LOF). The experimental results show that anomaly pattern detection using Isolation Forest gives the best performance.
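
The idea of turning one real-valued score stream into several binary streams and testing each one statistically can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the thresholds, the fixed reference prefix, the sliding window, and the one-sided binomial test are all choices made here for the example, and the names `binarize`, `binomial_tail`, and `first_alarm` are hypothetical.

```python
from math import comb


def binarize(scores, thresholds):
    """Turn one real-valued score stream into one 0/1 stream per threshold."""
    return {t: [1 if s > t else 0 for s in scores] for t in thresholds}


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def first_alarm(bits, ref_len, win_len, alpha):
    """Return the index at which a sliding window first holds significantly
    more outlier labels than the reference prefix suggests, or None."""
    # Outlier rate estimated on the reference prefix (assumed to be normal);
    # the floor of 1/ref_len avoids a zero rate.
    p0 = max(sum(bits[:ref_len]) / ref_len, 1.0 / ref_len)
    for end in range(ref_len + win_len, len(bits) + 1):
        k = sum(bits[end - win_len:end])
        if binomial_tail(k, win_len, p0) < alpha:
            return end - 1
    return None


# Hypothetical score stream: mostly low scores, then a burst of high scores.
scores = [0.1] * 40 + [0.9] * 10
scores[10] = 0.8
streams = binarize(scores, thresholds=[0.5, 0.7])
alarms = {t: first_alarm(b, ref_len=20, win_len=5, alpha=0.01) for t, b in streams.items()}
```

Running the test on every binary stream and combining the alarms (for example, taking the earliest one) is one possible design; how the streams are actually combined is not stated in the abstract.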

Detection of Power Contract Violations using an Anomaly Pattern Detection Method on Power Consumption Data Streams

Journal of KIISE ◽

10.5626/jok.2020.47.5.504 ◽

2020 ◽

Vol 47 (5) ◽

pp. 504-512

Author(s):

Tae Gong Kim ◽

Cheong Hee Park

Keyword(s):

Power Consumption ◽

Data Streams ◽

Detection Method ◽

Pattern Detection ◽

Consumption Data ◽

Anomaly Pattern

TADILOF: Time Aware Density-Based Incremental Local Outlier Detection in Data Streams

Sensors ◽

10.3390/s20205829 ◽

2020 ◽

Vol 20 (20) ◽

pp. 5829 ◽

Cited By ~ 1

Author(s):

Jen-Wei Huang ◽

Meng-Xun Zhong ◽

Bijay Prasad Jaysawal

Keyword(s):

Outlier Detection ◽

Data Streams ◽

Data Stream ◽

State Of The Art ◽

Streaming Data ◽

Current State ◽

Data Points ◽

Local Outlier ◽

Time Aware ◽

Over Time

Outlier detection in data streams is crucial to successful data mining. However, this task is made increasingly difficult by the enormous growth in the quantity of data generated by the expansion of the Internet of Things (IoT). Recent advances in outlier detection based on density-based local outlier factor (LOF) algorithms do not consider variations in data that change over time. For example, a new cluster of data points may appear in the data stream over time. To overcome this issue, we present a novel algorithm for streaming data, referred to as time-aware density-based incremental local outlier detection (TADILOF). In addition, we have developed a means of estimating the LOF score, termed "approximate LOF," based on historical information following the removal of outdated data. Experimental results demonstrate that TADILOF outperforms current state-of-the-art methods in terms of AUC while achieving similar performance in terms of execution time. Moreover, we present an application of the proposed scheme to the development of an air-quality monitoring system.
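
For readers unfamiliar with the LOF score that the abstract builds on, a minimal batch sketch of the standard definition is given here. It is not TADILOF: it is neither incremental nor time-aware, it assumes there are no duplicate points (which would give zero distances), and it breaks distance ties by index order. The function name `lof_scores` is hypothetical.

```python
from math import dist


def lof_scores(points, k):
    """Standard (batch) local outlier factor for a small list of points."""
    n = len(points)
    # Indices of the k nearest neighbours of each point, nearest first.
    neigh = []
    for i in range(n):
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: dist(points[i], points[j]),
        )
        neigh.append(order[:k])
    # k-distance: distance to the k-th nearest neighbour.
    k_dist = [dist(points[i], points[neigh[i][-1]]) for i in range(n)]

    def local_reach_density(i):
        # Reachability distance from i to j is max(k-distance(j), d(i, j)).
        reach = [max(k_dist[j], dist(points[i], points[j])) for j in neigh[i]]
        return len(reach) / sum(reach)

    density = [local_reach_density(i) for i in range(n)]
    # LOF: average neighbour density divided by the point's own density.
    return [sum(density[j] for j in neigh[i]) / (k * density[i]) for i in range(n)]


# Four points on a unit square plus one distant point.
scores = lof_scores([(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)], k=2)
```

Scores close to 1 indicate a point whose density matches that of its neighbours; values well above 1 indicate a point in a sparser region than its neighbours.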

Data Streams Oriented Outlier Detection Method: A Fast Minimal Infrequent Pattern Mining

The International Arab Journal of Information Technology ◽

10.34028/iajit/18/6/14 ◽

2021 ◽

Author(s):

ZhongYu Zhou ◽

DeChang Pi

Keyword(s):

Outlier Detection ◽

Data Streams ◽

Pattern Mining ◽

Detection Method ◽

Detection Algorithm ◽

Detection Methods ◽

Mining Method ◽

Telemetry Data ◽

Process Data ◽

Mining Data Streams

Outlier detection is a common method for analyzing data streams. Most existing outlier detection methods compute distances between points to solve specific outlier detection problems. However, these methods are computationally expensive and cannot process data streams quickly. Outlier detection based on pattern mining addresses these issues, but the existing pattern-mining methods are inefficient and cannot meet the requirement of mining data streams quickly. To improve efficiency, a new outlier detection method is proposed in this paper. First, a fast minimal infrequent pattern mining method is proposed to mine minimal infrequent patterns from data streams. Second, an efficient outlier detection algorithm is proposed that detects outliers in data streams using the mined minimal infrequent patterns. The proposed algorithm is demonstrated on real telemetry data from a satellite in orbit. The experimental results show that the proposed method can be applied to satellite outlier detection and is superior to the existing methods.
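
To make the notion of a minimal infrequent pattern concrete, the following brute-force sketch enumerates itemsets by increasing size and keeps those whose support falls below a threshold while none of their proper subsets does. It is deliberately naive and is not the fast mining method proposed in the paper; the function name `minimal_infrequent` and the toy transactions are hypothetical.

```python
from itertools import combinations


def minimal_infrequent(transactions, min_support):
    """Return itemsets with support < min_support whose proper subsets
    are all frequent (support >= min_support)."""
    items = sorted(set().union(*transactions))

    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    found = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            # A superset of an already-found pattern cannot be minimal.
            if any(p <= candidate for p in found):
                continue
            if support(candidate) < min_support:
                found.append(candidate)
    return found


# Toy data: each transaction is the set of items observed in one record.
transactions = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
patterns = minimal_infrequent(transactions, min_support=2)
```

Because candidates are visited in order of size, any itemset with an infrequent proper subset is skipped before its support is counted, which is what keeps the result minimal.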

Anomalies Detection Using Isolation in Concept-Drifting Data Streams

Computers ◽

10.3390/computers10010013 ◽

2021 ◽

Vol 10 (1) ◽

pp. 13

Author(s):

Maurras Ulbricht Togbe ◽

Yousra Chabchoub ◽

Aliou Boly ◽

Mariam Barry ◽

Raja Chiky ◽

...

Keyword(s):

Anomaly Detection ◽

Half Space ◽

Data Streams ◽

Detection Efficiency ◽

Concept Drift ◽

Streaming Data ◽

Detection Methods ◽

Data Sets ◽

Stream Data ◽

Isolation Forest

Detecting anomalies in streaming data is an important issue in many application domains, such as cybersecurity, natural disasters, and bank fraud. Different approaches have been designed to detect anomalies: statistics-based, isolation-based, clustering-based, etc. In this paper, we present a structured survey of existing anomaly detection methods for data streams, with an in-depth look at Isolation Forest (iForest). We first provide an implementation of Isolation Forest Anomalies detection in Stream Data (IForestASD), a variant of iForest for data streams. This implementation is built on top of scikit-multiflow (River), an open source machine learning framework for data streams that contains a single anomaly detection algorithm for data streams, called Streaming half-space trees. We performed experiments on several well-known real data sets to compare the performance of our implementation of IForestASD with that of half-space trees. Moreover, we extended the IForestASD algorithm to handle drifting data by proposing three algorithms that involve two well-known drift detection methods: ADWIN and KSWIN. ADWIN is an adaptive sliding window algorithm for detecting change in a data stream. KSWIN, a more recent method, is the Kolmogorov–Smirnov Windowing method for concept drift detection. More precisely, we extended KSWIN to deal with n-dimensional data streams. We validated and compared all of the proposed methods on both real and synthetic data sets. In particular, we evaluated the F1-score, the execution time, and the memory consumption. The experiments show that our extensions have lower resource consumption than the original version of IForestASD, with similar or better detection efficiency.
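
The Kolmogorov–Smirnov comparison that underlies windowed drift detection can be sketched with a plain two-sample test. This is an assumption-laden illustration rather than the KSWIN implementation of any library or of the paper: it compares two fixed windows of one-dimensional values, uses the usual large-sample critical value, and the names `ks_statistic` and `drift_detected` are hypothetical.

```python
from math import log, sqrt


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    values = sorted(set(a) | set(b))

    def ecdf(sample, v):
        return sum(1 for x in sample if x <= v) / len(sample)

    return max(abs(ecdf(a, v) - ecdf(b, v)) for v in values)


def drift_detected(reference, recent, alpha=0.05):
    """Two-sample KS test using the large-sample critical value
    c(alpha) * sqrt((n + m) / (n * m)), with c(alpha) = sqrt(-ln(alpha / 2) / 2)."""
    n, m = len(reference), len(recent)
    critical = sqrt(-log(alpha / 2) / 2) * sqrt((n + m) / (n * m))
    return ks_statistic(reference, recent) > critical


# Hypothetical windows: the recent window is shifted far from the reference.
reference_window = [float(x) for x in range(20)]
recent_window = [float(x) for x in range(100, 120)]
changed = drift_detected(reference_window, recent_window)
```

In a streaming setting the two windows would slide with the data, and a detected drift would typically trigger retraining of the anomaly detector; both of those steps are left out of this sketch.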

Unsupervised Feature Selection for Outlier Detection on Streaming Data to Enhance Network Security

Applied Sciences ◽

10.3390/app112412073 ◽

2021 ◽

Vol 11 (24) ◽

pp. 12073

Author(s):

Michael Heigl ◽

Enrico Weigelt ◽

Dalibor Fiala ◽

Martin Schramm

Keyword(s):

Feature Selection ◽

Outlier Detection ◽

Data Streams ◽

State Of The Art ◽

Streaming Data ◽

Detection Methods ◽

Unsupervised Feature Selection ◽

Detection Algorithms ◽

Efficient Detection ◽

Selection For

Over the past couple of years, machine learning methods, especially outlier detection methods, have become established in the cybersecurity field for detecting network-based anomalies rooted in novel attack patterns. However, the ubiquity of massive, continuously generated data streams poses an enormous challenge to efficient detection schemes and demands fast, memory-constrained online algorithms that are capable of dealing with concept drift. Feature selection plays an important role in improving outlier detection by identifying noisy data that contain irrelevant or redundant features. State-of-the-art work focuses either on unsupervised feature selection for data streams or on (offline) outlier detection. Substantial requirements for combining both fields are derived and compared with existing approaches. The comprehensive review reveals a research gap in unsupervised feature selection for the improvement of outlier detection methods in data streams. Thus, a novel algorithm for Unsupervised Feature Selection for Streaming Outlier Detection, denoted as UFSSOD, is proposed, which is able to perform unsupervised feature selection for the purpose of outlier detection on streaming data. Furthermore, it is able to determine the number of top-performing features by clustering their score values. A generic concept that shows two application scenarios of UFSSOD in conjunction with off-the-shelf online outlier detection algorithms has been derived. Extensive experiments have shown that a promising feature selection mechanism for streaming data is not applicable in the field of outlier detection. Moreover, UFSSOD, as an online-capable algorithm, yields results comparable to those of a state-of-the-art offline method trimmed for outlier detection.

A Variable Markovian Based Outlier Detection Method for Multi-Dimensional Sequence over Data Stream

2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT) ◽

10.1109/pdcat.2016.049 ◽

2016 ◽

Author(s):

Dongsheng Yang ◽

Yijie Wang ◽

Yongmou Li ◽

Xingkong Ma

Keyword(s):

Outlier Detection ◽

Data Stream ◽

Detection Method

Hydrological Time Series Anomaly Pattern Detection based on Isolation Forest

2019 IEEE 3rd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC) ◽

10.1109/itnec.2019.8729405 ◽

2019 ◽

Cited By ~ 1

Author(s):

Yu Qin ◽

YuanSheng Lou

Keyword(s):

Time Series ◽

Pattern Detection ◽

Anomaly Pattern ◽

Isolation Forest

Anomaly pattern detection for streaming data

Expert Systems with Applications ◽

10.1016/j.eswa.2020.113252 ◽

2020 ◽

Vol 149 ◽

pp. 113252 ◽

Cited By ~ 3

Author(s):

Taegong Kim ◽

Cheong Hee Park

Keyword(s):

Streaming Data ◽

Pattern Detection ◽

Anomaly Pattern

Outlier Detection in Growth Data: Beyond Biologically Implausible Values

Current Developments in Nutrition ◽

10.1093/cdn/nzaa056_021 ◽

2020 ◽

Vol 4 (Supplement_2) ◽

pp. 1174-1174

Author(s):

Paraskevi Massara ◽

Robert Bandsma ◽

Celine Bourdon ◽

Jonathon Maguire ◽

Elena Comelli ◽

...

Keyword(s):

Outlier Detection ◽

Sensitivity And Specificity ◽

Detection Method ◽

Nutritional Assessment ◽

Empirical Method ◽

Child Growth ◽

Detection Methods ◽

Healthy Children ◽

Growth Data ◽

Growth Standards

Abstract

Objectives: Eliminating anthropometry measurement error and employing outlier and biologically implausible value (BIV) detection methods adapted to longitudinal measurements is important for the study of growth. This work aimed to review and assess the accuracy of the available BIV and outlier detection methods and to propose a growth trajectory outlier detection method.

Methods: We included 2354 infants from the Applied Research Group for Kids (TARGet Kids!) cohort, based in Toronto (ON, Canada), which recruits healthy children from birth to 5 years of age. We considered infants with at least 8 length and weight measurements available between the 1st and the 24th month of age. Weight-for-length z-scores (wflz) were calculated using the WHO growth standards. Outlier measurements were randomly introduced in 5% of the wflz measurements using a normal distribution (μ = 0, σ = 1). We employed 4 outlier detection methods: an empirical detection method for BIV using the cut-offs derived from the WHO Child Growth Standards, a clustering method, a method based on cluster prototypes for individual outlier measurements, and a method based on cluster prototypes for entire growth trajectories. Each method was applied individually and evaluated using the sensitivity and specificity indexes based on the manually introduced outliers. We also calculated the Kappa statistic to evaluate the agreement of each method against the manual outliers.

Results: After excluding premature (<37 weeks) and low birth weight (<1500 g) neonates and children with missing length and weight measurements, we analyzed 393 children with a total of 3144 measurements. Sensitivity and specificity for the four methods ranged from 4.4% to 55.0% and from 83.7% to 99.7%, respectively, with kappa being non-significant (P > 0.05) only for the empirical method. The clustering detection method reported a higher finding rate, while the empirical method found most of the BIV but few of the remaining outliers.

Conclusions: BIV account for a small portion of the possible outliers in growth datasets. We show that additional statistical or model-based methods are required for a more comprehensive outlier detection process, which has implications for growth analysis and nutritional assessment.

Funding Sources: Joannah and Brian Lawson Center for Child Nutrition, Connaught Fund, Onassis Foundation.
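
The evaluation described in the Methods (sensitivity, specificity, and the Kappa statistic against manually introduced outliers) can be illustrated with the standard formulas. The sketch below assumes binary labels where 1 marks an introduced outlier and 1 in the second list marks a measurement flagged by a detection method; the function name `agreement_metrics` and the toy labels are hypothetical and are not data from the study.

```python
def agreement_metrics(truth, flagged):
    """Sensitivity, specificity and Cohen's kappa for two binary label lists."""
    pairs = list(zip(truth, flagged))
    tp = sum(1 for t, f in pairs if t and f)
    tn = sum(1 for t, f in pairs if not t and not f)
    fp = sum(1 for t, f in pairs if not t and f)
    fn = sum(1 for t, f in pairs if t and not f)
    n = len(pairs)

    sensitivity = tp / (tp + fn)  # share of true outliers that were flagged
    specificity = tn / (tn + fp)  # share of normal points left unflagged

    observed = (tp + tn) / n
    # Agreement expected by chance from the marginal label frequencies.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Toy example: 2 introduced outliers among 10 measurements.
truth = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
flagged = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
sens, spec, kappa = agreement_metrics(truth, flagged)
```

The sketch assumes both classes are present in `truth` and that chance agreement is below 1; otherwise the divisions are undefined.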

Local outlier detection method towards data stream

2011 IEEE 3rd International Conference on Communication Software and Networks ◽

10.1109/iccsn.2011.6014613 ◽

2011 ◽

Author(s):

Xiao Jian-Qiong

Keyword(s):

Outlier Detection ◽

Data Stream ◽

Detection Method ◽

Local Outlier
