A Comparative Analysis of Traditional and Deep Learning-Based Anomaly Detection Methods for Streaming Data

The need for robust unsupervised anomaly detection in streaming data is increasing rapidly in the current era of smart devices, where enormous amounts of data are gathered from numerous sensors. These sensors record the internal state of a machine, the external environment, and the interaction of machines with other machines and humans. It is of prime importance to leverage this information to minimize machine downtime, or even avoid it completely through constant monitoring. Since each device generates a different type of streaming data, the best-performing anomaly detection technique usually depends on the data type. For some types of data and use cases, statistical anomaly detection techniques work better, whereas for others, deep learning-based techniques are preferred. In this paper, we present a novel anomaly detection technique, FuseAD, which takes advantage of both statistical and deep learning-based approaches by fusing them together in a residual fashion. The results show an increase in area under the curve (AUC) compared to state-of-the-art anomaly detection methods when FuseAD is tested on a publicly available dataset (Yahoo Webscope benchmark). These results suggest that the fusion-based technique can obtain the best of both worlds by combining the strengths of the two approaches and compensating for their weaknesses. We also perform an ablation study to quantify the contribution of the individual components in FuseAD, i.e., the statistical ARIMA model and the deep learning-based convolutional neural network (CNN) model.
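
As a rough illustration of the residual fusion idea described in this abstract, the following Python sketch adds a learned correction to a statistical forecast and scores each point by its absolute forecast error. It is not FuseAD itself: the window mean stands in for the ARIMA model, a one-parameter least-squares fit stands in for the CNN, and all function names are hypothetical.

```python
def base_forecast(window):
    """Stand-in for the statistical model (assumption): mean of the window."""
    return sum(window) / len(window)


def trend_feature(window):
    """Single input feature for the correction model: change across the window."""
    return window[-1] - window[0]


def fit_correction(series, w):
    """Fit one least-squares slope that maps the trend feature to the
    error of the base forecast. Stand-in for the learned model (assumption)."""
    xs, ys = [], []
    for t in range(w, len(series)):
        window = series[t - w:t]
        xs.append(trend_feature(window))
        ys.append(series[t] - base_forecast(window))
    denom = sum(x * x for x in xs)
    return sum(x * y for x, y in zip(xs, ys)) / denom if denom else 0.0


def fused_forecast(window, slope):
    """Residual fusion: statistical forecast plus learned correction."""
    return base_forecast(window) + slope * trend_feature(window)


def anomaly_scores(series, w, slope):
    """Absolute error of the fused forecast for every point after the first w."""
    return [abs(series[t] - fused_forecast(series[t - w:t], slope))
            for t in range(w, len(series))]


# Usage: fit on a clean linear trend, then score a copy with one injected spike.
clean = [float(i) for i in range(20)]
slope = fit_correction(clean, 4)
spiked = list(clean)
spiked[15] = 30.0
scores = anomaly_scores(spiked, 4, slope)  # scores[k] belongs to index k + 4
```

In this toy setting the base forecast alone lags the trend, and the correction removes that lag, so the score stays near zero until the spike. Windows that contain the spike are themselves contaminated, so the following few scores are also elevated.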

Comparative Analysis of Traffic Anomaly Detection Methods

Traffic Anomaly Detection ◽

10.1016/b978-1-78548-012-6.50003-5 ◽

2015 ◽

pp. 29-45

Author(s):

Antonio Cuadra-Sánchez ◽

Javier Aracil

Keyword(s):

Comparative Analysis ◽

Anomaly Detection ◽

Detection Methods ◽

Traffic Anomaly ◽

Traffic Anomaly Detection

Auto-Threshold Deep SVDD for Anomaly-based Web Application Firewall

10.36227/techrxiv.15135468 ◽

2021 ◽

Author(s):

Ali Moradi Vartouni ◽

Matin Shokri ◽

Mohammad Teshnehlab

Keyword(s):

Deep Learning ◽

Anomaly Detection ◽

Web Application ◽

Threshold Level ◽

Web Security ◽

Detection Methods ◽

Support Vector ◽

Support Vector Data Description ◽

Vector Data ◽

Deep Support

Protecting websites and applications from cyber-threats is vital for any organization. A web application firewall (WAF) prevents attacks from damaging applications. It provides web security by filtering and monitoring network traffic to protect against attacks. A WAF solution based on anomaly detection can identify zero-day attacks. Deep learning is the state-of-the-art method that is widely used to detect attacks in the anomaly-based WAF area. Although deep learning has demonstrated excellent results on anomaly detection tasks in web requests, there is a trade-off between false-positive and missed-attack rates, which is a key problem in WAF systems. In addition, anomaly detection methods struggle with setting the threshold level that distinguishes attack traffic from normal traffic. In this paper, we first propose a model based on Deep Support Vector Data Description (Deep SVDD) and compare two feature extraction strategies, one-hot and bigram, on the raw requests. Second, to overcome the threshold challenges, we introduce a novel end-to-end algorithm, Auto-Threshold Deep SVDD (ATDSVDD), to determine an appropriate threshold during the learning process. Finally, we compare our model with other deep models on the CSIC-2010 and ECML/PKDD-2007 datasets. Results show that ATDSVDD on bigram feature data has better performance in terms of accuracy and generalization.
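
To make the bigram featurization and the distance-to-center scoring more concrete, here is a minimal Python sketch. It is a simplification under stated assumptions, not the authors' ATDSVDD: there is no neural network (the features are used directly), the center is the mean of the training vectors, and the threshold is simply the largest training score rather than one learned during training. All function names and example requests are hypothetical.

```python
from collections import Counter


def bigrams(text):
    """All overlapping character pairs of a raw request string."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def featurize(text, vocab):
    """Relative bigram frequencies over a fixed vocabulary, plus one extra
    slot holding the fraction of bigrams never seen during training."""
    counts = Counter(bigrams(text))
    total = max(sum(counts.values()), 1)
    known = set(vocab)
    vector = [counts[b] / total for b in vocab]
    unseen = sum(c for b, c in counts.items() if b not in known) / total
    return vector + [unseen]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def fit(normal_requests):
    """Build the vocabulary, the hypersphere center, and a threshold.
    Assumption: threshold = largest squared distance among training requests."""
    vocab = sorted({b for r in normal_requests for b in bigrams(r)})
    vectors = [featurize(r, vocab) for r in normal_requests]
    center = [sum(col) / len(vectors) for col in zip(*vectors)]
    threshold = max(squared_distance(v, center) for v in vectors)
    return vocab, center, threshold


def is_anomalous(request, vocab, center, threshold):
    """Flag a request whose squared distance to the center exceeds the threshold."""
    return squared_distance(featurize(request, vocab), center) > threshold


# Usage with made-up requests.
normal = ["GET /index.html", "GET /about.html", "GET /contact.html"]
vocab, center, threshold = fit(normal)
attack = "GET /index.php?id=1' OR '1'='1"
flagged = is_anomalous(attack, vocab, center, threshold)
```

Using the maximum training score as the threshold guarantees no false alarms on the training set, which is exactly the kind of hand-set rule the abstract argues should be replaced by a learned threshold.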

COMPARATIVE ANALYSIS OF SYSTEM LOGS AND STREAMING DATA ANOMALY DETECTION ALGORITHMS

Information systems and technologies security ◽

10.17721/ists.2020.1.50-59 ◽

2020 ◽

pp. 5-7

Author(s):

Andriy Lishchytovych ◽

Volodymyr Pavlenko ◽

Alexander Shmatok ◽

Yuriy Finenko

Keyword(s):

Comparative Analysis ◽

Anomaly Detection ◽

Large Scale ◽

Streaming Data ◽

Incident Management ◽

Hierarchical Temporal Memory ◽

Detection Systems ◽

It Systems ◽

System Logs ◽

Temporary Memory

This paper provides a description and comparative analysis of several commonly used approaches to analyzing system logs and streaming data, generated in large volumes by company IT infrastructure, with an unattended anomaly detection feature. The importance of anomaly detection is dictated by the growing costs of system downtime due to events that could have been predicted from log entries reporting abnormal data. Anomaly detection systems are built using a standard workflow of data collection, parsing, information extraction, and detection steps. Most of the document concerns the anomaly detection step and algorithms such as regression, decision tree, SVM, clustering, principal component analysis, invariants mining, and the hierarchical temporal memory (HTM) model. Model-based anomaly algorithms and hierarchical temporal memory algorithms were used to process the HDFS, BGL, and NAB datasets, with ~16m log messages and 365k data points of streaming data. The data was manually labeled to enable the training of the models and the calculation of accuracy. According to the results, supervised anomaly detection systems achieve high precision but require significant training effort, while the HTM-based algorithm shows the highest detection precision with zero training. Detection of abnormal system behavior plays an important role in large-scale incident management systems. Timely detection allows IT administrators to quickly identify issues and resolve them immediately. This approach reduces system downtime dramatically. Most IT systems generate logs with detailed information about their operations. Therefore, logs are an ideal data source for anomaly detection solutions. The volume of the logs makes it impossible to analyze them manually and requires automated approaches.

Deep Learning-based Anomaly Detection in Cyber-physical Systems

ACM Computing Surveys ◽

10.1145/3453155 ◽

2021 ◽

Vol 54 (5) ◽

pp. 1-36

Author(s):

Yuan Luo ◽

Ya Xiao ◽

Long Cheng ◽

Guojun Peng ◽

Danfeng (Daphne) Yao

Keyword(s):

Deep Learning ◽

Anomaly Detection ◽

Cyber Physical Systems ◽

Detection Methods ◽

Future Research ◽

Open Problems ◽

Physical Systems ◽

Domain Specific ◽

Domain Specific Knowledge ◽

Review State

Anomaly detection is crucial to ensure the security of cyber-physical systems (CPS). However, due to the increasing complexity of CPSs and more sophisticated attacks, conventional anomaly detection methods, which face the growing volume of data and need domain-specific knowledge, cannot be directly applied to address these challenges. To this end, deep learning-based anomaly detection (DLAD) methods have been proposed. In this article, we review state-of-the-art DLAD methods in CPSs. We propose a taxonomy in terms of the type of anomalies, strategies, implementation, and evaluation metrics to understand the essential properties of current methods. Further, we utilize this taxonomy to identify and highlight new characteristics and designs in each CPS domain. Also, we discuss the limitations and open problems of these methods. Moreover, to give users insights into choosing proper DLAD methods in practice, we experimentally explore the characteristics of typical neural models, the workflow of DLAD methods, and the running performance of DL models. Finally, we discuss the deficiencies of DL approaches, our findings, and possible directions to improve DLAD methods and motivate future research.

Anomalies Detection Using Isolation in Concept-Drifting Data Streams

Computers ◽

10.3390/computers10010013 ◽

2021 ◽

Vol 10 (1) ◽

pp. 13

Author(s):

Maurras Ulbricht Togbe ◽

Yousra Chabchoub ◽

Aliou Boly ◽

Mariam Barry ◽

Raja Chiky ◽

...

Keyword(s):

Anomaly Detection ◽

Half Space ◽

Data Streams ◽

Detection Efficiency ◽

Concept Drift ◽

Streaming Data ◽

Detection Methods ◽

Data Sets ◽

Stream Data ◽

Isolation Forest

Detecting anomalies in streaming data is an important issue for many application domains, such as cybersecurity, natural disasters, or bank fraud. Different approaches have been designed to detect anomalies: statistics-based, isolation-based, clustering-based, etc. In this paper, we present a structured survey of the existing anomaly detection methods for data streams with an in-depth look at Isolation Forest (iForest). We first provide an implementation of Isolation Forest Anomalies detection in Stream Data (IForestASD), a variant of iForest for data streams. This implementation is built on top of scikit-multiflow (River), an open-source machine learning framework for data streams that contains a single anomaly detection algorithm, called Streaming half-space trees. We performed experiments on different real and well-known data sets to compare the performance of our implementation of IForestASD and half-space trees. Moreover, we extended the IForestASD algorithm to handle drifting data by proposing three algorithms that involve two well-known drift detection methods: ADWIN and KSWIN. ADWIN is an adaptive sliding window algorithm for detecting change in a data stream. KSWIN is a more recent method; the name refers to the Kolmogorov–Smirnov Windowing method for concept drift detection. More precisely, we extended KSWIN to be able to deal with n-dimensional data streams. We validated and compared all of the proposed methods on both real and synthetic data sets. In particular, we evaluated the F1-score, the execution time, and the memory consumption. The experiments show that our extensions have lower resource consumption than the original version of IForestASD with a similar or better detection efficiency.
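
The window-based drift test behind a Kolmogorov–Smirnov approach can be sketched in a few lines of Python. This is a simplified, one-dimensional illustration and not the KSWIN implementation from scikit-multiflow or River: it compares two adjacent fixed-size windows deterministically (no random sampling of the older window), and the threshold of 0.9 used in the usage example is an arbitrary assumption rather than a value derived from a significance level. The function names are hypothetical.

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical distribution functions of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare the ECDFs.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


def detect_drift(stream, window, threshold):
    """Return the first position t at which the most recent `window` values
    (stream[t - window:t]) differ from the preceding `window` values by more
    than `threshold`, or None if no such position exists."""
    for t in range(2 * window, len(stream) + 1):
        older = stream[t - 2 * window:t - window]
        recent = stream[t - window:t]
        if ks_statistic(older, recent) > threshold:
            return t
    return None


# Usage: three repetitions of one pattern, then an abrupt shift in level.
stream = [0.0, 1.0, 2.0, 3.0] * 3 + [10.0, 11.0, 12.0, 13.0] * 2
drift_at = detect_drift(stream, window=4, threshold=0.9)
```

With such a strict threshold the change is reported only once the recent window lies entirely in the new regime. A drift-aware detector would typically reset or retrain its model at that point, which is the role ADWIN and KSWIN play in the extensions described above.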

Research on Time Series Anomaly Detection: Based on Deep Learning Methods

Journal of Physics Conference Series ◽

10.1088/1742-6596/2132/1/012012 ◽

2021 ◽

Vol 2132 (1) ◽

pp. 012012

Author(s):

Jiaqi Zhou

Keyword(s):

Neural Network ◽

Neural Networks ◽

Time Series ◽

Deep Learning ◽

Anomaly Detection ◽

Deep Neural Network ◽

Deep Neural Networks ◽

Detection Task ◽

Detection Methods ◽

Learning Methods

Time series anomaly detection has always been an important research direction. Early time series anomaly detection methods were mainly statistical and machine learning methods. As researchers continue to exploit the capabilities of deep neural networks, their performance on anomaly detection tasks has become significantly better than that of traditional methods. Despite the continuous development and application of deep neural networks such as the transformer and the graph neural network (GNN) in time series anomaly detection in recent years, the literature lacks a comparative evaluation of these recent deep learning methods. This paper studies various deep neural networks suitable for time series, which are divided into three categories according to their anomaly detection methods. The evaluation is conducted on public datasets. By analyzing the evaluation criteria, this paper discusses the performance of each model, as well as the problems and future development directions in the field of time series anomaly detection. This study found that, in the time series anomaly detection task, the transformer is suitable for long time series prediction, and that studying the graph structure of time series may be the best way to approach time series anomaly detection in the future.

Deep Learning Anomaly Detection methods to passively detect COVID-19 from Audio

10.1109/icdh52753.2021.00023 ◽

2021 ◽

Author(s):

Shreesha Narasimha Murthy ◽

Emmanuel Agu

Keyword(s):

Deep Learning ◽

Anomaly Detection ◽

Detection Methods

Temporal signals to images: Monitoring the condition of industrial assets with deep learning image processing algorithms

Proceedings of the Institution of Mechanical Engineers Part O Journal of Risk and Reliability ◽

10.1177/1748006x21994446 ◽

2021 ◽

pp. 1748006X2199444 ◽

Cited By ~ 1

Author(s):

Gabriel Rodriguez Garcia ◽

Gabriel Michau ◽

Mélanie Ducoffe ◽

Jayant Sen Gupta ◽

Olga Fink

Keyword(s):

Time Series ◽

Deep Learning ◽

Anomaly Detection ◽

Recurrence Plot ◽

Detection Methods ◽

Image Encoding ◽

Learning Framework ◽

Helicopter Flight ◽

Markov Transition ◽

The Time Domain

The ability to detect anomalies in time series is considered highly valuable in numerous application domains. The sequential nature of time series objects is responsible for an additional feature complexity, ultimately requiring specialized approaches in order to solve the task. Essential characteristics of time series, situated outside the time domain, are often difficult to capture with state-of-the-art anomaly detection methods when no transformations have been applied to the time series. Inspired by the success of deep learning methods in computer vision, several studies have proposed transforming time series into image-like representations, used as inputs for deep learning models, and have led to very promising results in classification tasks. In this paper, we first review the signal to image encoding approaches found in the literature. Second, we propose modifications to some of their original formulations to make them more robust to the variability in large datasets. Third, we compare them on the basis of a common unsupervised task to demonstrate how the choice of the encoding can impact the results when used in the same deep learning architecture. We thus provide a comparison between six encoding algorithms with and without the proposed modifications. The selected encoding methods are Gramian Angular Field, Markov Transition Field, recurrence plot, grey scale encoding, spectrogram, and scalogram. We also compare the results achieved with the raw signal used as input for another deep learning model. We demonstrate that some encodings have a competitive advantage and might be worth considering within a deep learning framework. The comparison is performed on a dataset collected and released by Airbus SAS, containing highly complex vibration measurements from real helicopter flight tests. The different encodings provide competitive results for anomaly detection.
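
Two of the encodings named in this abstract, the Gramian Angular Field and the recurrence plot, can be sketched in plain Python as follows. These are the common textbook formulations, not the modified versions proposed in the paper. The summation variant of the Gramian Angular Field, the min-max rescaling to [-1, 1], and the fixed distance threshold `eps` for the recurrence plot are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def rescale(series):
    """Min-max rescale to [-1, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [2 * (x - lo) / (hi - lo) - 1 for x in series]


def gramian_angular_summation_field(series):
    """Encode each rescaled value as an angle phi = arccos(x) and build
    the matrix G[i][j] = cos(phi_i + phi_j)."""
    # Clamp before acos to guard against tiny floating-point overshoot.
    phis = [math.acos(max(-1.0, min(1.0, x))) for x in rescale(series)]
    return [[math.cos(a + b) for b in phis] for a in phis]


def recurrence_plot(series, eps):
    """Binary matrix R[i][j] = 1 when |x_i - x_j| <= eps, else 0."""
    return [[1 if abs(a - b) <= eps else 0 for b in series] for a in series]


# Usage: both functions turn a length-n signal into an n x n image.
signal = [0.0, 0.1, 1.0]
gasf_image = gramian_angular_summation_field(signal)
rp_image = recurrence_plot(signal, eps=0.2)
```

Each resulting matrix can be fed to an image model as a single-channel image. For long signals the quadratic size usually calls for windowing or downsampling first.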
