Exathlon

Access to high-quality data repositories and benchmarks has been instrumental in advancing the state of the art in many experimental research domains. While advanced analytics tasks over time series data have been gaining considerable attention, the lack of such community resources severely limits scientific progress. In this paper, we present Exathlon, the first comprehensive public benchmark for explainable anomaly detection over high-dimensional time series data. Exathlon has been systematically constructed from real data traces collected during repeated executions of large-scale stream processing jobs on an Apache Spark cluster. Some of these executions were intentionally disturbed by introducing instances of six different types of anomalous events (e.g., misbehaving inputs, resource contention, process failures). For each anomaly instance, ground truth labels are provided for both the root cause interval and the extended effect interval, supporting the development and evaluation of a wide range of anomaly detection (AD) and explanation discovery (ED) tasks. We demonstrate the practical utility of Exathlon's dataset, evaluation methodology, and end-to-end data science pipeline design through an experimental study with three state-of-the-art AD and ED techniques.
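
The root cause and extended effect intervals described above can be expanded into per-time-step labels before a detector is trained or evaluated. The following is a minimal Python sketch, assuming each anomaly instance is annotated with a root cause start, a root cause end, and an extended effect end, and that the extended effect interval begins where the root cause interval ends. The `AnomalyInstance` class, its field names, and the label codes are hypothetical and do not reproduce Exathlon's actual label format.

```python
from dataclasses import dataclass

# Label codes are illustrative assumptions, not Exathlon's encoding.
NORMAL, ROOT_CAUSE, EXTENDED_EFFECT = 0, 1, 2


@dataclass
class AnomalyInstance:
    """Hypothetical interval annotation for one anomaly (end indices are exclusive)."""
    root_cause_start: int
    root_cause_end: int
    effect_end: int


def point_labels(length: int, anomalies: list[AnomalyInstance]) -> list[int]:
    """Expand interval annotations into one label per time step."""
    labels = [NORMAL] * length
    for a in anomalies:
        # Root cause interval: [root_cause_start, root_cause_end)
        for t in range(a.root_cause_start, min(a.root_cause_end, length)):
            labels[t] = ROOT_CAUSE
        # Extended effect interval, assumed to start where the root cause interval ends.
        for t in range(a.root_cause_end, min(a.effect_end, length)):
            labels[t] = EXTENDED_EFFECT
    return labels
```

For example, `point_labels(10, [AnomalyInstance(2, 4, 7)])` returns `[0, 0, 1, 1, 2, 2, 2, 0, 0, 0]`. Keeping the two anomalous classes distinct lets an evaluation decide separately whether alarms raised during the extended effect interval should count as hits.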

Towards Machine Learning-based Anomaly Detection on Time-Series Data

Infocommunications journal ◽

10.36244/icj.2021.1.5 ◽

2021 ◽

Vol 13 (1) ◽

pp. 35-44

Author(s):

Daniel Vajda ◽

Adrian Pekar ◽

Karoly Farkas

Keyword(s):

Machine Learning ◽

Time Series ◽

Anomaly Detection ◽

Time Series Data ◽

Short Term Memory ◽

Learning Algorithm ◽

State Of The Art ◽

Detection Methods ◽

Series Data ◽

Rich Information

The complexity of network infrastructures is growing exponentially. Real-time monitoring of these infrastructures is essential to ensure their reliable operation. The concept of telemetry has been introduced in recent years to support this process by streaming time-series data that carry feature-rich information about the state of network components. In this paper, we focus on a particular application of telemetry: anomaly detection on time-series data. We rigorously examined state-of-the-art anomaly detection methods and observed that none of them suits our requirements, as they typically face several limitations when applied to time-series data. This paper presents Alter-Re2, an improved version of ReRe, a state-of-the-art machine learning algorithm based on Long Short-Term Memory (LSTM) networks. Through a systematic examination, we demonstrate that introducing the concepts of ageing and a sliding window overcomes the major limitations of ReRe. We assessed the efficacy of Alter-Re2 on ten different datasets and achieved promising results: on average, Alter-Re2 performs three times better than ReRe.
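
The abstract does not spell out how ageing and the sliding window are implemented. As a hedged illustration of the general idea only, the Python sketch below thresholds the prediction errors of some forecaster (an LSTM in ReRe and Alter-Re2; here any predicted value can be passed in) against a weighted mean plus `k` standard deviations of recent errors. The sliding window bounds the stored history, and exponential decay of the weights plays the role of ageing. The class name `SlidingWindowThreshold` and the parameters `window`, `k`, and `decay` are hypothetical and are not taken from the paper.

```python
from collections import deque
import math


class SlidingWindowThreshold:
    """Flag a point when its prediction error exceeds mean + k * std of recent errors.

    Hypothetical sketch: `window` caps the stored history (sliding window) and
    `decay` down-weights older errors (ageing). Values are not from Alter-Re2.
    """

    def __init__(self, window: int = 50, k: float = 3.0, decay: float = 0.95):
        self.errors = deque(maxlen=window)  # oldest errors drop out automatically
        self.k = k
        self.decay = decay

    def update(self, predicted: float, observed: float) -> bool:
        err = abs(observed - predicted)
        is_anomaly = False
        if len(self.errors) >= 2:
            # Exponential ageing: the newest stored error gets weight 1, older ones decay.
            n = len(self.errors)
            weights = [self.decay ** (n - 1 - i) for i in range(n)]
            total = sum(weights)
            mean = sum(w * e for w, e in zip(weights, self.errors)) / total
            var = sum(w * (e - mean) ** 2 for w, e in zip(weights, self.errors)) / total
            is_anomaly = err > mean + self.k * math.sqrt(var)
        if not is_anomaly:
            self.errors.append(err)  # keep anomalous errors out of the reference window
        return is_anomaly
```

Excluding flagged errors from the window keeps a single large spike from inflating the threshold for the points that follow; whether Alter-Re2 does the same is not stated in the abstract.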

Anomaly detection in multidimensional time series— A graph-based approach

Journal of Physics: Complexity ◽

10.1088/2632-072x/ac392c ◽

2021 ◽

Author(s):

Marcus Erz ◽

Jeremy Floyd Kielman ◽

Bahar Selvi Uzun ◽

Gabriele Stefanie Guehring

Keyword(s):

Time Series ◽

Anomaly Detection ◽

Time Series Data ◽

Distance Measures ◽

Series Data ◽

Data Set ◽

Research Areas ◽

Multidimensional Time Series ◽

Wide Range ◽

Time Frames

As the digital transformation takes place, more and more data is being generated and collected. To extract meaningful information and knowledge, researchers use various data mining techniques. In addition to classification, clustering, and forecasting, outlier or anomaly detection is one of the most important research areas in time series analysis. In this paper, we present a method for detecting anomalies in multidimensional time series using a graph-based algorithm. We transform the time series data into graphs before computing outliers, since this makes a wide range of graph-based anomaly detection methods available. Furthermore, the dynamics of the data are taken into account by using a window of a given size, which yields multiple graphs over different time frames. We use feature extraction and aggregation to compare distance measures between two time-dependent graphs. The effectiveness of our algorithm is demonstrated on the Numenta Anomaly Benchmark, which covers various anomaly types, as well as on the KPI-Anomaly-Detection data set of the 2018 AIOps competition.
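
The abstract leaves open how each window is turned into a graph and which features are compared. One possible construction, sketched below in Python with NumPy as an assumption rather than the authors' method, links two dimensions when the absolute Pearson correlation within a window exceeds a threshold, uses each node's degree as the extracted feature, and scores each window by the L1 distance between its degree vector and that of the previous window. The function names and the `threshold` and `window_size` parameters are hypothetical.

```python
import numpy as np


def window_graph(window: np.ndarray, threshold: float = 0.8) -> np.ndarray:
    """Adjacency matrix linking dimensions whose |Pearson correlation| >= threshold."""
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(window, rowvar=False)  # columns are dimensions
    corr = np.nan_to_num(corr)  # constant columns give NaN; treat them as unlinked
    adj = (np.abs(corr) >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)  # ignore self-loops
    return adj


def graph_distance_scores(series: np.ndarray, window_size: int, threshold: float = 0.8) -> np.ndarray:
    """Score each non-overlapping window by how much its graph differs from the previous one."""
    n_windows = series.shape[0] // window_size
    # Node degree (number of strongly correlated partners) as the per-window feature vector.
    degrees = [
        window_graph(series[i * window_size:(i + 1) * window_size], threshold).sum(axis=1)
        for i in range(n_windows)
    ]
    # L1 distance between consecutive degree vectors; window 0 has no predecessor.
    return np.array([np.abs(degrees[i] - degrees[i - 1]).sum() for i in range(1, n_windows)])
```

A sudden rise in the score indicates that the correlation structure between dimensions changed between consecutive time frames, even if each individual dimension still looks unremarkable on its own.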

An Enhanced Seasonal-Hybrid ESD Technique for Robust Anomaly Detection on Time Series

10.5753/sbrc.2018.2422 ◽

2018 ◽

Author(s):

Rafael G. Vieira ◽

Marcos A. Leone Filho ◽

Robinson Semolini

Keyword(s):

Time Series ◽

Time Series Data ◽

State Of The Art ◽

Statistical Technique ◽

Series Data ◽

Simulation Studies ◽

Detection Techniques ◽

Research Activities ◽

Decomposition Procedure ◽

Wide Range

Nowadays, time series data underlies countless research activities. Despite the wide range of techniques available to capture and process this information, analyzing large amounts of data and detecting unusual behavior in them still pose a great challenge. In this context, this paper proposes SH-ESD+, a statistical technique that combines the Extreme Studentized Deviate (ESD) test with a LOESS-based decomposition procedure to detect anomalies in time series data. The proposed technique employs robust metrics to identify anomalies more accurately, even in the presence of trend and seasonal spikes. Simulation studies are carried out to evaluate the effectiveness of SH-ESD+ on the publicly available Numenta Anomaly Benchmark (NAB) collection. Computational results show that SH-ESD+ performs consistently when compared against state-of-the-art and classic detection techniques.
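
As the abstract describes, SH-ESD+ pairs the ESD test with a decomposition step; in the Seasonal Hybrid ESD approach it builds on, the test runs on the decomposition residual, with the median and median absolute deviation (MAD) in place of the mean and standard deviation. The Python sketch below (NumPy and SciPy) follows Rosner's generalized ESD procedure on such residuals. As a simplifying assumption, the LOESS-based decomposition is replaced by subtracting a per-phase median seasonal profile of known period; `seasonal_residual`, `robust_esd`, and their parameters are hypothetical names, and this is not the exact SH-ESD+ procedure.

```python
import numpy as np
from scipy import stats


def seasonal_residual(x: np.ndarray, period: int) -> np.ndarray:
    """Subtract a per-phase median profile (a simplified stand-in for LOESS/STL)."""
    profile = np.array([np.median(x[p::period]) for p in range(period)])
    return x - np.resize(profile, x.size)  # np.resize tiles the profile to full length


def robust_esd(resid: np.ndarray, max_anoms: int, alpha: float = 0.05) -> list[int]:
    """Generalized ESD test with median/MAD in place of mean/standard deviation."""
    n = resid.size
    values = resid.astype(float).copy()
    idx = np.arange(n)
    candidates: list[int] = []
    n_anoms = 0
    for i in range(1, max_anoms + 1):
        med = np.median(values)
        mad = 1.4826 * np.median(np.abs(values - med))  # consistent with std for normal data
        if mad == 0:
            break
        dev = np.abs(values - med) / mad
        j = int(np.argmax(dev))
        r_i = dev[j]
        candidates.append(int(idx[j]))
        values = np.delete(values, j)
        idx = np.delete(idx, j)
        # Rosner's critical value lambda_i for the i-th most extreme point.
        p = 1 - alpha / (2 * (n - i + 1))
        t = stats.t.ppf(p, n - i - 1)
        lam = (n - i) * t / np.sqrt((n - i - 1 + t**2) * (n - i + 1))
        if r_i > lam:
            n_anoms = i  # the test keeps the largest i with R_i > lambda_i
    return sorted(candidates[:n_anoms])
```

For instance, with an hourly series of period 24 and two injected spikes, `robust_esd(seasonal_residual(x, 24), max_anoms=10)` should include the spike positions among its results; `max_anoms` caps how many points the test may remove, and `alpha` sets its significance level.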

An Anomaly Detection Method with Exemplar Subsequence for Time Series Data

IEEJ Transactions on Electronics Information and Systems ◽

10.1541/ieejeiss.136.363 ◽

2016 ◽

Vol 136 (3) ◽

pp. 363-372

Author(s):

Takaaki Nakamura ◽

Makoto Imamura ◽

Masashi Tatedoko ◽

Norio Hirai

Keyword(s):

Time Series ◽

Anomaly Detection ◽

Time Series Data ◽

Detection Method ◽

Series Data

LRZ Convolution: An Algorithm for Automatic Anomaly Detection in Time-series Data

32nd International Conference on Scientific and Statistical Database Management ◽

10.1145/3400903.3400904 ◽

2020 ◽

Author(s):

Arunprasad P. Marathe

Keyword(s):

Time Series ◽

Anomaly Detection ◽

Time Series Data ◽

Series Data

Change Point Enhanced Anomaly Detection for IoT Time Series Data

Water ◽

10.3390/w13121633 ◽

2021 ◽

Vol 13 (12) ◽

pp. 1633

Author(s):

Elena-Simona Apostol ◽

Ciprian-Octavian Truică ◽

Florin Pop ◽

Christian Esposito

Keyword(s):

Time Series ◽

Anomaly Detection ◽

Change Point ◽

Time Series Data ◽

Multivariate Time Series ◽

Change Point Detection ◽

Change Points ◽

Series Data ◽

Prediction And Forecasting ◽

Point Detection

Due to the exponential growth of Internet of Things networks and the massive amount of time series data collected from them, it is essential to apply efficient Big Data analysis methods to extract meaningful information and statistics. Anomaly detection is an important part of time series analysis, improving the quality of further analysis such as prediction and forecasting. Thus, detecting change points (sudden changes that still reflect normal behavior) and distinguishing them from abnormal behavior, i.e., outliers, is a crucial step to minimize the false positive rate and to build accurate machine learning models for prediction and forecasting. In this paper, we propose a rule-based decision system that enhances anomaly detection in multivariate time series using change point detection. Our architecture uses a pipeline that automatically detects real anomalies and removes the false positives introduced by change points. We employ both traditional and deep learning unsupervised algorithms: in total, five anomaly detection and five change point detection algorithms. Additionally, we propose a new confidence metric based on the support for a time series point being an anomaly and the support for the same point being a change point. In our experiments, we use a large real-world dataset containing multivariate time series of water consumption collected from smart meters. As an evaluation metric, we use the Mean Absolute Error (MAE). The low MAE values show that the algorithms accurately determine anomalies and change points. The experimental results strengthen our assumption that anomaly detection can be improved by determining and removing change points, and they validate the correctness of our proposed rules in real-world scenarios.
Furthermore, the proposed rule-based decision support system enables users to make informed decisions regarding the status of the water distribution network and to perform predictive and proactive maintenance effectively.
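
The abstract names a confidence metric built from two supports but does not give its formula. The Python sketch below shows one way such a rule could look, under stated assumptions: a point's anomaly support is the fraction of anomaly detectors that flag it, its change point support is the fraction of change point detectors that flag it, a point is kept as a real anomaly only when the first is high and the second is low, and the confidence `a * (1 - c)` is a hypothetical combination. The function names and thresholds are illustrative, not the authors' rules.

```python
def support(flags_per_algorithm: list[list[bool]]) -> list[float]:
    """Fraction of algorithms that flag each time step (one inner list per algorithm)."""
    n_algos = len(flags_per_algorithm)
    return [sum(column) / n_algos for column in zip(*flags_per_algorithm)]


def filter_anomalies(
    anomaly_flags: list[list[bool]],
    change_point_flags: list[list[bool]],
    min_anomaly_support: float = 0.5,
    max_cp_support: float = 0.5,
) -> list[tuple[int, float]]:
    """Keep points that most anomaly detectors flag but few change point detectors flag."""
    a_sup = support(anomaly_flags)
    c_sup = support(change_point_flags)
    kept = []
    for t, (a, c) in enumerate(zip(a_sup, c_sup)):
        if a >= min_anomaly_support and c < max_cp_support:
            # Hypothetical confidence: high anomaly support, low change point support.
            kept.append((t, a * (1.0 - c)))
    return kept
```

With five anomaly detectors and five change point detectors, as in the study, each support takes values in steps of 0.2, so the two default thresholds effectively encode majority votes.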
