A Review on Anomaly Detection in Time Series

Data driven decision making is becoming increasingly an important aspect for successful business execution. More and more organizations are moving towards taking informed decisions based on the data that they are generating. Most of this data are in temporal format - time series data. Effective analysis across time series data sets, in an efficient and quick manner is a challenge. The most interesting and valuable part of such analysis is to generate insights on correlation and causation across multiple time series data sets. This paper looks at methods that can be used to analyze such data sets and gain useful insights from it, primarily in the form of correlation and causation analysis. This paper focuses on two methods for doing so, Two Sample Test with Dynamic Time Warping and Hierarchical Clustering and looks at how the results returned from both can be used to gain a better understanding of the data. Moreover, the methods used are meant to work with any data set, regardless of the subject domain and idiosyncrasies of the data set, primarily, a data agnostic approach.

Download Full-text

Variance error of multi-classification based anomaly detection for time series data

Journal of Computational Methods in Sciences and Engineering ◽

10.3233/jcm-204699 ◽

2020 ◽

pp. 1-16

Author(s):

Baoquan Wang ◽

Tonghai Jiang ◽

Xi Zhou ◽

Bo Ma ◽

Fan Zhao ◽

...

Keyword(s):

Neural Network ◽

Time Series ◽

Anomaly Detection ◽

Time Series Data ◽

Short Term Memory ◽

Computational Cost ◽

Reconstruction Error ◽

Detection Methods ◽

Series Data ◽

Data Set

For abnormal detection of time series data, the supervised anomaly detection methods require labeled data. While the range of outlier factors used by the existing semi-supervised methods varies with data, model and time, the threshold for determining abnormality is difficult to obtain, in addition, the computational cost of the way to calculate outlier factors from other data points in the data set is also very large. These make such methods difficult to practically apply. This paper proposes a framework named LSTM-VE which uses clustering combined with visualization method to roughly label normal data, and then uses the normal data to train long short-term memory (LSTM) neural network for semi-supervised anomaly detection. The variance error (VE) of the normal data category classification probability sequence is used as outlier factor. The framework enables anomaly detection based on deep learning to be practically applied and using VE avoids the shortcomings of existing outlier factors and gains a better performance. In addition, the framework is easy to expand because the LSTM neural network can be replaced with other classification models. Experiments on the labeled and real unlabeled data sets prove that the framework is better than replicator neural networks with reconstruction error (RNN-RS) and has good scalability as well as practicability.

Download Full-text

Subsequence Time Series Clustering

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch286 ◽

2011 ◽

pp. 1871-1876

Author(s):

Jason Chen

Keyword(s):

Time Series ◽

Traditional Method ◽

Time Series Data ◽

Large Data ◽

Series Data ◽

Data Sets ◽

Data Set ◽

Time Series Clustering ◽

Mining Community ◽

Representative Points

Clustering analysis is a tool used widely in the Data Mining community and beyond (Everitt et al. 2001). In essence, the method allows us to “summarise” the information in a large data set X by creating a very much smaller set C of representative points (called centroids) and a membership map relating each point in X to its representative in C. An obvious but special type of data set that one might want to cluster is a time series data set. Such data has a temporal ordering on its elements, in contrast to non-time series data sets. In this article we explore the area of time series clustering, focusing mainly on a surprising recent result showing that the traditional method for time series clustering is meaningless. We then survey the literature of recent papers and go on to argue how time series clustering can be made meaningful.

Download Full-text

A simple method for unsupervised anomaly detection: An application to Web time series data

PLoS ONE ◽

10.1371/journal.pone.0262463 ◽

2022 ◽

Vol 17 (1) ◽

pp. e0262463

Author(s):

Keisuke Yoshihara ◽

Kei Takahashi

Keyword(s):

Time Series ◽

Anomaly Detection ◽

Time Series Data ◽

Density Ratio ◽

Series Data ◽

Specific Information ◽

Data Set ◽

Simple Method ◽

Page View ◽

Benchmark Data

We propose a simple anomaly detection method that is applicable to unlabeled time series data and is sufficiently tractable, even for non-technical entities, by using the density ratio estimation based on the state space model. Our detection rule is based on the ratio of log-likelihoods estimated by the dynamic linear model, i.e. the ratio of log-likelihood in our model to that in an over-dispersed model that we will call the NULL model. Using the Yahoo S5 data set and the Numenta Anomaly Benchmark data set, publicly available and commonly used benchmark data sets, we find that our method achieves better or comparable performance compared to the existing methods. The result implies that it is essential in time series anomaly detection to incorporate the specific information on time series data into the model. In addition, we apply the proposed method to unlabeled Web time series data, specifically, daily page view and average session duration data on an electronic commerce site that deals in insurance goods to show the applicability of our method to unlabeled real-world data. We find that the increase in page view caused by e-mail newsletter deliveries is less likely to contribute to completing an insurance contract. The result also suggests the importance of the simultaneous monitoring of more than one time series.

Download Full-text

Case Studies in Multi-unit LongitudinalModels with Random Coefficients and Patterned Correlation Structure

Austrian Journal of Statistics ◽

10.17713/ajs.v29i2.503 ◽

2016 ◽

Vol 29 (2) ◽

pp. 93-110

Author(s):

Johannes Ledolter

Keyword(s):

Time Series ◽

Model Building ◽

Time Series Data ◽

Random Coefficients ◽

Correlation Structure ◽

Model Specification ◽

Series Data ◽

Data Sets ◽

Economic Time Series ◽

Data Set

Modelling issues in multi-unit longitudinal models with random coefficients and patterned correlation structure are illustrated in the context of three data sets. The first data set deals with short time series data on annual death rates and alcohol consumption of twenty-five European countries. The second data set deals with glaceologic time series data on snow temperature at 14 different locations within a small glacier in the Austrian Alps. The third data set consists of annual economic time series on factor productivity, anddomestic and foreign research/development (R&D) capital stocks. A practical model building approach–consisting of model specification, estimation, and diagnostic checking–is outlined in the context of these three data sets.

Download Full-text

Anomaly detection in multidimensional time series— A graph-based approach

Journal of Physics: Complexity ◽

10.1088/2632-072x/ac392c ◽

2021 ◽

Author(s):

Marcus Erz ◽

Jeremy Floyd Kielman ◽

Bahar Selvi Uzun ◽

Gabriele Stefanie Guehring

Keyword(s):

Time Series ◽

Anomaly Detection ◽

Time Series Data ◽

Distance Measures ◽

Series Data ◽

Data Set ◽

Research Areas ◽

Multidimensional Time Series ◽

Wide Range ◽

Time Frames

Abstract As the digital transformation is taking place, more and more data is being generated and collected.To generate meaningful information and knowledge researchers use various data mining techniques. In addition to classification, clustering, and forecasting, outlier or anomaly detection is one of the most important research areas in time series analysis. In this paper we present a method for detecting anomalies in multidimensional time series using a graph-based algorithm. We transform time series data to graphs prior to calculating the outlier since it offers a wide range of graph-based methods for anomaly detection. Furthermore the dynamics of the data is taken into consideration by implementing a window of a certain size that leads to multiple graphs in different time frames. We use feature extraction and aggregation to finally compare distance measures of two time-dependent graphs. The effectiveness of our algorithm is demonstrated on the Numenta Anomaly Benchmark with various anomaly types as well as the KPI-Anomaly-Detection data set of 2018 AIOps competition.

Download Full-text

The CH-IRP data set: a decade of fortnightly data on δ2H and δ18O in streamflow and precipitation in Switzerland

Earth System Science Data ◽

10.5194/essd-12-3057-2020 ◽

2020 ◽

Vol 12 (4) ◽

pp. 3057-3066

Author(s):

Maria Staudinger ◽

Stefan Seeger ◽

Barbara Herbstritt ◽

Michael Stoelzle ◽

Jan Seibert ◽

...

Keyword(s):

Time Series ◽

Time Series Data ◽

Isotope Composition ◽

Series Data ◽

Data Repository ◽

Data Sets ◽

Individual Study ◽

Data Set ◽

Flow Pathways

Abstract. The stable isotopes of oxygen and hydrogen, 18O and 2H, provide information on water flow pathways and hydrologic catchment functioning. Here a data set of time series data on precipitation and streamflow isotope composition in medium-sized Swiss catchments, CH-IRP, is presented that is unique in terms of its long-term multi-catchment coverage along an alpine to pre-alpine gradient. The data set comprises fortnightly time series of both δ2H and δ18O as well as deuterium excess from streamflow for 23 sites in Switzerland, together with summary statistics of the sampling at each station. Furthermore, time series of δ18O and δ2H in precipitation are provided for each catchment derived from interpolated data sets from the ISOT, GNIP and ANIP networks. For each station we compiled relevant metadata describing both the sampling conditions and catchment characteristics and climate information. Lab standards and errors are provided, and potentially problematic measurements are indicated to help the user decide on the applicability for individual study purposes. For the future, the measurements are planned to be continued at 14 stations as a long-term isotopic measurement network, and the CH-IRP data set will, thus, continuously be extended. The data set can be downloaded from data repository Zenodo at https://doi.org/10.5281/zenodo.4057967 (Staudinger et al., 2020).

Download Full-text

Time series event correlation with DTW and Hierarchical Clustering methods

10.7287/peerj.preprints.27959v1 ◽

2019 ◽

Cited By ~ 1

Author(s):

Srishti Mishra ◽

Zohair Shafi ◽

Santanu Pathak

Keyword(s):

Time Series ◽

Hierarchical Clustering ◽

Time Series Data ◽

Series Data ◽

Data Sets ◽

Multiple Time ◽

Clustering Methods ◽

Event Correlation ◽

Data Set ◽

Causation Analysis

Data driven decision making is becoming increasingly an important aspect for successful business execution. More and more organizations are moving towards taking informed decisions based on the data that they are generating. Most of this data are in temporal format - time series data. Effective analysis across time series data sets, in an efficient and quick manner is a challenge. The most interesting and valuable part of such analysis is to generate insights on correlation and causation across multiple time series data sets. This paper looks at methods that can be used to analyze such data sets and gain useful insights from it, primarily in the form of correlation and causation analysis. This paper focuses on two methods for doing so, Two Sample Test with Dynamic Time Warping and Hierarchical Clustering and looks at how the results returned from both can be used to gain a better understanding of the data. Moreover, the methods used are meant to work with any data set, regardless of the subject domain and idiosyncrasies of the data set, primarily, a data agnostic approach.

Download Full-text