Experimental Comparison and Survey of Twelve Time Series Anomaly Detection Algorithms

2021 ◽  
Vol 72 ◽  
pp. 849-899
Author(s):  
Cynthia Freeman ◽  
Jonathan Merriman ◽  
Ian Beaver ◽  
Abdullah Mueen

The existence of an anomaly detection method that is optimal for all domains is a myth. Thus, there exists a plethora of anomaly detection methods, growing every year, for a wide variety of domains. But a strength can also be a weakness; given this massive library of methods, how can one select the best method for a given application? Current literature is focused on creating new anomaly detection methods or on large frameworks for experimenting with multiple methods at the same time. However, especially as the literature continues to expand, an extensive evaluation of every anomaly detection method is simply not feasible. To reduce this evaluation burden, we present guidelines for intelligently choosing the optimal anomaly detection methods based on the characteristics the time series displays, such as seasonality, trend, level change, concept drift, and missing time steps. We provide a comprehensive experimental validation and survey of twelve anomaly detection methods over different time series characteristics to form guidelines based on several metrics: the AUC (Area Under the Curve), windowed F-score, and the Numenta Anomaly Benchmark (NAB) scoring model. Applying our methodologies can save time and effort by surfacing the most promising anomaly detection methods instead of experimenting extensively with a rapidly expanding library of anomaly detection methods, especially in an online setting.
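The windowed F-score mentioned above rewards a detection that lands near a labeled anomaly rather than requiring an exact timestamp match. Below is a minimal sketch of one such scoring rule; the window half-width w and the one-to-one matching policy are illustrative assumptions, not the survey's exact definition.

def windowed_f_score(predicted, labeled, w=5, beta=1.0):
    """Count a prediction as a true positive if it falls within +/- w time
    steps of some labeled anomaly; each label can be matched at most once."""
    matched = set()
    tp = 0
    for p in sorted(predicted):
        hit = next((t for t in labeled if abs(p - t) <= w and t not in matched), None)
        if hit is not None:
            matched.add(hit)
            tp += 1
    fp = len(predicted) - tp
    fn = len(labeled) - len(matched)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

# Example: one of two detections falls within 5 steps of a labeled anomaly.
print(windowed_f_score(predicted=[103, 250], labeled=[100, 400], w=5))  # 0.5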

Author(s):  
Cynthia Freeman ◽  
Ian Beaver ◽  
Abdullah Mueen

The existence of a time series anomaly detection method that performs well for all domains is a myth. Given a massive library of available methods, how can one select the best method for a given application? An extensive evaluation of every anomaly detection method is not feasible. Many existing anomaly detection systems also include no avenue for human feedback, which is essential given the subjective nature of what counts as anomalous. We present a technique for improving univariate time series anomaly detection through automatic algorithm selection and human-in-the-loop false-positive removal. The selection rules were derived by extensively experimenting with over 30 pre-annotated time series from the open-source Numenta Anomaly Benchmark repository. Once the highest-performing anomaly detection methods are selected via these characteristics, humans can annotate the predicted outliers; the annotations are used to tune anomaly scores via subsequence similarity search and improve the selected methods for the data at hand, increasing evaluation scores and reducing the need for annotation by 70% on the predicted anomalies where annotation is used to improve F-scores.
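The subsequence similarity search used to tune anomaly scores can be pictured as follows: once a human marks a flagged subsequence as a false positive, subsequences that resemble it have their anomaly scores reduced, so the same benign pattern is not flagged again. This is a hypothetical sketch; the z-normalized Euclidean distance, the threshold, and the damping factor are assumptions, not the authors' exact procedure.

import numpy as np

def znorm(x):
    s = x.std()
    return (x - x.mean()) / s if s > 0 else x - x.mean()

def suppress_similar(series, scores, fp_start, m, threshold=2.0, damping=0.1):
    """Down-weight the anomaly scores of subsequences that resemble the
    annotated false positive series[fp_start:fp_start + m]."""
    series = np.asarray(series, dtype=float)
    adjusted = np.asarray(scores, dtype=float).copy()
    query = znorm(series[fp_start:fp_start + m])
    for i in range(len(series) - m + 1):
        if np.linalg.norm(znorm(series[i:i + m]) - query) <= threshold:
            adjusted[i:i + m] *= damping   # similar pattern: damp its score
    return adjusted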


2016 ◽  
Vol 136 (3) ◽  
pp. 363-372
Author(s):  
Takaaki Nakamura ◽  
Makoto Imamura ◽  
Masashi Tatedoko ◽  
Norio Hirai

2016 ◽  
Author(s):  
Milan Flach ◽  
Fabian Gans ◽  
Alexander Brenning ◽  
Joachim Denzler ◽  
Markus Reichstein ◽  
...  

Abstract. Today, many processes at the Earth's surface are constantly monitored by multiple data streams. These observations have become central to advancing our understanding of, e.g., vegetation dynamics in response to climate or land use change. Another set of important applications is monitoring the effects of climatic extreme events, other disturbances such as fires, or abrupt land transitions. One important methodological question is how to reliably detect anomalies in an automated and generic way within multivariate data streams, which typically vary seasonally and are interconnected across variables. Although many algorithms have been proposed for detecting anomalies in multivariate data, only a few have been investigated in the context of Earth system science applications. In this study, we systematically combine and compare feature extraction and anomaly detection algorithms for detecting anomalous events. Our aim is to identify suitable workflows for automatically detecting anomalous patterns in multivariate Earth system data streams. We rely on artificial data that mimic typical properties and anomalies in multivariate spatiotemporal Earth observations. This artificial experiment is needed as there is no 'gold standard' for the identification of anomalies in real Earth observations. Our results show that a well-chosen feature extraction step (e.g., subtracting seasonal cycles, or dimensionality reduction) is more important than the choice of a particular anomaly detection algorithm. Nevertheless, we identify three detection algorithms (k-nearest neighbours mean distance, kernel density estimation, and a recurrence approach) and their combinations (ensembles) that outperform other multivariate approaches as well as univariate extreme event detection methods. Our results therefore provide an effective workflow to automatically detect anomalies in Earth system science data.
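One of the workflows the study compares can be sketched as a two-step pipeline: subtract the mean seasonal cycle from each variable, then score every time step by its mean distance to its k nearest neighbours in the deseasonalized feature space. The period length and k below are illustrative assumptions.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def deseasonalize(X, period=365):
    """X: (n_steps, n_vars). Subtract the mean seasonal cycle of each variable."""
    X = np.asarray(X, dtype=float)
    cycle = np.array([X[p::period].mean(axis=0) for p in range(period)])
    return X - cycle[np.arange(len(X)) % period]

def knn_mean_distance(X, k=10):
    """Anomaly score per time step: mean distance to the k nearest neighbours."""
    dist, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    return dist[:, 1:].mean(axis=1)   # drop the zero self-distance

# scores = knn_mean_distance(deseasonalize(data, period=365), k=10)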


2016 ◽  
Vol 8 (3) ◽  
pp. 327-333 ◽  
Author(s):  
Rimas Ciplinskas ◽  
Nerijus Paulauskas

New and existing methods of cyber-attack detection are constantly being developed and improved because of the great number of attacks and the demand to protect against them. In practice, current attack detection methods operate like antivirus programs, i.e., signatures of known attacks are created and attacks are detected by matching against them. These methods have a drawback: they cannot detect new attacks. As a solution, anomaly detection methods are used. They detect deviations from normal network behaviour that may indicate a new type of attack. This article introduces a new method for detecting network flow anomalies using the local outlier factor algorithm. The research identified the groups of features that gave the best anomalous flow detection results, i.e., the highest values of precision, recall, and F-measure.
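A hedged sketch of the core step, using scikit-learn's local outlier factor on a small table of flow features, is shown below. The feature columns (packet count, byte count, duration, mean inter-arrival time) and the contamination rate are illustrative assumptions, not the feature groups the study evaluated.

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

flows = np.array([
    [120,  9.6e4, 1.2, 0.010],
    [115,  9.1e4, 1.1, 0.011],
    [130,  9.9e4, 1.3, 0.009],
    [5000, 4.0e6, 0.4, 0.0001],   # burst that deviates from normal behaviour
])

lof = LocalOutlierFactor(n_neighbors=2, contamination=0.25)
print(lof.fit_predict(flows))     # -1 marks an anomalous flow, e.g. [ 1  1  1 -1]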


2020 ◽  
Vol 39 (4) ◽  
pp. 5243-5252
Author(s):  
Zhen Lei ◽  
Liang Zhu ◽  
Youliang Fang ◽  
Xiaolei Li ◽  
Beizhan Liu

Pattern recognition technology is applied to bridge health monitoring, where the detection of abnormal monitoring data is of great significance. For abnormal data detection, this paper proposes a single-variable pattern anomaly detection method based on KNN distance and a multivariate time series anomaly detection method based on the covariance matrix and singular value decomposition. The method first compresses and segments the original data sequence based on important points to obtain multiple time subsequences, then calculates the pattern distance between the time subsequences according to the similarity measure of the time series, and finally selects the abnormal pattern according to the KNN method. The reliability of the method is verified through experiments. The experimental results show that the 5-, 7-, 9-, and 11-nearest-neighbour settings all point to the same specific nodes. Combined with the original time series plot of the corresponding time span, the value of temperature sensor No. 6 stays at 32.5 degrees Celsius for up to one month. The detection algorithm controls the number of MTS subsequences through sliding windows and sliding intervals; its execution time is modest, and although different values of K yield different results, most of the most obvious abnormal sequences can be detected. The results provide a useful reference for the study of abnormal data detection in bridge health monitoring.
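A simplified sketch of the single-variable KNN-distance idea is given below: split the series into subsequences, measure pairwise pattern distances, and flag the subsequences whose k-nearest-neighbour distances are largest. Fixed-length segmentation stands in for the paper's important-point compression, and the Euclidean pattern distance and the value of k are illustrative assumptions.

import numpy as np

def knn_pattern_scores(series, m=24, k=3):
    """Score each length-m subsequence by its mean distance to its k nearest
    neighbouring subsequences; larger scores indicate more unusual patterns."""
    series = np.asarray(series, dtype=float)
    subs = np.array([series[i:i + m] for i in range(0, len(series) - m + 1, m)])
    d = np.linalg.norm(subs[:, None, :] - subs[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)            # ignore each subsequence's self-distance
    return np.sort(d, axis=1)[:, :k].mean(axis=1)

# scores = knn_pattern_scores(sensor_readings, m=24, k=5)
# suspects = np.argsort(scores)[-3:]       # indices of the 3 most unusual segments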


Sensors ◽  
2020 ◽  
Vol 20 (20) ◽  
pp. 5895
Author(s):  
Jiansu Pu ◽  
Jingwen Zhang ◽  
Hui Shao ◽  
Tingting Zhang ◽  
Yunbo Rao

The development of the Internet has made social communication increasingly important for maintaining relationships between people. However, advertising and fraud are also growing incredibly fast and seriously affect our daily life, e.g., leading to money and time losses, junk information, and privacy problems. Therefore, it is very important to detect anomalies in social networks. However, existing anomaly detection methods cannot guarantee sufficient accuracy. Besides, due to the lack of labeled data, we also cannot use the detection results directly. In other words, we still need human analysts in the loop to provide enough judgment for decision making. To help experts analyze and explore the results of anomaly detection in social networks more objectively and effectively, we propose a novel visualization system, egoDetect, which can detect anomalies in social communication networks efficiently. Based on an unsupervised anomaly detection method, the system can detect anomalies without training and provide an overview quickly. We then explore an ego's topology and the relationship between egos and alters by designing a novel glyph based on the egocentric network. Besides, the system provides rich interactions for experts to quickly navigate to the users of interest for further exploration. We use an actual call dataset provided by an operator to evaluate our system. The results show that the proposed system is effective for anomaly detection in social networks.


2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Xuguang Liu

Aiming at the anomaly detection problem in sensor data, traditional algorithms usually focus only on the continuity of single-source data and ignore the spatiotemporal correlation between multisource data, which reduces detection accuracy to a certain extent. Besides, due to the rapid growth of sensor data, centralized cloud computing platforms cannot meet the real-time detection needs of large-scale abnormal data. To solve this problem, a real-time detection method for abnormal IoT sensor data based on edge computing is proposed. First, sensor data are represented as time series, and the K-nearest neighbor (KNN) algorithm is used to detect outliers and isolated groups in the data stream. Second, an improved DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm is proposed that considers the spatiotemporal correlation between multisource data. Its parameters can be set according to sample characteristics within the window, which overcomes the slow convergence caused by global parameters and large samples, and it makes full use of data correlation to complete anomaly detection. Moreover, this paper proposes a distributed anomaly detection model for sensor data based on edge computing, which performs data processing on computing resources as close to the data source as possible and improves the overall efficiency of data processing. Finally, simulation results show that the proposed method has higher computational efficiency and detection accuracy than traditional methods and is feasible in practice.
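The density-based step can be pictured as follows: cluster the readings inside each window with DBSCAN and treat the points it labels as noise as candidate anomalies. The window size, eps, and min_samples below are illustrative; the paper's improved DBSCAN derives its parameters from the samples in the window rather than using fixed global values.

import numpy as np
from sklearn.cluster import DBSCAN

def window_anomalies(readings, window=200, eps=0.5, min_samples=5):
    """readings: (n_steps, n_features) multisource sensor values per time step.
    Returns the indices of readings DBSCAN leaves outside every cluster."""
    arr = np.asarray(readings, dtype=float)
    arr = arr.reshape(len(arr), -1)
    flagged = []
    for start in range(0, len(arr), window):
        chunk = arr[start:start + window]
        if len(chunk) < min_samples:
            continue
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(chunk)
        flagged.extend(start + i for i, lab in enumerate(labels) if lab == -1)
    return flagged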


Anomaly detection has a vital role in data preprocessing and in mining exceptional points for marketing, network sensors, fraud detection, intrusion detection, and stock market analysis. Recent studies concentrate more on outlier detection for real-time datasets. Anomaly detection research at present focuses on developing innovative machine learning methods and on reducing computation time. Sentiment mining is the process of discovering how people feel about a particular topic. Though many anomaly detection techniques have been proposed, the literature lacks a comparative performance evaluation on sentiment mining datasets. In this study, three popular unsupervised anomaly detection approaches (density-based, statistical, and cluster-based) are evaluated on a movie review sentiment mining dataset. This paper sets a baseline for anomaly detection methods in sentiment mining research. The results show that the density-based (LOF) anomaly detection method suits the movie review sentiment dataset best.
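A hedged sketch of the kind of comparison described above, applying a density-based (LOF), a statistical (z-score), and a cluster-based (k-means distance) detector to the same feature matrix X, is shown below. The feature construction, thresholds, and parameter values are illustrative assumptions, not the study's exact setup.

import numpy as np
from scipy.stats import zscore
from sklearn.cluster import KMeans
from sklearn.neighbors import LocalOutlierFactor

def compare_detectors(X, contamination=0.05):
    n_out = max(1, int(contamination * len(X)))

    # density-based: LOF labels outliers with -1
    density = np.where(LocalOutlierFactor(n_neighbors=20,
                       contamination=contamination).fit_predict(X) == -1)[0]

    # statistical: points with the largest absolute z-score in any feature
    statistical = np.argsort(np.abs(zscore(X, axis=0)).max(axis=1))[-n_out:]

    # cluster-based: points farthest from their nearest k-means centroid
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    cluster = np.argsort(km.transform(X).min(axis=1))[-n_out:]

    return {"density": density, "statistical": statistical, "cluster": cluster}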


Author(s):  
Baoquan Wang ◽  
Tonghai Jiang ◽  
Xi Zhou ◽  
Bo Ma ◽  
Fan Zhao ◽  
...  

For anomaly detection in time series data, supervised methods require labeled data. In existing semi-supervised methods, the range of the outlier factor varies with the data, the model, and time, so the threshold for determining abnormality is difficult to obtain; in addition, the computational cost of calculating outlier factors from the other data points in the data set is very large. These issues make such methods difficult to apply in practice. This paper proposes a framework named LSTM-VE, which uses clustering combined with a visualization method to roughly label normal data and then uses the normal data to train a long short-term memory (LSTM) neural network for semi-supervised anomaly detection. The variance error (VE) of the classification probability sequence for the normal-data category is used as the outlier factor. The framework makes deep-learning-based anomaly detection practical to apply, and using VE avoids the shortcomings of existing outlier factors and achieves better performance. In addition, the framework is easy to extend because the LSTM neural network can be replaced with other classification models. Experiments on labeled and real unlabeled data sets show that the framework outperforms replicator neural networks with reconstruction error (RNN-RS) and has good scalability as well as practicability.
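The abstract does not spell out how the variance error is computed, so the sketch below is only one hypothetical reading: assume the trained LSTM emits, for each time step, the probability that the step belongs to the normal-data category, and use the variance of that probability sequence over a sliding window as the outlier factor. The window length and the thresholding rule are assumptions.

import numpy as np

def variance_error_scores(normal_probs, window=32):
    """Outlier factor per window: variance of the normal-class probabilities."""
    probs = np.asarray(normal_probs, dtype=float)
    return np.array([probs[i:i + window].var()
                     for i in range(len(probs) - window + 1)])

# probs = lstm_model.predict(windows)[:, normal_class]    # hypothetical model call
# scores = variance_error_scores(probs, window=32)
# anomalies = np.where(scores > scores.mean() + 3 * scores.std())[0]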


Author(s):  
Eamonn Keogh ◽  
Li Keogh ◽  
John C. Handley

Compression-based data mining is a universal approach to clustering, classification, dimensionality reduction, and anomaly detection. It is motivated by results in bioinformatics, learning, and computational theory that are not well known outside those communities. It is based on an easily computed compression dissimilarity measure (CDM) between objects obtained by compression. The basic concept is easy to understand, but its foundations are rigorously formalized in information theory. The similarity between any two objects (XML files, time series, text strings, molecules, etc.) can be obtained using a universal lossless compressor. The compression dissimilarity measure is the size of the compressed concatenation of the two objects divided by the sum of the compressed sizes of each of the objects. The intuition is that if two objects are similar, a lossless compressor will remove the redundancy between them, and the size of the compressed concatenation should be close to the size of the larger of the two compressed constituent objects. The larger the CDM between two objects, the more dissimilar they are. Classification, clustering, and anomaly detection algorithms can then use this dissimilarity measure in a wide variety of applications. Many of these are described in (Keogh et al., 2004), (Keogh et al., 2007), and references therein. This approach works well when (1) objects are large and it is computationally expensive to compute other distances (e.g., very long strings); or (2) there are no natural distances between the objects, or none that are reasonable from first principles. CDM is “parameter-free” and thus avoids over-fitting the data or relying upon assumptions that may be incorrect (Keogh et al., 2004).
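The CDM defined above is easy to compute with any off-the-shelf lossless compressor. The sketch below uses zlib purely for illustration; gzip, bzip2, or another compressor would serve equally well.

import zlib

def cdm(x: bytes, y: bytes) -> float:
    """CDM(x, y) = C(xy) / (C(x) + C(y)), where C() is the compressed size."""
    c = lambda b: len(zlib.compress(b, 9))
    return c(x + y) / (c(x) + c(y))

# Similar objects share redundancy, so their CDM is smaller.
a = b"abcabcabcabcabcabcabcabc" * 10
similar = b"abcabcabcabcabcabcabcabd" * 10
different = bytes(range(240))
print(cdm(a, similar) < cdm(a, different))   # expected: True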

