Towards Machine Learning-based Anomaly Detection on Time-Series Data

2021 ◽  
Vol 13 (1) ◽  
pp. 35-44
Author(s):  
Daniel Vajda ◽  
Adrian Pekar ◽  
Karoly Farkas

The complexity of network infrastructures is growing exponentially. Real-time monitoring of these infrastructures is essential to secure their reliable operation. The concept of telemetry has been introduced in recent years to foster this process by streaming time-series data that contain feature-rich information concerning the state of network components. In this paper, we focus on a particular application of telemetry: anomaly detection on time-series data. We rigorously examined state-of-the-art anomaly detection methods. Upon close inspection, we observed that none of them suits our requirements, as they typically face several limitations when applied to time-series data. This paper presents Alter-Re2, an improved version of ReRe, a state-of-the-art Long Short-Term Memory-based machine learning algorithm. Through a systematic examination, we demonstrate that by introducing the concepts of ageing and sliding window, the major limitations of ReRe can be overcome. We assessed the efficacy of Alter-Re2 using ten different datasets and achieved promising results: Alter-Re2 performs three times better on average than ReRe.
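As a rough illustration of the ageing and sliding-window ideas described in this abstract (the actual Alter-Re2 wraps ReRe's LSTM predictors, which are omitted here), the following minimal numpy sketch flags points that deviate from an age-weighted window statistic; the function name and decay parameter are hypothetical:

```python
import numpy as np

def sliding_window_detect(series, window=50, decay=0.97, k=3.0):
    """Flag points whose deviation from an age-weighted window mean
    exceeds k age-weighted standard deviations. `decay` implements the
    'ageing' idea: older points in the window contribute less."""
    flags = np.zeros(len(series), dtype=bool)
    for t in range(window, len(series)):
        win = series[t - window:t]
        w = decay ** np.arange(window - 1, -1, -1)  # newest point weighs most
        w /= w.sum()
        mu = np.dot(w, win)
        sigma = np.sqrt(np.dot(w, (win - mu) ** 2))
        flags[t] = abs(series[t] - mu) > k * sigma + 1e-12
    return flags

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 300)
x[200] += 10.0                      # inject one obvious anomaly
flags = sliding_window_detect(x)
print(flags[200])  # → True
```

The decay factor trades responsiveness for stability: values near 1 behave like a plain sliding window, smaller values forget old behaviour faster.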

Cryptography ◽  
2021 ◽  
Vol 5 (4) ◽  
pp. 28
Author(s):  
Hossein Sayadi ◽  
Yifeng Gao ◽  
Hosein Mohammadi Makrani ◽  
Jessica Lin ◽  
Paulo Cesar Costa ◽  
...  

According to recent security analysis reports, malicious software (a.k.a. malware) is rising at an alarming rate in numbers, complexity, and harmful purposes to compromise the security of modern computer systems. Recently, malware detection based on low-level hardware features (e.g., Hardware Performance Counters (HPCs) information) has emerged as an effective alternative solution to address the complexity and performance overheads of traditional software-based detection methods. Hardware-assisted Malware Detection (HMD) techniques depend on standard Machine Learning (ML) classifiers to detect signatures of malicious applications by monitoring built-in HPC registers during execution at run-time. Prior HMD methods, though effective, have limited their study to detecting malicious applications that are spawned as a separate thread during application execution; hence, detecting stealthy malware patterns at run-time remains a critical challenge. Stealthy malware refers to harmful cyber attacks in which malicious code is hidden within benign applications and remains undetected by traditional malware detection approaches. In this paper, we first present a comprehensive review of recent advances in hardware-assisted malware detection studies that have used standard ML techniques to detect malware signatures. Next, to address the challenge of stealthy malware detection at the processor's hardware level, we propose StealthMiner, a novel specialized time-series machine learning-based approach to accurately detect stealthy malware traces at run-time using branch instructions, the most prominent HPC feature. StealthMiner is based on a lightweight time-series Fully Convolutional Neural Network (FCN) model that automatically identifies potentially contaminated samples in HPC-based time-series data and utilizes them to accurately recognize the trace of stealthy malware.
Our analysis demonstrates that using state-of-the-art ML-based malware detection methods is not effective in detecting stealthy malware samples since the captured HPC data not only represents malware but also carries benign applications’ microarchitectural data. The experimental results demonstrate that with the aid of our novel intelligent approach, stealthy malware can be detected at run-time with 94% detection performance on average with only one HPC feature, outperforming the detection performance of state-of-the-art HMD and general time series classification methods by up to 42% and 36%, respectively.
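StealthMiner itself is a trained lightweight FCN over branch-instruction HPC time series; the untrained numpy forward pass below only illustrates the architectural shape it describes (convolution, ReLU, global average pooling, sigmoid score), with random weights and hypothetical names:

```python
import numpy as np

def conv1d_relu(x, kernels, bias):
    """'Same'-padded 1-D convolution over a univariate series followed by
    ReLU: x of shape (T,) and kernels (F, K) give feature maps (F, T)."""
    F, K = kernels.shape
    pad = K // 2
    xp = np.pad(x, pad)
    out = np.empty((F, len(x)))
    for f in range(F):
        for t in range(len(x)):
            out[f, t] = np.dot(kernels[f], xp[t:t + K])
    return np.maximum(out + bias[:, None], 0.0)

def fcn_score(x, kernels, bias, head_w, head_b):
    """FCN sketch: conv -> ReLU -> global average pooling -> sigmoid,
    yielding a malware score in (0, 1)."""
    feats = conv1d_relu(x, kernels, bias).mean(axis=1)  # pool over time
    return 1.0 / (1.0 + np.exp(-(np.dot(head_w, feats) + head_b)))

rng = np.random.default_rng(1)
hpc_window = rng.normal(size=64)      # stand-in branch-instruction counts
score = fcn_score(hpc_window, rng.normal(size=(8, 5)), np.zeros(8),
                  rng.normal(size=8), 0.0)
print(0.0 < score < 1.0)  # → True
```

Global average pooling is what makes the network "fully convolutional": the same weights apply to windows of any length.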


Author(s):  
Baoquan Wang ◽  
Tonghai Jiang ◽  
Xi Zhou ◽  
Bo Ma ◽  
Fan Zhao ◽  
...  

For anomaly detection on time-series data, supervised methods require labeled data. In existing semi-supervised methods, the range of the outlier factor varies with the data, model, and time, so the threshold for determining abnormality is difficult to obtain; in addition, the computational cost of calculating outlier factors from the other data points in the data set is very large. These drawbacks make such methods difficult to apply in practice. This paper proposes a framework named LSTM-VE, which uses clustering combined with a visualization method to roughly label normal data, and then uses the normal data to train a long short-term memory (LSTM) neural network for semi-supervised anomaly detection. The variance error (VE) of the normal-data category classification probability sequence is used as the outlier factor. The framework enables anomaly detection based on deep learning to be practically applied, and using VE avoids the shortcomings of existing outlier factors and yields better performance. In addition, the framework is easy to extend because the LSTM neural network can be replaced with other classification models. Experiments on labeled and real unlabeled data sets show that the framework outperforms replicator neural networks with reconstruction error (RNN-RS) and has good scalability as well as practicability.
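The paper defines VE precisely; one plausible reading of "variance error of the classification probability sequence" is a rolling variance of the classifier's normal-class probability, sketched below with a hypothetical function name (the LSTM classifier producing the probabilities is omitted):

```python
import numpy as np

def variance_error(prob_normal, window=20):
    """Outlier-factor sketch: rolling variance of the classifier's
    'normal' class probability. A confident, stable classifier yields a
    low score; erratic probabilities (a likely anomaly) yield a high one."""
    ve = np.zeros(len(prob_normal))
    for t in range(window, len(prob_normal)):
        ve[t] = np.var(prob_normal[t - window:t])
    return ve

# Steady high confidence, then an erratic stretch:
probs = np.concatenate([np.full(100, 0.95), np.tile([0.1, 0.9], 25)])
ve = variance_error(probs)
print(ve[99] < ve[149])  # → True
```

Because the score is computed from the model's own probability stream rather than from pairwise distances to other points, it sidesteps the quadratic cost the abstract attributes to existing outlier factors.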


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Zhiwei Ji ◽  
Jiaheng Gong ◽  
Jiarui Feng

Anomalies in time series, also called "discords," are abnormal subsequences. The occurrence of anomalies in a time series may indicate that some fault or disease will occur soon. Therefore, the development of novel computational approaches for anomaly detection (discord search) in time series is of great significance for state monitoring and early warning in real-time systems. Previous studies show that many algorithms were successfully developed and used for anomaly classification, e.g., health monitoring, traffic detection, and intrusion detection. However, anomaly detection in time series has not been as well studied. In this paper, we propose a long short-term memory- (LSTM-) based anomaly detection method (LSTMAD) for discord search in univariate time-series data. LSTMAD learns structural features from normal (nonanomalous) training data and then performs anomaly detection via a statistical strategy based on the prediction error for observed data. In our experimental evaluation using public ECG datasets and real-world datasets, LSTMAD detects anomalies more accurately than other existing approaches in comparison.
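The statistical strategy LSTMAD describes (learn error statistics on normal data, flag large test-time prediction errors) can be sketched with a least-squares AR(1) predictor standing in for the paper's LSTM; data, names, and the k-sigma rule here are illustrative assumptions:

```python
import numpy as np

def gen_ar1(n, a=0.8, s=0.1, rng=None):
    """Synthetic AR(1) series standing in for normal sensor telemetry."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = a * x[t - 1] + rng.normal(0.0, s)
    return x

def detect(train, test, k=3.0):
    """LSTMAD-style strategy with AR(1) in place of the LSTM: learn the
    prediction-error mean/std on normal training data, then flag test
    errors that fall more than k standard deviations away."""
    a = np.dot(train[:-1], train[1:]) / np.dot(train[:-1], train[:-1])
    err = train[1:] - a * train[:-1]
    mu, sigma = err.mean(), err.std()
    test_err = test[1:] - a * test[:-1]
    return np.abs(test_err - mu) > k * sigma

rng = np.random.default_rng(3)
train = gen_ar1(500, rng=rng)       # normal (nonanomalous) data only
test = gen_ar1(200, rng=rng)
test[100] += 5.0                    # anomalous spike
flags = detect(train, test)
print(flags[99])  # → True (error when predicting test[100] from test[99])
```

Training on normal data only is the key point: the threshold encodes what "ordinary" prediction error looks like, so any anomaly type that inflates the error is caught without anomaly labels.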


2021 ◽  
Vol 10 (2) ◽  
pp. 870-878
Author(s):  
Zainuddin Z. ◽  
P. Akhir E. A. ◽  
Hasan M. H.

Time-series data often involve large volumes and high dimensionality. Many industries generate time-series data that are continuously updated every second. Machine learning can help in managing such data: it can forecast future instances while handling large-data issues. Forecasting means predicting an upcoming event so that adverse circumstances in the current environment can be avoided. It helps sectors such as production foresee the state of a machine, saving the cost of a sudden breakdown, as unplanned machine failure can disrupt operations and cause losses of up to millions. Thus, this paper applies a deep learning algorithm, the recurrent neural network-gated recurrent unit (RNN-GRU), to forecast the state of machines producing time-series data in the oil and gas sector. RNN-GRU is a variant of the recurrent neural network (RNN) that can handle sequential data thanks to its update and reset gates. The gates decide which information is necessary to keep in memory. RNN-GRU has a simpler structure than long short-term memory (RNN-LSTM) and achieved 87% prediction accuracy.
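The update and reset gates mentioned in the abstract are the standard GRU equations; a single forward step in numpy (random weights, hypothetical dimensions, no training) shows how the two gates mix old state and candidate state:

```python
import numpy as np

def gru_cell(x, h, p):
    """One GRU step. z (update gate) decides how much old state to keep;
    r (reset gate) decides how much old state feeds the candidate."""
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    z = sig(p["Wz"] @ x + p["Uz"] @ h)
    r = sig(p["Wr"] @ x + p["Ur"] @ h)
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h))
    return (1.0 - z) * h + z * h_tilde

rng = np.random.default_rng(2)
d, hdim = 3, 4                       # input and hidden sizes (illustrative)
p = {k: rng.normal(scale=0.5, size=(hdim, d if k[0] == "W" else hdim))
     for k in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
h = np.zeros(hdim)
for x in rng.normal(size=(10, d)):   # run a short sequence through the cell
    h = gru_cell(x, h, p)
print(h.shape)  # → (4,)
```

Compared with an LSTM cell, the GRU drops the separate cell state and output gate, which is the "simpler structure" the abstract refers to.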


2020 ◽  
Vol 24 (21) ◽  
pp. 16453-16482 ◽  
Author(s):  
Pradeep Hewage ◽  
Ardhendu Behera ◽  
Marcello Trovati ◽  
Ella Pereira ◽  
Morteza Ghahremani ◽  
...  

Abstract Non-predictive or inaccurate weather forecasting can severely impact the community of users such as farmers. Numerical weather prediction models run in major weather forecasting centers with several supercomputers to solve simultaneous complex nonlinear mathematical equations. Such models provide medium-range weather forecasts, i.e., every 6 h up to 18 h with a grid length of 10–20 km. However, farmers often depend on more detailed short- to medium-range forecasts with higher-resolution regional forecasting models. Therefore, this research aims to address this by developing and evaluating a lightweight and novel weather forecasting system, which consists of one or more local weather stations and state-of-the-art machine learning techniques for weather forecasting using time-series data from these weather stations. To this end, the system explores the state-of-the-art temporal convolutional network (TCN) and long short-term memory (LSTM) networks. Our experimental results show that the proposed model using TCN produces better forecasts than the LSTM and other classic machine learning approaches. The proposed model can be used as an efficient localized weather forecasting tool for the community of users, and it could be run on a stand-alone personal computer.
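The TCN the study explores is built from causal dilated convolutions; a minimal numpy version of that building block (one channel, no residual connections, which a real TCN would add) shows the causality constraint:

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """Core TCN building block: causal convolution with dilation, so the
    output at time t depends only on x[t], x[t-d], x[t-2d], ... and
    never on future samples."""
    K = len(w)
    pad = (K - 1) * dilation
    xp = np.pad(x, (pad, 0))        # left-pad only: enforces causality
    return np.array([np.dot(w, xp[t:t + pad + 1:dilation])
                     for t in range(len(x))])

# A kernel whose only nonzero tap is the current sample acts as identity:
x = np.arange(6, dtype=float)
y = causal_dilated_conv(x, np.array([0.0, 0.0, 1.0]), dilation=2)
print(y)  # → [0. 1. 2. 3. 4. 5.]
```

Stacking such layers with dilations 1, 2, 4, ... grows the receptive field exponentially, which is how a TCN covers long weather histories with few layers.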


The stock market has been one of the primary revenue streams for many for years. The stock market is often incalculable and uncertain; therefore, predicting its ups and downs is an uphill task even for financial experts, who have been trying to tackle it with little success. But it is now possible to predict stock markets due to rapid improvements in technology, which have led to better processing speed and more accurate algorithms. It is necessary to forswear the misconception that prediction of the stock market is only meant for people who have expertise in finance; hence, an application can be developed to guide the user about the tempo of the stock market and the risk associated with it. The prediction of prices in the stock market is a complicated task, and there are various techniques used to solve the problem; this paper investigates some of these techniques and compares the accuracy of each. Forecasting time-series data is an important topic in economics, statistics, finance, and business. Of the many techniques for forecasting time-series data, such as the Autoregressive, Moving Average, and Autoregressive Integrated Moving Average models, it is the Autoregressive Integrated Moving Average (ARIMA) that has higher accuracy and higher precision than the other methods. And with recent advancements in the computational power of processors and in machine learning and deep learning, new algorithms can be devised to tackle the problem of predicting the stock market. This paper investigates one such machine learning algorithm for forecasting time-series data, the Long Short-Term Memory (LSTM), and compares it with traditional algorithms such as the ARIMA method to determine how superior the LSTM is to the traditional methods for predicting the stock market.
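The ARIMA baseline the paper compares against can be illustrated with the smallest useful configuration, ARIMA(1,1,0): difference once to remove the trend, fit an AR(1) on the differences, then integrate the forecast back to price levels. This numpy sketch is illustrative, not the paper's implementation:

```python
import numpy as np

def arima_110_forecast(series, steps=5):
    """ARIMA(p=1, d=1, q=0) sketch: difference once, fit AR(1) on the
    differences by least squares, forecast the differences, then
    cumulatively sum them back onto the last observed level."""
    d = np.diff(series)
    phi = np.dot(d[:-1], d[1:]) / np.dot(d[:-1], d[:-1])  # AR(1) fit
    fc, last_d, level = [], d[-1], series[-1]
    for _ in range(steps):
        last_d = phi * last_d        # forecast next difference
        level += last_d              # integrate back to the level
        fc.append(level)
    return np.array(fc)

trend = np.arange(100, dtype=float)  # a steadily rising "price"
pred = arima_110_forecast(trend, steps=3)
print(pred)  # → [100. 101. 102.]
```

On a perfectly linear series the fitted AR coefficient is 1, so the forecast simply continues the trend; on real prices the differencing step is what lets the model cope with non-stationarity.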


2021 ◽  
Vol 2 (6) ◽  
Author(s):  
Ahmad Idris Tambuwal ◽  
Daniel Neagu

Abstract Time-series anomaly detection receives increasing research interest given the growing number of data-rich application domains. Recent additions to anomaly detection methods in the research literature include deep neural networks (DNNs: e.g., RNN, CNN, and Autoencoder). The nature and performance of these algorithms in sequence analysis enable them to learn hierarchical discriminative features and the temporal nature of time series. However, their performance is affected by the usual assumption of a Gaussian distribution on the prediction error, which is either ranked or thresholded to label data instances as anomalous or not. An exact parametric distribution is often not directly relevant in many applications, though, and this can produce faulty decisions from false anomaly predictions due to high variations in data interpretation. The expectation is to produce outputs characterized by a level of confidence. Thus, implementations need a Prediction Interval (PI) that quantifies the level of uncertainty associated with the DNN point forecasts, which helps in making better-informed decisions and mitigates against false anomaly alerts. An effort has been made to reduce false anomaly alerts through the use of quantile regression for the identification of anomalies, but it is limited to using the quantile interval to identify uncertainties in the data. In this paper, an improved time-series anomaly detection method called deep quantile regression anomaly detection (DQR-AD) is proposed. The proposed method goes further by using the quantile interval (QI) as an anomaly score and comparing it with a threshold to identify anomalous points in time-series data. Tests of the proposed method on publicly available anomaly benchmark datasets demonstrate its effective performance over other methods that assume a Gaussian distribution on the prediction or reconstruction cost for the detection of anomalies. This shows that our method is potentially less sensitive to data distribution than existing approaches.
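Two ingredients of the quantile-regression approach can be sketched without the deep model: the pinball loss that trains a network to output a quantile instead of a mean, and a quantile-interval anomaly score. Here cheap empirical window quantiles stand in for the deep quantile regressor, and all names are illustrative:

```python
import numpy as np

def pinball_loss(y, q_pred, tau):
    """Quantile (pinball) loss: the asymmetric penalty that makes a
    regressor learn the tau-th conditional quantile rather than the mean."""
    e = y - q_pred
    return np.mean(np.maximum(tau * e, (tau - 1) * e))

def qi_anomaly_scores(series, window=50, lo=0.05, hi=0.95):
    """DQR-AD-flavoured sketch: score each point by how far it falls
    outside the [lo, hi] quantile interval of the preceding window
    (empirical quantiles replace the deep quantile-regression model)."""
    scores = np.zeros(len(series))
    for t in range(window, len(series)):
        ql, qh = np.quantile(series[t - window:t], [lo, hi])
        scores[t] = max(series[t] - qh, ql - series[t], 0.0)
    return scores

rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, 300)
x[200] += 10.0                      # anomalous excursion
scores = qi_anomaly_scores(x)
print(scores[200] > 3.0)  # → True
```

Because the interval is estimated from the data rather than from a fitted Gaussian, heavy-tailed or skewed error distributions do not inflate the false-alert rate the way a parametric threshold can.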


2018 ◽  
Author(s):  
César Capinha

Abstract Spatiotemporal forecasts of ecological phenomena are highly useful and significant in scientific and socio-economic applications. Nevertheless, developing the correlative models to make these forecasts is often stalled by the inadequate availability of ecological time-series data. On the contrary, considerable amounts of temporally discrete biological records are being stored in public databases, and often include the sites and dates of the observations. While these data are reasonably suitable for the development of spatiotemporal forecast models, this possibility remains mostly untested. In this paper, we test an approach to develop spatiotemporal forecasts based on the dates and locations found in species occurrence records. This approach is based on 'time-series classification', a field of machine learning, and involves the application of a machine-learning algorithm to discriminate between time-series representing the environmental conditions that precede the occurrence records and time-series representing other environmental conditions, such as those that generally occur at the sites of the records. We employed this framework to predict the timing of emergence of fruiting bodies of two mushroom species (Boletus edulis and Macrolepiota procera) in countries of Europe, from 2009 to 2015. We compared the predictions from this approach with those from a 'null' model based on the calendar dates of the records. Forecasts made with the environmental-based approach were consistently superior to those drawn from the date-based approach, averaging an area under the receiver operating characteristic curve (AUC) of 0.9 for B. edulis and 0.88 for M. procera, compared to an average AUC of 0.83 achieved by the null models for both species. Prediction errors were distributed across the study area and along the years, lending support to the spatiotemporal representativeness of the accuracy values measured. Our approach, based on species occurrence records, was able to provide useful forecasts of the timing of emergence of two mushroom species across Europe. Given the increasing availability of and information contained in this type of record, particularly those supplemented with photographs, the range of events it could be possible to forecast is vast.
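The core of the framework is an ordinary time-series classifier applied to environmental windows; a toy 1-nearest-neighbour version (the paper uses a trained time-series classification algorithm, and the temperature patterns and labels below are entirely hypothetical) shows the shape of the task:

```python
import numpy as np

def nn_classify(query, train_windows, train_labels):
    """Minimal time-series classification: 1-nearest neighbour under
    Euclidean distance between fixed-length environmental windows."""
    dists = [np.linalg.norm(query - w) for w in train_windows]
    return train_labels[int(np.argmin(dists))]

# Hypothetical weekly-temperature windows: a 'pre-occurrence' pattern
# (warm then cooling, label 1) versus a flat background pattern (label 0).
rng = np.random.default_rng(5)
pre = np.linspace(20.0, 8.0, 8)
flat = np.full(8, 15.0)
windows = [pre + rng.normal(0, 0.5, 8) for _ in range(5)] + \
          [flat + rng.normal(0, 0.5, 8) for _ in range(5)]
labels = [1] * 5 + [0] * 5
print(nn_classify(pre + rng.normal(0, 0.5, 8), windows, labels))  # → 1
```

The occurrence records supply only the positive windows; the "other environmental conditions" class is sampled from the same sites at other times, exactly as the abstract describes.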


2021 ◽  
Vol 13 (19) ◽  
pp. 10963
Author(s):  
Simona-Vasilica Oprea ◽  
Adela Bâra ◽  
Florina Camelia Puican ◽  
Ioan Cosmin Radu

When analyzing smart metering data, both reading errors and frauds can be identified. The purpose of this analysis is to alert the utility companies to suspicious consumption behavior that could be further investigated with on-site inspections or other methods. The use of Machine Learning (ML) algorithms to analyze consumption readings can lead to the identification of malfunctions, cyberattacks interrupting measurements, or physical tampering with smart meters. Fraud detection is one of the classical anomaly detection examples, as it is not easy to label consumption or transactional data. Furthermore, frauds differ in nature, and learning is not always possible. In this paper, we analyze large datasets of readings provided by smart meters installed in a trial study in Ireland by applying a hybrid approach. More precisely, we propose an unsupervised ML technique to detect anomalous values in the time series, establish a threshold for the percentage of anomalous readings out of the total readings, and then label the time series as suspicious or not. Initially, we propose two types of algorithms for anomaly detection on unlabeled data: the Spectral Residual-Convolutional Neural Network (SR-CNN) and an anomaly-trained model based on martingales for determining variations in time-series data streams. Then, the Two-Class Boosted Decision Tree and Fisher Linear Discriminant analysis are applied to the previously processed dataset. By training the model, we obtain the required capability of detecting suspicious consumers, proven by an accuracy of 90%, a precision score of 0.875, and an F1 score of 0.894.
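The Spectral Residual step that SR-CNN builds on is compact enough to sketch directly: subtract a locally averaged log-amplitude spectrum from the raw one, keep the phase, and invert the FFT to obtain a saliency map (the CNN that SR-CNN trains on top of this map is omitted here):

```python
import numpy as np

def sr_saliency(x, avg_window=3):
    """Spectral Residual saliency map: points whose spectral signature
    deviates from the smoothed spectrum (e.g., isolated spikes) come
    out with large saliency values."""
    fft = np.fft.fft(x)
    log_amp = np.log(np.abs(fft) + 1e-8)
    smoothed = np.convolve(log_amp, np.ones(avg_window) / avg_window,
                           mode="same")
    residual = log_amp - smoothed
    saliency = np.abs(np.fft.ifft(np.exp(residual + 1j * np.angle(fft))))
    return saliency

t = np.linspace(0.0, 20.0, 400)
reading = np.sin(t)                  # periodic consumption pattern
reading[250] += 6.0                  # tampering-like spike
sal = sr_saliency(reading)
print(sal[250] > sal.mean())  # → True
```

Regular periodic consumption is suppressed by the smoothing, so the saliency map is nearly flat except at irregularities, which is what makes it a good unsupervised front end for the later supervised classifiers.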

