An Anomaly Detection and Explainability Framework using Convolutional Autoencoders for Data Storage Systems

Anomaly detection in data storage systems is a challenging problem due to the high-dimensional sequential data involved and the lack of labels. The state of the art for automating anomaly detection in these systems typically relies on hand-crafted rules and thresholds, which mainly distinguish between normal and abnormal behavior of each indicator in isolation. In this work we present an end-to-end framework based on convolutional autoencoders which not only allows for anomaly detection on multivariate time series data, but also provides explainability. This is done by identifying similar historic anomalies and extracting the most influential indicators. These are then presented to relevant personnel such as system designers and architects, or to support engineers for further analysis. We demonstrate the application of this framework along with an intuitive interactive web interface which was developed for data storage system anomaly detection. We discuss how this framework, along with its explainability aspects, enables support engineers to effectively tackle abnormal behaviors, all while allowing for crucial feedback.
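The abstract does not say how the most influential indicators are extracted. One common approach, shown here purely as an assumption and not as the authors' method, is to rank indicators by their share of the autoencoder's squared reconstruction error over a window. The function name, the indicator names, and the sample values are hypothetical.

```python
def rank_indicators(window, reconstruction, names, top_k=3):
    """Rank indicators by their share of the squared reconstruction error.

    window, reconstruction: lists of time steps, each a list with one value
    per indicator. The reconstruction may come from any autoencoder.
    """
    n_steps = len(window)
    errors = {}
    for j, name in enumerate(names):
        # Mean squared error of indicator j across the window.
        errors[name] = sum(
            (window[t][j] - reconstruction[t][j]) ** 2 for t in range(n_steps)
        ) / n_steps
    total = sum(errors.values()) or 1.0  # guard against a perfect reconstruction
    ranked = sorted(errors.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, err / total) for name, err in ranked[:top_k]]


# Hypothetical two-step window with three storage indicators.
names = ["latency", "iops", "queue_depth"]
window = [[1.0, 5.0, 0.2], [1.1, 5.0, 0.2]]
reconstruction = [[1.0, 3.0, 0.2], [1.1, 3.0, 0.3]]
top = rank_indicators(window, reconstruction, names, top_k=2)
```

Here `top` lists `iops` first because it accounts for nearly all of the reconstruction error in the window.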

Change Point Enhanced Anomaly Detection for IoT Time Series Data

Water ◽

10.3390/w13121633 ◽

2021 ◽

Vol 13 (12) ◽

pp. 1633

Author(s):

Elena-Simona Apostol ◽

Ciprian-Octavian Truică ◽

Florin Pop ◽

Christian Esposito

Keyword(s):

Time Series ◽

Anomaly Detection ◽

Change Point ◽

Time Series Data ◽

Multivariate Time Series ◽

Change Point Detection ◽

Change Points ◽

Series Data ◽

Prediction And Forecasting ◽

Point Detection

Due to the exponential growth of Internet of Things networks and the massive amount of time series data collected from these networks, it is essential to apply efficient methods for Big Data analysis in order to extract meaningful information and statistics. Anomaly detection is an important part of time series analysis, improving the quality of further analysis, such as prediction and forecasting. Thus, detecting sudden change points that reflect normal behavior and using them to separate out abnormal behavior, i.e., outliers, is a crucial step used to minimize the false positive rate and to build accurate machine learning models for prediction and forecasting. In this paper, we propose a rule-based decision system that enhances anomaly detection in multivariate time series using change point detection. Our architecture uses a pipeline that automatically detects real anomalies and removes the false positives introduced by change points. We employ both traditional and deep learning unsupervised algorithms: in total, five anomaly detection and five change point detection algorithms. Additionally, we propose a new confidence metric based on the support for a time series point to be an anomaly and the support for the same point to be a change point. In our experiments, we use a large real-world dataset containing multivariate time series about water consumption collected from smart meters. As an evaluation metric, we use Mean Absolute Error (MAE). The low MAE values show that the algorithms accurately determine anomalies and change points. The experimental results strengthen our assumption that anomaly detection can be improved by determining and removing change points, and validate the correctness of our proposed rules in real-world scenarios. Furthermore, the proposed rule-based decision support system enables users to make informed decisions regarding the status of the water distribution network and to perform predictive and proactive maintenance effectively.
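The abstract does not give the confidence formula or the decision rules, so the following is only an illustrative sketch under stated assumptions: support is taken to be the fraction of detectors that flag a point, and a point is kept as an anomaly only when its anomaly support reaches a minimum and exceeds its change point support. The function names, the `min_support` parameter, and the votes are hypothetical.

```python
def support(votes):
    """Fraction of detectors that flagged the point (votes are 0 or 1)."""
    return sum(votes) / len(votes)


def filter_anomalies(anomaly_votes, change_votes, min_support=0.5):
    """Keep points that look like anomalies rather than change points.

    anomaly_votes, change_votes: one list of 0/1 votes per time step,
    e.g. five anomaly detectors and five change point detectors.
    Returns (time index, confidence) pairs, where confidence is assumed
    to be anomaly support minus change point support.
    """
    kept = []
    for t, (a, c) in enumerate(zip(anomaly_votes, change_votes)):
        a_sup, c_sup = support(a), support(c)
        if a_sup >= min_support and a_sup > c_sup:
            kept.append((t, a_sup - c_sup))
    return kept


# Hypothetical votes for three time steps.
anomaly_votes = [[1, 1, 1, 1, 0], [1, 1, 1, 0, 0], [0, 0, 0, 0, 1]]
change_votes = [[0, 0, 0, 0, 0], [1, 1, 1, 1, 0], [0, 0, 0, 0, 0]]
kept = filter_anomalies(anomaly_votes, change_votes)
```

In this example only the first time step survives: the second has stronger change point support than anomaly support, and the third has too little anomaly support.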

Clustering-based anomaly detection in multivariate time series data

Applied Soft Computing ◽

10.1016/j.asoc.2020.106919 ◽

2021 ◽

Vol 100 ◽

pp. 106919

Author(s):

Jinbo Li ◽

Hesam Izakian ◽

Witold Pedrycz ◽

Iqbal Jamal

Keyword(s):

Time Series ◽

Anomaly Detection ◽

Time Series Data ◽

Multivariate Time Series ◽

Series Data

REQUEST BALANCING METHOD FOR INCREASING THEIR PROCESSING EFFICIENCY WITH INFORMATION REPLICATION IN A DISTRIBUTED DATA STORAGE SYSTEM

TECHNICAL SCIENCES AND TECHNOLOGIES ◽

10.25140/2411-5363-2021-2(24)-75-82 ◽

2021 ◽

pp. 75-82

Author(s):

Igor Boyarshin ◽

Anna Doroshenko ◽

Pavlo Rehida

Keyword(s):

Data Storage ◽

Storage Systems ◽

Storage System ◽

New Method ◽

Distributed Data ◽

Processing Efficiency ◽

Distributed Data Storage ◽

Shared Data ◽

Multiple Data ◽

Data Storage System

The article describes a new method that uses replication to improve the efficiency of systems that store data shared among many users and provide access to it. Existing methods of load balancing in data storage systems are described, namely round robin (RR) and weighted round robin (WRR). A new method of request balancing among multiple data storage nodes is proposed that can adjust to the intensity of the input request stream in real time while using disk space efficiently.
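For context, the two baseline methods named above can be sketched in a few lines. This is a minimal illustration of generic RR and WRR scheduling, not the article's proposed method, which the abstract does not describe in enough detail to reproduce. The node names and weights are hypothetical.

```python
from itertools import cycle


def round_robin(nodes):
    """Yield nodes in a fixed rotation, one request per node per turn."""
    return cycle(nodes)


def weighted_round_robin(weights):
    """Yield nodes in rotation, repeating each node `weight` times per cycle.

    weights: dict mapping node name to a positive integer weight.
    This is the simplest WRR variant; it sends bursts to heavy nodes.
    """
    schedule = [node for node, w in weights.items() for _ in range(w)]
    return cycle(schedule)


# Hypothetical storage nodes: node "a" can take twice the load of "b".
rr = round_robin(["a", "b", "c"])
rr_picks = [next(rr) for _ in range(4)]

wrr = weighted_round_robin({"a": 2, "b": 1})
wrr_picks = [next(wrr) for _ in range(6)]
```

Both schedules are static: neither reacts to the current request rate, which is the limitation the proposed method is said to address.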

An anomaly detection approach based on the combination of LSTM autoencoder and isolation forest for multivariate time series data

Developments of Artificial Intelligence Technologies in Computation and Robotics ◽

10.1142/9789811223334_0071 ◽

2020 ◽

Author(s):

Phuong Hanh Tran ◽

Cédric Heuchenne ◽

Sébastien Thomassey

Keyword(s):

Time Series ◽

Anomaly Detection ◽

Time Series Data ◽

Multivariate Time Series ◽

Series Data ◽

Detection Approach ◽

Isolation Forest

A File System Construction Method in Sequential Data Storage System

Procedia Environmental Sciences ◽

10.1016/j.proenv.2012.01.339 ◽

2012 ◽

Vol 12 ◽

pp. 714-720

Author(s):

Wei Liu ◽

Chao Wang ◽

Chen Zhou

Keyword(s):

Data Storage ◽

File System ◽

Storage System ◽

Construction Method ◽

Sequential Data ◽

System Construction ◽

Data Storage System

Benchmarking database systems for Genomic Selection implementation

Database ◽

10.1093/database/baz096 ◽

2019 ◽

Vol 2019 ◽

Cited By ~ 1

Author(s):

Yaw Nti-Addae ◽

Dave Matthews ◽

Victor Jun Ulat ◽

Raza Syed ◽

Guilhem Sempéré ◽

...

Keyword(s):

Genomic Selection ◽

Data Storage ◽

Storage Systems ◽

Storage System ◽

Database Systems ◽

Turnaround Time ◽

Breeding Programs ◽

Open Source Data ◽

Data Extract ◽

Data Storage System

Motivation: With high-throughput genotyping systems now available, it has become feasible to fully integrate genotyping information into breeding programs. Making effective use of this information requires DNA extraction facilities and marker production facilities that can efficiently deploy the desired set of markers across samples with a rapid turnaround time that allows for selection before crosses need to be made. In reality, breeders often have a short window of time to make decisions by the time they are able to collect all their phenotyping data and receive corresponding genotyping data. This presents a challenge to organize information and utilize it in downstream analyses to support decisions made by breeders. In order to implement genomic selection routinely as part of breeding programs, one would need an efficient genotyping data storage system. We selected and benchmarked six popular open-source data storage systems, including relational database management and columnar storage systems. Results: We found that data extract times are greatly influenced by the orientation in which genotype data is stored in a system. HDF5 consistently performed best, in part because it can more efficiently work with both orientations of the allele matrix. Availability: http://gobiin1.bti.cornell.edu:6083/projects/GBM/repos/benchmarking/browse

Storage systems for IT infrastructure

PROBLEMS IN PROGRAMMING ◽

10.15407/pp2020.02-03.082 ◽

2020 ◽

pp. 082-093

Author(s):

S.Yu. Punda

Keyword(s):

Data Storage ◽

System Performance ◽

Storage Systems ◽

Storage System ◽

It Infrastructure ◽

Solid State Drives ◽

Advantages And Disadvantages ◽

System A ◽

Data Storage System ◽

Business User

A review of modern data storage architectures was conducted, and the advantages and disadvantages of each are given. The data storage systems of the IBM FlashSystem family were analyzed, as well as Spectrum Virtualize software, which is responsible for virtualization, compression, distribution and replication of data stored on the storage system. A mathematical model of the IBM Storwize v5030E data storage system was developed. Well-known metrics are used to evaluate its performance when using spindle and solid-state drives. The effect of hardware and software data compression on system performance was determined experimentally. Recommendations are formulated that help a business user determine which media and which technology stack to use for the tasks at hand.

MILAD: Robust Anomaly Detection for Electric Vehicles with Label Noise

Journal of Physics Conference Series ◽

10.1088/1742-6596/2132/1/012047 ◽

2021 ◽

Vol 2132 (1) ◽

pp. 012047

Author(s):

Yu Ye ◽

Bailin Feng ◽

Wujun Tao

Keyword(s):

Anomaly Detection ◽

Electric Vehicles ◽

Time Series Data ◽

Multivariate Time Series ◽

Series Data ◽

Battery System ◽

Label Noise ◽

Safety Hazard ◽

Detection Algorithms ◽

Potential Safety

One of the bottlenecks restricting the development of the electric vehicle industry is the safety problem. Although numerous anomaly detection algorithms for electric vehicles have been proposed, most of them may perform poorly due to the complexity and unpredictability of real scenes. We consider that there may be a certain degree of potential safety hazard in the battery system of electric vehicles before, during and after the process of faults in real scenes, that is, label noise. To solve this problem, we propose a Multi-Instance Learning based Anomaly Detection (MILAD) framework to perform anomaly detection for electric vehicles under label noise. Extensive cross-validation experiments verify that the framework can effectively detect abnormal conditions in the presence of label noise in multivariate time series data.

Overview of Big-Data-Intensive Storage and Its Technologies

Advances in Data Mining and Database Management - Handbook of Research on Big Data Storage and Visualization Techniques ◽

10.4018/978-1-5225-3142-5.ch002 ◽

2018 ◽

pp. 33-74

Author(s):

Richard S. Segall ◽

Jeffrey S. Cook

Keyword(s):

Big Data ◽

Data Storage ◽

Storage Systems ◽

Storage System ◽

Management Strategies ◽

Sensor Data ◽

Data Intensive Computing ◽

Data Intensive ◽

Future Challenges ◽

Data Storage System

This chapter discusses in detail the storage systems for data-intensive computing using Big Data. The chapter begins with a brief introduction to data-intensive computing and the types of parallel processing approaches. It also highlights how data-intensive computing systems differ from other forms of computing. A discussion of the importance of Big Data computing is put forth. The current and future challenges of storage in genomics are discussed in detail. Storage and data management strategies are also given. The chapter then focuses on the software challenges for storage. Storage use cases such as DataDirect Networks and SDSC are provided, along with a list of storage tools and their details. A small section discusses the sensor data storage system. A table then shows the top 10 cloud storage systems for data-intensive computing using Big Data in the world. Statistics on the Top 500 Big Data storage servers are also shown using images from the Top500 website.

Anomaly detection based on multivariate data for the aircraft hydraulic system

Proceedings of the Institution of Mechanical Engineers Part I Journal of Systems and Control Engineering ◽

10.1177/0959651820954577 ◽

2020 ◽

pp. 095965182095457

Author(s):

Hongsheng Yan ◽

Jianzhong Sun ◽

Hongfu Zuo

Keyword(s):

Decision Making ◽

Anomaly Detection ◽

Hydraulic System ◽

Time Series Data ◽

Short Term Memory ◽

Detection Efficiency ◽

Multivariate Time Series ◽

Monitoring Data ◽

Series Data ◽

The Difference

It is almost impossible to detect the health status of the aircraft hydraulic system via a single variable, because of the complexity of the system and the coupling between its components. To this end, a novel anomaly detection method considering multivariate monitoring data is proposed in this article. An unsupervised auto-encoder model with long short-term memory layers is used to reconstruct multivariate time series data, and a new comprehensive decision-making index based on two conventional ones is proposed to measure the difference between the observation and the reconstruction. Then, the health threshold of the decision-making index can be calculated by kernel density estimation. The flight data are divided into several samples, and the anomaly status of each sample is determined by a specific rule. The health status of each flight is determined by voting based on the detection results of all samples included in the flight. The performance of the proposed method is validated on real continuous monitoring data, and the results confirm that the proposed model overcomes the problems of multistage and multivariate parameters in the anomaly detection of the aircraft system and improves the detection efficiency.
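The thresholding and voting steps can be sketched as follows. This is a minimal illustration under several assumptions that the abstract does not confirm: a Gaussian kernel with Silverman's rule-of-thumb bandwidth, a threshold placed at the 0.99 quantile of the estimated density of healthy scores, and a simple majority vote per flight. The function names and the scores are hypothetical, and the decision-making index itself is not reproduced.

```python
import math
from statistics import stdev


def kde_cdf(x, data, h):
    """CDF of a Gaussian kernel density estimate with bandwidth h."""
    return sum(
        0.5 * (1.0 + math.erf((x - xi) / (h * math.sqrt(2.0)))) for xi in data
    ) / len(data)


def kde_threshold(data, quantile=0.99):
    """Score below which `quantile` of the estimated healthy density lies."""
    h = 1.06 * stdev(data) * len(data) ** (-0.2)  # Silverman's rule of thumb
    lo, hi = min(data) - 10 * h, max(data) + 10 * h
    for _ in range(60):  # bisection on the monotone CDF
        mid = (lo + hi) / 2
        if kde_cdf(mid, data, h) < quantile:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def flight_is_anomalous(sample_scores, threshold):
    """Majority vote over the samples of one flight (assumed voting rule)."""
    votes = sum(score > threshold for score in sample_scores)
    return votes > len(sample_scores) / 2


# Hypothetical decision-index scores from healthy flights.
healthy = [0.10, 0.12, 0.11, 0.09, 0.13, 0.10, 0.11, 0.12]
threshold = kde_threshold(healthy)
```

A flight whose samples mostly score above `threshold` is flagged, while a single outlying sample is outvoted by the rest.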
