From anomaly detection to rumour detection using data streams of social platforms

Abstract. Today, many processes at the Earth's surface are constantly monitored by multiple data streams. These observations have become central to advancing our understanding of vegetation dynamics in response to climate or land use change. Another set of important applications is monitoring effects of extreme climatic events, other disturbances such as fires, or abrupt land transitions. One important methodological question is how to reliably detect anomalies in an automated and generic way within multivariate data streams, which typically vary seasonally and are interconnected across variables. Although many algorithms have been proposed for detecting anomalies in multivariate data, only a few have been investigated in the context of Earth system science applications. In this study, we systematically combine and compare feature extraction and anomaly detection algorithms for detecting anomalous events. Our aim is to identify suitable workflows for automatically detecting anomalous patterns in multivariate Earth system data streams. We rely on artificial data that mimic typical properties and anomalies in multivariate spatiotemporal Earth observations, such as sudden changes in basic time-series characteristics (e.g., the sample mean and the variance), changes in the cycle amplitude, and trends. This artificial experiment is needed as there is no gold standard for the identification of anomalies in real Earth observations. Our results show that a well-chosen feature extraction step (e.g., subtracting seasonal cycles, or dimensionality reduction) is more important than the choice of a particular anomaly detection algorithm. Nevertheless, we identify three detection algorithms (k-nearest neighbors mean distance, kernel density estimation, a recurrence approach) and their combinations (ensembles) that outperform other multivariate approaches as well as univariate extreme-event detection methods. Our results therefore provide an effective workflow to automatically detect anomalies in Earth system science data.
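
As a rough illustration of the kind of workflow the abstract describes (a feature extraction step followed by a multivariate detector), the sketch below subtracts a mean seasonal cycle and then scores each time step by its k-nearest-neighbours mean distance. It is not the authors' implementation: the function names, the synthetic two-variable series, the period of 12, and k = 5 are all assumptions made for this example.

```python
import math
import random


def subtract_seasonal_cycle(series, period):
    """Subtract the mean value observed at each phase of the cycle."""
    sums = [0.0] * period
    counts = [0] * period
    for i, x in enumerate(series):
        sums[i % period] += x
        counts[i % period] += 1
    cycle = [s / c for s, c in zip(sums, counts)]
    return [x - cycle[i % period] for i, x in enumerate(series)]


def knn_mean_distance(points, k):
    """Score each point by the mean Euclidean distance to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Hypothetical data: two seasonal variables with one injected anomaly at t = 60.
random.seed(0)
period, n = 12, 120
var_a = [math.sin(2 * math.pi * t / period) + random.gauss(0, 0.1) for t in range(n)]
var_b = [math.cos(2 * math.pi * t / period) + random.gauss(0, 0.1) for t in range(n)]
var_a[60] += 2.0
var_b[60] -= 2.0

# Feature extraction first, then multivariate anomaly scoring.
features = list(zip(subtract_seasonal_cycle(var_a, period),
                    subtract_seasonal_cycle(var_b, period)))
scores = knn_mean_distance(features, k=5)
most_anomalous = max(range(n), key=scores.__getitem__)
print(most_anomalous)
```

The brute-force neighbour search is quadratic in the number of time steps, which is acceptable for a toy series but would likely need an index structure for real data streams.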


Multivariate Anomaly Detection for Earth Observations: A Comparison of Algorithms and Feature Extraction Techniques

10.5194/esd-2016-51 ◽

2016 ◽

Cited By ~ 1

Author(s):

Milan Flach ◽

Fabian Gans ◽

Alexander Brenning ◽

Joachim Denzler ◽

Markus Reichstein ◽

...

Keyword(s):

Feature Extraction ◽

Anomaly Detection ◽

Data Streams ◽

Multivariate Data ◽

Detection Methods ◽

Earth System ◽

Earth System Science ◽

System Science ◽

Detection Algorithms ◽

Earth Observations

Abstract. Today, many processes at the Earth's surface are constantly monitored by multiple data streams. These observations have become central to advancing our understanding of, e.g., vegetation dynamics in response to climate or land use change. Another set of important applications is monitoring the effects of climatic extreme events, other disturbances such as fires, or abrupt land transitions. One important methodological question is how to reliably detect anomalies in an automated and generic way within multivariate data streams, which typically vary seasonally and are interconnected across variables. Although many algorithms have been proposed for detecting anomalies in multivariate data, only a few have been investigated in the context of Earth system science applications. In this study, we systematically combine and compare feature extraction and anomaly detection algorithms for detecting anomalous events. Our aim is to identify suitable workflows for automatically detecting anomalous patterns in multivariate Earth system data streams. We rely on artificial data that mimic typical properties and anomalies in multivariate spatiotemporal Earth observations. This artificial experiment is needed as there is no 'gold standard' for the identification of anomalies in real Earth observations. Our results show that a well-chosen feature extraction step (e.g., subtracting seasonal cycles or dimensionality reduction) is more important than the choice of a particular anomaly detection algorithm. Nevertheless, we identify three detection algorithms (k-nearest neighbours mean distance, kernel density estimation, a recurrence approach) and their combinations (ensembles) that outperform other multivariate approaches as well as univariate extreme-event detection methods. Our results therefore provide an effective workflow to automatically detect anomalies in Earth system science data.
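
Kernel density estimation, one of the detectors named in the abstract, can be sketched as follows: points that lie in low-density regions of the feature space receive high anomaly scores. This is only a minimal sketch under stated assumptions, not the paper's method. The Gaussian kernel, the bandwidth of 0.5, the leave-one-out evaluation, the function name, and the six sample points are all hypothetical choices for illustration.

```python
import math


def kde_anomaly_scores(points, bandwidth):
    """Return -log of a leave-one-out Gaussian kernel density for each point.

    The kernel's normalising constant is omitted because it shifts every
    score by the same amount and does not change the ranking.
    """
    n = len(points)
    scores = []
    for i, p in enumerate(points):
        density = sum(
            math.exp(-math.dist(p, q) ** 2 / (2 * bandwidth ** 2))
            for j, q in enumerate(points) if j != i
        ) / (n - 1)
        # The tiny constant guards against log(0) for very isolated points.
        scores.append(-math.log(density + 1e-300))
    return scores


# Hypothetical feature vectors: a tight cluster plus one distant point.
points = [(0.0, 0.1), (0.1, 0.0), (-0.1, 0.0), (0.0, -0.1), (0.05, 0.05), (3.0, 3.0)]
kde_scores = kde_anomaly_scores(points, bandwidth=0.5)
outlier_index = max(range(len(points)), key=kde_scores.__getitem__)
print(outlier_index)
```

An ensemble in the sense of the abstract could combine such scores with those of other detectors, for example by averaging ranks, although the combination rule used in the paper is not stated here.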


Solution Pattern for Anomaly Detection in Financial Data Streams

Communications in Computer and Information Science - New Trends in Databases and Information Systems ◽

10.1007/978-3-030-30278-8_10 ◽

2019 ◽

pp. 77-84

Author(s):

Maciej Zakrzewicz ◽

Marek Wojciechowski ◽

Paweł Gławiński

Keyword(s):

Anomaly Detection ◽

Data Streams ◽

Financial Data


A Survey of Anomaly Detection Using Data Mining Methods for Hypertext Transfer Protocol Web Services

Journal of Computer Science ◽

10.3844/jcssp.2015.89.97 ◽

2015 ◽

Vol 11 (1) ◽

pp. 89-97 ◽

Cited By ~ 3

Author(s):

Mohsen Kakavand ◽

Norwati Mustapha ◽

Aida Mustapha ◽

Mohd Taufik Abdullah ◽

Hamed Riahi

Keyword(s):

Data Mining ◽

Web Services ◽

Anomaly Detection ◽

Hypertext Transfer Protocol ◽

Mining Methods ◽

Using Data ◽

Transfer Protocol


A Dynamic Subspace Anomaly Detection Method Using Generic Algorithm for Streaming Network Data

Handbook of Research on Emerging Developments in Data Privacy - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-4666-7381-6.ch018 ◽

2015 ◽

pp. 403-425

Author(s):

Ji Zhang

Keyword(s):

Anomaly Detection ◽

Data Streams ◽

Training Data ◽

Detection Methods ◽

Network Data ◽

Data Generation ◽

Research Attention ◽

Network Connection ◽

Dimensional Network ◽

Anomaly Classification

A great deal of research attention has been paid to data mining on data streams in recent years. In this chapter, the authors carry out a case study of anomaly detection in large, high-dimensional network connection data streams using the Stream Projected Outlier deTector (SPOT), proposed by Zhang et al. (2009) to detect anomalies in data streams using subspace analysis. SPOT is deployed on the 1999 KDD CUP anomaly detection application. Innovative approaches for training data generation, anomaly classification, false positive reduction, and adaptive detection subspace generation are proposed in this chapter as well. Experimental results demonstrate that SPOT is effective and efficient in detecting anomalies in network data streams and outperforms existing anomaly detection methods.


ELOF: fast and memory-efficient anomaly detection algorithm in data streams

Soft Computing ◽

10.1007/s00500-020-05442-1 ◽

2020 ◽

Author(s):

Yun Yang ◽

Liang Chen ◽

ChongJun Fan

Keyword(s):

Anomaly Detection ◽

Data Streams ◽

Detection Algorithm ◽

Memory Efficient


Data-Driven Modelling of Smart Building Ventilation Subsystem

Journal of Sensors ◽

10.1155/2019/3572019 ◽

2019 ◽

Vol 2019 ◽

pp. 1-14 ◽

Cited By ~ 5

Author(s):

Grigore Stamatescu ◽

Iulia Stamatescu ◽

Nicoleta Arghira ◽

Ioana Fagarasan

Keyword(s):

Data Mining ◽

Data Streams ◽

Data Driven ◽

Support Vector ◽

Commercial Building ◽

Monitoring And Control ◽

Smart Building ◽

Building Ventilation ◽

Using Data ◽

Rich Data

Considering the advances in building monitoring and control through networks of interconnected devices, effective handling of the associated rich data streams is becoming an important challenge. In many situations, the application of conventional system identification or approximate grey-box models, partly theoretical and partly data driven, is either infeasible or unsuitable. The paper discusses and illustrates an application of black-box modelling, achieved using data mining techniques, for the control of a smart building ventilation subsystem. We present the implementation and evaluation of a data mining methodology on data collected over more than one year of operation. The case study is carried out on four air handling units of a modern campus building to provide preliminary decision support for facility managers. The data processing and learning framework has two steps: raw data streams are compressed using the Symbolic Aggregate Approximation method, and the resulting segments are then fed into a Support Vector Machine algorithm. The results are useful for deriving the behaviour of each piece of equipment in its various modes of operation and can be built upon for fault detection or energy efficiency applications. Challenges related to online operation within a commercial Building Management System are also discussed, as the approach shows promise for deployment.
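
The compression step mentioned in the abstract, Symbolic Aggregate Approximation (SAX), can be sketched as z-normalisation, piecewise aggregate averaging, and discretisation against standard-normal quantiles. The sketch below is a simplified illustration, not the paper's pipeline: the function name, the four-letter alphabet, the segment count, and the sample readings are assumptions, and the subsequent Support Vector Machine step is left out.

```python
from bisect import bisect_left
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Convert a numeric series into a SAX word.

    Assumes the series is not constant and that its length is divisible
    by n_segments (a simplification made for this sketch).
    """
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma for x in series]                 # z-normalise
    seg_len = len(z) // n_segments
    paa = [mean(z[i * seg_len:(i + 1) * seg_len])          # piecewise aggregate
           for i in range(n_segments)]
    size = len(alphabet)
    # Breakpoints split the standard normal into equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(i / size) for i in range(1, size)]
    return "".join(alphabet[bisect_left(breakpoints, v)] for v in paa)


# Hypothetical sensor readings: four plateaus of four samples each.
readings = [1.0] * 4 + [4.0] * 4 + [9.0] * 4 + [6.0] * 4
word = sax(readings, n_segments=4)
print(word)  # prints "abdc"
```

Each symbol summarises one segment of the raw stream, so sixteen readings shrink to a four-letter word that a downstream classifier could consume as a categorical feature.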


Sequential Model-Free Anomaly Detection for Big Data Streams

2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton) ◽

10.1109/allerton.2019.8919759 ◽

2019 ◽

Author(s):

Mehmet Necip Kurt ◽

Yasin Yilmaz ◽

Xiaodong Wang

Keyword(s):

Big Data ◽

Anomaly Detection ◽

Data Streams ◽

Sequential Model ◽

Model Free ◽

Big Data Streams
