Online Clustering for Novelty Detection and Concept Drift in Data Streams

Concept drift robust adaptive novelty detection for data streams

Neurocomputing ◽

10.1016/j.neucom.2018.04.069 ◽

2018 ◽

Vol 309 ◽

pp. 46-53 ◽

Cited By ~ 7

Author(s):

Matous Cejnek ◽

Ivo Bukovsky

Keyword(s):

Data Streams ◽

Concept Drift ◽

Novelty Detection ◽

Robust Adaptive

Concept Drift Adaptation Techniques in Distributed Environment for Real-World Data Streams

Smart Cities ◽

10.3390/smartcities4010021 ◽

2021 ◽

Vol 4 (1) ◽

pp. 349-371

Author(s):

Hassan Mehmood ◽

Panos Kostakos ◽

Marta Cortes ◽

Theodoros Anagnostopoulos ◽

Susanna Pirttikangas ◽

...

Keyword(s):

Real World ◽

Data Streams ◽

Smart City ◽

Smart Cities ◽

Concept Drift ◽

Distributed Environment ◽

Real World Data ◽

Unique Challenge ◽

World Data ◽

Concept Drift Detection

Real-world data streams pose a unique challenge to the implementation of machine learning (ML) models and data analysis. A notable problem that has been introduced by the growth of Internet of Things (IoT) deployments across the smart city ecosystem is that the statistical properties of data streams can change over time, resulting in poor prediction performance and ineffective decisions. While concept drift detection methods aim to address this problem, emerging communication and sensing technologies are generating a massive amount of data, requiring distributed environments to perform computation tasks across smart city administrative domains. In this article, we implement and test a number of state-of-the-art active concept drift detection algorithms for time series analysis within a distributed environment. We use real-world data streams and provide a critical analysis of the results obtained. The challenges of implementing concept drift adaptation algorithms, along with their applications in smart cities, are also discussed.
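
The abstract above does not name the detectors that were evaluated. As an illustration only of what an active drift detector does, the following is a minimal sketch of the Page-Hinkley test, a classical change detector for an upward shift in the mean of a stream. The class name, the helper `first_drift`, and the parameter values are assumptions chosen for this example and are not taken from the article.

```python
class PageHinkley:
    """Minimal Page-Hinkley test for an upward shift in the stream mean.

    Illustrative sketch: `delta` (tolerated deviation) and `threshold`
    (alarm level) are hypothetical defaults, not values from the article.
    """

    def __init__(self, delta=0.005, threshold=50.0):
        self.delta = delta
        self.threshold = threshold
        self.n = 0           # number of observations seen so far
        self.mean = 0.0      # running mean of the stream
        self.cum = 0.0       # cumulative deviation from the running mean
        self.min_cum = 0.0   # smallest cumulative deviation seen so far

    def update(self, x):
        """Consume one observation and return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold


def first_drift(stream, **kwargs):
    """Return the index of the first alarm in `stream`, or None."""
    detector = PageHinkley(**kwargs)
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None
```

On a stream whose values jump from 0 to 5 halfway through, `first_drift` should raise an alarm a few observations after the jump, while a constant stream raises none. An active approach would typically retrain or reset the model when the alarm fires.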

Measuring the Effectiveness of Adaptive Random Forest for Handling Concept Drift in Big Data Streams

Entropy ◽

10.3390/e23070859 ◽

2021 ◽

Vol 23 (7) ◽

pp. 859

Author(s):

Abdulaziz O. AlQabbany ◽

Aqil M. Azmi

Keyword(s):

Big Data ◽

Random Forest ◽

Real Time ◽

Data Streams ◽

Learning Algorithm ◽

Concept Drift ◽

The United States ◽

Careful Consideration ◽

Data Sets ◽

Stream Data

We are living in the age of big data, a majority of which is stream data. The real-time processing of this data requires careful consideration from different perspectives. Concept drift, a change in the data’s underlying distribution, is a significant issue, especially when learning from data streams. It requires learners to be adaptive to dynamic changes. Random forest is an ensemble approach that is widely used in classical non-streaming settings of machine learning applications. At the same time, the Adaptive Random Forest (ARF) is a stream learning algorithm that showed promising results in terms of its accuracy and ability to deal with various types of drift. The incoming instances’ continuity allows for their binomial distribution to be approximated by a Poisson(1) distribution. In this study, we propose a mechanism to increase such streaming algorithms’ efficiency by focusing on resampling. Our measure, resampling effectiveness (ρ), fuses the two most essential aspects in online learning: accuracy and execution time. We use six different synthetic data sets, each having a different type of drift, to empirically select the parameter λ of the Poisson distribution that yields the best value for ρ. By comparing the standard ARF with its tuned variations, we show that ARF performance can be enhanced by tackling this important aspect. Finally, we present three case studies from different contexts to test our proposed enhancement method and demonstrate its effectiveness in processing large data sets: (a) Amazon customer reviews (written in English), (b) hotel reviews (in Arabic), and (c) real-time aspect-based sentiment analysis of COVID-19-related tweets in the United States during April 2020. Results indicate that our proposed method of enhancement exhibited considerable improvement in most of the situations.
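
The resampling step mentioned in the abstract can be illustrated with a short sketch of online bagging, in which each ensemble member sees every incoming instance k times with k drawn from Poisson(λ). This is not the authors' ARF implementation: the toy base learner `MajorityClass`, the class `OnlineBagging`, and all parameter names are assumptions made for this example, and the definition of ρ is not reproduced because the abstract does not give it.

```python
import math
import random
from collections import Counter


def poisson(lam, rng):
    """Draw k ~ Poisson(lam) using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


class MajorityClass:
    """Toy weighted base learner (hypothetical stand-in for a tree)."""

    def __init__(self):
        self.counts = Counter()

    def learn_one(self, y, weight):
        self.counts[y] += weight

    def predict_one(self):
        return self.counts.most_common(1)[0][0] if self.counts else None


class OnlineBagging:
    """Online bagging: each model sees an instance k ~ Poisson(lam) times."""

    def __init__(self, n_models=10, lam=1.0, seed=0):
        self.lam = lam
        self.rng = random.Random(seed)
        self.models = [MajorityClass() for _ in range(n_models)]
        self.updates = 0  # model updates performed, a rough proxy for cost

    def learn_one(self, y):
        for model in self.models:
            k = poisson(self.lam, self.rng)
            if k > 0:  # k == 0 means this model skips the instance
                model.learn_one(y, k)
                self.updates += 1

    def predict_one(self):
        votes = Counter(m.predict_one() for m in self.models)
        votes.pop(None, None)  # ignore models that have seen nothing yet
        return votes.most_common(1)[0][0] if votes else None
```

With λ = 1, each model skips roughly e^(-1) ≈ 37% of instances. Raising λ makes each model train on more instances with larger weights, which costs more updates. That is one way to picture the accuracy versus execution-time trade-off the abstract describes.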

Predicting Concept Drift in Data Streams Using Metadata Clustering

2018 International Joint Conference on Neural Networks (IJCNN) ◽

10.1109/ijcnn.2018.8489606 ◽

2018 ◽

Cited By ~ 1

Author(s):

Robert Anderson ◽

Yun Sing Koh ◽

Gillian Dobbie

Keyword(s):

Data Streams ◽

Concept Drift

Automatically Optimized Gradient Boosting Trees for Classifying Large Volume High Cardinality Data Streams Under Concept Drift

The NeurIPS '18 Competition - The Springer Series on Challenges in Machine Learning ◽

10.1007/978-3-030-29135-8_13 ◽

2019 ◽

pp. 317-335

Author(s):

Jobin Wilson ◽

Amit Kumar Meher ◽

Bivin Vinodkumar Bindu ◽

Santanu Chaudhury ◽

Brejesh Lall ◽

...

Keyword(s):

Large Volume ◽

Data Streams ◽

Concept Drift ◽

Gradient Boosting

Knowledge Discovery From Evolving Data Streams

Advances in Business Information Systems and Analytics - Machine Learning Techniques for Improved Business Analytics ◽

10.4018/978-1-5225-3534-8.ch002 ◽

2019 ◽

pp. 19-39

Author(s):

Prasanna Lakshmi Kompalli

Keyword(s):

Real Time ◽

Data Streams ◽

Data Stream ◽

Concept Drift ◽

Data Stream Mining ◽

Time Data ◽

Stream Mining ◽

New Challenges ◽

Mining Data Streams ◽

Different Sources

Data coming from different sources are referred to as data streams. Data stream mining is an online learning technique in which each data point must be processed as it arrives and discarded once processing is complete. Technological progress has made it possible to monitor these data streams in real time. Real-time data streams have created many new challenges for researchers. The main features of this type of data are that it is fast flowing, arrives in large amounts that are continuous and growing in nature, and has characteristics that may change over time, a phenomenon termed concept drift. This chapter addresses the problems in mining data streams with concept drift. Isolating the relevant literature on this topic can be a grueling task for researchers and practitioners. This chapter tries to provide a solution by offering an amalgamation of all techniques used for data stream mining with concept drift.
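
The one-pass constraint described above (process each point on arrival, then discard it) can be sketched with Welford's online algorithm for the mean and variance, which keeps only three numbers of state however long the stream is. This example is not drawn from the chapter; the names `RunningStats` and `summarize` are hypothetical.

```python
class RunningStats:
    """One-pass mean and sample variance (Welford's algorithm)."""

    def __init__(self):
        self.n = 0        # number of points seen
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one data point into the summary; the point is not stored."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance, or 0.0 when fewer than two points were seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def summarize(stream):
    """Consume any iterable once and return its running statistics."""
    stats = RunningStats()
    for x in stream:
        stats.update(x)
    return stats
```

Because `summarize` accepts any iterable, it works equally on a list or on a generator that yields values as they arrive. Memory use stays constant in both cases.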

Concept Drift Detection in Data Streams

Practical Machine Learning for Streaming Data with Python ◽

10.1007/978-1-4842-6867-4_2 ◽

2021 ◽

pp. 31-55

Author(s):

Sayan Putatunda

Keyword(s):

Data Streams ◽

Concept Drift ◽

Concept Drift Detection

On learning guarantees to unsupervised concept drift detection on data streams

Expert Systems with Applications ◽

10.1016/j.eswa.2018.08.054 ◽

2019 ◽

Vol 117 ◽

pp. 90-102 ◽

Cited By ~ 9

Author(s):

Rodrigo F. de Mello ◽

Yule Vaz ◽

Carlos H. Grossi ◽

Albert Bifet

Keyword(s):

Data Streams ◽

Concept Drift ◽

Concept Drift Detection

On ensemble components selection in data streams scenario with reoccurring concept-drift

2017 IEEE Symposium Series on Computational Intelligence (SSCI) ◽

10.1109/ssci.2017.8285362 ◽

2017 ◽

Cited By ~ 9

Author(s):

Piotr Duda ◽

Maciej Jaworski ◽

Leszek Rutkowski

Keyword(s):

Data Streams ◽

Concept Drift

Novelty Detection and Online Learning for Chunk Data Streams

IEEE Transactions on Pattern Analysis and Machine Intelligence ◽

10.1109/tpami.2020.2965531 ◽

2020 ◽

pp. 1-1

Author(s):

Yi Wang ◽

Yi Ding ◽

Xiangjian He ◽

Xin Fan ◽

Chi Lin ◽

...

Keyword(s):

Online Learning ◽

Data Streams ◽

Novelty Detection
