Adaptive Ensemble Active Learning for Drifting Data Stream Mining

Learning from data streams is among the most vital contemporary fields in machine learning and data mining. Streams pose new challenges to learning systems because of their volume and velocity, as well as their ever-changing nature caused by concept drift. The vast majority of works on data streams assume a fully supervised learning scenario with unrestricted access to class labels. This assumption does not hold in real-world applications, where obtaining ground truth is costly and time-consuming. Therefore, we need to carefully select which instances should be labeled, as we usually work under a strict label budget. In this paper, we propose a novel active learning approach based on ensemble algorithms that is capable of using multiple base classifiers during the label query process. It is a plug-in solution, capable of working with most existing streaming ensemble classifiers. We realize this process as a Multi-Armed Bandit problem, obtaining an efficient and adaptive ensemble active learning procedure by selecting the most competent classifier from the pool for each query. To better adapt to concept drift, we guide our instance selection by measuring the generalization capabilities of our classifiers. This adaptive solution leads not only to better instance selection under sparse access to class labels, but also to improved adaptation to various types of concept drift and to increased diversity of the underlying ensemble classifier.
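To make the bandit view of the label query process concrete, here is a minimal Python sketch in which each base classifier is an arm and one arm is chosen per query decision. It is not the authors' algorithm: the epsilon-greedy policy, the fixed uncertainty threshold, the budget check, and the names `EpsilonGreedyBandit` and `should_query` are all assumptions made for illustration.

```python
import random


class EpsilonGreedyBandit:
    """Epsilon-greedy bandit whose arms are ensemble members (assumed policy)."""

    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.counts = [0] * n_arms      # number of rewards seen per arm
        self.values = [0.0] * n_arms    # running mean reward per arm
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def select(self):
        """Explore with probability epsilon, otherwise pick the best arm."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        """Incrementally update the mean reward of the chosen arm."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def should_query(confidence, threshold, labels_used, seen, budget):
    """Request a label only if the selected classifier is uncertain
    and the fraction of labeled instances stays within the budget."""
    within_budget = labels_used < budget * seen
    return within_budget and confidence < threshold
```

In use, the arm returned by `select()` would supply the confidence passed to `should_query`, and after a label is acquired the arm could be rewarded with, for example, 1.0 for a correct prediction and 0.0 otherwise. That reward definition is likewise an assumption.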

A novel concept drift detection method in data streams using ensemble classifiers

Intelligent Data Analysis ◽ 10.3233/ida-150207 ◽ 2016 ◽ Vol 20 (6) ◽ pp. 1329-1350 ◽ Cited By ~ 8

Author(s): Mahdie Dehghan ◽ Hamid Beigy ◽ Poorya ZareMoodi

Keyword(s): Data Streams ◽ Detection Method ◽ Concept Drift ◽ Ensemble Classifiers ◽ Concept Drift Detection ◽ Novel Concept

An ensemble classifier method for classifying data streams with recurrent concept drift

4th International Conference on Awareness Science and Technology ◽ 10.1109/icawst.2012.6469580 ◽ 2012

Author(s): Guiying Wei ◽ Tao Zhang ◽ Sen Wu ◽ Lei Zou

Keyword(s): Data Streams ◽ Concept Drift ◽ Ensemble Classifier

A comprehensive active learning method for multiclass imbalanced data streams with concept drift

Knowledge-Based Systems ◽ 10.1016/j.knosys.2021.106778 ◽ 2021 ◽ Vol 215 ◽ pp. 106778

Author(s): Weike Liu ◽ Hang Zhang ◽ Zhaoyun Ding ◽ Qingbao Liu ◽ Cheng Zhu

Keyword(s): Active Learning ◽ Data Streams ◽ Concept Drift ◽ Imbalanced Data ◽ Learning Method ◽ Active Learning Method

An Ensemble Classifier Algorithm for Mining Data Streams Based on Concept Drift

2017 10th International Symposium on Computational Intelligence and Design (ISCID) ◽ 10.1109/iscid.2017.121 ◽ 2017 ◽ Cited By ~ 1

Author(s): Yushui Geng ◽ Jianguo Zhang

Keyword(s): Data Streams ◽ Concept Drift ◽ Ensemble Classifier ◽ Mining Data Streams

Minority Resampling Based Ensemble Framework Using Enhanced Early Drift Detection Method For Imbalanced Data Streams

10.21203/rs.3.rs-141880/v1 ◽ 2021

Author(s): Priya S ◽ Annie Uthra

Keyword(s): Data Streams ◽ Data Stream ◽ Detection Method ◽ Concept Drift ◽ Class Imbalance ◽ Current Data ◽ Classification Model ◽ Ensemble Classifiers ◽ K Nearest Neighbor ◽ Jaccard Similarity

As data mining applications grow in popularity, large volumes of data streams are generated over time. The main problem with data streams is that they exhibit a high degree of class imbalance and their data distribution changes over time. In this paper, the Timely Drift Detection and Minority Resampling Technique (TDDMRT), based on K-nearest neighbors and Jaccard similarity, is proposed to handle class imbalance by finding the current ratio of class labels. The Enhanced Early Drift Detection Method (EEDDM) is proposed for detecting concept drift, and the Minority Resampling Method (KNN-JS) determines whether the current data stream should be regarded as imbalanced and resamples the minority instances in the drifting data stream. The K-nearest neighbors technique is used to resample the minority classes, and the Jaccard similarity measure is applied to the resampled data to generate synthetic data similar to the original data, which is then handled by ensemble classifiers. The proposed ensemble-based classification model outperforms existing oversampling and undersampling techniques, with an accuracy of 98.52%.
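For readers unfamiliar with the similarity measure named in this abstract, the following short Python sketch computes the Jaccard similarity and uses it to keep only synthetic candidates that resemble at least one original minority instance. It is an assumed illustration and not the TDDMRT or KNN-JS procedure: representing instances as sets of feature values, the helper name `keep_similar`, and the `min_similarity` threshold are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)


def keep_similar(candidates, originals, min_similarity):
    """Keep candidates whose best Jaccard match among the originals
    reaches the (hypothetical) similarity threshold."""
    return [
        c for c in candidates
        if max(jaccard(c, o) for o in originals) >= min_similarity
    ]
```

For example, `jaccard({"a", "b"}, {"b", "c"})` shares one element out of three distinct ones, giving a similarity of one third.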

An active learning method for data streams with concept drift

2016 IEEE International Conference on Big Data (Big Data) ◽ 10.1109/bigdata.2016.7840667 ◽ 2016 ◽ Cited By ~ 1

Author(s): Cheong Hee Park ◽ Youngsoon Kang

Keyword(s): Active Learning ◽ Data Streams ◽ Concept Drift ◽ Learning Method ◽ Active Learning Method

Deterministic Concept Drift Detection in Ensemble Classifier Based Data Stream Classification Process

International Journal of Grid and High Performance Computing ◽ 10.4018/ijghpc.2019010103 ◽ 2019 ◽ Vol 11 (1) ◽ pp. 29-48 ◽ Cited By ~ 2

Author(s): Mohammed Ahmed Ali Abdualrhman ◽ M C Padma

Keyword(s): Data Streams ◽ Data Stream ◽ Concept Drift ◽ Ensemble Classifier ◽ Experimental Result ◽ Process Time ◽ Stream Classification ◽ Data Stream Classification ◽ Proposed Model ◽ Concept Drift Detection

Data in streaming environments tends to be non-stationary. Frequent and irregular changes therefore occur in the data; in the context of classifying data streams, such a change is usually referred to as concept drift. Detecting concept drift in traditional data stream mining demands the availability of labelled samples; however, attaching labels to streaming transactions is infeasible in terms of process time and resource utilization. In this article, deterministic concept drift detection (DCDD) in an ensemble classifier-based data stream classification process is proposed, which can detect concept drift regardless of the labels assigned to samples. The DCDD model is evaluated in an experimental study on the poker-hand dataset. The experimental results show that the proposed model is accurate and scalable in detecting concept drift, with a high drift detection rate and minimal false alarm and miss rates compared to other contemporary models.

Concept Drift Adaptation Techniques in Distributed Environment for Real-World Data Streams

Smart Cities ◽ 10.3390/smartcities4010021 ◽ 2021 ◽ Vol 4 (1) ◽ pp. 349-371

Author(s): Hassan Mehmood ◽ Panos Kostakos ◽ Marta Cortes ◽ Theodoros Anagnostopoulos ◽ Susanna Pirttikangas ◽ ...

Keyword(s): Real World ◽ Data Streams ◽ Smart City ◽ Smart Cities ◽ Concept Drift ◽ Distributed Environment ◽ Real World Data ◽ Unique Challenge ◽ World Data ◽ Concept Drift Detection

Real-world data streams pose a unique challenge to the implementation of machine learning (ML) models and data analysis. A notable problem introduced by the growth of Internet of Things (IoT) deployments across the smart city ecosystem is that the statistical properties of data streams can change over time, resulting in poor prediction performance and ineffective decisions. While concept drift detection methods aim to patch this problem, emerging communication and sensing technologies are generating a massive amount of data, requiring distributed environments to perform computation tasks across smart city administrative domains. In this article, we implement and test a number of state-of-the-art active concept drift detection algorithms for time series analysis within a distributed environment. We use real-world data streams and provide a critical analysis of the results obtained. The challenges of implementing concept drift adaptation algorithms, along with their applications in smart cities, are also discussed.

Measuring the Effectiveness of Adaptive Random Forest for Handling Concept Drift in Big Data Streams

Entropy ◽ 10.3390/e23070859 ◽ 2021 ◽ Vol 23 (7) ◽ pp. 859

Author(s): Abdulaziz O. AlQabbany ◽ Aqil M. Azmi

Keyword(s): Big Data ◽ Random Forest ◽ Real Time ◽ Data Streams ◽ Learning Algorithm ◽ Concept Drift ◽ The United States ◽ Careful Consideration ◽ Data Sets ◽ Stream Data

We are living in the age of big data, a majority of which is stream data. The real-time processing of this data requires careful consideration from different perspectives. Concept drift, a change in the data’s underlying distribution, is a significant issue, especially when learning from data streams. It requires learners to be adaptive to dynamic changes. Random forest is an ensemble approach that is widely used in classical non-streaming settings of machine learning applications. The Adaptive Random Forest (ARF), in turn, is a stream learning algorithm that has shown promising results in terms of its accuracy and ability to deal with various types of drift. The continuity of the incoming instances allows their binomial distribution to be approximated by a Poisson(1) distribution. In this study, we propose a mechanism to increase the efficiency of such streaming algorithms by focusing on resampling. Our measure, resampling effectiveness (ρ), fuses the two most essential aspects of online learning: accuracy and execution time. We use six different synthetic data sets, each having a different type of drift, to empirically select the parameter λ of the Poisson distribution that yields the best value for ρ. By comparing the standard ARF with its tuned variations, we show that ARF performance can be enhanced by tackling this important aspect. Finally, we present three case studies from different contexts to test our proposed enhancement method and demonstrate its effectiveness in processing large data sets: (a) Amazon customer reviews (written in English), (b) hotel reviews (in Arabic), and (c) real-time aspect-based sentiment analysis of COVID-19-related tweets in the United States during April 2020. Results indicate that our proposed enhancement method yields considerable improvement in most situations.
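The Poisson resampling idea in this abstract can be illustrated with a minimal Python sketch of Poisson-weighted online bagging: for each incoming instance, every ensemble member receives a weight k drawn from Poisson(λ) and would train on the instance k times. This is a generic sketch and not the ARF implementation or the study's tuning procedure. The function names are hypothetical, the sampler is Knuth's multiplication method, and the measure ρ is not reproduced because its formula is not given here.

```python
import math
import random


def poisson(lam, rng):
    """Draw k ~ Poisson(lam) with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def online_bagging_weights(n_learners, lam, rng):
    """One resampling weight per ensemble member for a single instance.
    A weight of 0 means that member skips the instance."""
    return [poisson(lam, rng) for _ in range(n_learners)]
```

A larger λ makes each member train on more copies of each instance, which costs execution time; that accuracy versus time trade-off is the one the abstract's measure is said to capture.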

Online Active Learning for Drifting Data Streams

IEEE Transactions on Neural Networks and Learning Systems ◽ 10.1109/tnnls.2021.3091681 ◽ 2021 ◽ pp. 1-15

Author(s): Sanmin Liu ◽ Shan Xue ◽ Jia Wu ◽ Chuan Zhou ◽ Jian Yang ◽ ...

Keyword(s): Active Learning ◽ Data Streams
