An Ensemble Classifier Algorithm for Mining Data Streams Based on Concept Drift

2019 ◽

pp. 19-39

Author(s):

Prasanna Lakshmi Kompalli

Keyword(s):

Real Time ◽

Data Streams ◽

Data Stream ◽

Concept Drift ◽

Data Stream Mining ◽

Time Data ◽

Stream Mining ◽

New Challenges ◽

Mining Data Streams ◽

Different Sources

Data arriving continuously from different sources is referred to as a data stream. Data stream mining is an online learning technique in which each data point must be processed as it arrives and discarded once processing is complete. Advances in technology have made it possible to monitor these data streams in real time, which has created many new challenges for researchers. The main features of this type of data are that it is fast flowing, large in volume, continuous and growing, and that its characteristics may change over time, a phenomenon termed concept drift. This chapter addresses the problems in mining data streams with concept drift. Because of these problems, isolating the correct literature is a grueling task for researchers and practitioners. This chapter tries to provide a solution by bringing together the techniques used for data stream mining with concept drift.
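The "process each data point as it arrives, then discard it" constraint described above is often evaluated with a test-then-train (prequential) loop. The following is a minimal sketch of that idea, not the chapter's method; the `MajorityClassLearner` class and the `prequential_accuracy` function are hypothetical names introduced only for illustration.

```python
from collections import Counter
from typing import Any, Hashable, Iterable, Optional, Tuple


class MajorityClassLearner:
    """Toy incremental learner (hypothetical): predicts the most frequent label seen so far."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def predict(self, x: Any) -> Optional[Hashable]:
        # Before any label has been seen there is nothing to predict.
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def learn_one(self, x: Any, y: Hashable) -> None:
        # Only a label counter is kept; the instance itself is not stored.
        self.counts[y] += 1


def prequential_accuracy(stream: Iterable[Tuple[Any, Hashable]],
                         learner: MajorityClassLearner) -> float:
    """Test-then-train: predict on each instance first, then learn from it, then drop it."""
    correct = 0
    total = 0
    for x, y in stream:
        if learner.predict(x) == y:
            correct += 1
        learner.learn_one(x, y)
        total += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    # A generator stands in for an unbounded stream: items are consumed once.
    demo_stream = ((i, "a" if i % 3 else "b") for i in range(30))
    print(prequential_accuracy(demo_stream, MajorityClassLearner()))
```

Because the stream is consumed one item at a time and nothing but the label counter is retained, memory use stays constant regardless of stream length.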

An ensemble classifier method for classifying data streams with recurrent concept drift

4th International Conference on Awareness Science and Technology ◽

10.1109/icawst.2012.6469580 ◽

2012 ◽

Author(s):

Guiying Wei ◽

Tao Zhang ◽

Sen Wu ◽

Lei Zou

Keyword(s):

Data Streams ◽

Concept Drift ◽

Ensemble Classifier

Adaptive Ensemble Active Learning for Drifting Data Stream Mining

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/383 ◽

2019 ◽

Cited By ~ 3

Author(s):

Bartosz Krawczyk ◽

Alberto Cano

Keyword(s):

Active Learning ◽

Data Streams ◽

Concept Drift ◽

Ground Truth ◽

Ensemble Classifier ◽

Instance Selection ◽

Ensemble Classifiers ◽

Query Process ◽

Unrestricted Access ◽

Class Labels

Learning from data streams is among the most vital contemporary fields in machine learning and data mining. Streams pose new challenges to learning systems due to their volume and velocity, as well as their ever-changing nature caused by concept drift. The vast majority of works on data streams assume a fully supervised learning scenario with unrestricted access to class labels. This assumption does not hold in real-world applications, where obtaining ground truth is costly and time-consuming. Therefore, we need to carefully select which instances should be labeled, as we usually work under a strict label budget. In this paper, we propose a novel active learning approach based on ensemble algorithms that is capable of using multiple base classifiers during the label query process. It is a plug-in solution, capable of working with most existing streaming ensemble classifiers. We realize this process as a Multi-Armed Bandit problem, obtaining an efficient and adaptive ensemble active learning procedure by selecting the most competent classifier from the pool for each query. In order to better adapt to concept drifts, we guide our instance selection by measuring the generalization capabilities of our classifiers. This adaptive solution leads not only to better instance selection under sparse access to class labels, but also to improved adaptation to various types of concept drift and to increased diversity of the underlying ensemble classifier.
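To make the bandit framing above concrete, here is a minimal sketch in which each base classifier is one arm and the UCB1 rule picks which classifier drives the next label query. The abstract does not state which bandit policy or reward the authors use, so UCB1, the `UCB1Selector` class, and the reward used in the demo loop are all assumptions made for illustration.

```python
import math
from typing import List


class UCB1Selector:
    """Hypothetical helper: chooses one ensemble member (arm) per query using UCB1."""

    def __init__(self, n_arms: int) -> None:
        self.counts: List[int] = [0] * n_arms        # times each arm was chosen
        self.rewards: List[float] = [0.0] * n_arms   # cumulative reward per arm

    def select(self) -> int:
        # Try every arm once before applying the confidence-bound rule.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        scores = [
            self.rewards[arm] / self.counts[arm]
            + math.sqrt(2.0 * math.log(total) / self.counts[arm])
            for arm in range(len(self.counts))
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm: int, reward: float) -> None:
        # The reward definition is an assumption; one option is 1.0 when the
        # chosen classifier's query decision led to a useful label, else 0.0.
        self.counts[arm] += 1
        self.rewards[arm] += reward


if __name__ == "__main__":
    selector = UCB1Selector(n_arms=3)
    for _ in range(300):
        arm = selector.select()
        # Assumed toy reward: arm 2 is always the most competent classifier.
        selector.update(arm, 1.0 if arm == 2 else 0.0)
    print(selector.counts)
```

UCB1 is deterministic given the reward sequence, which makes the selection behaviour easy to inspect; a randomized policy such as epsilon-greedy would be an equally plausible stand-in.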

Fast Adapting Ensemble: A New Algorithm for Mining Data Streams with Concept Drift

The Scientific World JOURNAL ◽

10.1155/2015/235810 ◽

2015 ◽

Vol 2015 ◽

pp. 1-14 ◽

Cited By ~ 6

Author(s):

Agustín Ortíz Díaz ◽

José del Campo-Ávila ◽

Gonzalo Ramos-Jiménez ◽

Isvani Frías Blanco ◽

Yailé Caballero Mota ◽

...

Keyword(s):

Data Mining ◽

Data Streams ◽

Concept Drift ◽

Learning Algorithms ◽

Large Data ◽

Different Types ◽

Benchmark Datasets ◽

Mining Data Streams ◽

Concept Drifts

The treatment of large data streams in the presence of concept drifts is one of the main challenges in the field of data mining, particularly when the algorithms have to deal with concepts that disappear and then reappear. This paper presents a new algorithm, called Fast Adapting Ensemble (FAE), which adapts very quickly to both abrupt and gradual concept drifts, and has been specifically designed to deal with recurring concepts. FAE processes the learning examples in blocks of the same size, but it does not have to wait for the batch to be complete in order to adapt its base classification mechanism. FAE incorporates a drift detector to improve the handling of abrupt concept drifts and stores a set of inactive classifiers that represent old concepts, which are activated very quickly when these concepts reappear. We compare our new algorithm with various well-known learning algorithms on common benchmark datasets. The experiments show promising results from the proposed algorithm (regarding accuracy and runtime), handling different types of concept drifts.
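The idea of keeping inactive classifiers for old concepts and reactivating them when a concept reappears can be sketched as follows. This is not the FAE algorithm itself: the abstract does not give its activation rule, so the accuracy threshold, the function names, and the toy threshold models below are assumptions for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[float], int]      # a stored classifier: feature -> class label
Example = Tuple[float, int]         # (feature, label)


def block_accuracy(model: Model, block: Sequence[Example]) -> float:
    """Fraction of examples in the most recent block that the model labels correctly."""
    if not block:
        return 0.0
    return sum(model(x) == y for x, y in block) / len(block)


def split_active_inactive(models: Sequence[Model],
                          block: Sequence[Example],
                          threshold: float = 0.8) -> Tuple[List[Model], List[Model]]:
    """Assumed rule: a stored model is active if it fits the latest block well enough."""
    active: List[Model] = []
    inactive: List[Model] = []
    for model in models:
        if block_accuracy(model, block) >= threshold:
            active.append(model)
        else:
            inactive.append(model)
    return active, inactive


if __name__ == "__main__":
    def concept_a(x: float) -> int:
        return int(x > 0.5)

    def concept_b(x: float) -> int:
        return int(x <= 0.5)

    pool = [concept_a, concept_b]
    block_from_b = [(0.1, 1), (0.9, 0), (0.7, 0), (0.2, 1)]
    active, inactive = split_active_inactive(pool, block_from_b)
    print([m.__name__ for m in active], [m.__name__ for m in inactive])
```

Re-scoring stored models against the latest block avoids retraining from scratch when an old concept returns, which is the intuition behind storing inactive classifiers.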

Ensemble Classifier for Mining Data Streams

Procedia Computer Science ◽

10.1016/j.procs.2014.08.120 ◽

2014 ◽

Vol 35 ◽

pp. 397-406 ◽

Cited By ~ 11

Author(s):

Ireneusz Czarnowski ◽

Piotr Jędrzejowicz

Keyword(s):

Data Streams ◽

Ensemble Classifier ◽

Mining Data Streams

Batch Weighted Ensemble for Mining Data Streams with Concept Drift

Lecture Notes in Computer Science - Foundations of Intelligent Systems ◽

10.1007/978-3-642-21916-0_32 ◽

2011 ◽

pp. 290-299 ◽

Cited By ~ 5

Author(s):

Magdalena Deckert

Keyword(s):

Data Streams ◽

Concept Drift ◽

Mining Data Streams

Deterministic Concept Drift Detection in Ensemble Classifier Based Data Stream Classification Process

International Journal of Grid and High Performance Computing ◽

10.4018/ijghpc.2019010103 ◽

2019 ◽

Vol 11 (1) ◽

pp. 29-48 ◽

Cited By ~ 2

Author(s):

Mohammed Ahmed Ali Abdualrhman ◽

M C Padma

Keyword(s):

Data Streams ◽

Data Stream ◽

Concept Drift ◽

Ensemble Classifier ◽

Experimental Result ◽

Process Time ◽

Stream Classification ◽

Data Stream Classification ◽

Proposed Model ◽

Concept Drift Detection

Data in a streaming environment tends to be non-stationary. Frequent and irregular changes therefore occur in the data, which is usually referred to as concept drift in the process of classifying data streams. Detecting concept drift in traditional data stream mining requires labelled samples; however, attaching a label to each streaming transaction is infeasible in terms of processing time and resource utilization. This article proposes deterministic concept drift detection (DCDD) for ensemble classifier-based data stream classification, which can detect concept drift regardless of the labels assigned to samples. The DCDD model is evaluated experimentally on the poker-hand dataset. The experimental results show that the proposed model is accurate and scalable, detecting concept drift with a high detection rate and minimal false alarm and miss rates compared with other contemporary models.

Advances on Concept Drift Detection in Regression Tasks Using Social Networks Theory

International Journal of Natural Computing Research ◽

10.4018/ijncr.2015010102 ◽

2015 ◽

Vol 5 (1) ◽

pp. 26-41 ◽

Cited By ~ 4

Author(s):

Jean Paul Barddal ◽

Heitor Murilo Gomes ◽

Fabrício Enembreck

Keyword(s):

Social Networks ◽

Data Streams ◽

Concept Drift ◽

Synthetic Data ◽

Scale Free Network ◽

Scale Free ◽

Detection Algorithms ◽

Adaptive Window ◽

Free Network ◽

Mining Data Streams

Mining data streams is one of the main areas of study in machine learning because of its applications in many knowledge domains. One of the major challenges in mining data streams is concept drift, which requires the learner to discard the current concept and adapt to a new one. Ensemble-based drift detection algorithms have been applied successfully to classification tasks but usually maintain a fixed-size ensemble of learners, running the risk of needlessly spending processing time and memory. In this paper the authors present improvements to the Scale-free Network Regressor (SFNR), a dynamic ensemble-based method for regression that employs social networks theory. In order to detect concept drifts, SFNR uses the Adaptive Window (ADWIN) algorithm. Results show improvements in accuracy, especially in concept drift situations, and better performance compared to other state-of-the-art algorithms on both real and synthetic data.
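The adaptive-window idea behind ADWIN can be sketched in a few lines: keep a window of recent values and drop the oldest ones whenever two sub-windows have means that differ by more than a confidence bound. The version below is a deliberately naive simplification that scans every split point and omits ADWIN's bucket compression, so it is not how SFNR or any library implements it; the `NaiveAdwin` class name is hypothetical, and inputs are assumed to lie in [0, 1].

```python
import math
from collections import deque
from typing import Deque


class NaiveAdwin:
    """Simplified adaptive window (assumes values in [0, 1]); no bucket compression."""

    def __init__(self, delta: float = 0.002) -> None:
        self.delta = delta                       # confidence parameter
        self.window: Deque[float] = deque()

    def update(self, value: float) -> bool:
        """Add a value; return True if the window was shrunk (drift signalled)."""
        self.window.append(value)
        drift = False
        while self._shrink_once():
            drift = True
        return drift

    def _shrink_once(self) -> bool:
        values = list(self.window)
        n = len(values)
        if n < 2:
            return False
        total = sum(values)
        delta_prime = self.delta / n
        head_sum = 0.0
        for n0 in range(1, n):                   # split into values[:n0] and values[n0:]
            head_sum += values[n0 - 1]
            n1 = n - n0
            m = 1.0 / (1.0 / n0 + 1.0 / n1)      # harmonic-mean term of the two sizes
            eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
            if abs(head_sum / n0 - (total - head_sum) / n1) > eps_cut:
                self.window.popleft()            # drop the oldest observation
                return True
        return False


if __name__ == "__main__":
    detector = NaiveAdwin()
    data = [0.0] * 100 + [1.0] * 100             # abrupt change halfway through
    flags = [detector.update(v) for v in data]
    print(flags.index(True), len(detector.window))
```

Scanning all split points costs time linear in the window length on every check, which is acceptable for a sketch but is the reason practical implementations summarize the window in exponentially sized buckets.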

Concept Drift Adaptation Techniques in Distributed Environment for Real-World Data Streams

Smart Cities ◽

10.3390/smartcities4010021 ◽

2021 ◽

Vol 4 (1) ◽

pp. 349-371

Author(s):

Hassan Mehmood ◽

Panos Kostakos ◽

Marta Cortes ◽

Theodoros Anagnostopoulos ◽

Susanna Pirttikangas ◽

...

Keyword(s):

Real World ◽

Data Streams ◽

Smart City ◽

Smart Cities ◽

Concept Drift ◽

Distributed Environment ◽

Real World Data ◽

Unique Challenge ◽

World Data ◽

Concept Drift Detection

Real-world data streams pose a unique challenge to the implementation of machine learning (ML) models and data analysis. A notable problem that has been introduced by the growth of Internet of Things (IoT) deployments across the smart city ecosystem is that the statistical properties of data streams can change over time, resulting in poor prediction performance and ineffective decisions. While concept drift detection methods aim to patch this problem, emerging communication and sensing technologies are generating a massive amount of data, requiring distributed environments to perform computation tasks across smart city administrative domains. In this article, we implement and test a number of state-of-the-art active concept drift detection algorithms for time series analysis within a distributed environment. We use real-world data streams and provide critical analysis of results retrieved. The challenges of implementing concept drift adaptation algorithms, along with their applications in smart cities, are also discussed.

Measuring the Effectiveness of Adaptive Random Forest for Handling Concept Drift in Big Data Streams

Entropy ◽

10.3390/e23070859 ◽

2021 ◽

Vol 23 (7) ◽

pp. 859

Author(s):

Abdulaziz O. AlQabbany ◽

Aqil M. Azmi

Keyword(s):

Big Data ◽

Random Forest ◽

Real Time ◽

Data Streams ◽

Learning Algorithm ◽

Concept Drift ◽

The United States ◽

Careful Consideration ◽

Data Sets ◽

Stream Data

We are living in the age of big data, a majority of which is stream data. The real-time processing of this data requires careful consideration from different perspectives. Concept drift, a change in the data’s underlying distribution, is a significant issue, especially when learning from data streams. It requires learners to be adaptive to dynamic changes. Random forest is an ensemble approach that is widely used in classical non-streaming settings of machine learning applications. At the same time, the Adaptive Random Forest (ARF) is a stream learning algorithm that showed promising results in terms of its accuracy and ability to deal with various types of drift. The incoming instances’ continuity allows for their binomial distribution to be approximated by a Poisson(1) distribution. In this study, we propose a mechanism to increase such streaming algorithms’ efficiency by focusing on resampling. Our measure, resampling effectiveness (ρ), fuses the two most essential aspects in online learning: accuracy and execution time. We use six different synthetic data sets, each having a different type of drift, to empirically select the parameter λ of the Poisson distribution that yields the best value for ρ. By comparing the standard ARF with its tuned variations, we show that ARF performance can be enhanced by tackling this important aspect. Finally, we present three case studies from different contexts to test our proposed enhancement method and demonstrate its effectiveness in processing large data sets: (a) Amazon customer reviews (written in English), (b) hotel reviews (in Arabic), and (c) real-time aspect-based sentiment analysis of COVID-19-related tweets in the United States during April 2020. Results indicate that our proposed method of enhancement exhibited considerable improvement in most of the situations.
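The Poisson-based resampling mentioned above is the core of online bagging: instead of drawing a bootstrap sample from a stored data set, each base learner trains on every incoming instance k times, with k drawn from Poisson(λ). The sketch below illustrates only that resampling step with λ exposed as a parameter; it does not reproduce ARF or the paper's ρ measure, and `poisson`, `CountingLearner`, and `online_bagging_update` are hypothetical names.

```python
import math
import random
from typing import Any, List


def poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k = 0
    product = 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return k
        k += 1


class CountingLearner:
    """Stand-in base learner (hypothetical) that only counts its training updates."""

    def __init__(self) -> None:
        self.updates = 0

    def learn_one(self, x: Any, y: Any) -> None:
        self.updates += 1


def online_bagging_update(learners: List[CountingLearner], x: Any, y: Any,
                          lam: float, rng: random.Random) -> None:
    """Each learner sees the instance k ~ Poisson(lam) times; k may be zero."""
    for learner in learners:
        for _ in range(poisson(lam, rng)):
            learner.learn_one(x, y)


if __name__ == "__main__":
    rng = random.Random(42)
    ensemble = [CountingLearner() for _ in range(3)]
    for i in range(1000):
        online_bagging_update(ensemble, x=i, y=i % 2, lam=1.0, rng=rng)
    print([learner.updates for learner in ensemble])
```

Larger values of λ make each learner train on each instance more often, which raises the training cost per instance; that accuracy-versus-time trade-off is what tuning λ explores.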

Knowledge Discovery From Evolving Data Streams
