An Ensemble Classifier Algorithm for Mining Data Streams Based on Concept Drift

Author(s):  
Yushui Geng ◽  
Jianguo Zhang
Author(s):  
Prasanna Lakshmi Kompalli

Data arriving continuously from different sources is referred to as a data stream. Data stream mining is an online learning technique in which each data point must be processed as it arrives and discarded once processing is complete. Advances in technology have made it possible to monitor these data streams in real time, and this has created many new challenges for researchers. The main features of this type of data are that it is fast flowing, that it arrives in large volumes that are continuous and growing, and that its characteristics may change over time, which is termed concept drift. This chapter addresses the problems in mining data streams with concept drift. Isolating the relevant literature on these problems can be a grueling task for researchers and practitioners. This chapter tries to provide a solution by bringing together the techniques used for data stream mining with concept drift.


Author(s):  
Bartosz Krawczyk ◽  
Alberto Cano

Learning from data streams is among the most vital contemporary fields in machine learning and data mining. Streams pose new challenges to learning systems due to their volume and velocity, as well as their ever-changing nature caused by concept drift. The vast majority of works on data streams assume a fully supervised learning scenario with unrestricted access to class labels. This assumption does not hold in real-world applications, where obtaining ground truth is costly and time-consuming. Therefore, we need to carefully select which instances should be labeled, as we are usually working under a strict label budget. In this paper, we propose a novel active learning approach based on ensemble algorithms that is capable of using multiple base classifiers during the label query process. It is a plug-in solution, capable of working with most existing streaming ensemble classifiers. We realize this process as a Multi-Armed Bandit problem, obtaining an efficient and adaptive ensemble active learning procedure by selecting the most competent classifier from the pool for each query. In order to better adapt to concept drifts, we guide our instance selection by measuring the generalization capabilities of our classifiers. This adaptive solution leads not only to better instance selection under sparse access to class labels, but also to improved adaptation to various types of concept drift and to increased diversity of the underlying ensemble classifier.
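To make the bandit framing above concrete, here is a minimal sketch of choosing one classifier per label query with a bandit rule. The abstract does not say which bandit strategy or reward signal the authors use, so this sketch swaps in the standard UCB1 rule, and the class name `UCB1Selector` and the reward values are purely illustrative assumptions.

```python
import math


class UCB1Selector:
    """Pick one arm (here: one base classifier) per query using UCB1.

    Assumption: UCB1 stands in for whatever bandit strategy the paper uses.
    """

    def __init__(self, n_arms):
        self.counts = [0] * n_arms      # how often each arm was chosen
        self.rewards = [0.0] * n_arms   # summed reward per arm

    def select(self):
        # Try every arm once before using the confidence bound.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        scores = [
            self.rewards[a] / self.counts[a]
            + math.sqrt(2.0 * math.log(total) / self.counts[a])
            for a in range(len(self.counts))
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.rewards[arm] += reward


# Hypothetical usage: classifier 2 always earns reward 1.0, the others 0.0.
selector = UCB1Selector(n_arms=3)
for _ in range(300):
    chosen = selector.select()
    selector.update(chosen, 1.0 if chosen == 2 else 0.0)
```

In a real stream the reward would have to come from some measure of how useful the chosen classifier's query turned out to be; the constant rewards here exist only to show the selector converging on the best arm.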


2015 ◽  
Vol 2015 ◽  
pp. 1-14 ◽  
Author(s):  
Agustín Ortíz Díaz ◽  
José del Campo-Ávila ◽  
Gonzalo Ramos-Jiménez ◽  
Isvani Frías Blanco ◽  
Yailé Caballero Mota ◽  
...  

The treatment of large data streams in the presence of concept drifts is one of the main challenges in the field of data mining, particularly when the algorithms have to deal with concepts that disappear and then reappear. This paper presents a new algorithm, called Fast Adapting Ensemble (FAE), which adapts very quickly to both abrupt and gradual concept drifts, and has been specifically designed to deal with recurring concepts. FAE processes the learning examples in blocks of the same size, but it does not have to wait for a block to be complete in order to adapt its base classification mechanism. FAE incorporates a drift detector to improve the handling of abrupt concept drifts and stores a set of inactive classifiers that represent old concepts, which are activated very quickly when these concepts reappear. We compare our new algorithm with various well-known learning algorithms on common benchmark datasets. The experiments show promising results for the proposed algorithm, in terms of both accuracy and runtime, when handling different types of concept drift.
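The idea of keeping inactive classifiers for old concepts and reactivating them when a concept returns can be sketched as follows. This is not FAE itself: the abstract does not give the activation rule, so the class `RecurringConceptPool`, the accuracy-based test, and the `threshold` parameter are hypothetical choices made for illustration. Models are represented as plain callables.

```python
def accuracy(model, block):
    """Fraction of (x, y) pairs in the block that the model labels correctly."""
    correct = sum(1 for x, y in block if model(x) == y)
    return correct / len(block)


class RecurringConceptPool:
    """Stores inactive models and reactivates one that fits recent data.

    Assumption: a model is reactivated when its accuracy on the most
    recent block reaches `threshold`; FAE's real criterion may differ.
    """

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.inactive = []

    def store(self, model):
        self.inactive.append(model)

    def reactivate(self, recent_block):
        """Return (and remove) the best stored model, or None if none fits."""
        if not self.inactive or not recent_block:
            return None
        best = max(self.inactive, key=lambda m: accuracy(m, recent_block))
        if accuracy(best, recent_block) >= self.threshold:
            self.inactive.remove(best)
            return best
        return None


# Hypothetical usage with two toy "concepts" over numeric inputs.
def is_positive(x):
    return x > 0


def is_non_positive(x):
    return x <= 0


pool = RecurringConceptPool(threshold=0.8)
pool.store(is_positive)
pool.store(is_non_positive)
recent = [(1, True), (2, True), (-1, False)]
revived = pool.reactivate(recent)
```

Checking stored models against only the most recent block keeps reactivation cheap, which matches the abstract's emphasis on fast adaptation, but the block size and threshold would need tuning on real data.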


2014 ◽  
Vol 35 ◽  
pp. 397-406 ◽  
Author(s):  
Ireneusz Czarnowski ◽  
Piotr Jędrzejowicz

2019 ◽  
Vol 11 (1) ◽  
pp. 29-48 ◽  
Author(s):  
Mohammed Ahmed Ali Abdualrhman ◽  
M C Padma

Data in a streaming environment tends to be non-stationary. Frequent and irregular changes therefore occur in the data, which, in the context of classifying data streams, are usually referred to as concept drift. Detecting concept drift in traditional data stream mining requires labelled samples; however, attaching a label to every streaming transaction is infeasible in terms of processing time and resource utilization. In this article, deterministic concept drift detection (DCDD) for ensemble classifier-based data stream classification is proposed, which can detect concept drift regardless of the labels assigned to samples. The DCDD model is evaluated in an experimental study on the poker-hand dataset. The experimental results show that the proposed model detects concept drift accurately and scalably, with a high drift detection rate and minimal false alarm and miss rates compared to other contemporary models.


2015 ◽  
Vol 5 (1) ◽  
pp. 26-41 ◽  
Author(s):  
Jean Paul Barddal ◽  
Heitor Murilo Gomes ◽  
Fabrício Enembreck

Mining data streams is one of the main research topics in machine learning due to its application in many knowledge areas. One of the major challenges in mining data streams is concept drift, which requires the learner to discard the current concept and adapt to a new one. Ensemble-based drift detection algorithms have been applied successfully to classification tasks, but they usually maintain a fixed-size ensemble of learners and so risk needlessly spending processing time and memory. In this paper the authors present improvements to the Scale-free Network Regressor (SFNR), a dynamic ensemble-based method for regression that employs social network theory. In order to detect concept drifts, SFNR uses the Adaptive Window (ADWIN) algorithm. Results show improvements in accuracy, especially in concept drift situations, and better performance than other state-of-the-art algorithms on both real and synthetic data.
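As a rough illustration of the adaptive-windowing idea behind ADWIN, the sketch below keeps a window of recent values in [0, 1] (for example, normalized errors), checks every split of the window into an older and a newer part, and drops the older part when the two means differ by more than a Hoeffding-style threshold. It is a deliberately naive version: it omits the bucket compression that makes ADWIN memory-efficient, and the class name `NaiveAdaptiveWindow`, the `delta` and `max_size` defaults, and the exact form of the threshold are assumptions of this sketch rather than SFNR's implementation.

```python
import math
from collections import deque


class NaiveAdaptiveWindow:
    """Simplified adaptive window; NOT the full ADWIN algorithm."""

    def __init__(self, delta=0.002, max_size=500):
        self.delta = delta                    # confidence parameter (assumed default)
        self.window = deque(maxlen=max_size)  # oldest value on the left

    def update(self, value):
        """Add a value in [0, 1]; return True if a change was detected."""
        self.window.append(value)
        drift = False
        shrunk = True
        while shrunk and len(self.window) >= 2:
            shrunk = False
            items = list(self.window)
            n = len(items)
            total = sum(items)
            head_sum = 0.0
            for n0 in range(1, n):
                head_sum += items[n0 - 1]
                n1 = n - n0
                mean_old = head_sum / n0
                mean_new = (total - head_sum) / n1
                m = 1.0 / (1.0 / n0 + 1.0 / n1)
                # Hoeffding-style cut threshold (form assumed for this sketch).
                eps = math.sqrt(math.log(4.0 * n / self.delta) / (2.0 * m))
                if abs(mean_old - mean_new) > eps:
                    # Drop the older part and re-check the shorter window.
                    for _ in range(n0):
                        self.window.popleft()
                    drift = shrunk = True
                    break
        return drift


# Hypothetical usage: an error signal that jumps from 0.0 to 1.0.
detector = NaiveAdaptiveWindow()
stream = [0.0] * 200 + [1.0] * 200
detections = [i for i, v in enumerate(stream) if detector.update(v)]
```

Each update costs time linear in the window length, which is acceptable for a demonstration but is exactly the cost the real algorithm's bucket structure is designed to avoid.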


Smart Cities ◽  
2021 ◽  
Vol 4 (1) ◽  
pp. 349-371
Author(s):  
Hassan Mehmood ◽  
Panos Kostakos ◽  
Marta Cortes ◽  
Theodoros Anagnostopoulos ◽  
Susanna Pirttikangas ◽  
...  

Real-world data streams pose a unique challenge to the implementation of machine learning (ML) models and data analysis. A notable problem that has been introduced by the growth of Internet of Things (IoT) deployments across the smart city ecosystem is that the statistical properties of data streams can change over time, resulting in poor prediction performance and ineffective decisions. While concept drift detection methods aim to patch this problem, emerging communication and sensing technologies are generating a massive amount of data, requiring distributed environments to perform computation tasks across smart city administrative domains. In this article, we implement and test a number of state-of-the-art active concept drift detection algorithms for time series analysis within a distributed environment. We use real-world data streams and provide critical analysis of results retrieved. The challenges of implementing concept drift adaptation algorithms, along with their applications in smart cities, are also discussed.


Entropy ◽  
2021 ◽  
Vol 23 (7) ◽  
pp. 859
Author(s):  
Abdulaziz O. AlQabbany ◽  
Aqil M. Azmi

We are living in the age of big data, a majority of which is stream data. The real-time processing of this data requires careful consideration from different perspectives. Concept drift, a change in the data’s underlying distribution, is a significant issue, especially when learning from data streams. It requires learners to be adaptive to dynamic changes. Random forest is an ensemble approach that is widely used in classical non-streaming settings of machine learning applications. The Adaptive Random Forest (ARF), in turn, is a stream learning algorithm that has shown promising results in terms of its accuracy and ability to deal with various types of drift. The continuity of the incoming instances allows their binomial distribution to be approximated by a Poisson(1) distribution. In this study, we propose a mechanism to increase the efficiency of such streaming algorithms by focusing on resampling. Our measure, resampling effectiveness (ρ), fuses the two most essential aspects of online learning: accuracy and execution time. We use six different synthetic data sets, each having a different type of drift, to empirically select the parameter λ of the Poisson distribution that yields the best value for ρ. By comparing the standard ARF with its tuned variations, we show that ARF performance can be enhanced by tackling this important aspect. Finally, we present three case studies from different contexts to test our proposed enhancement method and demonstrate its effectiveness in processing large data sets: (a) Amazon customer reviews (written in English), (b) hotel reviews (in Arabic), and (c) real-time aspect-based sentiment analysis of COVID-19-related tweets in the United States during April 2020. Results indicate that our proposed method of enhancement exhibited considerable improvement in most of the situations.
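The Poisson resampling that the abstract tunes can be sketched as follows: for each incoming instance, every ensemble member draws a weight k from Poisson(λ) and trains on that instance k times (or with weight k). This is a minimal sketch of that resampling step only, not of ARF or of the ρ measure; the function names and the use of Knuth's sampling method are choices made for this example.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def resampling_weights(n_learners, lam, rng):
    """One training weight per ensemble member for a single instance."""
    return [poisson(lam, rng) for _ in range(n_learners)]


# Hypothetical usage: weights for one instance in a 10-member ensemble.
rng = random.Random(0)
weights = resampling_weights(n_learners=10, lam=1.0, rng=rng)
```

A larger λ means each learner sees each instance more often on average, which tends to raise training cost per instance; that trade-off between accuracy and execution time is what the abstract's ρ measure is meant to capture.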

