Efficient Ensemble Classification for Multi-Label Data Streams with Concept Drift

Most existing multi-label data stream classification methods focus on extending single-label stream classification approaches to the multi-label case, without considering the special characteristics of multi-label stream data, such as label dependency, concept drift, and recurrent concepts. Motivated by these challenges, we devise an efficient ensemble paradigm for multi-label data stream classification. The algorithm deploys a novel change detection mechanism based on Jensen–Shannon divergence to identify different kinds of concept drift in data streams. Moreover, our method accounts for label dependency by pruning away infrequent label combinations to enhance classification performance. Empirical results on both synthetic and real-world datasets demonstrate its effectiveness.
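
The abstract above does not spell out how the Jensen–Shannon divergence is used for change detection, so the following is only a minimal sketch of one common window-based setup, not the authors' algorithm. It compares the empirical distribution of label sets in a reference window with that in the current window and flags drift when the divergence exceeds a threshold. The function names, the choice of label sets as the compared items, and the threshold value of 0.1 are all assumptions made for illustration.

```python
import math
from collections import Counter


def empirical_distribution(items):
    """Relative frequency of each distinct item in a window."""
    counts = Counter(items)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete
    distributions given as {outcome: probability} dicts.
    The result lies in [0, 1]."""
    keys = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Kullback-Leibler divergence KL(a || m); zero-probability terms contribute 0.
        return sum(
            a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0.0
        )

    return 0.5 * kl(p) + 0.5 * kl(q)


def drift_detected(reference_window, current_window, threshold=0.1):
    """Hypothetical detector: threshold is an illustrative value, not from the paper."""
    p = empirical_distribution(reference_window)
    q = empirical_distribution(current_window)
    return js_divergence(p, q) > threshold


# Label sets are represented as frozensets so they can be counted.
old = [frozenset({"a"}), frozenset({"a", "b"})] * 50
new = [frozenset({"c"}), frozenset({"b", "c"})] * 50
print(drift_detected(old, old))  # same distribution
print(drift_detected(old, new))  # disjoint label sets
```

Because the divergence is bounded and symmetric, a single fixed threshold can be applied regardless of which window is treated as the reference.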

An overview of complex data stream ensemble classification

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-211100 ◽

2021 ◽

pp. 1-29

Author(s):

Xilong Zhang ◽

Meng Han ◽

Hongxin Wu ◽

Muhang Li ◽

Zhiqiang Chen

Keyword(s):

Data Streams ◽

Concept Drift ◽

Rapid Development ◽

Complex Structure ◽

Classification Performance ◽

Ensemble Classification ◽

Future Research ◽

Complex Data ◽

Advantages And Disadvantages ◽

Application Fields

With the rapid development of information technology, data streams in various fields are characterized by rapid arrival, complex structure, and the need for timely processing. Complex types of data streams degrade classification performance. Ensemble classification, however, has become one of the main methods of processing data streams, and its performance is better than that of traditional single classifiers. This article introduces, for the first time, the ensemble classification algorithms for complex data streams. It then analyzes the advantages and disadvantages of these algorithms for steady-state, concept drift, imbalanced, multi-label, and multi-instance data streams. The application fields of data streams are also introduced, with a summary of the ensemble algorithms that process text, graph, and big data streams. Moreover, the article comprehensively summarizes the verification techniques, evaluation indicators, and open source platforms for complex data stream mining algorithms. Finally, the challenges and future research directions of ensemble learning algorithms dealing with uncertain, multi-type, and delayed data streams, and with data streams exhibiting multiple types of concept drift, are given.

Dynamically Adjusting Diversity in Ensembles for the Classification of Data Streams with Concept Drift

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3466616 ◽

2021 ◽

Vol 16 (2) ◽

pp. 1-30

Author(s):

Juan I. G. Hidalgo ◽

Silas G. T. C. Santos ◽

Roberto S. M. Barros

Keyword(s):

Parameter Estimation ◽

Real World ◽

Data Streams ◽

Data Stream ◽

Concept Drift ◽

Estimation Method ◽

Estimation Procedure ◽

Dynamic Parameter ◽

Real World Datasets ◽

Concept Drifts

A data stream can be defined as a system that continually generates a large amount of data over time. Today, processing data streams poses new demands and challenging tasks in the data mining and machine learning areas. Concept drift is a problem commonly characterized as changes in the distribution of the data within a data stream. Implementing new methods for dealing with data streams where concept drifts occur requires algorithms that can adapt to several scenarios and so improve their performance in the different experimental situations where they are tested. This research proposes a strategy for dynamic parameter adjustment in the presence of concept drifts. Parameter Estimation Procedure (PEP) is a general method for dynamically adjusting parameters, applied here to the diversity parameter (λ) of several classification ensembles commonly used in the area. To this end, PEP was used to create Boosting-like Online Learning Ensemble with Parameter Estimation (BOLE-PE), Online AdaBoost-based M1 with Parameter Estimation (OABM1-PE), and Oza and Russell’s Online Bagging with Parameter Estimation (OzaBag-PE), based on the existing ensembles BOLE, OABM1, and OzaBag, respectively. To validate them, experiments were performed with artificial and real-world datasets using Hoeffding Tree (HT) as base classifier. The accuracy results were statistically evaluated using a variation of the Friedman test and the Nemenyi post-hoc test. The experimental results showed that applying the dynamic estimation to the diversity parameter (λ) produced good results in most scenarios, i.e., the modified methods improved accuracy in the experiments with both artificial and real-world datasets.
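
To make the role of the diversity parameter λ concrete, here is a minimal sketch of Oza and Russell style online bagging, in which each base learner trains on every incoming instance k times, with k drawn from a Poisson(λ) distribution. It does not reproduce PEP: λ is fixed here, whereas the abstract describes adjusting it dynamically. The class names, the `learn_one`/`predict_one` method names, and the majority-class base learner (standing in for a Hoeffding Tree) are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


class MajorityClassLearner:
    """Toy base learner (a stand-in for a Hoeffding Tree): predicts the most seen label."""

    def __init__(self):
        self.counts = Counter()

    def learn_one(self, x, y):
        self.counts[y] += 1

    def predict_one(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None


class OnlineBagging:
    """Online bagging with a fixed diversity parameter lam (hypothetical sketch)."""

    def __init__(self, n_learners=5, lam=1.0, seed=0):
        self.learners = [MajorityClassLearner() for _ in range(n_learners)]
        self.lam = lam
        self.rng = random.Random(seed)

    def learn_one(self, x, y):
        # Each learner sees the instance k ~ Poisson(lam) times, possibly zero.
        for learner in self.learners:
            for _ in range(sample_poisson(self.lam, self.rng)):
                learner.learn_one(x, y)

    def predict_one(self, x):
        # Majority vote over the learners that have seen at least one instance.
        predictions = (learner.predict_one(x) for learner in self.learners)
        votes = Counter(p for p in predictions if p is not None)
        return votes.most_common(1)[0][0] if votes else None


ensemble = OnlineBagging(n_learners=5, lam=1.0, seed=42)
for i in range(200):
    ensemble.learn_one({"feature": i}, 1)
print(ensemble.predict_one({"feature": 0}))
```

A larger λ makes each learner train on each instance more often, while a smaller λ makes learners skip more instances and so differ more from one another, which is why λ acts as a diversity control.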

Concept Drift Adaptation Techniques in Distributed Environment for Real-World Data Streams

Smart Cities ◽

10.3390/smartcities4010021 ◽

2021 ◽

Vol 4 (1) ◽

pp. 349-371

Author(s):

Hassan Mehmood ◽

Panos Kostakos ◽

Marta Cortes ◽

Theodoros Anagnostopoulos ◽

Susanna Pirttikangas ◽

...

Keyword(s):

Real World ◽

Data Streams ◽

Smart City ◽

Smart Cities ◽

Concept Drift ◽

Distributed Environment ◽

Real World Data ◽

Unique Challenge ◽

World Data ◽

Concept Drift Detection

Real-world data streams pose a unique challenge to the implementation of machine learning (ML) models and data analysis. A notable problem that has been introduced by the growth of Internet of Things (IoT) deployments across the smart city ecosystem is that the statistical properties of data streams can change over time, resulting in poor prediction performance and ineffective decisions. While concept drift detection methods aim to patch this problem, emerging communication and sensing technologies are generating a massive amount of data, requiring distributed environments to perform computation tasks across smart city administrative domains. In this article, we implement and test a number of state-of-the-art active concept drift detection algorithms for time series analysis within a distributed environment. We use real-world data streams and provide critical analysis of results retrieved. The challenges of implementing concept drift adaptation algorithms, along with their applications in smart cities, are also discussed.

Measuring the Effectiveness of Adaptive Random Forest for Handling Concept Drift in Big Data Streams

Entropy ◽

10.3390/e23070859 ◽

2021 ◽

Vol 23 (7) ◽

pp. 859

Author(s):

Abdulaziz O. AlQabbany ◽

Aqil M. Azmi

Keyword(s):

Big Data ◽

Random Forest ◽

Real Time ◽

Data Streams ◽

Learning Algorithm ◽

Concept Drift ◽

The United States ◽

Careful Consideration ◽

Data Sets ◽

Stream Data

We are living in the age of big data, a majority of which is stream data. The real-time processing of this data requires careful consideration from different perspectives. Concept drift, a change in the data’s underlying distribution, is a significant issue, especially when learning from data streams. It requires learners to be adaptive to dynamic changes. Random forest is an ensemble approach that is widely used in classical non-streaming settings of machine learning applications. The Adaptive Random Forest (ARF), in turn, is a stream learning algorithm that has shown promising results in terms of its accuracy and ability to deal with various types of drift. The incoming instances’ continuity allows their binomial distribution to be approximated by a Poisson(1) distribution. In this study, we propose a mechanism to increase such streaming algorithms’ efficiency by focusing on resampling. Our measure, resampling effectiveness (ρ), fuses the two most essential aspects of online learning: accuracy and execution time. We use six different synthetic data sets, each having a different type of drift, to empirically select the parameter λ of the Poisson distribution that yields the best value for ρ. By comparing the standard ARF with its tuned variations, we show that ARF performance can be enhanced by tackling this important aspect. Finally, we present three case studies from different contexts to test our proposed enhancement method and demonstrate its effectiveness in processing large data sets: (a) Amazon customer reviews (written in English), (b) hotel reviews (in Arabic), and (c) real-time aspect-based sentiment analysis of COVID-19-related tweets in the United States during April 2020. Results indicate that our proposed method of enhancement exhibited considerable improvement in most of the situations.
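
The binomial-to-Poisson approximation mentioned in the abstract can be written out as follows. This is the standard bagging argument, given here as background rather than as the authors' derivation: in a bootstrap sample of size n, the number of copies K of any given instance follows a Binomial(n, 1/n) distribution, which tends to Poisson(1) as n grows. The second line states the general Poisson(λ) probability mass function, which is the distribution whose parameter λ the abstract describes tuning.

```latex
\Pr(K = k)
  = \binom{n}{k}\left(\frac{1}{n}\right)^{k}\left(1 - \frac{1}{n}\right)^{n-k}
  \;\xrightarrow[\,n \to \infty\,]{}\; \frac{e^{-1}}{k!},
  \qquad k = 0, 1, 2, \dots

\Pr(K = k \mid \lambda) = \frac{e^{-\lambda}\,\lambda^{k}}{k!}
```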

Weighted Ensemble Classification of Multi-label Data Streams

Advances in Knowledge Discovery and Data Mining - Lecture Notes in Computer Science ◽

10.1007/978-3-319-57529-2_43 ◽

2017 ◽

pp. 551-562 ◽

Cited By ~ 4

Author(s):

Lulu Wang ◽

Hong Shen ◽

Hui Tian

Keyword(s):

Data Streams ◽

Ensemble Classification ◽

Label Data

A Review of Classification and Novel Class Detection Technique of Data Streams

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽

10.24297/ijct.v3i2c.2891 ◽

2012 ◽

Vol 3 (2) ◽

pp. 314-316

Author(s):

Manish Rai ◽

Rekha Pandit

Keyword(s):

Machine Learning ◽

Data Streams ◽

Concept Drift ◽

Data Classification ◽

Classification Model ◽

Infinite Length ◽

Stream Data ◽

Machine Learning Technique ◽

Feature Evaluation ◽

Learning Technique

Stream data classification suffers from the problems of infinite length, concept evaluation, feature evaluation, and data drift. Labeling data streams is more challenging than labeling static data because of several unique properties of data streams. Data streams are assumed to have infinite length, which makes it difficult to store and use all the historical data for training. Earlier multi-pass machine learning techniques cannot be applied directly to data streams. Data streams exhibit concept drift, which occurs when the underlying concept of the data changes over time. To address concept drift, a classification model must continuously adapt itself to the most recent concept. Various authors have reduced these problems using machine learning approaches and feature optimization techniques. In this paper we present various methods for reducing such problems in stream data classification. We also discuss a machine learning technique for the feature evaluation process used in the generation of novel classes.

Learning from Unbalanced Stream Data in Non-Stationary Environments Using Logistic Regression Model

Handbook of Research on Natural Computing for Optimization Problems - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-5225-0058-2.ch023 ◽

2016 ◽

pp. 561-582

Author(s):

Pallavi Digambarrao Kulkarni ◽

Roshani Ade

Keyword(s):

Learning Strategies ◽

Real World ◽

Incremental Learning ◽

Concept Drift ◽

Data Distribution ◽

Class Imbalance ◽

Learning Approaches ◽

Stream Data ◽

Future Data ◽

Distribution Generation

Several deep learning approaches can be applied to analyze real-world problems and devise solutions to them in a scientific manner. Supervised data mining methods, which predict instance values using results previously obtained from already collected data, are quite popular in the machine learning area because of their intelligence. Stream data is a continuous form of data that can be handled with an incremental learning approach. Learning from stream data may face several real-world challenges, such as concept drift and class imbalance. Concept drift occurs in a non-stationary environment, where the function generating the data distribution is dynamic and there is no fixed formula for predicting the future data distribution. Neural network techniques are intelligent enough to improve the performance of algorithmic systems that work in such problem domains. This chapter briefly describes how the MLP technique is integrated into a system so that it becomes a complete framework for handling unbalanced data with concept drift in incremental learning strategies.

A Classifier Graph Based Recurring Concept Detection and Prediction Approach

Computational Intelligence and Neuroscience ◽

10.1155/2018/4276291 ◽

2018 ◽

Vol 2018 ◽

pp. 1-13 ◽

Cited By ~ 4

Author(s):

Yange Sun ◽

Zhihai Wang ◽

Yang Bai ◽

Honghua Dai ◽

Saeid Nahavandi

Keyword(s):

Real World ◽

Data Streams ◽

Concept Drift ◽

Learning Performance ◽

Concept Detection ◽

Real World Data ◽

Full Account ◽

World Data ◽

Prediction Approach ◽

Better Than

It is common in real-world data streams for previously seen concepts to reappear, which gives rise to a distinct kind of concept drift known as recurring concepts. Unfortunately, most existing algorithms do not take full account of this case. Motivated by this challenge, a novel paradigm was proposed for capturing and exploiting recurring concepts in data streams. It not only incorporates a distribution-based change detector for handling concept drift but also captures recurring concepts by storing them in a classifier graph. The ability to detect recurring drifts allows previously learnt models to be reused and enhances the overall learning performance. Extensive experiments on both synthetic and real-world data streams reveal that the approach performs significantly better than the state-of-the-art algorithms, especially when concepts reappear.
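
The abstract does not describe the structure of the classifier graph, so the sketch below is only one plausible reading, not the authors' design: nodes hold a stored classifier per concept, and weighted edges count observed transitions between concepts so that the most frequent successor can be proposed when a drift is detected. The class name `ConceptGraph`, its methods, and the concept identifiers are all hypothetical.

```python
from collections import defaultdict


class ConceptGraph:
    """Hypothetical store of per-concept classifiers plus transition counts."""

    def __init__(self):
        self.models = {}  # concept id -> stored classifier (any object)
        # transitions[src][dst] = number of observed drifts from src to dst
        self.transitions = defaultdict(lambda: defaultdict(int))

    def store(self, concept_id, model):
        """Remember the classifier learnt for a concept so it can be reused."""
        self.models[concept_id] = model

    def record_transition(self, src, dst):
        """Record one observed drift from concept src to concept dst."""
        self.transitions[src][dst] += 1

    def predict_next(self, current):
        """Return the most frequently observed successor of the current concept,
        or None if no transition out of it has been recorded."""
        successors = self.transitions.get(current)
        if not successors:
            return None
        return max(successors, key=successors.get)

    def model_for(self, concept_id):
        """Return the stored classifier for a concept, or None if unseen."""
        return self.models.get(concept_id)


graph = ConceptGraph()
graph.store("summer", "model_summer")
graph.store("winter", "model_winter")
graph.record_transition("summer", "winter")
graph.record_transition("summer", "winter")
graph.record_transition("summer", "spring")

candidate = graph.predict_next("summer")
print(candidate, graph.model_for(candidate))
```

Reusing a stored model for the predicted concept avoids retraining from scratch when a concept reappears; a concept with no recorded successors simply yields no prediction.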

ADES: A New Ensemble Diversity-Based Approach for Handling Concept Drift

Mobile Information Systems ◽

10.1155/2021/5549300 ◽

2021 ◽

Vol 2021 ◽

pp. 1-17

Author(s):

Tinofirei Museba ◽

Fulufhelo Nelwamondo ◽

Khmaies Ouahada

Keyword(s):

Machine Learning ◽

Real World ◽

Data Streams ◽

Predictive Models ◽

Concept Drift ◽

Dynamic Environments ◽

Real World Data ◽

World Data ◽

Different Types ◽

Concept Drifts

Beyond applying machine learning predictive models to static tasks, a significant corpus of research applies machine learning predictive models to streaming environments that incur concept drift. With the prevalence of streaming real-world applications that are associated with changes in the underlying data distribution, the need for applications that are capable of adapting to evolving and time-varying dynamic environments can hardly be overstated. Dynamic environments are nonstationary: the target variables to be predicted by the learning algorithm often evolve with time, a phenomenon known as concept drift. Most work on handling concept drift focuses on updating the prediction model so that it can recover from concept drift, while little effort has been dedicated to formulating a learning system that is capable of learning different types of drifting concepts at any time with minimum overheads. This work proposes a novel and evolving data stream classifier called Adaptive Diversified Ensemble Selection Classifier (ADES) that significantly optimizes adaptation to different types of concept drifts at any time and improves convergence to new concepts by exploiting different amounts of ensemble diversity. The ADES algorithm generates diverse base classifiers, thereby optimizing the margin distribution to exploit ensemble diversity and formulate an ensemble classifier that generalizes well to unseen instances and provides fast recovery from different types of concept drift. Empirical experiments conducted on both artificial and real-world data streams demonstrate that ADES can adapt to different types of drifts at any given time. The prediction performance of ADES is compared to that of three other ensemble classifiers designed to handle concept drift, using both artificial and real-world data streams. The comparative evaluation demonstrated the ability of ADES to handle different types of concept drifts. The experimental results, including statistical test results, indicate performance comparable with that of other algorithms designed to handle concept drift and prove their significance and effectiveness.

CONCEPT DRIFT IN STREAMING DATA: A SYSTEMATIC LITERATURE REVIEW

KIET Journal of Computing and Information Sciences ◽

10.51153/kjcis.v4i1.43 ◽

2021 ◽

Vol 4 (1) ◽

pp. 17

Author(s):

Tariq Mahmood ◽

Tatheer Fatima

Keyword(s):

Machine Learning ◽

Literature Review ◽

Systematic Literature Review ◽

Data Streams ◽

Concept Drift ◽

Streaming Data ◽

Machine Learning Techniques ◽

Underlying Distribution ◽

Learning Techniques ◽

Real World Datasets

The world generates an immeasurable amount of data every minute that needs to be analyzed for better decision making. To meet this demand for faster analytics, businesses are adopting efficient stream processing and machine learning techniques. However, data streams are particularly challenging to handle. One of the prominent problems faced when dealing with streaming data is concept drift. Concept drift is described as an unexpected change in the underlying distribution of the streaming data that can be observed as time passes. In this work, we have conducted a systematic literature review to identify methods that deal with the problem of concept drift. The most frequently used supervised and unsupervised techniques have been reviewed, and we have also surveyed commonly used, publicly available artificial and real-world datasets that are used to study concept drift.
