Ensemble based on Accuracy and Diversity Weighting for Evolving Data Streams

The International Arab Journal of Information Technology ◽

10.34028/iajit/19/1/11 ◽

2022 ◽

Author(s):

Yange Sun ◽

Han Shao ◽

Bencai Zhang

Keyword(s):

Ensemble Learning ◽

Data Streams ◽

Concept Drift ◽

Ensemble Methods ◽

Current Data ◽

Ensemble Classification ◽

Crucial Issue ◽

Base Classifier ◽

Real World Applications ◽

Different Types

Ensemble classification is an actively researched paradigm that has received much attention due to its growing number of real-world applications. The crucial issue in ensemble learning is to construct a pool of base classifiers that are both accurate and diverse. In this paper, unlike conventional ensemble methods for data streams, we propose a novel Measure via both Accuracy and Diversity (MAD), rather than only one of them, to supervise ensemble learning. Based on MAD, a novel online ensemble method called Accuracy and Diversity weighted Ensemble (ADE) is proposed to handle concept drift in data streams effectively. ADE constructs a concept-drift-oriented ensemble in three main steps for the current data window: 1) when drift is detected, a new base classifier is built on the current concept; 2) MAD is used to measure the performance of the ensemble members; and 3) the newly built classifier replaces the worst base classifier. If the newly built classifier is itself the worst one, no replacement occurs. Compared with state-of-the-art algorithms, ADE exceeds the best related algorithm by 2.38% in average classification accuracy. Experimental results show that the proposed method can effectively adapt to different types of drift.
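The replacement rule in steps 2 and 3 can be illustrated with a minimal sketch. The abstract does not give the MAD formula, so the score below is an assumption: a weighted sum of window accuracy and mean pairwise disagreement with the other members. The names `mad_score`, `update_ensemble`, and `alpha` are hypothetical, and each classifier is represented only by its predictions on the current window; the candidate is assumed to have been trained already after a drift signal.

```python
from typing import List, Sequence, Tuple

def accuracy(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of window instances predicted correctly."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def disagreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of window instances on which two members differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def mad_score(i: int, preds: List[Sequence[int]], truth: Sequence[int],
              alpha: float = 0.5) -> float:
    """Assumed stand-in for MAD: alpha * accuracy + (1 - alpha) * diversity."""
    others = [p for j, p in enumerate(preds) if j != i]
    diversity = (sum(disagreement(preds[i], o) for o in others) / len(others)
                 if others else 0.0)
    return alpha * accuracy(preds[i], truth) + (1 - alpha) * diversity

def update_ensemble(member_preds: List[Sequence[int]],
                    candidate_pred: Sequence[int],
                    truth: Sequence[int],
                    alpha: float = 0.5) -> Tuple[List[Sequence[int]], bool]:
    """Score members plus candidate and drop the worst one.

    Returns the new member list and whether a replacement happened.
    """
    pool = list(member_preds) + [candidate_pred]
    scores = [mad_score(i, pool, truth, alpha) for i in range(len(pool))]
    worst = min(range(len(pool)), key=scores.__getitem__)
    if worst == len(pool) - 1:
        # The candidate itself is the worst: keep the ensemble unchanged.
        return list(member_preds), False
    return [p for i, p in enumerate(pool) if i != worst], True
```

With `alpha = 1.0` the score reduces to accuracy alone, which makes it easy to see how adding the diversity term changes which member is dropped.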

Microcluster-Based Incremental Ensemble Learning for Noisy, Nonstationary Data Streams

Complexity ◽

10.1155/2020/6147378 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Sanmin Liu ◽

Shan Xue ◽

Fanzhen Liu ◽

Jieren Cheng ◽

Xiulai Li ◽

...

Keyword(s):

Ensemble Learning ◽

Data Streams ◽

Data Stream ◽

Concept Drift ◽

Majority Vote ◽

Stream Classification ◽

Model Stability ◽

Data Stream Classification ◽

Nonstationary Data ◽

Synthetic Datasets

Data stream classification is a promising prediction task that is relevant to many practical settings. However, in the presence of concept drift and noise, it faces many challenges. Hence, a new incremental ensemble model is presented for classifying nonstationary data streams with noise. Our approach integrates three strategies: incremental learning to monitor and adapt to concept drift; ensemble learning to improve model stability; and a microclustering procedure that distinguishes drift from noise and predicts the labels of incoming instances via majority vote. Experiments with two synthetic datasets designed to test for both gradual and abrupt drift show that our method provides more accurate classification in noisy, nonstationary data streams than two popular baselines.

Semi-supervised Ensemble Learning of Data Streams in the Presence of Concept Drift

Lecture Notes in Computer Science - Hybrid Artificial Intelligent Systems ◽

10.1007/978-3-642-28931-6_50 ◽

2012 ◽

pp. 526-537 ◽

Cited By ~ 7

Author(s):

Zahra Ahmadi ◽

Hamid Beigy

Keyword(s):

Ensemble Learning ◽

Data Streams ◽

Concept Drift

Minority Resampling Based Ensemble Framework Using Enhanced Early Drift Detection Method For Imbalanced Data Streams

10.21203/rs.3.rs-141880/v1 ◽

2021 ◽

Author(s):

Priya S ◽

Annie Uthra

Keyword(s):

Data Streams ◽

Data Stream ◽

Detection Method ◽

Concept Drift ◽

Class Imbalance ◽

Current Data ◽

Classification Model ◽

Ensemble Classifiers ◽

K Nearest Neighbor ◽

Jaccard Similarity

As data mining applications grow in popularity, large volumes of data streams are generated over time. The main problem with data streams is that they exhibit a high degree of class imbalance and that the data distribution changes over time. In this paper, the Timely Drift Detection and Minority Resampling Technique (TDDMRT), based on K-nearest neighbors and Jaccard similarity, is proposed to handle class imbalance by finding the current ratio of class labels. The Enhanced Early Drift Detection Method (EEDDM) is proposed for detecting concept drift, and the Minority Resampling Method (KNN-JS) determines whether the current data stream should be regarded as imbalanced and resamples the minority instances in the drifting data stream. The K-nearest neighbors technique is used to resample the minority classes, and the Jaccard similarity measure is applied to the resampled data to generate synthetic data similar to the original data, which is then handled by ensemble classifiers. The proposed ensemble-based classification model outperforms existing oversampling and undersampling techniques, with an accuracy of 98.52%.
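Two ingredients named in this abstract, the class-label ratio and the Jaccard similarity, are simple to state in code. The sketch below is only an illustration under assumptions: instances are treated as sets of discrete feature values, the K-nearest-neighbor resampling step is left out, and the helper names and the `min_sim` threshold are hypothetical rather than taken from the paper.

```python
from collections import Counter

def minority_ratio(labels):
    """Share of the rarest class among the labels in the current window."""
    counts = Counter(labels)
    return min(counts.values()) / len(labels)

def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections of items."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty sets are treated as identical
    return len(a & b) / len(a | b)

def keep_synthetic(synthetic, originals, min_sim=0.5):
    """Accept a synthetic instance only if it is similar enough to at least
    one original minority instance (assumed acceptance rule)."""
    return any(jaccard(synthetic, o) >= min_sim for o in originals)
```

A window could then be flagged as imbalanced when `minority_ratio` falls below a chosen cutoff, with the cutoff itself being a tuning decision that the abstract does not specify.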

Dynamically adaptive and diverse dual ensemble learning approach for handling concept drift in data streams

Computational Intelligence ◽

10.1111/coin.12475 ◽

2021 ◽

Author(s):

Kanu Goel ◽

Shalini Batra

Keyword(s):

Ensemble Learning ◽

Data Streams ◽

Concept Drift ◽

Learning Approach

Fast Adapting Ensemble: A New Algorithm for Mining Data Streams with Concept Drift

The Scientific World Journal ◽

10.1155/2015/235810 ◽

2015 ◽

Vol 2015 ◽

pp. 1-14 ◽

Cited By ~ 6

Author(s):

Agustín Ortíz Díaz ◽

José del Campo-Ávila ◽

Gonzalo Ramos-Jiménez ◽

Isvani Frías Blanco ◽

Yailé Caballero Mota ◽

...

Keyword(s):

Data Mining ◽

Data Streams ◽

Concept Drift ◽

Learning Algorithms ◽

Large Data ◽

Different Types ◽

Benchmark Datasets ◽

Mining Data Streams ◽

Concept Drifts

The treatment of large data streams in the presence of concept drift is one of the main challenges in the field of data mining, particularly when the algorithms have to deal with concepts that disappear and then reappear. This paper presents a new algorithm, called Fast Adapting Ensemble (FAE), which adapts very quickly to both abrupt and gradual concept drifts and has been specifically designed to deal with recurring concepts. FAE processes the learning examples in blocks of the same size, but it does not have to wait for a block to be complete in order to adapt its base classification mechanism. FAE incorporates a drift detector to improve the handling of abrupt concept drifts and stores a set of inactive classifiers that represent old concepts, which are activated very quickly when these concepts reappear. We compare our new algorithm with various well-known learning algorithms using common benchmark datasets. The experiments show promising results for the proposed algorithm, in both accuracy and runtime, when handling different types of concept drift.
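The idea of keeping inactive classifiers for old concepts and reactivating them when those concepts return can be sketched as follows. This is not FAE itself: the abstract does not say how reactivation is decided, so the accuracy threshold, the function names, and the representation of classifiers as plain callables are all assumptions made for illustration.

```python
def block_accuracy(clf, block):
    """Accuracy of a classifier (a callable x -> label) on (x, y) pairs."""
    return sum(clf(x) == y for x, y in block) / len(block)

def reactivate(active, inactive, block, threshold=0.8):
    """Move inactive classifiers that fit the latest block back to the
    active set; return new (active, inactive) lists without mutating inputs."""
    active = list(active)
    still_inactive = []
    for clf in inactive:
        if block_accuracy(clf, block) >= threshold:
            active.append(clf)          # the old concept appears to be back
        else:
            still_inactive.append(clf)  # keep it stored for later
    return active, still_inactive
```

Because stored classifiers only need to be evaluated, not retrained, a check like this is cheap compared with learning the recurring concept again from scratch.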

An overview of complex data stream ensemble classification

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-211100 ◽

2021 ◽

pp. 1-29

Author(s):

Xilong Zhang ◽

Meng Han ◽

Hongxin Wu ◽

Muhang Li ◽

Zhiqiang Chen

Keyword(s):

Data Streams ◽

Concept Drift ◽

Rapid Development ◽

Complex Structure ◽

Classification Performance ◽

Ensemble Classification ◽

Future Research ◽

Complex Data ◽

Advantages And Disadvantages ◽

Application Fields

With the rapid development of information technology, data streams in various fields are characterized by rapid arrival, complex structure, and the need for timely processing. Complex types of data streams degrade classification performance. Ensemble classification, however, has become one of the main methods of processing data streams, and it performs better than traditional single classifiers. This article introduces the ensemble classification algorithms for complex data streams for the first time. The overview then analyzes the advantages and disadvantages of these algorithms for steady-state, concept drift, imbalanced, multi-label and multi-instance data streams. The application fields of data streams are also introduced, with a summary of the ensemble algorithms that process text, graph and big data streams. Moreover, the article comprehensively summarizes the verification techniques, evaluation indicators and open source platforms for complex data stream mining algorithms. Finally, the challenges and future research directions of ensemble learning algorithms dealing with uncertain, multi-type, delayed, multi-type concept drift data streams are given.

ADES: A New Ensemble Diversity-Based Approach for Handling Concept Drift

Mobile Information Systems ◽

10.1155/2021/5549300 ◽

2021 ◽

Vol 2021 ◽

pp. 1-17

Author(s):

Tinofirei Museba ◽

Fulufhelo Nelwamondo ◽

Khmaies Ouahada

Keyword(s):

Machine Learning ◽

Real World ◽

Data Streams ◽

Predictive Models ◽

Concept Drift ◽

Dynamic Environments ◽

Real World Data ◽

World Data ◽

Different Types ◽

Concept Drifts

Beyond applying machine learning predictive models to static tasks, a significant corpus of research exists that applies machine learning predictive models to streaming environments that incur concept drift. With the prevalence of streaming real-world applications that are associated with changes in the underlying data distribution, the need for applications that are capable of adapting to evolving and time-varying dynamic environments can hardly be overstated. Dynamic environments are nonstationary, and the target variables to be predicted by the learning algorithm often evolve with time, a phenomenon known as concept drift. Most work on handling concept drift focuses on updating the prediction model so that it can recover from concept drift, while little effort has been dedicated to the formulation of a learning system that is capable of learning different types of drifting concepts at any time with minimum overhead. This work proposes a novel and evolving data stream classifier called Adaptive Diversified Ensemble Selection Classifier (ADES) that significantly optimizes adaptation to different types of concept drift at any time and improves convergence to new concepts by exploiting different amounts of ensemble diversity. The ADES algorithm generates diverse base classifiers, thereby optimizing the margin distribution to exploit ensemble diversity to formulate an ensemble classifier that generalizes well to unseen instances and provides fast recovery from different types of concept drift. Empirical experiments conducted on both artificial and real-world data streams demonstrate that ADES can adapt to different types of drift at any given time. The prediction performance of ADES is compared to that of three other ensemble classifiers designed to handle concept drift using both artificial and real-world data streams. The comparative evaluation demonstrated the ability of ADES to handle different types of concept drift. The experimental results, including statistical test results, indicate performance comparable to that of other algorithms designed to handle concept drift and confirm their significance and effectiveness.

A Novel Ensemble Classification for Data Streams with Class Imbalance and Concept Drift

10.23940/ijpe.17.06.p15.945955 ◽

2017 ◽

Author(s):

Yange Sun

Keyword(s):

Data Streams ◽

Concept Drift ◽

Class Imbalance ◽

Ensemble Classification

Efficient Ensemble Classification for Multi-Label Data Streams with Concept Drift

Information ◽

10.3390/info10050158 ◽

2019 ◽

Vol 10 (5) ◽

pp. 158 ◽

Cited By ~ 1

Author(s):

Yange Sun ◽

Han Shao ◽

Shasha Wang

Keyword(s):

Real World ◽

Data Streams ◽

Concept Drift ◽

Classification Performance ◽

Ensemble Classification ◽

Classification Methods ◽

Stream Data ◽

Label Data ◽

Real World Datasets ◽

Jensen Shannon Divergence

Most existing multi-label data stream classification methods focus on extending single-label stream classification approaches to multi-label cases, without considering the special characteristics of multi-label stream data, such as label dependency, concept drift, and recurrent concepts. Motivated by these challenges, we devise an efficient ensemble paradigm for multi-label data stream classification. The algorithm deploys a novel change detection method based on Jensen–Shannon divergence to identify different kinds of concept drift in data streams. Moreover, our method takes label dependency into account by pruning away infrequent label combinations to enhance classification performance. Empirical results on both synthetic and real-world datasets demonstrate its effectiveness.
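As a rough illustration of divergence-based change detection, the sketch below compares the empirical distributions of two windows with the Jensen–Shannon divergence (base 2, so the value lies in [0, 1]) and signals drift when it exceeds a threshold. The abstract does not state which distributions the paper compares or how the threshold is set, so the window contents (for example, label-combination tuples), the function names, and `threshold=0.1` are assumptions.

```python
import math
from collections import Counter

def distribution(items, support):
    """Empirical probabilities of each support element within a window."""
    counts = Counter(items)
    n = len(items)
    return [counts[s] / n for s in support]

def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def drift_detected(reference, current, threshold=0.1):
    """Flag drift when the two windows' distributions diverge too much.

    Window items must be hashable and mutually orderable.
    """
    support = sorted(set(reference) | set(current))
    p = distribution(reference, support)
    q = distribution(current, support)
    return js_divergence(p, q) > threshold
```

Unlike the plain KL divergence, the Jensen–Shannon divergence is symmetric and stays finite even when one window contains values the other never saw, which is convenient for comparing stream windows.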

Cost-sensitive ensemble learning: a unifying framework

Data Mining and Knowledge Discovery ◽

10.1007/s10618-021-00790-4 ◽

2021 ◽

Author(s):

George Petrides ◽

Wouter Verbeke

Keyword(s):

Random Forest ◽

Ensemble Learning ◽

Ensemble Methods ◽

Fine Grained ◽

Misclassification Errors ◽

Natural Extensions ◽

Different Types

Over the years, a plethora of cost-sensitive methods have been proposed for learning on data when different types of misclassification errors incur different costs. Our contribution is a unifying framework that provides a comprehensive and insightful overview of cost-sensitive ensemble methods, pinpointing their differences and similarities via a fine-grained categorization. Our framework contains natural extensions and generalisations of ideas across methods, be it AdaBoost, Bagging or Random Forest, and as a result not only yields all methods known to date but also some not previously considered.
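To make concrete what it means for different misclassification errors to incur different costs, the following sketch applies the standard minimum-expected-cost decision rule to a vector of class probabilities. It is a generic textbook rule, not the framework proposed in the paper; the function name and the cost-matrix convention (`cost[i][j]` is the cost of predicting class `j` when the true class is `i`) are assumptions.

```python
def min_expected_cost_class(probs, cost):
    """Return (best_class, expected_costs) under a cost matrix.

    probs[i]   : estimated probability that the true class is i
    cost[i][j] : cost of predicting j when the true class is i
    """
    n = len(probs)
    # Expected cost of each possible prediction j, averaged over true classes.
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    best = min(range(n), key=expected.__getitem__)
    return best, expected
```

For example, with probabilities `[0.8, 0.2]` and a cost matrix `[[0, 1], [10, 0]]`, the rule predicts class 1 even though class 0 is more likely, because missing class 1 is ten times as costly as a false alarm.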
