scholarly journals A Novel Ensemble Classification for Data Streams with Class Imbalance and Concept Drift

Author(s):  
Yange Sun
Sensors ◽  
2020 ◽  
Vol 20 (7) ◽  
pp. 2131 ◽  
Author(s):  
Affan Ahmed Toor ◽  
Muhammad Usman ◽  
Farah Younas ◽  
Alvis Cheuk M. Fong ◽  
Sajid Ali Khan ◽  
...  

With the increasing popularity of the Internet-of-Medical-Things (IoMT) and smart devices, huge volumes of data streams have been generated. This study aims to address the concept drift, which is a major challenge in the processing of voluminous data streams. Concept drift refers to overtime change in data distribution. It may occur in the medical domain, for example the medical sensors measuring for general healthcare or rehabilitation, which may switch their roles for ICU emergency operations when required. Detecting concept drifts becomes trickier when the class distributions in data are skewed, which is often true for medical sensors e-health data. Reactive Drift Detection Method (RDDM) is an efficient method for detecting long concepts. However, RDDM has a high error rate, and it does not handle class imbalance. We propose an Enhanced Reactive Drift Detection Method (ERDDM), which systematically generates strategies to handle concept drift with class imbalance in data streams. We conducted experiments to compare ERDDM with three contemporary techniques in terms of prediction error, drift detection delay, latency, and ability to handle data imbalance. The experimentation was done in Massive Online Analysis (MOA) on 48 synthetic datasets customized to possess the capabilities of data streams. ERDDM can handle abrupt and gradual drifts and performs better than all benchmarks in almost all experiments.


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Yange Sun ◽  
Meng Li ◽  
Lei Li ◽  
Han Shao ◽  
Yi Sun

Class imbalance and concept drift are two primary principles that exist concurrently in data stream classification. Although the two issues have drawn enough attention separately, the joint treatment largely remains unexplored. Moreover, the class imbalance issue is further complicated if data streams with concept drift. A novel Cost-Sensitive based Data Stream (CSDS) classification is introduced to overcome the two issues simultaneously. The CSDS considers cost information during the procedures of data preprocessing and classification. During the data preprocessing, a cost-sensitive learning strategy is introduced into the ReliefF algorithm for alleviating the class imbalance at the data level. In the classification process, a cost-sensitive weighting schema is devised to enhance the overall performance of the ensemble. Besides, a change detection mechanism is embedded in our algorithm, which guarantees that an ensemble can capture and react to drift promptly. Experimental results validate that our method can obtain better classification results under different imbalanced concept drifting data stream scenarios.


Author(s):  
Alessio Bernardo ◽  
Emanuele Della Valle

AbstractThe world is constantly changing, and so are the massive amount of data produced. However, only a few studies deal with online class imbalance learning that combines the challenges of class-imbalanced data streams and concept drift. In this paper, we propose the very fast continuous synthetic minority oversampling technique (VFC-SMOTE). It is a novel meta-strategy to be prepended to any streaming machine learning classification algorithm aiming at oversampling the minority class using a new version of Smote and Borderline-Smote inspired by Data Sketching. We benchmarked VFC-SMOTE pipelines on synthetic and real data streams containing different concept drifts, imbalance levels, and class distributions. We bring statistical evidence that VFC-SMOTE pipelines learn models whose minority class performances are better than state-of-the-art. Moreover, we analyze the time/memory consumption and the concept drift recovery speed.


2021 ◽  
Author(s):  
Priya S ◽  
Annie Uthra

Abstract As the data mining applications are increasing popularly, large volumes of data streams are generated over the period of time. The main problem in data streams is that it exhibits a high degree of class imbalance and distribution of data changes over time. In this paper, Timely Drift Detection and Minority Resampling Technique (TDDMRT) based on K-nearest neighbor and Jaccard similarity is proposed to handle the class imbalance by finding the current ratio of class labels. The Enhanced Early Drift Detection Method (EEDDM) is proposed for detecting the concept drift and the Minority Resampling Method (KNN-JS) determines whether the current data stream should be regarded as imbalance and it resamples the minority instances in the drifting data stream. The K-Nearest Neighbors technique is used to resample the minority classes and the Jaccard similarity measure is established over the resampled data to generate the synthetic data similar to the original data and it is handled by ensemble classifiers. The proposed ensemble based classification model outperforms the existing over sampling and under sampling techniques with accuracy of 98.52%.


2021 ◽  
pp. 1-29
Author(s):  
Xilong Zhang ◽  
Meng Han ◽  
Hongxin Wu ◽  
Muhang Li ◽  
Zhiqiang Chen

With the rapid development of information technology, data streams in various fields are showing the characteristics of rapid arrival, complex structure and timely processing. Complex types of data streams make the classification performance worse. However, ensemble classification has become one of the main methods of processing data streams. Ensemble classification performance is better than traditional single classifiers. This article introduces the ensemble classification algorithms of complex data streams for the first time. Then overview analyzes the advantages and disadvantages of these algorithms for steady-state, concept drift, imbalanced, multi-label and multi-instance data streams. At the same time, the application fields of data streams are also introduced which summarizes the ensemble algorithms processing text, graph and big data streams. Moreover, it comprehensively summarizes the verification technology, evaluation indicators and open source platforms of complex data streams mining algorithms. Finally, the challenges and future research directions of ensemble learning algorithms dealing with uncertain, multi-type, delayed, multi-type concept drift data streams are given.


Author(s):  
D. Himaja ◽  
T. Maruthi Padmaja ◽  
P. Radha Krishna

Learning from data streams with both online class imbalance and concept drift (OCI-CD) is receiving much attention in today's world. Due to this problem, the performance is affected for the current models that learn from both stationary as well as non-stationary environments. In the case of non-stationary environments, due to the imbalance, it is hard to spot the concept drift using conventional drift detection methods that aim at tracking the change detection based on the learner's performance. There is limited work on the combined problem from imbalanced evolving streams both from stationary and non-stationary environments. Here the data may be evolved with complete labels or with only limited labels. This chapter's main emphasis is to provide different methods for the purpose of resolving the issue of class imbalance in emerging streams, which involves changing and unchanging environments with supervised and availability of limited labels.


Author(s):  
Tirupathi Rao Gullipalli

One of the biggest challenges in the recent times in the field of data stream learning is to mitigate the presence of concept drift. There are numerous challenges in overcoming the concept drift, such as changing class ratio, huge volume of data and real time processing for effective knowledge discovery. Evolutionary search techniques are one of the new paradigms to handle huge dimensionality and scalability of the data streams. One of the finest and least applied evolutionary search approaches is the cuckoo search technique for data streams. To solve both the concept drift and class imbalance issues simultaneously, in this paper we have proposed an approach using nature inspired evolutionary optimizing technique known as Cuckoo Feature and Instance Selection (CFIS) algorithm. The performance evaluation of the proposed approach is done on an exclusive experimental setup of 15 data streams formed and compared with two data stream approach. Moreover, a set of six evaluation criteria’s are considered for showing overall better performance of the proposed approach in the presence of concept drift and class imbalance.


Information ◽  
2019 ◽  
Vol 10 (5) ◽  
pp. 158 ◽  
Author(s):  
Yange Sun ◽  
Han Shao ◽  
Shasha Wang

Most existing multi-label data streams classification methods focus on extending single-label streams classification approaches to multi-label cases, without considering the special characteristics of multi-label stream data, such as label dependency, concept drift, and recurrent concepts. Motivated by these challenges, we devise an efficient ensemble paradigm for multi-label data streams classification. The algorithm deploys a novel change detection based on Jensen–Shannon divergence to identify different kinds of concept drift in data streams. Moreover, our method tries to consider label dependency by pruning away infrequent label combinations to enhance classification performance. Empirical results on both synthetic and real-world datasets have demonstrated its effectiveness.


Author(s):  
Yong Zhang ◽  
Jiaxin Yu ◽  
Wenzhe Liu ◽  
Kaoru Ota

Data stream learning in non-stationary environments and skewed class distributions has been receiving more attention in machine learning communities. This paper proposes a novel ensemble classification method (ECSDS) for classifying data streams with skewed class distributions. In the proposed ensemble method, back-propagation neural network is selected as the base classifier. In order to demonstrate the effectiveness of our proposed method, we choose three baseline methods based on ECSDS and evaluate their overall performance on ten datasets from UCI machine learning repository. Moreover, the performance of incremental learning is also evaluated by these datasets. The experimental results show our proposed method can effectively deal with classification problems on non-stationary data streams with class imbalance.


Sign in / Sign up

Export Citation Format

Share Document