Minority Resampling Based Ensemble Framework Using Enhanced Early Drift Detection Method For Imbalanced Data Streams

Mapping Intimacies ◽

10.21203/rs.3.rs-141880/v1 ◽

2021 ◽

Author(s):

Priya S ◽

Annie Uthra

Keyword(s):

Data Streams ◽

Data Stream ◽

Detection Method ◽

Concept Drift ◽

Class Imbalance ◽

Current Data ◽

Classification Model ◽

Ensemble Classifiers ◽

K Nearest Neighbor ◽

Jaccard Similarity

As data mining applications grow in popularity, large volumes of data streams are generated over time. The main problem with data streams is that they exhibit a high degree of class imbalance and that the data distribution changes over time. In this paper, a Timely Drift Detection and Minority Resampling Technique (TDDMRT) based on K-nearest neighbors and Jaccard similarity is proposed to handle class imbalance by finding the current ratio of class labels. The Enhanced Early Drift Detection Method (EEDDM) is proposed for detecting concept drift, and the Minority Resampling Method (KNN-JS) determines whether the current data stream should be regarded as imbalanced and resamples the minority instances in the drifting data stream. The K-nearest neighbors technique is used to resample the minority classes, and the Jaccard similarity measure is applied to the resampled data to generate synthetic data similar to the original data, which is then handled by ensemble classifiers. The proposed ensemble-based classification model outperforms existing oversampling and undersampling techniques, with an accuracy of 98.52%.
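The abstract does not spell out how KNN-JS works, so the following is only a rough Python sketch of the general idea: interpolate between a minority instance and one of its nearest minority neighbors, then keep the synthetic point only if it is sufficiently Jaccard-similar to the original. The function names, the rounding-based `signature` used to turn numeric features into sets, and the thresholds are all assumptions made for illustration, not the authors' method.

```python
import math
import random

def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def minority_ratio(labels, minority_label):
    """Share of the current window that belongs to the minority class."""
    return sum(1 for y in labels if y == minority_label) / len(labels)

def signature(point, decimals=0):
    """Hypothetical discretization: (feature index, rounded value) pairs,
    used only so that Jaccard similarity is defined on numeric features."""
    return {(i, round(v, decimals)) for i, v in enumerate(point)}

def resample_minority(points, n_new, k=3, min_sim=0.5, seed=0):
    """Generate up to n_new synthetic minority points (needs >= 2 points)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(100 * n_new):  # bounded number of attempts
        if len(synthetic) == n_new:
            break
        base = rng.choice(points)
        # k nearest minority neighbors of the chosen instance (Euclidean)
        neighbors = sorted((p for p in points if p is not base),
                           key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        t = rng.random()
        candidate = tuple(b + t * (n - b) for b, n in zip(base, neighbor))
        # keep the candidate only if it resembles the original instance
        if jaccard(signature(candidate), signature(base)) >= min_sim:
            synthetic.append(candidate)
    return synthetic
```

In a stream setting, `minority_ratio` would presumably be evaluated on the current window to decide whether resampling is needed at all; the cutoff for "imbalanced" is not given in the abstract.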

Mining Massive E-Health Data Streams for IoMT Enabled Healthcare Systems

Sensors ◽

10.3390/s20072131 ◽

2020 ◽

Vol 20 (7) ◽

pp. 2131 ◽

Cited By ~ 3

Author(s):

Affan Ahmed Toor ◽

Muhammad Usman ◽

Farah Younas ◽

Alvis Cheuk M. Fong ◽

Sajid Ali Khan ◽

...

Keyword(s):

Data Streams ◽

Detection Method ◽

Concept Drift ◽

Class Imbalance ◽

Health Data ◽

Smart Devices ◽

Detection Delay ◽

Medical Sensors ◽

Synthetic Datasets ◽

Almost All

With the increasing popularity of the Internet-of-Medical-Things (IoMT) and smart devices, huge volumes of data streams have been generated. This study aims to address concept drift, which is a major challenge in the processing of voluminous data streams. Concept drift refers to a change in data distribution over time. It may occur in the medical domain, for example when medical sensors used for general healthcare or rehabilitation switch their roles to ICU emergency operations when required. Detecting concept drift becomes trickier when the class distributions in the data are skewed, which is often true for e-health data from medical sensors. The Reactive Drift Detection Method (RDDM) is an efficient method for detecting long concepts. However, RDDM has a high error rate, and it does not handle class imbalance. We propose an Enhanced Reactive Drift Detection Method (ERDDM), which systematically generates strategies to handle concept drift with class imbalance in data streams. We conducted experiments to compare ERDDM with three contemporary techniques in terms of prediction error, drift detection delay, latency, and ability to handle data imbalance. The experiments were run in Massive Online Analysis (MOA) on 48 synthetic datasets customized to exhibit the characteristics of data streams. ERDDM can handle abrupt and gradual drifts and performs better than all benchmarks in almost all experiments.
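For orientation, error-rate-based detectors of this family monitor the running error rate p of a classifier and its standard deviation s = sqrt(p(1 - p)/n), and signal drift when p + s rises well above the lowest level seen so far. The Python sketch below implements only that classic DDM-style rule; it is not RDDM or ERDDM, whose additional mechanisms are not described in the abstract. The class name, the `first_drift` helper, and the parameter defaults are illustrative assumptions.

```python
import math

class SimpleDDM:
    """Minimal DDM-style detector over a stream of 0/1 prediction errors."""

    def __init__(self, min_samples=30, warn_level=2.0, drift_level=3.0):
        self.min_samples = min_samples
        self.warn_level = warn_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        """Feed one prediction outcome (True/1 means misclassified)."""
        self.n += 1
        self.errors += int(error)
        p = self.errors / self.n
        s = math.sqrt(p * (1.0 - p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        # remember the lowest p + s observed in the current concept
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()  # start collecting statistics for the new concept
            return "drift"
        if p + s > self.p_min + self.warn_level * self.s_min:
            return "warning"
        return "stable"

def first_drift(error_stream, detector=None):
    """Index of the first sample at which drift is signalled, or None."""
    detector = detector or SimpleDDM()
    for i, err in enumerate(error_stream):
        if detector.update(err) == "drift":
            return i
    return None
```

Because the rule only looks at overall error, a classifier that ignores a rare class can keep p low while the minority concept drifts, which is one reason imbalance makes detection harder.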

A novel concept drift detection method in data streams using ensemble classifiers

Intelligent Data Analysis ◽

10.3233/ida-150207 ◽

2016 ◽

Vol 20 (6) ◽

pp. 1329-1350 ◽

Cited By ~ 8

Author(s):

Mahdie Dehghan ◽

Hamid Beigy ◽

Poorya ZareMoodi

Keyword(s):

Data Streams ◽

Detection Method ◽

Concept Drift ◽

Ensemble Classifiers ◽

Concept Drift Detection ◽

Novel Concept

Cost-Sensitive Classification for Evolving Data Streams with Concept Drift and Class Imbalance

Computational Intelligence and Neuroscience ◽

10.1155/2021/8813806 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Yange Sun ◽

Meng Li ◽

Lei Li ◽

Han Shao ◽

Yi Sun

Keyword(s):

Data Streams ◽

Data Stream ◽

Learning Strategy ◽

Concept Drift ◽

Class Imbalance ◽

Data Preprocessing ◽

Cost Information ◽

Detection Mechanism ◽

Stream Classification ◽

Data Stream Classification

Class imbalance and concept drift are two primary issues that exist concurrently in data stream classification. Although the two issues have each drawn considerable attention separately, their joint treatment remains largely unexplored. Moreover, the class imbalance issue is further complicated when data streams also exhibit concept drift. A novel Cost-Sensitive based Data Stream (CSDS) classification is introduced to overcome the two issues simultaneously. The CSDS considers cost information during the procedures of data preprocessing and classification. During data preprocessing, a cost-sensitive learning strategy is introduced into the ReliefF algorithm to alleviate the class imbalance at the data level. In the classification process, a cost-sensitive weighting scheme is devised to enhance the overall performance of the ensemble. In addition, a change detection mechanism is embedded in our algorithm, which guarantees that the ensemble can capture and react to drift promptly. Experimental results validate that our method obtains better classification results under different imbalanced concept-drifting data stream scenarios.
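The abstract does not give the CSDS cost formulas. As a generic illustration of how cost information can change a decision under imbalance, the Python sketch below applies the standard minimum-expected-cost rule to class probabilities. The function name and the cost matrix values are assumptions for the example only.

```python
def min_expected_cost_class(probs, cost):
    """Pick the class with the lowest expected misclassification cost.

    probs[i]   : estimated probability that the true class is i
    cost[i][j] : cost of predicting class j when the true class is i
    """
    n = len(probs)
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    return min(range(n), key=expected.__getitem__)

# Class 1 is the rare class; missing it is assumed to be 10x as costly.
uniform_cost = [[0, 1], [1, 0]]
skewed_cost = [[0, 1], [10, 0]]
probs = [0.8, 0.2]
print(min_expected_cost_class(probs, uniform_cost))
print(min_expected_cost_class(probs, skewed_cost))
```

With uniform costs the majority class wins; with the skewed matrix the same probabilities lead to predicting the minority class.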

A Survey of Class Imbalance Problem on Evolving Data Stream

Data Preprocessing, Active Learning, and Cost Perceptive Approaches for Resolving Data Imbalance - Advances in Data Mining and Database Management ◽

10.4018/978-1-7998-7371-6.ch002 ◽

2021 ◽

pp. 23-41

Author(s):

D. Himaja ◽

T. Maruthi Padmaja ◽

P. Radha Krishna

Keyword(s):

Change Detection ◽

Data Streams ◽

Data Stream ◽

Concept Drift ◽

Class Imbalance ◽

Detection Methods ◽

Class Imbalance Problem ◽

Imbalance Problem ◽

Learning From Data ◽

Main Emphasis

Learning from data streams with both online class imbalance and concept drift (OCI-CD) is receiving much attention. This combined problem degrades the performance of current models that learn from both stationary and non-stationary environments. In non-stationary environments, the imbalance makes it hard to spot concept drift using conventional drift detection methods, which track change based on the learner's performance. There is limited work on the combined problem of imbalanced evolving streams in both stationary and non-stationary environments. Here the data may arrive with complete labels or with only limited labels. This chapter's main emphasis is to present different methods for resolving class imbalance in evolving streams, covering changing and unchanging environments with full supervision and with limited label availability.

An Improved under Sampling Approaches for Concept Drift and Class Imbalance Data Streams using Improved Cuckoo Search Algorithm

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽

10.17762/turcomat.v12i2.1945 ◽

2021 ◽

Vol 12 (2) ◽

Author(s):

Tirupathi Rao Gullipalli

Keyword(s):

Data Streams ◽

Data Stream ◽

Search Algorithm ◽

Concept Drift ◽

Class Imbalance ◽

Cuckoo Search ◽

Cuckoo Search Algorithm ◽

Evolutionary Search ◽

Real Time Processing ◽

Under Sampling

One of the biggest recent challenges in data stream learning is mitigating concept drift. Overcoming concept drift involves numerous challenges, such as a changing class ratio, huge data volume, and real-time processing for effective knowledge discovery. Evolutionary search techniques are one of the new paradigms for handling the high dimensionality and scalability of data streams. One of the finest and least applied evolutionary search approaches for data streams is the cuckoo search technique. To solve both the concept drift and class imbalance issues simultaneously, this paper proposes an approach using a nature-inspired evolutionary optimization technique known as the Cuckoo Feature and Instance Selection (CFIS) algorithm. The proposed approach is evaluated on a dedicated experimental setup of 15 data streams and compared with two data stream approaches. Moreover, a set of six evaluation criteria is considered to show the overall better performance of the proposed approach in the presence of concept drift and class imbalance.

Handling Concept Drift in Data Stream Classification

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j8857.0881019 ◽

2019 ◽

Vol 8 (10) ◽

pp. 548-550

Keyword(s):

Data Streams ◽

Data Stream ◽

Concept Drift ◽

Research Work ◽

Class Imbalance ◽

Class Imbalance Problem ◽

Stream Classification ◽

Data Stream Classification ◽

Imbalance Problem ◽

Major Factors

Data streams have huge volume and cannot be stored permanently in memory for processing. This paper focuses mainly on issues in data streams, namely the major factors that affect classifier accuracy, such as class imbalance and concept drift. Drift in data stream mining refers to a change in the data. The class imbalance problem means that the numbers of samples in the classes are not equal. In our research work we try to identify the change (drift) in the data and to detect class imbalance and noise in the changed data. According to the type of drift, we apply algorithms to make the stream more balanced and noise-free in order to improve the classifier's accuracy.

Dynamic Weighted Majority for Incremental Learning of Imbalanced Data Streams with Concept Drift

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/333 ◽

2017 ◽

Cited By ~ 7

Author(s):

Yang Lu ◽

Yiu-ming Cheung ◽

Yuan Yan Tang

Keyword(s):

Data Streams ◽

Incremental Learning ◽

High Efficiency ◽

Concept Drift ◽

Computational Cost ◽

Class Imbalance ◽

Real Data ◽

Current Data ◽

Class Imbalance Problem ◽

Weighted Majority

Concept drifts occurring in data streams will jeopardize the accuracy and stability of the online learning process. If the data stream is imbalanced, it will be even more challenging to detect and cure the concept drift. In the literature, these two problems have been intensively addressed separately, but have yet to be well studied when they occur together. In this paper, we propose a chunk-based incremental learning method called Dynamic Weighted Majority for Imbalance Learning (DWMIL) to deal with the data streams with concept drift and class imbalance problem. DWMIL utilizes an ensemble framework by dynamically weighting the base classifiers according to their performance on the current data chunk. Compared with the existing methods, its merits are four-fold: (1) it can keep stable for non-drifted streams and quickly adapt to the new concept; (2) it is totally incremental, i.e. no previous data needs to be stored; (3) it keeps a limited number of classifiers to ensure high efficiency; and (4) it is simple and needs only one thresholding parameter. Experiments on both synthetic and real data sets with concept drift show that DWMIL performs better than the state-of-the-art competitors, with less computational cost.
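A chunk-based weighted ensemble of the kind described can be sketched in a few lines of Python. This is only an illustration of the general pattern (reweight members on each new chunk, prune by a single threshold, add a new member, store no past data). The multiplicative decay by plain chunk error, the trivial `MajorityClassifier` base learner, and all names are assumptions and should not be read as the DWMIL update rule.

```python
from collections import Counter

class MajorityClassifier:
    """Placeholder base learner: always predicts the chunk's majority label."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, x):
        return self.label

class ChunkWeightedEnsemble:
    def __init__(self, threshold=0.1):
        self.threshold = threshold  # the single pruning parameter
        self.members = []           # list of [classifier, weight]

    def predict(self, x):
        votes = Counter()
        for clf, weight in self.members:
            votes[clf.predict(x)] += weight
        return votes.most_common(1)[0][0] if votes else None

    def process_chunk(self, X, y):
        # 1) decay each member's weight by its error on the current chunk
        #    (plain error rate is an assumption; an imbalance-aware score
        #    would be more appropriate for skewed streams)
        for member in self.members:
            clf = member[0]
            err = sum(clf.predict(x) != t for x, t in zip(X, y)) / len(y)
            member[1] *= 1.0 - err
        # 2) drop members whose weight fell below the threshold
        self.members = [m for m in self.members if m[1] >= self.threshold]
        # 3) train a new member on this chunk only; the chunk is then discarded
        self.members.append([MajorityClassifier().fit(X, y), 1.0])
```

Pruning by weight keeps the ensemble bounded after a drift, since members trained on the old concept lose weight quickly and are removed.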

Bhattacharyya Distance based Concept Drift Detection Method For evolving data stream

Expert Systems with Applications ◽

10.1016/j.eswa.2021.115303 ◽

2021 ◽

pp. 115303

Author(s):

Ishwar Baidari ◽

Nagaraj Honnikoll

Keyword(s):

Data Stream ◽

Detection Method ◽

Concept Drift ◽

Bhattacharyya Distance ◽

Concept Drift Detection ◽

Evolving Data

Knowledge Discovery From Evolving Data Streams

Advances in Business Information Systems and Analytics - Machine Learning Techniques for Improved Business Analytics ◽

10.4018/978-1-5225-3534-8.ch002 ◽

2019 ◽

pp. 19-39

Author(s):

Prasanna Lakshmi Kompalli

Keyword(s):

Real Time ◽

Data Streams ◽

Data Stream ◽

Concept Drift ◽

Data Stream Mining ◽

Time Data ◽

Stream Mining ◽

New Challenges ◽

Mining Data Streams ◽

Different Sources

Data coming from different sources is referred to as data streams. Data stream mining is an online learning technique in which each data point must be processed as it arrives and discarded once processing is completed. Progress in technology has made it possible to monitor these data streams in real time. Data streams have created many new challenges for researchers working in real time. The main features of this type of data are that it is fast flowing, large in volume, continuous and growing in nature, and that its characteristics might change over time, which is termed concept drift. This chapter addresses the problems in mining data streams with concept drift. Because of these problems, isolating the correct literature can be a grueling task for researchers and practitioners. This chapter tries to provide a solution by serving as an amalgamation of the techniques used for data stream mining with concept drift.

Microcluster-Based Incremental Ensemble Learning for Noisy, Nonstationary Data Streams

Complexity ◽

10.1155/2020/6147378 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Sanmin Liu ◽

Shan Xue ◽

Fanzhen Liu ◽

Jieren Cheng ◽

Xiulai Li ◽

...

Keyword(s):

Ensemble Learning ◽

Data Streams ◽

Data Stream ◽

Concept Drift ◽

Majority Vote ◽

Stream Classification ◽

Model Stability ◽

Data Stream Classification ◽

Nonstationary Data ◽

Synthetic Datasets

Data stream classification has become a promising prediction task with relevance to many practical environments. However, in the presence of concept drift and noise, research on data stream classification faces many challenges. Hence, a new incremental ensemble model is presented for classifying nonstationary data streams with noise. Our approach integrates three strategies: incremental learning to monitor and adapt to concept drift; ensemble learning to improve model stability; and a microclustering procedure that distinguishes drift from noise and predicts the labels of incoming instances via majority vote. Experiments with two synthetic datasets designed to test for both gradual and abrupt drift show that our method provides more accurate classification in nonstationary data streams with noise than the two popular baselines.
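The label-prediction step mentioned above can be illustrated with a small Python sketch in which the k nearest microclusters vote on the label of an incoming instance. The representation of a microcluster as a `(centroid, label)` pair, the Euclidean distance, and the choice of k are assumptions for the example; the paper's actual microcluster structure and its drift-versus-noise test are not given in the abstract.

```python
import math
from collections import Counter

def predict_by_microclusters(x, microclusters, k=3):
    """Majority vote among the k microclusters whose centroids are nearest to x.

    microclusters: list of (centroid, label) pairs (hypothetical layout).
    """
    nearest = sorted(microclusters, key=lambda mc: math.dist(x, mc[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

clusters = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"),
            ((1.0, 0.0), "b"), ((5.0, 5.0), "b")]
print(predict_by_microclusters((0.1, 0.1), clusters))
```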