Burst Detection-based Selective Classifier Resetting

Concept drift detection algorithms have historically forced a reset of the base classifier on every detected drift. This approach prevents the underlying classifier from becoming outdated as the distribution of a data stream shifts from one concept to another. When both concept drift and temporal dependence are present in a data stream, however, forced resetting can complicate classifier evaluation: resetting the base classifier too frequently can make classifier performance appear better than it really is. This research presents Burst Detection-based Selective Classifier Resetting (BD-SCR), a novel architectural method for deciding when to reset the base classifier. BD-SCR statistically monitors changes in the temporal dependence of a data stream to determine whether a base classifier should be reset for a detected drift. The experiments compare the predictive performance of state-of-the-art drift detectors with that of the “No-Change” detector when BD-SCR informs and controls the resetting decision. Results show that BD-SCR reduces the negative impact of temporal dependence during concept drift detection, as shown by a clear reduction in the performance of the “No-Change” detector, while maintaining the predictive performance of state-of-the-art drift detection methods.
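
The effect of temporal dependence described in this abstract can be illustrated with a small sketch. This is not BD-SCR itself: it only shows why a baseline that repeats the previous label can look accurate on a temporally dependent stream. The function names, the `stay_prob` parameter, and the synthetic stream are assumptions made for illustration.

```python
import random

def make_sticky_stream(n, stay_prob, seed=0):
    """Synthetic binary label stream (hypothetical): each label repeats
    the previous one with probability stay_prob, otherwise it flips."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1)]
    for _ in range(n - 1):
        prev = labels[-1]
        labels.append(prev if rng.random() < stay_prob else 1 - prev)
    return labels

def no_change_accuracy(labels):
    """Accuracy of a baseline that predicts each label to be the same
    as the label observed immediately before it."""
    hits = sum(1 for prev, cur in zip(labels, labels[1:]) if prev == cur)
    return hits / (len(labels) - 1)

if __name__ == "__main__":
    dependent = make_sticky_stream(5000, stay_prob=0.9)
    independent = make_sticky_stream(5000, stay_prob=0.5)
    print("temporally dependent stream:", no_change_accuracy(dependent))
    print("independent stream:", no_change_accuracy(independent))
```

On the dependent stream the baseline scores close to `stay_prob` without learning anything about the features, which is the kind of misleading result the abstract warns about.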

Request-and-Reverify: Hierarchical Hypothesis Testing for Concept Drift Detection with Expensive Labels

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/421 ◽

2018 ◽

Cited By ~ 2

Author(s):

Shujian Yu ◽

Xiaoyang Wang ◽

José C. Príncipe

Keyword(s):

Hypothesis Testing ◽

Goodness Of Fit ◽

Concept Drift ◽

Predictive Performance ◽

Detection Methods ◽

The Novel ◽

Testing Framework ◽

Streaming Applications ◽

Benchmark Datasets ◽

Concept Drift Detection

One important assumption underlying common classification models is the stationarity of the data. However, in real-world streaming applications, the data concept, indicated by the joint distribution of features and labels, is not stationary but drifts over time. Concept drift detection aims to detect such drifts and adapt the model so as to mitigate any deterioration in the model's predictive performance. Unfortunately, most existing concept drift detection methods rely on a strong and over-optimistic condition that the true labels are available immediately for all already classified instances. In this paper, a novel Hierarchical Hypothesis Testing framework with a Request-and-Reverify strategy is developed to detect concept drifts by requesting labels only when necessary. Two methods, namely Hierarchical Hypothesis Testing with Classification Uncertainty (HHT-CU) and Hierarchical Hypothesis Testing with Attribute-wise "Goodness-of-fit" (HHT-AG), are proposed under this framework. In experiments with benchmark datasets, our methods demonstrate overwhelming advantages over state-of-the-art unsupervised drift detectors. More importantly, our methods even outperform DDM (the widely used supervised drift detector) while using significantly fewer labels.
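
For context on the supervised baseline named above, the following is a minimal sketch of a DDM-style detector. It assumes the commonly described rule: track the running error rate p and its standard deviation s, remember the minimum of p + s, and signal a warning or a drift when p + s exceeds the minimum by 2 or 3 standard deviations. The class name, parameter names, and defaults are illustrative choices, and the sketch is not the HHT framework proposed in the paper.

```python
import math

class SimpleDDM:
    """DDM-style monitor over a stream of 0/1 prediction errors (sketch)."""

    def __init__(self, min_samples=30, warn_k=2.0, drift_k=3.0):
        self.min_samples = min_samples
        self.warn_k = warn_k
        self.drift_k = drift_k
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        """error is 1 for a misclassified instance, 0 otherwise.
        Returns 'stable', 'warning', or 'drift'."""
        self.n += 1
        self.errors += error
        p = self.errors / self.n
        s = math.sqrt(p * (1.0 - p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        # Remember the lowest p + s seen since the last reset.
        if p + s <= self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_k * self.s_min:
            self.reset()
            return "drift"
        if p + s > self.p_min + self.warn_k * self.s_min:
            return "warning"
        return "stable"

def first_drift_index(error_stream):
    """Index of the first instance at which a drift is signalled, or None."""
    detector = SimpleDDM()
    for i, err in enumerate(error_stream):
        if detector.update(err) == "drift":
            return i
    return None
```

Note that every call to `update` needs the true label of the instance to compute `error`, which is exactly the labelling cost the Request-and-Reverify strategy is designed to reduce.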

Bhattacharyya Distance based Concept Drift Detection Method For evolving data stream

Expert Systems with Applications ◽

10.1016/j.eswa.2021.115303 ◽

2021 ◽

pp. 115303

Author(s):

Ishwar Baidari ◽

Nagaraj Honnikoll

Keyword(s):

Data Stream ◽

Detection Method ◽

Concept Drift ◽

Bhattacharyya Distance ◽

Concept Drift Detection ◽

Evolving Data

Heuristic ensemble for unsupervised detection of multiple types of concept drift in data stream classification

Intelligent Decision Technologies ◽

10.3233/idt-210115 ◽

2021 ◽

pp. 1-14

Author(s):

Hanqing Hu ◽

Mehmed Kantardzic

Keyword(s):

Data Stream ◽

Concept Drift ◽

False Alarms ◽

Detection Accuracy ◽

Real World Data ◽

Traditional Concept ◽

Stream Classification ◽

Data Stream Classification ◽

Detection Algorithms ◽

Concept Drift Detection

Real-world data stream classification often deals with multiple types of concept drift, categorized by change characteristics such as speed, distribution, and severity. When labels are unavailable, traditional concept drift detection algorithms, used in stream classification frameworks, are often focused on only one type of concept drift. To overcome the limitations of traditional detection algorithms, this study proposed a Heuristic Ensemble Framework for Drift Detection (HEFDD). HEFDD aims to detect all types of concept drift by employing an ensemble of selected concept drift detection algorithms, each capable of detecting at least one type of concept drift. Experimental results show HEFDD provides significant improvement based on the z-score test when comparing detection accuracy with state-of-the-art individual algorithms. At the same time, HEFDD is able to reduce false alarms generated by individual concept drift detection algorithms.
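
The general idea of combining several detectors to suppress isolated false alarms can be sketched with a simple voting rule. This is an assumption-laden illustration, not the HEFDD heuristic, whose details are not given in the abstract; the function names and the `min_votes` threshold are hypothetical.

```python
from typing import List, Sequence

def ensemble_alarm(flags: Sequence[bool], min_votes: int = 2) -> bool:
    """Raise an ensemble alarm only if at least min_votes detectors
    flag drift at the same time step."""
    return sum(bool(f) for f in flags) >= min_votes

def ensemble_alarms(per_detector_flags: Sequence[Sequence[bool]],
                    min_votes: int = 2) -> List[bool]:
    """per_detector_flags holds one equal-length sequence of booleans
    per detector; the result has one combined alarm per time step."""
    return [ensemble_alarm(step, min_votes)
            for step in zip(*per_detector_flags)]

if __name__ == "__main__":
    detector_a = [False, True, False, True]
    detector_b = [False, False, False, True]
    detector_c = [True, False, False, True]
    print(ensemble_alarms([detector_a, detector_b, detector_c]))
```

In this toy run the alarms raised by a single detector at the first two steps are discarded, and only the step where the detectors agree produces an ensemble alarm.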

Applying Fourier Inspired Windows for Concept Drift Detection in Data Stream

2020 IEEE Calcutta Conference (CALCON) ◽

10.1109/calcon49167.2020.9106537 ◽

2020 ◽

Author(s):

Sumit Misra ◽

Dipan Biswas ◽

Sanjoy Kumar Saha ◽

Chandan Mazumdar

Keyword(s):

Data Stream ◽

Concept Drift ◽

Concept Drift Detection

Data stream mining: methods and challenges for handling concept drift

SN Applied Sciences ◽

10.1007/s42452-019-1433-0 ◽

2019 ◽

Vol 1 (11) ◽

Cited By ~ 5

Author(s):

Scott Wares ◽

John Isaacs ◽

Eyad Elyan

Keyword(s):

Data Stream ◽

Concept Drift ◽

Relevant Literature ◽

Streaming Data ◽

Future Research ◽

Stream Mining ◽

Detection Algorithms ◽

The Past ◽

Concept Drift Detection ◽

The Impact

Mining and analysing streaming data is crucial for many applications, and this area of research has gained extensive attention over the past decade. However, there are several inherent problems that continue to challenge the hardware and the state-of-the-art algorithmic solutions. Examples of such problems include the unbounded size, varying speed and unknown data characteristics of arriving instances from a data stream. The aim of this research is to portray key challenges faced by algorithmic solutions for stream mining, particularly focusing on the prevalent issue of concept drift. A comprehensive discussion of concept drift and its inherent data challenges in the context of stream mining is presented, as is a critical, in-depth review of relevant literature. Current issues with the evaluative procedure for concept drift detectors are also explored, highlighting problems such as a lack of established base datasets and the impact of temporal dependence on concept drift detection. By exposing gaps in the current literature, this study suggests recommendations for future research which should aid in the progression of stream mining and concept drift detection algorithms.

CD2A: Concept Drift Detection Approach Toward Imbalanced Data Stream

Lecture Notes in Electrical Engineering - Emerging Research in Electronics, Computer Science and Technology ◽

10.1007/978-981-13-5802-9_54 ◽

2019 ◽

pp. 597-612 ◽

Cited By ~ 2

Author(s):

Mohammed Ahmed Ali Abdualrhman ◽

M. C. Padma

Keyword(s):

Data Stream ◽

Concept Drift ◽

Imbalanced Data ◽

Detection Approach ◽

Concept Drift Detection

Concept Drift Detection in Data Stream Clustering and its Application on Weather Data

International Journal of Agricultural and Environmental Information Systems ◽

10.4018/ijaeis.2020010104 ◽

2020 ◽

Vol 11 (1) ◽

pp. 67-85 ◽

Cited By ~ 1

Author(s):

Namitha K. ◽

Santhosh Kumar G.

Keyword(s):

Data Streams ◽

Data Stream ◽

Weather Forecasting ◽

Concept Drift ◽

Clustering Algorithms ◽

Weather Data ◽

Stream Clustering ◽

Cluster Evolution ◽

Data Stream Clustering ◽

Concept Drift Detection

This article presents a stream mining framework to cluster the data stream and monitor its evolution. Even though concept drift is expected to be present in data streams, explicit drift detection is rarely done in stream clustering algorithms. The proposed framework is capable of explicit concept drift detection and cluster evolution analysis. Concept drift is caused by changes in the data distribution over time. The relationship between concept drift and the occurrence of physical events has been studied by applying the framework to a weather data stream. Experiments led to the conclusion that concept drift accompanied by a change in the number of clusters indicates a significant weather event. This kind of online monitoring and its results can be utilized in weather forecasting systems in various ways. Weather data streams produced by automatic weather stations (AWS) are used to conduct this study.

ElStream: An Ensemble Learning Approach for Concept Drift Detection in Dynamic Social Big Data Stream Learning

IEEE Access ◽

10.1109/access.2021.3076264 ◽

2021 ◽

pp. 1-1

Author(s):

Ahmed Abbasi ◽

Abdul Rehman Javed ◽

Chinmay Chakraborty ◽

Jamel Nebhen ◽

Wisha Zehra ◽

...

Keyword(s):

Big Data ◽

Ensemble Learning ◽

Data Stream ◽

Concept Drift ◽

Learning Approach ◽

Social Big Data ◽

Concept Drift Detection

Concept Drift Detection on Data Stream for Revising DBSCAN Cluster

Proceedings of the 10th International Conference on Web Intelligence, Mining and Semantics ◽

10.1145/3405962.3405990 ◽

2020 ◽

Author(s):

Yasushi Miyata ◽

Hiroshi Ishikawa

Keyword(s):

Data Stream ◽

Concept Drift ◽

Concept Drift Detection

Classification of Imbalanced Data Stream: Techniques and Challenges

Transactions on Machine Learning and Artificial Intelligence ◽

10.14738/tmlai.92.9964 ◽

2021 ◽

Vol 9 (2) ◽

pp. 36-52

Author(s):

Mashaal A. Alfhaid ◽

Manal Abdullah

Keyword(s):

Data Mining ◽

Data Stream ◽

Concept Drift ◽

Class Imbalance ◽

Imbalanced Data ◽

Predictive Performance ◽

Knowledge Extraction ◽

Streaming Data ◽

Stream Data ◽

Stream Data Mining

As the volume of generated data increases every day, data mining and knowledge extraction have grown in importance. In traditional data mining, knowledge can be extracted offline. Stream data mining is different: data arrive continuously and can be processed only in a single scan, and concept drift may appear. Because the pre-processing stage is critical in knowledge extraction, imbalanced stream data have gained significant attention among researchers in the last few years. Many real-world applications suffer from class imbalance, including medical, business, and fraud detection applications. Supervised learning involves classes, whether binary or multi-class. These classes are often imbalanced, divided into a majority (negative) class and a minority (positive) class, which can bias the model toward the majority class and skew its predictive performance. Handling imbalanced streaming data is necessary for more accurate and reliable learning models. This paper presents an overview of data stream mining and its tools, summarizes the problem of class imbalance and the different approaches to it, and presents popular evaluation metrics and the challenges that arise from imbalanced streaming data.
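
The bias toward the majority class mentioned above can be made concrete with a small sketch that compares plain accuracy with the geometric mean of sensitivity and specificity (G-mean). The choice of G-mean as the example metric is an assumption; the abstract does not say which evaluation metrics the paper covers, and the function names are illustrative.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def accuracy_and_gmean(y_true, y_pred, positive=1):
    """Accuracy and G-mean = sqrt(sensitivity * specificity)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return accuracy, math.sqrt(sensitivity * specificity)

if __name__ == "__main__":
    # 95 majority (negative) and 5 minority (positive) instances,
    # scored against a model that always predicts the majority class.
    y_true = [0] * 95 + [1] * 5
    y_pred = [0] * 100
    print(accuracy_and_gmean(y_true, y_pred))
```

A model that never predicts the minority class reaches 95% accuracy here while its G-mean is zero, which is why accuracy alone is a poor guide on imbalanced streams.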
