CDDM: Concept Drift Detection Model for Data Stream

<p>A data stream is the large volume of data generated in various fields, including financial processes, social media activity, Internet of Things applications, and many others. Such data cannot be processed by traditional data mining algorithms because of several constraints, including limited memory, data speed, and a dynamic environment. Concept drift is regarded as the main constraint in data stream mining, particularly in the classification task. It refers to a change in the underlying distribution of the data stream over time, which degrades the accuracy of classification models and leads to wrong predictions. Spam email, changes in consumer behavior, and adversarial activities are examples of concept drift. This paper introduces the Concept Drift Detection Model (CDDM), which monitors the accuracy of the classification model over a sliding window on the assumption that a decline in accuracy indicates a drift. A weighted variant of CDDM, W-CDDM, is also proposed.</p><p>Both models were evaluated on two real datasets and four artificial datasets. For abrupt drift, the experimental results show that CDDM and W-CDDM outperform the other models on the datasets of 100K and 1M instances, respectively. For gradual drift, W-CDDM surpassed the other models in accuracy, run time, and detection delay on the dataset of 100K instances, while on the dataset of 1M instances CDDM achieved the highest accuracy using the NB classifier. Moreover, W-CDDM achieves the highest accuracy on the real datasets.</p>
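The idea of watching classifier accuracy over a sliding window can be illustrated with a minimal sketch. This is not the authors' CDDM or W-CDDM: the class name, the window size, the drop threshold, and the reset rule are all assumptions chosen only to show the general mechanism on a synthetic stream of correct/incorrect outcomes.

```python
from collections import deque


class SlidingWindowAccuracyMonitor:
    """Hypothetical detector: flags drift when windowed accuracy
    falls well below the best windowed accuracy seen so far."""

    def __init__(self, window_size=100, drop_threshold=0.15):
        self.window = deque(maxlen=window_size)  # most recent outcomes (1 = correct)
        self.drop_threshold = drop_threshold     # assumed tolerance, not from the paper
        self.best_accuracy = 0.0

    def update(self, correct):
        """Record one prediction outcome; return True if drift is signalled."""
        self.window.append(1 if correct else 0)
        if len(self.window) < self.window.maxlen:
            return False  # wait until the window is full
        accuracy = sum(self.window) / len(self.window)
        self.best_accuracy = max(self.best_accuracy, accuracy)
        if self.best_accuracy - accuracy > self.drop_threshold:
            # Assumed reaction: forget the old concept and start over.
            self.window.clear()
            self.best_accuracy = 0.0
            return True
        return False


# Synthetic outcomes: 90% correct for 300 instances, then 50% correct.
outcomes = [i % 10 != 0 for i in range(300)] + [i % 2 == 0 for i in range(300, 600)]

monitor = SlidingWindowAccuracyMonitor()
detections = [i for i, ok in enumerate(outcomes) if monitor.update(ok)]
print(detections)
```

On this synthetic stream the monitor stays silent during the first concept and raises a single alarm some instances after position 300; the gap between the change point and the alarm is the detection delay that the abstract reports as one of its evaluation criteria.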


Bhattacharyya Distance based Concept Drift Detection Method For evolving data stream

Expert Systems with Applications ◽

10.1016/j.eswa.2021.115303 ◽

2021 ◽

pp. 115303

Author(s):

Ishwar Baidari ◽

Nagaraj Honnikoll

Keyword(s):

Data Stream ◽

Detection Method ◽

Concept Drift ◽

Bhattacharyya Distance ◽

Concept Drift Detection ◽

Evolving Data


Heuristic ensemble for unsupervised detection of multiple types of concept drift in data stream classification

Intelligent Decision Technologies ◽

10.3233/idt-210115 ◽

2021 ◽

pp. 1-14

Author(s):

Hanqing Hu ◽

Mehmed Kantardzic

Keyword(s):

Data Stream ◽

Concept Drift ◽

False Alarms ◽

Detection Accuracy ◽

Real World Data ◽

Traditional Concept ◽

Stream Classification ◽

Data Stream Classification ◽

Detection Algorithms ◽

Concept Drift Detection

Real-world data stream classification often deals with multiple types of concept drift, categorized by change characteristics such as speed, distribution, and severity. When labels are unavailable, traditional concept drift detection algorithms, used in stream classification frameworks, are often focused on only one type of concept drift. To overcome the limitations of traditional detection algorithms, this study proposed a Heuristic Ensemble Framework for Drift Detection (HEFDD). HEFDD aims to detect all types of concept drift by employing an ensemble of selected concept drift detection algorithms, each capable of detecting at least one type of concept drift. Experimental results show HEFDD provides significant improvement based on the z-score test when comparing detection accuracy with state-of-the-art individual algorithms. At the same time, HEFDD is able to reduce false alarms generated by individual concept drift detection algorithms.
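The abstract does not say how HEFDD combines its member detectors, so the following is only a sketch of the general ensemble idea under stated assumptions: two hypothetical label-free detectors, each sensitive to one kind of change, are combined by a simple vote count. The detector definitions, thresholds, and the `min_votes` rule are illustrative and are not taken from the paper.

```python
from typing import Callable, List, Sequence

Detector = Callable[[Sequence[float]], bool]


def _halves(window):
    """Split a window into its older and newer halves."""
    half = len(window) // 2
    return window[:half], window[half:]


def mean_shift_detector(threshold: float) -> Detector:
    """Hypothetical detector: fires when the mean moves between halves."""
    def detect(window):
        old, new = _halves(window)
        return abs(sum(new) / len(new) - sum(old) / len(old)) > threshold
    return detect


def spread_shift_detector(threshold: float) -> Detector:
    """Hypothetical detector: fires when the value range changes between halves."""
    def detect(window):
        old, new = _halves(window)
        return abs((max(new) - min(new)) - (max(old) - min(old))) > threshold
    return detect


def ensemble_detect(detectors: List[Detector], window, min_votes: int = 1) -> bool:
    """Signal drift when at least `min_votes` member detectors fire."""
    votes = sum(1 for detect in detectors if detect(window))
    return votes >= min_votes


detectors = [mean_shift_detector(1.0), spread_shift_detector(1.0)]

stable = [0.0, 1.0] * 10                     # no change
shifted = [0.0, 1.0] * 5 + [5.0, 6.0] * 5    # mean changes, spread does not
widened = [0.0, 1.0] * 5 + [-2.0, 3.0] * 5   # spread changes, mean does not

for name, window in [("stable", stable), ("shifted", shifted), ("widened", widened)]:
    print(name, ensemble_detect(detectors, window, min_votes=1))
```

With `min_votes=1` the ensemble covers both kinds of change, which neither member does alone. Raising `min_votes` trades coverage for fewer alarms, which is one simple way an ensemble could suppress false alarms from individual detectors.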


Applying Fourier Inspired Windows for Concept Drift Detection in Data Stream

2020 IEEE Calcutta Conference (CALCON) ◽

10.1109/calcon49167.2020.9106537 ◽

2020 ◽

Author(s):

Sumit Misra ◽

Dipan Biswas ◽

Sanjoy Kumar Saha ◽

Chandan Mazumdar

Keyword(s):

Data Stream ◽

Concept Drift ◽

Concept Drift Detection


PSO Optimized Nearest Neighbor Algorithm

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b3574.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 1508-1513

Keyword(s):

Data Mining ◽

Nearest Neighbor ◽

Heuristic Algorithms ◽

Optimization Methods ◽

Classification Model ◽

Natural Phenomenon ◽

Wide Applicability ◽

Data Mining Algorithms ◽

Wide Range ◽

Mining Algorithms

Data mining can be considered an important aspect of the information industry. It has found wide applicability in almost every field that deals with data. Of the various techniques employed for data mining, classification is a very commonly used tool for knowledge discovery. Various alternative methods are available for creating a classification model, of which the most common and most easily understood is kNN. Although kNN has a number of shortcomings and limitations, these can be overcome through alterations to the basic kNN algorithm. Owing to its wide applicability, kNN has been the focus of extensive research, and as a result many variants have been proposed, with varying degrees of success in improving performance. A major difficulty facing data mining applications is the large number of dimensions, which renders most data mining algorithms inefficient. The problem can be solved to some extent by dimensionality reduction methods such as PCA. Further improvements in the efficiency of classification-based mining algorithms can be achieved with optimization methods. Meta-heuristic algorithms inspired by natural phenomena, such as particle swarm optimization, can be used very effectively for this purpose.
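For readers unfamiliar with particle swarm optimization, a basic global-best PSO can be sketched as follows. This is a generic textbook-style variant, not the method of the paper: the function names, the coefficient values, and the sphere objective are assumptions. In a kNN setting the objective would instead be something like validation error as a function of feature weights, which the abstract does not specify.

```python
import random


def pso_minimize(objective, dim, bounds, n_particles=30, n_iters=200, seed=0):
    """Minimize `objective` over a box with a basic global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    w, c1, c2 = 0.729, 1.494, 1.494  # inertia, cognitive, and social coefficients (assumed)

    positions = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_best_val = [objective(p) for p in positions]
    g = min(range(n_particles), key=lambda i: personal_best_val[i])
    global_best, global_best_val = personal_best[g][:], personal_best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull each particle toward its own best and the swarm's best.
                velocities[i][d] = (
                    w * velocities[i][d]
                    + c1 * r1 * (personal_best[i][d] - positions[i][d])
                    + c2 * r2 * (global_best[d] - positions[i][d])
                )
                # Move, then clamp to the search box.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            value = objective(positions[i])
            if value < personal_best_val[i]:
                personal_best[i], personal_best_val[i] = positions[i][:], value
                if value < global_best_val:
                    global_best, global_best_val = positions[i][:], value
    return global_best, global_best_val


def sphere(x):
    """Stand-in objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_val = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
print(best_x, best_val)
```

The objective is passed in as a plain function, so swapping the stand-in for a classifier's error measure requires no change to the optimizer itself.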


Data stream mining: methods and challenges for handling concept drift

SN Applied Sciences ◽

10.1007/s42452-019-1433-0 ◽

2019 ◽

Vol 1 (11) ◽

Cited By ~ 5

Author(s):

Scott Wares ◽

John Isaacs ◽

Eyad Elyan

Keyword(s):

Data Stream ◽

Concept Drift ◽

Relevant Literature ◽

Streaming Data ◽

Future Research ◽

Stream Mining ◽

Detection Algorithms ◽

The Past ◽

Concept Drift Detection ◽

The Impact

Mining and analysing streaming data is crucial for many applications, and this area of research has gained extensive attention over the past decade. However, there are several inherent problems that continue to challenge the hardware and the state-of-the-art algorithmic solutions. Examples of such problems include the unbounded size, varying speed, and unknown data characteristics of instances arriving from a data stream. The aim of this research is to portray key challenges faced by algorithmic solutions for stream mining, focusing in particular on the prevalent issue of concept drift. A comprehensive discussion of concept drift and its inherent data challenges in the context of stream mining is presented, as is a critical, in-depth review of relevant literature. Current issues with the evaluative procedure for concept drift detectors are also explored, highlighting problems such as a lack of established base datasets and the impact of temporal dependence on concept drift detection. By exposing gaps in the current literature, this study offers recommendations for future research that should aid the progression of stream mining and concept drift detection algorithms.


CD2A: Concept Drift Detection Approach Toward Imbalanced Data Stream

Lecture Notes in Electrical Engineering - Emerging Research in Electronics, Computer Science and Technology ◽

10.1007/978-981-13-5802-9_54 ◽

2019 ◽

pp. 597-612 ◽

Cited By ~ 2

Author(s):

Mohammed Ahmed Ali Abdualrhman ◽

M. C. Padma

Keyword(s):

Data Stream ◽

Concept Drift ◽

Imbalanced Data ◽

Detection Approach ◽

Concept Drift Detection


Concept Drift Detection in Data Stream Clustering and its Application on Weather Data

International Journal of Agricultural and Environmental Information Systems ◽

10.4018/ijaeis.2020010104 ◽

2020 ◽

Vol 11 (1) ◽

pp. 67-85 ◽

Cited By ~ 1

Author(s):

Namitha K. ◽

Santhosh Kumar G.

Keyword(s):

Data Streams ◽

Data Stream ◽

Weather Forecasting ◽

Concept Drift ◽

Clustering Algorithms ◽

Weather Data ◽

Stream Clustering ◽

Cluster Evolution ◽

Data Stream Clustering ◽

Concept Drift Detection

This article presents a stream mining framework to cluster the data stream and monitor its evolution. Even though concept drift is expected to be present in data streams, explicit drift detection is rarely done in stream clustering algorithms. The proposed framework is capable of explicit concept drift detection and cluster evolution analysis. Concept drift is caused by changes in the data distribution over time. The relationship between concept drift and the occurrence of physical events has been studied by applying the framework to a weather data stream. Experiments led to the conclusion that concept drift accompanied by a change in the number of clusters indicates a significant weather event. This kind of online monitoring and its results can be utilized in weather forecasting systems in various ways. Weather data streams produced by automatic weather stations (AWS) are used to conduct this study.


Novel Class Detection with Concept Drift in Data Stream - AhtNODE

International Journal of Distributed Systems and Technologies ◽

10.4018/ijdst.2020010102 ◽

2020 ◽

Vol 11 (1) ◽

pp. 15-26

Author(s):

Jay Gandhi ◽

Vaibhav Gandhi

Keyword(s):

Data Stream ◽

Concept Drift ◽

Ensemble Classifier ◽

Streaming Data ◽

Classification Model ◽

Infinite Length ◽

The Novel ◽

Stream Data ◽

Hoeffding Tree ◽

Discovery Method

Data stream mining has become an interesting research topic and is attracting growing interest as a data discovery method. Several applications rely on stream data processing, such as device networks and electronic networks. Our approach, AhtNODE (Adaptive Hoeffding Tree based NOvel class DEtection), detects novel classes in the presence of concept drift in streaming data. It addresses three challenges of streaming data: infinite length, concept drift, and concept evolution. The approach automatically detects a novel class whenever it arrives in the data stream. It is a multi-class approach that distinguishes the novel class from existing classes. The Adaptive Hoeffding Tree (AHT) is applied as the classification model and also handles concept drift. Previous approaches used an ensemble model to handle concept drift. In AHT, classification is done in a single pass. The experimental results demonstrate the effectiveness of AhtNODE compared with existing ensemble classifiers in terms of classification accuracy, speed, and memory use.
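Hoeffding trees, on which the Adaptive Hoeffding Tree is built, decide when a leaf has seen enough instances to split by using the Hoeffding bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)). The sketch below shows only that standard bound and the usual split test. It is not AhtNODE itself, and the function names and example numbers are assumptions for illustration.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Return epsilon such that, with probability 1 - delta, the true mean of a
    variable with range `value_range` lies within epsilon of the mean of n samples."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split when the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Illustrative values: the same gain gap becomes decisive as more instances arrive.
for n in (1000, 10000):
    print(n, hoeffding_bound(1.0, 1e-7, n), should_split(0.30, 0.25, 1.0, 1e-7, n))
```

Because epsilon shrinks as n grows, the tree can commit to a split after a single pass over a finite prefix of the stream, which is what makes this family of learners suitable for streams of unbounded length.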
