Decision Tree Classification Algorithm within Concept Similarity

Applied Mechanics and Materials, 2012, Vol 235, pp. 9-14. DOI: 10.4028/www.scientific.net/amm.235.9

Author(s): Chun Hua Ju, Li Li Mao

Keyword(s): Data Streams, Data Stream, Concept Drift, Classification Algorithm, Streaming Data, Decision Tree Classification, The Cost, Prediction Efficiency, Concept Similarity

Data stream mining has been applied in many domains, but concept drift in data streams poses a major obstacle to mining them. Research on classification algorithms for streaming data with concept drift has achieved many successes, yet it pays little attention to the recurrence of data streams, namely the situation in which a historical concept reappears. To exploit this characteristic, this paper proposes calculating concept similarity and reusing the classifier model of the historical concept, or of a highly similar concept, to classify and predict. In this way, no further training is needed. The approach also reduces the cost of updating the model, speeds up classification, and improves prediction efficiency.
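The reuse idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how concept similarity is computed, so everything here is an assumption made for illustration: a concept is summarized by the mean feature vector of a batch, similarity is cosine similarity, and the names `concept_signature`, `pick_stored_model`, and the 0.95 threshold are hypothetical.

```python
import math

def concept_signature(batch):
    """Mean feature vector of a batch (an assumed stand-in for a concept description)."""
    n = len(batch)
    dims = len(batch[0])
    return [sum(row[d] for row in batch) / n for d in range(dims)]

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def pick_stored_model(batch, repository, threshold=0.95):
    """Return the name of the most similar stored concept, or None if none reaches the threshold.

    repository maps a model name to the signature of the concept it was trained on.
    """
    signature = concept_signature(batch)
    best_name, best_score = None, threshold
    for name, stored_signature in repository.items():
        score = cosine_similarity(signature, stored_signature)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

# Hypothetical repository of two previously seen concepts.
repository = {"concept_a": [1.0, 0.0], "concept_b": [0.0, 1.0]}
recurring_batch = [[0.9, 0.1], [1.1, 0.0]]   # resembles concept_a
novel_batch = [[1.0, 1.0], [0.9, 1.1]]       # resembles neither closely

reused = pick_stored_model(recurring_batch, repository)
unmatched = pick_stored_model(novel_batch, repository)
```

When a stored model is returned, its classifier could be applied to the incoming batch without retraining; when `None` is returned, a new model would have to be trained and added to the repository.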


DETECTION AND CLASSIFICATION OF CHANGES IN EVOLVING DATA STREAMS

International Journal of Information Technology & Decision Making, 2006, Vol 05 (04), pp. 659-670. DOI: 10.1142/s0219622006002179. Cited By ~ 26

Author(s): MOHAMED MEDHAT GABER, PHILIP S. YU

Keyword(s): Data Streams, Data Stream, Weather Conditions, High Volume, Streaming Data, Wide Range, Change Characteristics, History Of, Scientific Phenomena

Data stream mining has attracted considerable attention over the past few years owing to the significance of its applications. Streaming data often evolves over time. Capturing these changes can be used to detect an event or a phenomenon in a wide range of applications, including weather conditions, economic changes, and astronomical and scientific phenomena. Because of the high volume and speed of data streams, it is computationally hard to capture these changes from raw data in real time. In this paper, we propose a novel algorithm, termed STREAM-DETECT, that captures these changes in data stream distribution and/or domain using clustering result deviation. STREAM-DETECT is followed by an offline classification process, CHANGE-CLASS, which associates the history of change characteristics with the observed event or phenomenon. Experimental results show the efficiency of the proposed framework in both change detection and classification accuracy.


Classification of the drifting data streams using heterogeneous diversified dynamic class-weighted ensemble

PeerJ Computer Science, 2021, Vol 7, pp. e459. DOI: 10.7717/peerj-cs.459

Author(s): Martin Sarnovsky, Michal Kolarik

Keyword(s): Data Streams, Concept Drift, Ensemble Methods, Predictive Performance, Streaming Data, Underlying Structure, Adaptive Models, Resource Requirements, Continuous Stream

Data streams can be defined as a continuous stream of data coming from different sources and in different forms. Streams are often very dynamic, and their underlying structure usually changes over time, which may result in a phenomenon called concept drift. When solving predictive problems on streaming data, traditional machine learning models trained on historical data may become invalid when such changes occur. Adaptive models equipped with mechanisms to reflect changes in the data have proved suitable for handling drifting streams. Adaptive ensemble models represent a popular group of these methods used in the classification of drifting data streams. In this paper, we present a heterogeneous adaptive ensemble model for data stream classification, which uses a dynamic class weighting scheme and a mechanism to maintain the diversity of the ensemble members. Our main objective was to design a model consisting of a heterogeneous group of base learners (Naive Bayes, k-NN, decision trees) with an adaptive mechanism that, besides the performance of the members, also takes into account the diversity of the ensemble. The model was experimentally evaluated on both real-world and synthetic datasets. We compared the presented model with other existing adaptive ensemble methods in terms of both predictive performance and computational resource requirements.
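As a rough illustration of class-weighted voting in an ensemble, the sketch below combines the predictions of several members using one weight per member and class, then adjusts those weights after the true label arrives. The abstract does not give the actual weighting or diversity rules, so the additive update, the `step` size, and the function names are assumptions made only for this example.

```python
from collections import defaultdict

def weighted_vote(predictions, class_weights):
    """Combine member predictions using per-member, per-class weights.

    predictions: member name -> predicted label
    class_weights: member name -> {label: weight}
    """
    scores = defaultdict(float)
    for member, label in predictions.items():
        scores[label] += class_weights[member].get(label, 0.0)
    return max(scores, key=scores.get)

def update_weights(class_weights, predictions, true_label, step=0.1):
    """Assumed update rule: reward a correct prediction, penalize a wrong one.

    A hit raises the member's weight for the true class; a miss lowers the
    member's weight for the class it wrongly predicted (never below zero).
    """
    for member, label in predictions.items():
        weights = class_weights[member]
        if label == true_label:
            weights[label] = weights.get(label, 0.0) + step
        else:
            weights[label] = max(0.0, weights.get(label, 0.0) - step)

# Hypothetical weights for three heterogeneous members.
class_weights = {
    "naive_bayes": {"spam": 0.9, "ham": 0.2},
    "knn": {"spam": 0.3, "ham": 0.4},
    "tree": {"spam": 0.5, "ham": 0.4},
}
predictions = {"naive_bayes": "spam", "knn": "ham", "tree": "ham"}

before = weighted_vote(predictions, class_weights)   # one strong "spam" vote outweighs two "ham" votes
update_weights(class_weights, predictions, true_label="ham")
after = weighted_vote(predictions, class_weights)    # weights have shifted toward the correct members
```

The point of per-class weights is that a member can be trusted for one class and discounted for another, which a single accuracy-based weight per member cannot express.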


Deep learning framework for handling concept drift and class imbalanced complex decision-making on streaming data

Complex & Intelligent Systems, 2021. DOI: 10.1007/s40747-021-00456-0

Author(s): S. Priya, R. Annie Uthra

Keyword(s): Decision Making, Deep Learning, Concept Drift, Class Imbalance, Streaming Data, Superior Performance, Data Streaming, Minority Class, Concept Drift Detection

Data science has recently become popular as a means to support and improve the decision-making process. Because data streaming has a wide range of applications, class imbalance and concept drift have become crucial learning problems. Deep learning (DL) models have proved useful for classification under concept drift in data streaming applications. This paper presents an effective class imbalance with concept drift detection (CIDD) approach using Adadelta optimizer-based deep neural networks (ADODNN), named the CIDD-ADODNN model, for the classification of highly imbalanced streaming data. The presented model involves four processes: preprocessing, class imbalance handling, concept drift detection, and classification. It uses the adaptive synthetic (ADASYN) technique to handle class-imbalanced data, which applies a weighted distribution to minority class examples based on their level of difficulty in learning. Next, a drift detection technique called adaptive sliding window (ADWIN) is employed to detect the existence of concept drift. The ADODNN model is then used for classification. To increase the classification performance of the DNN model, an ADO-based hyperparameter tuning process determines its optimal parameters. The performance of the presented model is evaluated on three streaming datasets: the intrusion detection (NSL KDDCup) dataset, the Spam dataset, and the Chess dataset. A detailed comparative analysis of the simulation results verified the superior performance of the presented model, which obtained maximum accuracies of 0.9592, 0.9320, and 0.7646 on the KDDCup, Spam, and Chess datasets, respectively.
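The drift detection step can be illustrated with a simplified window-splitting check in the spirit of ADWIN. This is not the published ADWIN algorithm and not the implementation used in the paper: it omits ADWIN's compressed bucket structure and rescans every split point on each update. The class name, the `delta` and `max_window` defaults, and the Hoeffding-style threshold are assumptions made for this sketch.

```python
import math

class SimpleWindowDriftDetector:
    """Keeps a window of recent values and flags a change when two
    sub-windows have means that differ by more than a Hoeffding-style bound."""

    def __init__(self, delta=0.01, max_window=200):
        self.delta = delta            # confidence parameter (smaller = fewer false alarms)
        self.max_window = max_window  # hard cap on stored values
        self.window = []

    def add(self, value):
        """Add one observation (for example a 0/1 error flag); return True if a change is flagged."""
        self.window.append(value)
        if len(self.window) > self.max_window:
            self.window.pop(0)
        n = len(self.window)
        total = sum(self.window)
        head_sum = 0.0
        for split in range(1, n):
            head_sum += self.window[split - 1]
            n0, n1 = split, n - split
            mean0 = head_sum / n0
            mean1 = (total - head_sum) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)  # harmonic-style combined sample size
            eps = math.sqrt(math.log(4.0 * n / self.delta) / (2.0 * m))
            if abs(mean0 - mean1) > eps:
                # Drop the older sub-window so the window reflects the new concept.
                self.window = self.window[split:]
                return True
        return False

# Synthetic error stream: a classifier that is always right, then always wrong.
detector = SimpleWindowDriftDetector()
stream = [0] * 100 + [1] * 50
flags = [detector.add(x) for x in stream]
```

In a pipeline like the one described, a flagged change would typically trigger retraining or adaptation of the classifier on the data that remains in the window.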


Knowledge Discovery From Evolving Data Streams

Advances in Business Information Systems and Analytics - Machine Learning Techniques for Improved Business Analytics, 2019, pp. 19-39. DOI: 10.4018/978-1-5225-3534-8.ch002

Author(s): Prasanna Lakshmi Kompalli

Keyword(s): Real Time, Data Streams, Data Stream, Concept Drift, Data Stream Mining, Time Data, Stream Mining, New Challenges, Mining Data Streams, Different Sources

Data coming from different sources is referred to as data streams. Data stream mining is an online learning technique in which each data point must be processed as it arrives and discarded once processing is complete. Advances in technology have made it possible to monitor these data streams in real time, which has created many new challenges for researchers. The main features of this type of data are that it is fast flowing, large in volume, continuous, and growing, and that its characteristics may change over time, which is termed concept drift. This chapter addresses the problems of mining data streams with concept drift. Because of these problems, isolating the relevant literature can be a grueling task for researchers and practitioners. This chapter tries to provide a solution by offering an amalgamation of the techniques used for data stream mining with concept drift.


CLASSIFICATION OF CONCEPT DRIFT IN EVOLVING DATA STREAM

Emerging Extended Reality Technologies For Industry 4.0, 2020, pp. 189-205. DOI: 10.1002/9781119654674.ch11

Author(s): Mashail Althabiti, Manal Abdullah

Keyword(s): Data Stream, Concept Drift, Evolving Data


Scalable real-time classification of data streams with concept drift

Future Generation Computer Systems, 2017, Vol 75, pp. 187-199. DOI: 10.1016/j.future.2017.03.026. Cited By ~ 35

Author(s): Mark Tennant, Frederic Stahl, Omer Rana, João Bártolo Gomes

Keyword(s): Real Time, Data Streams, Concept Drift, Real Time Classification


A Survey of Challenges Facing Streaming Data

Transactions on Machine Learning and Artificial Intelligence, 2020, Vol 8 (4), pp. 63-73. DOI: 10.14738/tmlai.84.8579

Author(s): Sikha Bagui, Katie Jin

Keyword(s): Data Reduction, Data Streams, Data Stream, Stream Processing, Streaming Data, Data Detection, Data Stream Processing, The Face, Concept Drifts

This survey provides a thorough enumeration and analysis of existing methods for data stream processing, organized around the challenges facing streaming data. The challenges addressed are preprocessing of streaming data, detecting and dealing with concept drift in streaming data, data reduction in the face of data streams, and approximate queries and blocking operations in streaming data.


EvolveCluster: an evolutionary clustering algorithm for streaming data

Evolving Systems, 2021. DOI: 10.1007/s12530-021-09408-y

Author(s): Christian Nordahl, Veselka Boeva, Håkan Grahn, Marie Persson Netz

Keyword(s): Data Streams, Data Stream, Clustering Algorithm, Clustering Algorithms, Streaming Data, Evolutionary Clustering, Stream Clustering, The Past, Data Stream Clustering, Evolving Data

Data has become an integral part of our society in recent years, arriving faster and in larger quantities than before. Traditional clustering algorithms rely on the availability of entire datasets to model them correctly and efficiently. Such requirements cannot be met in the data stream clustering scenario, where data arrives and needs to be analyzed continuously. This paper proposes a novel evolutionary clustering algorithm, entitled EvolveCluster, capable of modeling evolving data streams. We compare EvolveCluster against two other evolutionary clustering algorithms, PivotBiCluster and Split-Merge Evolutionary Clustering, by conducting experiments on three different datasets. Furthermore, we perform additional experiments on EvolveCluster to further evaluate its capabilities on clustering evolving data streams. Our results show that EvolveCluster manages to capture evolving data stream behaviors and adapts accordingly.


Cost-Sensitive Classification for Evolving Data Streams with Concept Drift and Class Imbalance

Computational Intelligence and Neuroscience, 2021, Vol 2021, pp. 1-9. DOI: 10.1155/2021/8813806

Author(s): Yange Sun, Meng Li, Lei Li, Han Shao, Yi Sun

Keyword(s): Data Streams, Data Stream, Learning Strategy, Concept Drift, Class Imbalance, Data Preprocessing, Cost Information, Detection Mechanism, Stream Classification, Data Stream Classification

Class imbalance and concept drift are two primary challenges that exist concurrently in data stream classification. Although the two issues have each drawn considerable attention separately, their joint treatment remains largely unexplored. Moreover, the class imbalance issue is further complicated when data streams exhibit concept drift. A novel Cost-Sensitive based Data Stream (CSDS) classification method is introduced to overcome the two issues simultaneously. CSDS considers cost information during both data preprocessing and classification. During data preprocessing, a cost-sensitive learning strategy is introduced into the ReliefF algorithm to alleviate class imbalance at the data level. In the classification process, a cost-sensitive weighting schema is devised to enhance the overall performance of the ensemble. In addition, a change detection mechanism is embedded in our algorithm, which ensures that the ensemble can capture and react to drift promptly. Experimental results validate that our method obtains better classification results under different imbalanced concept-drifting data stream scenarios.


Microcluster-Based Incremental Ensemble Learning for Noisy, Nonstationary Data Streams

Complexity, 2020, Vol 2020, pp. 1-12. DOI: 10.1155/2020/6147378

Author(s): Sanmin Liu, Shan Xue, Fanzhen Liu, Jieren Cheng, Xiulai Li, ...

Keyword(s): Ensemble Learning, Data Streams, Data Stream, Concept Drift, Majority Vote, Stream Classification, Model Stability, Data Stream Classification, Nonstationary Data, Synthetic Datasets

Data stream classification has become a promising prediction task relevant to many practical environments. However, in the presence of concept drift and noise, research on data stream classification faces many challenges. Hence, a new incremental ensemble model is presented for classifying nonstationary data streams with noise. Our approach integrates three strategies: incremental learning to monitor and adapt to concept drift; ensemble learning to improve model stability; and a microclustering procedure that distinguishes drift from noise and predicts the labels of incoming instances via majority vote. Experiments with two synthetic datasets designed to test for both gradual and abrupt drift show that our method provides more accurate classification in nonstationary data streams with noise than two popular baselines.
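The majority-vote prediction step can be sketched as follows. The abstract does not describe how microclusters are represented or which of them vote, so this example assumes that each microcluster is a labeled centroid and that the k nearest centroids vote; the function name and the value of `k` are hypothetical, and the drift-versus-noise logic is not modeled.

```python
import math
from collections import Counter

def predict_by_microclusters(point, microclusters, k=3):
    """Label a point by majority vote among the k nearest microcluster centroids.

    microclusters: list of (centroid, label) pairs.
    """
    nearest = sorted(microclusters, key=lambda mc: math.dist(point, mc[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled microclusters in a 2-D feature space.
microclusters = [
    ([0.0, 0.0], "a"),
    ([0.0, 1.0], "a"),
    ([1.0, 0.0], "b"),
    ([5.0, 5.0], "b"),
    ([5.0, 6.0], "b"),
]

near_origin = predict_by_microclusters([0.2, 0.2], microclusters)
far_corner = predict_by_microclusters([5.0, 5.4], microclusters)
```

Voting over several microclusters rather than taking the single nearest one gives some robustness to an isolated noisy microcluster, which is one plausible reason to pair microclustering with majority vote.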
