Data Stream Classification: Latest Research Papers

The classification of data streams has become a significant and active research area. Data streams are characterized by a large volume of arriving data, a high arrival rate, and changes in the nature and distribution of the data over time. The Hoeffding Tree is a method for building decision trees incrementally. Since it was proposed, it has become one of the most popular tools for data stream classification, and several improvements have since emerged. The Hoeffding Anytime Tree, introduced recently, is considered one of the most promising of these algorithms: in most scenarios it achieves higher accuracy than the Hoeffding Tree at a small additional computational cost. In this work, the authors propose three improvements to the Hoeffding Anytime Tree and test them on well-known benchmark datasets. The experimental results show that two of the proposed variants make better use of the Hoeffding Anytime Tree's properties: they learn faster while providing the same accuracy.
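Both the Hoeffding Tree and the Hoeffding Anytime Tree decide when a leaf has seen enough examples to split by using the Hoeffding bound, epsilon = sqrt(R² ln(1/δ) / (2n)), where R is the range of the split criterion, δ the allowed error probability and n the number of examples seen at the leaf. The minimal sketch below is not the paper's code; the function names, the tie threshold of 0.05 and the example numbers are illustrative assumptions. It contrasts the two split tests: the Hoeffding Tree compares the best attribute with the second best, while the Hoeffding Anytime Tree compares it with not splitting at all.

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    # With probability 1 - delta, the true mean of a variable whose values span
    # value_range lies within this epsilon of its mean over n observations.
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def hoeffding_tree_should_split(best_gain, second_gain, value_range, delta, n,
                                tie_threshold=0.05):
    # Classic Hoeffding Tree (VFDT) test: the best attribute must beat the runner-up
    # by more than epsilon; the tie threshold avoids waiting forever on near-ties.
    epsilon = hoeffding_bound(value_range, delta, n)
    return best_gain - second_gain > epsilon or epsilon < tie_threshold

def anytime_tree_should_split(best_gain, null_gain, value_range, delta, n):
    # Hoeffding Anytime Tree test: the best attribute only has to beat not splitting
    # (the null split, whose gain is typically 0). A similar test against the current
    # split attribute is used later to revise splits that were already made.
    return best_gain - null_gain > hoeffding_bound(value_range, delta, n)

# Two-class information gain has range R = log2(2) = 1.
print(round(hoeffding_bound(1.0, 1e-7, 200), 3))                  # 0.201
print(hoeffding_tree_should_split(0.30, 0.20, 1.0, 1e-7, 200))  # False
print(anytime_tree_should_split(0.30, 0.0, 1.0, 1e-7, 200))     # True
```

With these numbers the classic test keeps waiting because the two best attributes are too close to separate, whereas the anytime test splits as soon as the best attribute is clearly better than no split; the anytime tree compensates for such early choices by revisiting them as more data arrives.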

Heuristic ensemble for unsupervised detection of multiple types of concept drift in data stream classification

Intelligent Decision Technologies, 2021, pp. 1-14. DOI: 10.3233/idt-210115

Author(s): Hanqing Hu, Mehmed Kantardzic

Keyword(s): Data Stream, Concept Drift, False Alarms, Detection Accuracy, Real World Data, Traditional Concept, Stream Classification, Data Stream Classification, Detection Algorithms, Concept Drift Detection

Real-world data stream classification often deals with multiple types of concept drift, categorized by change characteristics such as speed, distribution, and severity. When labels are unavailable, the traditional concept drift detection algorithms used in stream classification frameworks often focus on only one type of concept drift. To overcome this limitation, this study proposes a Heuristic Ensemble Framework for Drift Detection (HEFDD). HEFDD aims to detect all types of concept drift by employing an ensemble of selected concept drift detection algorithms, each capable of detecting at least one type of concept drift. Experimental results show that HEFDD achieves a significant improvement in detection accuracy over state-of-the-art individual algorithms, as measured by a z-score test. At the same time, HEFDD reduces the false alarms generated by individual concept drift detection algorithms.
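The abstract does not spell out HEFDD's heuristic, so the following is only a hypothetical sketch of the general idea of an ensemble of unsupervised drift detectors, not HEFDD itself. Two detectors with different window sizes (the longer window has a smaller standard error, so it is more sensitive to small shifts) watch an unlabeled feature, and the ensemble signals drift only when at least min_votes detectors fire within tolerance steps of each other, which is one simple way to suppress isolated false alarms. The class names, parameters and the mean-shift test are all assumptions.

```python
import math
from collections import deque
from statistics import mean, pstdev

class WindowMeanDetector:
    """Hypothetical unsupervised detector: flags drift when the mean of the most
    recent window departs from a reference window by more than `threshold`
    standard errors."""

    def __init__(self, window: int, threshold: float):
        self.window = window
        self.threshold = threshold
        self.reference = deque(maxlen=window)
        self.recent = deque(maxlen=window)

    def update(self, x: float) -> bool:
        if len(self.reference) < self.window:
            self.reference.append(x)          # still filling the reference window
            return False
        self.recent.append(x)                 # sliding window of the newest values
        if len(self.recent) < self.window:
            return False
        std_err = (pstdev(self.reference) or 1e-9) / math.sqrt(self.window)
        if abs(mean(self.recent) - mean(self.reference)) / std_err > self.threshold:
            # After a drift, the recent window becomes the new reference.
            self.reference = deque(self.recent, maxlen=self.window)
            self.recent.clear()
            return True
        return False

class VotingEnsemble:
    """Hypothetical ensemble: signals drift only when at least `min_votes`
    detectors have fired within the last `tolerance` steps."""

    def __init__(self, detectors, min_votes: int, tolerance: int):
        self.detectors = detectors
        self.min_votes = min_votes
        self.tolerance = tolerance
        self.last_alarm = [None] * len(detectors)
        self.t = 0

    def update(self, x: float) -> bool:
        self.t += 1
        for i, detector in enumerate(self.detectors):
            if detector.update(x):
                self.last_alarm[i] = self.t
        votes = sum(1 for a in self.last_alarm
                    if a is not None and self.t - a <= self.tolerance)
        if votes >= self.min_votes:
            self.last_alarm = [None] * len(self.detectors)   # start a fresh vote
            return True
        return False

# Unlabeled feature with a periodic pattern whose level jumps by 3.0 at step 300.
stream = [(i % 5) * 0.1 + (3.0 if i >= 300 else 0.0) for i in range(400)]
ensemble = VotingEnsemble([WindowMeanDetector(20, 4.0), WindowMeanDetector(50, 4.0)],
                          min_votes=2, tolerance=10)
print([i for i, x in enumerate(stream) if ensemble.update(x)])  # [301]
```

On this synthetic stream each detector on its own fires twice after the jump (its second alarm comes when its post-reset reference, which still straddles the change, is compared with purely new data), but only the first alarms of the two detectors fall within the tolerance of each other, so the ensemble reports a single drift.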

Intrusion Detection over Network Packets using Data Stream Classification Algorithms

2021. DOI: 10.1109/ictai52525.2021.00157

Author(s): Gilberto Olimpio, Pedro F. C. Silva, Lasaro Camargos, Rodrigo S. Miani, Elaine R. de Faria

Keyword(s): Intrusion Detection, Data Stream, Classification Algorithms, Stream Classification, Data Stream Classification, Using Data

Distributed Processing of Deep Learning Inference Models for Data Stream Classification

Journal of KIISE, Vol. 48 (10), 2021, pp. 1154-1165. DOI: 10.5626/jok.2021.48.10.1154

Author(s): Hyojong Moon, Siwoon Son, Yang-Sae Moon

Keyword(s): Deep Learning, Data Stream, Distributed Processing, Inference Models, Stream Classification, Data Stream Classification

Enhanced Data Stream Classification by Optimized Weight Updated Meta-learning: Continuous learning-based on Concept-Drift

International Journal of Web Information Systems, 2021, Vol. ahead-of-print. DOI: 10.1108/ijwis-01-2021-0007

Author(s): Maisnam Niranjan Singh, Samitha Khaiyum

Keyword(s): Neural Network, Recurrent Neural Network, Concept Drift, Learning Model, Streaming Data, Spotted Hyena, Continuous Learning, Content Type, Data Stream Classification, Meta Learning

Purpose: The aim of continuous learning is to acquire and refine knowledge gradually without discarding existing knowledge. Many conventional approaches to streaming data classification assume that all newly arrived data are fully labeled. Incremental semi-supervised learning models are needed to regularize neural networks (NNs) by incorporating side information such as user-provided labels or pairwise constraints. However, such models are hard to implement, particularly in non-stationary environments, because of their efficiency requirements and their sensitivity to parameters. Periodically updating and maintaining the decision model whenever new data arrive is the main challenge for incremental algorithms.

Design/methodology/approach: This paper develops a meta-learning model for handling continuous or streaming data. First, data exhibiting continuous behavior are gathered from diverse benchmark sources. The data are then classified by a Recurrent Neural Network (RNN) whose testing weights are optimized by a new meta-heuristic algorithm: when new data are presented for testing, the weights are updated to reduce the error between the target and the measured output. This optimized weight-updated testing is evaluated in terms of concept drift and classification accuracy. Continuous learning by the RNN is accomplished with an improved Opposition-based Novel Updating Spotted Hyena Optimization (ONU-SHO). Experiments on different datasets show that the proposed learning outperforms conventional models.

Findings: The accuracy of the ONU-SHO-based RNN (ONU-SHO-RNN) was 10.1% higher than that of a Decision Tree (DT), 7.6% higher than Naive Bayes (NB), 7.4% higher than k-nearest neighbors (KNN), 2.5% higher than a Support Vector Machine (SVM), 9.3% higher than an NN, and 10.6% higher than an RNN. The authors conclude that the ONU-SHO algorithm performs well for data stream classification.

Originality/value: This paper introduces a novel meta-learning model, an ONU-SHO-based RNN, for handling continuous or streaming data, and the authors state that it is the first work to use such a model.
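The abstract does not give ONU-SHO's update equations, so the sketch below does not reproduce them. It only illustrates, under stated assumptions, the two generic ingredients the abstract names: tuning model weights with a population-based metaheuristic to reduce the error between target and output, and opposition-based learning, in which each candidate x in [low, high] is also evaluated through its opposite low + high - x. To stay short it tunes a hypothetical linear model rather than an RNN, and the move rule (a random step toward the current best plus Gaussian jitter) is an assumption, not the spotted hyena update.

```python
import random

def mse(weights, data):
    # Mean squared error between targets and a hypothetical linear model's outputs.
    return sum((sum(w * x for w, x in zip(weights, xs)) - y) ** 2
               for xs, y in data) / len(data)

def opposite(candidate, low, high):
    # Opposition-based learning: the opposite of x in [low, high] is low + high - x.
    return [low + high - c for c in candidate]

def obl_search(data, dim, low=-2.0, high=2.0, pop=20, iters=200, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(pop)]
    # Opposition-based initialization: score candidates and their opposites, keep the best.
    population += [opposite(c, low, high) for c in population]
    population = sorted(population, key=lambda c: mse(c, data))[:pop]
    for _ in range(iters):
        best = population[0]
        updated = []
        for cand in population:
            # Hypothetical move rule: random step toward the current best, plus jitter,
            # clipped to the search bounds.
            trial = [min(high, max(low, x + rng.random() * (b - x) + rng.gauss(0.0, 0.05)))
                     for x, b in zip(cand, best)]
            # Greedy selection: keep the trial only if it lowers the error.
            updated.append(trial if mse(trial, data) < mse(cand, data) else cand)
        population = sorted(updated, key=lambda c: mse(c, data))
    return population[0]

# Hypothetical targets from y = 1.5*x0 - 0.5*x1, standing in for a labeled chunk.
rng = random.Random(1)
data = []
for _ in range(30):
    xs = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    data.append((xs, 1.5 * xs[0] - 0.5 * xs[1]))
weights = obl_search(data, dim=2)
print([round(w, 2) for w in weights])
```

Because selection is greedy, the best error in the population never increases from one iteration to the next; the opposition step only widens the initial coverage of the search space.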

Cost-Sensitive Classification for Evolving Data Streams with Concept Drift and Class Imbalance

Computational Intelligence and Neuroscience, Vol. 2021, 2021, pp. 1-9. DOI: 10.1155/2021/8813806

Author(s): Yange Sun, Meng Li, Lei Li, Han Shao, Yi Sun

Keyword(s): Data Streams, Data Stream, Learning Strategy, Concept Drift, Class Imbalance, Data Preprocessing, Cost Information, Detection Mechanism, Stream Classification, Data Stream Classification

Class imbalance and concept drift are two major problems that often occur simultaneously in data stream classification. Although each has received considerable attention on its own, their joint treatment remains largely unexplored. Moreover, class imbalance is further complicated when the data stream also exhibits concept drift. A novel Cost-Sensitive Data Stream (CSDS) classification method is introduced to address both issues simultaneously. CSDS takes cost information into account during both data preprocessing and classification. During preprocessing, a cost-sensitive learning strategy is incorporated into the ReliefF algorithm to alleviate class imbalance at the data level. During classification, a cost-sensitive weighting scheme is devised to improve the overall performance of the ensemble. In addition, a change detection mechanism is embedded in the algorithm, which is intended to ensure that the ensemble captures and reacts to drift promptly. Experimental results show that the method achieves better classification results in a range of imbalanced, concept-drifting data stream scenarios.
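The abstract does not give the CSDS weighting formula, so the sketch below is only a hypothetical illustration of cost-sensitive ensemble weighting in general. Each member is weighted by the inverse of its average misclassification cost on the most recent labeled chunk, using a cost matrix cost[true][predicted] in which missing the minority class is expensive, and predictions are combined by weighted vote. The cost values, chunk and function names are assumptions.

```python
from collections import defaultdict

def average_cost(predictions, labels, cost):
    # Mean misclassification cost on a chunk; cost[true][predicted], zero on the diagonal.
    return sum(cost[y][p] for p, y in zip(predictions, labels)) / len(labels)

def member_weights(chunk_predictions, labels, cost, eps=1e-6):
    # Hypothetical inverse-cost weighting: cheaper recent mistakes -> larger vote.
    return [1.0 / (average_cost(p, labels, cost) + eps) for p in chunk_predictions]

def weighted_vote(votes, weights):
    # Sum member weights per predicted class and return the heaviest class.
    scores = defaultdict(float)
    for vote, weight in zip(votes, weights):
        scores[vote] += weight
    return max(scores, key=scores.get)

# Class 1 is the minority; missing it costs 5, a false alarm costs 1.
cost = [[0, 1], [5, 0]]
labels = [0] * 8 + [1] * 2
majority_only = [0] * 10                  # never predicts the minority class
minority_aware = [0] * 6 + [1, 1] + [1, 1]  # catches both minority examples, two false alarms
weights = member_weights([majority_only, minority_aware, majority_only], labels, cost)
print(weighted_vote([0, 1, 0], weights))  # 1
```

Here the minority-aware member receives about five times the weight of each majority-only member, so the weighted vote follows it even though a plain majority vote would be two to one against it.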

A Clustering-based framework for Classifying Data Streams

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021. DOI: 10.24963/ijcai.2021/448

Author(s): Xuyang Yan, Abdollah Homaifar, Mrinmoy Sarkar, Abenezer Girma, Edward Tunstel

Keyword(s): Machine Learning, Data Streams, Cluster Structure, Machine Learning Techniques, Design Parameters, Stream Classification, Data Stream Classification, Classification Framework, Learning Techniques, Comparable Performance

The non-stationary nature of data streams poses a strong challenge to traditional machine learning techniques. Although some solutions have been proposed to extend these techniques to data streams, they either require an initial labeled set or rely on specialized design parameters. Overlap among classes and the labeling of data streams are other major challenges in classifying data streams. In this paper, we propose a clustering-based data stream classification framework that handles non-stationary data streams without an initial labeled set. A density-based stream clustering procedure with a dynamic threshold captures novel concepts, and an effective active label querying strategy continuously learns the new concepts from the data stream. The sub-cluster structure of each cluster is explored to handle overlap among classes. Experimental results and quantitative comparisons show that the proposed method performs statistically better than, or comparably to, existing methods.
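The sketch below is a deliberately simplified, hypothetical stand-in for the framework's ideas, not the authors' algorithm. It uses nearest-centroid assignment instead of density-based clustering, a radius that follows the running mean of assignment distances as its "dynamic threshold", and an oracle callable (standing in for an annotator) that is queried only when a point opens a new cluster, i.e. a novel concept. Sub-cluster handling of class overlap is omitted, and the class name, factor and initial radius are assumptions.

```python
import math

class StreamClusterClassifier:
    """Hypothetical, simplified sketch: nearest-centroid stream clustering with a
    dynamic radius; a point that opens a new cluster triggers a label query."""

    def __init__(self, oracle, factor=2.0, initial_radius=1.0):
        self.oracle = oracle              # callable standing in for a human annotator
        self.factor = factor
        self.radius = initial_radius
        self.centroids, self.counts, self.labels = [], [], []
        self.total_dist, self.n_assigned = 0.0, 0
        self.queries = 0

    def process(self, point):
        if self.centroids:
            dists = [math.sqrt(sum((p - c) ** 2 for p, c in zip(point, cen)))
                     for cen in self.centroids]
            k = min(range(len(dists)), key=dists.__getitem__)
            if dists[k] <= self.radius:
                # Absorb the point into its nearest cluster (incremental mean update).
                self.counts[k] += 1
                self.centroids[k] = [c + (p - c) / self.counts[k]
                                     for c, p in zip(self.centroids[k], point)]
                # Dynamic threshold: radius follows the running mean assignment distance.
                self.total_dist += dists[k]
                self.n_assigned += 1
                self.radius = self.factor * self.total_dist / self.n_assigned
                return self.labels[k]
        # Novel concept: open a new cluster and actively query its label once.
        self.centroids.append(list(point))
        self.counts.append(1)
        self.labels.append(self.oracle(point))
        self.queries += 1
        return self.labels[-1]

def oracle(point):
    # Simulated ground truth by region of the first coordinate.
    return "a" if point[0] < 5 else ("b" if point[0] < 15 else "c")

clf = StreamClusterClassifier(oracle)
stream = [(0, 0), (10, 10), (0.5, 0), (10.5, 10), (0, 0.5), (10, 10.5)] * 20
predictions = [clf.process(p) for p in stream]
print(clf.queries)                # prints 2: one query per discovered concept
print(clf.process((20.0, 0.0)))   # prints c: a new concept triggers a third query
```

Only one label per emerging cluster is requested, which is the point of active querying when labels are scarce; a real density-based method would also need to merge, split and forget clusters as the stream evolves.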

Dynamically Selected Ensemble for Data Stream Classification

2021. DOI: 10.1109/ijcnn52387.2021.9533702

Author(s): Lucca Portes Cavalheiro, Alceu De Souza Britto, Jean Paul Barddal, Laurent Heutte

Keyword(s): Data Stream, Stream Classification, Data Stream Classification

Data Stream Classification: Recently Published Documents

Data stream classification with ant colony optimisation

Towards Building a Flexible Online Learning Model for Data Stream Classification

Improvement of Data Stream Decision Trees

Heuristic ensemble for unsupervised detection of multiple types of concept drift in data stream classification

Intrusion Detection over Network Packets using Data Stream Classification Algorithms

Distributed Processing of Deep Learning Inference Models for Data Stream Classification

Enhanced Data Stream Classification by Optimized Weight Updated Meta-learning: Continuous learning-based on Concept-Drift

Cost-Sensitive Classification for Evolving Data Streams with Concept Drift and Class Imbalance

A Clustering-based framework for Classifying Data Streams

Dynamically Selected Ensemble for Data Stream Classification
