Classification Model for Data Streams Based on Similarity

Stream data classification suffers from the problems of infinite length, concept evolution, feature evolution, and concept drift. Labeling data streams is more challenging than labeling static data because of several unique properties of data streams. Data streams are assumed to have infinite length, which makes it difficult to store and use all the historical data for training, so traditional multi-pass machine learning techniques cannot be applied directly to data streams. Data streams also exhibit concept drift, which occurs when the underlying concept of the data changes over time. To address concept drift, a classification model must continuously adapt itself to the most recent concept. Various authors have reduced these problems using machine learning approaches and feature optimization techniques. In this paper we present various methods for reducing the problems that occur in stream data classification. We also discuss a machine learning technique for the feature evolution process to handle the emergence of novel classes.
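As a minimal illustration of a model that adapts to the most recent concept, the Python sketch below keeps only a bounded window of recent labeled instances and classifies by nearest class centroid. The class name `SlidingWindowNearestCentroid`, the window size, and the nearest-centroid learner are hypothetical choices for illustration, not the method of this paper; the bounded window simply shows how constant memory copes with infinite length and how expiring old instances copes with concept drift.

```python
import math
from collections import deque


class SlidingWindowNearestCentroid:
    """Hypothetical nearest-centroid classifier trained on a bounded window.

    Only the `window_size` most recent labelled instances are kept, so memory
    stays constant on an unbounded stream and instances from an outdated
    concept are forgotten automatically.
    """

    def __init__(self, window_size=50):
        # A deque with maxlen drops the oldest instance when a new one arrives.
        self.window = deque(maxlen=window_size)

    def learn_one(self, x, y):
        self.window.append((tuple(x), y))

    def _centroids(self):
        # Mean feature vector of each class currently in the window.
        sums, counts = {}, {}
        for x, y in self.window:
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[y] = counts.get(y, 0) + 1
        return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

    def predict_one(self, x):
        centroids = self._centroids()
        if not centroids:
            return None  # nothing learned yet
        return min(centroids, key=lambda y: math.dist(x, centroids[y]))


model = SlidingWindowNearestCentroid(window_size=20)
for _ in range(10):
    model.learn_one([0.0], "a")
    model.learn_one([10.0], "b")
print(model.predict_one([1.0]))  # a

# Concept drift: the labels of the two regions swap.
for _ in range(10):
    model.learn_one([0.0], "b")
    model.learn_one([10.0], "a")
print(model.predict_one([1.0]))  # b
```

Once the pre-drift instances have left the window, predictions follow the new concept. A smaller window adapts faster but estimates the centroids from fewer, noisier samples.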

Minority Resampling Based Ensemble Framework Using Enhanced Early Drift Detection Method For Imbalanced Data Streams

DOI: 10.21203/rs.3.rs-141880/v1
2021
Author(s): Priya S, Annie Uthra
Keyword(s): Data Streams; Data Stream; Detection Method; Concept Drift; Class Imbalance; Current Data; Classification Model; Ensemble Classifiers; K Nearest Neighbor; Jaccard Similarity

As data mining applications become increasingly popular, large volumes of data streams are generated over time. The main problems with data streams are that they exhibit a high degree of class imbalance and that their data distribution changes over time. In this paper, a Timely Drift Detection and Minority Resampling Technique (TDDMRT) based on K-nearest neighbor and Jaccard similarity is proposed to handle class imbalance by finding the current ratio of class labels. The Enhanced Early Drift Detection Method (EEDDM) is proposed for detecting concept drift, and the Minority Resampling Method (KNN-JS) determines whether the current data stream should be regarded as imbalanced and resamples the minority instances in the drifting data stream. The K-Nearest Neighbors technique is used to resample the minority classes, and the Jaccard similarity measure is applied to the resampled data to generate synthetic data similar to the original data, which is then handled by ensemble classifiers. The proposed ensemble-based classification model outperforms existing oversampling and undersampling techniques, with an accuracy of 98.52%.
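The abstract does not give the details of KNN-JS, so the following is only a hypothetical sketch of the general idea: decide from the current class ratio whether a chunk is imbalanced, then create synthetic minority samples from a seed and one of its k nearest minority neighbours, keeping a sample only if it stays Jaccard-similar to its seed. The function names, the 0.5 ratio threshold, the restriction to binary feature vectors (Jaccard similarity is defined on sets), and the per-feature mixing rule (a binary analogue of SMOTE interpolation) are all assumptions; the EEDDM drift detector is not shown.

```python
import random
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two binary vectors: |A & B| / |A | B|."""
    inter = sum(1 for u, v in zip(a, b) if u and v)
    union = sum(1 for u, v in zip(a, b) if u or v)
    return 1.0 if union == 0 else inter / union


def is_imbalanced(labels, ratio_threshold=0.5):
    """Hypothetical rule: imbalanced when minority/majority count < threshold."""
    counts = Counter(labels)
    if len(counts) < 2:
        return False
    return min(counts.values()) / max(counts.values()) < ratio_threshold


def knn_jaccard_oversample(minority, n_new, k=3, min_sim=0.5, seed=0):
    """Create up to n_new synthetic binary minority vectors.

    Each candidate copies every feature at random from either a minority seed
    or one of the seed's k nearest minority neighbours (nearest = highest
    Jaccard similarity); it is kept only if its Jaccard similarity to the
    seed is at least min_sim.
    """
    rng = random.Random(seed)
    synthetic = []
    attempts = 0
    while len(synthetic) < n_new and attempts < 100 * n_new:
        attempts += 1
        base = rng.choice(minority)
        neighbours = sorted(
            (m for m in minority if m is not base),
            key=lambda m: jaccard(base, m),
            reverse=True,
        )[:k]
        if not neighbours:
            break  # a single minority instance has no neighbours to mix with
        other = rng.choice(neighbours)
        candidate = [b if rng.random() < 0.5 else o for b, o in zip(base, other)]
        if jaccard(candidate, base) >= min_sim:
            synthetic.append(candidate)
    return synthetic


labels = ["maj"] * 8 + ["min"] * 2
print(is_imbalanced(labels))  # True

minority = [[1, 1, 0, 0, 1], [1, 0, 0, 0, 1]]
synthetic = knn_jaccard_oversample(minority, n_new=6)
print(len(synthetic))  # 6
```

The similarity filter is what keeps the synthetic data close to the original minority instances; lowering `min_sim` admits more diverse but less faithful samples.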

Concept Drift and Evolution Detection in Fusion Diagnosis With Evolving Data Streams

Volume 2A: 43rd Design Automation Conference
DOI: 10.1115/detc2017-68373
2017
Author(s): Amirmahyar Abdolsamadi, Pingfeng Wang
Keyword(s): Data Streams; Concept Drift; Data Distribution; Streaming Data; Majority Voting; Classification Model; Engineering System; Concept Evolution; Adaptive Fusion

Health diagnosis interprets data streams acquired by smart sensors and makes inferences about the health condition of an engineering system, thereby making critical operational decisions. A data stream is a continuous flow of data that poses several challenges for data mining. This paper addresses concept drift and concept evolution as two major challenges in the classification of streaming data. Concept drift occurs as a result of changes in the data distribution. Concept evolution happens when new classes appear in the stream. These changes may degrade classification results over time. This paper presents an adaptive fusion learning approach to build a robust classification model. The proposed approach consists of three steps: (i) a fusion formulation using weighted majority voting; (ii) active learning to query labels selectively instead of querying for all true labels; and (iii) a distance-based approach to monitoring the movement of the data distribution. A diagnosis case study is used to demonstrate the developed fusion diagnosis methodology.
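Step (i), fusion by weighted majority voting, can be sketched as follows. The abstract does not specify how the weights are set, so this sketch assumes the classic multiplicative update of the Littlestone-Warmuth weighted majority algorithm (classifiers that vote wrongly have their weight multiplied by `beta`); the function names and the "healthy"/"faulty" labels are hypothetical.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label with the largest total weight among the base votes.

    Ties go to the label that was seen first.
    """
    scores = defaultdict(float)
    for label, w in zip(predictions, weights):
        scores[label] += w
    return max(scores, key=scores.get)


def update_weights(predictions, weights, true_label, beta=0.5):
    """Multiply the weight of every wrong classifier by beta, then renormalise."""
    new = [w * (beta if p != true_label else 1.0) for p, w in zip(predictions, weights)]
    total = sum(new)
    return [w / total for w in new]


weights = [1.0, 1.0, 1.0]
preds = ["healthy", "faulty", "faulty"]
print(weighted_majority_vote(preds, weights))  # faulty

# Once the true label is known (e.g. queried by active learning), reweight.
weights = update_weights(preds, weights, true_label="healthy")
print(weights)  # [0.5, 0.25, 0.25]
```

Because only queried labels trigger an update, this kind of fusion combines naturally with selective labeling: the weights change only when a true label is actually obtained.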

Data Stream Classification by Dynamic Incremental Semi-Supervised Fuzzy Clustering

International Journal of Artificial Intelligence Tools
DOI: 10.1142/s0218213019600091
2019, Vol 28 (08), pp. 1960009
Cited By ~ 9
Author(s): Gabriella Casalino, Giovanna Castellano, Corrado Mencar
Keyword(s): Fuzzy Clustering; Data Streams; Data Stream; Clustering Algorithm; Classification Model; Real World Data; Stream Classification; Data Stream Classification; Partially Labeled Data; Classification Quality

This paper presents DISSFCM (Dynamic Incremental Semi-Supervised FCM), a data stream classification method based on an incremental semi-supervised fuzzy clustering algorithm. The method assumes that partially labeled data belonging to different classes arrive continuously over time in the form of chunks. Each chunk is processed by semi-supervised fuzzy clustering, which yields a cluster-based classification model. DISSFCM dynamically adapts the number of clusters to the data stream by splitting low-quality clusters to improve classification quality. Experimental results on both synthetic and real-world data show the effectiveness of the proposed method for data stream classification.
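To make the idea of a cluster-based classification model concrete, the sketch below runs plain fuzzy c-means (FCM) on one partially labeled chunk, labels each cluster with the class that has the largest summed membership among the labeled points, and classifies new points by their highest-membership cluster. This is not DISSFCM itself: the semi-supervised objective and the splitting of low-quality clusters are omitted, and the deterministic farthest-first initialisation, the labelling rule, and the function names are assumptions made for illustration.

```python
import math


def memberships(p, centres, m=2.0):
    """FCM membership u_j = 1 / sum_k (d_j / d_k)^(2/(m-1))."""
    dists = [math.dist(p, c) for c in centres]
    for j, d in enumerate(dists):
        if d == 0.0:  # a point lying exactly on a centre belongs fully to it
            return [1.0 if k == j else 0.0 for k in range(len(centres))]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** exp for dk in dists) for dj in dists]


def fcm(points, c, m=2.0, iters=50):
    """Plain fuzzy c-means; returns (centres, membership matrix)."""
    # Deterministic farthest-first initialisation (assumption; FCM is often
    # initialised randomly).
    centres = [list(points[0])]
    while len(centres) < c:
        far = max(points, key=lambda p: min(math.dist(p, q) for q in centres))
        centres.append(list(far))
    dim = len(points[0])
    for _ in range(iters):
        u = [memberships(p, centres, m) for p in points]
        for j in range(c):
            # Centre j is the mean of all points weighted by u_ij^m.
            w = [ui[j] ** m for ui in u]
            total = sum(w)
            centres[j] = [sum(wi * p[d] for wi, p in zip(w, points)) / total
                          for d in range(dim)]
    return centres, [memberships(p, centres, m) for p in points]


def label_clusters(u, labels, c):
    """Assign each cluster the class with the largest summed membership (None = unlabelled point)."""
    cluster_labels = []
    for j in range(c):
        votes = {}
        for ui, y in zip(u, labels):
            if y is not None:
                votes[y] = votes.get(y, 0.0) + ui[j]
        cluster_labels.append(max(votes, key=votes.get) if votes else None)
    return cluster_labels


def classify(p, centres, cluster_labels, m=2.0):
    u = memberships(p, centres, m)
    return cluster_labels[max(range(len(u)), key=u.__getitem__)]


chunk = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels = ["A", None, None, "B", None, None]  # partially labelled chunk
centres, u = fcm(chunk, c=2)
cluster_labels = label_clusters(u, labels, c=2)
print(classify([0.5, 0.5], centres, cluster_labels))  # A
```

In a streaming setting, this procedure would be repeated for each arriving chunk, with the previous model's clusters carried forward.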

A Semi-Supervised clustering based classification model for classifying imbalanced data streams in the presence of scarcely labelled data

International Journal of Business Intelligence and Data Mining
DOI: 10.1504/ijbidm.2022.10034300
2022, Vol 1 (1), pp. 1
Author(s): Kiran Bhowmick, Meera Narvekar
Keyword(s): Data Streams; Imbalanced Data; Classification Model; Supervised Clustering

Online Feature Extraction Algorithms for Data Streams

IEEJ Transactions on Electronics Information and Systems
DOI: 10.1541/ieejeiss.132.6
2012, Vol 132 (1), pp. 6-13
Author(s): Seiichi Ozawa
Keyword(s): Feature Extraction; Data Streams

Filtering of Mixed Data Streams with Orthogonal Polarization up to 50 Gbps in Micro-Ring/Bus Waveguide

2019 24th OptoElectronics and Communications Conference (OECC) and 2019 International Conference on Photonics in Switching and Computing (PSC)
DOI: 10.23919/ps.2019.8817775
2019
Author(s): Zih-Chun Su, Chih-Hsien Cheng, Bo-Ji Huang, Huai-Yung Wang, Chun-Nien Liu, ...
Keyword(s): Data Streams; Mixed Data; Orthogonal Polarization

Improved Macro-clusters generation using Top-k shared Micro-clusters in Data Streams

International Journal of Advanced Research in Computer Science and Software Engineering
DOI: 10.23956/ijarcsse.v7i10.400
2017, Vol 7 (10), pp. 52
Author(s): LAKSHMI PRANEETHA
Keyword(s): Real Time; Data Streams; Bloom Filter; Scientific Applications; Pruning Algorithm; Density Data; Data Points; Short Time; Information Streams

Nowadays, data streams (or information streams) are enormous and change quickly. Their uses range from basic logical and scientific applications to vital business and financial ones. Useful information is abstracted from the stream and represented in the form of micro-clusters in the online phase. In the offline phase, micro-clusters are merged to form macro-clusters. The DBSTREAM technique captures the density between micro-clusters by means of a shared density graph in the online phase. The density data in this graph is then used in reclustering to improve the formation of clusters, but DBSTREAM takes more time to handle corrupted data points. In this paper, an early pruning algorithm is applied before the information is preprocessed, and a Bloom filter is used to recognize corrupted information. Our experiments on real-time datasets show that this approach improves the efficiency of macro-clusters by 90% and generates more micro-clusters within a short time.
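A Bloom filter answers "have I seen this item?" in constant memory, with no false negatives and a tunable false-positive rate. The abstract does not say how the filter recognizes corrupted information, so the sketch below assumes one plausible use: fingerprints of records already known to be corrupted are added to the filter, and matching records are dropped before they reach the online micro-clustering phase. The class name, the bit-array size, the number of hash functions, and the salted SHA-256 hashing scheme are illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Hypothetical Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item):
        # Derive n_hashes bit positions by salting SHA-256 with the hash index.
        data = repr(item).encode("utf-8")
        for i in range(self.n_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True only if every bit for the item is set.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Assumed usage: register known-corrupted records, then filter the stream.
corrupted = BloomFilter()
corrupted.add(("s7", -999.0))

stream = [("s1", 20.5), ("s7", -999.0), ("s2", 21.0)]
clean = [rec for rec in stream if rec not in corrupted]
print(clean)
```

Because false positives are possible, a valid point can occasionally be discarded; enlarging `n_bits` relative to the number of stored fingerprints lowers that probability.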

Classification model with feature generation guided by cluster scoring

Information Science and Management Engineering II (Set)
DOI: 10.2495/isme20140511
2014
Author(s): Wen Fan, Ping Wang, Hongyue Sun
Keyword(s): Classification Model; Feature Generation
