Concept Drift and Evolution Detection in Fusion Diagnosis With Evolving Data Streams

Volume 2A: 43rd Design Automation Conference ◽

10.1115/detc2017-68373 ◽

2017 ◽

Author(s):

Amirmahyar Abdolsamadi ◽

Pingfeng Wang

Keyword(s):

Data Streams ◽

Concept Drift ◽

Data Distribution ◽

Streaming Data ◽

Majority Voting ◽

Classification Model ◽

Engineering System ◽

Concept Evolution ◽

Adaptive Fusion

Health diagnosis interprets data streams acquired by smart sensors and makes inferences about the health condition of an engineering system, thereby informing critical operational decisions. A data stream is a continuous flow of data that poses several challenges for data mining. This paper addresses concept drift and concept evolution as two major challenges in the classification of streaming data. Concept drift occurs as a result of changes in the data distribution. Concept evolution happens when new classes appear in the stream. Both changes can degrade classification results over time. This paper presents an adaptive fusion learning approach to build a robust classification model. The proposed approach consists of three steps: (i) a fusion formulation using weighted majority voting; (ii) active learning to query labels selectively instead of querying for all true labels; and (iii) a distance-based approach to monitoring the movement of the data distribution. A diagnosis case study is used to demonstrate the developed fusion diagnosis methodology.
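The distance-based monitoring step can be illustrated with a minimal sketch. It assumes that "movement of the data distribution" is measured as the Euclidean distance between the centroid of the latest window and a reference centroid, and that drift is declared when that distance exceeds a fixed threshold. The class `CentroidDriftMonitor`, its `threshold` parameter, and the reset policy are hypothetical illustrations, not the authors' formulation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(window: Sequence[Vector]) -> List[float]:
    """Mean feature vector of a non-empty window of samples."""
    dim = len(window[0])
    return [sum(x[j] for x in window) / len(window) for j in range(dim)]


class CentroidDriftMonitor:
    """Hypothetical distance-based drift monitor.

    Drift is flagged when the Euclidean distance between the centroid of
    the latest window and the reference centroid exceeds `threshold`
    (an assumed, user-chosen parameter).
    """

    def __init__(self, reference: Sequence[Vector], threshold: float) -> None:
        self.reference = centroid(reference)
        self.threshold = threshold

    def distance(self, window: Sequence[Vector]) -> float:
        return math.dist(self.reference, centroid(window))

    def check(self, window: Sequence[Vector]) -> bool:
        """Return True if the window's centroid has moved past the threshold."""
        return self.distance(window) > self.threshold

    def reset(self, window: Sequence[Vector]) -> None:
        """Adopt the latest window as the new reference once drift is handled."""
        self.reference = centroid(window)


if __name__ == "__main__":
    monitor = CentroidDriftMonitor([[0, 0], [0, 1], [1, 0], [1, 1]], threshold=1.0)
    print(monitor.check([[0.1, 0.4], [0.9, 0.6]]))  # centroid (0.5, 0.5) -> False
    print(monitor.check([[3, 3], [4, 4]]))          # centroid (3.5, 3.5) -> True
```

A per-class variant (one centroid per known class) would follow the same pattern; the single global centroid keeps the sketch short.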


Design of a Robust Classification Fusion Platform for Structural Health Diagnostics

Volume 3A: 39th Design Automation Conference ◽

10.1115/detc2013-12601 ◽

2013 ◽

Author(s):

Prasanna Tamilselvan ◽

Pingfeng Wang ◽

Chao Hu

Keyword(s):

Majority Voting ◽

Classification Model ◽

Robust Classification ◽

Structural Health ◽

Study Results ◽

Attribute Classification ◽

Engineered Systems ◽

Multiple Membership ◽

Fusion Approach

Efficient health diagnostics provides benefits such as improved safety, improved reliability, and reduced costs for the operation and maintenance of engineered systems. This paper presents a multi-attribute classification fusion approach which leverages the strengths provided by multiple membership classifiers to form a robust classification model for structural health diagnostics. Health diagnosis using the developed approach consists of three primary steps: (i) fusion formulation using a k-fold cross validation model; (ii) diagnostics with multiple multi-attribute classifiers as member algorithms; and (iii) classification fusion through a weighted majority voting with dominance system. State-of-the-art classification techniques from three broad categories (i.e., supervised learning, unsupervised learning, and statistical inference) were employed as the member algorithms. The proposed classification fusion approach is demonstrated with a bearing health diagnostics problem. Case study results indicated that the proposed approach outperforms any stand-alone member algorithm with better diagnostic accuracy and robustness.
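Step (iii), classification fusion through weighted majority voting, can be sketched as follows. The weights are assumed to be each member's validation accuracy (for example from the k-fold step); the "dominance system" mentioned in the abstract is not specified there, so it is not modeled. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def weighted_majority_vote(
    predictions: Sequence[Hashable], weights: Sequence[float]
) -> Hashable:
    """Fuse member predictions by summing each member's weight per label.

    predictions[i] is the label predicted by member i and weights[i] is its
    weight (e.g. a normalized validation accuracy). Ties go to the label
    that appears first among the predictions.
    """
    if len(predictions) != len(weights):
        raise ValueError("one weight per member prediction is required")
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # One accurate member outvotes two weaker members: 0.5 > 0.3 + 0.15.
    print(weighted_majority_vote(["healthy", "faulty", "faulty"], [0.5, 0.3, 0.15]))
```

With equal weights the same call reduces to a plain majority vote and would return `"faulty"`; the weighting is what lets a reliable member dominate.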


A Review of Classification and Novel Class Detection Technique of Data Streams

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽

10.24297/ijct.v3i2c.2891 ◽

2012 ◽

Vol 3 (2) ◽

pp. 314-316

Author(s):

Manish Rai ◽

Rekha Pandit

Keyword(s):

Machine Learning ◽

Data Streams ◽

Concept Drift ◽

Data Classification ◽

Classification Model ◽

Infinite Length ◽

Stream Data ◽

Machine Learning Technique ◽

Feature Evaluation ◽

Learning Technique

Stream data classification suffers from the problems of infinite length, concept evolution, feature evolution, and concept drift. Labeling data streams is more challenging than labeling static data because of several unique properties of data streams. Data streams are assumed to have infinite length, which makes it difficult to store and use all the historical data for training, so earlier multi-pass machine learning techniques cannot be applied to data streams directly. Data streams also exhibit concept drift, which occurs when the underlying concept of the data changes over time. To address concept drift, a classification model must continuously adapt itself to the most recent concept. Various authors have addressed these problems using machine learning approaches and feature optimization techniques. In this paper, we review various methods for reducing these problems in stream data classification. We also discuss a machine learning technique for the feature evaluation process used to detect novel classes.


Novel Class Detection with Concept Drift in Data Stream - AhtNODE

International Journal of Distributed Systems and Technologies ◽

10.4018/ijdst.2020010102 ◽

2020 ◽

Vol 11 (1) ◽

pp. 15-26

Author(s):

Jay Gandhi ◽

Vaibhav Gandhi

Keyword(s):

Data Stream ◽

Concept Drift ◽

Ensemble Classifier ◽

Streaming Data ◽

Classification Model ◽

Infinite Length ◽

The Novel ◽

Stream Data ◽

Hoeffding Tree ◽

Discovery Method

Data stream mining has become an active research topic, and there is growing interest in data discovery methods. Several applications rely on stream data processing, such as device networks and electronic networks. Our approach, AhtNODE (Adaptive Hoeffding Tree based NOvel class DEtection), detects novel classes in the presence of concept drift in streaming data. It addresses three challenges of streaming data: infinite length, concept drift, and concept evolution. The approach automatically detects a novel class whenever it arrives in the data stream. It is a multi-class approach that distinguishes the novel class from existing classes. The authors apply the Adaptive Hoeffding Tree (AHT) as the classification model, which also handles concept drift. Previous approaches used ensemble models to handle concept drift, whereas with AHT classification is done in a single pass. Experimental results show the effectiveness of AhtNODE compared with an existing ensemble classifier in terms of classification accuracy, speed, and memory use.
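The Adaptive Hoeffding Tree builds on the Hoeffding tree, whose single-pass split test rests on the Hoeffding bound epsilon = sqrt(R^2 ln(1/delta) / (2n)). The sketch below shows only that split test; the drift-adaptation machinery of AHT and the novel class detection of AhtNODE are not shown. The `tie_threshold` parameter follows the usual tie-breaking rule from the Hoeffding tree literature, and its default value here is an assumption.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    With probability 1 - delta, the true mean of a variable with range R
    lies within epsilon of the mean observed over n samples.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(
    best_gain: float,
    second_gain: float,
    value_range: float,
    delta: float,
    n: int,
    tie_threshold: float = 0.05,
) -> bool:
    """Split a leaf when the best attribute's gain advantage exceeds epsilon,
    or when epsilon is so small that the two candidates are effectively tied."""
    epsilon = hoeffding_bound(value_range, delta, n)
    return best_gain - second_gain > epsilon or epsilon < tie_threshold
```

For information gain with c classes the range is R = log2(c), so a two-class leaf with delta = 1e-7 and n = 200 observations has epsilon of about 0.20; quadrupling n halves epsilon, which is why a leaf can decide after a bounded number of examples without revisiting old data.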


Concept Drift Identification using Classifier Ensemble Approach

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v8i1.pp19-25 ◽

2018 ◽

Vol 8 (1) ◽

pp. 19

Author(s):

Leena Deshpande ◽

M. Narsing Rao

Keyword(s):

Concept Drift ◽

Data Distribution ◽

Ensemble Classifier ◽

Classification Model ◽

Data Sets ◽

Ensemble Approach ◽

Telecommunication Systems ◽

Traditional Classification ◽

Financial Domain ◽

New Feature

In internetworking systems, huge amounts of data are scattered, generated, and processed over the network. Data mining techniques are used to discover unknown patterns in the underlying data. A traditional classification model classifies data based on past labeled data. However, in many current applications, data grows in size with fluctuating patterns, so new features may appear in the data. This occurs in many applications, such as sensor networks, banking and telecommunication systems, the financial domain, and electricity usage and prices driven by demand and supply. Such changes in the data distribution reduce classification accuracy: some patterns become frequent while others disappear, and data are wrongly classified. Traditional classification techniques may not be suitable for mining such data, because the distribution generating the items can change over time, so past data may become irrelevant or even misleading for the current prediction. To handle such varying patterns, a concept drift mining approach is used to improve the accuracy of classification techniques. In this paper, we propose an ensemble approach for improving classifier accuracy. The ensemble classifier is applied to 3 different data sets. We investigated different features for different chunks of data, which are then given to the ensemble classifier. We observed that the proposed approach improves classifier accuracy across different chunks of data.
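A chunk-based ensemble of the general kind described here can be sketched as below: one member is trained per data chunk, older members are re-weighted by their accuracy on the newest chunk, and the weakest older member is evicted when the ensemble is full. This is a hypothetical illustration in the spirit of accuracy-weighted ensembles, not the authors' method; the nearest-centroid base learner, the weight of 1.0 for the newest member, and `max_members` are all assumptions, and the per-chunk feature investigation is not modeled.

```python
import math
from collections import defaultdict
from typing import Hashable, List, Sequence

Vector = Sequence[float]


class NearestCentroid:
    """Tiny base learner: predicts the label whose class mean is closest."""

    def __init__(self, X: Sequence[Vector], y: Sequence[Hashable]) -> None:
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for j, value in enumerate(x):
                acc[j] += value
            counts[label] += 1
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(self, x: Vector) -> Hashable:
        return min(self.centroids, key=lambda c: math.dist(x, self.centroids[c]))


def accuracy(model: NearestCentroid, X, y) -> float:
    return sum(model.predict(x) == t for x, t in zip(X, y)) / len(y)


class ChunkEnsemble:
    """Hypothetical chunk-based ensemble with one member per data chunk."""

    def __init__(self, max_members: int = 3) -> None:
        self.max_members = max_members
        self.members: List[NearestCentroid] = []
        self.weights: List[float] = []

    def update(self, X: Sequence[Vector], y: Sequence[Hashable]) -> None:
        # Re-weight existing members on the newest chunk, then add a new member.
        self.weights = [accuracy(m, X, y) for m in self.members]
        self.members.append(NearestCentroid(X, y))
        self.weights.append(1.0)
        while len(self.members) > self.max_members:
            # Evict the weakest older member; never the newest (last index).
            worst = min(range(len(self.members) - 1), key=self.weights.__getitem__)
            del self.members[worst]
            del self.weights[worst]

    def predict(self, x: Vector) -> Hashable:
        scores = defaultdict(float)
        for member, weight in zip(self.members, self.weights):
            scores[member.predict(x)] += weight
        return max(scores, key=scores.get)
```

When the labels of a region flip between chunks, the old member's accuracy on the new chunk drops to zero, so its votes stop counting and the ensemble follows the current concept.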


Minority Resampling Based Ensemble Framework Using Enhanced Early Drift Detection Method For Imbalanced Data Streams

10.21203/rs.3.rs-141880/v1 ◽

2021 ◽

Author(s):

Priya S ◽

Annie Uthra

Keyword(s):

Data Streams ◽

Data Stream ◽

Detection Method ◽

Concept Drift ◽

Class Imbalance ◽

Current Data ◽

Classification Model ◽

Ensemble Classifiers ◽

K Nearest Neighbor ◽

Jaccard Similarity

As data mining applications grow in popularity, large volumes of data streams are generated over time. The main problems with data streams are that they exhibit a high degree of class imbalance and that their data distribution changes over time. In this paper, a Timely Drift Detection and Minority Resampling Technique (TDDMRT) based on K-nearest neighbor and Jaccard similarity is proposed to handle class imbalance by finding the current ratio of class labels. The Enhanced Early Drift Detection Method (EEDDM) is proposed for detecting concept drift, and the Minority Resampling Method (KNN-JS) determines whether the current data stream should be regarded as imbalanced and resamples the minority instances in the drifting data stream. The K-Nearest Neighbors technique is used to resample the minority classes, and the Jaccard similarity measure is applied over the resampled data to generate synthetic data similar to the original data, which is then handled by ensemble classifiers. The proposed ensemble-based classification model outperforms existing oversampling and undersampling techniques, with an accuracy of 98.52%.
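The abstract does not give the exact KNN-JS procedure, so the sketch below swaps in the well-known SMOTE-style interpolation as a stand-in for the K-nearest-neighbor resampling step, together with the class-ratio check used to decide whether a window is imbalanced. The Jaccard similarity step and EEDDM are not shown; all function names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Hashable, List, Optional, Sequence

Vector = Sequence[float]


def imbalance_ratio(labels: Sequence[Hashable]) -> float:
    """Size of the smallest class divided by the size of the largest class."""
    counts = Counter(labels)
    return min(counts.values()) / max(counts.values())


def k_nearest(index: int, points: Sequence[Vector], k: int) -> List[Vector]:
    """The k points closest to points[index], excluding the point itself."""
    others = [p for j, p in enumerate(points) if j != index]
    return sorted(others, key=lambda p: math.dist(points[index], p))[:k]


def oversample_minority(
    minority: Sequence[Vector],
    n_new: int,
    k: int = 3,
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """SMOTE-style synthesis: place each new sample at a random point on the
    segment between a minority sample and one of its k nearest minority
    neighbors. Requires at least two minority samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        neighbor = rng.choice(k_nearest(i, minority, k))
        gap = rng.random()  # in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], neighbor)])
    return synthetic
```

Because every synthetic point lies between two real minority samples, the new data stays inside the region the minority class already occupies, which is the intent behind generating data "similar to the original data".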


CONCEPT DRIFT IN STREAMING DATA: A SYSTEMATIC LITERATURE REVIEW

KIET Journal of Computing and Information Sciences ◽

10.51153/kjcis.v4i1.43 ◽

2021 ◽

Vol 4 (1) ◽

pp. 17

Author(s):

Tariq Mahmood ◽

Tatheer Fatima

Keyword(s):

Machine Learning ◽

Literature Review ◽

Systematic Literature Review ◽

Data Streams ◽

Concept Drift ◽

Streaming Data ◽

Machine Learning Techniques ◽

Underlying Distribution ◽

Learning Techniques ◽

Real World Datasets

The world generates an immeasurable amount of data every minute, and this data needs to be analyzed for better decision making. To meet this demand for faster analytics, businesses are adopting efficient stream processing and machine learning techniques. However, data streams are particularly challenging to handle. One of the prominent problems in dealing with streaming data is concept drift, an unexpected change over time in the underlying distribution of the streaming data. In this work, we conducted a systematic literature review to identify methods that deal with the problem of concept drift. The most frequently used supervised and unsupervised techniques are reviewed, and we also survey commonly used, publicly available artificial and real-world datasets for studying concept drift.


Method of Concept-Drifting Feature Extracting in Data Streams Based on Granular Computing

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.50-51.934 ◽

2011 ◽

Vol 50-51 ◽

pp. 934-938

Author(s):

Chun Hua Ju ◽

Zhao Qian Shuai

Keyword(s):

Data Streams ◽

Granular Computing ◽

Concept Drift ◽

Concept Lattice ◽

Coincidence Degree ◽

Streaming Data ◽

Formal Concept ◽

Feature Extraction Method ◽

Feature Extracting ◽

Relaxation Matching

Business data streams are dynamic and prone to drift, so extracting concept-drifting features is an important task in data stream mining. This paper describes the characteristics and concept drift of data streams and first constructs a formal concept description model of streaming data based on granular computing. The paper then proposes a concept relaxation-matching coincidence degree algorithm based on concept lattice pairs and describes the corresponding feature extraction method. Finally, experiments and analysis are presented to explain and evaluate the method.


Anomalies Detection Using Isolation in Concept-Drifting Data Streams

Computers ◽

10.3390/computers10010013 ◽

2021 ◽

Vol 10 (1) ◽

pp. 13

Author(s):

Maurras Ulbricht Togbe ◽

Yousra Chabchoub ◽

Aliou Boly ◽

Mariam Barry ◽

Raja Chiky ◽

...

Keyword(s):

Anomaly Detection ◽

Half Space ◽

Data Streams ◽

Detection Efficiency ◽

Concept Drift ◽

Streaming Data ◽

Detection Methods ◽

Data Sets ◽

Stream Data ◽

Isolation Forest

Detecting anomalies in streaming data is an important issue for many application domains, such as cybersecurity, natural disasters, or bank fraud. Different approaches have been designed to detect anomalies: statistics-based, isolation-based, clustering-based, etc. In this paper, we present a structured survey of the existing anomaly detection methods for data streams with a deep view on Isolation Forest (iForest). We first provide an implementation of Isolation Forest Anomalies detection in Stream Data (IForestASD), a variant of iForest for data streams. This implementation is built on top of scikit-multiflow (River), an open-source machine learning framework for data streams that contains a single anomaly detection algorithm for data streams, called Streaming half-space trees. We performed experiments on different real and well-known data sets to compare the performance of our implementation of IForestASD and half-space trees. Moreover, we extended the IForestASD algorithm to handle drifting data by proposing three algorithms that involve two well-known drift detection methods: ADWIN and KSWIN. ADWIN is an adaptive sliding window algorithm for detecting change in a data stream. KSWIN is a more recent method; it refers to the Kolmogorov–Smirnov Windowing method for concept drift detection. More precisely, we extended KSWIN to be able to deal with n-dimensional data streams. We validated and compared all of the proposed methods on both real and synthetic data sets. In particular, we evaluated the F1-score, the execution time, and the memory consumption. The experiments show that our extensions have lower resource consumption than the original version of IForestASD with similar or better detection efficiency.
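The idea behind KSWIN, comparing older and recent data in a window with a two-sample Kolmogorov–Smirnov test, can be sketched for a single feature as follows. This is a simplified illustration with two fixed windows and the asymptotic critical value c(alpha) * sqrt((n + m) / (n * m)); it is not River's `KSWIN` implementation (which samples from a sliding window), and it does not cover the authors' n-dimensional extension. The function names and the `alpha` value are assumptions.

```python
import bisect
import math
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDFs of `a` and `b`, checked at every observed value."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for value in a + b:
        fa = bisect.bisect_right(a, value) / len(a)
        fb = bisect.bisect_right(b, value) / len(b)
        gap = max(gap, abs(fa - fb))
    return gap


def ks_drift(
    reference: Sequence[float], recent: Sequence[float], alpha: float = 0.005
) -> bool:
    """Flag drift when the KS statistic exceeds the asymptotic critical value
    c(alpha) * sqrt((n + m) / (n * m)), with c(alpha) = sqrt(-ln(alpha / 2) / 2)."""
    n, m = len(reference), len(recent)
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    return ks_statistic(reference, recent) > c_alpha * math.sqrt((n + m) / (n * m))
```

For two windows of 50 values each and alpha = 0.005 the critical value is roughly 0.35, so a fully shifted window (statistic 1.0) is flagged while an unchanged one (statistic 0.0) is not. A naive multivariate extension would run the test per feature and combine the results.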


A Generic Fusion Platform of Failure Diagnostics for Resilient Engineering System Design

Volume 2A: 41st Design Automation Conference ◽

10.1115/detc2015-47009 ◽

2015 ◽


Author(s):

Amirmahyar Abdolsamadi ◽

Pingfeng Wang ◽

Prasanna Tamilselvan

Keyword(s):

Majority Voting ◽

Classification Model ◽

Engineering System ◽

Complex Engineered Systems ◽

Attribute Classification ◽

Engineered Systems ◽

Fusion Diagnostics ◽

Engineering System Design ◽

Multiple Membership ◽

Fusion Approach

Effective health diagnostics provides benefits such as improved safety, improved reliability, and reduced costs for the operation and maintenance of complex engineered systems. This paper presents a multi-attribute classification fusion approach which leverages the strengths provided by multiple membership classifiers to form a robust classification model for structural health diagnostics. The developed classification fusion approach conducts the health diagnostics with three primary stages: (i) fusion formulation using a k-fold cross validation model; (ii) diagnostics with multiple multi-attribute classifiers as member algorithms; and (iii) classification fusion through a weighted majority voting with dominance system. State-of-the-art classification techniques from three broad categories (i.e., supervised learning, unsupervised learning, and statistical inference) are employed as the member algorithms. The developed classification fusion approach is demonstrated with the 2008 PHM challenge problem. The developed fusion diagnostics approach outperforms any stand-alone member algorithm with better diagnostic accuracy and robustness.
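Stage (i), fusion formulation using a k-fold cross-validation model, is not detailed in the abstract. A minimal sketch under the assumption that each member's fusion weight is its k-fold cross-validation accuracy, normalized to sum to one, is shown below. Contiguous (unshuffled) folds, the uniform fallback, and the two toy member algorithms are hypothetical choices for illustration.

```python
from collections import Counter
from typing import Callable, Hashable, Iterator, List, Sequence, Tuple

Vector = Sequence[float]
Model = Callable[[Vector], Hashable]
Fitter = Callable[[List[Vector], List[Hashable]], Model]


def kfold_splits(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous folds."""
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cv_accuracy(fit: Fitter, X: Sequence[Vector], y: Sequence[Hashable], k: int) -> float:
    """Fraction of samples predicted correctly when each fold is held out once."""
    correct = 0
    for train, test in kfold_splits(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(model(X[i]) == y[i] for i in test)
    return correct / len(X)


def fusion_weights(fitters: Sequence[Fitter], X, y, k: int = 5) -> List[float]:
    """Normalize each member's k-fold accuracy so the weights sum to one."""
    scores = [cv_accuracy(fit, X, y, k) for fit in fitters]
    total = sum(scores)
    if total == 0:  # every member failed: fall back to uniform weights
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


# Two toy member algorithms (hypothetical stand-ins for real classifiers).
def fit_majority(X, y) -> Model:
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label


def fit_one_nn(X, y) -> Model:
    def predict(x):
        nearest = min(range(len(X)), key=lambda i: sum((a - b) ** 2 for a, b in zip(x, X[i])))
        return y[nearest]
    return predict
```

The resulting list can be passed directly as the member weights of a weighted majority vote, so members that generalize better across folds carry more influence in the fused decision.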


Classification of the drifting data streams using heterogeneous diversified dynamic class-weighted ensemble

PeerJ Computer Science ◽

10.7717/peerj-cs.459 ◽

2021 ◽

Vol 7 ◽

pp. e459

Author(s):

Martin Sarnovsky ◽

Michal Kolarik

Keyword(s):

Data Streams ◽

Concept Drift ◽

Ensemble Methods ◽

Predictive Performance ◽

Streaming Data ◽

Underlying Structure ◽

Adaptive Models ◽

Resource Requirements ◽

Continuous Stream

Data streams can be defined as continuous streams of data coming from different sources and in different forms. Streams are often very dynamic, and their underlying structure usually changes over time, which may result in a phenomenon called concept drift. When solving predictive problems using streaming data, traditional machine learning models trained on historical data may become invalid when such changes occur. Adaptive models equipped with mechanisms to reflect changes in the data have proved suitable for handling drifting streams. Adaptive ensemble models are a popular group of these methods used in the classification of drifting data streams. In this paper, we present a heterogeneous adaptive ensemble model for data stream classification, which uses a dynamic class weighting scheme and a mechanism to maintain the diversity of the ensemble members. Our main objective was to design a model consisting of a heterogeneous group of base learners (Naive Bayes, k-NN, decision trees) with an adaptive mechanism that takes into account not only the performance of the members but also the diversity of the ensemble. The model was experimentally evaluated on both real-world and synthetic datasets. We compared the presented model with other existing adaptive ensemble methods from the perspectives of both predictive performance and computational resource requirements.
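A dynamic class weighting scheme gives each member a separate weight per class rather than a single global weight. The sketch below assumes that a member's weight for a label is its Laplace-smoothed precision for that label over a recent labeled window; this formula, the 0.5 default for unseen labels, and the function names are assumptions, and the diversity-maintenance mechanism is not modeled.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def per_class_precision(
    predictions: Sequence[Hashable], truth: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """Laplace-smoothed precision of one member for every label it predicted
    in a recent labeled window: (hits + 1) / (predicted + 2)."""
    hits, predicted = defaultdict(int), defaultdict(int)
    for p, t in zip(predictions, truth):
        predicted[p] += 1
        hits[p] += p == t
    return {c: (hits[c] + 1) / (predicted[c] + 2) for c in predicted}


def class_weighted_vote(
    member_predictions: Sequence[Hashable],
    weight_tables: Sequence[Dict[Hashable, float]],
) -> Hashable:
    """Each member votes for its predicted label with its weight for that label.
    Labels absent from a member's table get 0.5, the smoothed value for 0/0."""
    scores = defaultdict(float)
    for label, table in zip(member_predictions, weight_tables):
        scores[label] += table.get(label, 0.5)
    return max(scores, key=scores.get)
```

Recomputing the tables on every new labeled window is what makes the weighting dynamic: a member that is reliable for one class but unreliable for another keeps influence only where it has recently earned it.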
