scholarly journals Concept Drift Identification using Classifier Ensemble Approach

Author(s):  
Leena Deshpande ◽  
M. Narsing Rao

<p>Abstract:-In Internetworking system, the huge amount of data is scattered, generated and processed over the network. The data mining techniques are used to discover the unknown pattern from the underlying data. A traditional classification model is used to classify the data based on past labelled data. However in many current applications, data is increasing in size with fluctuating patterns. Due to this new feature may arrive in the data. It is present in many applications like sensornetwork, banking and telecommunication systems, financial domain, Electricity usage and prices based on its demand and supplyetc .Thus change in data distribution reduces the accuracy of classifying the data. It may discover some patterns as frequent while other patterns tend to disappear and wrongly classify. To mine such data distribution, traditionalclassification techniques may not be suitable as the distribution generating the items can change over time so data from the past may become irrelevant or even false for the current prediction. For handlingsuch varying pattern of data, concept drift mining approach is used to improve the accuracy of classification techniques. In this paper we have proposed ensemble approach for improving the accuracy of classifier. The ensemble classifier is applied on 3 different data sets. We investigated different features for the different chunk of data which is further given to ensemble classifier. We observed the proposed approach improves the accuracy of classifier for different chunks of data.</p>

2015 ◽  
Vol 7 (2) ◽  
pp. 29-57 ◽  
Author(s):  
Nabil M. Hewahi ◽  
Ibrahim M. Elbouhissi

In data mining, the phenomenon of change in data distribution over time is known as concept drift. In this research, the authors introduce a new approach called Concepts Seeds Gathering and Dataset Updating algorithm (CSG-DU) that gives the traditional classification models the ability to adapt and cope with concept drift as time passes. CSG-DU is concerned with discovering new concepts in data stream and aims to increase the classification accuracy using any classification model when changes occur in the underlying concepts. The proposed approach has been tested using synthetic and real datasets. The experiments conducted show that after applying the authors' approach, the classification accuracy increased from low values to high and acceptable ones. Finally, a comparison study between CSG-DU and Set Formation for Delayed Labeling algorithm (SFDL) has been conducted; SFDL is an approach that handles sudden and gradual concept drift. CSG-DU results outperforms SFDL in terms of classification accuracy.


2019 ◽  
Vol 214 ◽  
pp. 04041 ◽  
Author(s):  
Davide Michelino ◽  
Silvio Pardi ◽  
Guido Russo ◽  
Bernardino Spisso

The implementation of cache systems in the computing model of HEP experiments enables to accelerate access to hot data sets by scientists, opening new scenarios of data distribution and enable to exploit the paradigm of storage-less sites. In this work, we present a study for the creation of an http data-federation ecosystem with caching functionality. We created plug-in integrated in the logic of a DPM Storage, able to reproduce a cache behaviour, taking advantage from the new feature introduced in the last version of Disk Pool Manager, called volatile-pool. Then we used Dynafed as lightweight federation system to aggregate a set of standard Grid Storage together with the caching system. With the designed setup, clients asking for a file present on the Data-Grid are automatically redirected to the cache, if the cache is the closest storage, thanks to the action of the geo-plugin run by Dynafed. As proof of the concept, we tested the whole system in a controlled environment within the Belle II computing infrastructure using a set of files located in production Storage Elements. Preliminary results demonstrate the proper functionality of the logic and encourage continuing the work.


2021 ◽  
Vol 7 ◽  
pp. e660
Author(s):  
Sanjeev Kumar ◽  
Ravendra Singh ◽  
Mohammad Zubair Khan ◽  
Abdulfattah Noorwali

DataStream mining is a challenging task for researchers because of the change in data distribution during classification, known as concept drift. Drift detection algorithms emphasize detecting the drift. The drift detection algorithm needs to be very sensitive to change in data distribution for detecting the maximum number of drifts in the data stream. But highly sensitive drift detectors lead to higher false-positive drift detections. This paper proposed a Drift Detection-based Adaptive Ensemble classifier for sentiment analysis and opinion mining, which uses these false-positive drift detections to benefit and minimize the negative impact of false-positive drift detection signals. The proposed method creates and adds a new classifier to the ensemble whenever a drift happens. A weighting mechanism is implemented, which provides weights to each classifier in the ensemble. The weight of the classifier decides the contribution of each classifier in the final classification results. The experiments are performed using different classification algorithms, and results are evaluated on the accuracy, precision, recall, and F1-measures. The proposed method is also compared with these state-of-the-art methods, OzaBaggingADWINClassifier, Accuracy Weighted Ensemble, Additive Expert Ensemble, Streaming Random Patches, and Adaptive Random Forest Classifier. The results show that the proposed method handles both true positive and false positive drifts efficiently.


2020 ◽  
Vol 11 (1) ◽  
pp. 15-26
Author(s):  
Jay Gandhi ◽  
Vaibhav Gandhi

Data stream mining has become an interesting analysis topic and it is a growing interest in data discovery method. There are several applications supporting stream data processing like device network, electronic network, etc. Our approach AhtNODE (Adaptive Hoeffding Tree based NOvel class DEtection) detects novel class in the presence of concept drift in streaming data. It addresses there are three challenges of streaming data: infinite length, concept drift, and concept evolution. This approach automatically detects the novel class whenever it arrives in the data stream. It is a multi-class approach that distinguishes novel class from existing classes. The authors tend to apply the Adaptive Hoeffding Tree as a classification model that is also used to handle the concept drift situation. Previous approaches used the ensemble model to handle concept drift. In AHT, classification is done in the single pass. The experiment result proves the effectiveness of AhtNODE compared to existing ensemble classifier in terms of classification accuracy, speed and use of memory.


2018 ◽  
Vol 9 (2) ◽  
pp. 35-44
Author(s):  
Marko Bohanec ◽  
Mirjana Kljajić Borštnar ◽  
Marko Robnik-Šikonja

Abstract Background: In practical use of machine learning models, users may add new features to an existing classification model, reflecting their (changed) empirical understanding of a field. New features potentially increase classification accuracy of the model or improve its interpretability. Objectives: We have introduced a guideline for determination of the sample size needed to reliably estimate the impact of a new feature. Methods/Approach: Our approach is based on the feature evaluation measure ReliefF and the bootstrap-based estimation of confidence intervals for feature ranks. Results: We test our approach using real world qualitative business-tobusiness sales forecasting data and two UCI data sets, one with missing values. The results show that new features with a high or a low rank can be detected using a relatively small number of instances, but features ranked near the border of useful features need larger samples to determine their impact. Conclusions: A combination of the feature evaluation measure ReliefF and the bootstrap-based estimation of confidence intervals can be used to reliably estimate the impact of a new feature in a given problem


Author(s):  
Amirmahyar Abdolsamadi ◽  
Pingfeng Wang

Health diagnosis interprets data streams acquired by smart sensors and makes inferences about health conditions of an engineering system thereby making critical operational decisions. A data stream is a flow of continuous data that face some challenges in data mining. This paper addresses concept drift and concept evolution as two major challenges in the classification of streaming data. Concept drift occurs as a result of data distribution changes. Concept evolution happens when new classes appear in the stream. These changes may cause the degradation of classification results over time. This paper presents an adaptive fusion learning approach to build a robust classification model. The proposed approach consists of three steps: (i) proposed fusion formulation using weighted majority voting (ii) active learning to labels selectively instead of querying for all true labels (iii) distance-based approach to monitoring the movement of data distribution. A diagnosis case study has been used to demonstrate the developed fusion diagnosis methodology.


2021 ◽  
pp. 1-13
Author(s):  
Qingtian Zeng ◽  
Xishi Zhao ◽  
Xiaohui Hu ◽  
Hua Duan ◽  
Zhongying Zhao ◽  
...  

Word embeddings have been successfully applied in many natural language processing tasks due to its their effectiveness. However, the state-of-the-art algorithms for learning word representations from large amounts of text documents ignore emotional information, which is a significant research problem that must be addressed. To solve the above problem, we propose an emotional word embedding (EWE) model for sentiment analysis in this paper. This method first applies pre-trained word vectors to represent document features using two different linear weighting methods. Then, the resulting document vectors are input to a classification model and used to train a text sentiment classifier, which is based on a neural network. In this way, the emotional polarity of the text is propagated into the word vectors. The experimental results on three kinds of real-world data sets demonstrate that the proposed EWE model achieves superior performances on text sentiment prediction, text similarity calculation, and word emotional expression tasks compared to other state-of-the-art models.


Entropy ◽  
2021 ◽  
Vol 23 (7) ◽  
pp. 859
Author(s):  
Abdulaziz O. AlQabbany ◽  
Aqil M. Azmi

We are living in the age of big data, a majority of which is stream data. The real-time processing of this data requires careful consideration from different perspectives. Concept drift is a change in the data’s underlying distribution, a significant issue, especially when learning from data streams. It requires learners to be adaptive to dynamic changes. Random forest is an ensemble approach that is widely used in classical non-streaming settings of machine learning applications. At the same time, the Adaptive Random Forest (ARF) is a stream learning algorithm that showed promising results in terms of its accuracy and ability to deal with various types of drift. The incoming instances’ continuity allows for their binomial distribution to be approximated to a Poisson(1) distribution. In this study, we propose a mechanism to increase such streaming algorithms’ efficiency by focusing on resampling. Our measure, resampling effectiveness (ρ), fuses the two most essential aspects in online learning; accuracy and execution time. We use six different synthetic data sets, each having a different type of drift, to empirically select the parameter λ of the Poisson distribution that yields the best value for ρ. By comparing the standard ARF with its tuned variations, we show that ARF performance can be enhanced by tackling this important aspect. Finally, we present three case studies from different contexts to test our proposed enhancement method and demonstrate its effectiveness in processing large data sets: (a) Amazon customer reviews (written in English), (b) hotel reviews (in Arabic), and (c) real-time aspect-based sentiment analysis of COVID-19-related tweets in the United States during April 2020. Results indicate that our proposed method of enhancement exhibited considerable improvement in most of the situations.


Author(s):  
Danlei Xu ◽  
Lan Du ◽  
Hongwei Liu ◽  
Penghui Wang

A Bayesian classifier for sparsity-promoting feature selection is developed in this paper, where a set of nonlinear mappings for the original data is performed as a pre-processing step. The linear classification model with such mappings from the original input space to a nonlinear transformation space can not only construct the nonlinear classification boundary, but also realize the feature selection for the original data. A zero-mean Gaussian prior with Gamma precision and a finite approximation of Beta process prior are used to promote sparsity in the utilization of features and nonlinear mappings in our model, respectively. We derive the Variational Bayesian (VB) inference algorithm for the proposed linear classifier. Experimental results based on the synthetic data set, measured radar data set, high-dimensional gene expression data set, and several benchmark data sets demonstrate the aggressive and robust feature selection capability and comparable classification accuracy of our method comparing with some other existing classifiers.


Author(s):  
Jiaoyan Chen ◽  
Freddy Lecue ◽  
Jeff Z. Pan ◽  
Huajun Chen

Data stream learning has been largely studied for extracting knowledge structures from continuous and rapid data records. In the semantic Web, data is interpreted in ontologies and its ordered sequence is represented as an ontology stream. Our work exploits the semantics of such streams to tackle the problem of concept drift i.e., unexpected changes in data distribution, causing most of models to be less accurate as time passes. To this end we revisited (i) semantic inference in the context of supervised stream learning, and (ii) models with semantic embeddings. The experiments show accurate prediction with data from Dublin and Beijing.


Sign in / Sign up

Export Citation Format

Share Document