ADAW: Age decay accuracy weighted ensemble method for drifting data stream mining

Data generators in dynamic real-world environments very often produce data streams. A data source in a dynamic environment generates data streams whose underlying distribution changes frequently over time, resulting in concept drifts. Compared with a stationary environment, learning in a dynamic environment is much more difficult because of these concept drifts, and it requires evolutionary and adaptive approaches to be incorporated into the learning algorithms. Ensemble methods are commonly used to build classifiers for learning in a dynamic environment. Ensemble learning methods are generally characterized by three crucial aspects: the learning and testing method employed, the result integration method, and the forgetting mechanism for old concepts. In this paper, we propose a novel approach called the Age Decay Accuracy Weighted (ADAW) ensemble architecture for learning in concept drifting data streams. ADAW assigns weights to the component classifiers based on their accuracy and their remaining lifetime in the ensemble, in such a way as to maximize accuracy. We empirically evaluated ADAW on benchmark artificial drifting data stream generators and real datasets and compared its performance with ten well-known state-of-the-art methods. The experimental results show that ADAW outperforms the existing methods.
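
The abstract does not give ADAW's weighting formula, so the following Python sketch only illustrates the general idea it describes: each component classifier's vote is weighted by its recent accuracy scaled by the fraction of its lifetime that remains, and members whose lifetime has expired are forgotten. The linear decay `accuracy * (max_age - age) / max_age`, the `max_age` parameter, and the names `Member`, `member_weight`, `weighted_vote`, and `age_and_forget` are hypothetical choices for illustration, not the authors' method.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Sequence


@dataclass
class Member:
    """One component classifier plus the bookkeeping needed for weighting."""
    predict: Callable[[Sequence[float]], Hashable]  # component classifier's prediction function
    accuracy: float  # accuracy measured on the most recent chunk (assumed to be given)
    age: int = 0     # number of chunks since the member joined the ensemble


def member_weight(m: Member, max_age: int) -> float:
    """Hypothetical age-decay weight: accuracy times the fraction of lifetime remaining."""
    remaining = max(max_age - m.age, 0)
    return m.accuracy * remaining / max_age


def weighted_vote(members: List[Member], x: Sequence[float], max_age: int) -> Hashable:
    """Result integration: sum member weights per predicted label and return the best label."""
    scores: Dict[Hashable, float] = {}
    for m in members:
        label = m.predict(x)
        scores[label] = scores.get(label, 0.0) + member_weight(m, max_age)
    return max(scores, key=scores.get)


def age_and_forget(members: List[Member], max_age: int) -> List[Member]:
    """Forgetting: age every member by one chunk and drop those whose lifetime has expired."""
    for m in members:
        m.age += 1
    return [m for m in members if m.age < max_age]
```

With `max_age=5`, an old but accurate member (accuracy 0.9, age 4) receives weight 0.18, while a fresh member with accuracy 0.7 receives 0.7, so newer members dominate the vote unless older ones are markedly more accurate.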

Adaptive Ensemble with Human Memorizing Characteristics for Data Stream Mining

Mathematical Problems in Engineering ◽

10.1155/2015/874032 ◽

2015 ◽

Vol 2015 ◽

pp. 1-10

Author(s):

Yanhuang Jiang ◽

Qiangli Zhao ◽

Yutong Lu

Keyword(s):

Data Stream ◽

Data Stream Mining ◽

Memory Retention ◽

Stream Mining ◽

Mining System ◽

Complex Concept ◽

Knowledge Repository ◽

Component Classifier ◽

Concept Drifts ◽

Forgetting Mechanism

Combining several classifiers trained on sequential chunks of training instances is a popular strategy for mining data streams with concept drifts. This paper introduces human recalling and forgetting mechanisms into a data stream mining system and proposes a Memorizing Based Data Stream Mining (MDSM) model. In this model, each component classifier is regarded as a piece of knowledge that a human obtains by learning some materials, and it has a memory retention value reflecting its usefulness in the history. Classifiers with high memory retention values are kept in a “knowledge repository.” When a new data chunk arrives, the most useful classifiers are selected (recalled) from the repository to compose the current target ensemble. Based on MDSM, we put forward a new algorithm, MAE (Memorizing Based Adaptive Ensemble), which uses the Ebbinghaus forgetting curve as the forgetting mechanism and adopts ensemble pruning as the recalling mechanism. Compared with four popular data stream mining approaches on datasets with different concept drifts, the experimental results show that MAE achieves high and stable prediction accuracy, especially for applications with recurring or complex concept drifts. The results also demonstrate the effectiveness of the MDSM model.
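
The retention idea can be illustrated with the classic Ebbinghaus form R = exp(-t/S), where t is the time since an item was last recalled and S is its memory strength. The Python sketch below is an assumption-laden simplification, not MAE itself: the `Knowledge` and `Repository` classes, the multiplicative `boost` applied on each recall, and the use of plain top-k accuracy ranking in place of MAE's ensemble pruning are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Knowledge:
    name: str
    strength: float = 1.0  # S in R = exp(-t / S); grows each time the item is recalled
    last_recall: int = 0   # chunk index at which the item was created or last recalled

    def retention(self, now: int) -> float:
        """Ebbinghaus-style retention R = exp(-t / S), with t = chunks since last recall."""
        return math.exp(-(now - self.last_recall) / self.strength)


class Repository:
    def __init__(self, capacity: int, boost: float = 2.0):
        self.capacity = capacity  # maximum number of classifiers kept in the repository
        self.boost = boost        # hypothetical multiplicative strengthening applied on recall
        self.items: Dict[str, Knowledge] = {}

    def add(self, name: str, now: int) -> None:
        """Store a newly trained classifier, then forget the least-retained items if over capacity."""
        self.items[name] = Knowledge(name, last_recall=now)
        self.forget(now)

    def recall(self, chunk_accuracy: Dict[str, float], k: int, now: int) -> List[str]:
        """Pick the k stored classifiers most accurate on the new chunk and reinforce them."""
        stored = [n for n in chunk_accuracy if n in self.items]
        chosen = sorted(stored, key=lambda n: chunk_accuracy[n], reverse=True)[:k]
        for n in chosen:
            self.items[n].strength *= self.boost
            self.items[n].last_recall = now
        return chosen

    def forget(self, now: int) -> None:
        """Keep only the `capacity` items with the highest current retention."""
        ranked = sorted(self.items.values(), key=lambda it: it.retention(now), reverse=True)
        self.items = {it.name: it for it in ranked[: self.capacity]}
```

In this sketch a classifier that keeps being recalled gains strength and decays more slowly, so it can stay in the repository after a never-recalled classifier of similar age has been forgotten, which is the behaviour the abstract associates with recurring concepts.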

A Survey of Challenges Facing Streaming Data

Transactions on Machine Learning and Artificial Intelligence ◽

10.14738/tmlai.84.8579 ◽

2020 ◽

Vol 8 (4) ◽

pp. 63-73

Author(s):

Sikha Bagui ◽

Katie Jin

Keyword(s):

Data Reduction ◽

Data Streams ◽

Data Stream ◽

Stream Processing ◽

Streaming Data ◽

Data Detection ◽

Data Stream Processing ◽

Concept Drifts

This survey enumerates and analyzes existing methods for data stream processing, focusing on the challenges facing streaming data. The challenges addressed are preprocessing of streaming data, detecting and handling concept drifts in streaming data, data reduction in the face of data streams, approximate queries, and blocking operations in streaming data.

An Approximate Approach for Maintaining Recent Occurrences of Itemsets in a Sliding Window over Data Streams

Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development ◽

10.4018/978-1-60566-748-5.ch014 ◽

2010 ◽

pp. 308-327

Author(s):

Jia-Ling Koh ◽

Shu-Ning Shin ◽

Yuan-Bin Don

Keyword(s):

Data Streams ◽

Data Stream ◽

Traditional Approach ◽

Experimental Studies ◽

Dynamic Environment ◽

Sliding Window ◽

Fixed Time ◽

Frequent Itemsets ◽

Embedded Knowledge ◽

Data Elements

Recently, the data stream, an unbounded sequence of data elements generated at a rapid rate, has provided a dynamic environment for collecting data. The knowledge embedded in a data stream is likely to change quickly over time. Therefore, capturing the recent trend of the data is an important issue when mining frequent itemsets over data streams. Although the sliding window model offers a good solution to this problem, the traditional approach must maintain the complete occurrence information of patterns within the sliding window. To estimate the approximate supports of patterns within a sliding window, the frequency changing point (FCP) method is proposed for monitoring the recent occurrences of itemsets over a data stream. In addition to a basic design that assumes exactly one transaction arrives at each time point, the FCP method is extended to maintain recent patterns over a data stream in which a block containing a varying number of transactions (zero or more) arrives within each fixed time unit. Accordingly, the recently frequent itemsets or representative patterns are discovered approximately from the maintained structure. Experimental studies demonstrate that the proposed algorithms achieve high true positive rates and guarantee no false dismissals in the results, and a theoretical analysis is provided for this guarantee. In addition, the authors’ approach significantly reduces run-time memory usage compared with the previously proposed method.
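
For contrast with FCP, the following Python sketch shows the traditional exact approach the abstract refers to: every time unit's block of transactions (possibly empty) is kept in the window so that exact supports can be updated as old units expire. The class name `ExactSlidingWindowCounter` and the `max_len` cap on enumerated itemset size are illustrative assumptions; FCP's approximate structure is not reproduced here.

```python
from collections import Counter, deque
from itertools import combinations
from typing import Deque, FrozenSet, Iterable, Iterator, List, Set


class ExactSlidingWindowCounter:
    """Traditional baseline: keep every block inside the window so supports stay exact."""

    def __init__(self, window: int, max_len: int = 2):
        self.window = window    # window size in time units
        self.max_len = max_len  # largest itemset size enumerated per transaction
        self.blocks: Deque[Counter] = deque()  # itemset occurrences for each time unit
        self.sizes: Deque[int] = deque()       # number of transactions in each time unit
        self.totals: Counter = Counter()       # itemset occurrences over the whole window

    def _itemsets(self, txn: Iterable[str]) -> Iterator[FrozenSet[str]]:
        items = sorted(set(txn))
        for k in range(1, self.max_len + 1):
            for combo in combinations(items, k):
                yield frozenset(combo)

    def push_block(self, transactions: List[Set[str]]) -> None:
        """Add one time unit's block (zero or more transactions) and expire the oldest unit."""
        block: Counter = Counter()
        for txn in transactions:
            block.update(self._itemsets(txn))
        self.blocks.append(block)
        self.sizes.append(len(transactions))
        self.totals.update(block)
        if len(self.blocks) > self.window:
            self.totals.subtract(self.blocks.popleft())
            self.sizes.popleft()

    def frequent(self, min_support: float) -> Set[FrozenSet[str]]:
        """Itemsets whose support (fraction of transactions in the window) reaches min_support."""
        n = sum(self.sizes)
        if n == 0:
            return set()
        return {s for s, c in self.totals.items() if c > 0 and c / n >= min_support}
```

Because every block stays in memory until it leaves the window, memory grows with the window size and the number of enumerated itemsets, which is the cost the FCP method is designed to reduce.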

A Novel Drift Detection Algorithm Based on Features’ Importance Analysis in a Data Streams Environment

Journal of Artificial Intelligence and Soft Computing Research ◽

10.2478/jaiscr-2020-0019 ◽

2020 ◽

Vol 10 (4) ◽

pp. 287-298

Author(s):

Piotr Duda ◽

Krzysztof Przybyszewski ◽

Lipo Wang

Keyword(s):

Random Forest ◽

Data Streams ◽

Data Stream ◽

Concept Drift ◽

Ensemble Methods ◽

Real Data ◽

Relevant Information ◽

Detection Algorithm ◽

Important Indicator ◽

Features Importance

The training set consists of many features that influence the classifier to different degrees. Choosing the most important features and rejecting those that do not carry relevant information is of great importance to the operation of the learned model. In the case of data streams, the importance of the features may additionally change over time. Such changes affect the performance of the classifier but can also be an important indicator of occurring concept drift. In this work, we propose a new algorithm for data stream classification, called Random Forest with Features Importance (RFFI), which uses the measure of features importance as a drift detector. The RFFI algorithm adapts solutions inspired by the Random Forest algorithm to data stream scenarios. The proposed algorithm combines the ability of ensemble methods to handle slow changes in a data stream with a new method for detecting concept drift occurrence. The work contains an experimental analysis of the proposed algorithm, carried out on synthetic and real data.
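
The abstract does not specify how RFFI turns feature importance into a drift signal. As a hedged illustration only, the Python sketch below compares each new importance vector (for example, importances computed by a forest on the latest window) with a reference vector using total variation distance, and reports drift when the distance exceeds a threshold. The `ImportanceDriftDetector` class, the distance measure, and the `threshold` value are assumptions.

```python
from typing import List, Optional, Sequence


def normalize(importances: Sequence[float]) -> List[float]:
    """Scale non-negative importances so they sum to 1 (uniform if they are all zero)."""
    total = sum(importances)
    if total <= 0:
        return [1.0 / len(importances)] * len(importances)
    return [v / total for v in importances]


def importance_shift(reference: Sequence[float], current: Sequence[float]) -> float:
    """Total variation distance between two normalized importance vectors (0 = identical, 1 = disjoint)."""
    p, q = normalize(reference), normalize(current)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


class ImportanceDriftDetector:
    """Flags drift when the feature-importance profile moves too far from a reference profile."""

    def __init__(self, threshold: float = 0.2):
        self.threshold = threshold  # hypothetical sensitivity; smaller values flag drift more often
        self.reference: Optional[List[float]] = None

    def update(self, importances: Sequence[float]) -> bool:
        """Feed the importances from the latest window; return True if drift is signalled."""
        if self.reference is None:
            self.reference = list(importances)
            return False
        drifted = importance_shift(self.reference, importances) > self.threshold
        if drifted:
            self.reference = list(importances)  # re-baseline after a detected drift
        return drifted
```

Small reshuffles of importance (a shift of 0.05 here) pass silently, while a profile in which a previously minor feature becomes dominant produces a large shift and triggers the detector.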

Dynamically Adjusting Diversity in Ensembles for the Classification of Data Streams with Concept Drift

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3466616 ◽

2021 ◽

Vol 16 (2) ◽

pp. 1-30

Author(s):

Juan I. G. Hidalgo ◽

Silas G. T. C. Santos ◽

Roberto S. M. Barros

Keyword(s):

Parameter Estimation ◽

Real World ◽

Data Streams ◽

Data Stream ◽

Concept Drift ◽

Estimation Method ◽

Estimation Procedure ◽

Dynamic Parameter ◽

Real World Datasets ◽

Concept Drifts

A data stream can be defined as a system that continually generates large amounts of data over time. Today, processing data streams poses new demands and challenging tasks in the data mining and machine learning areas. Concept drift is a problem commonly characterized as changes in the distribution of the data within a data stream. Implementing new methods for data streams in which concept drifts occur requires algorithms that can adapt to several scenarios to improve their performance in the different experimental situations where they are tested. This research proposes a strategy for dynamic parameter adjustment in the presence of concept drifts. The Parameter Estimation Procedure (PEP) is a general method for dynamically adjusting parameters, which is applied here to the diversity parameter (λ) of several classification ensembles commonly used in the area. To this end, PEP was used to create Boosting-like Online Learning Ensemble with Parameter Estimation (BOLE-PE), Online AdaBoost-based M1 with Parameter Estimation (OABM1-PE), and Oza and Russell’s Online Bagging with Parameter Estimation (OzaBag-PE), based on the existing ensembles BOLE, OABM1, and OzaBag, respectively. To validate them, experiments were performed on artificial and real-world datasets using the Hoeffding Tree (HT) as the base classifier. The accuracy results were statistically evaluated using a variation of the Friedman test and the Nemenyi post-hoc test. The experimental results showed that dynamically estimating the diversity parameter (λ) produced good results in most scenarios, i.e., the modified methods achieved improved accuracy on both artificial and real-world datasets.
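
For context, Oza and Russell's online bagging simulates bootstrap sampling by presenting each incoming instance to each base model k times, with k drawn from a Poisson(λ) distribution and λ = 1 in the original method; λ is the diversity parameter that PEP adjusts. The Python sketch below shows only that Poisson(λ) mechanism with λ exposed as a settable attribute; it does not implement PEP's estimation rule, and `MajorityLearner` is a hypothetical stand-in for the Hoeffding Tree used in the paper.

```python
import math
import random
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method for k ~ Poisson(lam); fine for small lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


class MajorityLearner:
    """Hypothetical stand-in for an incremental base learner such as a Hoeffding Tree."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def learn(self, x: Sequence[float], y: Hashable, weight: int) -> None:
        self.counts[y] += weight

    def predict(self, x: Sequence[float]) -> Optional[Hashable]:
        return self.counts.most_common(1)[0][0] if self.counts else None


class OnlineBagging:
    def __init__(self, n_models: int, lam: float = 1.0, seed: int = 0):
        self.models: List[MajorityLearner] = [MajorityLearner() for _ in range(n_models)]
        self.lam = lam  # diversity parameter; lam = 1 is Oza and Russell's original setting
        self.rng = random.Random(seed)

    def learn_one(self, x: Sequence[float], y: Hashable) -> None:
        """Present the instance to each model k ~ Poisson(lam) times (as weight k)."""
        for m in self.models:
            k = poisson(self.lam, self.rng)
            if k > 0:
                m.learn(x, y, k)

    def predict_one(self, x: Sequence[float]) -> Optional[Hashable]:
        """Unweighted majority vote over models that have seen at least one instance."""
        votes = Counter(m.predict(x) for m in self.models)
        votes.pop(None, None)
        return votes.most_common(1)[0][0] if votes else None
```

A procedure such as PEP would update `lam` on the ensemble as the stream evolves; the rule it uses for that update is not reproduced in this sketch.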

Handling concept drifts and limited label problems using semi-supervised combine-merge Gaussian mixture model

Bulletin of Electrical Engineering and Informatics ◽

10.11591/eei.v10i6.3259 ◽

2021 ◽

Vol 10 (6) ◽

pp. 3361-3368

Author(s):

Ibnu Daqiqil Id ◽

Pardomuan Robinson Sihombing ◽

Supratman Zakir

Keyword(s):

Data Streams ◽

Data Stream ◽

High Speed ◽

Concept Drift ◽

Model Performance ◽

Gaussian Mixture ◽

Model Adaptation ◽

Model Accuracy ◽

Concept Drifts

When predicting data streams, changes in data distribution may decrease model accuracy over time, thereby making the model obsolete. This phenomenon is known as concept drift. Detecting concept drifts and then adapting to them are critical operations for maintaining model performance. However, the model can only be adapted if labeled data is available. Labeling data is both costly and time-consuming because it has to be done by humans, and in a data stream only part of the data can be labeled because the data volume is massive and arrives at high speed. To address these problems simultaneously, we apply a technique that updates the model using both labeled and unlabeled instances. The experimental results show that the proposed method can adapt to concept drift using pseudo-labels and maintain its accuracy even when label availability is drastically reduced from 95% to 5%. The proposed method also has the highest overall accuracy and outperforms other methods on 5 of the 10 datasets.
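
The abstract does not detail the combine-merge Gaussian mixture model, so the following Python sketch illustrates only the general pseudo-labelling idea under simplifying assumptions: one diagonal Gaussian per class is updated incrementally from labelled points, and an unlabelled point is assigned the most probable class (equal class priors assumed) and used for updating only when its posterior exceeds a confidence threshold. `RunningGaussian`, `PseudoLabelModel`, the variance floor, and the `confidence` value are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


class RunningGaussian:
    """Diagonal Gaussian whose mean and variance are updated incrementally (Welford's algorithm)."""

    def __init__(self, dim: int):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations per feature

    def update(self, x: Sequence[float]) -> None:
        self.n += 1
        for i, v in enumerate(x):
            delta = v - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (v - self.mean[i])

    def log_pdf(self, x: Sequence[float]) -> float:
        total = 0.0
        for i, v in enumerate(x):
            # population variance plus a small floor; unit variance until two points are seen
            var = (self.m2[i] / self.n + 1e-3) if self.n > 1 else 1.0
            total += -0.5 * (math.log(2 * math.pi * var) + (v - self.mean[i]) ** 2 / var)
        return total


class PseudoLabelModel:
    def __init__(self, dim: int, confidence: float = 0.9):
        self.dim = dim
        self.confidence = confidence  # hypothetical posterior threshold for accepting a pseudo-label
        self.classes: Dict[str, RunningGaussian] = {}

    def posterior(self, x: Sequence[float]) -> Dict[str, float]:
        """Class posteriors under equal priors, computed stably from log-likelihoods."""
        logs = {c: g.log_pdf(x) for c, g in self.classes.items()}
        top = max(logs.values())
        weights = {c: math.exp(v - top) for c, v in logs.items()}
        z = sum(weights.values())
        return {c: w / z for c, w in weights.items()}

    def learn_one(self, x: Sequence[float], y: Optional[str] = None) -> Optional[str]:
        """Update with a labelled point, or with a confident pseudo-label; return the label used."""
        if y is None:
            if not self.classes:
                return None
            post = self.posterior(x)
            best = max(post, key=post.get)
            if post[best] < self.confidence:
                return None  # not confident enough: the unlabelled point is skipped
            y = best
        self.classes.setdefault(y, RunningGaussian(self.dim)).update(x)
        return y
```

The confidence threshold is the key design choice here: accepting low-confidence pseudo-labels lets the model keep tracking the stream with few labels, but it also risks reinforcing its own mistakes after a drift.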

Recurring concept memory management in data streams: exploiting data stream concept evolution to improve performance and transparency

Data Mining and Knowledge Discovery ◽

10.1007/s10618-021-00736-w ◽

2021 ◽

Author(s):

Ben Halstead ◽

Yun Sing Koh ◽

Patricia Riddle ◽

Russel Pears ◽

Mykola Pechenizkiy ◽

...

Keyword(s):

Data Streams ◽

Data Stream ◽

Memory Management ◽

Improve Performance ◽

Concept Evolution

A Novel Approach for Finding Frequent Itemsets in Data Stream

International Journal of Intelligent Systems ◽

10.1002/int.21566 ◽

2013 ◽

Vol 28 (3) ◽

pp. 217-241 ◽

Cited By ~ 1

Author(s):

B. Chandra ◽

Shalini Bhaskar

Keyword(s):

Data Stream ◽

Frequent Itemsets ◽

Novel Approach

Exploiting fractal dimension and a distributed evolutionary approach to classify data streams with concept drifts

Applied Soft Computing ◽

10.1016/j.asoc.2018.11.009 ◽

2019 ◽

Vol 75 ◽

pp. 284-297 ◽

Cited By ~ 2

Author(s):

Gianluigi Folino ◽

Massimo Guarascio ◽

Giuseppe Papuzzo

Keyword(s):

Fractal Dimension ◽

Data Streams ◽

Evolutionary Approach ◽

Concept Drifts

Analysis of Data Stream Processing At Edge Layer for Internet of Things

Journal of ISMAC - June 2019 ◽

10.36548/jismac.2020.1.003 ◽

2020 ◽

Vol 2 (1) ◽

pp. 26-37

Author(s):

Dr. Pasumponpandian

Keyword(s):

Internet Of Things ◽

Data Streams ◽

Data Stream ◽

Smart Cities ◽

Stream Processing ◽

Middle Layer ◽

Cloud Services ◽

Decentralized Systems ◽

Data Stream Processing ◽

Edge Layer

The rapid progress of the Internet of Things (IoT), together with the simultaneous development of technologies and processing capabilities, has paved the way for decentralized systems that rely on cloud services. Although these decentralized systems are founded on the cloud, complexities still arise in transferring all of the information sensed by IoT devices to the cloud. This is because certain applications gather huge streams of information and are expected to respond in a timely manner, with minimal delay and computing energy and with enhanced reliability. This kind of decentralization has therefore led to the development of a middle layer between the cloud and the IoT, termed the edge layer, which brings the services of the cloud down to the user edge. The paper analyzes data stream processing in the edge layer, taking into account the complexities involved in computing IoT data streams at the edge, and presents real-time analytics in the edge layer to examine IoT data streams, offering data-driven insight for a parking system in smart cities.
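
To make the edge-layer idea concrete, the Python sketch below shows one way an edge node might reduce the traffic sent to the cloud: it keeps the latest occupancy state of each parking slot and emits one compact summary per lot at the end of each tumbling time window, instead of forwarding every raw sensor event. The event fields, the `EdgeAggregator` class, and the windowing scheme are assumptions for illustration, not the design analyzed in the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class EdgeAggregator:
    """Keeps the latest state of each parking slot and emits one summary per lot per tumbling window."""
    window_s: float                      # window length in seconds
    window_start: float = 0.0            # start time of the current window
    slots: Dict[Tuple[str, str], bool] = field(default_factory=dict)  # (lot, slot) -> occupied
    outbox: List[dict] = field(default_factory=list)  # summaries that would be forwarded to the cloud

    def ingest(self, lot: str, slot: str, occupied: bool, ts: float) -> None:
        """Handle one raw sensor event; first close any windows that ended at or before `ts`."""
        while ts >= self.window_start + self.window_s:
            self._flush()
            self.window_start += self.window_s
        self.slots[(lot, slot)] = occupied

    def _flush(self) -> None:
        """Append per-lot occupancy counts for the window that is closing."""
        per_lot: Dict[str, List[int]] = {}
        for (lot, _), occ in self.slots.items():
            counts = per_lot.setdefault(lot, [0, 0])
            counts[0] += int(occ)
            counts[1] += 1
        for lot, (occ, total) in sorted(per_lot.items()):
            self.outbox.append({"window_start": self.window_start, "lot": lot,
                                "occupied": occ, "total": total})
```

Only the entries appended to `outbox` would cross the network, so upstream traffic scales with the number of lots and windows rather than with the raw event rate.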
