Fast Adapting Ensemble: A New Algorithm for Mining Data Streams with Concept Drift

The treatment of large data streams in the presence of concept drift is one of the main challenges in the field of data mining, particularly when the algorithms have to deal with concepts that disappear and then reappear. This paper presents a new algorithm, called Fast Adapting Ensemble (FAE), which adapts very quickly to both abrupt and gradual concept drifts and has been specifically designed to deal with recurring concepts. FAE processes the learning examples in blocks of the same size, but it does not have to wait for a block to be complete in order to adapt its base classification mechanism. FAE incorporates a drift detector to improve the handling of abrupt concept drifts and stores a set of inactive classifiers that represent old concepts, which are activated very quickly when these concepts reappear. We compare our new algorithm with various well-known learning algorithms on common benchmark datasets. The experiments show promising results for the proposed algorithm, in both accuracy and runtime, when handling different types of concept drift.
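The idea of keeping inactive classifiers for old concepts and reactivating them when those concepts return can be illustrated with a small sketch. This is not the FAE algorithm itself: the class names, the sliding-window accuracy rule, and the `window` and `threshold` parameters are all assumptions made for illustration, and `ConstantClassifier` is a hypothetical stand-in for a trained base learner.

```python
from collections import deque


class ConstantClassifier:
    """Hypothetical stand-in for a trained base classifier: always predicts one label."""

    def __init__(self, label):
        self.label = label

    def predict(self, x):
        return self.label


class RecurringConceptPool:
    """Holds active and inactive classifiers and switches them by recent accuracy.

    Assumption: a member is active when its accuracy over the last `window`
    labelled examples is at least `threshold`. The abstract does not state
    the rule FAE actually uses.
    """

    def __init__(self, window=20, threshold=0.6):
        self.window = window
        self.threshold = threshold
        self.members = []

    def add(self, model, active=True):
        self.members.append(
            {"model": model, "active": active, "hits": deque(maxlen=self.window)}
        )

    def predict(self, x):
        """Majority vote over the currently active members."""
        votes = {}
        for member in self.members:
            if member["active"]:
                label = member["model"].predict(x)
                votes[label] = votes.get(label, 0) + 1
        return max(votes, key=votes.get) if votes else None

    def update(self, x, y):
        """Score every member on one labelled example and refresh its status.

        Status is refreshed per example, so the pool can react before a
        full block of examples has arrived.
        """
        for member in self.members:
            member["hits"].append(member["model"].predict(x) == y)
            if len(member["hits"]) == self.window:
                accuracy = sum(member["hits"]) / self.window
                member["active"] = accuracy >= self.threshold
```

In this sketch a classifier that represents a vanished concept stays in the pool at no prediction cost, and it rejoins the vote as soon as its windowed accuracy recovers.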

ADES: A New Ensemble Diversity-Based Approach for Handling Concept Drift

Mobile Information Systems ◽

10.1155/2021/5549300 ◽

2021 ◽

Vol 2021 ◽

pp. 1-17

Author(s):

Tinofirei Museba ◽

Fulufhelo Nelwamondo ◽

Khmaies Ouahada

Keyword(s):

Machine Learning ◽

Real World ◽

Data Streams ◽

Predictive Models ◽

Concept Drift ◽

Dynamic Environments ◽

Real World Data ◽

World Data ◽

Different Types ◽

Concept Drifts

Beyond applying machine learning predictive models to static tasks, a significant body of research applies them to streaming environments that incur concept drift. With the prevalence of streaming real-world applications whose underlying data distribution changes, the need for applications capable of adapting to evolving, time-varying dynamic environments can hardly be overstated. Dynamic environments are nonstationary, and the target variables to be predicted by the learning algorithm often evolve with time, a phenomenon known as concept drift. Most work on handling concept drift focuses on updating the prediction model so that it can recover from drift, while little effort has been dedicated to formulating a learning system capable of learning different types of drifting concepts at any time with minimum overhead. This work proposes a novel, evolving data stream classifier called Adaptive Diversified Ensemble Selection Classifier (ADES) that significantly optimizes adaptation to different types of concept drift at any time and improves convergence to new concepts by exploiting different amounts of ensemble diversity. The ADES algorithm generates diverse base classifiers, thereby optimizing the margin distribution and exploiting ensemble diversity to formulate an ensemble classifier that generalizes well to unseen instances and recovers quickly from different types of concept drift. Empirical experiments on both artificial and real-world data streams demonstrate that ADES can adapt to different types of drift at any given time. The prediction performance of ADES is compared with that of three other ensemble classifiers designed to handle concept drift, and the comparative evaluation demonstrates the ability of ADES to handle different types of concept drift. The experimental results, including statistical tests, indicate performance comparable to that of other algorithms designed to handle concept drift and support the significance and effectiveness of ADES.

Knowledge Discovery From Evolving Data Streams

Advances in Business Information Systems and Analytics - Machine Learning Techniques for Improved Business Analytics ◽

10.4018/978-1-5225-3534-8.ch002 ◽

2019 ◽

pp. 19-39

Author(s):

Prasanna Lakshmi Kompalli

Keyword(s):

Real Time ◽

Data Streams ◽

Data Stream ◽

Concept Drift ◽

Data Stream Mining ◽

Time Data ◽

Stream Mining ◽

New Challenges ◽

Mining Data Streams ◽

Different Sources

Data coming from different sources are referred to as data streams. Data stream mining is an online learning technique in which each data point must be processed as it arrives and discarded once processing is complete. Advances in technology have made it possible to monitor these data streams in real time, which has created many new challenges for researchers. The main features of this type of data are that it is fast flowing, large in volume, continuous, and growing, and that its characteristics might change over time, which is termed concept drift. This chapter addresses the problems in mining data streams with concept drift. Isolating the relevant literature on these problems would be a grueling task for researchers and practitioners. This chapter tries to provide a solution by bringing together the techniques used for data stream mining with concept drift.
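The process-then-discard constraint described above can be made concrete with a short sketch. It uses a test-then-train (prequential) loop, which is a common way to evaluate stream learners but is an assumption here, not something the chapter abstract specifies. `MajorityClassLearner` is a hypothetical baseline chosen only because it needs constant memory.

```python
from collections import Counter


class MajorityClassLearner:
    """Hypothetical baseline: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        if not self.counts:
            return None  # nothing learned yet
        return self.counts.most_common(1)[0][0]

    def learn(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(stream, learner):
    """Test-then-train over a stream of (features, label) pairs.

    Each example is used once for prediction, once for learning, and is
    then dropped; only the learner's summary statistics are kept.
    """
    correct = 0
    seen = 0
    for x, y in stream:
        if learner.predict(x) == y:
            correct += 1
        learner.learn(x, y)
        seen += 1
    return correct / seen if seen else 0.0
```

Because the loop consumes any iterable, the stream can be a generator that reads from a socket or file, and memory use does not grow with the number of examples.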

Ant Miner

International Journal of Artificial Intelligence and Machine Learning ◽

10.4018/ijaiml.2020010104 ◽

2020 ◽

Vol 10 (1) ◽

pp. 45-59

Author(s):

Bijaya Kumar Nanda ◽

Satchidananda Dehuri

Keyword(s):

Data Mining ◽

Large Data ◽

Classification Rule ◽

Classification Rules ◽

Rule Mining ◽

Ant Colonies ◽

Benchmark Datasets ◽

Objective Classification ◽

Single Objective ◽

Better Than

In data mining, extracting classification rules from large data is an important task that is gaining considerable attention. This article presents a novel ant miner for classification rule mining. The ant miner is inspired by research on the behaviour of real ant colonies, simulated annealing, and some data mining concepts and principles. This paper presents a Pittsburgh-style approach for single-objective classification rule mining. The algorithm is tested on a few benchmark datasets drawn from the UCI repository. The experimental outcomes confirm that ant miner-HPB (Hybrid Pittsburgh Style Classification) is significantly better than ant-miner-PB (Pittsburgh Style Classification).

An Ensemble Classifier Algorithm for Mining Data Streams Based on Concept Drift

2017 10th International Symposium on Computational Intelligence and Design (ISCID) ◽

10.1109/iscid.2017.121 ◽

2017 ◽

Cited By ~ 1

Author(s):

Yushui Geng ◽

Jianguo Zhang

Keyword(s):

Data Streams ◽

Concept Drift ◽

Ensemble Classifier ◽

Mining Data Streams

Mining data streams with concept drifts using genetic algorithm

Artificial Intelligence Review ◽

10.1007/s10462-011-9209-y ◽

2011 ◽

Vol 36 (3) ◽

pp. 163-178 ◽

Cited By ~ 11

Author(s):

Periasamy Vivekanandan ◽

Raju Nedunchezhian

Keyword(s):

Genetic Algorithm ◽

Data Streams ◽

Mining Data Streams ◽

Concept Drifts

Data Mining Models of High Dimensional Data Streams, and Contemporary Concept Drift Detection Methods: a Comprehensive Review

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i3.6.14959 ◽

2018 ◽

Vol 7 (3.6) ◽

pp. 148

Author(s):

M Sankara Prasanna Kumar ◽

A P. Siva Kumar ◽

K Prasanna

Keyword(s):

Data Streams ◽

Concept Drift ◽

High Dimensional Data ◽

Detection Methods ◽

High Dimensional ◽

Distributed Data ◽

Stable Period ◽

Multiple Data ◽

Multiple Data Streams ◽

Concept Drifts

Concept drift is defined as a change over time in the data distributed across multiple data streams. Concept drift is visible only when the type of collected data changes after some stable period. The emergence of concept drift in data streams leads to increased misclassification and degraded performance. To obtain accurate results, such concept drifts must be identified. This paper reviews the issues related to identifying changes that occur in multivariate, high-dimensional data streams. The manuscript probes the inherent difficulties that contemporary change-detection methods encounter as data dimensionality scales.
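One simple way to detect a change after a stable period is to monitor a classifier's running error rate. The sketch below is written in the spirit of the Drift Detection Method (DDM); it is an illustrative assumption, not a method taken from the review, and the class name and the `min_samples` and `drift_level` parameters are hypothetical. It is also univariate, so it sidesteps the high-dimensional difficulties the review discusses.

```python
import math


class ErrorRateDriftDetector:
    """Flags drift when the running error rate rises well above its best level."""

    def __init__(self, min_samples=30, drift_level=3.0):
        self.min_samples = min_samples
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        """Forget all statistics, e.g. after a drift has been signalled."""
        self.n = 0
        self.errors = 0
        self.best = math.inf  # lowest value of p + s observed so far
        self.best_p = 0.0
        self.best_s = 0.0

    def update(self, mistake):
        """Feed one prediction outcome (True means misclassified).

        Returns True when drift is signalled, after which the detector resets.
        """
        self.n += 1
        self.errors += int(mistake)
        p = self.errors / self.n                # running error rate
        s = math.sqrt(p * (1 - p) / self.n)     # its standard deviation
        if self.n < self.min_samples:
            return False                        # too few samples to judge
        if p + s < self.best:
            self.best, self.best_p, self.best_s = p + s, p, s
        if p + s > self.best_p + self.drift_level * self.best_s:
            self.reset()
            return True
        return False
```

A learner would typically call `update` after every labelled example and rebuild or replace its model when the call returns True.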

Batch Weighted Ensemble for Mining Data Streams with Concept Drift

Lecture Notes in Computer Science - Foundations of Intelligent Systems ◽

10.1007/978-3-642-21916-0_32 ◽

2011 ◽

pp. 290-299 ◽

Cited By ~ 5

Author(s):

Magdalena Deckert

Keyword(s):

Data Streams ◽

Concept Drift ◽

Mining Data Streams

Research on Classification Techniques in Data Mining

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.f1072.0486s419 ◽

2019 ◽

Vol 8 (6S4) ◽

pp. 357-361

Keyword(s):

Neural Network ◽

Data Mining ◽

Association Rule ◽

Group Membership ◽

Large Data ◽

Data Sources ◽

Classification Rules ◽

Class Label ◽

Different Types ◽

Data Extrapolation

Data mining is a procedure for extracting information from large data. Data mining approaches include classification, association rules, clustering, etc. Data mining is applied in four stages: data sources, data extrapolation/gathering, modeling, and deploying modules. Classification is a data mining method that predicts the group membership of data instances. It is a useful method with vast applications for classifying the different types of data used in almost every field. Classification assigns a class label to an undetermined set of cases. In this survey, we discuss Bayesian classification, rule-based classification, decision trees, and neural networks.

Study of The ID3 and C4.5 Learning Algorithms

Journal of Medical Informatics and Decision Making ◽

10.14302/issn.2641-5526.jmid-20-3302 ◽

2020 ◽

Vol 1 (2) ◽

pp. 29-43

Author(s):

Y. Fakir ◽

M. Azalmad ◽

R. Elaychi

Keyword(s):

Data Mining ◽

Decision Making ◽

Data Analysis ◽

Decision Tree ◽

Decision Trees ◽

Information Gain ◽

Learning Algorithms ◽

Large Data ◽

Classification Algorithms ◽

Important Data

Data mining is a process of exploring large data to find patterns that support decision-making. One of the techniques in decision-making is classification. Data classification is a form of data analysis used to extract models describing important data classes. There are many classification algorithms. Each classifier encompasses some algorithms in order to classify objects into predefined classes. The decision tree is one such important technique, which builds a tree structure by incrementally breaking down the dataset into smaller subsets. Decision trees can be implemented using popular algorithms such as ID3, C4.5, and CART. The present study considers the ID3 and C4.5 algorithms to build a decision tree using the "entropy" and "information gain" measures, which are the basic components behind the construction of a classifier model.
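The entropy and information gain measures mentioned above can be written out in a few lines. This is a minimal sketch for categorical attributes only, assuming each row is a dictionary that maps attribute names to values; the function names are illustrative and are not taken from the study. The gain ratio is included because it is the normalized criterion that C4.5 uses in place of ID3's raw information gain.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one attribute (ID3 criterion)."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted entropy of the subsets produced by the split.
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


def gain_ratio(rows, labels, attribute):
    """Information gain divided by the split information (C4.5 criterion)."""
    split_info = entropy([row[attribute] for row in rows])
    if split_info == 0:
        return 0.0  # attribute has a single value, so the split is useless
    return information_gain(rows, labels, attribute) / split_info
```

At each node, a tree builder would evaluate every candidate attribute with one of these functions and split on the attribute with the highest score.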

Ant Miner

International Journal of Applied Evolutionary Computation ◽

10.4018/ijaec.2020040104 ◽

2020 ◽

Vol 11 (2) ◽

pp. 47-64

Author(s):

Bijaya Kumar Nanda ◽

Satchidananda Dehuri

Keyword(s):

Data Mining ◽

Large Data ◽

Classification Rule ◽

Classification Rules ◽

Rule Mining ◽

Ant Colonies ◽

Benchmark Datasets ◽

Objective Classification ◽

Single Objective ◽

Better Than

Discovering classification rules from large data is an important task in data mining and is gaining considerable attention. This article presents a novel ant miner for classification rule mining. Our ant miner is inspired by research on the behavior of real ant colonies, simulated annealing, and some data mining concepts and principles. Here we present a Michigan-style approach for single-objective classification rule mining. The algorithm is tested on a few benchmark datasets drawn from the UCI repository. Our experimental outcomes confirm that ant miner-HMC (Hybrid Michigan Style Classification) is significantly better than ant-miner-MC (Michigan Style Classification).
