MINAS: multiclass learning algorithm for novelty detection in data streams

Elaine Ribeiro de Faria; André Carlos Ponce de Leon Ferreira Carvalho; João Gama

doi:10.1007/s10618-015-0433-y

Measuring the Effectiveness of Adaptive Random Forest for Handling Concept Drift in Big Data Streams

Entropy ◽

10.3390/e23070859 ◽

2021 ◽

Vol 23 (7) ◽

pp. 859

Author(s):

Abdulaziz O. AlQabbany ◽

Aqil M. Azmi

Keyword(s):

Big Data ◽

Random Forest ◽

Real Time ◽

Data Streams ◽

Learning Algorithm ◽

Concept Drift ◽

The United States ◽

Careful Consideration ◽

Data Sets ◽

Stream Data

We are living in the age of big data, a majority of which is stream data. The real-time processing of this data requires careful consideration from different perspectives. Concept drift, a change in the data’s underlying distribution, is a significant issue, especially when learning from data streams. It requires learners to adapt to dynamic changes. Random forest is an ensemble approach that is widely used in classical non-streaming settings of machine learning applications. The Adaptive Random Forest (ARF), in turn, is a stream learning algorithm that has shown promising results in terms of its accuracy and ability to deal with various types of drift. The continuity of the incoming instances allows their binomial distribution to be approximated by a Poisson(1) distribution. In this study, we propose a mechanism to increase the efficiency of such streaming algorithms by focusing on resampling. Our measure, resampling effectiveness (ρ), fuses the two most essential aspects of online learning: accuracy and execution time. We use six different synthetic data sets, each having a different type of drift, to empirically select the parameter λ of the Poisson distribution that yields the best value for ρ. By comparing the standard ARF with its tuned variations, we show that ARF performance can be enhanced by tackling this important aspect. Finally, we present three case studies from different contexts to test our proposed enhancement method and demonstrate its effectiveness in processing large data sets: (a) Amazon customer reviews (written in English), (b) hotel reviews (in Arabic), and (c) real-time aspect-based sentiment analysis of COVID-19-related tweets in the United States during April 2020. Results indicate that our proposed enhancement method yielded considerable improvement in most of the situations.
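
The resampling step mentioned in the abstract can be illustrated with Poisson-weighted online bagging, in which each base learner sees an incoming instance k times, with k drawn from Poisson(λ). The following is a minimal sketch under that assumption, not the authors' implementation: the function names are hypothetical, and the measure ρ is not computed because its definition is not given in this listing.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k = 0
    product = rng.random()
    # Keep multiplying uniforms until the product drops to the threshold.
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def resampling_weights(n_learners, lam, rng):
    """Return one weight per base learner for a single incoming instance.

    A weight of 0 means the learner skips the instance; a weight of k
    means it trains on the instance as if it had been seen k times.
    """
    return [poisson_sample(lam, rng) for _ in range(n_learners)]


def mean_weight(lam, n_draws=10_000, seed=0):
    """Empirical mean weight, which should be close to lam."""
    rng = random.Random(seed)
    return sum(poisson_sample(lam, rng) for _ in range(n_draws)) / n_draws
```

Because the expected weight equals λ, a larger λ makes each learner train on each instance more often on average, which is why the choice of λ trades accuracy against execution time.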

Online Clustering for Novelty Detection and Concept Drift in Data Streams

Progress in Artificial Intelligence - Lecture Notes in Computer Science ◽

10.1007/978-3-030-30244-3_37 ◽

2019 ◽

pp. 448-459

Author(s):

Kemilly Dearo Garcia ◽

Mannes Poel ◽

Joost N. Kok ◽

André C. P. L. F. de Carvalho

Keyword(s):

Data Streams ◽

Concept Drift ◽

Novelty Detection ◽

Online Clustering

eRules: A Modular Adaptive Classification Rule Learning Algorithm for Data Streams

Research and Development in Intelligent Systems XXIX ◽

10.1007/978-1-4471-4739-8_5 ◽

2012 ◽

pp. 65-78 ◽

Cited By ~ 3

Author(s):

Frederic Stahl ◽

Mohamed Medhat Gaber ◽

Manuel Martin Salvador

Keyword(s):

Data Streams ◽

Learning Algorithm ◽

Rule Learning ◽

Classification Rule ◽

Adaptive Classification

Dynamic weighted selective ensemble learning algorithm for imbalanced data streams

The Journal of Supercomputing ◽

10.1007/s11227-021-04084-w ◽

2021 ◽

Author(s):

Zhang Yan ◽

Du Hongle ◽

Ke Gang ◽

Zhang Lin ◽

Yeh-Cheng Chen

Keyword(s):

Ensemble Learning ◽

Data Streams ◽

Learning Algorithm ◽

Imbalanced Data ◽

Selective Ensemble ◽

Ensemble Learning Algorithm

Online Learning from Data Streams with Varying Feature Spaces

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33013232 ◽

2019 ◽

Vol 33 ◽

pp. 3232-3239 ◽

Cited By ~ 1

Author(s):

Ege Beyazit ◽

Jeevithan Alagurajah ◽

Xindong Wu

Keyword(s):

Online Learning ◽

Data Streams ◽

Learning Algorithm ◽

Feature Space ◽

Model Complexity ◽

Risk Minimization ◽

Minimization Principle ◽

Empirical Risk ◽

Feature Spaces ◽

Online Learning Algorithm

We study the problem of online learning with varying feature spaces. The problem is challenging because, unlike traditional online learning problems, varying feature spaces can introduce new features or drop existing ones without following a pattern. Existing methods such as online streaming feature selection (Wu et al. 2013), online learning from trapezoidal data streams (Zhang et al. 2016), and learning with feature evolvable streams (Hou, Zhang, and Zhou 2017) cannot learn from arbitrarily varying feature spaces because they make assumptions about the feature space dynamics. In this paper, we propose a novel online learning algorithm, OLVF, to learn from data with arbitrarily varying feature spaces. The OLVF algorithm learns to classify the feature spaces and the instances from those feature spaces simultaneously. To classify an instance, the algorithm dynamically projects the instance classifier and the training instance onto their shared feature subspace. The feature space classifier predicts the projection confidences for a given feature space. The instance classifier is updated by following the empirical risk minimization principle, and the strength of the constraints is scaled by the projection confidences. Afterwards, a feature sparsity method is applied to reduce the model complexity. Experiments on 10 datasets with varying feature spaces have been conducted to demonstrate the performance of the proposed OLVF algorithm. Moreover, experiments with trapezoidal data streams on the same datasets have been conducted to show that OLVF performs better than the state-of-the-art learning algorithm (Zhang et al. 2016).
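
The projection onto a shared feature subspace described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the OLVF algorithm itself: representing the classifier and the instance as dictionaries keyed by feature name is an assumption, all function names are hypothetical, and the projection confidences and the update rule are omitted because the abstract does not specify them.

```python
def shared_features(weights, instance):
    """Features present in both the classifier and the incoming instance."""
    return weights.keys() & instance.keys()


def predict_margin(weights, instance):
    """Dot product restricted to the shared feature subspace.

    Features known only to the classifier, or seen only in the
    instance, contribute nothing to the margin.
    """
    shared = shared_features(weights, instance)
    return sum(weights[f] * instance[f] for f in shared)


def classify(weights, instance):
    """Binary label in {-1, +1} from the sign of the projected margin."""
    return 1 if predict_margin(weights, instance) >= 0 else -1
```

For example, a classifier with weights for features `a` and `b` that receives an instance carrying only `b` and `c` would score it on `b` alone.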

Novelty Detection and Online Learning for Chunk Data Streams

IEEE Transactions on Pattern Analysis and Machine Intelligence ◽

10.1109/tpami.2020.2965531 ◽

2020 ◽

pp. 1-1

Author(s):

Yi Wang ◽

Yi Ding ◽

Xiangjian He ◽

Xin Fan ◽

Chi Lin ◽

...

Keyword(s):

Online Learning ◽

Data Streams ◽

Novelty Detection

Cognitively Motivated Novelty Detection in Video Data Streams

Multimedia Data Mining and Knowledge Discovery ◽

10.1007/978-1-84628-799-2_11 ◽

2007 ◽

pp. 209-233

Author(s):

James M. Kang ◽

Muhammad Aurangzeb Ahmad ◽

Ankur Teredesai ◽

Roger Gaborski

Keyword(s):

Data Streams ◽

Novelty Detection ◽

Video Data

Evaluation of Multiclass Novelty Detection Algorithms for Data Streams

IEEE Transactions on Knowledge and Data Engineering ◽

10.1109/tkde.2015.2441713 ◽

2015 ◽

Vol 27 (11) ◽

pp. 2961-2973 ◽

Cited By ~ 11

Author(s):

Elaine Ribeiro de Faria ◽

Isabel Ribeiro Goncalves ◽

João Gama ◽

Andre Carlos Ponce de Leon Ferreira Carvalho

Keyword(s):

Data Streams ◽

Novelty Detection ◽

Detection Algorithms

An Effective Neural Learning Algorithm for Extracting Cross-Correlation Feature Between Two High-Dimensional Data Streams

Neural Processing Letters ◽

10.1007/s11063-014-9367-4 ◽

2014 ◽

Vol 42 (2) ◽

pp. 459-477 ◽

Cited By ~ 3

Author(s):

Xiang-yu Kong ◽

Hong-guang Ma ◽

Qiu-sheng An ◽

Qi Zhang

Keyword(s):

Data Streams ◽

Cross Correlation ◽

Learning Algorithm ◽

High Dimensional Data ◽

High Dimensional ◽

Neural Learning

An adaptive algorithm for anomaly and novelty detection in evolving data streams

Data Mining and Knowledge Discovery ◽

10.1007/s10618-018-0571-0 ◽

2018 ◽

Vol 32 (6) ◽

pp. 1597-1633 ◽

Cited By ~ 6

Author(s):

Mohamed-Rafik Bouguelia ◽

Slawomir Nowaczyk ◽

Amir H. Payberah

Keyword(s):

Data Streams ◽

Adaptive Algorithm ◽

Novelty Detection ◽

Evolving Data
