eRules: A Modular Adaptive Classification Rule Learning Algorithm for Data Streams

Author(s):  
Frederic Stahl ◽  
Mohamed Medhat Gaber ◽  
Manuel Martin Salvador
2013 ◽  
Vol 20 (05) ◽  
pp. 644-652
Author(s):  
ATTIYA KANWAL ◽  
SAHAR FAZAL ◽  
SOHAIL ASGHAR ◽  
Muhammad Naeem

Background: The pandemic of metabolic disorders is accelerating in the urbanized world, posing a huge burden on health and the economy. The key precursor to most metabolic disorders is diabetes mellitus. A newly discovered form of diabetes is Maturity-Onset Diabetes of the Young (MODY). MODY is a monogenic form of diabetes inherited as an autosomal dominant disorder. To date, 11 different MODY genes have been reported. Objective: This study aims to discover subgroups from the biological text documents related to these genes in a public-domain database. Data Source: The data set was obtained from PubMed. Period: September-December 2011. Materials and Methodology: The APRIORI-SD subgroup discovery algorithm is used for the task of discovering subgroups. A well-known association rule learning algorithm, APRIORI, is first modified into a classification rule learning algorithm, APRIORI-C. The APRIORI-C algorithm generates rules from the discretized dataset with the minimum support set to 0.42% and no confidence threshold. A total of 580 rules are generated at the given support. APRIORI-C is further modified by adaptation into APRIORI-SD. Results: Experimental results demonstrate that APRIORI discovers substantially smaller rule sets; each rule has higher support and significance. The rules that are obtained by APRIORI-C are ordered by weighted relative accuracy. Conclusion: Only the first 66 rules are ordered, as they cover the relations among all 11 MODY genes. These 66 rules are further organized into 11 different subgroups. The evaluation of the obtained results against the literature shows that APRIORI-SD is a competitive subgroup discovery algorithm. All the associations among genes proved to be true.
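The abstract above orders rules by weighted relative accuracy. As a minimal sketch, assuming the commonly used definition WRAcc(Cond -> Class) = p(Cond) * (p(Class | Cond) - p(Class)), the score of a single rule could be computed as follows. The function name `wracc`, the row layout, and the toy rows are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def wracc(rows: List[Row], cond: Dict[str, str], target: Tuple[str, str]) -> float:
    """Weighted relative accuracy of the rule `cond -> target`.

    Assumed definition: p(cond) * (p(target | cond) - p(target)).
    """
    n = len(rows)
    if n == 0:
        return 0.0
    attr, value = target
    # Rows covered by the rule body (every condition must match).
    covered = [r for r in rows if all(r.get(k) == v for k, v in cond.items())]
    if not covered:
        return 0.0
    p_cond = len(covered) / n
    p_class = sum(1 for r in rows if r.get(attr) == value) / n
    p_class_given_cond = sum(1 for r in covered if r.get(attr) == value) / len(covered)
    return p_cond * (p_class_given_cond - p_class)


# Hypothetical toy data, for illustration only.
rows = [
    {"term": "yes", "cls": "A"},
    {"term": "yes", "cls": "A"},
    {"term": "no", "cls": "B"},
    {"term": "no", "cls": "A"},
]
score = wracc(rows, {"term": "yes"}, ("cls", "A"))
print(score)  # 0.5 * (1.0 - 0.75) = 0.125
```

A rule scores high only when it both covers many rows and is noticeably more accurate than the class prior, which is why this measure suits ranking subgroups.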


Author(s):  
KRZYSZTOF TRAWIŃSKI ◽  
OSCAR CORDÓN ◽  
ARNAUD QUIRIN

In this work, we conduct a study considering a fuzzy rule-based multiclassification system design framework based on the Fuzzy Unordered Rule Induction Algorithm (FURIA). This advanced method serves as the fuzzy classification rule learning algorithm to derive the component classifiers considering bagging and feature selection. We develop an exhaustive study on the potential of bagging and feature selection to design a final FURIA-based fuzzy multiclassifier dealing with high-dimensional data. Several parameter settings for the global approach are tested when applied to twenty-one popular UCI datasets. The results obtained show that FURIA-based fuzzy multiclassifiers outperform the single FURIA classifier and are competitive with C4.5 multiclassifiers and random forests.
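The combination of bagging and feature selection described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' framework: a 1-nearest-neighbour stand-in replaces FURIA (which this sketch does not implement), a random feature subset is used as one simple form of feature selection, and all function names are hypothetical.

```python
import random
from collections import Counter


def nearest_label(train, x, feats):
    """Stand-in base classifier (1-NN on the selected features), not FURIA."""
    def dist(row):
        return sum((row[0][f] - x[f]) ** 2 for f in feats)
    return min(train, key=dist)[1]


def build_ensemble(data, n_members, n_feats, rng):
    """Each member gets a bootstrap sample and a random feature subset."""
    n_total = len(data[0][0])
    members = []
    for _ in range(n_members):
        sample = [rng.choice(data) for _ in data]    # bootstrap: draw with replacement
        feats = rng.sample(range(n_total), n_feats)  # random feature subset
        members.append((sample, feats))
    return members


def predict(members, x):
    """Majority vote over the component classifiers."""
    votes = Counter(nearest_label(sample, x, feats) for sample, feats in members)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated classes in three dimensions.
data = [((i * 0.1, i * 0.1, i * 0.1), 0) for i in range(10)]
data += [((10 + i * 0.1, 10 + i * 0.1, 10 + i * 0.1), 1) for i in range(10)]
members = build_ensemble(data, n_members=11, n_feats=2, rng=random.Random(0))
```

Calling `predict(members, (0.2, 0.3, 0.1))` then returns the majority label of the eleven members.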


Entropy ◽  
2021 ◽  
Vol 23 (7) ◽  
pp. 859
Author(s):  
Abdulaziz O. AlQabbany ◽  
Aqil M. Azmi

We are living in the age of big data, a majority of which is stream data. The real-time processing of this data requires careful consideration from different perspectives. Concept drift, a change in the data’s underlying distribution, is a significant issue, especially when learning from data streams. It requires learners to be adaptive to dynamic changes. Random forest is an ensemble approach that is widely used in classical non-streaming settings of machine learning applications. At the same time, the Adaptive Random Forest (ARF) is a stream learning algorithm that showed promising results in terms of its accuracy and ability to deal with various types of drift. The incoming instances’ continuity allows for their binomial distribution to be approximated by a Poisson(1) distribution. In this study, we propose a mechanism to increase such streaming algorithms’ efficiency by focusing on resampling. Our measure, resampling effectiveness (ρ), fuses the two most essential aspects of online learning: accuracy and execution time. We use six different synthetic data sets, each having a different type of drift, to empirically select the parameter λ of the Poisson distribution that yields the best value for ρ. By comparing the standard ARF with its tuned variations, we show that ARF performance can be enhanced by tackling this important aspect. Finally, we present three case studies from different contexts to test our proposed enhancement method and demonstrate its effectiveness in processing large data sets: (a) Amazon customer reviews (written in English), (b) hotel reviews (in Arabic), and (c) real-time aspect-based sentiment analysis of COVID-19-related tweets in the United States during April 2020. Results indicate that our proposed method of enhancement exhibited considerable improvement in most of the situations.
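The Poisson-based resampling mentioned above can be illustrated with a short sketch of online bagging, in which each ensemble member sees every incoming instance k times with k drawn from Poisson(λ). This is not the ARF implementation and it does not compute ρ, whose formula the abstract does not give. The `MajorityClass` learner and `train_stream` are hypothetical stand-ins, and the sampler uses Knuth's method, which is adequate for small λ.

```python
import math
import random


def poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class MajorityClass:
    """Hypothetical stand-in base learner: predicts the most-weighted class."""

    def __init__(self):
        self.counts = {}

    def learn(self, x, y, weight):
        self.counts[y] = self.counts.get(y, 0) + weight

    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None


def train_stream(stream, n_learners=5, lam=1.0, seed=0):
    """Online bagging: weight each instance by k ~ Poisson(lam) per learner."""
    rng = random.Random(seed)
    learners = [MajorityClass() for _ in range(n_learners)]
    for x, y in stream:
        for learner in learners:
            k = poisson(lam, rng)
            if k > 0:  # k == 0 means this learner skips the instance
                learner.learn(x, y, weight=k)
    return learners
```

A larger λ makes each learner train on more copies of each instance, which tends to raise training cost; that trade-off between accuracy and execution time is what the study tunes λ against.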


Author(s):  
Ege Beyazit ◽  
Jeevithan Alagurajah ◽  
Xindong Wu

We study the problem of online learning with varying feature spaces. The problem is challenging because, unlike traditional online learning problems, varying feature spaces can introduce new features or drop existing ones without following a pattern. Existing methods such as online streaming feature selection (Wu et al. 2013), online learning from trapezoidal data streams (Zhang et al. 2016), and learning with feature evolvable streams (Hou, Zhang, and Zhou 2017) cannot learn from arbitrarily varying feature spaces because they make assumptions about the feature space dynamics. In this paper, we propose a novel online learning algorithm, OLVF, to learn from data with arbitrarily varying feature spaces. The OLVF algorithm learns to classify the feature spaces and the instances from feature spaces simultaneously. To classify an instance, the algorithm dynamically projects the instance classifier and the training instance onto their shared feature subspace. The feature space classifier predicts the projection confidences for a given feature space. The instance classifier is updated by following the empirical risk minimization principle, and the strength of the constraints is scaled by the projection confidences. Afterwards, a feature sparsity method is applied to reduce the model complexity. Experiments on 10 datasets with varying feature spaces have been conducted to demonstrate the performance of the proposed OLVF algorithm. Moreover, experiments with trapezoidal data streams on the same datasets have been conducted to show that OLVF performs better than the state-of-the-art learning algorithm (Zhang et al. 2016).
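The idea of projecting the classifier and the instance onto their shared feature subspace can be sketched with sparse dictionaries. This is not the OLVF algorithm: the update below is a plain passive-aggressive (PA-I) style step swapped in for illustration, and it omits the feature space classifier, the projection confidences, and the sparsity step. The names `shared_dot` and `update` are hypothetical.

```python
from typing import Dict

Vec = Dict[str, float]


def shared_dot(w: Vec, x: Vec) -> float:
    """Dot product restricted to features present in both w and x."""
    return sum(w[f] * v for f, v in x.items() if f in w)


def update(w: Vec, x: Vec, y: int, c: float = 1.0) -> Vec:
    """One PA-I style step (assumed stand-in update); y is +1 or -1."""
    loss = max(0.0, 1.0 - y * shared_dot(w, x))  # hinge loss on the shared subspace
    norm_sq = sum(v * v for v in x.values())
    if loss == 0.0 or norm_sq == 0.0:
        return w
    tau = min(c, loss / norm_sq)
    new_w = dict(w)
    for f, v in x.items():
        # Features unseen so far enter the model with an initial weight of 0.
        new_w[f] = new_w.get(f, 0.0) + tau * y * v
    return new_w


# Hypothetical stream whose feature space changes between instances.
w: Vec = {}
w = update(w, {"a": 1.0}, +1)
w = update(w, {"b": 1.0, "c": -1.0}, -1)
print(w)  # {'a': 1.0, 'b': -0.5, 'c': 0.5}
```

Features that appear in an instance but not in the model, or the reverse, are simply ignored at prediction time, so `shared_dot(w, {"a": 2.0, "d": 5.0})` uses only feature `a`.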


2014 ◽  
Vol 80 (1) ◽  
pp. 101-117
Author(s):  
David García ◽  
Antonio González ◽  
Raúl Pérez

2016 ◽  
Vol 2016 ◽  
pp. 1-13
Author(s):  
B. Mokhtar ◽  
M. Azab ◽  
N. Shehata ◽  
M. Rizk

This paper presents a comprehensive water quality monitoring system that employs smart network management, a nano-enriched sensing framework, and intelligent and efficient data analysis and forwarding protocols for smart and system-aware decision making. The presented system comprises two main subsystems: a Data Sensing and Forwarding Subsystem (DSFS) and an Operation Management Subsystem (OMS). The OMS operates based on real-time learned patterns and rules of system operations projected from the DSFS to manage the entire network of sensors. The main tasks of the OMS are to enable real-time data visualization, managed system control, and secure system operation. The DSFS employs a Hybrid Intelligence (HI) scheme, which is proposed through integrating an association rule learning algorithm with fuzzy logic and weighted decision trees. The DSFS operation is based on profiling and registering raw data readings, generated from a set of optical nanosensors, as profiles of attribute-value pairs. As a case study, we evaluate our implemented test bed via simulation scenarios in a water quality monitoring framework. The monitoring processes are simulated based on measuring the percentage of dissolved oxygen and the potential of hydrogen (pH) in fresh water. Simulation results show the efficiency of the proposed HI-based methodology at learning different water quality classes.

