Interactive Decision Tree Learning and Decision Rule Extraction Based on the ImbTreeEntropy and ImbTreeAUC Packages

This paper presents two new R packages, ImbTreeEntropy and ImbTreeAUC, for building decision trees, including their interactive construction and analysis, a feature highly valued by field experts who want to be involved in the learning process. ImbTreeEntropy functionality includes the application of generalized entropy functions, such as Rényi, Tsallis, Sharma-Mittal, Sharma-Taneja and Kapur, to measure the impurity of a node. ImbTreeAUC provides non-standard measures to choose an optimal split point for an attribute (as well as the optimal attribute for splitting) by employing local, semi-global and global AUC measures. The contribution of both packages is that, thanks to interactive learning, the user can construct a new tree from scratch or, if required, decide on the optimal split in ambiguous situations during the learning phase, taking into account each attribute and its cut-off. The main difference from existing solutions is that our packages provide mechanisms for analyzing the structures of several trees simultaneously after growing and/or pruning. Both packages support cost-sensitive learning by defining a misclassification cost matrix, as well as weight-sensitive learning. Additionally, the tree structure of the model can be represented as a rule-based model, along with various quality measures, such as support, confidence, lift, conviction, addedValue, cosine, Jaccard and Laplace.
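The rule quality measures named in the abstract can be illustrated with a minimal Python sketch. It uses the textbook association-rule definitions of support, confidence, lift and conviction for a rule "IF conditions THEN class"; the function name `rule_measures` and its count-based arguments are assumptions made for illustration and are not the packages' API, and the packages may define the measures differently.

```python
import math

def rule_measures(n, n_antecedent, n_consequent, n_both):
    """Quality measures for a rule 'IF antecedent THEN consequent'.

    n            -- total number of observations
    n_antecedent -- observations satisfying the rule conditions
    n_consequent -- observations belonging to the predicted class
    n_both       -- observations satisfying both
    (Hypothetical helper; textbook definitions, not the packages' code.)
    """
    support = n_both / n
    confidence = n_both / n_antecedent
    prior = n_consequent / n                # class frequency without the rule
    lift = confidence / prior
    # Conviction is unbounded when the rule never misclassifies.
    if confidence == 1.0:
        conviction = math.inf
    else:
        conviction = (1.0 - prior) / (1.0 - confidence)
    return {
        "support": support,
        "confidence": confidence,
        "lift": lift,
        "conviction": conviction,
    }

# A leaf covering 40 of 100 observations, 30 of them in a class of size 50.
print(rule_measures(n=100, n_antecedent=40, n_consequent=50, n_both=30))
```

A lift above 1 indicates that the rule's conditions make the predicted class more likely than its overall frequency in the data.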

ImbTreeEntropy and ImbTreeAUC: Novel R Packages for Decision Tree Learning on the Imbalanced Datasets

Electronics ◽

10.3390/electronics10060657 ◽

2021 ◽

Vol 10 (6) ◽

pp. 657

Author(s):

Krzysztof Gajowniczek ◽

Tomasz Ząbkowski

Keyword(s):

Imbalanced Data ◽

Misclassification Cost ◽

Learning Time ◽

Misclassification Costs ◽

Split Point ◽

Speed Up ◽

R Packages ◽

Class Labels ◽

Entropy Functions ◽

Support Cost

This paper presents two R packages, ImbTreeEntropy and ImbTreeAUC, to handle imbalanced data problems. ImbTreeEntropy functionality includes the application of generalized entropy functions, such as Rényi, Tsallis, Sharma–Mittal, Sharma–Taneja and Kapur, to measure the impurity of a node. ImbTreeAUC provides non-standard measures to choose an optimal split point for an attribute (as well as the optimal attribute for splitting) by employing local, semi-global and global AUC (Area Under the ROC Curve) measures. Both packages are applicable to binary and multiclass problems, and they support cost-sensitive learning, by defining a misclassification cost matrix, and weight-sensitive learning. The packages accept all types of attributes, including continuous, ordered and nominal, where the latter type is simplified for multiclass problems to reduce the computational overhead. Both packages enable optimization of the thresholds at which posterior probabilities determine final class labels, so that misclassification costs are minimized. Model overfitting can be managed either during the growing phase or at the end using post-pruning. The packages are mainly implemented in R; however, some computationally demanding functions are written in plain C++. To speed up learning, parallel processing is supported as well.
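As an illustration of generalized entropies used as node impurity, the following Python sketch computes the Rényi and Tsallis entropies of a vector of class proportions from their standard definitions. The function names, the use of the natural logarithm and the handling of the limiting case are assumptions of this sketch; the abstract does not give the packages' exact parameterization.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (natural log), the limit of both families at order 1."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def renyi_entropy(probs, alpha):
    """Renyi entropy: log(sum p_i^alpha) / (1 - alpha), for alpha >= 0."""
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    if abs(alpha - 1.0) < 1e-12:
        return shannon_entropy(probs)
    total = sum(p ** alpha for p in probs if p > 0)
    return math.log(total) / (1.0 - alpha)

def tsallis_entropy(probs, q):
    """Tsallis entropy: (1 - sum p_i^q) / (q - 1)."""
    if abs(q - 1.0) < 1e-12:
        return shannon_entropy(probs)
    total = sum(p ** q for p in probs if p > 0)
    return (1.0 - total) / (q - 1.0)

# Class proportions in an imbalanced node (illustrative values).
node = [0.9, 0.1]
print(renyi_entropy(node, alpha=2.0), tsallis_entropy(node, q=2.0))
```

With q = 2 the Tsallis entropy reduces to the Gini index, 1 minus the sum of squared class proportions, so the order parameter can be read as a way of tuning how strongly the impurity reacts to the minority class.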

An Automatic Question Generation System using Rule-Based Approach in Bloom’s Taxonomy

Recent Advances in Computer Science and Communications ◽

10.2174/2213275912666191113143335 ◽

2019 ◽

Vol 13 ◽

Author(s):

G Deena ◽

K Raja ◽

K Kannan

Keyword(s):

Language Processing ◽

Learning Process ◽

Question Generation ◽

Test Question ◽

Rule Based ◽

Part Of Speech ◽

Core Idea ◽

Rule Based Approach ◽

Teaching Learning ◽

Automatic Question Generation

Background: In this competitive world, education has become part of everyday life. Imparting knowledge to the learner through education is the core idea of the Teaching-Learning Process (TLP). An assessment is one way to identify the learner’s weak spots in the area under discussion. Assessment questions play a major role in judging the learner’s skill. When questions are prepared manually, their quality and fairness in assessing the learner’s cognitive skill are not assured. Question generation is the most important part of the teaching-learning process, and generating test questions is its hardest part. Methods: An Automatic Question Generation (AQG) system is proposed that dynamically generates assessment questions from an input file. Objective: The proposed system generates test questions that are mapped to Bloom’s taxonomy to determine the learner’s cognitive level. Cloze-type questions are generated using part-of-speech tags and a random function. Rule-based approaches and Natural Language Processing (NLP) techniques are used to generate procedural questions at the lowest cognitive levels of Bloom’s taxonomy. Analysis: The outputs are dynamic, producing a different set of questions at each execution. The input paragraph is selected from the computer science domain, and the quality of the output is measured using precision and recall.
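The idea of generating cloze questions from part-of-speech tags and a random function can be sketched in Python as follows. This is only an illustration under stated assumptions: a tiny hand-written noun list (`TOY_NOUNS`) stands in for a real part-of-speech tagger, and the function `make_cloze` is hypothetical rather than taken from the AQG system described in the abstract.

```python
import random

# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
TOY_NOUNS = {"compiler", "program", "code", "memory", "stack"}

def make_cloze(sentence, rng, blank="_____"):
    """Blank out one randomly chosen noun; return (question, answer) or None."""
    words = sentence.split()
    # Candidate positions: words that the toy lexicon tags as nouns.
    candidates = [
        i for i, word in enumerate(words)
        if word.strip(".,;:").lower() in TOY_NOUNS
    ]
    if not candidates:
        return None
    i = rng.choice(candidates)          # the "random function" step
    answer = words[i].strip(".,;:")
    words[i] = words[i].replace(answer, blank)
    return " ".join(words), answer

rng = random.Random(42)  # seeded only to make the example repeatable
sentence = "A compiler translates source code into machine instructions."
print(make_cloze(sentence, rng))
```

Because the blank position is drawn at random, repeated runs with different seeds yield different questions from the same paragraph, which matches the dynamic behavior the abstract describes.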

Non-Intrusive Electric Load identification using Wavelet Transform

Ingeniería e Investigación ◽

10.15446/ing.investig.v38n2.70550 ◽

2018 ◽

Vol 38 (2) ◽

pp. 42-51 ◽

Cited By ~ 1

Author(s):

José Antonio Hoyo-Montaño ◽

Jesús Naim Leon-Ortega ◽

Guillermo Valencia-Palomo ◽

Rafael Armando Galaz-Bustamante ◽

Daniel Fernando Espejel-Blanco ◽

...

Keyword(s):

Wavelet Transform ◽

Decision Tree ◽

Weighted Average ◽

Public Access ◽

Raspberry Pi ◽

Discrete Wavelet ◽

Load Monitoring ◽

Split Point ◽

Access Database ◽

Parseval’s Theorem

This paper shows the development of a decision tree for the classification of loads in a non-intrusive load monitoring (NILM) system implemented on a single-board computer (Raspberry Pi 3). The decision tree uses the total energy of the power signal of each piece of equipment, which is computed using a discrete wavelet transform and Parseval’s theorem. The power consumption data of different types of equipment were obtained from a public access database for NILM applications. The best split point for the design of the decision tree was determined using the weighted average Gini index. The tree was validated using loads available in the same public access database.
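Choosing a split point by the weighted average Gini index can be sketched in Python as below. The sketch evaluates every midpoint between consecutive distinct feature values and keeps the one with the lowest size-weighted impurity of the two child nodes; the function names and the energy values in the example are illustrative assumptions, not data or code from the paper.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) minimizing the weighted average Gini."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_score = None, float("inf")
    for k in range(1, n):
        # Only split between two distinct feature values.
        if pairs[k - 1][0] == pairs[k][0]:
            continue
        left = [label for _, label in pairs[:k]]
        right = [label for _, label in pairs[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold = (pairs[k - 1][0] + pairs[k][0]) / 2.0
            best_score = score
    return best_threshold, best_score

# Made-up signal energies for two kinds of load.
energy = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
load = ["lamp", "lamp", "lamp", "heater", "heater", "heater"]
print(best_split(energy, load))
```

A weighted Gini of zero means both child nodes are pure, so a single energy threshold separates the two load types in this toy example.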

Genetic feature selection in a fuzzy rule-based classification system learning process for high-dimensional problems

Information Sciences ◽

10.1016/s0020-0255(01)00147-5 ◽

2001 ◽

Vol 136 (1-4) ◽

pp. 135-157 ◽

Cited By ~ 101

Author(s):

J Casillas ◽

O Cordón ◽

M.J Del Jesus ◽

F Herrera

Keyword(s):

Feature Selection ◽

Learning Process ◽

Classification System ◽

Fuzzy Rule ◽

High Dimensional ◽

Rule Based ◽

Genetic Feature ◽

Genetic Feature Selection

Detection and Classification of Transmission Line Faults Using Empirical Mode Decomposition and Rule Based Decision Tree Based Algorithm

2018 IEEE 8th Power India International Conference (PIICON) ◽

10.1109/poweri.2018.8704372 ◽

2018 ◽

Cited By ~ 2

Author(s):

Balvinder Singh ◽

Om Prakash Mahela ◽

Tanuj Manglani

Keyword(s):

Decision Tree ◽

Transmission Line ◽

Empirical Mode Decomposition ◽

Rule Based ◽

Mode Decomposition

Fuzzy Rule Based Quality Measures for Adaptive Multimodal Biometric Fusion at Operation Time

Proceedings of the International Conference on Fuzzy Computation Theory and Applications ◽

10.5220/0005126301460152 ◽

2014 ◽

Cited By ~ 2

Author(s):

Madeena Sultana ◽

Marina Gavrilova ◽

Svetlana Yanushkevich

Keyword(s):

Fuzzy Rule ◽

Operation Time ◽

Quality Measures ◽

Rule Based ◽

Biometric Fusion

Constructing Cost-Sensitive Fuzzy-Rule-Based Systems for Pattern Classification Problems

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2007.p0546 ◽

2007 ◽

Vol 11 (6) ◽

pp. 546-553 ◽

Cited By ~ 4

Author(s):

Tomoharu Nakashima ◽

Yasuyuki Yokota ◽

Hisao Ishibuchi ◽

Gerald Schaefer ◽

...

Keyword(s):

Pattern Classification ◽

A Priori ◽

Fuzzy Rule ◽

Fuzzy Classification ◽

Classification Error ◽

Rule Generation ◽

Classification Problems ◽

Misclassification Cost ◽

Rule Based ◽

Rule Based Systems

We evaluate the performance of cost-sensitive fuzzy-rule-based systems for pattern classification problems. We assume that a misclassification cost is given a priori for each training pattern. The task of classification thus becomes to minimize both classification error and misclassification cost. We examine the performance of two types of fuzzy classification based on fuzzy if-then rules generated from training patterns. The difference is whether or not they consider misclassification costs in rule generation. In our computational experiments, we use several specifications of misclassification cost to evaluate the performance of the two classifiers. Experimental results show that both classification error and misclassification cost are reduced by considering the misclassification cost in fuzzy rule generation.
