A case study of optimal decision tree construction for RFKON database

2021 ◽ Vol 20 (Number 2) ◽ pp. 249-276

Author(s): Sunil Kumar ◽ Saroj Ratnoo ◽ Jyoti Vashishtha

Keyword(s): Decision Tree ◽ Heuristic Approach ◽ Decision Tree Model ◽ Evolutionary Approach ◽ Optimal Decision ◽ Decision Tree Classifier ◽ Tree Model ◽ Tree Construction ◽ Tree Classifier ◽ Optimal Values

Decision tree models have earned a special status in predictive modeling because they are considered comprehensible for human analysis and insight. The Classification and Regression Tree (CART) algorithm is one of the best-known decision tree induction algorithms and addresses both classification and regression problems. Finding optimal values for the hyperparameters of a decision tree construction algorithm is challenging. To build an effective decision tree classifier with high accuracy and comprehensibility, we need to set optimal values for hyperparameters such as the maximum size of the tree, the minimum number of instances required in a node for inducing a split, the node splitting criterion, and the amount of pruning. The hyperparameter setting influences the performance of the decision tree model, and no single setting works equally well across datasets: a setting that gives an optimal decision tree for one dataset may produce a sub-optimal model for another. In this paper, we present a hyper-heuristic approach for tuning the hyperparameters of Recursive and Partition Trees (rpart), a typical implementation of CART in the statistical and data analytics package R. We employ an evolutionary algorithm as the hyper-heuristic for tuning the hyperparameters of the decision tree classifier. The approach is named Hyper-heuristic Evolutionary Approach with Recursive and Partition Trees (HEARpart). The proposed approach is validated on 30 datasets. Statistical tests show that HEARpart performs significantly better than WEKA’s J48 algorithm in terms of error rate, F-measure, and tree size. Further, the suggested hyper-heuristic algorithm constructs significantly more comprehensible models than WEKA’s J48, CART, and other similar decision tree construction strategies. The results also show that the accuracy achieved by the hyper-heuristic approach is slightly lower than that of the other approaches compared.
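As a rough illustration of the idea in this abstract, the sketch below runs a small evolutionary search over decision tree hyperparameters. It is not the HEARpart algorithm: the search space only loosely mirrors rpart-style controls, the ranges are assumptions, and `toy_fitness` is a hypothetical stand-in for training a tree and scoring its accuracy and size.

```python
import random

# Assumed search space; names loosely mirror rpart-style controls,
# ranges are illustrative and not taken from the paper.
SPACE = {
    "maxdepth": list(range(1, 31)),
    "minsplit": list(range(2, 61)),
    "split": ["gini", "information"],
    "cp": [0.0, 0.001, 0.005, 0.01, 0.05, 0.1],
}

def random_config(rng):
    """Draw one hyperparameter setting uniformly from the space."""
    return {name: rng.choice(values) for name, values in SPACE.items()}

def mutate(config, rng, rate=0.3):
    """Resample each hyperparameter with probability `rate`."""
    child = dict(config)
    for name, values in SPACE.items():
        if rng.random() < rate:
            child[name] = rng.choice(values)
    return child

def crossover(a, b, rng):
    """Uniform crossover: take each hyperparameter from either parent."""
    return {name: (a if rng.random() < 0.5 else b)[name] for name in SPACE}

def evolve(fitness, generations=30, pop_size=20, elite=4, seed=0):
    """Elitist evolutionary loop; returns the fittest setting found."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:elite]  # the best settings survive unchanged
        children = []
        while len(children) < pop_size - elite:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=fitness)

def toy_fitness(config):
    # Hypothetical stand-in: a real run would fit a tree with `config`
    # and return, e.g., cross-validated accuracy minus a size penalty.
    return (-abs(config["maxdepth"] - 6)
            - abs(config["minsplit"] - 20) / 10
            - config["cp"])

best = evolve(toy_fitness)
print(best)
```

Because the fittest settings are carried over unchanged each generation, the best fitness never decreases. In practice the fitness call dominates the cost, since each evaluation means fitting and validating a tree.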

Noise Reduction Approach for Decision Tree Construction: A Case Study of Knowledge Discovery on Climate and Air Pollution

2007 IEEE Symposium on Computational Intelligence and Data Mining ◽ 10.1109/cidm.2007.368944 ◽ 2007 ◽ Cited By ~ 4

Author(s): Kyoko Fukuda

Keyword(s): Air Pollution ◽ Decision Tree ◽ Noise Reduction ◽ Knowledge Discovery ◽ Tree Construction ◽ Reduction Approach

Minimum Query Set for Decision Tree Construction

Entropy ◽ 10.3390/e23121682 ◽ 2021 ◽ Vol 23 (12) ◽ pp. 1682

Author(s): Wojciech Wieczorek ◽ Jan Kozak ◽ Łukasz Strąk ◽ Arkadiusz Nowakowski

Keyword(s): Genetic Algorithm ◽ Decision Tree ◽ Programming Model ◽ Building Blocks ◽ Optimal Decision ◽ Second Stage ◽ Tree Construction ◽ Series Of Experiments ◽ Definition Of ◽ Classification Quality

A new two-stage method for the construction of a decision tree is developed. The first stage is based on the definition of a minimum query set, which is the smallest set of attribute-value pairs for which any two objects can be distinguished. To obtain this set, an appropriate linear programming model is proposed. The queries from this set are building blocks of the second stage in which we try to find an optimal decision tree using a genetic algorithm. In a series of experiments, we show that for some databases, our approach should be considered as an alternative method to classical ones (CART, C4.5) and other heuristic approaches in terms of classification quality.
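To make the notion of a minimum query set concrete, the sketch below finds one by exhaustive search on a tiny table. This swaps in brute force for the linear programming model the abstract mentions, so it is only practical for very small inputs. The toy table and all function names are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy table: each object maps attribute -> value.
objects = [
    {"outlook": "sunny", "wind": "weak", "humidity": "high"},
    {"outlook": "sunny", "wind": "strong", "humidity": "high"},
    {"outlook": "rain", "wind": "weak", "humidity": "high"},
    {"outlook": "rain", "wind": "strong", "humidity": "normal"},
]

def all_queries(objs):
    """Every attribute-value pair that occurs in the table."""
    return sorted({(a, v) for o in objs for a, v in o.items()})

def answers(obj, queries):
    """Yes/no answer of one object to each query 'is attribute == value?'."""
    return tuple(obj[a] == v for a, v in queries)

def distinguishes_all(objs, queries):
    """True if no two objects give the same answers to all queries."""
    return len({answers(o, queries) for o in objs}) == len(objs)

def minimum_query_set(objs):
    """Smallest query subset separating every pair of objects (brute force)."""
    queries = all_queries(objs)
    for size in range(len(queries) + 1):
        for subset in combinations(queries, size):
            if distinguishes_all(objs, subset):
                return list(subset)
    return None  # at least two objects are identical on every attribute

mqs = minimum_query_set(objects)
print(mqs)
```

Four objects need at least two yes/no queries to be told apart, and this table admits a set of exactly two. The search is exponential in the number of attribute-value pairs, which is presumably why an optimization model is preferable at realistic sizes.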

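Penerapan Metode Klasifikasi Decision Tree dan Algoritma C4.5 dalam Memprediksi Kriteria Nasabah Kredit Mega Auto Finance [Application of the Decision Tree Classification Method and the C4.5 Algorithm in Predicting the Criteria of Mega Auto Finance Credit Customers]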

JURIKOM (Jurnal Riset Komputer) ◽ 10.30865/jurikom.v7i2.1762 ◽ 2020 ◽ Vol 7 (2) ◽ pp. 200

Author(s): Puji Santoso ◽ Rudy Setiawan

Keyword(s): Data Mining ◽ Decision Tree ◽ Microsoft Excel ◽ Customer Data ◽ Data Mining Techniques ◽ C4.5 Algorithm ◽ Marketing Costs ◽ Excel Format ◽ Data Mining Application

One of the tasks in marketing finance is to analyze customer data to find out which customers are likely to take out credit again. The current method treats every customer who has completed their credit installments as a marketing target, which leads to high operational marketing costs. This research was therefore conducted to help solve this problem by designing a data mining application that predicts the criteria of credit customers who are likely to take out credit again with Mega Auto Finance. The Mega Auto Finance Funds Section in Kotim Regency was chosen by the researchers as a case study, on the assumption that it has experienced the problem described above. The data mining technique applied in the application is classification, the classification method used is the decision tree, and the algorithm used to build the decision tree is the C4.5 algorithm. The data processed in this study is the installment data of Mega Auto Finance loan customers for July 2018, in Microsoft Excel format. The result of this study is an application that can help the Mega Auto Finance Funds Section obtain credit marketing targets in the future.

ASF/DT, adaptive step forward decision tree construction

2012 International Conference on Wavelet Analysis and Pattern Recognition ◽ 10.1109/icwapr.2012.6294764 ◽ 2012

Author(s): Tai-Zhe Tan ◽ Ying-Yi Liang

Keyword(s): Decision Tree ◽ Tree Construction ◽ Adaptive Step

Efficient Decision Tree Construction for Classifying Numerical Data

2009 International Conference on Advances in Recent Technologies in Communication and Computing ◽ 10.1109/artcom.2009.172 ◽ 2009 ◽ Cited By ~ 1

Author(s): Sushma Nandagaonkar ◽ Vahida Z. Attar ◽ Pradip K. Sinha

Keyword(s): Decision Tree ◽ Numerical Data ◽ Tree Construction

An Efficient Decision Tree Construction for Large Datasets

2007 Innovations in Information Technologies (IIT) ◽ 10.1109/iit.2007.4430464 ◽ 2007

Author(s): Uyen Nguyen Thi Van ◽ Tae Choong Chung

Keyword(s): Decision Tree ◽ Large Datasets ◽ Tree Construction

Research on Online Scene Teaching Mode of Tobacco Picking Decision Tree Construction Process Integrating Deep Learning

Tobacco Regulatory Science ◽ 10.18001/trs.7.5.1.78 ◽ 2021 ◽ Vol 7 (5) ◽ pp. 3076-3086

Author(s): Zhang Shuili ◽ Zhao Yi ◽ Zheng Kexin ◽ Zhang Jun ◽ Zheng Fuchun

Keyword(s): Deep Learning ◽ Decision Tree ◽ Online Teaching ◽ Information Gain ◽ Teaching Evaluation ◽ Teaching Process ◽ Teaching Mode ◽ Teaching Interaction ◽ Tree Construction ◽ Information And Communication

Objectives: Given the characteristics of online teaching during the coronavirus pandemic and the importance of practical teaching for training students’ skills in graduate education, this paper proposes an online scene teaching mode that takes projects as the carrier and integrates deep learning. To meet the demand for information and communication engineering professionals in the big data context, the whole teaching process is divided into four stages: topic selection, teaching project setting, online teaching interaction, and teaching evaluation. In the teaching of Python Data Analysis Foundations, the project “establishment process of tobacco picking decision tree based on information gain” is taken as the teaching case. Prior knowledge and references are pushed through the cloud platform before class. In the online classroom, a scene in which tobacco picking is affected by the weather is set to guide students to seek solutions to problems, the results are presented graphically to help students summarize, and the scene is then reset to promote knowledge transfer. In this way deep learning is integrated into the teaching process, and the corresponding stages are modified according to the teaching evaluation results. The content of the scene progresses from easy to difficult, from simple to complex, and from least to most, which enhances students’ learning interest and sense of achievement. Meanwhile, students’ initiative in participating in curriculum research further strengthens the effectiveness of the course in serving scientific research, which gives the approach a certain value for popularization and application.
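The teaching case named in this abstract builds a decision tree by information gain. A minimal sketch of that calculation is given below. The records are invented for illustration and do not come from the paper, and the attribute names (`weather`, `maturity`) are assumptions.

```python
from collections import Counter
from math import log2

# Hypothetical records: (weather, maturity, pick?) ; illustrative only.
records = [
    ("sunny", "ripe", "yes"),
    ("sunny", "unripe", "no"),
    ("rainy", "ripe", "no"),
    ("rainy", "unripe", "no"),
    ("cloudy", "ripe", "yes"),
    ("cloudy", "unripe", "no"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attr_index, label_index=-1):
    """Entropy of the labels minus the weighted entropy after splitting."""
    labels = [r[label_index] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[attr_index], []).append(r[label_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

gain_weather = information_gain(records, 0)
gain_maturity = information_gain(records, 1)
print(f"weather: {gain_weather:.3f}, maturity: {gain_maturity:.3f}")
```

On these made-up records, splitting on maturity yields the larger gain, so an information-gain tree would place that attribute at the root and then recurse on each branch.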
