A case study of optimal decision tree construction for RFKON database

Author(s):  
Sinem Bozkurt Keser ◽  
Ugur Yayan
2021 ◽  
Vol 20 (Number 2) ◽  
pp. 249-276
Author(s):  
Sunil Kumar ◽  
Saroj Ratnoo ◽  
Jyoti Vashishtha

Decision tree models have earned a special status in predictive modeling because they are considered comprehensible for human analysis and insight. The Classification and Regression Tree (CART) algorithm is one of the best-known decision tree induction algorithms for both classification and regression problems. Finding optimal values for the hyperparameters of a decision tree construction algorithm is challenging. To build an effective decision tree classifier with high accuracy and comprehensibility, we need to set optimal values for hyperparameters such as the maximum size of the tree, the minimum number of instances required in a node for inducing a split, the node splitting criterion, and the amount of pruning. The hyperparameter setting influences the performance of the decision tree model, and no single setting works equally well for different datasets: a setting that gives an optimal decision tree for one dataset may produce a sub-optimal model for another. In this paper, we present a hyper-heuristic approach for tuning the hyperparameters of Recursive and Partition Trees (rpart), a typical implementation of CART in the statistical and data analytics package R. We employ an evolutionary algorithm as the hyper-heuristic for tuning the hyperparameters of the decision tree classifier. The approach is named Hyper-heuristic Evolutionary Approach with Recursive and Partition Trees (HEARpart). The proposed approach is validated on 30 datasets. Statistical tests show that HEARpart performs significantly better than WEKA’s J48 algorithm in terms of error rate, F-measure, and tree size. Further, the suggested hyper-heuristic algorithm constructs significantly more comprehensible models than WEKA’s J48, CART, and other similar decision tree construction strategies.
The results show that the accuracy achieved by the hyper-heuristic approach is slightly lower than that of the other approaches compared.
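The idea of using an evolutionary algorithm to search over decision tree hyperparameters can be illustrated with a minimal Python sketch. This is not the HEARpart implementation: the search space, its ranges, the elitist loop, and the `toy_fitness` function are all assumptions made for illustration. In a real setting the fitness would train a tree (for example with rpart) and return a cross-validated error rate.

```python
import random

# Search space loosely modeled on the hyperparameters named in the abstract
# (tree size, minimum instances per split, splitting criterion, pruning).
# The names and ranges are illustrative assumptions, not values from the paper.
SPACE = {
    "maxdepth": list(range(1, 31)),
    "minsplit": list(range(2, 61)),
    "criterion": ["gini", "information"],
    "cp": [i / 1000 for i in range(0, 101)],
}

def random_individual(rng):
    """Draw one hyperparameter setting uniformly from the search space."""
    return {name: rng.choice(values) for name, values in SPACE.items()}

def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return {name: (a if rng.random() < 0.5 else b)[name] for name in SPACE}

def mutate(ind, rng, rate=0.2):
    """Resample each gene with probability `rate`."""
    return {
        name: (rng.choice(SPACE[name]) if rng.random() < rate else value)
        for name, value in ind.items()
    }

def evolve(fitness, generations=30, pop_size=20, seed=0):
    """Minimize `fitness` with a simple elitist evolutionary loop."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        elite = population[: pop_size // 2]  # the better half survives unchanged
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    return min(population, key=fitness)

def toy_fitness(ind):
    # Hypothetical stand-in for "train a tree and return its error rate".
    # It only rewards settings near an arbitrary target so the loop can run.
    return (
        abs(ind["maxdepth"] - 6)
        + abs(ind["minsplit"] - 20) / 10
        + abs(ind["cp"] - 0.01) * 100
        + (0.0 if ind["criterion"] == "gini" else 0.5)
    )

best = evolve(toy_fitness)
print(best, toy_fitness(best))
```

Because the better half of each generation is carried over unchanged, the best fitness found never gets worse from one generation to the next.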


Entropy ◽  
2021 ◽  
Vol 23 (12) ◽  
pp. 1682
Author(s):  
Wojciech Wieczorek ◽  
Jan Kozak ◽  
Łukasz Strąk ◽  
Arkadiusz Nowakowski

A new two-stage method for the construction of a decision tree is developed. The first stage is based on the definition of a minimum query set, which is the smallest set of attribute-value pairs for which any two objects can be distinguished. To obtain this set, an appropriate linear programming model is proposed. The queries from this set are building blocks of the second stage in which we try to find an optimal decision tree using a genetic algorithm. In a series of experiments, we show that for some databases, our approach should be considered as an alternative method to classical ones (CART, C4.5) and other heuristic approaches in terms of classification quality.
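The notion of a minimum query set can be made concrete with a small Python sketch. The abstract obtains this set with a linear programming model; the sketch below swaps in an exhaustive search over subsets, which is only practical for tiny tables. The function names and the four-object example table are hypothetical.

```python
from itertools import combinations

def answers(obj, query):
    """A query is an (attribute, value) pair; the answer is whether obj matches it."""
    attribute, value = query
    return obj[attribute] == value

def distinguishes(queries, objects):
    """True if every pair of objects differs on at least one query."""
    signatures = [tuple(answers(o, q) for q in queries) for o in objects]
    return len(set(signatures)) == len(objects)

def minimum_query_set(objects):
    """Smallest set of attribute-value pairs that tells all objects apart.

    Brute force over subsets in order of increasing size (not the LP model
    from the abstract). Returns None if the objects cannot be separated,
    for example when two of them are identical.
    """
    queries = sorted({(a, v) for o in objects for a, v in o.items()})
    for size in range(len(queries) + 1):
        for subset in combinations(queries, size):
            if distinguishes(subset, objects):
                return list(subset)
    return None

# Hypothetical toy table with two binary attributes.
objects = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
result = minimum_query_set(objects)
print(result)
```

For this table two queries are needed, since a single yes/no query can separate the four objects into at most two groups.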


2020 ◽  
Vol 7 (2) ◽  
pp. 200
Author(s):  
Puji Santoso ◽  
Rudy Setiawan

One of the tasks in marketing finance is to analyze customer data to find out which customers have the potential to take out credit again. The method currently used is to classify all customers who have completed their credit installments as marketing targets, which causes high operational marketing costs. This research was therefore conducted to help solve this problem by designing a data mining application that predicts the criteria of credit customers with the potential to take out credit again at Mega Auto Finance. The Mega Auto Finance Fund Section located in Kotim Regency was chosen by the researchers as a case study, on the assumption that it has experienced the problems described above. The data mining technique applied in the application is classification, the classification method used is the decision tree, and the algorithm used to build the decision tree is C4.5. The data processed in this study are the installment data of Mega Auto Finance loan customers for July 2018, in Microsoft Excel format. The result of this study is an application that can help the Mega Auto Finance Fund Section obtain credit marketing targets in the future.


2021 ◽  
Vol 7 (5) ◽  
pp. 3076-3086
Author(s):  
Zhang Shuili ◽  
Zhao Yi ◽  
Zheng Kexin ◽  
Zhang Jun ◽  
Zheng Fuchun

Objectives: In view of the characteristics of online teaching during the coronavirus pandemic and the importance of practical teaching in training students’ skills during graduate education, this paper proposes an online scene teaching mode that takes projects as the carrier and integrates deep learning. To meet the demand for information and communication engineering professionals in the big data context, the whole teaching process is divided into four stages: topic selection, teaching project setting, online teaching interaction, and teaching evaluation. In the course Python Data Analysis Foundations, the project “establishment process of tobacco picking decision tree based on information gain” is taken as the teaching case. Prior knowledge and references are pushed through the cloud platform before class. A scene of tobacco picking affected by the weather is set in the online classroom to guide students to seek solutions to problems; the results are presented graphically to help students summarize; and the scene is then reset to promote knowledge transfer. In this way deep learning is integrated into the teaching process, and the corresponding stages are modified according to the teaching evaluation results. The content of the scene progresses from easy to difficult, from simple to complex, and from less to more, which enhances students’ learning interest and sense of achievement. Meanwhile, students’ initiative in participating in curriculum research further strengthens the effectiveness of the course in serving scientific research, which gives the approach value for wider adoption and application.
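The information gain criterion named in the teaching case can be sketched in a few lines of Python. The records below are hypothetical and are not the course's data; the attribute names `weather`, `wind`, and `pick` are assumptions chosen to echo the tobacco picking scene.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus its weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical records: whether tobacco is picked under given conditions.
rows = [
    {"weather": "sunny", "wind": "weak", "pick": "yes"},
    {"weather": "sunny", "wind": "strong", "pick": "yes"},
    {"weather": "rainy", "wind": "weak", "pick": "no"},
    {"weather": "rainy", "wind": "strong", "pick": "no"},
]

gains = {a: information_gain(rows, a, "pick") for a in ("weather", "wind")}
best_attribute = max(gains, key=gains.get)
print(gains, best_attribute)
```

In this toy table `weather` fully determines the decision and `wind` carries no information, so a tree built by information gain would split on `weather` first.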

