scholarly journals Analysis of Decision Tree Induction Algorithms

2019 ◽  
Vol 8 (11) ◽  
pp. e298111473
Author(s):  
Hugo Kenji Rodrigues Okada ◽  
Andre Ricardo Nascimento das Neves ◽  
Ricardo Shitsuka

Decision trees are data structures or computational methods that enable nonparametric supervised machine learning and are used in classification and regression tasks. The aim of this paper is to present a comparison between the decision tree induction algorithms C4.5 and CART. A quantitative study is performed in which the two methods are compared by analyzing the following aspects: operation and complexity. The experiments presented practically equal hit percentages in the execution time for tree induction, however, the CART algorithm was approximately 46.24% slower than C4.5 and was considered to be more effective.

2021 ◽  
Vol 11 (15) ◽  
pp. 6728
Author(s):  
Muhammad Asfand Hafeez ◽  
Muhammad Rashid ◽  
Hassan Tariq ◽  
Zain Ul Abideen ◽  
Saud S. Alotaibi ◽  
...  

Classification and regression are the major applications of machine learning algorithms which are widely used to solve problems in numerous domains of engineering and computer science. Different classifiers based on the optimization of the decision tree have been proposed, however, it is still evolving over time. This paper presents a novel and robust classifier based on a decision tree and tabu search algorithms, respectively. In the aim of improving performance, our proposed algorithm constructs multiple decision trees while employing a tabu search algorithm to consistently monitor the leaf and decision nodes in the corresponding decision trees. Additionally, the used tabu search algorithm is responsible to balance the entropy of the corresponding decision trees. For training the model, we used the clinical data of COVID-19 patients to predict whether a patient is suffering. The experimental results were obtained using our proposed classifier based on the built-in sci-kit learn library in Python. The extensive analysis for the performance comparison was presented using Big O and statistical analysis for conventional supervised machine learning algorithms. Moreover, the performance comparison to optimized state-of-the-art classifiers is also presented. The achieved accuracy of 98%, the required execution time of 55.6 ms and the area under receiver operating characteristic (AUROC) for proposed method of 0.95 reveals that the proposed classifier algorithm is convenient for large datasets.


Electronics ◽  
2021 ◽  
Vol 10 (13) ◽  
pp. 1578
Author(s):  
Daniel Szostak ◽  
Adam Włodarczyk ◽  
Krzysztof Walkowiak

Rapid growth of network traffic causes the need for the development of new network technologies. Artificial intelligence provides suitable tools to improve currently used network optimization methods. In this paper, we propose a procedure for network traffic prediction. Based on optical networks’ (and other network technologies) characteristics, we focus on the prediction of fixed bitrate levels called traffic levels. We develop and evaluate two approaches based on different supervised machine learning (ML) methods—classification and regression. We examine four different ML models with various selected features. The tested datasets are based on real traffic patterns provided by the Seattle Internet Exchange Point (SIX). Obtained results are analyzed using a new quality metric, which allows researchers to find the best forecasting algorithm in terms of network resources usage and operational costs. Our research shows that regression provides better results than classification in case of all analyzed datasets. Additionally, the final choice of the most appropriate ML algorithm and model should depend on the network operator expectations.


2021 ◽  
Vol 54 (1) ◽  
pp. 1-38
Author(s):  
Víctor Adrián Sosa Hernández ◽  
Raúl Monroy ◽  
Miguel Angel Medina-Pérez ◽  
Octavio Loyola-González ◽  
Francisco Herrera

Experts from different domains have resorted to machine learning techniques to produce explainable models that support decision-making. Among existing techniques, decision trees have been useful in many application domains for classification. Decision trees can make decisions in a language that is closer to that of the experts. Many researchers have attempted to create better decision tree models by improving the components of the induction algorithm. One of the main components that have been studied and improved is the evaluation measure for candidate splits. In this article, we introduce a tutorial that explains decision tree induction. Then, we present an experimental framework to assess the performance of 21 evaluation measures that produce different C4.5 variants considering 110 databases, two performance measures, and 10× 10-fold cross-validation. Furthermore, we compare and rank the evaluation measures by using a Bayesian statistical analysis. From our experimental results, we present the first two performance rankings in the literature of C4.5 variants. Moreover, we organize the evaluation measures into two groups according to their performance. Finally, we introduce meta-models that automatically determine the group of evaluation measures to produce a C4.5 variant for a new database and some further opportunities for decision tree models.


2021 ◽  
Vol 10 (1) ◽  
pp. 99
Author(s):  
Sajad Yousefi

Introduction: Heart disease is often associated with conditions such as clogged arteries due to the sediment accumulation which causes chest pain and heart attack. Many people die due to the heart disease annually. Most countries have a shortage of cardiovascular specialists and thus, a significant percentage of misdiagnosis occurs. Hence, predicting this disease is a serious issue. Using machine learning models performed on multidimensional dataset, this article aims to find the most efficient and accurate machine learning models for disease prediction.Material and Methods: Several algorithms were utilized to predict heart disease among which Decision Tree, Random Forest and KNN supervised machine learning are highly mentioned. The algorithms are applied to the dataset taken from the UCI repository including 294 samples. The dataset includes heart disease features. To enhance the algorithm performance, these features are analyzed, the feature importance scores and cross validation are considered.Results: The algorithm performance is compared with each other, so that performance based on ROC curve and some criteria such as accuracy, precision, sensitivity and F1 score were evaluated for each model. As a result of evaluation, Accuracy, AUC ROC are 83% and 99% respectively for Decision Tree algorithm. Logistic Regression algorithm with accuracy and AUC ROC are 88% and 91% respectively has better performance than other algorithms. Therefore, these techniques can be useful for physicians to predict heart disease patients and prescribe them correctly.Conclusion: Machine learning technique can be used in medicine for analyzing the related data collections to a disease and its prediction. The area under the ROC curve and evaluating criteria related to a number of classifying algorithms of machine learning to evaluate heart disease and indeed, the prediction of heart disease is compared to determine the most appropriate classification. As a result of evaluation, better performance was observed in both Decision Tree and Logistic Regression models.


Author(s):  
Marek Kretowski ◽  
Marcin Czajkowski

Decision trees represent one of the main predictive techniques in knowledge discovery. This chapter describes evolutionary induced trees, which are emerging alternatives to the greedy top-down solutions. Most typical tree-based system searches only for locally optimal decisions at each node and do not guarantee the optimal solution. Application of evolutionary algorithms to the problem of decision tree induction allows searching for the structure of the tree, tests in internal nodes and regression functions in the leaves (for model trees) at the same time. As a result, such globally induced decision tree is able to avoid local optima and usually leads to better prediction than the greedy counterparts.


2018 ◽  
Vol 5 (suppl_1) ◽  
pp. S618-S618
Author(s):  
Philip Zachariah ◽  
Elioth Mirsha Sanabria Buenaventura ◽  
Jianfang Liu ◽  
Bevin Cohen ◽  
David Yao ◽  
...  

2019 ◽  
Vol 1 (1) ◽  
pp. 384-399 ◽  
Author(s):  
Thais de Toledo ◽  
Nunzio Torrisi

The Distributed Network Protocol (DNP3) is predominately used by the electric utility industry and, consequently, in smart grids. The Peekaboo attack was created to compromise DNP3 traffic, in which a man-in-the-middle on a communication link can capture and drop selected encrypted DNP3 messages by using support vector machine learning algorithms. The communication networks of smart grids are a important part of their infrastructure, so it is of critical importance to keep this communication secure and reliable. The main contribution of this paper is to compare the use of machine learning techniques to classify messages of the same protocol exchanged in encrypted tunnels. The study considers four simulated cases of encrypted DNP3 traffic scenarios and four different supervised machine learning algorithms: Decision tree, nearest-neighbor, support vector machine, and naive Bayes. The results obtained show that it is possible to extend a Peekaboo attack over multiple substations, using a decision tree learning algorithm, and to gather significant information from a system that communicates using encrypted DNP3 traffic.


Author(s):  
M. Carr ◽  
V. Ravi ◽  
G. Sridharan Reddy ◽  
D. Veranna

This paper profiles mobile banking users using machine learning techniques viz. Decision Tree, Logistic Regression, Multilayer Perceptron, and SVM to test a research model with fourteen independent variables and a dependent variable (adoption). A survey was conducted and the results were analysed using these techniques. Using Decision Trees the profile of the mobile banking adopter’s profile was identified. Comparing different machine learning techniques it was found that Decision Trees outperformed the Logistic Regression and Multilayer Perceptron and SVM. Out of all the techniques, Decision Tree is recommended for profiling studies because apart from obtaining high accurate results, it also yields ‘if–then’ classification rules. The classification rules provided here can be used to target potential customers to adopt mobile banking by offering them appropriate incentives.


2002 ◽  
Vol 13 (03) ◽  
pp. 445-458 ◽  
Author(s):  
HANS ZANTEMA ◽  
HANS L. BODLAENDER

Decision tables provide a natural framework for knowledge acquisition and representation in the area of knowledge based information systems. Decision trees provide a standard method for inductive inference in the area of machine learning. In this paper we show how decision tables can be considered as ordered decision trees: decision trees satisfying an ordering restriction on the nodes. Every decision tree can be represented by an equivalent ordered decision tree, but we show that doing so may exponentially blow up sizes, even if the choice of the order is left free. Our main result states that finding an ordered decision tree of minimal size that represents the same function as a given ordered decision tree is an NP-hard problem; in earlier work we obtained a similar result for unordered decision trees.


2009 ◽  
Vol 18 (04) ◽  
pp. 613-620 ◽  
Author(s):  
ADAM H. PETERSON ◽  
TONY R. MARTINEZ

This research presents a new learning model, the Parallel Decision DAG (PDDAG), and shows how to use it to represent an ensemble of decision trees while using significantly less storage. Ensembles such as Bagging and Boosting have a high probability of encoding redundant data structures, and PDDAGs provide a way to remove this redundancy in decision tree based ensembles. When trained by encoding an ensemble, the new model behaves similar to the original ensemble, and can be made to perform identically to it. The reduced storage requirements allow an ensemble approach to be used in cases where storage requirements would normally be exceeded, and the smaller model can potentially execute faster by reducing redundant computation.


Sign in / Sign up

Export Citation Format

Share Document