Decision Tree Induction
Recently Published Documents

TOTAL DOCUMENTS: 265 (five years: 25)
H-INDEX: 27 (five years: 0)

Author(s): Ferdinand Bollwein, Stephan Westphal

Abstract: Univariate decision tree induction methods for multiclass classification problems, such as CART, C4.5 and ID3, remain very popular in machine learning because of their major benefit of being easy to interpret. However, since these trees consider only a single attribute per node, they often grow quite large, which lowers their explanatory value. Oblique decision tree building algorithms, which divide the feature space by multidimensional hyperplanes, often produce much smaller trees, but the individual splits are hard to interpret. Moreover, the effort of finding optimal oblique splits is so high that heuristics have to be applied to determine locally optimal solutions. In this work, we introduce an effective branch-and-bound procedure to determine globally optimal bivariate oblique splits for concave impurity measures. Decision trees based on these bivariate oblique splits remain fairly interpretable due to the restriction to two attributes per split. The resulting trees are significantly smaller and more accurate than their univariate counterparts because they adapt better to the underlying data and capture interactions of attribute pairs. Moreover, our evaluation shows that our algorithm even outperforms algorithms based on heuristically obtained multivariate oblique splits, despite being restricted to two attributes per split.
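As a rough illustration of what a bivariate oblique split looks like (not the branch-and-bound procedure of the paper), the following Python sketch scores one candidate split on an attribute pair (i, j), defined by a weight vector w and a threshold t, under the Gini impurity, one of the concave impurity measures such methods support. All function and variable names are illustrative assumptions.

```python
import numpy as np

def gini(labels):
    """Gini impurity (a concave impurity measure) of a class-label vector."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def bivariate_oblique_split_score(X, y, i, j, w, t):
    """Weighted impurity of the split w[0]*x_i + w[1]*x_j <= t,
    restricted to the single attribute pair (i, j)."""
    left = X[:, i] * w[0] + X[:, j] * w[1] <= t
    n = len(y)
    return (left.sum() / n) * gini(y[left]) + ((~left).sum() / n) * gini(y[~left])
```

A brute-force search would score many (w, t) candidates with such a function; the paper's contribution is an exact branch-and-bound search over these bivariate splits, which is not reproduced here.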


2021, Vol. 54 (1), pp. 1-38
Author(s): Víctor Adrián Sosa Hernández, Raúl Monroy, Miguel Angel Medina-Pérez, Octavio Loyola-González, Francisco Herrera

Experts from different domains have resorted to machine learning techniques to produce explainable models that support decision-making. Among existing techniques, decision trees have been useful for classification in many application domains, since they can express decisions in a language close to that of the experts. Many researchers have attempted to create better decision tree models by improving the components of the induction algorithm. One of the main components that has been studied and improved is the evaluation measure for candidate splits. In this article, we first present a tutorial on decision tree induction. We then present an experimental framework to assess the performance of 21 evaluation measures that produce different C4.5 variants, considering 110 databases, two performance measures, and 10×10-fold cross-validation. Furthermore, we compare and rank the evaluation measures using a Bayesian statistical analysis. From our experimental results, we present the first two performance rankings of C4.5 variants in the literature. Moreover, we organize the evaluation measures into two groups according to their performance. Finally, we introduce meta-models that automatically determine which group of evaluation measures to use when producing a C4.5 variant for a new database, and we outline further opportunities for decision tree models.
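For context, the standard evaluation measure in C4.5 is the gain ratio; the 21 measures compared in the article replace exactly this component of the induction algorithm. Below is a minimal sketch of how such a measure scores a candidate split, with illustrative function names (this is the textbook gain ratio, not one of the article's variants).

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a non-empty class-label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(y, partition):
    """C4.5-style gain ratio of a candidate split.
    `partition` is a list of index arrays, one per branch of the split."""
    n = len(y)
    info_gain = entropy(y) - sum(
        len(idx) / n * entropy(y[idx]) for idx in partition if len(idx) > 0
    )
    split_info = -sum(
        len(idx) / n * np.log2(len(idx) / n) for idx in partition if len(idx) > 0
    )
    return info_gain / split_info if split_info > 0 else 0.0
```

During induction, the attribute (and threshold, for numeric attributes) with the highest score is selected at each node; swapping in a different evaluation measure yields a different C4.5 variant.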


2020, Vol. 14 (4), pp. 521-533
Author(s): Victor A. E. Farias, Felipe T. Brito, Cheryl Flynn, Javam C. Machado, Subhabrata Majumdar, ...

Differential privacy is the state-of-the-art formal definition for data release under strong privacy guarantees. A variety of mechanisms have been proposed in the literature for releasing the noisy output of numeric queries (e.g., using the Laplace mechanism), based on the notions of global sensitivity and local sensitivity. However, while there has been some work on generic mechanisms for releasing the output of non-numeric queries using global sensitivity (e.g., the exponential mechanism), the literature lacks generic mechanisms that use local sensitivity to reduce the noise in non-numeric query outputs. In this work, we remedy this shortcoming and present the local dampening mechanism. We adapt the notion of local sensitivity to the non-numeric setting and leverage it to design a generic non-numeric mechanism. We illustrate the effectiveness of the local dampening mechanism by applying it to two diverse problems: (i) influential node analysis, where, given an influence metric, we release the top-k most influential nodes while preserving the privacy of the relationships between nodes in the network; and (ii) decision tree induction, where we provide a private adaptation of the ID3 algorithm to build decision trees from a given tabular dataset. Experimental results show that, compared to approaches based on global sensitivity, our mechanism reduces privacy budget usage by 3 to 4 orders of magnitude for influential node analysis and increases accuracy by up to 12% for decision tree induction.
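To make the global-sensitivity baseline concrete, here is a minimal sketch of the classic Laplace mechanism for numeric queries that the abstract mentions; it is the point of comparison, not the local dampening mechanism introduced in the paper. The function name and example values are assumptions for illustration.

```python
import numpy as np

def laplace_mechanism(true_value, global_sensitivity, epsilon, rng=None):
    """Release a numeric query answer under epsilon-differential privacy
    by adding Laplace noise with scale = global_sensitivity / epsilon.
    This is the standard global-sensitivity baseline, not local dampening."""
    rng = np.random.default_rng() if rng is None else rng
    scale = global_sensitivity / epsilon
    return true_value + rng.laplace(loc=0.0, scale=scale)

# Example: a counting query changes by at most 1 when one record changes,
# so its global sensitivity is 1.
noisy_count = laplace_mechanism(true_value=1032, global_sensitivity=1.0, epsilon=0.1)
```

The motivation for local-sensitivity-based approaches is that the global sensitivity is a worst case over all databases; when the sensitivity in the neighborhood of the actual data is much smaller, far less noise can suffice.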


2020, Vol. 39 (5), pp. 6757-6772
Author(s): Yashuang Mu, Lidong Wang, Xiaodong Liu

Fuzzy decision trees are one of the most popular extensions of decision trees for symbolic knowledge acquisition through fuzzy representation. In the majority of fuzzy decision tree learning methods, the number of fuzzy partitions is given in advance; that is, the same number of fuzzy items is used for every condition attribute. In this study, a dynamic programming-based partition criterion for fuzzy items is designed within the framework of fuzzy decision tree induction. The proposed criterion applies an improved dynamic programming algorithm, originally used in scheduling problems, to establish an optimal number of fuzzy items for each condition attribute. Then, based on these fuzzy partitions, a fuzzy decision tree is constructed in a top-down recursive way. A comparative analysis against several traditional decision trees verifies the feasibility of the proposed dynamic programming-based fuzzy partition criterion. Furthermore, within the same fuzzy decision tree framework, the proposed fuzzy partition solution obtains higher classification accuracy than some configurations that use the same fixed number of fuzzy items for every attribute.
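For readers unfamiliar with fuzzy partitions, the sketch below shows one common (but here assumed) choice: a fixed number of evenly spaced triangular fuzzy items over an attribute's range. This is exactly the kind of fixed, per-attribute-identical partition that the paper's dynamic-programming criterion (not reproduced here) replaces with an attribute-specific number of items.

```python
import numpy as np

def triangular_membership(x, a, b, c):
    """Membership degree of x in a triangular fuzzy item with
    support [a, c] and peak at b (a < b < c assumed)."""
    x = np.asarray(x, dtype=float)
    left = np.clip((x - a) / (b - a), 0.0, 1.0)
    right = np.clip((c - x) / (c - b), 0.0, 1.0)
    return np.minimum(left, right)

def uniform_fuzzy_partition(lo, hi, k):
    """k evenly spaced triangular fuzzy items over [lo, hi] (k >= 2).
    Returns a list of (a, b, c) triples, one per fuzzy item."""
    peaks = np.linspace(lo, hi, k)
    step = (hi - lo) / (k - 1)
    return [(p - step, p, p + step) for p in peaks]

# Example: 3 fuzzy items ("low", "medium", "high") over an attribute in [0, 10].
items = uniform_fuzzy_partition(0.0, 10.0, 3)
degrees = [triangular_membership(4.2, *abc) for abc in items]
```

A sample value such as 4.2 then belongs to "low" and "medium" with partial degrees rather than to a single crisp interval, which is what gives fuzzy decision trees their soft, interpretable splits.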


2020, Vol. 117 (35), pp. 21175-21184
Author(s): Jerome H. Friedman

A method for decision tree induction is presented. Given a set of predictor variables x = (x1, x2, ..., xp) and two outcome variables y and z associated with each x, the goal is to identify those values of x for which the respective distributions of y | x and z | x, or selected properties of those distributions such as means or quantiles, are most different. Contrast trees provide a lack-of-fit measure for statistical models of such statistics, or for the complete conditional distribution p_y(y | x), as a function of x. They are easily interpreted and can be used as diagnostic tools to reveal and then understand the inaccuracies of models produced by any learning method. A corresponding contrast-boosting strategy is described for remedying any uncovered errors, thereby producing potentially more accurate predictions. This leads to a distribution-boosting strategy for directly estimating the full conditional distribution of y at each x under no assumptions concerning its shape, form, or parametric representation.
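As a hedged illustration of the idea (not Friedman's actual criterion or boosting procedure), the sketch below scores how different the empirical distributions of y and z are within one tree node, using the mean absolute difference of a few quantiles. All names and the particular discrepancy measure are assumptions for illustration.

```python
import numpy as np

def node_discrepancy(y, z, quantiles=(0.25, 0.5, 0.75)):
    """Toy contrast score for one node: mean absolute difference between
    a few empirical quantiles of y and z among the observations in the node.
    The contrast-tree criterion in the paper is more general than this."""
    qy = np.quantile(y, quantiles)
    qz = np.quantile(z, quantiles)
    return float(np.mean(np.abs(qy - qz)))

# A contrast tree would choose splits whose child nodes make such a
# discrepancy large, e.g. for a candidate axis split on feature j:
# left = X[:, j] <= threshold
# score = max(node_discrepancy(y[left], z[left]),
#             node_discrepancy(y[~left], z[~left]))
```

Used as a diagnostic, z could be a model's predictions for y, so high-discrepancy regions of x expose where the model fits poorly.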

