Entropy-Based Greedy Algorithm for Decision Trees Using Hypotheses

Entropy, 2021, Vol. 23(7), pp. 808
Author(s): Mohammad Azad, Igor Chikalov, Shahid Hussain, Mikhail Moshkov

In this paper, we consider decision trees that use both conventional queries, each based on one attribute, and queries based on hypotheses about the values of all attributes. Such decision trees are similar to those studied in exact learning, where membership and equivalence queries are allowed. We present a greedy algorithm based on entropy for constructing such decision trees and discuss the results of computer experiments on various data sets and on randomly generated Boolean functions.
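A minimal sketch of the entropy-based greedy step for conventional one-attribute queries may help make the idea concrete; the hypothesis queries described in the paper are not reproduced here, and all function names are illustrative rather than taken from the authors' implementation:

```python
from collections import Counter
from math import log2

def entropy(rows):
    """Shannon entropy of the decision (last column) over a set of rows."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum(c / total * log2(c / total) for c in counts.values())

def best_attribute(rows, attributes):
    """Pick the attribute whose query minimizes the expected entropy of the
    resulting subtables (equivalently, maximizes information gain)."""
    def expected_entropy(a):
        groups = {}
        for row in rows:
            groups.setdefault(row[a], []).append(row)
        return sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return min(attributes, key=expected_entropy)
```

The greedy construction repeats this selection at each node until a subtable carries a single decision.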

Electronics, 2021, Vol. 10(13), pp. 1580
Author(s): Mohammad Azad, Igor Chikalov, Shahid Hussain, Mikhail Moshkov

In this paper, we consider decision trees that use two types of queries: queries based on one attribute each and queries based on hypotheses about the values of all attributes. Such decision trees are similar to the ones studied in exact learning, where membership and equivalence queries are allowed. We present dynamic programming algorithms for minimizing the depth and the number of nodes of such decision trees and discuss the results of computer experiments on various data sets and on randomly generated Boolean functions. Decision trees with hypotheses generally have lower complexity; that is, they are more understandable and better suited as a means of knowledge representation.
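A sketch of the kind of dynamic programming recursion involved, restricted to conventional attribute queries (the hypothesis queries from the paper are omitted, and the table encoding is an assumption for illustration): the minimum depth of a subtable is zero if all its rows share one decision, and otherwise one plus the best worst-case branch over all non-constant attributes.

```python
from functools import lru_cache

def min_depth(rows, attributes):
    """Minimum depth of a conventional decision tree for a decision table.
    rows: iterable of tuples, the last entry of each row is the decision."""
    @lru_cache(maxsize=None)
    def depth(table):
        decisions = {row[-1] for row in table}
        if len(decisions) == 1:          # degenerate subtable: a leaf suffices
            return 0
        best = float("inf")
        for a in attributes:
            values = {row[a] for row in table}
            if len(values) == 1:         # attribute is constant here, useless
                continue
            # the worst branch determines the depth of this subtree
            worst = max(
                depth(tuple(r for r in table if r[a] == v)) for v in values
            )
            best = min(best, 1 + worst)
        return best
    return depth(tuple(tuple(r) for r in rows))
```

Memoization over subtables is what makes the recursion a dynamic programming algorithm rather than brute-force search.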


Entropy, 2022, Vol. 24(1), pp. 116
Author(s): Mikhail Moshkov

In this paper, based on results from rough set theory, test theory, and exact learning, we investigate decision trees over infinite sets of binary attributes represented as infinite binary information systems. We define the notion of a problem over an information system and study three functions of Shannon type, which characterize how, in the worst case, the minimum depth of a decision tree solving a problem depends on the number of attributes in the problem description. The three functions correspond to (i) decision trees using attributes, (ii) decision trees using hypotheses (an analog of equivalence queries from exact learning), and (iii) decision trees using both attributes and hypotheses. The first function has two possible types of behavior: logarithmic and linear (this result follows from more general results published by the author earlier). The second and third functions have three possible types of behavior: constant, logarithmic, and linear (these results were published by the author earlier without the proofs, which are given in the present paper). Based on the obtained results, we divide the set of all infinite binary information systems into four complexity classes; within each class, the type of behavior of each of the three functions does not change.


Entropy, 2021, Vol. 23(12), pp. 1641
Author(s): Mohammad Azad, Igor Chikalov, Shahid Hussain, Mikhail Moshkov, Beata Zielosko

Conventional decision trees use queries, each of which is based on one attribute. In this study, we also examine decision trees that additionally handle queries based on hypotheses. This kind of query is similar to the equivalence queries considered in exact learning. Earlier, we designed dynamic programming algorithms for computing the minimum depth and the minimum number of internal nodes of decision trees with hypotheses. The modifications of these algorithms considered in the present paper allow us to build decision trees with hypotheses that are optimal with respect to the depth or to the number of internal nodes. We compare the length and coverage of decision rules extracted from optimal decision trees with hypotheses against those of rules extracted from optimal conventional decision trees, in order to choose the ones preferable as a tool for representing information. To this end, we conduct computer experiments on various decision tables from the UCI Machine Learning Repository and on decision tables for randomly generated Boolean functions. The collected results show that, in many cases, the decision rules derived from decision trees with hypotheses are better than the rules extracted from conventional decision trees.
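A brief sketch of the two metrics being compared, assuming a simple nested-dict tree encoding that is illustrative and not the paper's data structure: a rule is one root-to-leaf path, its length is the number of conditions on its left-hand side, and its coverage is the number of table rows satisfying those conditions.

```python
def extract_rules(node, conditions=()):
    """Collect (conditions, decision) pairs from a tree given as nested
    dicts: {'attr': i, 'children': {value: subtree}} or {'decision': d}."""
    if 'decision' in node:
        yield conditions, node['decision']
        return
    for value, child in node['children'].items():
        yield from extract_rules(child, conditions + ((node['attr'], value),))

def length(rule):
    conditions, _ = rule
    return len(conditions)        # number of conditions in the rule body

def coverage(rule, rows):
    conditions, _ = rule
    return sum(all(row[a] == v for a, v in conditions) for row in rows)
```

Shorter rules with larger coverage are generally preferred for knowledge representation, which is the comparison the paper carries out.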


2017, Vol. 16(06), pp. 1707-1727
Author(s): Morteza Mashayekhi, Robin Gras

Decision trees are examples of easily interpretable models whose predictive accuracy is normally low. In comparison, decision tree ensembles (DTEs) such as random forest (RF) exhibit high predictive accuracy while being regarded as black-box models. We propose three new algorithms for extracting rules from DTEs. The RF+DHC method, a hill-climbing method with downhill moves (DHC), searches for a rule set that dramatically decreases the number of rules. In the RF+SGL and RF+MSGL methods, the sparse group lasso (SGL) method and the multiclass SGL (MSGL) method, respectively, are employed to find a sparse weight vector over the rules generated by RF. Experimental results on 24 data sets show that the proposed methods outperform similar state-of-the-art methods in terms of human comprehensibility, greatly reducing the number of rules and limiting the number of antecedents in the retained rules while preserving the same level of accuracy.
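To illustrate the generic idea behind hill climbing with downhill moves, here is a minimal sketch: rules are toggled in and out of the candidate set, and with some probability a score-decreasing move is accepted to escape local optima. The scoring function, step count, and acceptance probability are illustrative assumptions, not the authors' settings.

```python
import random

def hill_climb(rules, score, steps=1000, downhill_prob=0.1, seed=0):
    """Search for a small, accurate subset of `rules`.
    `score` maps a frozenset of rules to a number (higher is better);
    with probability `downhill_prob` a worsening move is accepted,
    mimicking the 'downhill moves' of DHC."""
    rng = random.Random(seed)
    current = frozenset(rules)
    best = current
    for _ in range(steps):
        r = rng.choice(list(rules))
        # toggle one rule in or out of the current candidate set
        candidate = current - {r} if r in current else current | {r}
        if score(candidate) >= score(current) or rng.random() < downhill_prob:
            current = candidate
            if score(current) > score(best):
                best = current
    return best
```

In practice the score would trade off classification accuracy against the number of retained rules, which is what drives the dramatic reduction reported in the paper.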


Author(s): Giuseppe Nuti, Lluís Antoni Jiménez Rugama, Andreea-Ingrid Cross

Bayesian Decision Trees provide a probabilistic framework that reduces the instability of Decision Trees while maintaining their explainability. While Markov Chain Monte Carlo methods are typically used to construct Bayesian Decision Trees, here we provide a deterministic Bayesian Decision Tree algorithm that eliminates the sampling and does not require a pruning step. This algorithm generates the greedy-modal tree (GMT), which is applicable to both regression and classification problems. We tested the algorithm on various benchmark classification data sets and obtained accuracies similar to other known techniques. Furthermore, we show that we can statistically analyze how the GMT was derived from the data and demonstrate this analysis with a financial example. Notably, the GMT yields simpler, explainable models, which is often a prerequisite for applications in finance or the medical industry.
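A rough sketch of the flavor of deterministic Bayesian comparison such a method can rest on, and not the authors' exact algorithm: at each node, compare the Beta-Bernoulli marginal likelihood (evidence) of leaving the node as a leaf against each candidate split, and greedily take the modal, i.e. highest-evidence, option. Prior parameters and the split representation below are assumptions for illustration.

```python
from math import lgamma

def log_evidence(pos, neg, a=1.0, b=1.0):
    """Log marginal likelihood of binary labels under a Beta(a, b) prior."""
    return (lgamma(a + b) - lgamma(a) - lgamma(b)
            + lgamma(a + pos) + lgamma(b + neg)
            - lgamma(a + b + pos + neg))

def choose_action(labels, splits):
    """Compare leaving the node as a leaf against each candidate split.
    `labels` is a list of 0/1 labels; `splits` maps a split name to a
    (left_labels, right_labels) pair. Returns the modal option."""
    best_name = 'leaf'
    best_score = log_evidence(sum(labels), len(labels) - sum(labels))
    for name, (left, right) in splits.items():
        s = (log_evidence(sum(left), len(left) - sum(left))
             + log_evidence(sum(right), len(right) - sum(right)))
        if s > best_score:
            best_name, best_score = name, s
    return best_name
```

Because every comparison is a closed-form evidence computation, no sampling and no post hoc pruning are needed, which matches the deterministic character claimed for the GMT.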


2000, Vol. 11(04), pp. 613-632
Author(s): Johannes Köbler, Wolfgang Lindner

We study the learnability of representation classes in Angluin's exact learning model. In particular, we consider the following three query types: equivalence queries, equivalence and membership queries, and membership queries only. In all three cases, we show that polynomial query complexity already implies polynomial-time learnability, provided that the learner additionally has access to an oracle in Σ^p_2. It follows that Boolean circuits are polynomial-time learnable with equivalence queries and the help of an oracle in Σ^p_2.
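For readers unfamiliar with equivalence queries, the standard textbook learner for monotone conjunctions shows how they drive exact learning; this is a generic illustration of the query model, not the paper's circuit-learning result. The learner starts from the conjunction of all variables and lets each counterexample prune it.

```python
def learn_monotone_conjunction(n, equivalence_query):
    """Exact learning of a monotone conjunction over n Boolean variables
    using equivalence queries only. `equivalence_query(hypothesis)` returns
    None if the hypothesis (a set of variable indices) is correct, or a
    counterexample x (a tuple of n bits) otherwise."""
    hypothesis = set(range(n))      # start with the conjunction of all variables
    while True:
        x = equivalence_query(hypothesis)
        if x is None:
            return hypothesis
        # Since the hypothesis is always a superset of the target, every
        # counterexample is positive: the target accepts x, so the target
        # contains no variable that x sets to 0; drop those variables.
        hypothesis = {i for i in hypothesis if x[i] == 1}
```

Each counterexample strictly shrinks the hypothesis, so at most n + 1 queries are made, giving polynomial query complexity of the kind the paper relates to polynomial-time learnability.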


2001, Vol. 11(03), pp. 247-255
Author(s): Guido Bologna

The problem of rule extraction from neural networks is NP-hard. This work presents a new technique to extract "if-then-else" rules from ensembles of DIMLP neural networks. Rules are extracted in polynomial time with respect to the dimensionality of the problem, the number of examples, and the size of the resulting network. Further, the degree of matching between extracted rules and neural network responses is 100%. Ensembles of DIMLP networks were trained on four data sets in the public domain. Extracted rules were on average significantly more accurate than those extracted from C4.5 decision trees.


1999, Vol. 11, pp. 169-198
Author(s): D. Opitz, R. Maclin

An ensemble consists of a set of individually trained classifiers (such as neural networks or decision trees) whose predictions are combined when classifying novel instances. Previous research has shown that an ensemble is often more accurate than any of the single classifiers in the ensemble. Bagging (Breiman, 1996c) and Boosting (Freund & Schapire, 1996; Schapire, 1990) are two relatively new but popular methods for producing ensembles. In this paper we evaluate these methods on 23 data sets using both neural networks and decision trees as our classification algorithms. Our results clearly indicate a number of conclusions. First, while Bagging is almost always more accurate than a single classifier, it is sometimes much less accurate than Boosting. On the other hand, Boosting can create ensembles that are less accurate than a single classifier, especially when using neural networks. Analysis indicates that the performance of the Boosting methods depends on the characteristics of the data set being examined; in fact, further results show that Boosting ensembles may overfit noisy data sets, thus decreasing their performance. Finally, consistent with previous studies, our work suggests that most of the gain in an ensemble's performance comes from the first few classifiers combined; however, relatively large gains can be seen up to 25 classifiers when Boosting decision trees.
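A minimal reproduction of this kind of comparison can be run with scikit-learn's standard implementations (this is not the authors' original code; the data set, 25-estimator setting, and cross-validation protocol are assumptions for illustration, and the `estimator` keyword assumes scikit-learn 1.2 or later):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
base = DecisionTreeClassifier(random_state=0)

for name, model in [
    ("single tree", base),
    ("bagging", BaggingClassifier(estimator=base, n_estimators=25, random_state=0)),
    ("boosting", AdaBoostClassifier(estimator=base, n_estimators=25, random_state=0)),
]:
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validated accuracy
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

On clean data both ensembles typically beat the single tree, while the paper's point about noisy data can be probed by flipping a fraction of the labels before fitting.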

