Evaluating Nonlinear Decision Trees for Binary Classification Tasks with Other Existing Methods

Author(s):  
Yashesh Dhebar ◽  
Sparsh Gupta ◽  
Kalyanmoy Deb


2004 ◽  
Author(s):  
Lyle E. Bourne ◽  
Alice F. Healy ◽  
James A. Kole ◽  
William D. Raymond

Author(s):  
Gaël Aglin ◽  
Siegfried Nijssen ◽  
Pierre Schaus

Decision Trees (DTs) are widely used Machine Learning (ML) models with a broad range of applications. Interest in these models has increased further in the context of Explainable AI (XAI), as decision trees of limited depth are highly interpretable models. However, traditional algorithms for learning DTs are heuristic in nature and may produce trees of suboptimal quality under depth constraints. We introduce PyDL8.5, a Python library to infer depth-constrained Optimal Decision Trees (ODTs). PyDL8.5 provides an interface for DL8.5, an efficient algorithm for inferring depth-constrained ODTs. The library provides an easy-to-use, scikit-learn-compatible interface. It can be used not only for classification tasks, but also for regression, clustering, and other tasks. We introduce an interface that allows users to easily implement these other learning tasks, and we provide a number of examples of how to use the library.
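As an illustration of the scikit-learn-style workflow the abstract describes, here is a minimal sketch; the import path, class name (DL85Classifier), and max_depth parameter are assumptions based on the library's naming conventions, and DL8.5 is assumed to operate on binary feature matrices.

    # Minimal sketch of a scikit-learn-style PyDL8.5 workflow. The import path,
    # class name, and max_depth parameter are assumptions, not verified API.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score
    from pydl85 import DL85Classifier  # assumed import path

    # Toy dataset with 0/1 features (DL8.5 searches over binary features).
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(200, 10))
    y = (X[:, 0] & X[:, 3]).astype(int)  # label depends on two features

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # With a depth budget of 3, the solver returns a tree that is provably
    # optimal on the training data under that constraint, unlike greedy CART.
    clf = DL85Classifier(max_depth=3)
    clf.fit(X_tr, y_tr)
    print("test accuracy:", accuracy_score(y_te, clf.predict(X_te)))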


Processes ◽  
2020 ◽  
Vol 8 (5) ◽  
pp. 595 ◽  
Author(s):  
Cătălin Buiu ◽  
Vlad-Rareş Dănăilă ◽  
Cristina Nicoleta Răduţă

Women’s cancers remain a major challenge for many health systems. Between 1991 and 2017, the death rate for all major cancers fell continuously in the United States, with the exception of uterine cervix and uterine corpus cancers. Together with HPV (Human Papillomavirus) testing and cytology, colposcopy has played a central role in cervical cancer screening. This medical procedure allows physicians to view the cervix at a magnification of up to 10×. This paper presents an automated colposcopy image analysis framework for the classification of precancerous and cancerous lesions of the uterine cervix. The framework is based on an ensemble of MobileNetV2 networks. Our experimental results show that this method achieves accuracies of 83.33% and 91.66% on the four-class and binary classification tasks, respectively. These results are promising for the future use of automatic classification methods based on deep learning as tools to support medical doctors.
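For readers curious how such an ensemble can be wired up, the following Keras sketch averages the softmax outputs of several MobileNetV2-based classifiers; it is not the authors' exact architecture, and the input size, head layers, and ensemble size are assumptions.

    # Sketch of a MobileNetV2 ensemble: average the class probabilities of
    # several independently trained members. Not the paper's exact setup.
    import tensorflow as tf

    NUM_CLASSES = 4  # the four-class task; set to 2 for the binary task

    def make_member() -> tf.keras.Model:
        """One ensemble member: MobileNetV2 backbone plus a small head."""
        base = tf.keras.applications.MobileNetV2(
            input_shape=(224, 224, 3), include_top=False, weights="imagenet")
        inputs = tf.keras.Input(shape=(224, 224, 3))
        x = base(inputs)
        x = tf.keras.layers.GlobalAveragePooling2D()(x)
        x = tf.keras.layers.Dropout(0.2)(x)
        outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
        return tf.keras.Model(inputs, outputs)

    members = [make_member() for _ in range(3)]
    # ... each member would be fine-tuned on colposcopy images here ...

    # The ensemble output is the mean of the members' softmax vectors.
    inputs = tf.keras.Input(shape=(224, 224, 3))
    ensemble = tf.keras.Model(
        inputs, tf.keras.layers.Average()([m(inputs) for m in members]))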


1997 ◽  
Vol 12 (01) ◽  
pp. 1-40 ◽  
Author(s):  
LEONARD A. BRESLOW ◽  
DAVID W. AHA

Induced decision trees are an extensively researched solution to classification tasks. For many practical tasks, the trees produced by tree-generation algorithms are not comprehensible to users due to their size and complexity. Although many tree induction algorithms have been shown to produce simpler, more comprehensible trees (or data structures derived from trees) with good classification accuracy, tree simplification has usually been of secondary concern relative to accuracy, and no attempt has previously been made to survey the literature from the perspective of simplification. We present a framework that organizes the approaches to tree simplification, and we summarize and critique the approaches within this framework. The purpose of this survey is to provide researchers and practitioners with a concise overview of tree-simplification approaches and insight into their relative capabilities. In our final discussion, we briefly describe some empirical findings and discuss the application of tree induction algorithms to case retrieval in case-based reasoning systems.


BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Davide Chicco ◽  
Giuseppe Jurman

Background: To evaluate binary classifications and their confusion matrices, scientific researchers can employ several statistical rates, according to the goal of the experiment under investigation. Despite being a crucial issue in machine learning, no widespread consensus has been reached on a single preferred measure. Accuracy and F1 score computed on confusion matrices have been (and still are) among the most popular metrics in binary classification tasks. However, these statistical measures can dangerously show overoptimistic, inflated results, especially on imbalanced datasets. Results: The Matthews correlation coefficient (MCC), instead, is a more reliable statistical rate, producing a high score only if the prediction performed well in all four confusion-matrix categories (true positives, false negatives, true negatives, and false positives), in proportion both to the size of the positive class and to the size of the negative class in the dataset. Conclusions: In this article, we show how MCC produces a more informative and truthful score in evaluating binary classifications than accuracy and F1 score, first by explaining its mathematical properties and then by demonstrating its advantages in six synthetic use cases and in a real genomics scenario. We believe that the Matthews correlation coefficient should be preferred to accuracy and F1 score for evaluating binary classification tasks by all scientific communities.
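The point is easy to reproduce. MCC is computed from all four confusion-matrix cells as MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and the toy example below (constructed here for illustration, not one of the paper's six use cases) shows accuracy and F1 looking excellent on an imbalanced test set while MCC stays modest.

    # On a 95/5 imbalanced test set, a classifier that finds only one of the
    # five negatives still scores 0.96 accuracy and ~0.98 F1, but only ~0.44 MCC.
    import numpy as np
    from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

    y_true = np.array([1] * 95 + [0] * 5)
    y_pred = np.array([1] * 95 + [0] + [1] * 4)  # TP=95, FN=0, TN=1, FP=4

    print("accuracy:", accuracy_score(y_true, y_pred))     # 0.96
    print("F1      :", f1_score(y_true, y_pred))           # ~0.98
    print("MCC     :", matthews_corrcoef(y_true, y_pred))  # ~0.44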


Author(s):  
Jacob Whitehill

Recent work on privacy-preserving machine learning has considered how data-mining competitions such as Kaggle could potentially be “hacked”, either intentionally or inadvertently, by using information from an oracle that reports a classifier’s accuracy on the test set (Blum and Hardt 2015; Hardt and Ullman 2014; Zheng 2015; Whitehill 2016). For binary classification tasks in particular, one of the most common accuracy metrics is the Area Under the ROC Curve (AUC), and in this paper we explore the mathematical structure of how the AUC is computed from an n-vector of real-valued “guesses” with respect to the ground-truth labels. Under the assumption of perfect knowledge of the test set AUC c = p/q, we show how knowing c constrains the set W of possible ground-truth labelings, and we derive an algorithm both to compute the exact number of such labelings and to enumerate over them efficiently. We also provide empirical evidence that, surprisingly, the number of compatible labelings can actually decrease as n grows, until a test-set-dependent threshold is reached. Finally, we show how W can be efficiently whittled down, through pairs of oracle queries, to infer all the ground-truth test labels with complete certainty.
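As background for the combinatorics above: the AUC of an n-vector of guesses is the fraction of (positive, negative) pairs that the guesses rank correctly (ties counted as one half), which is why it takes the rational form c = p/q. A small self-contained check, written here for illustration rather than taken from the paper:

    # Mann-Whitney form of the AUC: concordant pos-neg pairs / all pos-neg pairs.
    import numpy as np

    def auc_from_pairs(guesses: np.ndarray, labels: np.ndarray) -> float:
        pos = guesses[labels == 1]
        neg = guesses[labels == 0]
        diffs = pos[:, None] - neg[None, :]      # every positive-negative pair
        concordant = (diffs > 0).sum() + 0.5 * (diffs == 0).sum()
        return concordant / (len(pos) * len(neg))

    g = np.array([0.9, 0.4, 0.35, 0.6, 0.5])
    y = np.array([1, 1, 0, 1, 0])
    print(auc_from_pairs(g, y))  # 5/6: one of the six pairs is ranked wrongly
    # agrees with sklearn.metrics.roc_auc_score(y, g)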


2019 ◽  
Author(s):  
Victor Henrique Alves Ribeiro ◽  
Matheus Henrique Dal Molin Ribeiro ◽  
Leandro dos Santos Coelho ◽  
Gilberto Reynoso Meza


2018 ◽  
Author(s):  
Robert C. Wilson ◽  
Amitai Shenhav ◽  
Mark Straccia ◽  
Jonathan D. Cohen

Researchers and educators have long wrestled with the question of how best to teach their clients, be they human, animal, or machine. Here we focus on the role of a single variable, the difficulty of training, and examine its effect on the rate of learning. In many situations we find that there is a sweet spot in which training is neither too easy nor too hard, and where learning progresses most quickly. We derive conditions for this sweet spot for a broad class of learning algorithms in the context of binary classification tasks, in which ambiguous stimuli must be sorted into one of two classes. For all of these gradient-descent-based learning algorithms we find that the optimal error rate for training is around 15.87% or, conversely, that the optimal training accuracy is about 85%. We demonstrate the efficacy of this ‘Eighty Five Percent Rule’ for artificial neural networks used in AI and for biologically plausible neural networks thought to describe human and animal learning.
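The quoted 15.87% is, to four digits, the standard normal tail probability Φ(−1), which appears to be where the number comes from; a quick numeric check (a side note written here, not the paper's derivation):

    # Phi(-1) for a standard normal gives the quoted optimal error rate.
    from scipy.stats import norm

    optimal_error = norm.cdf(-1.0)  # P(Z < -1), Z ~ N(0, 1)
    print(f"optimal training error: {optimal_error:.4%}")        # 15.8655%
    print(f"optimal training accuracy: {1 - optimal_error:.2%}")  # 84.13%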

