Supporting decision-makers in healthcare domain. A comparative study of two interpretative proposals for Random Forests

The growing success of Machine Learning (ML) is making significant improvements to predictive models, facilitating their integration in various application fields, especially the healthcare context. However, it still has limitations and drawbacks, such as the lack of interpretability which does not allow users to understand how certain decisions are made. This drawback is identified with the term "Black-Box", as well as models that do not allow to interpret the internal work of certain ML techniques, thus discouraging their use. In a highly regulated and risk-averse context such as healthcare, although "trust" is not synonymous with decision and adoption, trusting an ML model is essential for its adoption. Many clinicians and health researchers feel uncomfortable with black box ML models, even if they achieve high degrees of diagnostic or prognostic accuracy. Therefore more and more research is being conducted on the functioning of these models. Our study focuses on the Random Forest (RF) model. It is one of the most performing and used methodologies in the context of ML approaches, in all fields of research from hard sciences to humanities. In the health context and in the evaluation of health policies, their use is limited by the impossibility of obtaining an interpretation of the causal links between predictors and response. This explains why we need to develop new techniques, tools, and approaches for reconstructing the causal relationships and interactions between predictors and response used in a RF model. Our research aims to perform a machine learning experiment on several medical datasets through a comparison between two methodologies, which are inTrees and NodeHarvest. They are the main approaches in the rules extraction framework. The contribution of our study is to identify, among the approaches to rule extraction, the best proposal for suggesting the appropriate choice to decision-makers in the health domain.

Download Full-text

Ada-WHIPS: Explaining AdaBoost Classification with Applications in the Health Sciences

10.21203/rs.2.19113/v4 ◽

2020 ◽

Author(s):

Julian Hatwell ◽

Mohamed Medhat Gaber ◽

R.M. Atif Azad

Keyword(s):

Machine Learning ◽

Statistical Tests ◽

Black Box ◽

Decision Makers ◽

Medical Practitioners ◽

Research Areas ◽

Novel Approach ◽

Mri Scans ◽

Practice Methods ◽

Logical Rules

Abstract Background Computer Aided Diagnostics (CAD) can support medical practitioners to make critical decisions about their patients' disease conditions. Practitioners require access to the chain of reasoning behind CAD to build trust in the CAD advice and to supplement their own expertise. Yet, CAD systems might be based on black box machine learning (ML) models and high dimensional data sources (electronic health records, MRI scans, cardiotocograms, etc). These foundations make interpretation and explanation of the CAD advice very challenging. This challenge is recognised throughout the machine learning research community. eXplainable Artificial Intelligence (XAI) is emerging as one of the most important research areas of recent years because it addresses the interpretability and trust concerns of critical decision makers, including those in clinical and medical practice. Methods In this work, we focus on AdaBoost, a black box ML model that has been widely adopted in the CAD literature. We address the challenge -- to explain AdaBoost classification -- with a novel algorithm that extracts simple, logical rules from AdaBoost models. Our algorithm, Adaptive-Weighted High Importance Path Snippets (Ada-WHIPS), makes use of AdaBoost's adaptive classifier weights. Using a novel formulation, Ada-WHIPS uniquely redistributes the weights among individual decision nodes of the internal decision trees (DT) of the AdaBoost model. Then, a simple heuristic search of the weighted nodes finds a single rule that dominated the model's decision. We compare the explanations generated by our novel approach with the state of the art in an experimental study. We evaluate the derived explanations with simple statistical tests of well-known quality measures, precision and coverage, and a novel measure stability that is better suited to the XAI setting .

Download Full-text

Interpretable machine learning with reject option

at - Automatisierungstechnik ◽

10.1515/auto-2017-0123 ◽

2018 ◽

Vol 66 (4) ◽

pp. 283-290 ◽

Cited By ~ 7

Author(s):

Johannes Brinkrolf ◽

Barbara Hammer

Keyword(s):

Machine Learning ◽

Vector Quantization ◽

Random Forests ◽

Black Box ◽

Learning Models ◽

Process Automation ◽

Reject Option ◽

Interpretable Machine Learning ◽

Adversarial Examples ◽

Machine Learning Models

Abstract Classification by means of machine learning models constitutes one relevant technology in process automation and predictive maintenance. However, common techniques such as deep networks or random forests suffer from their black box characteristics and possible adversarial examples. In this contribution, we give an overview about a popular alternative technology from machine learning, namely modern variants of learning vector quantization, which, due to their combined discriminative and generative nature, incorporate interpretability and the possibility of explicit reject options for irregular samples. We give an explicit bound on minimum changes required for a change of the classification in case of LVQ networks with reject option, and we demonstrate the efficiency of reject options in two examples.

Download Full-text

Mapping the risk terrain for crime using machine learning

10.31235/osf.io/xc538 ◽

2020 ◽

Author(s):

Andrew Palmer Wheeler ◽

Wouter Steenbeek

Keyword(s):

Machine Learning ◽

Random Forests ◽

Predictive Accuracy ◽

Learning Algorithm ◽

Kernel Density ◽

Black Box ◽

Terrain Models ◽

Interpretable Model ◽

Spatially Varying

Objectives: We illustrate how a machine learning algorithm, Random Forests, can provide accurate long-term predictions of crime at micro places relative to other popular techniques. We also show how recent advances in model summaries can help to open the ‘black box’ of Random Forests, considerably improving their interpretability.Methods: We generate long-term crime forecasts for robberies in Dallas at 200 by 200 feet grid cells that allow spatially varying associations of crime generators and demographic factors across the study area. We then show how using interpretable model summaries facilitate understanding the model’s inner workings.Results: We find that Random Forests greatly outperform Risk Terrain Models and Kernel Density Estimation in terms of forecasting future crimes using different measures of predictive accuracy, but only slightly outperform using prior counts of crime. We find different factors that predict crime are highly non-linear and vary over space. Conclusions: We show how using black-box machine learning models can provide accurate micro placed based crime predictions, but still be interpreted in a manner that fosters understanding of why a place is predicted to be risky.Data and code to replicate the results can be downloaded from https://www.dropbox.com/sh/b3n9a6z5xw14rd6/AAAjqnoMVKjzNQnWP9eu7M1ra?dl=0

Download Full-text

Ada-WHIPS: Explaining AdaBoost Classification with Applications in the Health Sciences

10.21203/rs.2.19113/v3 ◽

2020 ◽

Author(s):

Julian Hatwell ◽

Mohamed Medhat Gaber ◽

R.M. Atif Azad

Keyword(s):

Machine Learning ◽

Statistical Tests ◽

Black Box ◽

Decision Makers ◽

Medical Practitioners ◽

Research Areas ◽

Novel Approach ◽

Mri Scans ◽

Practice Methods ◽

Logical Rules

Download Full-text

Cloud-Based Parallel Machine Learning for Tool Wear Prediction

Journal of Manufacturing Science and Engineering ◽

10.1115/1.4038002 ◽

2018 ◽

Vol 140 (4) ◽

Cited By ~ 15

Author(s):

Dazhong Wu ◽

Connor Jennings ◽

Janis Terpenny ◽

Soundar Kumara ◽

Robert X. Gao

Keyword(s):

Machine Learning ◽

Cloud Computing ◽

Tool Wear ◽

Random Forests ◽

Predictive Models ◽

Parallel Machine ◽

Data Driven ◽

Smart Manufacturing ◽

Processing Scheme ◽

Cloud Computing System

The emergence of cloud computing, industrial internet of things (IIoT), and new machine learning techniques have shown the potential to advance prognostics and health management (PHM) in smart manufacturing. While model-based PHM techniques provide insight into the progression of faults in mechanical components, certain assumptions on the underlying physical mechanisms for fault development are required to develop predictive models. In situations where there is a lack of adequate prior knowledge of the underlying physics, data-driven PHM techniques have been increasingly applied in the field of smart manufacturing. One of the limitations of current data-driven methods is that large volumes of training data are required to make accurate predictions. Consequently, computational efficiency remains a primary challenge, especially when large volumes of sensor-generated data need to be processed in real-time applications. The objective of this research is to introduce a cloud-based parallel machine learning algorithm that is capable of training large-scale predictive models more efficiently. The random forests (RFs) algorithm is parallelized using the MapReduce data processing scheme. The MapReduce-based parallel random forests (PRFs) algorithm is implemented on a scalable cloud computing system with varying combinations of processors and memories. The effectiveness of this new method is demonstrated using condition monitoring data collected from milling experiments. By implementing RFs in parallel on the cloud, a significant increase in the processing speed (14.7 times in terms of increase in training time) has been achieved, with a high prediction accuracy of tool wear (eight times in terms of reduction in mean squared error (MSE)).

Download Full-text

Inspection of Blackbox Models for Evaluating Vulnerability in Maternal, Newborn, and Child Health

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/770 ◽

2020 ◽

Author(s):

William Ogallo ◽

Skyler Speakman ◽

Victor Akinwande ◽

Kush R Varshney ◽

Aisha Walcott-Bryant ◽

...

Keyword(s):

Machine Learning ◽

Child Health ◽

Predictive Models ◽

Black Box ◽

Demographic Health Survey ◽

Learning Models ◽

Newborn And Child Health ◽

Health Survey Data ◽

And Child Health ◽

Machine Learning Models

Improving maternal, newborn, and child health (MNCH) outcomes is a critical target for global sustainable development. Our research is centered on building predictive models, evaluating their interpretability, and generating actionable insights about the markers (features) and triggers (events) associated with vulnerability in MNCH. In this work, we demonstrate how a tool for inspecting "black box" machine learning models can be used to generate actionable insights from models trained on demographic health survey data to predict neonatal mortality.

Download Full-text

Epigenetic Target Prediction with Accurate Machine Learning Models

10.26434/chemrxiv.13522313 ◽

2021 ◽

Author(s):

Norberto Sánchez-Cruz ◽

Jose L. Medina-Franco

Keyword(s):

Machine Learning ◽

Small Molecules ◽

Predictive Models ◽

Large Scale ◽

Target Prediction ◽

Quantitative Measure ◽

Learning Models ◽

Discovery Research ◽

Drug Discovery Research ◽

Machine Learning Models

<p>Epigenetic targets are a significant focus for drug discovery research, as demonstrated by the eight approved epigenetic drugs for treatment of cancer and the increasing availability of chemogenomic data related to epigenetics. This data represents a large amount of structure-activity relationships that has not been exploited thus far for the development of predictive models to support medicinal chemistry efforts. Herein, we report the first large-scale study of 26318 compounds with a quantitative measure of biological activity for 55 protein targets with epigenetic activity. Through a systematic comparison of machine learning models trained on molecular fingerprints of different design, we built predictive models with high accuracy for the epigenetic target profiling of small molecules. The models were thoroughly validated showing mean precisions up to 0.952 for the epigenetic target prediction task. Our results indicate that the herein reported models have considerable potential to identify small molecules with epigenetic activity. Therefore, our results were implemented as freely accessible and easy-to-use web application.</p>

Download Full-text

Predictors of remission from body dysmorphic disorder after internet-delivered cognitive behavior therapy: a machine learning approach

10.31234/osf.io/eqcdx ◽

2019 ◽

Author(s):

Oskar Flygare ◽

Jesper Enander ◽

Erik Andersson ◽

Brjánn Ljótsson ◽

Volen Z Ivanov ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forests ◽

Clinical Utility ◽

Body Dysmorphic Disorder ◽

Prediction Models ◽

Behavioral Therapy ◽

Learning Approach ◽

Learning Approaches ◽

Machine Learning Approach

**Background:** Previous attempts to identify predictors of treatment outcomes in body dysmorphic disorder (BDD) have yielded inconsistent findings. One way to increase precision and clinical utility could be to use machine learning methods, which can incorporate multiple non-linear associations in prediction models. **Methods:** This study used a random forests machine learning approach to test if it is possible to reliably predict remission from BDD in a sample of 88 individuals that had received internet-delivered cognitive behavioral therapy for BDD. The random forest models were compared to traditional logistic regression analyses. **Results:** Random forests correctly identified 78% of participants as remitters or non-remitters at post-treatment. The accuracy of prediction was lower in subsequent follow-ups (68%, 66% and 61% correctly classified at 3-, 12- and 24-month follow-ups, respectively). Depressive symptoms, treatment credibility, working alliance, and initial severity of BDD were among the most important predictors at the beginning of treatment. By contrast, the logistic regression models did not identify consistent and strong predictors of remission from BDD. **Conclusions:** The results provide initial support for the clinical utility of machine learning approaches in the prediction of outcomes of patients with BDD. **Trial registration:** ClinicalTrials.gov ID: NCT02010619.

Download Full-text