Ada-WHIPS: Explaining AdaBoost Classification with Applications in the Health Sciences

2020
Author(s):  
Julian Hatwell ◽  
Mohamed Medhat Gaber ◽  
R.M. Atif Azad

Abstract
Background: Computer Aided Diagnostics (CAD) can support medical practitioners in making critical decisions about their patients' disease conditions. Practitioners require access to the chain of reasoning behind CAD to build trust in the CAD advice and to supplement their own expertise. Yet, CAD systems might be based on black box machine learning models and high dimensional data sources such as electronic health records, magnetic resonance imaging scans and cardiotocograms. These foundations make interpretation and explanation of the CAD advice very challenging. This challenge is recognised throughout the machine learning research community. eXplainable Artificial Intelligence (XAI) is emerging as one of the most important research areas of recent years because it addresses the interpretability and trust concerns of critical decision makers, including those in clinical and medical practice.
Methods: In this work, we focus on AdaBoost, a black box model that has been widely adopted in the CAD literature. We address the challenge of explaining AdaBoost classification with a novel algorithm that extracts simple, logical rules from AdaBoost models. Our algorithm, Adaptive-Weighted High Importance Path Snippets (Ada-WHIPS), makes use of AdaBoost's adaptive classifier weights. Using a novel formulation, Ada-WHIPS uniquely redistributes the weights among the individual decision nodes of the internal decision trees of the AdaBoost model. A simple heuristic search of the weighted nodes then finds a single rule that dominated the model's decision. We compare the explanations generated by our novel approach with the state of the art in an experimental study, evaluating the derived explanations with simple statistical tests of the well-known quality measures precision and coverage, and a novel measure, stability, that is better suited to the XAI setting.
Results: Experiments on 9 CAD-related data sets showed that Ada-WHIPS explanations consistently generalise better (mean coverage 15%-68%) than the state of the art while remaining competitive for specificity (mean precision 80%-99%). A very small trade-off in specificity is shown to guard against over-fitting, a known problem in the state-of-the-art methods.
Conclusions: The experimental results demonstrate the benefits of using our novel algorithm for explaining the CAD AdaBoost classifiers widely found in the literature. Our tightly coupled, AdaBoost-specific approach outperforms model-agnostic explanation methods and should be considered by practitioners looking for an XAI solution for this class of models.
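
The abstract describes Ada-WHIPS only at a high level, so the sketch below is a rough illustration of the general idea rather than the published algorithm: it walks the internal decision trees of a scikit-learn AdaBoostClassifier, spreads each weak learner's classifier weight equally over the split nodes on the path taken for one instance, and keeps the highest-weighted conditions as a candidate explanation rule. The equal-split redistribution, the greedy top-k selection and the explain_instance helper are illustrative assumptions, not the authors' formulation.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

def explain_instance(ada, x, feature_names, max_terms=3):
    """Spread each weak learner's AdaBoost weight over the split nodes its tree
    visits for x, then keep the highest-weighted conditions as a rule (sketch only)."""
    x = x.reshape(1, -1)
    scored = {}                                    # (feature, op, threshold) -> weight
    for tree, alpha in zip(ada.estimators_, ada.estimator_weights_):
        path = tree.decision_path(x).indices       # node ids visited by x
        splits = [n for n in path if tree.tree_.children_left[n] != -1]
        if not splits:
            continue
        share = alpha / len(splits)                # naive equal redistribution
        for n in splits:
            f, thr = tree.tree_.feature[n], tree.tree_.threshold[n]
            op = "<=" if x[0, f] <= thr else ">"
            key = (feature_names[f], op, round(float(thr), 3))
            scored[key] = scored.get(key, 0.0) + share
    top = sorted(scored.items(), key=lambda kv: -kv[1])[:max_terms]
    return " AND ".join(f"{f} {op} {t}" for (f, op, t), _ in top)

data = load_breast_cancer()
X_tr, X_te, y_tr, y_te = train_test_split(data.data, data.target, random_state=0)
ada = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
print("prediction:", data.target_names[ada.predict(X_te[:1])[0]])
print("rule:", explain_instance(ada, X_te[0], data.feature_names))
```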


Mathematics
2021
Vol 10 (1)
pp. 97
Author(s):  
Kristjan Reba ◽  
Matej Guid ◽  
Kati Rozman ◽  
Dušanka Janežič ◽  
Janez Konc

Finding a maximum clique is important in research areas such as computational chemistry, social network analysis, and bioinformatics. For example, the maximum clique size of protein graphs can be compared to determine their similarity and function. In this paper, improvements based on machine learning (ML) are added to a dynamic algorithm for finding the maximum clique in a protein graph, Maximum Clique Dynamic (MaxCliqueDyn, abbreviated MCQD). This algorithm was published in 2007 and has been widely used in bioinformatics since then. It relies on an empirically determined parameter, Tlimit, that controls the algorithm’s flow. We have extended the MCQD algorithm with an initial phase that uses machine learning to predict the Tlimit value best suited to each input graph. Such adaptability to graph types, based on state-of-the-art machine learning, is a novel approach that has rarely been used in graph-theoretic algorithms. We show empirically that the resulting new algorithm, MCQD-ML, improves search speed on certain types of graphs, in particular molecular docking graphs, which are used in drug design to determine energetically favorable conformations of small molecules in a protein binding site. In such cases, the speed-up is twofold.
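
The mechanism described above, predicting a graph-specific Tlimit before the clique search begins, can be sketched roughly as follows. The structural features, the random-forest regressor and the placeholder training labels are assumptions for illustration only; in practice the labels would come from benchmarking MCQD at several Tlimit values per graph, and the predicted value would then be handed to the MCQD solver, for which no standard Python binding is assumed here.

```python
import numpy as np
import networkx as nx
from sklearn.ensemble import RandomForestRegressor

def graph_features(G):
    """Simple structural features of a graph; the paper's exact feature set may differ."""
    degs = np.array([d for _, d in G.degree()], dtype=float)
    return [G.number_of_nodes(), G.number_of_edges(), nx.density(G),
            degs.mean(), degs.std(), degs.max()]

# Placeholder training set: random graphs with made-up Tlimit labels.  Real labels
# would be the empirically fastest Tlimit found by benchmarking MCQD on each graph.
rng = np.random.default_rng(0)
train_graphs = [nx.gnp_random_graph(60, p, seed=int(s))
                for p, s in zip(rng.uniform(0.1, 0.9, 40),
                                rng.integers(0, 1_000_000, 40))]
train_tlimits = rng.uniform(0.01, 0.99, 40)        # placeholder labels only

X = np.array([graph_features(G) for G in train_graphs])
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, train_tlimits)

new_graph = nx.gnp_random_graph(60, 0.5, seed=7)
t_limit = float(model.predict([graph_features(new_graph)])[0])
print(f"predicted Tlimit for this graph: {t_limit:.3f}")   # would be passed to MCQD
```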


Author(s):  
Or Biran ◽  
Kathleen McKeown

Human decision makers in many domains can make use of predictions made by machine learning models in their decision making process, but the usability of these predictions is limited if the human is unable to justify his or her trust in the prediction. We propose a novel approach to producing justifications that is geared towards users without machine learning expertise, focusing on domain knowledge and on human reasoning, and utilizing natural language generation. Through a task-based experiment, we show that our approach significantly helps humans to correctly decide whether or not predictions are accurate, and significantly increases their satisfaction with the justification.


Legal Studies
2021
pp. 1-20
Author(s):  
Rebecca Schmidt ◽  
Colin Scott

Abstract Discretion gives decision makers choices as to how resources are allocated, or how other aspects of state largesse or coercion are deployed. Discretionary state power challenges aspects of the rule of law, first by transferring decisions from legislators to departments, agencies and street-level bureaucrats, and secondly by risking the uniform application of key fairness and equality norms. Concerns to find alternative and decentred forms of regulation gave rise to new types of regulation, sometimes labelled ‘regulatory capitalism’. Regulatory capitalism highlights the roles of a wider range of actors exercising powers and a wider range of instruments. It also includes new forms of discretion, for example over automated decision making processes, over the formulation and dissemination of league tables, or over the use of behavioural measures. This paper takes a novel approach by linking and extending the significant literature on these changing patterns of regulatory administration with consideration of the changing modes of deployment of discretion. Using this specific lens, we observe two potentially contradictory trends: an increase in the determination and structuring of administrative decisions, leading to a more transparent use of discretion; and the increased use of automated decision making processes, which have the potential to produce a less transparent, black box scenario.


Author(s):  
Tapan Shah

With advances in edge applications for industry and healthcare, machine learning models are increasingly trained on the edge. However, storage and memory infrastructure at the edge are often primitive, due to cost and real-estate constraints. A simple, effective method is to learn machine learning models from quantized data stored with low arithmetic precision (1-8 bits). In this work, we introduce two stochastic quantization methods: dithering and stochastic rounding. In dithering, additive noise from a uniform distribution is added to the sample before quantization. In stochastic rounding, each sample is quantized to the upper level with probability p and to the lower level with probability 1-p. The key contributions of the paper are:
- For 3 standard machine learning models, namely Support Vector Machines, Decision Trees and Linear (Logistic) Regression, we compare the performance loss for a standard static quantization and stochastic quantization on 55 classification and 30 regression datasets with 1-8 bit quantization.
- We showcase that for 4- and 8-bit quantization over regression datasets, stochastic quantization demonstrates statistically significant improvement.
- We investigate the performance loss as a function of dataset attributes, viz. number of features, standard deviation and skewness. This helps create a transfer function which will recommend the best quantizer for a given dataset.
- We propose 2 future research areas: a) dynamic quantizer update, where the model is trained using streaming data and the quantizer is updated after each batch, and b) precision re-allocation under budget constraints, where different precision is used for different features.
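
As a concrete illustration of the two stochastic quantization schemes described above (not the paper's implementation; the 4-bit width and the evenly spaced level grid are arbitrary choices), the sketch below compares static nearest-level quantization, dithering and stochastic rounding on a synthetic sample.

```python
import numpy as np

rng = np.random.default_rng(0)

def uniform_levels(x_min, x_max, bits):
    """Evenly spaced quantization levels for a given bit width."""
    return np.linspace(x_min, x_max, 2 ** bits)

def quantize_static(x, levels):
    """Deterministic quantization: snap each sample to the nearest level."""
    idx = np.abs(x[:, None] - levels[None, :]).argmin(axis=1)
    return levels[idx]

def quantize_dither(x, levels):
    """Dithering: add uniform noise of one step width before snapping."""
    step = levels[1] - levels[0]
    noisy = x + rng.uniform(-step / 2, step / 2, size=x.shape)
    return quantize_static(np.clip(noisy, levels[0], levels[-1]), levels)

def quantize_stochastic_round(x, levels):
    """Stochastic rounding: go to the upper level with probability equal to the
    normalized distance from the lower level, otherwise to the lower level."""
    step = levels[1] - levels[0]
    lower = np.clip(np.floor((x - levels[0]) / step), 0, len(levels) - 2).astype(int)
    frac = (x - levels[lower]) / step              # in [0, 1]
    go_up = rng.random(x.shape) < frac             # p = frac -> unbiased in expectation
    return levels[lower + go_up.astype(int)]

x = rng.normal(size=10_000)
lv = uniform_levels(x.min(), x.max(), bits=4)
for name, q in [("static", quantize_static(x, lv)),
                ("dither", quantize_dither(x, lv)),
                ("stochastic", quantize_stochastic_round(x, lv))]:
    print(f"{name:>10}: mean abs error {np.abs(x - q).mean():.4f}")
```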


2021
pp. 179-184
Author(s):  
Massimo Aria ◽  
Corrado Cuccurullo ◽  
Agostino Gnasso

The growing success of Machine Learning (ML) is driving significant improvements in predictive models and facilitating their integration into various application fields, especially healthcare. However, ML still has limitations and drawbacks, such as a lack of interpretability that prevents users from understanding how certain decisions are made. This drawback is captured by the term "black box", which describes models whose internal workings cannot be interpreted, thus discouraging their use. In a highly regulated and risk-averse context such as healthcare, although "trust" is not synonymous with decision and adoption, trusting an ML model is essential for its adoption. Many clinicians and health researchers feel uncomfortable with black box ML models, even if they achieve high degrees of diagnostic or prognostic accuracy. Therefore, more and more research is being conducted on the functioning of these models. Our study focuses on the Random Forest (RF) model, one of the best performing and most widely used ML methodologies in all fields of research, from the hard sciences to the humanities. In the health context and in the evaluation of health policies, its use is limited by the impossibility of obtaining an interpretation of the causal links between predictors and response. This explains why new techniques, tools, and approaches are needed for reconstructing the causal relationships and interactions between the predictors and the response used in an RF model. Our research performs a machine learning experiment on several medical datasets, comparing two methodologies, inTrees and NodeHarvest, which are the main approaches in the rule extraction framework. The contribution of our study is to identify, among these rule extraction approaches, the best proposal for suggesting the appropriate choice to decision-makers in the health domain.
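
inTrees and NodeHarvest are R packages, and their algorithms are not reproduced here; as a generic illustration of the rule-extraction idea they build on, the sketch below reads each root-to-leaf path of a scikit-learn random forest as a conjunctive rule and ranks the rules by how many training samples they cover. The depth cut-off and the support-based ranking are simplifying assumptions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

def extract_rules(forest, feature_names, max_depth=2):
    """Turn (truncated) decision paths of every tree into 'IF conditions THEN class'
    rules, scored by the number of training samples reaching the emitting node."""
    rules = {}
    for est in forest.estimators_:
        t = est.tree_
        def walk(node, conds):
            is_leaf = t.children_left[node] == -1
            if is_leaf or len(conds) == max_depth:
                cls = int(np.argmax(t.value[node]))          # majority class at node
                key = (tuple(conds), cls)
                rules[key] = rules.get(key, 0) + int(t.n_node_samples[node])
                return
            f, thr = feature_names[t.feature[node]], t.threshold[node]
            walk(t.children_left[node], conds + [f"{f} <= {thr:.2f}"])
            walk(t.children_right[node], conds + [f"{f} > {thr:.2f}"])
        walk(0, [])
    return sorted(rules.items(), key=lambda kv: -kv[1])

data = load_breast_cancer()
rf = RandomForestClassifier(n_estimators=50, max_depth=3, random_state=0)
rf.fit(data.data, data.target)
for (conds, cls), support in extract_rules(rf, data.feature_names)[:5]:
    print(f"IF {' AND '.join(conds)} THEN class {cls}   (support {support})")
```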

