Machine Learning Explainability Through Comprehensible Decision Trees

Author(s):  
Alberto Blanco-Justicia ◽  
Josep Domingo-Ferrer


2021 ◽  
Vol 11 (15) ◽  
pp. 6728
Author(s):  
Muhammad Asfand Hafeez ◽  
Muhammad Rashid ◽  
Hassan Tariq ◽  
Zain Ul Abideen ◽  
Saud S. Alotaibi ◽  
...  

Classification and regression are major applications of machine learning and are widely used to solve problems in numerous domains of engineering and computer science. Various classifiers based on optimizing decision trees have been proposed, and such classifiers continue to evolve. This paper presents a novel and robust classifier that combines a decision tree with a tabu search algorithm. To improve performance, the proposed algorithm constructs multiple decision trees while employing tabu search to consistently monitor the leaf and decision nodes of the corresponding trees; the tabu search algorithm is also responsible for balancing their entropy. The model was trained on clinical data of COVID-19 patients to predict whether a patient is suffering from the disease. Experimental results were obtained with the proposed classifier implemented on top of the scikit-learn library in Python. An extensive performance comparison against conventional supervised machine learning algorithms is presented using Big O and statistical analysis, along with a comparison to optimized state-of-the-art classifiers. The achieved accuracy of 98%, execution time of 55.6 ms, and area under the receiver operating characteristic curve (AUROC) of 0.95 indicate that the proposed classifier is well suited to large datasets.
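As a rough illustration of the core idea, one can wrap scikit-learn decision trees in a tabu-style local search over candidate feature subsets; the neighborhood structure, tabu criterion, and scoring below are illustrative assumptions, not the authors' implementation.

```python
# Illustrative tabu-search wrapper around scikit-learn decision trees.
# The neighborhood (feature subsets differing by one feature) and the
# tabu criterion are assumptions for demonstration purposes only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def tabu_tree_search(X_tr, y_tr, X_va, y_va, iters=30, tabu_len=5, rng=None):
    rng = rng or np.random.default_rng(0)
    n_feat = X_tr.shape[1]
    current = frozenset(rng.choice(n_feat, size=max(2, n_feat // 2), replace=False))
    best_subset, best_score, tabu = current, -np.inf, []
    for _ in range(iters):
        # Neighborhood: flip one feature in or out of the current subset.
        candidates = [frozenset(current ^ {f}) for f in range(n_feat)
                      if len(current ^ {f}) >= 2 and frozenset(current ^ {f}) not in tabu]
        scored = []
        for subset in candidates:
            cols = sorted(subset)
            tree = DecisionTreeClassifier(max_depth=5, random_state=0)
            tree.fit(X_tr[:, cols], y_tr)
            scored.append((tree.score(X_va[:, cols], y_va), subset))
        score, current = max(scored)  # best non-tabu move, even if worse
        tabu.append(current)          # forbid revisiting recent subsets
        tabu = tabu[-tabu_len:]
        if score > best_score:
            best_score, best_subset = score, current
    cols = sorted(best_subset)
    final = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr[:, cols], y_tr)
    return final, cols, best_score

X, y = make_classification(n_samples=500, n_features=12, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)
model, cols, score = tabu_tree_search(X_tr, y_tr, X_va, y_va)
print(f"validation accuracy {score:.3f} using features {cols}")
```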


2020 ◽  
Author(s):  
Vincent Bremer ◽  
Philip I Chow ◽  
Burkhardt Funk ◽  
Frances P Thorndike ◽  
Lee M Ritterband

BACKGROUND User dropout is a widespread concern in the delivery and evaluation of digital (ie, web and mobile apps) health interventions. Researchers have yet to fully realize the potential of the large amount of data generated by these technology-based programs. Of particular interest is the ability to predict who will drop out of an intervention. This may be possible through the analysis of user journey data (self-reported as well as system-generated data) produced by the path, or journey, an individual takes to navigate through a digital health intervention. OBJECTIVE The purpose of this study is to provide a step-by-step process for the analysis of user journey data and, ultimately, the prediction of dropout in the context of digital health interventions. The process is applied to data from an internet-based intervention for insomnia as a way to illustrate its use. Completion of the program is contingent upon completing 7 sequential cores, which include an initial tutorial core; dropout is defined as not completing the seventh core. METHODS Steps of user journey analysis, including data transformation, feature engineering, and statistical model analysis and evaluation, are presented. Dropout was predicted based on data from 151 participants in a fully automated web-based program (Sleep Healthy Using the Internet) that delivers cognitive behavioral therapy for insomnia. Logistic regression with L1 and L2 regularization, support vector machines, and boosted decision trees were used and evaluated based on their predictive performance. Relevant features from the data that predict user dropout are reported. RESULTS Predictive performance for dropout (area under the curve [AUC] values) varied depending on the program core and the machine learning technique. After model evaluation, boosted decision trees achieved AUC values ranging between 0.6 and 0.9. Additional handcrafted features, including time to complete certain steps of the intervention, time to get out of bed, and days since the last interaction with the system, contributed to the prediction performance. CONCLUSIONS The results support the feasibility and potential of analyzing user journey data to predict dropout. Theory-driven handcrafted features increased the prediction performance. The ability to predict dropout at an individual level could be used to enhance decision making for researchers and clinicians as well as inform dynamic intervention regimens.
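A condensed sketch of such a pipeline, using scikit-learn in place of the study's exact tooling, might look as follows; the input file and column names are hypothetical stand-ins for the handcrafted user-journey features described above.

```python
# Illustrative dropout-prediction pipeline: handcrafted user-journey
# features feed regularized logistic regression and boosted trees,
# compared by cross-validated AUC. File and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

journeys = pd.read_csv("user_journeys.csv")  # one row per participant

# Handcrafted features echoing the study: completion time, time to get
# out of bed, and recency of the last interaction with the system.
X = journeys[["minutes_to_complete_core", "minutes_to_get_out_of_bed",
              "days_since_last_interaction"]]
y = journeys["dropped_out"]  # 1 = did not complete the seventh core

models = {
    "logistic (L1)": LogisticRegression(penalty="l1", solver="liblinear"),
    "logistic (L2)": LogisticRegression(penalty="l2", solver="liblinear"),
    "boosted trees": GradientBoostingClassifier(),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.2f}")
```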


2021 ◽  
Author(s):  
Chris J. Kennedy ◽  
Dustin G. Mark ◽  
Jie Huang ◽  
Mark J. van der Laan ◽  
Alan E. Hubbard ◽  
...  

Background: Chest pain is the second leading reason for emergency department (ED) visits and is commonly identified as a leading driver of low-value health care. Accurate identification of patients at low risk of major adverse cardiac events (MACE) is important to improve resource allocation and reduce over-treatment. Objectives: We sought to assess machine learning (ML) methods and electronic health record (EHR) covariate collection for MACE prediction. We aimed to maximize the pool of low-risk patients accurately predicted to have less than 0.5% MACE risk, who may be eligible for reduced testing. Population Studied: 116,764 adult patients presenting with chest pain in the ED and evaluated for potential acute coronary syndrome (ACS); the 60-day MACE rate was 1.9%. Methods: We evaluated ML algorithms (lasso, splines, random forest, extreme gradient boosting, Bayesian additive regression trees) and SuperLearner stacked ensembling. We tuned ML hyperparameters through nested ensembling and imputed missing values with generalized low-rank models (GLRM). We benchmarked performance against key biomarkers, validated clinical risk scores, decision trees, and logistic regression. We explained the models through variable importance ranking and accumulated local effect visualization. Results: The best discrimination (area under the precision-recall [PR-AUC] and receiver operating characteristic [ROC-AUC] curves) was provided by SuperLearner ensembling (0.148, 0.867), followed by random forest (0.146, 0.862). Logistic regression (0.120, 0.842) and decision trees (0.094, 0.805) exhibited worse discrimination, as did risk scores [HEART (0.064, 0.765), EDACS (0.046, 0.733)] and biomarkers [serum troponin level (0.064, 0.708), electrocardiography (0.047, 0.686)]. The ensemble's risk estimates were miscalibrated by 0.2 percentage points. The ensemble accurately identified 50% of patients as being below a 0.5% 60-day MACE risk threshold. The most important predictors were age, peak troponin, HEART score, EDACS score, and electrocardiogram. GLRM imputation achieved a 90% reduction in root mean-squared error compared to median-mode imputation. Conclusion: Use of ML algorithms, combined with broad predictor sets, improved MACE risk prediction compared to simpler alternatives, while providing calibrated predictions and interpretability. Standard risk scores may neglect important health information that is available in other patient characteristics and that ML can combine in nuanced ways.
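The study used SuperLearner stacking; a rough scikit-learn analogue via StackingClassifier, on synthetic data with a similarly rare outcome, could look like the sketch below (learners and settings are illustrative, not the study's configuration).

```python
# Rough scikit-learn analogue of stacked ensembling (the study used
# SuperLearner); base learners and covariates here are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data with a rare positive class, echoing the ~2% MACE rate.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.98],
                           random_state=0)

stack = StackingClassifier(
    estimators=[
        ("lasso", LogisticRegression(penalty="l1", solver="liblinear", C=0.5)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # meta-learner combines base risks
    cv=5,                                  # out-of-fold base predictions
)
for scoring in ("roc_auc", "average_precision"):  # ROC-AUC and ~PR-AUC
    score = cross_val_score(stack, X, y, cv=3, scoring=scoring).mean()
    print(f"{scoring}: {score:.3f}")
```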


2019 ◽  
Vol 8 (11) ◽  
pp. e298111473
Author(s):  
Hugo Kenji Rodrigues Okada ◽  
Andre Ricardo Nascimento das Neves ◽  
Ricardo Shitsuka

Decision trees are data structures or computational methods that enable nonparametric supervised machine learning and are used in classification and regression tasks. The aim of this paper is to present a comparison between the decision tree induction algorithms C4.5 and CART. A quantitative study is performed in which the two methods are compared with respect to operation and complexity. The experiments yielded practically equal hit percentages; however, in execution time for tree induction the CART algorithm was approximately 46.24% slower than C4.5, which was therefore considered the more effective algorithm.
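scikit-learn ships an optimized CART implementation; switching its split criterion from Gini impurity to entropy gives a rough, C4.5-flavored comparison of induction time and hit percentage on a public dataset (the numbers will not reproduce the paper's).

```python
# Rough C4.5-vs-CART flavored comparison: entropy (information gain, as
# in C4.5) versus Gini impurity (as in CART) in scikit-learn's tree.
import time
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for criterion in ("entropy", "gini"):  # C4.5-style vs CART-style splits
    start = time.perf_counter()
    tree = DecisionTreeClassifier(criterion=criterion, random_state=0)
    tree.fit(X_tr, y_tr)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{criterion:7s}: induction {elapsed_ms:.1f} ms, "
          f"hit rate {tree.score(X_te, y_te):.3f}")
```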


Author(s):  
Gaël Aglin ◽  
Siegfried Nijssen ◽  
Pierre Schaus

Decision Trees (DTs) are widely used Machine Learning (ML) models with a broad range of applications. Interest in these models has increased even further in the context of Explainable AI (XAI), as decision trees of limited depth are very interpretable models. However, traditional algorithms for learning DTs are heuristic in nature; they may produce trees of suboptimal quality under depth constraints. We introduce PyDL8.5, a Python library to infer depth-constrained Optimal Decision Trees (ODTs). PyDL8.5 provides an interface for DL8.5, an efficient algorithm for inferring depth-constrained ODTs. The library provides an easy-to-use, scikit-learn-compatible interface. It can be used not only for classification tasks but also for regression, clustering, and other tasks. We introduce an interface that allows users to easily implement these other learning tasks, and we provide a number of examples of how to use this library.
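A minimal usage sketch is shown below; it assumes the package is installed as pydl8.5 and imports as pydl85 (older releases import as dl85), and that features have been binarized, since DL8.5 operates on Boolean data.

```python
# Hedged usage sketch of PyDL8.5's scikit-learn style interface.
# Assumption: the package imports as `pydl85`; older versions use `dl85`.
import numpy as np
from pydl85 import DL85Classifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_bin = (X > np.median(X, axis=0)).astype(int)  # DL8.5 needs 0/1 features
X_tr, X_te, y_tr, y_te = train_test_split(X_bin, y, random_state=0)

clf = DL85Classifier(max_depth=3)  # depth-constrained optimal tree
clf.fit(X_tr, y_tr)               # familiar fit/predict/score workflow
print("test accuracy:", clf.score(X_te, y_te))
```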


Circulation ◽  
2020 ◽  
Vol 142 (Suppl_3) ◽  
Author(s):  
Koichi Sughimoto ◽  
Jacob Levman ◽  
Fazleem Baig ◽  
Derek Berger ◽  
Yoshihiro Oshima ◽  
...  

Introduction: Despite improvements in management for children after cardiac surgery, a non-negligible proportion of patients suffer cardiac arrest and have a poor prognosis. Although serum lactate levels are widely accepted markers of hemodynamic instability, measuring lactate requires discrete blood sampling. An alternative method to evaluate hemodynamic stability/instability continuously and non-invasively may assist in improving the standard of patient care. Hypothesis: We hypothesize that blood lactate in PICU patients can be predicted using machine learning applied to arterial waveforms and perioperative characteristics. Methods: Forty-eight children who underwent heart surgery were included. Patient characteristics and physiological measurements, including heart rate, lactate level, arterial waveform sharpness, and area under the curve, were acquired and analyzed using specialized software/hardware. Blood lactate levels were predicted using regression-based supervised learning algorithms, including regression decision trees, tuned decision trees, a random forest regressor, a tuned random forest, an AdaBoost regressor, and hypertuned AdaBoost. All algorithms were compared using hold-out validation. Two approaches were considered: basing prediction on the currently acquired physiological measurements along with those acquired at admission, and additionally including the most recent lactate measurement and the time since that measurement as prediction parameters. The second approach supports updating the learning system's predictive capacity whenever a patient has a new ground-truth blood lactate reading acquired. Results: In both approaches, the best performing machine learning method was the tuned random forest, which yielded a mean absolute error of 5.60 mg/dL in the first approach and 4.62 mg/dL when predicting blood lactate with updated ground truth. Conclusions: The tuned random forest is capable of predicting the level of serum lactate by analyzing perioperative variables, including the arterial pressure waveform. Machine learning can predict a patient's hemodynamics non-invasively, continuously, and with accuracy that may demonstrate clinical utility.
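An illustrative version of the second approach (a random forest tuned by a small grid search, evaluated on a hold-out split) is sketched below; the input file and feature names are hypothetical.

```python
# Illustrative lactate prediction in the spirit of the second approach:
# physiological features plus the most recent ground-truth lactate and
# its age feed a tuned random forest. File/column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split

data = pd.read_csv("picu_waveform_features.csv")
X = data[["heart_rate", "waveform_sharpness", "waveform_auc",
          "last_lactate", "minutes_since_last_lactate"]]
y = data["lactate_mg_dl"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# "Tuned" random forest via grid search, then hold-out evaluation.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 8]},
    scoring="neg_mean_absolute_error", cv=5,
)
search.fit(X_tr, y_tr)
mae = mean_absolute_error(y_te, search.predict(X_te))
print(f"hold-out MAE: {mae:.2f} mg/dL")
```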


2018 ◽  
pp. 1587-1599
Author(s):  
Hiroaki Koma ◽  
Taku Harada ◽  
Akira Yoshizawa ◽  
Hirotoshi Iwasaki

Detection of distracted states can be applied to various problems, such as danger prevention while driving a car. A cognitive distracted state is one example of a distracted state, and eye movements are known to express cognitive distraction. Eye movements can be classified into several types. In this paper, the authors detect cognitive distraction from classified eye movement types by applying the Random Forest machine learning algorithm, an ensemble of decision trees. They show the effectiveness of considering eye movement types when applying Random Forest to detect cognitive distraction. The authors use visual experiments with still images for the detection.
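A minimal sketch of the classification step, with hypothetical per-trial features derived from classified eye movement types, could look like this.

```python
# Sketch of the classification step: per-trial counts of eye-movement
# types feed a Random Forest that labels the trial as cognitively
# distracted or not. Dataset and feature names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

trials = pd.read_csv("eye_movement_trials.csv")
X = trials[["fixation_count", "saccade_count", "smooth_pursuit_count",
            "mean_fixation_duration_ms"]]
y = trials["cognitively_distracted"]  # 1 = distracted viewing condition

clf = RandomForestClassifier(n_estimators=300, random_state=0)
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```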

