treeheatr: an R package for interpretable decision tree visualizations

2020 ◽  
Author(s):  
Trang T. Le ◽  
Jason H. Moore

Abstract Summary treeheatr is an R package for creating interpretable decision tree visualizations with the data represented as a heatmap at the tree’s leaf nodes. The integrated presentation of the tree structure along with an overview of the data efficiently illustrates how the tree nodes split up the feature space and how well the tree model performs. This visualization can also be examined in depth to uncover the correlation structure in the data and importance of each feature in predicting the outcome. Implemented in an easily installed package with a detailed vignette, treeheatr can be a useful teaching tool to enhance students’ understanding of a simple decision tree model before diving into more complex tree-based machine learning methods. Availability The treeheatr package is freely available under the permissive MIT license at https://trang1618.github.io/treeheatr and https://cran.r-project.org/package=treeheatr. It comes with a detailed vignette that is automatically built with GitHub Actions continuous integration.

Author(s):  
Trang T Le ◽  
Jason H Moore

Abstract Summary treeheatr is an R package for creating interpretable decision tree visualizations with the data represented as a heatmap at the tree’s leaf nodes. The integrated presentation of the tree structure along with an overview of the data efficiently illustrates how the tree nodes split up the feature space and how well the tree model performs. This visualization can also be examined in depth to uncover the correlation structure in the data and importance of each feature in predicting the outcome. Implemented in an easily installed package with a detailed vignette, treeheatr can be a useful teaching tool to enhance students’ understanding of a simple decision tree model before diving into more complex tree-based machine learning methods. Availability and implementation The treeheatr package is freely available under the permissive MIT license at https://trang1618.github.io/treeheatr and https://cran.r-project.org/package=treeheatr. It comes with a detailed vignette that is automatically built with GitHub Actions continuous integration.


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Basim Mahbooba ◽  
Mohan Timilsina ◽  
Radhya Sahal ◽  
Martin Serrano

Despite the growing popularity of machine learning models in cyber-security applications (e.g., intrusion detection systems (IDS)), most of these models are perceived as black boxes. eXplainable Artificial Intelligence (XAI) has become increasingly important for interpreting machine learning models and enhancing trust management by allowing human experts to understand the underlying data evidence and causal reasoning. In IDS, the critical role of trust management is to understand the impact of malicious data in order to detect any intrusion in the system. Previous studies focused more on the accuracy of various classification algorithms for trust in IDS; they often do not provide insight into the behavior and reasoning of these sophisticated algorithms. Therefore, in this paper, we apply the XAI concept to enhance trust management by exploring the decision tree model in the area of IDS. We use simple decision tree algorithms that can be easily read and even resemble a human approach to decision-making by splitting the choice into many small subchoices for IDS. We experimented with this approach by extracting rules from a widely used KDD benchmark dataset. We also compared the accuracy of the decision tree approach with other state-of-the-art algorithms.
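The rule-extraction idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy tree, its feature names (failed_logins, bytes_sent) and its thresholds are hypothetical and are not taken from the paper or the KDD dataset. Each root-to-leaf path is turned into one readable IF-THEN rule.

```python
# Hypothetical toy tree: feature names, thresholds and labels are
# illustrative assumptions only, not values from the paper or KDD data.
toy_tree = {
    "feature": "failed_logins",
    "threshold": 3,
    "left": "normal",            # branch taken when failed_logins <= 3
    "right": {                   # branch taken when failed_logins > 3
        "feature": "bytes_sent",
        "threshold": 1000,
        "left": "normal",
        "right": "attack",
    },
}


def extract_rules(node, conditions=()):
    """Return one IF-THEN rule string per leaf, following root-to-leaf paths."""
    if isinstance(node, str):  # a leaf holds the predicted class label
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN {node}"]
    feat, thr = node["feature"], node["threshold"]
    left = extract_rules(node["left"], conditions + (f"{feat} <= {thr}",))
    right = extract_rules(node["right"], conditions + (f"{feat} > {thr}",))
    return left + right


rules = extract_rules(toy_tree)
for rule in rules:
    print(rule)
```

A tree with three leaves yields three rules, which is what makes a shallow decision tree easy for a human analyst to audit.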


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Yejin Lee ◽  
Dae-Young Kim

Purpose Using the decision tree model, this study aims to understand online travelers' booking behaviors on Expedia.com by examining influential determinants of online hotel booking, especially for longer-stay travelers. Geographical distance is also considered in understanding booking behaviors, with travel destinations divided into three regions (i.e. Americas, Europe and Asia). Design/methodology/approach The data were obtained from American Statistical Association DataFest and Expedia.com. Based on US travelers who made hotel reservations on the website, the study used a machine learning algorithm, the decision tree, to analyze the influential determinants of hotel booking, considering the geographical distance between origin and destination. Findings The findings demonstrate that the choice of package product is the prioritized determinant for longer-stay hotel guests. Several similarities and differences were found among the significant determinants of the decision tree, in accordance with the geographic distance among the Americas, Europe and Asia. Research limitations/implications This paper presents an extension to an existing machine learning environment, and especially to the decision tree model. The findings are anticipated to expand the understanding of online hotel booking and of the influential determinants of consumers’ decision-making process regarding the relationship between geographical distance and travelers’ hotel staying duration. Originality/value This research brings a meaningful understanding of the hospitality and tourism industry, especially to the realm of machine learning adapted to an online booking website. It provides a unique approach to comprehend and forecast consumer behavior with data mining.


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Alberto Parola ◽  
Ilaria Gabbatore ◽  
Laura Berardinelli ◽  
Rogerio Salvini ◽  
Francesca M. Bosco

Abstract An impairment in pragmatic communication is a core feature of schizophrenia, often associated with difficulties in social interactions. The pragmatic deficits concern various pragmatic phenomena, e.g., direct and indirect communicative acts, deceit, irony, and include not only the use of language but also other expressive means such as non-verbal/extralinguistic modalities, e.g., gestures and body movements, and paralinguistic cues, e.g., prosody and tone of voice. The present paper focuses on the identification of those pragmatic features, i.e., communicative phenomena and expressive modalities, that more reliably discriminate between individuals with schizophrenia and healthy controls. We performed a multimodal assessment of communicative-pragmatic ability, and applied a machine learning approach, specifically a Decision Tree model, with the aim of identifying the pragmatic features that best separate the data into the two groups, i.e., individuals with schizophrenia and healthy controls, and represent their configuration. The results indicated good overall performance of the Decision Tree model, with mean Accuracy of 82%, Sensitivity of 76%, and Precision of 91%. Linguistic irony emerged as the most relevant pragmatic phenomenon in distinguishing between the two groups, followed by violation of the Gricean maxims, and then extralinguistic deceitful and sincere communicative acts. The results are discussed in light of the pragmatic theoretical literature, and their clinical relevance in terms of content and design of both assessment and rehabilitative training.
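For readers unfamiliar with the three metrics reported above, the sketch below shows how Accuracy, Sensitivity and Precision are computed from a binary confusion matrix. The counts are made up for illustration and do not come from the study; treating the patient group as the positive class is also an assumption.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity (recall) and precision from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),   # share of true positives that were detected
        "precision": tp / (tp + fp),     # share of predicted positives that are correct
    }


# Made-up counts for illustration only (not data from the study).
metrics = classification_metrics(tp=8, fp=2, tn=7, fn=3)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Precision can exceed sensitivity, as in the reported figures, when the model produces few false positives but misses some true cases.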


2017 ◽  
Vol 7 (1.1) ◽  
pp. 449
Author(s):  
N Ravikumar ◽  
Dr P. Tamil Selvan

Text categorization with machine learning algorithms generally assumes a horizontal set of classes. Several advanced machine learning algorithms have been designed in the past few decades. With the growing research work on text categorization, it has become important to categorize the research outcome and provide learners with an effective machine learning method: a framework called Hierarchical Decision Tree and Deep Neural Network (HDT-DNN). It investigates machine learning algorithms to create a horizontal set of classes and is used for the classification of text. With this objective, a novel and efficient text categorization framework based on a decision tree model is used to categorize text according to superior and subordinate levels. The text to be categorized is presented in the form of a tree, with the parent text category being superior to all. The intermediate level represents text that is both superior and subordinate. A Deep Neural Network model is then presented as a compositional model, in which the text to be categorized is expressed as a layered integration of primitives from the constructed decision tree model. The extra layers enable composition of features from lower layers, potentially modeling complex text with fewer units than a comparable shallow network and producing a hierarchical classification. The significance of the impact of the HDT-DNN framework is evaluated through an empirical study. Extensive experiments are carried out, and the performance of the HDT-DNN framework is evaluated and compared with existing state-of-the-art methods using parameters such as precision, classification accuracy and classification time, with respect to varied numbers of features and document sizes.


2020 ◽  
Author(s):  
Kaichun Li ◽  
Qiaoyun Wang ◽  
Yanyan Lu ◽  
Xiaorong Pan ◽  
Long Liu ◽  
...  

Abstract Background The aim of this study was to confirm the role of Brachyury in breast cells and to establish and verify whether four types of machine learning models can use Brachyury expression to predict the survival of patients. Methods We conducted a retrospective review of the medical records to obtain patient information, and made the patient's paraffin tissue into tissue chips for staining analysis. We selected a total of 303 patients for research and implemented four machine learning prediction algorithms, including multivariate logistic regression model, decision tree, artificial neural network and random forest, and compared the results of these models with each other. Area under the receiver operating characteristic (ROC) curve (AUC) was used to compare the results. Results The chi-square test results of relevant data suggested that the expression of Brachyury protein in cancer tissues was significantly higher than that in paracancerous tissues (p=0.0335); breast cancer patients with high Brachyury expression had a worse overall survival (OS) compared with patients with low Brachyury expression. We also found that Brachyury expression was associated with ER expression (p=0.0489). Subsequently, we used four machine learning models to verify the relationship between Brachyury expression and the survival of breast cancer patients. The results showed that the decision tree model had the best performance (AUC=0.781). Conclusions Brachyury is highly expressed in breast cancer and indicates that the patient had a poor chance of survival. Compared with conventional statistical methods, decision tree model shows superior performance in predicting the survival status of breast cancer patients. This indicates that machine learning can thus be applied in a wide range of clinical studies.


2021 ◽  
Author(s):  
Kaichun Li ◽  
Qiaoyun Wang ◽  
Yanyan Lu ◽  
Xiaorong Pan ◽  
Long Liu ◽  
...  

Background The aim of this study was to confirm the role of Brachyury in breast cancer and to verify whether four types of machine learning models can use Brachyury expression to predict the survival of patients. Methods We conducted a retrospective review of the medical records to obtain patient information, and made the patient's paraffin tissue into tissue chips for staining analysis. We selected 303 patients for research and implemented four machine learning algorithms, including multivariate logistic regression model, decision tree, artificial neural network and random forest, and compared the results of these models with each other. Area under the receiver operating characteristic (ROC) curve (AUC) was used to compare the results. Results The chi-square test results of relevant data suggested that the expression of Brachyury protein in cancer tissues was significantly higher than that in paracancerous tissues (p=0.0335); breast cancer patients with high Brachyury expression had a worse overall survival (OS) compared with patients with low Brachyury expression. We also found that Brachyury expression was associated with ER expression (p=0.0489). Subsequently, we used four machine learning models to verify the relationship between Brachyury expression and the survival of breast cancer patients. The results showed that the decision tree model had the best performance (AUC=0.781). Conclusions Brachyury is highly expressed in breast cancer and indicates that patients had a poor prognosis. Compared with conventional statistical methods, decision tree model shows superior performance in predicting the survival status of breast cancer patients.


Author(s):  
Cengiz Gazeloğlu ◽  
Zeynep Hande Toyganözü ◽  
Cüneyt Toyganözü ◽  
Murat Kemal Keleş

Wikipedia is a source that has been used at many universities around the world to help students gain skills and stay motivated. In higher education, some academicians have a positive view of the teaching usefulness of Wikipedia, while others are determined to keep to classical teaching. In this chapter, data on the teaching use of Wikipedia among all faculty members of the Universitat Oberta de Catalunya are used. An entropy-based decision tree algorithm was then developed, and Wikipedia users and non-users are classified according to several aspects with this decision tree. Thus, it can be understood whether or not Wikipedia has been used as a teaching tool by academicians, and researchers can obtain information about the usefulness of Wikipedia in teaching and academicians' intentions to use it.
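The chapter does not specify its algorithm beyond "entropy-based", so the sketch below shows only the standard criterion such trees typically rely on: Shannon entropy of the class labels and the information gain of a candidate split. The "user" / "non-user" label lists are hypothetical and every group is assumed to be non-empty.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of its child groups."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical labels: four Wikipedia users and four non-users.
parent = ["user"] * 4 + ["non-user"] * 4
perfect_split = [["user"] * 4, ["non-user"] * 4]                    # classes fully separated
useless_split = [["user", "user", "non-user", "non-user"]] * 2      # classes still mixed

print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

At each node, an entropy-based tree would choose the feature whose split gives the largest information gain.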


Author(s):  
Avijit Kumar Chaudhuri ◽  
Deepankar Sinha ◽  
Dilip K. Banerjee ◽  
Anirban Das

2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 268-269
Author(s):  
Jaime Speiser ◽  
Kathryn Callahan ◽  
Jason Fanning ◽  
Thomas Gill ◽  
Anne Newman ◽  
...  

Abstract Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism of these methods remains due to lack of reproducibility and difficulty understanding the complex algorithms behind models. We aim to provide an overview of two common machine learning methods: decision tree and random forest. We focus on these methods because they provide a high degree of interpretability. We discuss the underlying algorithms of decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study. Decision tree is a machine learning method that produces a model resembling a flow chart. Random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. Illustrated in data from the LIFE study, prediction models for serious fall injury were moderate at best (area under the receiver operating curve of 0.54 for decision tree and 0.66 for random forest). Machine learning methods may offer improved performance compared to traditional models for modeling outcomes in aging, but their use should be justified and output should be carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.
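The two ideas mentioned above, aggregating many trees and scoring a model by the area under the ROC curve, can be illustrated with a small sketch. It is not the tutorial's code: the votes and labels are invented, and averaging per-tree votes is only one common form of aggregation. AUC is computed here as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def auc(scores, labels):
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def forest_scores(tree_votes):
    """Aggregate per-tree 0/1 votes into one score per sample (fraction of trees voting 1)."""
    n_trees = len(tree_votes)
    return [sum(sample_votes) / n_trees for sample_votes in zip(*tree_votes)]


# Invented example: 4 samples, 3 trees, each row holding one tree's 0/1 predictions.
labels = [1, 1, 0, 0]
tree_votes = [
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
]

single_tree_auc = auc(tree_votes[0], labels)
forest_auc = auc(forest_scores(tree_votes), labels)
print(single_tree_auc, forest_auc)
```

An AUC of 0.5 corresponds to chance-level ranking, which is why values such as 0.54 are described as weak discrimination.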

