treeheatr: an R package for interpretable decision tree visualizations

Mapping Intimacies

10.1101/2020.07.10.196352

2020

Author(s): Trang T. Le, Jason H. Moore

Keyword(s): Machine Learning, Decision Tree, Feature Space, R Package, Tree Structure, Decision Tree Model, Teaching Tool, Tree Model, Machine Learning Methods

Summary: treeheatr is an R package for creating interpretable decision tree visualizations with the data represented as a heatmap at the tree’s leaf nodes. The integrated presentation of the tree structure along with an overview of the data efficiently illustrates how the tree nodes split up the feature space and how well the tree model performs. This visualization can also be examined in depth to uncover the correlation structure in the data and the importance of each feature in predicting the outcome. Implemented in an easily installed package with a detailed vignette, treeheatr can be a useful teaching tool to enhance students’ understanding of a simple decision tree model before diving into more complex tree-based machine learning methods.

Availability: The treeheatr package is freely available under the permissive MIT license at https://trang1618.github.io/treeheatr and https://cran.r-project.org/package=treeheatr. It comes with a detailed vignette that is automatically built with GitHub Actions continuous integration.
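To make the "heatmap at the leaf nodes" idea concrete, here is a minimal Python sketch of the layout step only, written under loud assumptions: treeheatr itself is an R package and nothing below is its API. A hand-written toy tree (the hypothetical `leaf_of` function, with made-up thresholds) stands in for a fitted model. Samples are routed to leaves, reordered so that samples sharing a leaf sit side by side, and each feature is min-max scaled to [0, 1] so all columns can share one colour scale.

```python
# Hypothetical sketch: a hand-written toy tree stands in for a fitted model.
# treeheatr is an R package; none of these names come from its API.


def leaf_of(sample):
    """Route a sample (dict of feature -> value) to a leaf of a fixed toy tree."""
    if sample["x1"] <= 0.5:
        return "leaf_1"
    return "leaf_2" if sample["x2"] <= 0.3 else "leaf_3"


def min_max_scale(column):
    """Rescale numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in column]


def heatmap_by_leaf(samples, features):
    """Group samples by leaf and return (leaf ids, scaled rows) in display order."""
    ordered = sorted(samples, key=leaf_of)  # stable sort keeps input order within a leaf
    leaves = [leaf_of(s) for s in ordered]
    columns = [min_max_scale([s[f] for s in ordered]) for f in features]
    rows = [list(r) for r in zip(*columns)]  # one row per sample, one column per feature
    return leaves, rows


samples = [
    {"x1": 0.9, "x2": 0.1},
    {"x1": 0.2, "x2": 0.8},
    {"x1": 0.7, "x2": 0.6},
    {"x1": 0.4, "x2": 0.2},
]
leaves, rows = heatmap_by_leaf(samples, ["x1", "x2"])
for leaf, row in zip(leaves, rows):
    # Crude text "heatmap": '#' for high scaled values, '.' for low ones.
    print(leaf, " ".join("#" if v >= 0.5 else "." for v in row))
```

The sketch does not attempt to reproduce how treeheatr scales features or orders samples within a leaf; it only shows why grouping samples by leaf lets a heatmap reveal how each split partitions the feature space.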

treeheatr: an R package for interpretable decision tree visualizations

Bioinformatics

10.1093/bioinformatics/btaa662

2020

Author(s): Trang T Le, Jason H Moore

Keyword(s): Machine Learning, Decision Tree, Feature Space, R Package, Tree Structure, Decision Tree Model, Teaching Tool, Tree Model, Continuous Integration, Machine Learning Methods

Summary: treeheatr is an R package for creating interpretable decision tree visualizations with the data represented as a heatmap at the tree’s leaf nodes. The integrated presentation of the tree structure along with an overview of the data efficiently illustrates how the tree nodes split up the feature space and how well the tree model performs. This visualization can also be examined in depth to uncover the correlation structure in the data and the importance of each feature in predicting the outcome. Implemented in an easily installed package with a detailed vignette, treeheatr can be a useful teaching tool to enhance students’ understanding of a simple decision tree model before diving into more complex tree-based machine learning methods.

Availability and implementation: The treeheatr package is freely available under the permissive MIT license at https://trang1618.github.io/treeheatr and https://cran.r-project.org/package=treeheatr. It comes with a detailed vignette that is automatically built with GitHub Actions continuous integration.

Explainable Artificial Intelligence (XAI) to Enhance Trust Management in Intrusion Detection Systems Using Decision Tree Model

Complexity

10.1155/2021/6634811

2021

Vol 2021, pp. 1-11

Author(s): Basim Mahbooba, Mohan Timilsina, Radhya Sahal, Martin Serrano

Keyword(s): Artificial Intelligence, Machine Learning, Intrusion Detection, Decision Tree, Trust Management, Decision Tree Model, Learning Models, Tree Model, Explainable Artificial Intelligence, Machine Learning Models

Despite the growing popularity of machine learning models in cybersecurity applications (e.g., an intrusion detection system (IDS)), most of these models are perceived as black boxes. eXplainable Artificial Intelligence (XAI) has become increasingly important for interpreting machine learning models and enhancing trust management by allowing human experts to understand the underlying data evidence and causal reasoning. In an IDS, the critical role of trust management is to understand the impact of malicious data in order to detect any intrusion into the system. Previous studies focused mostly on the accuracy of various classification algorithms for trust in IDS; they often do not provide insight into the behavior and reasoning of these sophisticated algorithms. Therefore, in this paper, we apply the XAI concept to enhance trust management by exploring the decision tree model in the area of IDS. We use simple decision tree algorithms that can be easily read and even resemble a human approach to decision-making by splitting the choice into many small subchoices for IDS. We experimented with this approach by extracting rules from the widely used KDD benchmark dataset. We also compared the accuracy of the decision tree approach with that of other state-of-the-art algorithms.
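Extracting rules from a decision tree amounts to reading off every root-to-leaf path as an IF-THEN statement. The Python sketch below illustrates that general idea under stated assumptions: the tree is represented as nested dicts (a hypothetical format, not the authors' implementation), and the feature names `f1`/`f2`, thresholds and labels are invented for illustration and do not come from the paper or the KDD dataset.

```python
# Hypothetical toy tree: feature names, thresholds and labels are invented for
# illustration and are NOT taken from the paper or from the KDD dataset.
toy_tree = {
    "feature": "f1", "threshold": 10,
    "left": {"label": "normal"},
    "right": {
        "feature": "f2", "threshold": 3,
        "left": {"label": "normal"},
        "right": {"label": "attack"},
    },
}


def extract_rules(node, conditions=()):
    """Return one (conditions, label) pair per root-to-leaf path."""
    if "label" in node:  # leaf: the accumulated path is a complete rule
        return [(list(conditions), node["label"])]
    f, t = node["feature"], node["threshold"]
    left = extract_rules(node["left"], conditions + (f"{f} <= {t}",))
    right = extract_rules(node["right"], conditions + (f"{f} > {t}",))
    return left + right


def format_rule(conditions, label):
    """Render a rule as a human-readable IF ... THEN ... string."""
    if not conditions:
        return f"ALWAYS {label}"
    return "IF " + " AND ".join(conditions) + " THEN " + label


for conds, label in extract_rules(toy_tree):
    print(format_rule(conds, label))
```

Each leaf yields exactly one rule, so the number of rules equals the number of leaves; this is what makes a shallow tree readable in the way the abstract describes.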

The decision tree for longer-stay hotel guest: the relationship between hotel booking determinants and geographical distance

International Journal of Contemporary Hospitality Management

10.1108/ijchm-06-2020-0594

2020

Vol ahead-of-print (ahead-of-print)

Author(s): Yejin Lee, Dae-Young Kim

Keyword(s): Machine Learning, Decision Tree, Learning Algorithm, Tourism Industry, Decision Tree Model, Geographical Distance, Tree Model, Content Type, Hotel Booking, The Relationship

Purpose: Using the decision tree model, this study aims to understand online travelers’ booking behaviors on Expedia.com by examining the influential determinants of online hotel booking, especially for longer-stay travelers. Geographical distance is also considered by dividing travel destinations into three regions (i.e. the Americas, Europe and Asia).

Design/methodology/approach: The data were obtained from the American Statistical Association DataFest and Expedia.com. Based on US travelers who made hotel reservations on the website, the study used a machine learning algorithm, the decision tree, to analyze the influential determinants of hotel booking, considering the geographical distance between origin and destination.

Findings: The findings demonstrate that the choice of a package product is the prioritized determinant for longer-stay hotel guests. Several similarities and differences were found among the significant determinants of the decision tree, in accordance with the geographic distance among the Americas, Europe and Asia.

Research limitations/implications: This paper presents an extension to an existing machine learning environment, especially to the decision tree model. The findings are anticipated to expand the understanding of online hotel booking and to identify the influential determinants of consumers’ decision-making process regarding the relationship between geographical distance and travelers’ hotel length of stay.

Originality/value: This research brings a meaningful understanding to the hospitality and tourism industry, especially to the realm of machine learning applied to an online booking website. It provides a unique approach to comprehending and forecasting consumer behavior with data mining.

Multimodal assessment of communicative-pragmatic features in schizophrenia: a machine learning approach

npj Schizophrenia

10.1038/s41537-021-00153-4

2021

Vol 7 (1)

Author(s): Alberto Parola, Ilaria Gabbatore, Laura Berardinelli, Rogerio Salvini, Francesca M. Bosco

Keyword(s): Machine Learning, Decision Tree, Decision Tree Model, Learning Approach, Healthy Controls, Tree Model, Communicative Acts, Multimodal Assessment, Machine Learning Approach, Tone Of Voice

An impairment in pragmatic communication is a core feature of schizophrenia, often associated with difficulties in social interactions. These pragmatic deficits concern various pragmatic phenomena, e.g., direct and indirect communicative acts, deceit and irony, and involve not only the use of language but also other expressive means, such as non-verbal/extralinguistic modalities (e.g., gestures and body movements) and paralinguistic cues (e.g., prosody and tone of voice). The present paper focuses on identifying those pragmatic features, i.e., communicative phenomena and expressive modalities, that more reliably discriminate between individuals with schizophrenia and healthy controls. We performed a multimodal assessment of communicative-pragmatic ability and applied a machine learning approach, specifically a Decision Tree model, with the aim of identifying the pragmatic features that best separate the data into the two groups, i.e., individuals with schizophrenia and healthy controls, and represent their configuration. The results indicated good overall performance of the Decision Tree model, with a mean Accuracy of 82%, Sensitivity of 76%, and Precision of 91%. Linguistic irony emerged as the most relevant pragmatic phenomenon in distinguishing between the two groups, followed by violation of the Gricean maxims, and then extralinguistic deceitful and sincere communicative acts. The results are discussed in light of the pragmatic theoretical literature and of their clinical relevance for the content and design of both assessment and rehabilitative training.
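The three reported metrics follow from a binary confusion matrix: accuracy = (TP + TN) / (TP + FP + FN + TN), sensitivity = TP / (TP + FN), and precision = TP / (TP + FP). The Python sketch below spells out that arithmetic. It assumes, as a labelled modelling choice, that individuals with schizophrenia are the positive class; the counts are hypothetical and are not the study's data. Because the abstract reports mean values, averaging per-run metrics and computing them from pooled counts can give slightly different numbers.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts for a binary classifier; which class is 'positive' is a modelling choice."""
    tp: int  # positives predicted positive
    fp: int  # negatives predicted positive
    fn: int  # positives predicted negative
    tn: int  # negatives predicted negative

    def accuracy(self):
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def sensitivity(self):  # a.k.a. recall / true positive rate
        return self.tp / (self.tp + self.fn)

    def precision(self):  # a.k.a. positive predictive value
        return self.tp / (self.tp + self.fp)


def confusion_from_labels(y_true, y_pred, positive):
    """Tally a confusion matrix from paired true and predicted labels."""
    c = Confusion(0, 0, 0, 0)
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                c.tp += 1
            else:
                c.fp += 1
        elif t == positive:
            c.fn += 1
        else:
            c.tn += 1
    return c


# Hypothetical counts, chosen only to show the arithmetic.
c = Confusion(tp=8, fp=1, fn=2, tn=9)
print(c.accuracy(), c.sensitivity(), c.precision())
```

A model can combine high precision with lower sensitivity, as in the reported results, when it rarely labels a healthy control as positive but misses some true positives.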

Index split decision tree and compositional deep neural network for text categorization

International Journal of Engineering & Technology

10.14419/ijet.v7i1.1.9953

2017

Vol 7 (1.1), pp. 449

Author(s): N Ravikumar, Dr P. Tamil Selvan

Keyword(s): Neural Network, Machine Learning, Decision Tree, Text Categorization, Deep Neural Network, Learning Algorithms, Machine Learning Algorithms, Decision Tree Model, Tree Model, The Impact

Text categorization with machine learning algorithms generally assumes a horizontal (flat) set of classes. Several advanced machine learning algorithms have been designed in the past few decades. With the growing body of research on text categorization, it has become important to categorize research outcomes and provide learners with an effective machine learning method. To this end, a framework called Hierarchical Decision Tree and Deep Neural Network (HDT-DNN) is proposed. It investigates machine learning algorithms that create a horizontal set of classes and applies them to text classification. With this objective, a novel and efficient text categorization framework based on the decision tree model is used to categorize text into superior and subordinate levels. The text to be categorized is represented as a tree, with the parent text category superior to all others; the intermediate level represents text that is both superior and subordinate. A Deep Neural Network model is then introduced as a compositional model, in which the text to be categorized is treated as a layered integration of primitives from the constructed decision tree model. The extra layers enable the composition of features from lower layers, potentially modeling complex text with fewer units than a comparable shallow network, producing a hierarchical classification. The significance of the HDT-DNN framework is evaluated through an empirical study. Extensive experiments are carried out, and the performance of the HDT-DNN framework is compared with existing state-of-the-art methods using metrics such as precision, classification accuracy and classification time, with respect to varying numbers of features and document sizes.

Machine Learning based Tissue Analysis Reveals Brachyury has a Diagnosis Value in Breast Cancer

10.21203/rs.3.rs-60846/v1

2020

Author(s): Kaichun Li, Qiaoyun Wang, Yanyan Lu, Xiaorong Pan, Long Liu, ...

Keyword(s): Breast Cancer, Machine Learning, Decision Tree, Cancer Patients, Decision Tree Model, Superior Performance, Breast Cancer Patients, Learning Models, Tree Model, Machine Learning Models

Background: The aim of this study was to confirm the role of Brachyury in breast cells and to establish and verify whether four types of machine learning models can use Brachyury expression to predict the survival of patients.

Methods: We conducted a retrospective review of medical records to obtain patient information, and made the patients’ paraffin tissue samples into tissue chips for staining analysis. We selected a total of 303 patients and implemented four machine learning prediction algorithms (multivariate logistic regression, decision tree, artificial neural network and random forest), and compared the results of these models with each other using the area under the receiver operating characteristic (ROC) curve (AUC).

Results: Chi-square tests suggested that the expression of Brachyury protein in cancer tissues was significantly higher than that in paracancerous tissues (p=0.0335); breast cancer patients with high Brachyury expression had worse overall survival (OS) than patients with low Brachyury expression. We also found that Brachyury expression was associated with ER expression (p=0.0489). Subsequently, we used four machine learning models to verify the relationship between Brachyury expression and the survival of breast cancer patients. The decision tree model had the best performance (AUC=0.781).

Conclusions: Brachyury is highly expressed in breast cancer and indicates a poor chance of survival. Compared with conventional statistical methods, the decision tree model shows superior performance in predicting the survival status of breast cancer patients. This indicates that machine learning can be applied in a wide range of clinical studies.
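The four models here are compared by the area under the ROC curve. As a minimal sketch (not the authors' code), AUC can be computed directly from its probabilistic definition, which is equivalent to the normalized Mann-Whitney U statistic: the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as one half. The scores and outcomes below are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win, which matters for decision trees: every sample
    that lands in the same leaf receives the same score.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores and outcomes (1 = event, 0 = no event), for illustration only.
print(roc_auc([0.9, 0.8, 0.4, 0.3, 0.2], [1, 0, 1, 0, 0]))
```

The pairwise loop costs O(n_pos × n_neg) and is written for clarity; rank-based implementations give the same value faster on large cohorts.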

Machine learning based tissue analysis reveals Brachyury has a diagnosis value in breast cancer

Bioscience Reports

10.1042/bsr20203391

2021

Author(s): Kaichun Li, Qiaoyun Wang, Yanyan Lu, Xiaorong Pan, Long Liu, ...

Keyword(s): Breast Cancer, Machine Learning, Decision Tree, Cancer Patients, Decision Tree Model, Superior Performance, Breast Cancer Patients, Learning Models, Tree Model, Machine Learning Models

Background: The aim of this study was to confirm the role of Brachyury in breast cancer and to verify whether four types of machine learning models can use Brachyury expression to predict the survival of patients.

Methods: We conducted a retrospective review of the medical records to obtain patient information, and made the patient's paraffin tissue into tissue chips for staining analysis. We selected 303 patients for research and implemented four machine learning algorithms, including multivariate logistic regression model, decision tree, artificial neural network and random forest, and compared the results of these models with each other. Area under the receiver operating characteristic (ROC) curve (AUC) was used to compare the results.

Results: The chi-square test results of relevant data suggested that the expression of Brachyury protein in cancer tissues was significantly higher than that in paracancerous tissues (p=0.0335); breast cancer patients with high Brachyury expression had a worse overall survival (OS) compared with patients with low Brachyury expression. We also found that Brachyury expression was associated with ER expression (p=0.0489). Subsequently, we used four machine learning models to verify the relationship between Brachyury expression and the survival of breast cancer patients. The results showed that the decision tree model had the best performance (AUC=0.781).

Conclusions: Brachyury is highly expressed in breast cancer and indicates that patients had a poor prognosis. Compared with conventional statistical methods, decision tree model shows superior performance in predicting the survival status of breast cancer patients.

Classification of the Usage of Wikipedia as a Tool of Teaching in Higher Education With Decision Tree Model

Multi-Criteria Decision-Making Models for Website Evaluation - Advances in Computational Intelligence and Robotics

10.4018/978-1-5225-8238-0.ch003

2019

pp. 44-63

Author(s): Cengiz Gazeloğlu, Zeynep Hande Toyganözü, Cüneyt Toyganözü, Murat Kemal Keleş

Keyword(s): Higher Education, Decision Tree, Faculty Members, Decision Tree Model, Teaching Tool, Tree Model, Positive View, The World, Teaching In Higher Education

Wikipedia has been used at many universities around the world to help students gain skills and to motivate them. In higher education, some academics hold a positive view of Wikipedia’s usefulness for teaching, while others remain committed to classical teaching. In this chapter, data on the teaching use of Wikipedia by all faculty members of the Universitat Oberta de Catalunya are analyzed. An entropy-based decision tree algorithm was then developed, and Wikipedia users and non-users were classified according to several aspects with this decision tree. This makes it possible to determine whether academics have used Wikipedia as a teaching tool, so researchers can learn about the usefulness of Wikipedia in teaching and academics’ intentions to use it.
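An entropy-based decision tree chooses each split by how much it reduces label entropy (information gain). The chapter does not spell out its exact criterion here, so the Python sketch below shows the generic ID3/C4.5-style calculation for one numeric feature as an assumption, not as the authors' algorithm. The survey answers and user/non-user labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - weighted


def best_threshold(values, labels):
    """Pick the threshold with the highest information gain (None if no split exists)."""
    candidates = sorted(set(values))[:-1]  # splitting at the maximum leaves one side empty
    return max(candidates, key=lambda t: information_gain(values, labels, t), default=None)


# Hypothetical 1-5 survey answers and Wikipedia-use labels, for illustration only.
answers = [1, 2, 2, 4, 5, 5]
use = ["non-user", "non-user", "non-user", "user", "user", "user"]
t = best_threshold(answers, use)
print(t, information_gain(answers, use, t))
```

A full tree repeats this search over every feature at every node and recurses on the two resulting subsets until the leaves are pure or a stopping rule applies.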

A novel enhanced decision tree model for detecting chronic kidney disease

Network Modeling Analysis in Health Informatics and Bioinformatics

10.1007/s13721-021-00302-w

2021

Vol 10 (1)

Author(s): Avijit Kumar Chaudhuri, Deepankar Sinha, Dilip K. Banerjee, Anirban Das

Keyword(s): Chronic Kidney Disease, Kidney Disease, Decision Tree, Decision Tree Model, Tree Model

Machine Learning in Aging: An Example of Developing Prediction Models for Serious Fall Injury in Older Adults

Innovation in Aging

10.1093/geroni/igaa057.859

2020

Vol 4 (Supplement_1), pp. 268-269

Author(s): Jaime Speiser, Kathryn Callahan, Jason Fanning, Thomas Gill, Anne Newman, ...

Keyword(s): Machine Learning, Older Adults, Random Forest, Decision Tree, Prediction Models, Receiver Operating Curve, Learning Methods, Life Study, Fall Injury, Machine Learning Methods

Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in the diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism of these methods remains due to a lack of reproducibility and difficulty in understanding the complex algorithms behind the models. We aim to provide an overview of two common machine learning methods, decision tree and random forest, which we focus on because they provide a high degree of interpretability. We discuss the underlying algorithms of decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study. A decision tree is a machine learning method that produces a model resembling a flow chart. A random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. In data from the LIFE study, prediction models for serious fall injury were moderate at best (area under the receiver operating characteristic curve of 0.54 for decision tree and 0.66 for random forest). Machine learning methods may offer improved performance compared to traditional models for modeling outcomes in aging, but their use should be justified and their output carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.
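The abstract describes a random forest as many decision trees whose results are aggregated. The Python sketch below illustrates that aggregation (bagging plus majority vote) under deliberate simplifications, which are assumptions rather than the tutorial's method: each "tree" is a depth-one stump fit on a bootstrap sample using a single randomly chosen feature, whereas real random forests grow deeper trees and sample candidate features at every split. The toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, feature):
    """Fit a one-split tree on one feature, minimizing training misclassifications."""
    best = None
    for cand in sorted({row[feature] for row in X}):
        left = [lab for row, lab in zip(X, y) if row[feature] <= cand]
        right = [lab for row, lab in zip(X, y) if row[feature] > cand]
        left_lab = Counter(left).most_common(1)[0][0]  # never empty: cand is an observed value
        right_lab = Counter(right).most_common(1)[0][0] if right else left_lab
        errors = sum(lab != left_lab for lab in left) + sum(lab != right_lab for lab in right)
        if best is None or errors < best[0]:
            best = (errors, cand, left_lab, right_lab)
    _, t, left_lab, right_lab = best
    return lambda row: left_lab if row[feature] <= t else right_lab


def fit_forest(X, y, n_trees=25, seed=0):
    """Bagging: each stump sees a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feature = rng.randrange(d)
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return trees


def predict_forest(trees, row):
    """Aggregate the trees' predictions by majority vote."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]


# Hypothetical, cleanly separable toy data (two features, binary outcome).
X = [[1, 1], [2, 2], [3, 3], [7, 7], [8, 8], [9, 9]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y)
print(predict_forest(forest, [0, 0]), predict_forest(forest, [10, 10]))
```

Voting over trees trained on different bootstrap samples smooths out the errors of any single tree, which is one intuition for why the forest outperformed the single decision tree in the reported example, though the sketch makes no attempt to reproduce those results.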
