Explainable Artificial Intelligence (XAI) to Enhance Trust Management in Intrusion Detection Systems Using Decision Tree Model

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Basim Mahbooba ◽  
Mohan Timilsina ◽  
Radhya Sahal ◽  
Martin Serrano

Despite the growing popularity of machine learning models in cyber-security applications (e.g., intrusion detection systems (IDSs)), most of these models are perceived as black boxes. eXplainable Artificial Intelligence (XAI) has become increasingly important for interpreting machine learning models and enhancing trust management by allowing human experts to understand the underlying data evidence and causal reasoning. In an IDS, the critical role of trust management is to understand the impact of malicious data in order to detect any intrusion in the system. Previous studies focused mainly on the accuracy of various classification algorithms for trust in IDSs; they rarely provide insight into the behavior and reasoning of the underlying sophisticated algorithms. Therefore, in this paper, we apply the XAI concept to enhance trust management by exploring the decision tree model in the area of IDS. We use simple decision tree algorithms that can be easily read and that even resemble a human approach to decision-making by splitting a choice into many small sub-choices. We evaluated this approach by extracting rules from the widely used KDD benchmark dataset, and we compared the accuracy of the decision tree approach with other state-of-the-art algorithms.
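The rule-extraction idea described above can be sketched in a few lines: fit a shallow decision tree and print its splits as human-readable if/else rules. This is a minimal illustration on synthetic stand-in data, not the paper's actual KDD experiment; the feature names are invented for the example.

```python
# Minimal sketch: a shallow decision tree whose rules can be read directly.
# Data and feature names are synthetic stand-ins, not the KDD dataset.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic binary "normal vs. attack" traffic data.
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
feature_names = ["duration", "src_bytes", "dst_bytes", "error_rate"]

# A shallow tree stays human-readable: each split is a small sub-choice.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# export_text prints the if/else rules that an analyst can audit.
rules = export_text(tree, feature_names=feature_names)
print(rules)
```

Capping the depth is the key trade-off: a deeper tree may score higher but loses exactly the readability that motivates the approach.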

2020 ◽  
Author(s):  
Kaichun Li ◽  
Qiaoyun Wang ◽  
Yanyan Lu ◽  
Xiaorong Pan ◽  
Long Liu ◽  
...  

Background: The aim of this study was to confirm the role of Brachyury in breast cancer and to establish and verify whether four types of machine learning models can use Brachyury expression to predict patient survival.
Methods: We conducted a retrospective review of medical records to obtain patient information and made the patients' paraffin tissue into tissue chips for staining analysis. We selected a total of 303 patients and implemented four machine learning algorithms: a multivariate logistic regression model, a decision tree, an artificial neural network, and a random forest. The area under the receiver operating characteristic (ROC) curve (AUC) was used to compare the models.
Results: Chi-square tests suggested that the expression of Brachyury protein in cancer tissues was significantly higher than in paracancerous tissues (p=0.0335); breast cancer patients with high Brachyury expression had worse overall survival (OS) than patients with low Brachyury expression. We also found that Brachyury expression was associated with ER expression (p=0.0489). We then used the four machine learning models to verify the relationship between Brachyury expression and survival; the decision tree model performed best (AUC=0.781).
Conclusions: Brachyury is highly expressed in breast cancer and indicates a poor chance of survival. Compared with conventional statistical methods, the decision tree model shows superior performance in predicting the survival status of breast cancer patients, suggesting that machine learning can be applied in a wide range of clinical studies.
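The four-model, AUC-based comparison described in this abstract can be sketched as follows. The clinical Brachyury data is not public, so synthetic data of the same size stands in; model names follow the abstract, while hyperparameters are illustrative assumptions.

```python
# Hedged sketch of a four-model AUC comparison on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# 303 synthetic "patients", mirroring the study's cohort size only.
X, y = make_classification(n_samples=303, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=42),
    "neural_network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                                    random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Compare models by area under the ROC curve, as in the study.
aucs = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    aucs[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(aucs)
```

On real clinical data the ranking would of course depend on the cohort; the point is only the shared protocol: one held-out split, one AUC per model.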


2022 ◽  
pp. 146-164
Author(s):  
Duygu Bagci Das ◽  
Derya Birant

Explainable artificial intelligence (XAI) is a concept that has emerged and become popular in recent years, and interpretability in machine learning models has been drawing increasing attention. Human activity classification (HAC) systems, however, still lack interpretable approaches. In this study, an approach called eXplainable HAC (XHAC) was proposed, in which data exploration, model structure explanation, and prediction explanation of the ML classifiers were examined to improve the explainability of HAC model components such as sensor types and their locations. For this purpose, various internet of things (IoT) sensors were considered individually, including the accelerometer, gyroscope, and magnetometer, and the locations of these sensors (i.e., ankle, arm, and chest) were also taken into account. The important features were explored, and the effect of the window size on classification performance was investigated. According to the obtained results, the proposed approach makes HAC processes more explainable than black-box ML techniques.
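One ingredient of such an approach, exploring which sensor features matter, can be sketched with impurity-based feature importances. Everything here (the data, the sensor feature names, the choice of a random forest) is an invented stand-in, not the XHAC method itself.

```python
# Rough sketch: rank sensor/location features by importance on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
feature_names = ["ankle_acc", "arm_acc", "chest_acc",
                 "ankle_gyro", "arm_gyro", "chest_gyro"]
X = rng.normal(size=(400, len(feature_names)))
# Synthetic activity label, driven mostly by the ankle accelerometer.
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Ranking importances hints at which sensor types and placements matter.
ranking = sorted(zip(feature_names, clf.feature_importances_),
                 key=lambda kv: kv[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because the synthetic label leans on the ankle accelerometer, that feature should dominate the ranking; on real HAC data, such a ranking is what lets one argue which sensors and body locations carry the signal.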


Entropy ◽  
2020 ◽  
Vol 23 (1) ◽  
pp. 18
Author(s):  
Pantelis Linardatos ◽  
Vasilis Papastefanopoulos ◽  
Sotiris Kotsiantis

Recent advances in artificial intelligence (AI) have led to its widespread industrial adoption, with machine learning systems demonstrating superhuman performance in a significant number of tasks. However, this surge in performance has often been achieved through increased model complexity, turning such systems into “black box” approaches and causing uncertainty regarding the way they operate and, ultimately, the way they come to decisions. This ambiguity has made it problematic for machine learning systems to be adopted in sensitive yet critical domains, where their value could be immense, such as healthcare. As a result, scientific interest in the field of Explainable Artificial Intelligence (XAI), which is concerned with developing new methods that explain and interpret machine learning models, has been tremendously reignited over recent years. This study focuses on machine learning interpretability methods; more specifically, a literature review and taxonomy of these methods are presented, along with links to their programming implementations, in the hope that this survey will serve as a reference point for both theorists and practitioners.
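As one concrete example of the kind of implemented interpretability method such a survey catalogs (chosen here for illustration, not taken from the survey itself), permutation importance is a model-agnostic technique that measures how much shuffling each feature degrades a fitted model's score.

```python
# Model-agnostic interpretability sketch: permutation importance on
# synthetic data with a gradient-boosted "black box" model.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=300, n_features=5, n_informative=2,
                           random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Shuffle each column several times and record the mean drop in score.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, drop in enumerate(result.importances_mean):
    print(f"feature {i}: mean score drop {drop:.3f}")
```

Because it needs only predictions and a score, the same procedure applies unchanged to any fitted estimator, which is what makes it a popular post-hoc explanation baseline.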


2021 ◽  
Author(s):  
Sebastião Santos ◽  
Beatriz Silveira ◽  
Vinicius Durelli ◽  
Rafael Durelli ◽  
Simone Souza ◽  
...  

2022 ◽  
Vol 54 (9) ◽  
pp. 1-36
Author(s):  
Dylan Chou ◽  
Meng Jiang

Data-driven network intrusion detection (NID) typically faces highly imbalanced data, in which attack classes are minorities compared to normal traffic. Moreover, many datasets are collected in simulated environments rather than real-world networks. These challenges undermine the performance of intrusion detection machine learning models by fitting them to unrepresentative “sandbox” datasets. This survey presents a taxonomy with eight main challenges and explores common datasets from 1999 to 2020. Trends in these challenges over the past decade are analyzed, and future directions are proposed: expanding NID into cloud-based environments, devising scalable models for large network data, and creating labeled datasets collected from real-world networks.


2021 ◽  
Vol 10 (1) ◽  
pp. 99
Author(s):  
Sajad Yousefi

Introduction: Heart disease is often associated with conditions such as arteries clogged by sediment accumulation, which causes chest pain and heart attacks. Many people die of heart disease every year, and most countries have a shortage of cardiovascular specialists, so a significant percentage of misdiagnoses occur. Predicting this disease is therefore a serious issue. Using machine learning models on a multidimensional dataset, this article aims to find the most efficient and accurate machine learning models for disease prediction.
Material and Methods: Several algorithms were utilized to predict heart disease, notably the Decision Tree, Random Forest, and KNN supervised machine learning algorithms. The algorithms were applied to a dataset of 294 samples taken from the UCI repository that includes heart disease features. To enhance performance, these features were analyzed, and feature importance scores and cross-validation were considered.
Results: The algorithms' performance was compared using the ROC curve and criteria such as accuracy, precision, sensitivity, and F1 score, evaluated for each model. The Decision Tree algorithm achieved an accuracy of 83% and an AUC ROC of 99%, while the Logistic Regression algorithm, with an accuracy of 88% and an AUC ROC of 91%, performed better than the other algorithms. These techniques can therefore help physicians predict heart disease in patients and treat them correctly.
Conclusion: Machine learning can be used in medicine to analyze data collections related to a disease and to predict it. The area under the ROC curve and related evaluation criteria for a number of machine learning classification algorithms were compared to determine the most appropriate classifier for heart disease prediction. As a result of the evaluation, the best performance was observed in the Decision Tree and Logistic Regression models.
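The evaluation protocol this abstract describes, cross-validated accuracy, precision, sensitivity (recall), F1, and ROC AUC per model, can be sketched as below. The data is synthetic and only mimics the UCI sample in size (294 rows); the model choices and hyperparameters are illustrative assumptions.

```python
# Sketch of a cross-validated multi-metric model comparison on synthetic data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=294, n_features=13, random_state=1)
scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]

results = {}
for name, model in [
    ("decision_tree", DecisionTreeClassifier(max_depth=4, random_state=1)),
    ("logistic_regression", LogisticRegression(max_iter=1000)),
]:
    # 5-fold cross-validation yields one mean score per metric per model.
    scores = cross_validate(model, X, y, cv=5, scoring=scoring)
    results[name] = {m: scores[f"test_{m}"].mean() for m in scoring}
print(results)
```

Reporting every metric, not just accuracy, matters in clinical screening, where sensitivity (missed patients) and precision (false alarms) pull in different directions.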


2020 ◽  
Author(s):  
Trang T. Le ◽  
Jason H. Moore

Summary: treeheatr is an R package for creating interpretable decision tree visualizations with the data represented as a heatmap at the tree's leaf nodes. The integrated presentation of the tree structure along with an overview of the data efficiently illustrates how the tree nodes split up the feature space and how well the tree model performs. This visualization can also be examined in depth to uncover the correlation structure in the data and the importance of each feature in predicting the outcome. Implemented in an easily installed package with a detailed vignette, treeheatr can be a useful teaching tool to enhance students' understanding of a simple decision tree model before diving into more complex tree-based machine learning methods.
Availability: The treeheatr package is freely available under the permissive MIT license at https://trang1618.github.io/treeheatr and https://cran.r-project.org/package=treeheatr. It comes with a detailed vignette that is automatically built with GitHub Actions continuous integration.
Contact: [email protected]

