On Using Decision Tree Coverage Criteria forTesting Machine Learning Models

2021 ◽  
Author(s):  
Sebastião Santos ◽  
Beatriz Silveira ◽  
Vinicius Durelli ◽  
Rafael Durelli ◽  
Simone Souza ◽  
...  
2021 ◽  
Vol 10 (1) ◽  
pp. 99
Author(s):  
Sajad Yousefi

Introduction: Heart disease is often associated with conditions such as clogged arteries due to the sediment accumulation which causes chest pain and heart attack. Many people die due to the heart disease annually. Most countries have a shortage of cardiovascular specialists and thus, a significant percentage of misdiagnosis occurs. Hence, predicting this disease is a serious issue. Using machine learning models performed on multidimensional dataset, this article aims to find the most efficient and accurate machine learning models for disease prediction.Material and Methods: Several algorithms were utilized to predict heart disease among which Decision Tree, Random Forest and KNN supervised machine learning are highly mentioned. The algorithms are applied to the dataset taken from the UCI repository including 294 samples. The dataset includes heart disease features. To enhance the algorithm performance, these features are analyzed, the feature importance scores and cross validation are considered.Results: The algorithm performance is compared with each other, so that performance based on ROC curve and some criteria such as accuracy, precision, sensitivity and F1 score were evaluated for each model. As a result of evaluation, Accuracy, AUC ROC are 83% and 99% respectively for Decision Tree algorithm. Logistic Regression algorithm with accuracy and AUC ROC are 88% and 91% respectively has better performance than other algorithms. Therefore, these techniques can be useful for physicians to predict heart disease patients and prescribe them correctly.Conclusion: Machine learning technique can be used in medicine for analyzing the related data collections to a disease and its prediction. The area under the ROC curve and evaluating criteria related to a number of classifying algorithms of machine learning to evaluate heart disease and indeed, the prediction of heart disease is compared to determine the most appropriate classification. As a result of evaluation, better performance was observed in both Decision Tree and Logistic Regression models.


2007 ◽  
Vol 16 (06) ◽  
pp. 1001-1014 ◽  
Author(s):  
PANAGIOTIS ZERVAS ◽  
IOSIF MPORAS ◽  
NIKOS FAKOTAKIS ◽  
GEORGE KOKKINAKIS

This paper presents and discusses the problem of emotion recognition from speech signals with the utilization of features bearing intonational information. In particular parameters extracted from Fujisaki's model of intonation are presented and evaluated. Machine learning models were build with the utilization of C4.5 decision tree inducer, instance based learner and Bayesian learning. The datasets utilized for the purpose of training machine learning models were extracted from two emotional databases of acted speech. Experimental results showed the effectiveness of Fujisaki's model attributes since they enhanced the recognition process for most of the emotion categories and learning approaches helping to the segregation of emotion categories.


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Basim Mahbooba ◽  
Mohan Timilsina ◽  
Radhya Sahal ◽  
Martin Serrano

Despite the growing popularity of machine learning models in the cyber-security applications (e.g., an intrusion detection system (IDS)), most of these models are perceived as a black-box. The eXplainable Artificial Intelligence (XAI) has become increasingly important to interpret the machine learning models to enhance trust management by allowing human experts to understand the underlying data evidence and causal reasoning. According to IDS, the critical role of trust management is to understand the impact of the malicious data to detect any intrusion in the system. The previous studies focused more on the accuracy of the various classification algorithms for trust in IDS. They do not often provide insights into their behavior and reasoning provided by the sophisticated algorithm. Therefore, in this paper, we have addressed XAI concept to enhance trust management by exploring the decision tree model in the area of IDS. We use simple decision tree algorithms that can be easily read and even resemble a human approach to decision-making by splitting the choice into many small subchoices for IDS. We experimented with this approach by extracting rules in a widely used KDD benchmark dataset. We also compared the accuracy of the decision tree approach with the other state-of-the-art algorithms.


Author(s):  
Pooja Thakkar

Abstract: The focus of this study is on drug categorization utilising Machine Learning models, as well as interpretability utilizing LIME and SHAP to get a thorough understanding of the ML models. To do this, the researchers used machine learning models such as random forest, decision tree, and logistic regression to classify drugs. Then, using LIME and SHAP, they determined if these models were interpretable, which allowed them to better understand their results. It may be stated at the conclusion of this paper that LIME and SHAP can be utilised to get insight into a Machine Learning model and determine which attribute is accountable for the divergence in the outcomes. According to the LIME and SHAP results, it is also discovered that Random Forest and Decision Tree ML models are the best models to employ for drug classification, with Na to K and BP being the most significant characteristics for drug classification. Keywords: Machine Learning, Back-box models, LIME, SHAP, Decision Tree


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Thérence Nibareke ◽  
Jalal Laassiri

Abstract Introduction Nowadays large data volumes are daily generated at a high rate. Data from health system, social network, financial, government, marketing, bank transactions as well as the censors and smart devices are increasing. The tools and models have to be optimized. In this paper we applied and compared Machine Learning algorithms (Linear Regression, Naïve bayes, Decision Tree) to predict diabetes. Further more, we performed analytics on flight delays. The main contribution of this paper is to give an overview of Big Data tools and machine learning models. We highlight some metrics that allow us to choose a more accurate model. We predict diabetes disease using three machine learning models and then compared their performance. Further more we analyzed flight delay and produced a dashboard which can help managers of flight companies to have a 360° view of their flights and take strategic decisions. Case description We applied three Machine Learning algorithms for predicting diabetes and we compared the performance to see what model give the best results. We performed analytics on flights datasets to help decision making and predict flight delays. Discussion and evaluation The experiment shows that the Linear Regression, Naive Bayesian and Decision Tree give the same accuracy (0.766) but Decision Tree outperforms the two other models with the greatest score (1) and the smallest error (0). For the flight delays analytics, the model could show for example the airport that recorded the most flight delays. Conclusions Several tools and machine learning models to deal with big data analytics have been discussed in this paper. We concluded that for the same datasets, we have to carefully choose the model to use in prediction. In our future works, we will test different models in other fields (climate, banking, insurance.).


2021 ◽  
Vol 14 (3) ◽  
pp. 138
Author(s):  
Fisnik Doko ◽  
Slobodan Kalajdziski ◽  
Igor Mishkovski

Data science and machine-learning techniques help banks to optimize enterprise operations, enhance risk analyses and gain competitive advantage. There is a vast amount of research in credit risk, but to our knowledge, none of them uses credit registry as a data source to model the probability of default for individual clients. The goal of this paper is to evaluate different machine-learning models to create accurate model for credit risk assessment using the data from the real credit registry dataset of the Central Bank of Republic of North Macedonia. We strongly believe that the model developed in this research will be an additional source of valuable information to commercial banks, by leveraging historical data for all the population of the country in all the commercial banks. Thus, in this research, we compare five machine-learning models to classify credit risk data, i.e., logistic regression, decision tree, random forest, support vector machines (SVM) and neural network. We evaluate the five models using different machine-learning metrics, and we propose a model based on credit registry data from the central bank with detailed methodology that can predict the credit risk based on credit history of the population in the country. Our results show that the best accuracy is achieved by using decision tree performing on imbalanced data with and without scaling, followed by random forest and linear regression.


Author(s):  
Vijaylaxmi Kochari

Breast cancer represents one of the dangerous diseases that causes a high number of deaths every year. The dataset containing the features present in the CSV format is used to identify whether the digitalized image is benign or malignant. The machine learning models such as Linear Regression, Decision Tree, Radom Forest are trained with the training dataset and used to classify. The accuracy of these classifiers is compared to get the best model. This will help the doctors to give proper treatment at the initial stage and save their lives.


2020 ◽  
Author(s):  
Carlos Pedro Gonçalves ◽  
José Rouco

AbstractWe compare the performance of major decision tree-based ensemble machine learning models on the task of COVID-19 death probability prediction, conditional on three risk factors: age group, sex and underlying comorbidity or disease, using the US Centers for Disease Control and Prevention (CDC)’s COVID-19 case surveillance dataset. To evaluate the impact of the three risk factors on COVID-19 death probability, we extract and analyze the conditional probability profile produced by the best performer. The results show the presence of an exponential rise in death probability from COVID-19 with the age group, with males exhibiting a higher exponential growth rate than females, an effect that is stronger when an underlying comorbidity or disease is present, which also acts as an accelerator of COVID-19 death probability rise for both male and female subjects. The results are discussed in connection to healthcare and epidemiological concerns and in the degree to which they reinforce findings coming from other studies on COVID-19.


2020 ◽  
Author(s):  
Kaichun Li ◽  
Qiaoyun Wang ◽  
Yanyan Lu ◽  
Xiaorong Pan ◽  
Long Liu ◽  
...  

Abstract Background The aim of this study was to confirm the role of Brachyury in breast cells and to establish and verify whether four types of machine learning models can use Brachyury expression to predict the survival of patients.Methods We conducted a retrospective review of the medical records to obtain patient information, and made the patient's paraffin tissue into tissue chips for staining analysis. We selected a total of 303 patients for research and implemented four machine learning prediction algorithms, including multivariate logistic regression model, decision tree, artificial neural network and random forest, and compared the results of these models with each other. Area under the receiver operating characteristic (ROC) curve (AUC) was used to compare the results.Results The chi-square test results of relevant data suggested that the expression of Brachyury protein in cancer tissues was significantly higher than that in paracancerous tissues (p=0.0335); breast cancer patients with high Brachyury expression had a worse overall survival (OS) compared with patients with low Brachyury expression. We also found that Brachyury expression was associated with ER expression (p=0.0489). Subsequently, we used four machine learning models to verify the relationship between Brachyury expression and the survival of breast cancer patients. The results showed that the decision tree model had the best performance (AUC=0.781).Conclusions Brachyury is highly expressed in breast cancer and indicates that the patient had a poor chance of survival. Compared with conventional statistical methods, decision tree model shows superior performance in predicting the survival status of breast cancer patients. This indicates that machine learning can thus be applied in a wide range of clinical studies.


Sign in / Sign up

Export Citation Format

Share Document