Predicting Metabolic Syndrome With Machine Learning Models Using a Decision Tree Algorithm: Retrospective Cohort Study (Preprint)

Mapping Intimacies ◽

10.2196/preprints.17110 ◽

2019 ◽

Author(s):

Cheng-Sheng Yu ◽

Yu-Jiun Lin ◽

Chang-Hsien Lin ◽

Sen-Te Wang ◽

Shiyng-Yu Lin ◽

...

Keyword(s):

Machine Learning ◽

Metabolic Syndrome ◽

Logistic Regression ◽

Decision Tree ◽

Characteristic Curve ◽

Significant Risk ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Health Examination ◽

Multivariate Logistic Regression

BACKGROUND Metabolic syndrome is a cluster of disorders that significantly influence the development and deterioration of numerous diseases. FibroScan is an ultrasound device that was recently shown to predict metabolic syndrome with moderate accuracy. However, previous research regarding prediction of metabolic syndrome in subjects examined with FibroScan has been mainly based on conventional statistical models. Alternatively, machine learning, whereby a computer algorithm learns from prior experience, has better predictive performance over conventional statistical modeling. OBJECTIVE We aimed to evaluate the accuracy of different decision tree machine learning algorithms to predict the state of metabolic syndrome in self-paid health examination subjects who were examined with FibroScan. METHODS Multivariate logistic regression was conducted for every known risk factor of metabolic syndrome. Principal components analysis was used to visualize the distribution of metabolic syndrome patients. We further applied various statistical machine learning techniques to visualize and investigate the pattern and relationship between metabolic syndrome and several risk variables. RESULTS Obesity, serum glutamic-oxalocetic transaminase, serum glutamic pyruvic transaminase, controlled attenuation parameter score, and glycated hemoglobin emerged as significant risk factors in multivariate logistic regression. The area under the receiver operating characteristic curve values for classification and regression trees and for the random forest were 0.831 and 0.904, respectively. CONCLUSIONS Machine learning technology facilitates the identification of metabolic syndrome in self-paid health examination subjects with high accuracy.

Download Full-text

Predicting Metabolic Syndrome With Machine Learning Models Using a Decision Tree Algorithm: Retrospective Cohort Study

JMIR Medical Informatics ◽

10.2196/17110 ◽

2020 ◽

Vol 8 (3) ◽

pp. e17110 ◽

Cited By ~ 3

Author(s):

Cheng-Sheng Yu ◽

Yu-Jiun Lin ◽

Chang-Hsien Lin ◽

Sen-Te Wang ◽

Shiyng-Yu Lin ◽

...

Keyword(s):

Machine Learning ◽

Metabolic Syndrome ◽

Logistic Regression ◽

Decision Tree ◽

Characteristic Curve ◽

Significant Risk ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Health Examination ◽

Multivariate Logistic Regression

Background Metabolic syndrome is a cluster of disorders that significantly influence the development and deterioration of numerous diseases. FibroScan is an ultrasound device that was recently shown to predict metabolic syndrome with moderate accuracy. However, previous research regarding prediction of metabolic syndrome in subjects examined with FibroScan has been mainly based on conventional statistical models. Alternatively, machine learning, whereby a computer algorithm learns from prior experience, has better predictive performance over conventional statistical modeling. Objective We aimed to evaluate the accuracy of different decision tree machine learning algorithms to predict the state of metabolic syndrome in self-paid health examination subjects who were examined with FibroScan. Methods Multivariate logistic regression was conducted for every known risk factor of metabolic syndrome. Principal components analysis was used to visualize the distribution of metabolic syndrome patients. We further applied various statistical machine learning techniques to visualize and investigate the pattern and relationship between metabolic syndrome and several risk variables. Results Obesity, serum glutamic-oxalocetic transaminase, serum glutamic pyruvic transaminase, controlled attenuation parameter score, and glycated hemoglobin emerged as significant risk factors in multivariate logistic regression. The area under the receiver operating characteristic curve values for classification and regression trees and for the random forest were 0.831 and 0.904, respectively. Conclusions Machine learning technology facilitates the identification of metabolic syndrome in self-paid health examination subjects with high accuracy.

Download Full-text

Comparison of the Performance of Machine Learning Algorithms in Predicting Heart Disease

Frontiers in Health Informatics ◽

10.30699/fhi.v10i1.349 ◽

2021 ◽

Vol 10 (1) ◽

pp. 99

Author(s):

Sajad Yousefi

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Heart Disease ◽

Decision Tree ◽

Roc Curve ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Learning Models ◽

Algorithm Performance ◽

Machine Learning Models

Introduction: Heart disease is often associated with conditions such as clogged arteries due to the sediment accumulation which causes chest pain and heart attack. Many people die due to the heart disease annually. Most countries have a shortage of cardiovascular specialists and thus, a significant percentage of misdiagnosis occurs. Hence, predicting this disease is a serious issue. Using machine learning models performed on multidimensional dataset, this article aims to find the most efficient and accurate machine learning models for disease prediction.Material and Methods: Several algorithms were utilized to predict heart disease among which Decision Tree, Random Forest and KNN supervised machine learning are highly mentioned. The algorithms are applied to the dataset taken from the UCI repository including 294 samples. The dataset includes heart disease features. To enhance the algorithm performance, these features are analyzed, the feature importance scores and cross validation are considered.Results: The algorithm performance is compared with each other, so that performance based on ROC curve and some criteria such as accuracy, precision, sensitivity and F1 score were evaluated for each model. As a result of evaluation, Accuracy, AUC ROC are 83% and 99% respectively for Decision Tree algorithm. Logistic Regression algorithm with accuracy and AUC ROC are 88% and 91% respectively has better performance than other algorithms. Therefore, these techniques can be useful for physicians to predict heart disease patients and prescribe them correctly.Conclusion: Machine learning technique can be used in medicine for analyzing the related data collections to a disease and its prediction. The area under the ROC curve and evaluating criteria related to a number of classifying algorithms of machine learning to evaluate heart disease and indeed, the prediction of heart disease is compared to determine the most appropriate classification. As a result of evaluation, better performance was observed in both Decision Tree and Logistic Regression models.

Download Full-text

Machine Learning Techniques Applied to Profile Mobile Banking Users in India

International Journal of Information Systems in the Service Sector ◽

10.4018/jisss.2013010105 ◽

2013 ◽

Vol 5 (1) ◽

pp. 82-92 ◽

Cited By ~ 8

Author(s):

M. Carr ◽

V. Ravi ◽

G. Sridharan Reddy ◽

D. Veranna

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Decision Tree ◽

Decision Trees ◽

Multilayer Perceptron ◽

Machine Learning Techniques ◽

Mobile Banking ◽

Classification Rules ◽

Learning Techniques ◽

Potential Customers

This paper profiles mobile banking users using machine learning techniques viz. Decision Tree, Logistic Regression, Multilayer Perceptron, and SVM to test a research model with fourteen independent variables and a dependent variable (adoption). A survey was conducted and the results were analysed using these techniques. Using Decision Trees the profile of the mobile banking adopter’s profile was identified. Comparing different machine learning techniques it was found that Decision Trees outperformed the Logistic Regression and Multilayer Perceptron and SVM. Out of all the techniques, Decision Tree is recommended for profiling studies because apart from obtaining high accurate results, it also yields ‘if–then’ classification rules. The classification rules provided here can be used to target potential customers to adopt mobile banking by offering them appropriate incentives.

Download Full-text

Comparison of Support Vector Machine, Bayesian Logistic Regression, and Alternating Decision Tree Algorithms for Shallow Landslide Susceptibility Mapping along a Mountainous Road in the West of Iran

Applied Sciences ◽

10.3390/app10155047 ◽

2020 ◽

Vol 10 (15) ◽

pp. 5047 ◽

Cited By ~ 7

Author(s):

Viet-Ha Nhu ◽

Danesh Zandi ◽

Himan Shahabi ◽

Kamran Chapi ◽

Ataollah Shirzadi ◽

...

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Logistic Regression ◽

Decision Tree ◽

Shallow Landslide ◽

Machine Learning Algorithms ◽

Support Vector ◽

Svm Algorithm ◽

Alternating Decision Tree ◽

Bayesian Logistic Regression

This paper aims to apply and compare the performance of the three machine learning algorithms–support vector machine (SVM), bayesian logistic regression (BLR), and alternating decision tree (ADTree)–to map landslide susceptibility along the mountainous road of the Salavat Abad saddle, Kurdistan province, Iran. We identified 66 shallow landslide locations, based on field surveys, by recording the locations of the landslides by a global position System (GPS), Google Earth imagery and black-and-white aerial photographs (scale 1: 20,000) and 19 landslide conditioning factors, then tested these factors using the information gain ratio (IGR) technique. We checked the validity of the models using statistical metrics, including sensitivity, specificity, accuracy, kappa, root mean square error (RMSE), and area under the receiver operating characteristic curve (AUC). We found that, although all three machine learning algorithms yielded excellent performance, the SVM algorithm (AUC = 0.984) slightly outperformed the BLR (AUC = 0.980), and ADTree (AUC = 0.977) algorithms. We observed that not only all three algorithms are useful and effective tools for identifying shallow landslide-prone areas but also the BLR algorithm can be used such as the SVM algorithm as a soft computing benchmark algorithm to check the performance of the models in future.

Download Full-text

Performance Evaluation of Machine Learning Techniques for Identifying Forged and Phony Uniform Resource Locators (URLs)

Nigerian Journal of Technological Development ◽

10.4314/njtd.v16i4.2 ◽

2019 ◽

Vol 16 (4) ◽

pp. 155-169

Author(s):

N. A. Azeez ◽

A. A. Ajayi

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Financial Institutions ◽

Traditional Approach ◽

Learning Algorithms ◽

Performance Comparison ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Reliable Solution ◽

F Measure

Since the invention of Information and Communication Technology (ICT), there has been a great shift from the erstwhile traditional approach of handling information across the globe to the usage of this innovation. The application of this initiative cut across almost all areas of human endeavours. ICT is widely utilized in education and production sectors as well as in various financial institutions. It is of note that many people are using it genuinely to carry out their day to day activities while others are using it to perform nefarious activities at the detriment of other cyber users. According to several reports which are discussed in the introductory part of this work, millions of people have become victims of fake Uniform Resource Locators (URLs) sent to their mails by spammers. Financial institutions are not left out in the monumental loss recorded through this illicit act over the years. It is worth mentioning that, despite several approaches currently in place, none could confidently be confirmed to provide the best and reliable solution. According to several research findings reported in the literature, researchers have demonstrated how machine learning algorithms could be employed to verify and confirm compromised and fake URLs in the cyberspace. Inconsistencies have however been noticed in the researchers’ findings and also their corresponding results are not dependable based on the values obtained and conclusions drawn from them. Against this backdrop, the authors carried out a comparative analysis of three learning algorithms (Naïve Bayes, Decision Tree and Logistics Regression Model) for verification of compromised, suspicious and fake URLs and determine which is the best of all based on the metrics (F-Measure, Precision and Recall) used for evaluation. Based on the confusion metrics measurement, the result obtained shows that the Decision Tree (ID3) algorithm achieves the highest values for recall, precision and f-measure. It unarguably provides efficient and credible means of maximizing the detection of compromised and malicious URLs. Finally, for future work, authors are of the opinion that two or more supervised learning algorithms can be hybridized to form a single effective and more efficient algorithm for fake URLs verification.Keywords: Learning-algorithms, Forged-URL, Phoney-URL, performance-comparison

Download Full-text

Predicting Forest Fires using Supervised and Ensemble Machine Learning Algorithms

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b2878.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 3697-3705 ◽

Cited By ~ 1

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Forest Fires ◽

Principal Component ◽

Climatic Conditions ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Physical Factors

Forest fires have become one of the most frequently occurring disasters in recent years. The effects of forest fires have a lasting impact on the environment as it lead to deforestation and global warming, which is also one of its major cause of occurrence. Forest fires are dealt by collecting the satellite images of forest and if there is any emergency caused by the fires then the authorities are notified to mitigate its effects. By the time the authorities get to know about it, the fires would have already caused a lot of damage. Data mining and machine learning techniques can provide an efficient prevention approach where data associated with forests can be used for predicting the eventuality of forest fires. This paper uses the dataset present in the UCI machine learning repository which consists of physical factors and climatic conditions of the Montesinho park situated in Portugal. Various algorithms like Logistic regression, Support Vector Machine, Random forest, K-Nearest neighbors in addition to Bagging and Boosting predictors are used, both with and without Principal Component Analysis (PCA). Among the models in which PCA was applied, Logistic Regression gave the highest F-1 score of 68.26 and among the models where PCA was absent, Gradient boosting gave the highest score of 68.36.

Download Full-text

Decision Tree Analysis: A Retrospective Analysis of Postoperative Recurrence of Adhesions in Patients with Moderate-to-Severe Intrauterine

BioMed Research International ◽

10.1155/2019/7391965 ◽

2019 ◽

Vol 2019 ◽

pp. 1-8 ◽

Cited By ~ 1

Author(s):

Ru Zhu ◽

Hua Duan ◽

Sha Wang ◽

Lu Gan ◽

Qian Xu ◽

...

Keyword(s):

Logistic Regression ◽

Decision Tree ◽

Endometrial Thickness ◽

Characteristic Curve ◽

Multivariate Logistic Regression Analysis ◽

Decision Tree Model ◽

Classification And Regression Tree ◽

Multivariate Logistic Regression Model ◽

Multivariate Logistic Regression ◽

Tree Model

Objective. To establish and validate a decision tree model to predict the recurrence of intrauterine adhesions (IUAs) in patients after separation of moderate-to-severe IUAs. Design. A retrospective study. Setting. A tertiary hysteroscopic center at a teaching hospital. Population. Patients were retrospectively selected who had undergone hysteroscopic adhesion separation surgery for treatment of moderate-to-severe IUAs. Interventions. Hysteroscopic adhesion separation surgery and second-look hysteroscopy 3 months later. Measurements and Main Results. Patients’ demographics, clinical indicators, and hysteroscopy data were collected from the electronic database of the hospital. The patients were randomly apportioned to either a training or testing set (332 and 142 patients, respectively). A decision tree model of adhesion recurrence was established with a classification and regression tree algorithm and validated with reference to a multivariate logistic regression model. The decision tree model was constructed based on the training set. The classification node variables were the risk factors for recurrence of IUAs: American Fertility Society score (root node variable), isolation barrier, endometrial thickness, tubal opening, uterine volume, and menstrual volume. The accuracies of the decision tree model and multivariate logistic regression analysis model were 75.35% and 76.06%, respectively, and areas under the receiver operating characteristic curve were 0.763 (95% CI 0.681–0.846) and 0.785 (95% CI 0.702–0.868). Conclusions. The decision tree model can readily predict the recurrence of IUAs and provides a new theoretical basis upon which clinicians can make appropriate clinical decisions.

Download Full-text

Spatial–Temporal Analysis of Land Cover Change at the Bento Rodrigues Dam Disaster Area Using Machine Learning Techniques

Remote Sensing ◽

10.3390/rs11212548 ◽

2019 ◽

Vol 11 (21) ◽

pp. 2548

Author(s):

Dong Luo ◽

Douglas G. Goodin ◽

Marcellus M. Caldas

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Land Cover ◽

Decision Tree ◽

Machine Learning Algorithms ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Disaster Area ◽

Mine Sites

Disasters are an unpredictable way to change land use and land cover. Improving the accuracy of mapping a disaster area at different time is an essential step to analyze the relationship between human activity and environment. The goals of this study were to test the performance of different processing procedures and examine the effect of adding normalized difference vegetation index (NDVI) as an additional classification feature for mapping land cover changes due to a disaster. Using Landsat ETM+ and OLI images of the Bento Rodrigues mine tailing disaster area, we created two datasets, one with six bands, and the other one with six bands plus the NDVI. We used support vector machine (SVM) and decision tree (DT) algorithms to build classifier models and validated models performance using 10-fold cross-validation, resulting in accuracies higher than 90%. The processed results indicated that the accuracy could reach or exceed 80%, and the support vector machine had a better performance than the decision tree. We also calculated each land cover type’s sensitivity (true positive rate) and found that Agriculture, Forest and Mine sites had higher values but Bareland and Water had lower values. Then, we visualized land cover maps in 2000 and 2017 and found out the Mine sites areas have been expanded about twice of the size, but Forest decreased 12.43%. Our findings showed that it is feasible to create a training data pool and use machine learning algorithms to classify a different year’s Landsat products and NDVI can improve the vegetation covered land classification. Furthermore, this approach can provide a venue to analyze land pattern change in a disaster area over time.

Download Full-text

Decision Tree Ensembles to Predict Coronavirus Disease 2019 Infection: A Comparative Study

Complexity ◽

10.1155/2021/5550344 ◽

2021 ◽

Vol 2021 ◽

pp. 1-8

Author(s):

Amir Ahmad ◽

Ourooj Safi ◽

Sharaf Malebary ◽

Sami Alesawi ◽

Entisar Alkayal

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Characteristic Curve ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Polymerase Chain Reaction Test ◽

Imbalanced Datasets ◽

Polymerase Chain ◽

Chain Reaction Test ◽

Selection Of

The coronavirus disease 2019 (Covid-19) pandemic has affected most countries of the world. The detection of Covid-19 positive cases is an important step to fight the pandemic and save human lives. The polymerase chain reaction test is the most used method to detect Covid-19 positive cases. Various molecular methods and serological methods have also been explored to detect Covid-19 positive cases. Machine learning algorithms have been applied to various kinds of datasets to predict Covid-19 positive cases. The machine learning algorithms were applied on a Covid-19 dataset based on commonly taken laboratory tests to predict Covid-19 positive cases. These types of datasets are easy to collect. The paper investigates the application of decision tree ensembles which are accurate and robust to the selection of parameters. As there is an imbalance between the number of positive cases and the number of negative cases, decision tree ensembles developed for imbalanced datasets are applied. F-measure, precision, recall, area under the precision-recall curve, and area under the receiver operating characteristic curve are used to compare different decision tree ensembles. Different performance measures suggest that decision tree ensembles developed for imbalanced datasets perform better. Results also suggest that including age as a variable can improve the performance of various ensembles of decision trees.

Download Full-text

Statistical Analysis for Selective Identifications of VOCs by Using Surface Functionalized MoS2 Based Sensor Array

Chemistry Proceedings ◽

10.3390/csac2021-10451 ◽

2021 ◽

Vol 5 (1) ◽

pp. 35

Author(s):

Uttam Narendra Thakur ◽

Radha Bhardwaj ◽

Arnab Hazra

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Decision Tree ◽

Sensor Array ◽

Multinomial Logistic Regression ◽

Learning Algorithms ◽

Disease Diagnosis ◽

Machine Learning Algorithms ◽

Human Breath

Disease diagnosis through breath analysis has attracted significant attention in recent years due to its noninvasive nature, rapid testing ability, and applicability for patients of all ages. More than 1000 volatile organic components (VOCs) exist in human breath, but only selected VOCs are associated with specific diseases. Selective identification of those disease marker VOCs using an array of multiple sensors are highly desirable in the current scenario. The use of efficient sensors and the use of suitable classification algorithms is essential for the selective and reliable detection of those disease markers in complex breath. In the current study, we fabricated a noble metal (Au, Pd and Pt) nanoparticle-functionalized MoS2 (Chalcogenides, Sigma Aldrich, St. Louis, MO, USA)-based sensor array for the selective identification of different VOCs. Four sensors, i.e., pure MoS2, Au/MoS2, Pd/MoS2, and Pt/MoS2 were tested under exposure to different VOCs, such as acetone, benzene, ethanol, xylene, 2-propenol, methanol and toluene, at 50 °C. Initially, principal component analysis (PCA) and linear discriminant analysis (LDA) were used to discriminate those seven VOCs. As compared to the PCA, LDA was able to discriminate well between the seven VOCs. Four different machine learning algorithms such as k-nearest neighbors (kNN), decision tree, random forest, and multinomial logistic regression were used to further identify those VOCs. The classification accuracy of those seven VOCs using KNN, decision tree, random forest, and multinomial logistic regression was 97.14%, 92.43%, 84.1%, and 98.97%, respectively. These results authenticated that multinomial logistic regression performed best between the four machine learning algorithms to discriminate and differentiate the multiple VOCs that generally exist in human breath.

Download Full-text