Identifying the Main Risk Factors for CVD Prediction Using Machine Learning Algorithms

Mapping Intimacies ◽

10.20944/preprints202108.0471.v1 ◽

2021 ◽

Author(s):

Luis Rolando Guarneros-Nolasco ◽

Nancy Aracely Cruz-Ramos ◽

Giner Alor-Hernández ◽

Lisbeth Rodríguez-Mazahua ◽

José Luis Sánchez-Cervantes

Keyword(s):

Machine Learning ◽

Cross Validation ◽

Performance Metrics ◽

Learning Algorithms ◽

Predictive Performance ◽

Machine Learning Algorithms ◽

Algorithm Performance ◽

Body Regions ◽

Risk Factors ◽

Fold Cross Validation

Cardiovascular diseases (CVDs) are a leading cause of death globally. In CVDs, the heart is unable to deliver enough blood to other body regions. Since effective and accurate diagnosis of CVDs is essential for their prevention and treatment, machine learning (ML) techniques can be used effectively and reliably to distinguish patients suffering from a CVD from those who do not suffer from any heart condition. Namely, machine learning algorithms (MLAs) play a key role in the diagnosis of CVDs through predictive models that allow us to identify the main risk factors influencing CVD development. In this study, we analyze the performance of ten MLAs on two datasets for CVD prediction and two for CVD diagnosis. Algorithm performance is analyzed on the top-two and top-four dataset attributes/features with respect to five performance metrics (accuracy, precision, recall, F1-score, and ROC-AUC) using the train-test split technique and k-fold cross-validation. Our study identifies the top-two and top-four attributes of each CVD diagnosis/prediction dataset. As our main findings, the ten MLAs exhibited appropriate diagnostic and predictive performance; hence, they can be successfully implemented to improve current CVD diagnosis efforts and help patients around the world, especially in regions where medical staff is lacking.
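
The abstract does not say how the top-two and top-four attributes are chosen. As a hypothetical illustration only, the sketch below assumes attributes are ranked by the absolute Pearson correlation between each attribute and a binary CVD label (1 = CVD, 0 = no CVD), and keeps the k highest. The ranking rule, the attribute names (`attr_a` to `attr_d`), and the values are invented for the example; the study may use a different selection method.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 when either variance is exactly zero."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0


def top_k_attributes(columns: Dict[str, Sequence[float]], labels: Sequence[int], k: int) -> List[str]:
    """Hypothetical ranking rule: keep the k attributes with the largest |r| against the label."""
    scores = {name: abs(pearson(values, labels)) for name, values in columns.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]


# Invented toy data (1 = CVD, 0 = no CVD); these are not the study's attributes.
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "attr_a": [1, 2, 1, 8, 9, 8],
    "attr_b": [5, 5, 6, 5, 6, 5],
    "attr_c": [9, 8, 9, 1, 2, 1],
    "attr_d": [3, 1, 2, 2, 3, 1],
}
print(top_k_attributes(columns, labels, 2))  # top-two subset
print(top_k_attributes(columns, labels, 4))  # top-four subset (all four here)
```

Correlation is only one possible filter; a wrapper or model-based importance ranking would be an equally plausible reading of "top attributes".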

Identifying the Main Risk Factors for Cardiovascular Diseases Prediction Using Machine Learning Algorithms

Mathematics ◽

10.3390/math9202537 ◽

2021 ◽

Vol 9 (20) ◽

pp. 2537

Author(s):

Luis Rolando Guarneros-Nolasco ◽

Nancy Aracely Cruz-Ramos ◽

Giner Alor-Hernández ◽

Lisbeth Rodríguez-Mazahua ◽

José Luis Sánchez-Cervantes

Keyword(s):

Machine Learning ◽

Cardiovascular Diseases ◽

Performance Metrics ◽

Learning Algorithms ◽

Predictive Performance ◽

Machine Learning Algorithms ◽

Algorithm Performance ◽

Body Regions ◽

Risk Factors ◽

Fold Cross Validation

Cardiovascular Diseases (CVDs) are a leading cause of death globally. In CVDs, the heart is unable to deliver enough blood to other body regions. As an effective and accurate diagnosis of CVDs is essential for CVD prevention and treatment, machine learning (ML) techniques can be used effectively and reliably to distinguish patients suffering from a CVD from those who do not suffer from any heart condition. Namely, machine learning algorithms (MLAs) play a key role in the diagnosis of CVDs through predictive models that allow us to identify the main risk factors influencing CVD development. In this study, we analyze the performance of ten MLAs on two datasets for CVD prediction and two for CVD diagnosis. Algorithm performance is analyzed on the top-two and top-four dataset attributes/features with respect to five performance metrics (accuracy, precision, recall, F1-score, and ROC-AUC) using the train-test split technique and k-fold cross-validation. Our study identifies the top-two and top-four attributes of the CVD datasets and uses the accuracy metric to determine that these attributes are the best for predicting and diagnosing CVDs. As our main findings, the ten ML classifiers exhibited appropriate diagnostic and predictive performance in terms of accuracy with the top-two attributes, identifying three main attributes for the diagnosis and prediction of CVDs such as arrhythmia and tachycardia; hence, they can be successfully implemented to improve current CVD diagnosis efforts and help patients around the world, especially in regions where medical staff is lacking.
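
The five metrics named in the abstract can be computed from a binary confusion matrix plus, for ROC-AUC, the classifier's scores. The sketch below is a minimal, self-contained version assuming labels 1 = CVD and 0 = no CVD and a decision threshold of 0.5; the labels, scores, and threshold are hypothetical, and this is not the authors' code.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = CVD, 0 = no CVD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win; this pairwise form equals the area under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and classifier scores, thresholded at 0.5 for the label-based metrics.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(classification_metrics(y_true, y_pred), roc_auc(y_true, scores))
```

The pairwise AUC is quadratic in the number of samples, which is fine for illustration; a sort-based rank formula scales better on large datasets.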

P031: Using machine learning algorithms for predicting future performance of emergency medicine residents

CJEM ◽

10.1017/cem.2017.233 ◽

2017 ◽

Vol 19 (S1) ◽

pp. S88

Author(s):

A. Ariaeinejad ◽

R. Patel ◽

T.M. Chan ◽

R. Samavi

Keyword(s):

Neural Network ◽

Machine Learning ◽

Medical Education ◽

Cross Validation ◽

Predictive Analytics ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Resident Performance ◽

Competency Based ◽

Fold Cross Validation

Introduction: Medical education is transitioning from a time-based system to a competency-based framework. In the age of Competency-Based Medical Education, however, there is a drastically increased amount of data that needs to be interpreted. With this data comes an opportunity to develop predictive analytics. Machine learning is a method of data analysis that automates analytical model building. Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look. Machine learning has been used successfully in other fields to create predictive models. Objective: This study evaluates the application of a neural network as a machine learning algorithm for learning from historical data in an emergency medicine residency program and predicting future resident performance. Methods: We analyzed performance data for 16 residents (PGY1-5) who were assessed at the end of each shift. Performance was graded in each of the CanMEDS Roles with scores from 1 to 7 by different attending physicians who observed the residents during the shift. We transformed each resident's sequence of scores into a fixed set of features and combined them all into one dataset. We labeled scores under 6 as “At Risk Resident” and scores of 6 or more as “Competent Resident”, separated the dataset into training and testing sets using K-fold cross-validation, and trained an artificial neural network to make decisions about the future status of residents in a specific CanMEDS Role and in general performance. Results: We used 5-fold cross-validation to evaluate the model. One round of cross-validation involves partitioning the whole dataset into complementary subsets, performing the training phase on the training set, and validating the analysis on the testing set. To reduce variability, multiple rounds of cross-validation are performed using different partitions, and the validation results are averaged over the rounds. Cross-validation showed that the model's accuracy was 72%, its sensitivity was 81%, and its specificity was 43%. Conclusion: Machine learning algorithms (such as neural networks) have the ability to predict future resident performance on a global level and within specific domains (i.e., CanMEDS roles). Used appropriately, such information may be valuable for monitoring resident progress.
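
The labeling rule and the repeated cross-validation procedure described above can be sketched as follows. This is a hedged illustration, not the study's pipeline: the neural network is left out, `evaluate` is a hypothetical callback that would train a model on the `train` indices and return a score on the `test` indices, and treating "At Risk Resident" as the positive class for sensitivity and specificity is an assumption, since the abstract does not say which class is positive.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple

AT_RISK, COMPETENT = "At Risk Resident", "Competent Resident"


def label_resident(score: float) -> str:
    """Threshold rule from the abstract: scores under 6 are 'At Risk', 6 or more 'Competent'."""
    return AT_RISK if score < 6 else COMPETENT


def kfold_indices(n: int, k: int, rng: random.Random) -> Iterator[Tuple[List[int], List[int]]]:
    """One round of k-fold CV: shuffle 0..n-1, cut into k complementary folds, yield (train, test)."""
    order = list(range(n))
    rng.shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def repeated_cv(n: int, k: int, rounds: int,
                evaluate: Callable[[List[int], List[int]], float], seed: int = 0) -> float:
    """Average evaluate(train, test) over every fold of `rounds` differently shuffled partitions."""
    rng = random.Random(seed)
    results = [evaluate(train, test)
               for _ in range(rounds)
               for train, test in kfold_indices(n, k, rng)]
    return sum(results) / len(results)


def sensitivity_specificity(y_true: Sequence[str], y_pred: Sequence[str],
                            positive: str = AT_RISK) -> Tuple[float, float]:
    """Sensitivity and specificity, treating `positive` (assumed: 'At Risk') as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity
```

For example, `repeated_cv(16, 5, 10, evaluate)` would average a per-fold score over ten differently shuffled 5-fold partitions of the 16 residents; the abstract does not state how many rounds were used, so 10 is only a placeholder.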

Prediction of K562 Cells Functional Inhibitors Based on Machine Learning Approaches

Current Pharmaceutical Design ◽

10.2174/1381612825666191107092214 ◽

2020 ◽

Vol 25 (40) ◽

pp. 4296-4302 ◽

Cited By ~ 2

Author(s):

Yuan Zhang ◽

Zhenyan Han ◽

Qian Gao ◽

Xiaoyi Bai ◽

Chi Zhang ◽

...

Keyword(s):

Machine Learning ◽

Inclusion Bodies ◽

Cross Validation ◽

Independent Set ◽

K562 Cells ◽

Machine Learning Algorithms ◽

Learning Approaches ◽

Validation Test ◽

Excess Number ◽

Fold Cross Validation

Background: β-thalassemia is a common monogenic genetic disease that is very harmful to human health. The disease arises from the deletion of, or defects in, the β-globin gene, which reduce synthesis of the β-globin chain and result in a relative excess of α-chains. The formation of inclusion bodies deposited on the cell membrane reduces the ability of red blood cells to deform, leading to a group of hereditary haemolytic diseases caused by massive destruction of red blood cells in the spleen. Methods: In this work, machine learning algorithms were employed to build a prediction model for inhibitors against K562 cells based on 117 inhibitors and 190 non-inhibitors. Results: The overall accuracies (ACC) of a 10-fold cross-validation test and an independent set test using AdaBoost were 83.1% and 78.0%, respectively, surpassing Bayes Net, Random Forest, Random Tree, C4.5, SVM, KNN, and Bagging. Conclusion: This study indicates that AdaBoost could be applied to build a learning model for predicting inhibitors against K562 cells.
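
The abstract names AdaBoost but not its base learner, number of rounds, or molecular features. The sketch below is a minimal discrete AdaBoost with decision stumps, assuming labels +1 (inhibitor) and -1 (non-inhibitor) and a hypothetical default of 10 boosting rounds; it shows how sample weights shift toward misclassified compounds, and it is not the implementation used in the study.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    """Decision stump: predict `polarity` when x[feature] <= threshold, else the opposite label."""
    feature, threshold, polarity = stump
    return polarity if x[feature] <= threshold else -polarity


def best_stump(X: Sequence[Sequence[float]], y: Sequence[int], w: Sequence[float]) -> Tuple[Stump, float]:
    """Exhaustive search for the stump with the lowest weighted training error."""
    best: Stump = (0, X[0][0], 1)
    best_err = float("inf")
    for f in range(len(X[0])):
        for threshold in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                stump = (f, threshold, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, w) if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost_fit(X: Sequence[Sequence[float]], y: Sequence[int],
                 rounds: int = 10) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost; labels must be +1 (assumed: inhibitor) or -1 (non-inhibitor)."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    model: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        perfect = err == 0
        err = min(max(err, 1e-10), 1 - 1e-10)  # clamp so the log stays finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        model.append((alpha, stump))
        if perfect:
            break  # training data separated; more rounds add nothing
        # Raise weights of misclassified samples, lower the rest, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def adaboost_predict(model: List[Tuple[float, Stump]], x: Sequence[float]) -> int:
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in model)
    return 1 if score >= 0 else -1
```

Stumps are the conventional weak learner for AdaBoost; the exhaustive threshold search is simple but slow on high-dimensional fingerprint features.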

A machine learning-based predictor for the identification of the recurrence of patients with gastric cancer after operation

Scientific Reports ◽

10.1038/s41598-021-81188-6 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Chengmao Zhou ◽

Junhong Hu ◽

Ying Wang ◽

Mu-Huo Ji ◽

Jianhua Tong ◽

...

Keyword(s):

Machine Learning ◽

Gastric Cancer ◽

Learning Algorithms ◽

Test Group ◽

Operation Time ◽

Predictive Performance ◽

Original Data ◽

Postoperative Recurrence ◽

Machine Learning Algorithms ◽

Gastric Cancer Patients

To explore the predictive performance of machine learning for the recurrence of gastric cancer in patients after an operation, the available data were divided into two parts: the first part was used as a training set (e.g., 80% of the original data), and the second part as a test set (the remaining 20% of the data). Fivefold cross-validation was also used. The weights of the recurrence factors show that the top four factors are, in order, BMI, operation time, WGT, and age. In the training group, among the five machine learning models, the accuracy of gbm was 0.891, followed by the gbm algorithm at 0.876. The AUC values of the five machine learning algorithms, from high to low, are forest (0.962), gbm (0.922), GradientBoosting (0.898), DecisionTree (0.790), and Logistic (0.748). The precision of the forest is the highest at 0.957, followed by the GradientBoosting algorithm (0.878). In the test group, Logistic had the highest accuracy (0.801), followed by the forest algorithm and gbm; the AUC values of the five algorithms, from high to low, are forest (0.795), GradientBoosting (0.774), DecisionTree (0.773), Logistic (0.771), and gbm (0.771). Among the five machine learning algorithms, Logistic has the highest precision (1.000), followed by gbm (0.487). Machine learning can predict the recurrence of gastric cancer in patients after an operation. Moreover, the four leading factors affecting postoperative recurrence of gastric cancer were BMI, operation time, WGT, and age.
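
The abstract combines an 80/20 split with fivefold cross-validation but does not say where the cross-validation is applied. A common arrangement, assumed in the sketch below, is to hold out 20% of patients as the test group and run fivefold cross-validation inside the remaining 80% training group. The cohort size of 100 and the random seed are hypothetical.

```python
import random
from typing import List, Tuple


def holdout_split(n: int, test_fraction: float = 0.2, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle row indices 0..n-1 and hold out `test_fraction` of them as the test group."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    n_test = round(n * test_fraction)
    return order[n_test:], order[:n_test]  # (training group, test group)


def kfold_pairs(train: List[int], k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split the training indices into k folds; each pair is (fit indices, validation indices)."""
    folds = [train[i::k] for i in range(k)]
    return [([idx for j, fold in enumerate(folds) if j != i for idx in fold], folds[i])
            for i in range(k)]


train, test = holdout_split(100)  # hypothetical cohort of 100 patients
print(len(train), len(test), [len(val) for _, val in kfold_pairs(train)])  # → 80 20 [16, 16, 16, 16, 16]
```

Keeping the test group untouched until the end means the test-group metrics are not used for model selection, which is one reason training-group and test-group figures can differ as much as they do in the abstract.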

Evaluating explorative prediction power of machine learning algorithms for materials discovery using k-fold forward cross-validation

Computational Materials Science ◽

10.1016/j.commatsci.2019.109203 ◽

2020 ◽

Vol 171 ◽

pp. 109203 ◽

Cited By ~ 26

Author(s):

Zheng Xiong ◽

Yuxin Cui ◽

Zhonghao Liu ◽

Yong Zhao ◽

Ming Hu ◽

...

Keyword(s):

Machine Learning ◽

Cross Validation ◽

Learning Algorithms ◽

Machine Learning Algorithms

Comparison of Machine Learning Algorithms in the Interpolation and Extrapolation of Flame Describing Functions

Volume 4B: Combustion, Fuels, and Emissions ◽

10.1115/gt2019-91319 ◽

2019 ◽

Author(s):

Michael McCartney ◽

Matthias Haeringer ◽

Wolfgang Polifke

Keyword(s):

Machine Learning ◽

Gaussian Processes ◽

Spline Interpolation ◽

Learning Algorithms ◽

Predictive Performance ◽

Machine Learning Algorithms ◽

Test Time ◽

Minimal Amount ◽

Data Points ◽

The Impact

This paper examines and compares commonly used machine learning algorithms in terms of their performance in the interpolation and extrapolation of flame describing functions (FDFs), based on experimental and simulation data. Algorithm performance is evaluated by interpolating and extrapolating FDFs, and the impact of the errors on the limit cycle amplitudes is then evaluated using the xFDF framework. The best algorithms for interpolation and extrapolation were found to be the widely used cubic spline interpolation and the Gaussian Processes regressor. The data itself was found to be an important factor in defining the predictive performance of a model; therefore, a method of optimally selecting data points at test time using Gaussian Processes was demonstrated. The aim is to allow a minimal number of data points to be collected while still providing enough information to model the FDF accurately. Extrapolation performance was shown to decay very quickly with distance from the domain, so emphasis should be placed on selecting measurement points that expand the covered domain. Gaussian Processes also give an indication of confidence in their predictions and are used to carry out uncertainty quantification in order to understand model sensitivities. This was demonstrated through application to the xFDF framework.
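
The confidence estimate that Gaussian Processes provide comes from the posterior variance. The sketch below is a standard zero-mean GP regression posterior, written in plain Python for a one-dimensional input as a simplification (a real FDF depends on both frequency and amplitude). The squared-exponential kernel, length scale, and noise level are hypothetical choices with no hyperparameter fitting, and this is not the paper's implementation. It illustrates the abstract's point that predictive variance grows as the query point moves away from the measured data.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def rbf(a: float, b: float, length: float) -> float:
    """Squared-exponential kernel with unit signal variance (hypothetical choice)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A: Matrix) -> Matrix:
    """Lower-triangular L with L @ L^T == A, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L: Matrix, b: Sequence[float]) -> List[float]:
    """Forward substitution for L z = b."""
    z: List[float] = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i])
    return z


def solve_upper_t(L: Matrix, z: Sequence[float]) -> List[float]:
    """Back substitution for L^T x = z."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[m][i] * x[m] for m in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(xs: Sequence[float], ys: Sequence[float], x_new: float,
               length: float = 1.0, noise: float = 1e-6) -> Tuple[float, float]:
    """Posterior mean and variance of a zero-mean GP at x_new."""
    K = [[rbf(a, b, length) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, ys))  # alpha = K^{-1} y
    k_star = [rbf(x_new, a, length) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = rbf(x_new, x_new, length) - sum(vi * vi for vi in v)  # k** - k*^T K^{-1} k*
    return mean, max(var, 0.0)
```

Querying `gp_predict` at a measured point returns a near-zero variance, while a point far outside the sampled range returns the prior variance of about 1; a point-selection strategy can exploit this by measuring where the variance is largest.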

Evaluation and Identification of the Neuroprotective Compounds of Xiaoxuming Decoction by Machine Learning: A Novel Mode to Explore the Combination Rules in Traditional Chinese Medicine Prescription

BioMed Research International ◽

10.1155/2019/6847685 ◽

2019 ◽

Vol 2019 ◽

pp. 1-14

Author(s):

Shilun Yang ◽

Yanjia Shen ◽

Wendan Lu ◽

Yinglin Yang ◽

Haigang Wang ◽

...

Keyword(s):

Machine Learning ◽

Chinese Medicine ◽

Traditional Chinese Medicine ◽

Cross Validation ◽

Bayesian Models ◽

Machine Learning Algorithms ◽

Therapeutic Effects ◽

Test Set ◽

Screening Experiments ◽

Fold Cross Validation

Xiaoxuming decoction (XXMD), a classic traditional Chinese medicine (TCM) prescription, has been used to treat stroke in clinical practice for over 1200 years. However, the pharmacological mechanisms of XXMD have not yet been elucidated. The purpose of this study was to develop neuroprotective models for identifying neuroprotective compounds in XXMD against hypoxia-induced and H2O2-induced brain cell damage. In this study, a phenotype-based classification method was designed using machine learning to identify neuroprotective compounds and to clarify the compatibility of XXMD components. Four different single classifiers (AB, kNN, CT, and RF) and molecular fingerprint descriptors were used to construct stacked naïve Bayesian models. Among them, the RF algorithm performed best, with average Matthews correlation coefficient (MCC) values of 0.725±0.014 and 0.774±0.042 for 5-fold cross-validation and the test set, respectively. The probability values calculated by the four models were then integrated into a stacked Bayesian model. In total, two optimal models, s-NB-1-LPFP6 and s-NB-2-LPFP6, were obtained. The two validated optimal models achieved MCCs of 0.968 and 0.993 for 5-fold cross-validation and of 0.874 and 0.959 for the test set, respectively. Furthermore, the two models were used in virtual screening experiments to identify neuroprotective compounds in XXMD. Ten representative compounds with potential therapeutic effects against the two phenotypes were selected for further cell-based assays. Among the selected compounds, two significantly inhibited both H2O2-induced and Na2S2O4-induced neurotoxicity. Together, our findings suggest that machine learning algorithms such as combined Bayesian models are feasible for predicting neuroprotective compounds and for preliminarily demonstrating the pharmacological mechanisms of TCM.
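
The results above are reported as Matthews correlation coefficients, in the form mean ± spread across cross-validation runs. The sketch below computes MCC from confusion-matrix counts and summarises a list of per-run values; reporting the sample standard deviation is an assumption, since the abstract does not say which spread statistic was used, and the per-run values in the usage line are invented.

```python
import math
import statistics
from typing import Sequence, Tuple


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts; 0.0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mean_and_sd(run_mccs: Sequence[float]) -> Tuple[float, float]:
    """Summarise per-run MCCs as mean ± sample standard deviation (an assumed convention)."""
    return statistics.mean(run_mccs), statistics.stdev(run_mccs)


print(mean_and_sd([mcc(40, 45, 5, 10), mcc(42, 44, 6, 8), mcc(39, 46, 4, 11)]))  # invented counts
```

MCC ranges from -1 (total disagreement) through 0 (chance level) to 1 (perfect prediction), and it stays informative when the active and inactive classes are imbalanced, which is common in compound screening.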

Haar Wavelet Pyramid-Based Melanoma Skin Cancer Identification With Ensemble of Machine Learning Algorithms

International Journal of Healthcare Information Systems and Informatics ◽

10.4018/ijhisi.20211001.oa24 ◽

2021 ◽

Vol 16 (4) ◽

pp. 1-15

Author(s):

Sudeep D. Thepade ◽

Gaurav Ramnani

Keyword(s):

Machine Learning ◽

Skin Cancer ◽

Health Informatics ◽

Performance Metrics ◽

Learning Algorithms ◽

Haar Wavelet ◽

Machine Learning Algorithms ◽

Computer Assisted ◽

Marginal Improvement ◽

Wavelet Pyramid

Melanoma is a deadly type of skin cancer. Early detection of melanoma significantly improves the patient’s chances of survival. Detecting melanoma at an early stage requires expert doctors, and the scarcity of such experts is a major issue for healthcare systems globally. Computer-assisted diagnostics may prove helpful in this case. This paper proposes a health informatics system for melanoma identification using machine learning on dermoscopy skin images. In the proposed method, features of the dermoscopy skin images are extracted at various levels of the Haar wavelet pyramid. These features are used to train machine learning algorithms and ensembles for melanoma identification. Considering higher levels of the Haar wavelet pyramid helps speed up the identification process. Performance is observed to improve gradually from Haar wavelet pyramid level 4x4 to 16x16, with only marginal improvement beyond that. Ensembles of machine learning algorithms boosted the performance metrics compared with individual machine learning algorithms.
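
A Haar wavelet pyramid level can be obtained by repeatedly keeping the low-pass (LL) band, which for the Haar wavelet is a 2x2 block average up to a constant scale factor per level. The sketch below assumes a single grayscale channel with a square, power-of-two side; the paper's handling of colour channels and normalisation is not described in the abstract, so this is only an approximation of the idea. A 4x4 level yields 16 features and a 16x16 level yields 256, which is why coarser levels make identification faster.

```python
from typing import List

Image = List[List[float]]


def haar_approx(img: Image) -> Image:
    """One Haar analysis step keeping only the low-pass (LL) band, as 2x2 block averages."""
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("image sides must be even at every pyramid level")
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]


def pyramid_features(img: Image, size: int) -> List[float]:
    """Descend the pyramid until the image is size x size (e.g. 4, 8, 16) and flatten it."""
    level = img
    while len(level) > size:
        level = haar_approx(level)
    if len(level) != size or len(level[0]) != size:
        raise ValueError("a square image with a power-of-two side is assumed")
    return [v for row in level for v in row]
```

The flattened vector from, say, `pyramid_features(image, 16)` could then be fed to any classifier or ensemble; the choice of classifiers in the paper is not reproduced here.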

Automated Performance Metrics and Machine Learning Algorithms to Measure Surgeon Performance and Anticipate Clinical Outcomes in Robotic Surgery

JAMA Surgery ◽

10.1001/jamasurg.2018.1512 ◽

2018 ◽

Vol 153 (8) ◽

pp. 770 ◽

Cited By ~ 27

Author(s):

Andrew J. Hung ◽

Jian Chen ◽

Inderbir S. Gill

Keyword(s):

Machine Learning ◽

Robotic Surgery ◽

Clinical Outcomes ◽

Performance Metrics ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Surgeon Performance

Assessment of Machine Learning Algorithms for Prediction of Breast Cancer Malignancy Based on Mammogram Numeric Data

10.1101/2020.01.08.20016949 ◽

2020 ◽

Cited By ~ 1

Author(s):

Peter T. Habib ◽

Alsamman M. Alsamman ◽

Sameh E. Hassnein ◽

Ghada A. Shereif ◽

Aladdin Hamwieh

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Cross Validation ◽

Mean Squared Error ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Adjusted Rand Index ◽

Support Vector ◽

Cancer Information ◽

Term Care

In 2019, there were an estimated 268,600 new cases of breast cancer, which is one of the most common cancers and one of the world’s leading causes of death among women. Classification and data mining are efficient ways to classify information, particularly in the medical field, where prediction techniques are commonly used for early detection and effective treatment in diagnosis and research. This paper tests models for the mammogram analysis of breast cancer information using 23 of the more widely used machine learning algorithms, such as Decision Tree, Random Forest, K-nearest neighbors, and support vector machine. Results are obtained from randomly split, replicated 10-fold cross-validation. Accuracy was calculated with regression metrics such as Mean Absolute Error, Mean Squared Error, and R2 Score, and with clustering metrics such as Adjusted Rand Index, Homogeneity, and V-measure; accuracy was also checked with F-measure, AUC, and cross-validation. Proper identification of patients with breast cancer would thus create care opportunities; for example, supervision and the implementation of intervention plans could benefit the quality of long-term care. Experimental results reveal that the maximum precision of 100%, with the lowest error rate, is obtained with the AdaBoost classifier.
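
Among the clustering metrics listed, the Adjusted Rand Index compares two labelings of the same samples (for example, predicted versus true malignancy) while ignoring label names and correcting for chance agreement. The abstract does not describe how ARI was applied to the classifiers, so the sketch below simply implements the standard pair-counting formula for two label sequences; it relies on `math.comb` (Python 3.8+), and the example labels are invented.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two labelings of the same samples."""
    n = len(labels_a)
    if n < 2 or n != len(labels_b):
        raise ValueError("need two labelings of the same length, with at least 2 samples")
    pair_counts = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:
        # Only happens when both labelings are all singletons or both a single group,
        # i.e. they describe identical partitions.
        return 1.0
    return (pair_counts - expected) / (maximum - expected)


print(adjusted_rand_index(["benign", "benign", "malignant"], ["b", "b", "m"]))  # invented labels
```

Because ARI is invariant to renaming labels, a classifier that swaps the two class names still scores 1.0, which is why it is usually paired with label-aware metrics such as precision and AUC.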