Supervised Machine Learning-Based Cardiovascular Disease Analysis and Prediction

Cardiovascular illness, commonly known as heart disease, encompasses a variety of conditions that affect the heart and has been the leading cause of death worldwide in recent decades. Because heart disease is associated with numerous risk factors, there is a pressing need for accurate, reliable, and practical methods of early diagnosis so that treatment can begin early. In the healthcare sector, data analysis is widely used to process massive amounts of data. Researchers apply a variety of statistical and machine learning methods to evaluate large, complex medical datasets, helping healthcare practitioners predict cardiac disease. This study covers many aspects of cardiac illness and presents a model based on supervised learning techniques: Random Forest (RF), Decision Tree (DT), and Logistic Regression (LR). It uses an existing dataset of heart disease patients from the UCI Cleveland database, which contains 303 instances and 76 attributes. Only 14 of these 76 attributes are used for testing, which is necessary to validate the performance of the various methods. The purpose of this study is to predict the likelihood of individuals developing heart disease. The findings show that logistic regression achieves the best accuracy score (92.10%).
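
The abstract reports logistic regression as the most accurate of the three classifiers but gives no implementation details. The following is a minimal from-scratch sketch of a logistic regression classifier trained by batch gradient descent on the log-loss, offered only as an illustration. The two-feature toy records, the learning rate and the epoch count are hypothetical assumptions. They are not values from the study or from the UCI Cleveland data, which would supply 14 attributes per patient.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss with respect to the logit is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def predict(w, b, x, threshold=0.5):
    return 1 if predict_proba(w, b, x) >= threshold else 0


# Hypothetical standardized features (e.g. two scaled clinical measurements);
# label 1 = heart disease present. Not data from the Cleveland dataset.
X = [[-1.0, -0.5], [-0.8, -1.0], [-1.2, -0.7], [0.9, 1.1], [1.1, 0.6], [0.7, 1.0]]
y = [0, 0, 0, 1, 1, 1]

w, b = train_logistic_regression(X, y)
print([predict(w, b, x) for x in X])
```

In practice the attributes would be standardized first. Gradient descent tends to converge more slowly when attributes such as cholesterol and age sit on very different scales.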

Comparison of the Performance of Machine Learning Algorithms in Predicting Heart Disease

Frontiers in Health Informatics, Vol 10 (1), pp. 99, 2021. DOI: 10.30699/fhi.v10i1.349

Author(s): Sajad Yousefi

Keyword(s): Machine Learning, Logistic Regression, Heart Disease, Decision Tree, ROC Curve, Machine Learning Algorithms, Supervised Machine Learning, Learning Models, Algorithm Performance, Machine Learning Models

Introduction: Heart disease is often associated with conditions such as arteries clogged by accumulated deposits, which cause chest pain and heart attack. Heart disease kills many people every year. Most countries have a shortage of cardiovascular specialists, and as a result a significant percentage of cases are misdiagnosed. Predicting this disease is therefore an important problem. Using machine learning models applied to a multidimensional dataset, this article aims to find the most efficient and accurate models for disease prediction. Material and Methods: Several supervised machine learning algorithms were used to predict heart disease, most notably Decision Tree, Random Forest, and KNN. The algorithms were applied to a dataset of 294 samples from the UCI repository containing heart disease features. To improve algorithm performance, these features were analyzed, and feature importance scores and cross-validation were taken into account. Results: The algorithms were compared with one another using the ROC curve and criteria such as accuracy, precision, sensitivity, and F1 score for each model. The Decision Tree algorithm achieved an accuracy of 83% and a ROC AUC of 99%. The Logistic Regression algorithm, with an accuracy of 88% and a ROC AUC of 91%, performed better than the other algorithms. These techniques can therefore help physicians identify heart disease patients and prescribe treatment correctly. Conclusion: Machine learning techniques can be used in medicine to analyze data collections related to a disease and to predict it. The area under the ROC curve and other evaluation criteria were compared across several machine learning classification algorithms for heart disease prediction to determine the most appropriate classifier.
As a result of the evaluation, better performance was observed for both the Decision Tree and Logistic Regression models.
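
All of the evaluation criteria named in this abstract can be derived from a model's predictions. The following sketch computes accuracy, precision, sensitivity (recall) and F1 from confusion-matrix counts. It also computes ROC AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The labels and scores are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = disease present."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # also called recall
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "sensitivity": sensitivity, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half; this equals the normalized Mann-Whitney U statistic.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels, probabilities, and the hard predictions at threshold 0.5.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

Accuracy and F1 depend on a single decision threshold, whereas AUC summarizes ranking quality across all thresholds. This is why a model can show a high AUC alongside a more modest accuracy.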

Earlier Prediction on the heart disease based on supervised machine learning techniques

2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS), 2021. DOI: 10.1109/iciccs51141.2021.9432212

Author(s): Anusha M, Suresh K, Chandana M

Keyword(s): Machine Learning, Heart Disease, Supervised Machine Learning, Machine Learning Techniques, Learning Techniques

An Innovative Method for Predicting and Classifying Inadequate Accuracy in Heart Disease by Using Decision Tree with K-Nearest Neighbors Algorithm

Alinteri Journal of Agricultural Sciences, Vol 36 (1), pp. 609-615, 2021. DOI: 10.47059/alinteri/v36i1/ajas21086

Author(s): Mandhapati Rajesh, Dr. K. Malathi

Keyword(s): Machine Learning, Heart Disease, Decision Tree, Supervised Machine Learning, Machine Learning Techniques, Accuracy Rate, K Nearest Neighbors, Machine Learning Methods, Learning Techniques

Aim: To predict heart diseases from the medical parameters of cardiac patients with a good accuracy rate, using machine learning methods such as an innovative Decision Tree (DT) algorithm. Materials and Methods: Two supervised machine learning techniques, an innovative Decision Tree (N = 20) and K-Nearest Neighbours (KNN) (N = 20), were run on five different datasets each time to record five samples. Results: The Decision Tree was used to predict heart disease from various medical conditions. The accuracy achieved was 98% for DT and 72.2% for KNN. The two algorithms, Decision Tree and KNN, are statistically insignificant (=.737), with an independent-samples t-test value (p<0.005) at a confidence level of 95%. Conclusion: Prediction and classification of heart disease appear to be significantly better with DT than with KNN.
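
One standard way to compare two classifiers across repeated runs is an independent-samples (Student's) t-test on their per-run accuracies. The following sketch computes the pooled-variance t statistic. Python's standard library has no t-distribution CDF, so instead of computing a p-value the sketch compares |t| against the tabulated two-tailed critical value for 8 degrees of freedom (2.306 at the 5% level). The per-run accuracies are invented for illustration and are not the paper's measurements.

```python
import math
import statistics


def independent_t_test(a, b):
    """Student's two-sample t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes roughly normal samples with equal variances.
    """
    na, nb = len(a), len(b)
    # statistics.variance uses the sample (n - 1) denominator.
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, na + nb - 2


# Two-tailed critical value of Student's t at alpha = 0.05 with 8 degrees of freedom.
T_CRIT_DF8 = 2.306

# Hypothetical accuracies from five runs of each classifier (not the paper's data).
dt_acc = [0.95, 0.97, 0.96, 0.98, 0.94]
knn_acc = [0.70, 0.74, 0.72, 0.71, 0.73]

t, df = independent_t_test(dt_acc, knn_acc)
print(f"t = {t:.2f}, df = {df}, significant at 5%: {abs(t) > T_CRIT_DF8}")
```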

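Machine Learning Applied to the Prediction of Cardiometabolic Diseases Using Metabolic and Behavioral Health Risk Indicators (Aprendizado de Máquina Aplicado à Predição de Doenças Cardiometabólicas com Utilização de Indicadores Metabólicos e Comportamentais de Risco à Saúde)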

2021. DOI: 10.14210/cotb.v12.p301-308

Author(s): Alan Lopes de Sousa Freitas, Ana Silvia Degasperi Ieker, Josiane Melchiori Pinheiro, Wilson Rinaldi, Heloise Manica Paris Teixeira

Keyword(s): Machine Learning, Risk Factors, Logistic Regression, Decision Tree, Causes of Death, Supervised Machine Learning, Machine Learning Techniques, Cardiometabolic Diseases, Learning Techniques, Good Classification

Cardiometabolic diseases that develop over a worker's life, such as hypertension, diabetes, dyslipidemia, and obesity, are among the main causes of death and are associated with modifiable and controllable risk factors. The general objective of this study was to apply supervised machine learning techniques and compare their performance in predicting the risk of developing cardiometabolic disease among staff working at the School Hospital in southern Brazil. We sought to map the characteristics of individuals who are more likely to develop cardiometabolic diseases. The machine learning models evaluated were Naive Bayes, Decision Tree, Random Forest, KNN, Logistic Regression, and SVM. The experimental results showed that some supervised machine learning models produce a good classification, depending on the attributes and hyperparameters used.

Cardiac Disease Prediction using Supervised Machine Learning Techniques.

Journal of Physics Conference Series, Vol 2161 (1), pp. 012013, 2022. DOI: 10.1088/1742-6596/2161/1/012013

Author(s): Chiradeep Gupta, Athina Saha, N V Subba Reddy, U Dinesh Acharya

Keyword(s): Machine Learning, Logistic Regression, Cardiac Disease, Performance Metrics, Confusion Matrix, Supervised Machine Learning, Machine Learning Techniques, Support Vector, Ensemble Techniques, Learning Techniques

Diagnosis of cardiac disease needs to be more accurate, precise, and reliable. The number of deaths due to cardiac attacks is rising rapidly. Practical approaches for earlier diagnosis of cardiac or heart disease are therefore needed to enable prompt management of the disease. Various supervised machine learning techniques (K-Nearest Neighbour, Decision Tree, Logistic Regression, Naïve Bayes, and Support Vector Machine (SVM)) are used to predict cardiac disease using a dataset collected from the repository of the University of California, Irvine (UCI). The results show that Logistic Regression outperformed all the other supervised classifiers on the performance metrics. According to the confusion matrices of all the models, it is also less risky, because its number of false negatives is lower than that of the other models. In addition, ensemble techniques could be used to improve the accuracy of the classifier. Jupyter Notebook, with its many Python libraries, is presented as the best tool for implementing this work accurately and precisely.
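
The abstract cites a low false-negative count as the reason logistic regression is the less risky model, since a missed patient costs more than a false alarm. The following small sketch assumes a classifier that outputs probabilities and shows how the false-negative and false-positive counts trade off as the decision threshold is lowered. The scores are hypothetical, and the threshold sweep is an illustration rather than the authors' procedure.

```python
def error_counts(y_true, scores, threshold):
    """Return (false_negatives, false_positives) when predicting 1 for score >= threshold."""
    fn = sum(1 for t, s in zip(y_true, scores) if t == 1 and s < threshold)
    fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
    return fn, fp


# Hypothetical predicted probabilities of cardiac disease (1 = disease present).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.92, 0.65, 0.45, 0.35, 0.55, 0.30, 0.20, 0.10]

for threshold in (0.5, 0.4, 0.3):
    fn, fp = error_counts(y_true, scores, threshold)
    print(f"threshold={threshold}: false negatives={fn}, false positives={fp}")
```

In this toy example, lowering the threshold from 0.5 to 0.3 removes both missed patients at the cost of one extra false positive.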

Using real-world transaction data to identify money laundering: Leveraging traditional regression and machine learning techniques

STEM Fellowship Journal, pp. 1-11, 2021. DOI: 10.17975/sfj-2021-006

Author(s): Daniel A. Harris, Kyla L. Pyndiura, Shelby L. Sturrock, Rebecca A.G. Christensen

Keyword(s): Machine Learning, Logistic Regression, Money Laundering, Model Performance, Machine Learning Algorithms, Supervised Machine Learning, Machine Learning Techniques, Binary Outcome, Learning Techniques, Increased Risk

Money laundering is a pervasive legal and economic problem that hides criminal activity. Identifying money laundering is a priority for both banks and governments; thus, machine learning algorithms have emerged as a possible strategy to detect suspicious financial activity within financial institutions. We used traditional regression and supervised machine learning techniques to identify bank customers at an increased risk of committing money laundering. Specifically, we assessed whether model performance differed across varying operationalizations of the outcome (e.g., multinomial vs. binary classification) and determined whether the inclusion of investigator-derived novel features (e.g., averages across existing features) could improve model performance. We received two proprietary datasets from Scotiabank, a large bank headquartered in Canada. The datasets included customer account information (N = 4,469) and customers’ monthly transaction histories (N = 2,827) from April 15, 2019 to April 15, 2020. We implemented traditional logistic regression, logistic regression with LASSO regularization (LASSO), K-nearest neighbours (KNN), and extreme gradient boosted models (XGBoost). Results indicated that traditional logistic regression with a binary outcome, conducted with investigator-derived novel features, performed best, with an F1 score of 0.79 and accuracy of 0.72. Models with a binary outcome had higher accuracy than the multinomial models, but the F1 scores yielded mixed results. For KNN and XGBoost, we observed little change or worsening performance after the introduction of the investigator-derived novel features. However, the investigator-derived novel features improved model performance for LASSO and traditional logistic regression.
Our findings demonstrate that investigators should consider different operationalizations of the outcome, where possible, and include novel features derived from existing features to potentially improve the detection of customers at risk of committing money laundering.
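
The account and transaction datasets are proprietary. The following sketch therefore illustrates only one possible reading of "investigator-derived novel features (e.g., averages across existing features)": averaging each numeric field over a customer's monthly transaction records and merging the result into the account-level table. All field names (customer_id, cash_deposits, wire_total, tenure_years) are hypothetical.

```python
from collections import defaultdict


def derive_average_features(monthly_rows):
    """Average every numeric field over each customer's monthly records.

    Each row is a dict with a "customer_id" key plus numeric fields; the field
    names used below are hypothetical placeholders.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in monthly_rows:
        cid = row["customer_id"]
        counts[cid] += 1
        for key, value in row.items():
            if key != "customer_id":
                sums[cid][key] += value
    return {
        cid: {f"avg_{key}": total / counts[cid] for key, total in fields.items()}
        for cid, fields in sums.items()
    }


def add_derived_features(accounts, derived):
    """Merge derived averages into account records; customers without history get none."""
    return [{**acct, **derived.get(acct["customer_id"], {})} for acct in accounts]


monthly_rows = [
    {"customer_id": "A", "cash_deposits": 3, "wire_total": 1200.0},
    {"customer_id": "A", "cash_deposits": 5, "wire_total": 800.0},
    {"customer_id": "B", "cash_deposits": 1, "wire_total": 50.0},
]
accounts = [{"customer_id": "A", "tenure_years": 2}, {"customer_id": "C", "tenure_years": 5}]

features = add_derived_features(accounts, derive_average_features(monthly_rows))
print(features)
```

Customers with no transaction history (customer C here) end up without the derived fields. A real pipeline would have to decide how to impute them before fitting LASSO or logistic regression.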

Angiographic prognosis and diagnosis of heart disease by using unsupervised and supervised Machine Learning techniques

2020 24th International Conference on System Theory, Control and Computing (ICSTCC), 2020. DOI: 10.1109/icstcc50638.2020.9259719

Author(s): Sebastian Sbirna, Liana-Simona Sbirna

Keyword(s): Machine Learning, Heart Disease, Supervised Machine Learning, Machine Learning Techniques, Learning Techniques

Regression Based Model for Prediction of Heart Disease Recumbent

International Journal of Recent Technology and Engineering - 2, Vol 8 (4), pp. 6639-6642, 2019. DOI: 10.35940/ijrte.d8888.118419

Keyword(s): Machine Learning, Logistic Regression, Heart Disease, Random Forest, Regression Models, The Other, Supervised Machine Learning, Input Output, Novel Method

Supervised learning is a method that learns to predict the output of an input-output pair by inducing a function from data through a series of training and testing steps. Regression models are a subcategory of supervised machine learning. In this paper, various models (Logistic Regression, SVM, KNN, Naive Bayes, and Random Forest) have been applied to a heart disease dataset. The predicted outcomes indicate the degree to which patients are prone to heart disease based on their attributes and characteristics. Among the applied algorithms, KNN and Random Forest both outperform the other algorithms, with a precision of 88.52%.
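
KNN, one of the two best performers in this abstract, labels a new patient by majority vote among the k closest training records. The following minimal sketch uses Euclidean distance. The features and k = 3 are hypothetical choices, and the features are assumed to be scaled beforehand.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points (Euclidean distance)."""
    neighbours = sorted(
        (math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical, already-scaled patient features; 1 = heart disease present.
train_X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [3.0, 3.2], [3.1, 2.9], [2.8, 3.0]]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, [1.1, 1.0]))  # the three nearest points are all class 0
print(knn_predict(train_X, train_y, [2.9, 3.1]))  # the three nearest points are all class 1
```

The vote depends on raw distances, so unscaled attributes with large ranges (such as cholesterol) would dominate smaller ones. That is why the sketch assumes scaled features.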

Development of heart attack prediction model based on ensemble learning

Eastern-European Journal of Enterprise Technologies, Vol 4 (2(112)), pp. 26-34, 2021. DOI: 10.15587/1729-4061.2021.238528

Author(s): Omar Shakir Hasan, Ibrahim Ahmed Saleh

Keyword(s): Machine Learning, Data Mining, Heart Disease, Ensemble Learning, Heart Attack, Medical Information, Supervised Machine Learning, Machine Learning Techniques, Learning Techniques, Clinical Records

With the advent of the data age, the continuous improvement and widespread application of medical information systems have led to exponential growth in biomedical data, such as medical imaging, electronic medical records, biometric tags, and clinical records, which have potential and essential research value. However, medical research based on statistical methods is limited by the type and size of the study population, so it cannot effectively mine large-scale medical information. Supervised machine learning techniques can address this limitation. Heart attack is one of the most common diseases and one of the leading causes of death, so finding a system that can accurately and reliably support early diagnosis is an essential step in treating such diseases. Researchers have used various data mining and machine learning techniques to analyze medical data and help professionals predict heart disease. This paper presents various features related to heart disease and a model based on ensemble learning. The proposed system preprocesses the data, selects attributes, and then uses logistic regression as the meta-classifier to build the ensemble learning model. Several machine learning algorithms (Support Vector Machines, Decision Tree, Random Forest, and Extreme Gradient Boosting) were also applied to the Framingham Heart Study dataset for prediction and compared with the proposed method. The results show that the proposed ensemble-based prediction method is feasible and effective, providing accuracy suitable for medical recommendations and higher accuracy than any single traditional machine learning algorithm.
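
The proposed system stacks several base classifiers and uses logistic regression as the meta-classifier. The following sketch shows the stacking pattern under simplifying assumptions. Two tiny dependency-free base learners (a one-feature decision stump and a nearest-centroid scorer) stand in for the paper's SVM, Decision Tree, Random Forest and XGBoost models. The meta-features are out-of-fold base predictions from interleaved 3-fold splits, which is common stacking practice but not a detail given in the abstract. The data are hypothetical.

```python
import math


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stump(X, y):
    """Base learner 1: one-feature threshold rule with the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            for sign in (1, -1):
                errors = sum((1 if sign * (row[j] - thr) > 0 else 0) != t
                             for row, t in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, j, thr, sign)
    if best is None:  # every feature is constant: predict the majority class
        majority = float(sum(y) * 2 >= len(y))
        return lambda x: majority
    _, j, thr, sign = best
    return lambda x: 1.0 if sign * (x[j] - thr) > 0 else 0.0


def fit_centroid(X, y):
    """Base learner 2: score in [0, 1] from relative distance to the class centroids."""
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)

    def proba(x):
        d0, d1 = math.dist(x, c0), math.dist(x, c1)
        return d0 / (d0 + d1) if d0 + d1 else 0.5
    return proba


def fit_logistic(Z, y, lr=0.1, epochs=2000):
    """Meta-classifier: logistic regression by batch gradient descent."""
    n, d = len(Z), len(Z[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        errs = [sigmoid(sum(wj * zj for wj, zj in zip(w, zi)) + b) - yi
                for zi, yi in zip(Z, y)]
        w = [w[j] - lr * sum(e * zi[j] for e, zi in zip(errs, Z)) / n for j in range(d)]
        b -= lr * sum(errs) / n
    return w, b


def fit_stacking(X, y, learners, k=3):
    # Out-of-fold base predictions become the meta-features, so the meta-classifier
    # never sees base-model outputs on points those models were trained on.
    meta = [None] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        models = [fit([X[i] for i in train], [y[i] for i in train]) for fit in learners]
        for i in range(fold, len(X), k):
            meta[i] = [m(X[i]) for m in models]
    w, b = fit_logistic(meta, y)
    base = [fit(X, y) for fit in learners]  # refit base learners on all data

    def predict(x):
        z = [m(x) for m in base]
        return 1 if sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) >= 0.5 else 0
    return predict


# Hypothetical two-feature records; 1 = heart attack. Interleaved so that each
# training split (i % 3 folds) contains both classes.
X = [[0.0, 0.2], [5.0, 5.1], [0.3, 0.1], [4.8, 5.2], [0.1, 0.4], [5.2, 4.9], [0.2, 0.0], [4.9, 5.0]]
y = [0, 1, 0, 1, 0, 1, 0, 1]

model = fit_stacking(X, y, [fit_stump, fit_centroid])
print([model(x) for x in X])
```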

Application of Machine Learning Techniques to Predict Binding Affinity for Drug Targets: A Study of Cyclin-Dependent Kinase 2

Current Medicinal Chemistry, Vol 28 (2), pp. 253-265, 2020. DOI: 10.2174/2213275912666191102162959. Cited by ~3

Author(s): Gabriela Bitencourt-Ferreira, Amauri Duarte da Silva, Walter Filgueira de Azevedo

Keyword(s): Machine Learning, Binding Affinity, Predictive Performance, Supervised Machine Learning, Machine Learning Techniques, Scoring Functions, Cyclin Dependent Kinase, Learning Models, Learning Techniques, Machine Learning Models

Background: The elucidation of the structure of cyclin-dependent kinase 2 (CDK2) made it possible to develop targeted scoring functions for virtual screening aimed at identifying new inhibitors of this enzyme. CDK2 is a protein target for the development of drugs intended to modulate cell-cycle progression and control. Such drugs have potential anticancer activity. Objective: Our goal here is to review recent applications of machine learning methods to predict ligand-binding affinity for protein targets. To assess the predictive performance of classical and targeted scoring functions, we focused our analysis on CDK2 structures. Methods: We have experimental structural data for hundreds of binary complexes of CDK2 with different ligands, many of them with inhibition constant information. Here we investigate computational methods for calculating the binding affinity of CDK2 through classical scoring functions and machine learning models. Results: Analysis of the predictive performance of classical scoring functions available in docking programs such as Molegro Virtual Docker, AutoDock4, and AutoDock Vina indicated that these methods failed to predict binding affinity with a significant correlation to experimental data. Targeted scoring functions developed through supervised machine learning techniques showed a significant correlation with experimental data. Conclusion: Here, we described the application of supervised machine learning techniques to generate a scoring function to predict binding affinity. Machine learning models showed superior predictive performance compared with classical scoring functions. Analysis of the computational models obtained through machine learning showed that they could capture essential structural features responsible for binding affinity against CDK2.
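
Both classical and machine-learning scoring functions are judged by how well their predicted affinities correlate with experimental values. The following minimal sketch shows that evaluation: an ordinary least-squares fit of a one-descriptor scoring function, scored with the Pearson correlation coefficient. The descriptor and affinity values are hypothetical. Real targeted scoring functions would use many structural descriptors and a held-out test set.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept (a one-descriptor scoring function)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


# Hypothetical descriptor values and experimental affinities (e.g. pKi) for a few complexes.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
experimental = [5.1, 5.9, 7.2, 7.8, 9.0]

slope, intercept = fit_line(descriptor, experimental)
predicted = [slope * d + intercept for d in descriptor]
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={pearson_r(predicted, experimental):.3f}")
```

A single-descriptor linear fit cannot change the correlation of the raw descriptor itself. The gains reported for machine-learning scoring functions come from combining many descriptors.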
