A Survey of Predicting Heart Disease

Author(s):  
M Preethi
J Selvakumar

This paper surveys data mining, big data, and machine learning methods for predicting heart disease. Data mining and machine learning play an important role in building models that help medical systems predict heart disease or cardiovascular disease. Medical experts can help patients by detecting cardiovascular disease before it occurs. Nowadays, heart disease is one of the most significant causes of death, and its prediction is a critical challenge in the clinical area. Over time, several data mining techniques have been developed to predict heart disease. This survey paper describes many of these techniques.

2021
Vol 21 (1)
Author(s):
Moojung Kim
Young Jae Kim
Sung Jin Park
Kwang Gi Kim
Pyung Chun Oh
...

Abstract

Background: Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially during the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study was to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination.

Methods: Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" when asked whether they had received a seasonal influenza vaccination in the past 12 months. The classification was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups.

Results: The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared. For the ≥ 65 age group, XGB (84.7%) and RF (84.7%) had the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM had the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%).

Conclusions: The machine learning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.
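The setup in this abstract, one binary classifier per age group evaluated by accuracy, can be sketched in plain Python. This is an illustrative stand-in only: the two-feature toy data and the from-scratch logistic regression are assumptions, not the study's 16 KNHANES variables or its actual LR/RF/SVM/XGB pipeline.

```python
import math

def train_logreg(X, y, lr=0.5, epochs=300):
    """Plain-Python logistic regression via per-sample gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid
            err = p - yi                     # gradient of the log loss
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def accuracy(w, b, X, y):
    preds = [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else 0
             for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)

# Hypothetical two-feature stand-ins for the study's 16 survey predictors;
# label 1 = low vaccination adherence. One model per age group, as in the paper.
groups = {
    "<65":  ([[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8]], [0, 0, 1, 1]),
    ">=65": ([[0.2, 0.2], [0.3, 0.1], [0.7, 0.9], [0.9, 0.7]], [0, 0, 1, 1]),
}
accs = {}
for name, (X, y) in groups.items():
    w, b = train_logreg(X, y)
    accs[name] = accuracy(w, b, X, y)
print(accs)
```

On the real survey data, accuracy differs sharply between age groups (as the Results show); on this separable toy data both models fit perfectly.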


2021
Vol 10 (1)
pp. 99
Author(s):
Sajad Yousefi

Introduction: Heart disease is often associated with conditions such as arteries clogged by sediment accumulation, which causes chest pain and heart attack. Many people die of heart disease annually. Most countries have a shortage of cardiovascular specialists, so a significant percentage of misdiagnoses occurs. Hence, predicting this disease is a serious issue. Using machine learning models applied to a multidimensional dataset, this article aims to find the most efficient and accurate machine learning models for disease prediction.

Material and Methods: Several algorithms were utilized to predict heart disease, among which the Decision Tree, Random Forest, and KNN supervised machine learning algorithms are notable. The algorithms were applied to a dataset from the UCI repository containing 294 samples with heart disease features. To enhance algorithm performance, these features were analyzed, and feature importance scores and cross-validation were considered.

Results: The algorithms' performance was compared based on the ROC curve and criteria such as accuracy, precision, sensitivity, and F1 score. The Decision Tree algorithm achieved an accuracy of 83% and an AUC ROC of 99%. The Logistic Regression algorithm, with an accuracy of 88% and an AUC ROC of 91%, performed better than the other algorithms. Therefore, these techniques can help physicians predict heart disease in patients and prescribe for them correctly.

Conclusion: Machine learning techniques can be used in medicine to analyze data collections related to a disease and to predict it. The area under the ROC curve and the evaluation criteria of several machine learning classification algorithms were compared to determine the most appropriate classifier for heart disease prediction. As a result of the evaluation, the Decision Tree and Logistic Regression models showed the best performance.
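Since the abstract evaluates models mainly by accuracy and AUC ROC, a minimal sketch of the AUC computation may help. AUC equals the probability that a randomly chosen positive case is scored above a randomly chosen negative case (the Mann-Whitney formulation); the labels and scores below are made up for illustration, not taken from the study.

```python
def auc_roc(labels, scores):
    """AUC via the rank-sum formulation: probability that a random
    positive example is scored higher than a random negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count half
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0, 1, 0]                 # hypothetical ground truth
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]     # hypothetical model scores
auc = auc_roc(labels, scores)
print(auc)   # 8 of 9 positive/negative pairs are ranked correctly -> about 0.89
```

Libraries such as scikit-learn compute the same quantity more efficiently from sorted ranks, but the pairwise form makes the definition explicit.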


BMJ
2020
pp. m3919
Author(s):
Yan Li
Matthew Sperrin
Darren M Ashcroft
Tjeerd Pieter van Staa

Abstract

Objective: To assess the consistency of machine learning and statistical techniques in predicting individual level and population level risks of cardiovascular disease, and the effects of censoring on risk predictions.

Design: Longitudinal cohort study from 1 January 1998 to 31 December 2018.

Setting and participants: 3.6 million patients from the Clinical Practice Research Datalink registered at 391 general practices in England with linked hospital admission and mortality records.

Main outcome measures: Model performance, including discrimination, calibration, and consistency of individual risk prediction for the same patients among models with comparable model performance. 19 different prediction techniques were applied, including 12 families of machine learning models (grid searched for best models), three Cox proportional hazards models (locally fitted, QRISK3, and Framingham), three parametric survival models, and one logistic model.

Results: The various models had similar population level performance (C statistics of about 0.87 and similar calibration). However, the predictions for individual risks of cardiovascular disease varied widely between and within different types of machine learning and statistical models, especially in patients with higher risks. A patient with a risk of 9.5-10.5% predicted by QRISK3 had a risk of 2.9-9.2% in a random forest and 2.4-7.2% in a neural network. The differences in predicted risks between QRISK3 and a neural network ranged between -23.2% and 0.1% (95% range). Models that ignored censoring (that is, assumed censored patients to be event free) substantially underestimated risk of cardiovascular disease. Of the 223 815 patients with a cardiovascular disease risk above 7.5% with QRISK3, 57.8% would be reclassified below 7.5% when using another model.

Conclusions: A variety of models predicted risks for the same patients very differently despite similar model performances. Logistic models and commonly used machine learning models should not be directly applied to the prediction of long term risks without considering censoring. Survival models that consider censoring and that are explainable, such as QRISK3, are preferable. The level of consistency within and between models should be routinely assessed before they are used for clinical decision making.
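The study's central warning, that models ignoring censoring underestimate long-term risk, can be illustrated with a tiny Kaplan-Meier estimate. The follow-up times below are hypothetical; the point is that the naive event fraction treats censored patients as event free, while the Kaplan-Meier estimator removes them from the risk set after they drop out.

```python
def km_event_risk(times, events, horizon):
    """Kaplan-Meier cumulative event risk by `horizon`.
    times: follow-up time per patient; events: 1 = event, 0 = censored."""
    surv = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1 and t <= horizon})
    for t in event_times:
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        n = sum(1 for ti in times if ti >= t)   # still at risk just before t
        surv *= 1.0 - d / n
    return 1.0 - surv

# Hypothetical cohort: 2 events, 2 censored early, 2 followed event free.
times  = [1, 2, 3, 4, 10, 10]
events = [1, 0, 1, 0, 0, 0]
naive = sum(events) / len(events)        # treats censored patients as event free
km = km_event_risk(times, events, horizon=10)
print(naive, km)   # naive underestimates: 0.333... vs 0.375
```

Even in this six-patient toy example the naive rate is biased low; over a 20-year window with heavy censoring, as in the study, the gap grows much larger.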


2019
Vol 40 (Supplement_1)
Author(s):
I Korsakov
A Gusev
T Kuznetsova
D Gavrilov
R Novitskiy

Abstract

Background: Advances in precision medicine will require an increasingly individualized prognostic evaluation of patients in order to provide them with appropriate therapy. The traditional statistical methods of predictive modeling, such as SCORE, PROCAM, and Framingham, according to the European guidelines for the prevention of cardiovascular disease, are not adapted for all patients and require significant human involvement in the selection, transformation, and imputation of predictive variables. In ROC analysis for prediction of significant cardiovascular disease (CVD), the areas under the curve are 0.62-0.72 for Framingham, 0.66-0.73 for SCORE, and 0.60-0.69 for PROCAM. To improve on this, we applied machine learning and deep learning models that predict a CVD event from conventional risk factors, using longitudinal electronic health records (EHR) for 10-year CVD event prediction.

Methods: For machine learning, we applied logistic regression (LR); as a deep learning algorithm, we used recurrent neural networks with long short-term memory (LSTM) units. From the longitudinal EHR we extracted the following features: demographics, vital signs, diagnoses (ICD-10-CM: I21-I22.9, I61-I63.9), and medication. A problem at this step is that nearly 80 percent of clinical information in the EHR is "unstructured" and contains errors and typos. Handling missing data correctly is important for training deep learning and machine learning algorithms. The study cohort included patients between the ages of 21 and 75 with a dynamic observation window. In total, the dataset contained 31,517 individuals, but only 3,652 individuals had all features, or missing feature values that were easy to impute. Among these 3,652 individuals, 29.4% had a CVD; mean age was 49.4 years, and 68.2% were female.

Evaluation: We randomly divided the dataset into a training and a test set with an 80/20 split. The LR was implemented with Python Scikit-Learn and the LSTM model was implemented with Keras using TensorFlow as the backend.

Results: We applied machine learning and deep learning models using the same features as the traditional risk scales and using longitudinal EHR features for CVD prediction, respectively. The machine learning model (LR) achieved an AUROC of 0.74-0.76 and the deep learning model (LSTM) 0.75-0.76. By using features from the EHR, both the logistic regression and deep learning models improved the AUROC to 0.78-0.79.

Conclusion: The machine learning models outperformed the traditional clinically used predictive models for CVD risk prediction (i.e. the SCORE, PROCAM, and Framingham equations). This approach was used to create a clinical decision support system (CDSS). It uses both traditional risk scales and models based on neural networks. Especially important is the fact that the system can calculate cardiovascular disease risks automatically and recalculate them immediately after new information is added to the EHR. The results are delivered to the user's personal account.
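The feature extraction step filters diagnoses by ICD-10-CM code ranges (I21-I22.9 and I61-I63.9). A minimal sketch of such a range check is below; the `(letter, number)` comparison is an assumption of how one might implement it, not the authors' code, and it does not handle multi-letter prefixes or codes more precise than the range bounds.

```python
CVD_RANGES = [("I21", "I22.9"), ("I61", "I63.9")]   # ranges named in the abstract

def _key(code):
    # Split a simple ICD-10 code like 'I21.4' into a comparable (letter, number)
    # pair. Hypothetical helper: assumes a single letter followed by a number.
    return (code[0], float(code[1:]))

def in_icd_range(code, lo, hi):
    return _key(lo) <= _key(code) <= _key(hi)

def has_cvd_code(codes):
    """True if any diagnosis code falls inside the CVD labeling ranges."""
    return any(in_icd_range(c, lo, hi) for c in codes for lo, hi in CVD_RANGES)

print(has_cvd_code(["I21.4", "E11.9"]))   # acute MI present -> True
print(has_cvd_code(["I10"]))              # hypertension only -> False
```

A production pipeline would normally use a dedicated terminology service or code hierarchy instead of string arithmetic, but the sketch shows what "diagnoses in I21-I22.9 or I61-I63.9" means operationally.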


2020
Author(s):
Dianbo Liu
Kathe Fox
Griffin Weber
Tim Miller

BACKGROUND: A patient’s health information is generally fragmented across silos because it follows how care is delivered: multiple providers in multiple settings. Though it is technically feasible to reunite data for analysis in a manner that underpins a rapid learning healthcare system, privacy concerns and regulatory barriers limit data centralization for this purpose.

OBJECTIVE: Machine learning can be conducted in a federated manner on patient datasets with the same set of variables but separated across storage. However, federated learning cannot handle the situation where different data types for a given patient are separated vertically across different organizations, or where patient ID matching across institutions is difficult. We call methods that enable machine learning model training on data separated by two or more dimensions "confederated machine learning." We propose and evaluate confederated learning for training machine learning models to stratify the risk of several diseases among silos when data are horizontally separated by individual, vertically separated by data type, and separated by identity without patient ID matching.

METHODS: The confederated learning method can be intuitively understood as a distributed learning method with representation learning, generative model, imputation, and data augmentation elements. The method we developed consists of three steps. Step 1: conditional generative adversarial networks with matching loss (cGAN) were trained using data from the central analyzer to infer one data type from another, for example inferring medications from diagnoses. Generative (cGAN) models were used in this study because a considerable percentage of individuals do not have paired data types; for instance, a patient may only have his or her diagnoses in the database, but no medication information, due to insurance enrolment. cGAN can utilize data with paired information by minimizing matching loss, and data without paired information by minimizing adversarial loss. Step 2: missing data types in each silo were inferred using the model trained in step 1. Step 3: task-specific models, such as a model to predict diagnoses of diabetes, were trained in a federated manner across all silos simultaneously.

RESULTS: We conducted experiments to train disease prediction models using confederated learning on a large nationwide health insurance dataset from the U.S. that was split into 99 silos. The models stratify individuals by their risk of diabetes, psychological disorders, or ischemic heart disease in the next two years, using patients' diagnoses, medication claims, and clinical lab test records (see the Methods section for details). The goal of these experiments was to test whether a confederated learning approach can simultaneously address the two types of separation mentioned above.

CONCLUSIONS: We demonstrated that health data distributed across silos separated by individual and by data type can be used to train machine learning models without moving or aggregating data. Our method obtains predictive accuracy competitive with a centralized upper bound in predicting risks of diabetes, psychological disorders, or ischemic heart disease using previous diagnoses, medications, and lab tests as inputs. We compared the performance of the confederated learning approach with models trained on centralized data, on only the data held by the central analyzer, or on a single data type across silos. The experimental results suggest that confederated learning trains predictive models efficiently across disconnected silos.

CLINICALTRIAL: NA
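Step 3 trains task-specific models federatedly across silos. A FedAvg-style sketch on a toy linear model is below: each silo runs local gradient steps on its own patients, and only the weights, never the records, leave the silo to be averaged centrally. The two-silo dataset and least-squares objective are assumptions for illustration; the paper's actual neural models and cGAN imputation steps are not reproduced.

```python
def local_update(w, data, lr=0.05):
    """One local pass of least-squares SGD on a single silo's records."""
    for x, y in data:
        err = w[0] * x + w[1] - y            # prediction error for y ~ w0*x + w1
        w = [w[0] - lr * err * x, w[1] - lr * err]
    return w

def federated_average(silos, rounds=500):
    """FedAvg-style loop: silos train locally; only weights are shared."""
    w = [0.0, 0.0]
    for _ in range(rounds):
        local_ws = [local_update(list(w), silo) for silo in silos]
        w = [sum(lw[i] for lw in local_ws) / len(local_ws) for i in range(2)]
    return w

# Hypothetical horizontally split data; both silos are consistent with y = 2x + 1.
silos = [
    [(0.0, 1.0), (1.0, 3.0)],   # silo A's patients
    [(2.0, 5.0), (3.0, 7.0)],   # silo B's patients
]
w = federated_average(silos)
print(round(w[0], 2), round(w[1], 2))   # converges near slope 2, intercept 1
```

The averaging step is what keeps raw data inside each silo: the central analyzer sees only model parameters, which is the privacy property the abstract's confederated setting builds on.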

