A data-driven approach to predicting diabetes and cardiovascular disease with machine learning

Abstract. Background: Diabetes and cardiovascular disease are two of the main causes of death in the United States. Identifying and predicting these diseases in patients is the first step towards stopping their progression. We evaluate the capabilities of machine learning models in detecting at-risk patients using survey data (and laboratory results), and identify key variables within the data contributing to these diseases among the patients. Methods: Our research explores data-driven approaches which utilize supervised machine learning models to identify patients with such diseases. Using the National Health and Nutrition Examination Survey (NHANES) dataset, we conduct an exhaustive search of all available feature variables within the data to develop models for cardiovascular disease, prediabetes, and diabetes detection. Using different time-frames and feature sets for the data (based on laboratory data), multiple machine learning models (logistic regression, support vector machines, random forest, and gradient boosting) were evaluated on their classification performance. The models were then combined to develop a weighted ensemble model, capable of leveraging the performance of the disparate models to improve detection accuracy. Information gain of tree-based models was used to identify the key variables within the patient data that contributed to the detection of at-risk patients in each of the disease classes by the data-learned models. Results: The developed ensemble model for cardiovascular disease (based on 131 variables) achieved an area under the receiver operating characteristic curve (AU-ROC) score of 83.1% using no laboratory results, and 83.9% accuracy with laboratory results. In diabetes classification (based on 123 variables), the eXtreme Gradient Boosting (XGBoost) model achieved an AU-ROC score of 86.2% (without laboratory data) and 95.7% (with laboratory data). For pre-diabetic patients, the ensemble model had the top AU-ROC score of 73.7% (without laboratory data), and for laboratory-based data XGBoost performed the best at 84.4%. The top five predictors in diabetes patients were 1) waist size, 2) age, 3) self-reported weight, 4) leg length, and 5) sodium intake. For cardiovascular diseases the models identified 1) age, 2) systolic blood pressure, 3) self-reported weight, 4) occurrence of chest pain, and 5) diastolic blood pressure as key contributors. Conclusion: We conclude that machine-learned models based on survey questionnaires can provide an automated identification mechanism for patients at risk of diabetes and cardiovascular diseases. We also identify key contributors to the prediction, which can be further explored for their implications on electronic health records.
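The weighted ensemble and the AU-ROC metric mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the ensemble weights were chosen, so weighting each model by its own AU-ROC is an assumption made here for illustration, and the labels and probabilities are made-up toy values. In practice the weights would come from a held-out validation set, not from the data used for the final score.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weighted_ensemble(model_probs, weights):
    """Soft voting: weighted average of each model's predicted probabilities."""
    total = sum(weights)
    n_samples = len(model_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
        for i in range(n_samples)
    ]


# Toy data (hypothetical): 1 = at-risk patient, 0 = not at risk.
y = [0, 0, 1, 1, 0, 1]
probs_a = [0.2, 0.6, 0.7, 0.4, 0.1, 0.9]  # e.g. output of one base model
probs_b = [0.3, 0.2, 0.4, 0.8, 0.5, 0.7]  # e.g. output of another base model

# Assumption: weight each model by its AU-ROC.
weights = [auroc(y, probs_a), auroc(y, probs_b)]
ensemble = weighted_ensemble([probs_a, probs_b], weights)
print(auroc(y, probs_a), auroc(y, probs_b), auroc(y, ensemble))
```

In this toy case each base model misorders one positive/negative pair, but they err on different patients, so the averaged scores rank every pair correctly. This is the kind of complementary behavior an ensemble is meant to exploit.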

Transparent Machine Learning models for Rapid Risk Stratification in the Emergency Department: A multi-center evaluation

DOI: 10.1101/2020.11.25.20238386; 2020

Author(s): William P.T.M. van Doorn, Floris Helmich, Paul M.E.L. van Dam, Leo H.J. Jacobs, Patricia M. Stassen, ...

Keyword(s): Machine Learning, Emergency Department, Risk Stratification, Diagnostic Performance, Medical Center, Laboratory Data, Patient Specific, Learning Models, Laboratory Results, Machine Learning Models

Abstract. Introduction: Risk stratification of patients presenting to the emergency department (ED) is important for appropriate triage. Using machine learning technology, we can integrate laboratory data from a modern emergency department and present these in relation to clinically relevant endpoints for risk stratification. In this study, we developed and evaluated transparent machine learning models in four large hospitals in the Netherlands. Methods: Historical laboratory data (2013-2018) available within the first two hours after presentation to the ED of Maastricht University Medical Centre+ (Maastricht), Meander Medical Center (Amersfoort), and Zuyderland (locations Sittard and Heerlen) were used. We used the first five years of data to develop the model and the sixth year to evaluate model performance in each hospital separately. Performance was assessed using the area under the receiver operating characteristic curve (AUROC), Brier scores and calibration curves. The SHapley Additive exPlanations (SHAP) algorithm was used to obtain transparent machine learning models. Results: We included 266,327 patients with more than 7 million laboratory results available for analysis. Models possessed high diagnostic performance with AUROCs of 0.94 [0.94-0.95], 0.98 [0.97-0.98], 0.88 [0.87-0.89] and 0.90 [0.89-0.91] for Maastricht, Amersfoort, Sittard and Heerlen, respectively. Using the SHAP algorithm, we visualized patient characteristics and laboratory results that drive patient-specific RISKINDEX predictions. As an illustrative example, we applied our models in a triage system for risk stratification that categorized 94.7% of the patients as low risk with a corresponding negative predictive value (NPV) of ≥99%. Discussion: The developed machine learning models are transparent, with excellent diagnostic performance in predicting 31-day mortality in ED patients across four hospitals. Follow-up studies will assess whether implementation of these algorithms can improve clinically relevant endpoints.

Machine learning models to identify low adherence to influenza vaccination among Korean adults with cardiovascular disease

BMC Cardiovascular Disorders; DOI: 10.1186/s12872-021-01925-7; 2021; Vol 21 (1)

Author(s): Moojung Kim, Young Jae Kim, Sung Jin Park, Kwang Gi Kim, Pyung Chun Oh, ...

Keyword(s): Machine Learning, Cardiovascular Disease, Influenza Vaccination, Machine Learning Techniques, Gradient Boosting, Support Vector, Age Group, Learning Models, Extreme Gradient Boosting, Machine Learning Models

Abstract. Background: Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially in the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study is to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination. Methods: Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" to whether they had received seasonal influenza vaccinations in the past 12 months. The classification process was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups. Results: The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared; for the ≥ 65 age group, XGB (84.7%) and RF (84.7%) had the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM had the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%). Conclusions: The machine learning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.

Cardiovascular Disease Prediction Using Machine Learning Models

2020 IEEE Pune Section International Conference (PuneCon); DOI: 10.1109/punecon50868.2020.9362367; 2020

Author(s): Atharv Nikam, Sanket Bhandari, Aditya Mhaske, Shamla Mantri

Keyword(s): Machine Learning, Cardiovascular Disease, Disease Prediction, Learning Models, Machine Learning Models

Cardiovascular Disease Detection using Artificial Immune System and other Machine Learning Models

Journal of Physics Conference Series; DOI: 10.1088/1742-6596/1950/1/012032; 2021; Vol 1950 (1); pp. 012032

Author(s): Ishan Gupta, Ruchir Shangle, Vishwas Latiyan, Umang Soni

Keyword(s): Machine Learning, Cardiovascular Disease, Immune System, Artificial Immune System, Disease Detection, Learning Models, Artificial Immune, Machine Learning Models

Effect of a Real-Time Risk Score on 30-day Readmission Reduction in Singapore

Applied Clinical Informatics; DOI: 10.1055/s-0041-1726422; 2021; Vol 12 (02); pp. 372-382

Author(s): Christine Xia Wu, Ernest Suresh, Francis Wei Loong Phng, Kai Pik Tai, Janthorn Pakdeethai, ...

Keyword(s): Machine Learning, High Risk, Real Time, Risk Score, Patient Specific, Learning Models, Medicine Department, High Risk Patients, Risk Patients, Machine Learning Models

Abstract. Objective: To develop a risk score for the real-time prediction of readmissions using patient-specific information captured in electronic medical records (EMR) in Singapore, enabling the prospective identification of high-risk patients for enrolment in timely interventions. Methods: Machine-learning models were built to estimate the probability of a patient being readmitted within 30 days of discharge. EMR of 25,472 patients discharged from the medicine department at Ng Teng Fong General Hospital between January 2016 and December 2016 were extracted retrospectively for training and internal validation of the models. We developed and implemented real-time 30-day readmission risk score generation in the EMR system, which enabled the flagging of high-risk patients to care providers in the hospital. Based on the daily high-risk patient list, the various interfaces and flow sheets in the EMR were configured according to the information needs of the various stakeholders such as the inpatient medical, nursing, case management, emergency department, and postdischarge care teams. Results: Overall, the machine-learning models achieved good performance, with the area under the receiver operating characteristic curve ranging from 0.77 to 0.81. The models were used to proactively identify and attend to patients who are at risk of readmission before an actual readmission occurs. This approach successfully reduced the 30-day readmission rate for patients admitted to the medicine department from 11.7% in 2017 to 10.1% in 2019 (p < 0.01) after risk adjustment. Conclusion: Machine-learning models can be deployed in the EMR system to provide real-time forecasts for a more comprehensive outlook in the aspects of decision-making and care provision.

Learning to Identify At-Risk Students in Distance Education Using Interaction Counts

Revista de Informática Teórica e Aplicada; DOI: 10.22456/2175-2745.62211; 2016; Vol 23 (2); pp. 124; Cited By ~ 2

Author(s): Douglas Detoni, Cristian Cechinel, Ricardo Araujo Matsumura, Daniela Francisco Brauner

Keyword(s): Machine Learning, At Risk, At Risk Students, Drop Out, Support Vector, Learning Models, Data Set, Student Dropout, Vector Machines, Machine Learning Models

Student dropout is one of the main problems faced by distance learning courses. One of the major challenges for researchers is to develop methods to predict the behavior of students so that teachers and tutors are able to identify at-risk students as early as possible and provide assistance before they drop out or fail their courses. Machine learning models have been used to predict or classify students in these settings. However, while these models have shown promising results in several settings, they usually attain these results using attributes that are not immediately transferable to other courses or platforms. In this paper, we provide a methodology to classify students using only interaction counts from each student. We evaluate this methodology on a data set from two majors based on the Moodle platform. We run experiments consisting of training and evaluating three machine learning models (Support Vector Machines, Naive Bayes and AdaBoost decision trees) under different scenarios. We provide evidence that patterns from interaction counts can provide useful information for classifying at-risk students. This classification allows the customization of the activities presented to at-risk students (automatically or through tutors) in an attempt to prevent student dropout.
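To make "interaction counts" concrete, here is a minimal sketch that turns a raw event log into one count vector per student. The log format (student identifier plus week index) and the weekly granularity are assumptions for illustration; the abstract does not specify how the counts were aggregated.

```python
from collections import Counter


def interaction_counts(events, n_weeks):
    """Build per-student feature vectors of interaction counts per week.

    events: iterable of (student_id, week_index) pairs, one per logged interaction.
    Events with a week index outside range(n_weeks) are ignored.
    """
    counts = Counter(events)
    students = sorted({student for student, _ in counts})
    return {
        student: [counts[(student, week)] for week in range(n_weeks)]
        for student in students
    }


# Toy log (hypothetical): each tuple is one click or post on the platform.
events = [("s1", 0), ("s1", 0), ("s1", 2), ("s2", 1)]
features = interaction_counts(events, n_weeks=3)
print(features)  # {'s1': [2, 0, 1], 's2': [0, 1, 0]}
```

Vectors like these depend only on how often a student interacts, not on course-specific attributes, which is what makes such features portable across courses and platforms.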

Machine learning models for identifying preterm infants at risk of cerebral hemorrhage

PLoS ONE; DOI: 10.1371/journal.pone.0227419; 2020; Vol 15 (1); pp. e0227419; Cited By ~ 2

Author(s): Varvara Turova, Irina Sidorenko, Laura Eckardt, Esther Rieger-Fackeldey, Ursula Felderhoff-Müser, ...

Keyword(s): Machine Learning, At Risk, Preterm Infants, Cerebral Hemorrhage, Learning Models, Machine Learning Models

Comparison of Machine Learning Models in Prediction of Cardiovascular Disease Using Health Record Data

2019 International Conference on Informatics, Multimedia, Cyber and Information System (ICIMCIS); DOI: 10.1109/icimcis48181.2019.8985205; 2019

Author(s): Jaouja Maiga, Gilbert Gutabaga Hungilo, Pranowo

Keyword(s): Machine Learning, Cardiovascular Disease, Health Record, Learning Models, Record Data, Machine Learning Models

Effectiveness, Explainability and Reliability of Machine Meta-Learning Methods for Predicting Mortality in Patients with COVID-19: Results of the Brazilian COVID-19 Registry

DOI: 10.1101/2021.11.01.21265527; 2021

Author(s): Bruno Barbosa Miranda de Paiva, Polianna Delfino Pereira, Claudio Moises Valiense de Andrade, Virginia Mara Reis Gomes, Maria Clara Pontello Barbosa Lima, ...

Keyword(s): Machine Learning, Prediction Models, State Of The Art, Laboratory Data, Machine Learning Algorithms, Training Data, Learning Models, Learning Methods, Meta Learning, Machine Learning Models

Objective: To provide a thorough comparative study of state-of-the-art machine learning methods and statistical methods for determining in-hospital mortality in COVID-19 patients using data upon hospital admission; to study the reliability of the predictions of the most effective methods by correlating the probability of the outcome and the accuracy of the methods; to investigate how explainable the predictions produced by the most effective methods are. Materials and Methods: De-identified data were obtained from COVID-19 positive patients in 36 participating hospitals, from March 1 to September 30, 2020. Demographic, comorbidity, clinical presentation and laboratory data were used as training data to develop COVID-19 mortality prediction models. Multiple machine learning and traditional statistics models were trained on this prediction task using a folded cross-validation procedure, from which we assessed performance and interpretability metrics. Results: The Stacking of machine learning models improved over the previous state-of-the-art results by more than 26% in predicting the class of interest (death), achieving an AUROC of 87.1% and a macro-F1 of 73.9%. We also show that some machine learning models can be very interpretable and reliable, yielding more accurate predictions while providing a good explanation of why they were made. Conclusion: The best results were obtained using the meta-learning ensemble model Stacking. State-of-the-art explainability techniques such as SHAP values can be used to draw useful insights into the patterns learned by machine learning algorithms. Machine learning models can be more explainable than traditional statistics models while also yielding highly reliable predictions. Key words: COVID-19; prognosis; prediction model; machine learning
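The macro-F1 score reported above averages the per-class F1 scores with equal weight, so the minority class (here, death) counts as much as the majority class. A minimal sketch of the standard definition follows; the labels and predictions are toy values, not data from the registry.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example (hypothetical): 1 = died in hospital, 0 = survived.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
print(macro_f1(y_true, y_pred))  # 0.625 (F1 of 0.75 for class 0, 0.5 for class 1)
```

Plain accuracy on this toy example is 4/6, which hides that only half of the deaths were identified; the per-class averaging makes that weakness visible.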

Rainfall Prediction for Udaipur, Rajasthan using Machine Learning Models Based on Temperature, Vapour Pressure and Relative Humidity

International Journal of Recent Technology and Engineering - 2; DOI: 10.35940/ijrte.f1024.0386s20; 2020; Vol 8 (6S); pp. 133-137

Keyword(s): Machine Learning, Relative Humidity, Vapour Pressure, Predictor Variables, Ensemble Model, Learning Models, Rainfall Prediction, Predictor Importance, The Impact, Machine Learning Models

This study aims to predict rainfall with machine learning models using a minimal set of features. The prediction is based on temperature, vapour pressure and relative humidity; numerous earlier studies used more features. A training-test split of 75-25 was used. The best results were obtained by combining the best of the candidate models into an ensemble model. In that model, the predictor importance of vapour pressure was 0.89 and that of relative humidity was 0.11. Temperature was not a significant predictor of rainfall, although its high correlation with vapour pressure (Torr) and relative humidity (percentage) suggests that these two predictor variables subsume the impact of temperature.
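A 75-25 training-test split like the one mentioned above can be sketched as below. The abstract does not say whether the split was random or chronological, so the seeded random shuffle is an assumption, and the rows are placeholder values.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Randomly hold out a fraction of rows for testing (seeded for repeatability)."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


# Placeholder rows (hypothetical): in the study each row would hold temperature,
# vapour pressure, relative humidity and the rainfall target.
rows = list(range(20))
train, test = train_test_split(rows)
print(len(train), len(test))  # 15 5
```

For weather records, a chronological split (train on earlier observations, test on later ones) is a common alternative when leakage between neighbouring time points is a concern.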
