A data-driven approach to predicting diabetes and cardiovascular disease with machine learning

Author(s):  
An Dinh ◽  
Stacey Miertschin ◽  
Amber Young ◽  
Somya D. Mohanty

Abstract Background Diabetes and cardiovascular disease are two of the main causes of death in the United States. Identifying and predicting these diseases in patients is the first step towards stopping their progression. We evaluate the capabilities of machine learning models in detecting at-risk patients using survey data (and laboratory results), and identify key variables within the data contributing to these diseases among the patients. Methods Our research explores data-driven approaches which utilize supervised machine learning models to identify patients with such diseases. Using the National Health and Nutrition Examination Survey (NHANES) dataset, we conduct an exhaustive search of all available feature variables within the data to develop models for cardiovascular disease, prediabetes, and diabetes detection. Using different time-frames and feature sets for the data (based on laboratory data), multiple machine learning models (logistic regression, support vector machines, random forest, and gradient boosting) were evaluated on their classification performance. The models were then combined to develop a weighted ensemble model, capable of leveraging the performance of the disparate models to improve detection accuracy. Information gain of tree-based models was used to identify the key variables within the patient data that contributed to the detection of at-risk patients in each of the disease classes by the data-learned models. Results The developed ensemble model for cardiovascular disease (based on 131 variables) achieved an Area Under the Receiver Operating Characteristic curve (AU-ROC) score of 83.1% using no laboratory results, and 83.9% accuracy with laboratory results. In diabetes classification (based on 123 variables), the eXtreme Gradient Boost (XGBoost) model achieved an AU-ROC score of 86.2% (without laboratory data) and 95.7% (with laboratory data). For pre-diabetic patients, the ensemble model had the top AU-ROC score of 73.7% (without laboratory data), and for laboratory-based data XGBoost performed the best at 84.4%. The top five predictors in diabetes patients were 1) waist size, 2) age, 3) self-reported weight, 4) leg length, and 5) sodium intake. For cardiovascular diseases the models identified 1) age, 2) systolic blood pressure, 3) self-reported weight, 4) occurrence of chest pain, and 5) diastolic blood pressure as key contributors. Conclusion We conclude that machine-learned models based on survey questionnaires can provide an automated identification mechanism for patients at risk of diabetes and cardiovascular diseases. We also identify key contributors to the prediction, which can be further explored for their implications on electronic health records.
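The weighted ensemble and the AU-ROC metric mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the model names, the probabilities, the labels, and the weights below are all hypothetical, and the abstract does not say how the weights were chosen. The sketch averages per-model probabilities with fixed weights and scores the result with the rank-based definition of AU-ROC.

```python
from typing import Dict, List, Sequence


def weighted_ensemble(probs: Dict[str, Sequence[float]],
                      weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-model positive-class probabilities."""
    total = sum(weights[name] for name in probs)
    n = len(next(iter(probs.values())))
    return [
        sum(weights[name] * probs[name][i] for name in probs) / total
        for i in range(n)
    ]


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AU-ROC as the chance that a random positive outranks a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = at-risk patient, 0 = not at risk.
labels = [1, 0, 1, 0, 1, 0]
model_probs = {
    "logistic_regression": [0.8, 0.4, 0.6, 0.3, 0.4, 0.5],
    "random_forest": [0.7, 0.2, 0.5, 0.8, 0.9, 0.1],
}
# Assumed weights; in practice they could come from validation performance.
model_weights = {"logistic_regression": 1.0, "random_forest": 3.0}

ensemble_scores = weighted_ensemble(model_probs, model_weights)
for name, scores in model_probs.items():
    print(name, round(auroc(labels, scores), 3))
print("ensemble", round(auroc(labels, ensemble_scores), 3))
```

On this toy data the ensemble ranks patients somewhat better than either model alone, which is the effect a weighted ensemble aims for; real data may not show such a gain.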

2020 ◽  
Author(s):  
William P.T.M. van Doorn ◽  
Floris Helmich ◽  
Paul M.E.L. van Dam ◽  
Leo H.J. Jacobs ◽  
Patricia M. Stassen ◽  
...  

Abstract Introduction Risk stratification of patients presenting to the emergency department (ED) is important for appropriate triage. Using machine learning technology, we can integrate laboratory data from a modern emergency department and present these in relation to clinically relevant endpoints for risk stratification. In this study, we developed and evaluated transparent machine learning models in four large hospitals in the Netherlands. Methods Historical laboratory data (2013-2018) available within the first two hours after presentation to the ED of Maastricht University Medical Centre+ (Maastricht), Meander Medical Center (Amersfoort), and Zuyderland (locations Sittard and Heerlen) were used. We used the first five years of data to develop the model and the sixth year to evaluate model performance in each hospital separately. Performance was assessed using area under the receiver-operating-characteristic curve (AUROC), Brier scores and calibration curves. The SHapley Additive exPlanations (SHAP) algorithm was used to obtain transparent machine learning models. Results We included 266,327 patients with more than 7 million laboratory results available for analysis. Models possessed high diagnostic performance with AUROCs of 0.94 [0.94-0.95], 0.98 [0.97-0.98], 0.88 [0.87-0.89] and 0.90 [0.89-0.91] for Maastricht, Amersfoort, Sittard and Heerlen, respectively. Using the SHAP algorithm, we visualized patient characteristics and laboratory results that drive patient-specific RISKINDEX predictions. As an illustrative example, we applied our models in a triage system for risk stratification that categorized 94.7% of the patients as low risk with a corresponding NPV of ≥99%. Discussion The developed machine learning models are transparent with excellent diagnostic performance in predicting 31-day mortality in ED patients across four hospitals. Follow-up studies will assess whether implementation of these algorithms can improve clinically relevant endpoints.
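Two quantities in this abstract, the Brier score and the negative predictive value (NPV) of a low-risk triage category, are simple to compute once predicted mortality probabilities exist. The sketch below is illustrative only: the labels, probabilities, and the 0.1 threshold are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def brier_score(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def low_risk_npv(labels: Sequence[int], probs: Sequence[float],
                 threshold: float) -> Tuple[float, float]:
    """Return (fraction of patients flagged low risk, NPV among them).

    A patient is flagged low risk when the predicted probability is
    below the threshold; NPV is the share of those who had no event.
    """
    low = [y for y, p in zip(labels, probs) if p < threshold]
    if not low:
        raise ValueError("no patient falls below the threshold")
    fraction = len(low) / len(labels)
    npv = sum(1 for y in low if y == 0) / len(low)
    return fraction, npv


# Hypothetical toy data: 1 = died within the follow-up window, 0 = survived.
labels = [0, 0, 0, 1, 0, 1, 0, 0]
probs = [0.02, 0.05, 0.01, 0.90, 0.20, 0.04, 0.03, 0.60]

print("Brier score:", round(brier_score(labels, probs), 4))
fraction, npv = low_risk_npv(labels, probs, threshold=0.1)
print("low-risk fraction:", fraction, "NPV:", npv)
```

Lowering the threshold generally raises the NPV but shrinks the low-risk group, so a triage rule has to balance the two.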


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Moojung Kim ◽  
Young Jae Kim ◽  
Sung Jin Park ◽  
Kwang Gi Kim ◽  
Pyung Chun Oh ◽  
...  

Abstract Background Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially in the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study is to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination. Methods Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" to whether they had received seasonal influenza vaccinations in the past 12 months. The classification process was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups. Results The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared; for the ≥ 65 age group, XGB (84.7%) and RF (84.7%) have the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM has the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%). Conclusions The machine learning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.


2021 ◽  
Vol 12 (02) ◽  
pp. 372-382
Author(s):  
Christine Xia Wu ◽  
Ernest Suresh ◽  
Francis Wei Loong Phng ◽  
Kai Pik Tai ◽  
Janthorn Pakdeethai ◽  
...  

Abstract Objective To develop a risk score for the real-time prediction of readmissions for patients using patient specific information captured in electronic medical records (EMR) in Singapore to enable the prospective identification of high-risk patients for enrolment in timely interventions. Methods Machine-learning models were built to estimate the probability of a patient being readmitted within 30 days of discharge. EMR of 25,472 patients discharged from the medicine department at Ng Teng Fong General Hospital between January 2016 and December 2016 were extracted retrospectively for training and internal validation of the models. We developed and implemented a real-time 30-day readmission risk score generation in the EMR system, which enabled the flagging of high-risk patients to care providers in the hospital. Based on the daily high-risk patient list, the various interfaces and flow sheets in the EMR were configured according to the information needs of the various stakeholders such as the inpatient medical, nursing, case management, emergency department, and postdischarge care teams. Results Overall, the machine-learning models achieved good performance with area under the receiver operating characteristic ranging from 0.77 to 0.81. The models were used to proactively identify and attend to patients who are at risk of readmission before an actual readmission occurs. This approach successfully reduced the 30-day readmission rate for patients admitted to the medicine department from 11.7% in 2017 to 10.1% in 2019 (p < 0.01) after risk adjustment. Conclusion Machine-learning models can be deployed in the EMR system to provide real-time forecasts for a more comprehensive outlook in the aspects of decision-making and care provision.


2016 ◽  
Vol 23 (2) ◽  
pp. 124 ◽  
Author(s):  
Douglas Detoni ◽  
Cristian Cechinel ◽  
Ricardo Araujo Matsumura ◽  
Daniela Francisco Brauner

Student dropout is one of the main problems faced by distance learning courses. One of the major challenges for researchers is to develop methods to predict the behavior of students so that teachers and tutors are able to identify at-risk students as early as possible and provide assistance before they drop out or fail in their courses. Machine Learning models have been used to predict or classify students in these settings. However, while these models have shown promising results in several settings, they usually attain these results using attributes that are not immediately transferable to other courses or platforms. In this paper, we provide a methodology to classify students using only interaction counts from each student. We evaluate this methodology on a data set from two majors based on the Moodle platform. We run experiments consisting of training and evaluating three machine learning models (Support Vector Machines, Naive Bayes and Adaboost decision trees) under different scenarios. We provide evidence that patterns from interaction counts can provide useful information for classifying at-risk students. This classification allows the customization of the activities presented to at-risk students (automatically or through tutors) in an attempt to prevent student dropout.


PLoS ONE ◽  
2020 ◽  
Vol 15 (1) ◽  
pp. e0227419 ◽  
Author(s):  
Varvara Turova ◽  
Irina Sidorenko ◽  
Laura Eckardt ◽  
Esther Rieger-Fackeldey ◽  
Ursula Felderhoff-Müser ◽  
...  

2021 ◽  
Author(s):  
Bruno Barbosa Miranda de Paiva ◽  
Polianna Delfino Pereira ◽  
Claudio Moises Valiense de Andrade ◽  
Virginia Mara Reis Gomes ◽  
Maria Clara Pontello Barbosa Lima ◽  
...  

Objective: To provide a thorough comparative study among state-of-the-art machine learning methods and statistical methods for determining in-hospital mortality in COVID-19 patients using data upon hospital admission; to study the reliability of the predictions of the most effective methods by correlating the probability of the outcome and the accuracy of the methods; to investigate how explainable the predictions produced by the most effective methods are. Materials and Methods: De-identified data were obtained from COVID-19 positive patients in 36 participating hospitals, from March 1 to September 30, 2020. Demographic, comorbidity, clinical presentation and laboratory data were used as training data to develop COVID-19 mortality prediction models. Multiple machine learning and traditional statistics models were trained on this prediction task using a folded cross validation procedure, from which we assessed performance and interpretability metrics. Results: The Stacking of machine learning models improved over the previous state-of-the-art results by more than 26% in predicting the class of interest (death), achieving an AUROC of 87.1% and a macro-F1 of 73.9%. We also show that some machine learning models can be very interpretable and reliable, yielding more accurate predictions while providing a good explanation of why a prediction was made. Conclusion: The best results were obtained using the meta-learning ensemble model Stacking. State-of-the-art explainability techniques such as SHAP values can be used to draw useful insights into the patterns learned by machine-learning algorithms. Machine learning models can be more explainable than traditional statistics models while also yielding highly reliable predictions. Key words: COVID-19; prognosis; prediction model; machine learning
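The macro-F1 figure reported above averages the per-class F1 scores with equal weight, so the minority class (death) counts as much as the majority class. A minimal sketch of that calculation follows; the labels and predictions are hypothetical toy values, not data from the study.

```python
from typing import Hashable, Sequence


def f1_for_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                 cls: Hashable) -> float:
    """F1 score treating `cls` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    # Convention used here: F1 is 0 when the class never appears.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical toy data: 1 = died in hospital, 0 = survived.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]

print("F1 (death):", f1_for_class(y_true, y_pred, 1))
print("F1 (survival):", f1_for_class(y_true, y_pred, 0))
print("macro-F1:", macro_f1(y_true, y_pred))
```

Because the classes are weighted equally, a model that performs well only on survivors will still show a low macro-F1.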


The study aims to predict rainfall with machine learning models using a minimal set of features. The prediction is based on temperature, vapour pressure and relative humidity; numerous earlier studies used more features than this. A training-test split of 75-25 was used. The best results were obtained by combining the best of the candidate models into an ensemble model. In that ensemble, the predictor importance of vapour pressure was 0.89 and that of relative humidity was 0.11, while temperature was not seen as a significant predictor of rainfall. However, the high correlation of temperature (°C) with vapour pressure (Torr) and relative humidity (percentage) suggests that these two predictor variables subsume the impact of temperature.
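A 75-25 training-test split like the one described can be sketched as follows. The abstract does not say whether the split was random or chronological; this sketch assumes a seeded random split, and the rows (temperature, vapour pressure, relative humidity, rainfall flag) are hypothetical values for illustration only.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


def train_test_split(rows: Sequence[Row], test_fraction: float = 0.25,
                     seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Randomly hold out `test_fraction` of the rows for testing."""
    rng = random.Random(seed)  # seeded so the split is reproducible
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx = set(indices[:n_test])
    train = [rows[i] for i in range(len(rows)) if i not in test_idx]
    test = [rows[i] for i in range(len(rows)) if i in test_idx]
    return train, test


# Hypothetical rows: (temperature in °C, vapour pressure in Torr,
# relative humidity in %, rainfall observed 0/1).
rows = [
    (24.1, 18.2, 81.0, 1),
    (30.5, 14.0, 43.0, 0),
    (22.8, 17.5, 85.0, 1),
    (28.9, 12.3, 41.0, 0),
    (26.0, 19.1, 76.0, 1),
    (31.2, 13.4, 39.0, 0),
    (23.5, 16.8, 79.0, 1),
    (29.7, 11.9, 38.0, 0),
]

train_rows, test_rows = train_test_split(rows, test_fraction=0.25, seed=0)
print(len(train_rows), "training rows,", len(test_rows), "test rows")
```

For weather data ordered in time, a chronological split may be more appropriate than a random one, since neighbouring days tend to be correlated.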

