Evaluation of Machine Learning Methods Developed for Prediction of Diabetes Complications: A Systematic Review

2021 ◽  
pp. 193229682110569
Author(s):  
Kuo Ren Tan ◽  
Jun Jie Benjamin Seng ◽  
Yu Heng Kwan ◽  
Ying Jie Chen ◽  
Sueziani Binte Zainudin ◽  
...  

Background: With the rising prevalence of diabetes, machine learning (ML) models have been increasingly used for prediction of diabetes and its complications, due to their ability to handle large complex data sets. This study aims to evaluate the quality and performance of ML models developed to predict microvascular and macrovascular diabetes complications in an adult Type 2 diabetes population. Methods: A systematic review was conducted in MEDLINE®, Embase®, the Cochrane® Library, Web of Science®, and DBLP Computer Science Bibliography databases according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) checklist. Studies that developed or validated ML prediction models for microvascular or macrovascular complications in people with Type 2 diabetes were included. Prediction performance was evaluated using area under the receiver operating characteristic curve (AUC). An AUC >0.75 indicates clearly useful discrimination performance, while a positive mean relative AUC difference indicates better comparative model performance. Results: Of 13 606 articles screened, 32 studies comprising 87 ML models were included. Neural networks (n = 15) were the most frequently utilized. Age, duration of diabetes, and body mass index were common predictors in ML models. Across predicted outcomes, 36% of the models demonstrated clearly useful discrimination. Most ML models reported positive mean relative AUC compared with non-ML methods, with random forest showing the best overall performance for microvascular and macrovascular outcomes. The majority (n = 31) of studies had a high risk of bias. Conclusions: Random forest was found to have the overall best prediction performance. Current ML prediction models remain largely exploratory, and external validation studies are required before their clinical implementation. Protocol Registration: Open Science Framework (registration number: 10.17605/OSF.IO/UP49X).
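
The two performance criteria named in this abstract (AUC above 0.75, and a positive relative AUC difference between an ML model and a non-ML comparator) can be illustrated with a small sketch. The scores and labels below are hypothetical toy data, and the relative difference is assumed to be (AUC_ML - AUC_reference) / AUC_reference; the abstract does not state the exact formula used by the review.

```python
def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def relative_auc_difference(auc_ml, auc_ref):
    """Assumed definition: difference expressed relative to the reference AUC."""
    return (auc_ml - auc_ref) / auc_ref


# Hypothetical toy data: 1 = complication occurred, 0 = it did not.
labels = [0, 0, 1, 1, 0, 1]
ml_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]   # hypothetical ML model output
ref_scores = [0.3, 0.6, 0.4, 0.7, 0.2, 0.5]   # hypothetical non-ML model output

auc_ml = auc(labels, ml_scores)
auc_ref = auc(labels, ref_scores)
rel_diff = relative_auc_difference(auc_ml, auc_ref)

clearly_useful = auc_ml > 0.75   # threshold quoted in the abstract
ml_is_better = rel_diff > 0      # positive difference favours the ML model
```

The pairwise comparison is quadratic in the number of cases, which is fine for illustration; a rank-based computation would be preferable for large cohorts.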

2021 ◽  
Vol 7 ◽  
pp. 205520762110473
Author(s):  
Kushan De Silva ◽  
Joanne Enticott ◽  
Christopher Barton ◽  
Andrew Forbes ◽  
Sajal Saha ◽  
...  

Objective Machine learning involves the use of algorithms without explicit instructions. Of late, machine learning models have been widely applied for the prediction of type 2 diabetes. However, no evidence synthesis of the performance of these prediction models of type 2 diabetes is available. We aim to identify machine learning prediction models for type 2 diabetes in clinical and community care settings and determine their predictive performance. Methods A systematic review of English-language machine learning predictive modeling studies in 12 databases will be conducted. Studies predicting type 2 diabetes in predefined clinical or community settings are eligible. Standard CHARMS and TRIPOD guidelines will guide data extraction. Methodological quality will be assessed using a predefined risk of bias assessment tool. The extent of validation will be categorized by Reilly–Evans levels. Primary outcomes include model performance metrics of discrimination ability, calibration, and classification accuracy. Secondary outcomes include candidate predictors, algorithms used, level of validation, and intended use of models. A random-effects meta-analysis of c-indices will be performed to evaluate discrimination abilities. The c-indices will be pooled per prediction model, per model type, and per algorithm. Publication bias will be assessed through funnel plots and regression tests. Sensitivity analysis will be conducted to estimate the effects of study quality and missing data on the primary outcomes. The sources of heterogeneity will be assessed through meta-regression. Subgroup analyses will be performed for primary outcomes. Ethics and dissemination No ethics approval is required, as no primary or personal data are collected. Findings will be disseminated through scientific sessions and peer-reviewed journals. PROSPERO registration number CRD42019130886
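
The random-effects pooling of c-indices described in this protocol can be sketched as follows. The protocol does not name an estimator, so the DerSimonian-Laird method is assumed here, and the c-indices are pooled on their raw scale for simplicity (a logit transformation is a common alternative). The function name and the input values are hypothetical.

```python
import math


def pool_random_effects(estimates, std_errors):
    """Pool study-level estimates with a DerSimonian-Laird random-effects
    model (an assumed choice; the protocol does not specify the estimator).
    Returns (pooled estimate, its standard error, between-study variance)."""
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two studies with matching inputs")

    # Fixed-effect (inverse-variance) weights and pooled value.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q and the method-of-moments between-study variance tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_w_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum_w_re
    se_pooled = math.sqrt(1.0 / sum_w_re)
    return pooled, se_pooled, tau2


# Hypothetical c-indices and standard errors from two studies.
pooled, se_pooled, tau2 = pool_random_effects([0.70, 0.80], [0.01, 0.01])
ci_low = pooled - 1.96 * se_pooled
ci_high = pooled + 1.96 * se_pooled
```

When the studies agree closely, the estimated between-study variance shrinks to zero and the result coincides with a fixed-effect pooled value.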


2019 ◽  
Author(s):  
Lei Zhang ◽  
Xianwen Shang ◽  
Subhashaan Sreedharan ◽  
Xixi Yan ◽  
Jianbin Liu ◽  
...  

BACKGROUND Previous conventional models for the prediction of diabetes could be updated by incorporating the increasing amount of health data available and new risk prediction methodology. OBJECTIVE We aimed to develop a substantially improved diabetes risk prediction model using sophisticated machine-learning algorithms based on a large retrospective population cohort of over 230,000 people who were enrolled in the study during 2006-2017. METHODS We collected demographic, medical, behavioral, and incidence data for type 2 diabetes mellitus (T2DM) in over 236,684 diabetes-free participants recruited from the 45 and Up Study. We predicted and compared the risk of diabetes onset in these participants at 3, 5, 7, and 10 years based on three machine-learning approaches and the conventional regression model. RESULTS Overall, 6.05% (14,313/236,684) of the participants developed T2DM during an average 8.8-year follow-up period. The 10-year diabetes incidence in men was 8.30% (8.08%-8.49%), which was significantly higher (odds ratio 1.37, 95% CI 1.32-1.41) than that in women at 6.20% (6.00%-6.40%). The incidence of T2DM was doubled in individuals with obesity (men: 17.78% [17.05%-18.43%]; women: 14.59% [13.99%-15.17%]) compared with that of nonobese individuals. The gradient boosting machine model showed the best performance among the four models (area under the curve of 79% in 3-year prediction and 75% in 10-year prediction). All machine-learning models predicted BMI as the most significant factor contributing to diabetes onset, which explained 12%-50% of the variance in the prediction of diabetes. The model predicted that if BMI in obese and overweight participants could be hypothetically reduced to a healthy range, the 10-year probability of diabetes onset would be significantly reduced from 8.3% to 2.8% (P<.001). CONCLUSIONS A one-time self-reported survey can accurately predict the risk of diabetes using a machine-learning approach. Achieving a healthy BMI can significantly reduce the risk of developing T2DM.


10.2196/16850 ◽  
2020 ◽  
Vol 8 (7) ◽  
pp. e16850 ◽  
Author(s):  
Lei Zhang ◽  
Xianwen Shang ◽  
Subhashaan Sreedharan ◽  
Xixi Yan ◽  
Jianbin Liu ◽  
...  

Background Previous conventional models for the prediction of diabetes could be updated by incorporating the increasing amount of health data available and new risk prediction methodology. Objective We aimed to develop a substantially improved diabetes risk prediction model using sophisticated machine-learning algorithms based on a large retrospective population cohort of over 230,000 people who were enrolled in the study during 2006-2017. Methods We collected demographic, medical, behavioral, and incidence data for type 2 diabetes mellitus (T2DM) in over 236,684 diabetes-free participants recruited from the 45 and Up Study. We predicted and compared the risk of diabetes onset in these participants at 3, 5, 7, and 10 years based on three machine-learning approaches and the conventional regression model. Results Overall, 6.05% (14,313/236,684) of the participants developed T2DM during an average 8.8-year follow-up period. The 10-year diabetes incidence in men was 8.30% (8.08%-8.49%), which was significantly higher (odds ratio 1.37, 95% CI 1.32-1.41) than that in women at 6.20% (6.00%-6.40%). The incidence of T2DM was doubled in individuals with obesity (men: 17.78% [17.05%-18.43%]; women: 14.59% [13.99%-15.17%]) compared with that of nonobese individuals. The gradient boosting machine model showed the best performance among the four models (area under the curve of 79% in 3-year prediction and 75% in 10-year prediction). All machine-learning models predicted BMI as the most significant factor contributing to diabetes onset, which explained 12%-50% of the variance in the prediction of diabetes. The model predicted that if BMI in obese and overweight participants could be hypothetically reduced to a healthy range, the 10-year probability of diabetes onset would be significantly reduced from 8.3% to 2.8% (P<.001). Conclusions A one-time self-reported survey can accurately predict the risk of diabetes using a machine-learning approach. Achieving a healthy BMI can significantly reduce the risk of developing T2DM.


2020 ◽  
Vol 46 (2) ◽  
pp. 89-99 ◽  
Author(s):  
S. Tatulashvili ◽  
G. Fagherazzi ◽  
C. Dow ◽  
R. Cohen ◽  
S. Fosse ◽  
...  

Author(s):  
Henock M. Deberneh ◽  
Intaek Kim

Prediction of type 2 diabetes (T2D) occurrence allows a person at risk to take actions that can prevent onset or delay the progression of the disease. In this study, we developed a machine learning (ML) model to predict T2D occurrence in the following year (Y + 1) using variables in the current year (Y). The dataset for this study was collected at a private medical institute as electronic health records from 2013 to 2018. To construct the prediction model, key features were first selected using ANOVA tests, chi-squared tests, and recursive feature elimination methods. The resultant features were fasting plasma glucose (FPG), HbA1c, triglycerides, BMI, gamma-GTP, age, uric acid, sex, smoking, drinking, physical activity, and family history. We then employed logistic regression, random forest, support vector machine, XGBoost, and ensemble machine learning algorithms based on these variables to predict the outcome as normal (non-diabetic), prediabetes, or diabetes. Based on the experimental results, the performance of the prediction model proved to be reasonably good at forecasting the occurrence of T2D in the Korean population. The model can provide clinicians and patients with valuable predictive information on the likelihood of developing T2D. The cross-validation (CV) results showed that the ensemble models had a superior performance to that of the single models. The CV performance of the prediction models was improved by incorporating more medical history from the dataset.
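
One way single models can be combined into an ensemble for the three-class outcome described above (normal, prediabetes, diabetes) is soft voting, sketched below. The abstract does not say which ensemble scheme the authors used, so soft voting is an assumption made purely for illustration, and the per-model probabilities are hypothetical.

```python
CLASSES = ("normal", "prediabetes", "diabetes")


def soft_vote(prob_vectors):
    """Average the class-probability vectors produced by several models
    for one patient and return (predicted class, averaged probabilities)."""
    n_models = len(prob_vectors)
    if n_models == 0:
        raise ValueError("need at least one model output")
    for p in prob_vectors:
        if len(p) != len(CLASSES):
            raise ValueError("each vector needs one probability per class")
    avg = [sum(p[i] for p in prob_vectors) / n_models
           for i in range(len(CLASSES))]
    best = max(range(len(CLASSES)), key=lambda i: avg[i])
    return CLASSES[best], avg


# Hypothetical outputs of three single models for the same patient,
# ordered as (normal, prediabetes, diabetes).
model_outputs = [
    [0.5, 0.4, 0.1],   # e.g. a logistic regression
    [0.2, 0.6, 0.2],   # e.g. a random forest
    [0.3, 0.5, 0.2],   # e.g. a gradient-boosted model
]
predicted_class, averaged = soft_vote(model_outputs)
```

In this toy case the first model alone would predict "normal", while the averaged probabilities favour "prediabetes", which shows how an ensemble can override a single model's vote.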


2021 ◽  
Author(s):  
M.S Roobini ◽  
M Lakshmi

Abstract Severe cases of type 2 diabetes are increasing tremendously in day-to-day life. Therefore, proper assessment of the disease is critical to society. Many prediction models help identify type 2 diabetes, but their performance varies with the measure used. Various algorithms, such as Decision Tree, Logistic Regression, KNN, and Random Forest, have been applied to identify type 2 diabetes. This paper implements type 2 diabetes classification with the AdaBoost algorithm, an ensemble approach. The proposed methodology uses this ensemble machine learning approach to achieve better efficiency than existing algorithms for the classification of type 2 diabetes. Compared with the other algorithms, the ensemble approach shows an efficiency of 83%. The accuracy is calculated based on various performance measures.


Heart ◽  
2011 ◽  
Vol 98 (5) ◽  
pp. 360-369 ◽  
Author(s):  
S van Dieren ◽  
J W J Beulens ◽  
A P Kengne ◽  
L M Peelen ◽  
G E H M Rutten ◽  
...  
