Predicting the Development of Type 2 Diabetes in a Large Australian Cohort Using Machine-Learning Techniques: Longitudinal Survey Study

10.2196/16850 ◽  
2020 ◽  
Vol 8 (7) ◽  
pp. e16850 ◽  
Author(s):  
Lei Zhang ◽  
Xianwen Shang ◽  
Subhashaan Sreedharan ◽  
Xixi Yan ◽  
Jianbin Liu ◽  
...  

Background Previous conventional models for the prediction of diabetes could be updated by incorporating the increasing amount of health data available and new risk prediction methodology. Objective We aimed to develop a substantially improved diabetes risk prediction model using sophisticated machine-learning algorithms based on a large retrospective population cohort of over 230,000 people who were enrolled in the study during 2006-2017. Methods We collected demographic, medical, behavioral, and incidence data for type 2 diabetes mellitus (T2DM) in 236,684 diabetes-free participants recruited from the 45 and Up Study. We predicted and compared the risk of diabetes onset in these participants at 3, 5, 7, and 10 years based on three machine-learning approaches and the conventional regression model. Results Overall, 6.05% (14,313/236,684) of the participants developed T2DM during an average 8.8-year follow-up period. The 10-year diabetes incidence in men was 8.30% (8.08%-8.49%), which was significantly higher (odds ratio 1.37, 95% CI 1.32-1.41) than that in women at 6.20% (6.00%-6.40%). The incidence of T2DM was doubled in individuals with obesity (men: 17.78% [17.05%-18.43%]; women: 14.59% [13.99%-15.17%]) compared with nonobese individuals. The gradient boosting machine model showed the best performance among the four models (area under the curve of 79% in 3-year prediction and 75% in 10-year prediction). All machine-learning models identified BMI as the most significant factor contributing to diabetes onset, which explained 12%-50% of the variance in the prediction of diabetes. The model predicted that if BMI in obese and overweight participants could be hypothetically reduced to a healthy range, the 10-year probability of diabetes onset would be significantly reduced from 8.3% to 2.8% (P<.001). Conclusions A one-time self-reported survey can accurately predict the risk of diabetes using a machine-learning approach. Achieving a healthy BMI can significantly reduce the risk of developing T2DM.
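The pipeline below is a minimal sketch of the kind of model this abstract describes: a gradient boosting classifier trained on one-time survey features, evaluated by the area under the ROC curve, with feature importances used to gauge each variable's contribution. It is written against scikit-learn; the file name, column names, and hyperparameters are illustrative assumptions, not details of the 45 and Up Study analysis.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical export of a one-time self-reported survey; categorical columns
# are assumed to be numerically encoded already.
df = pd.read_csv("survey.csv")
features = ["bmi", "age", "sex", "smoking", "physical_activity"]
X, y = df[features], df["t2dm_within_10y"]      # 1 = developed T2DM within 10 years

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Gradient boosting machine, the best-performing model family in the abstract.
gbm = GradientBoostingClassifier(n_estimators=300, learning_rate=0.05)
gbm.fit(X_train, y_train)

auc = roc_auc_score(y_test, gbm.predict_proba(X_test)[:, 1])
print(f"10-year AUC: {auc:.2f}")

# Feature importances indicate each variable's contribution; the paper reports
# BMI as the dominant predictor.
print(dict(zip(features, gbm.feature_importances_.round(3))))
```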


Diabetes ◽  
2020 ◽  
Vol 69 (Supplement 1) ◽  
pp. 1460-P
Author(s):  
Lauren E. Wedekind ◽  
Sayuko Kobes ◽  
Wen-Chi Hsueh ◽  
Leslie Baier ◽  
William C. Knowler ◽  
...  

2020 ◽  
Author(s):  
Xueyan Li ◽  
Genshan Ma ◽  
Xiaobo Qian ◽  
Yamou Wu ◽  
Xiaochen Huang ◽  
...  

Abstract Background: We aimed to assess the performance of machine learning algorithms for the prediction of risk factors of postoperative ileus (POI) in patients who underwent laparoscopic colorectal surgery for malignant lesions. Methods: We conducted a retrospective observational study of 637 patients at Suzhou Hospital of Nanjing Medical University. Four machine learning algorithms (logistic regression, decision tree, random forest, and gradient boosting decision tree) were used to predict risk factors of POI. The cases were randomly divided into training and testing data sets at a ratio of 8:2. The performance of each model was evaluated by the area under the receiver operating characteristic curve (AUC), precision, recall, and F1-score. Results: The incidence of POI in this study was 19.15% (122/637). The gradient boosting decision tree reached the highest AUC (0.76) and was the best model for POI risk prediction. In addition, the feature-importance matrix of the gradient boosting decision tree showed that the five most important variables were time to first passage of flatus, opioid use during postoperative day 3, duration of surgery, height, and weight. Conclusions: The gradient boosting decision tree was the optimal model for predicting the risk of POI in patients who underwent laparoscopic colorectal surgery for malignant lesions, and the results of our study could be useful for clinical guidance on POI risk prediction.
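The sketch below illustrates the evaluation protocol described above: an 8:2 train/test split and the four candidate models compared by AUC, precision, recall, and F1-score. It uses scikit-learn; the file name, outcome column, and hyperparameters are hypothetical placeholders rather than the study's actual data or code.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("poi.csv")                      # hypothetical file name
X, y = df.drop(columns=["poi"]), df["poi"]       # 1 = postoperative ileus

# 8:2 train/test split, as in the study design.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(max_depth=4),
    "random forest": RandomForestClassifier(n_estimators=300),
    "gradient boosting": GradientBoostingClassifier(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    pred = model.predict(X_test)
    print(f"{name}: AUC={roc_auc_score(y_test, proba):.2f} "
          f"precision={precision_score(y_test, pred):.2f} "
          f"recall={recall_score(y_test, pred):.2f} "
          f"F1={f1_score(y_test, pred):.2f}")
```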


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Muhammad Muneeb ◽  
Andreas Henschel

Abstract Background Genotype–phenotype prediction is of great importance in genetics. Such predictions can help to find the genetic mutations that cause variation in human beings. The many approaches for finding these associations can be broadly categorized into two classes: statistical techniques and machine learning. Statistical techniques are well suited to finding the actual SNPs that cause variation, whereas machine learning techniques are better suited to classifying people into different categories. In this article, we examined the eye-color and type 2 diabetes phenotypes. The proposed technique is a hybrid approach, consisting partly of statistical techniques and partly of machine learning. Results The main dataset for the eye-color phenotype consists of 806 people: 404 with blue-green eyes and 402 with brown eyes. After preprocessing, we generated 8 different datasets containing different numbers of SNPs, using the mutation difference and thresholding at individual SNPs. We calculated three types of mutation at each SNP: no mutation, partial mutation, and full mutation. The data were then transformed for the machine learning algorithms. We used nine classifiers (random forest, extreme gradient boosting, ANN, LSTM, GRU, BiLSTM, 1D-CNN, ensembles of ANNs, and ensembles of LSTMs), which gave accuracies of 0.91, 0.9286, 0.945, 0.94, 0.94, 0.92, 0.95, and 0.96, respectively. Stacked ensembles of LSTMs outperformed the other algorithms for 1560 SNPs, with an overall accuracy of 0.96, an AUC of 0.98 for brown eyes, and an AUC of 0.97 for blue-green eyes. The main dataset for type 2 diabetes consists of 107 people, of whom 30 are classified as cases and 74 as controls. We used different linear thresholds to find the optimal number of SNPs for classification. The final model gave an accuracy of 0.97. Conclusion Genotype–phenotype predictions are very useful, especially in forensics. These predictions can help to identify SNP variants associated with traits and diseases. Given more data, the accuracy of machine learning model predictions can be increased. Moreover, the non-linearity of the machine learning models and the combination of SNP mutations during training improve the predictions. We considered binary classification problems, but the proposed approach can be extended to multi-class classification.
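A rough sketch of the encoding idea described above (mapping each SNP genotype to no, partial, or full mutation before classification) follows. The genotype file, the SNP identifiers, and their reference alleles are illustrative assumptions, and a random forest stands in for the larger set of classifiers the authors evaluated.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def encode_snp(genotype: str, ref_allele: str) -> int:
    """Count non-reference alleles: 0 = no, 1 = partial, 2 = full mutation."""
    return sum(allele != ref_allele for allele in genotype)

# Hypothetical genotype table: rows are people, columns are SNP genotype
# strings (e.g. "AG") plus an eye-color label.
df = pd.read_csv("genotypes.csv")
ref_alleles = {"rs12913832": "A", "rs1800407": "C"}   # illustrative SNPs/alleles

X = np.column_stack([
    df[snp].apply(encode_snp, ref_allele=allele)
    for snp, allele in ref_alleles.items()
])
y = df["eye_color"]                                    # "brown" vs "blue-green"

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean().round(2))
```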


Author(s):  
Henock M. Deberneh ◽  
Intaek Kim

Prediction of type 2 diabetes (T2D) occurrence allows a person at risk to take actions that can prevent onset or delay the progression of the disease. In this study, we developed a machine learning (ML) model to predict T2D occurrence in the following year (Y + 1) using variables in the current year (Y). The dataset for this study was collected at a private medical institute as electronic health records from 2013 to 2018. To construct the prediction model, key features were first selected using ANOVA tests, chi-squared tests, and recursive feature elimination methods. The resultant features were fasting plasma glucose (FPG), HbA1c, triglycerides, BMI, gamma-GTP, age, uric acid, sex, smoking, drinking, physical activity, and family history. We then employed logistic regression, random forest, support vector machine, XGBoost, and ensemble machine learning algorithms based on these variables to predict the outcome as normal (non-diabetic), prediabetes, or diabetes. Based on the experimental results, the performance of the prediction model proved to be reasonably good at forecasting the occurrence of T2D in the Korean population. The model can provide clinicians and patients with valuable predictive information on the likelihood of developing T2D. The cross-validation (CV) results showed that the ensemble models had a superior performance to that of the single models. The CV performance of the prediction models was improved by incorporating more medical history from the dataset.
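A minimal sketch of the two-stage approach described above, univariate feature selection followed by an ensemble that predicts normal, prediabetes, or diabetes, is given below. The file name, label column, choice of base learners, and use of the ANOVA F-test alone (rather than the full ANOVA/chi-squared/RFE combination) are simplifying assumptions, not the authors' exact pipeline.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical yearly EHR extract: numeric year-Y features and the year-Y+1
# label ("normal", "prediabetes", "diabetes").
df = pd.read_csv("ehr.csv")
X = df.drop(columns=["status_next_year"])
y = df["status_next_year"]

# Soft-voting ensemble over a few base learners (illustrative choice).
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=300, random_state=0)),
        ("svm", SVC(probability=True)),
    ],
    voting="soft",
)

# ANOVA F-test keeps the 12 strongest features, mirroring the 12 variables
# listed in the abstract, before the ensemble is fit.
pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=12), ensemble)
print("5-fold CV accuracy:", cross_val_score(pipe, X, y, cv=5).mean().round(2))
```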


Author(s):  
Sushma Jaiswal ◽  
Tarun Jaiswal

Introduction: The development of an effective diabetes diagnosis system through advances in computational intelligence is currently regarded as a key objective. Numerous approaches based on artificial neural networks and machine-learning procedures have been developed and validated against diabetes datasets, typically involving individuals of Pima Indian heritage. Nevertheless, despite reported accuracies of up to 99-100% in predicting the correct diabetes diagnosis, none of these methods has reached clinical practice so far. Various tools such as machine learning (ML) and data mining are used for the correct identification of diabetes, and these tools improve the diagnosis process associated with T2DM. Type 2 diabetes mellitus (T2DM) is a major problem in several developing countries, but its early diagnosis enables better treatment and can save many lives. Accordingly, a system that diagnoses type 2 diabetes is needed. In this paper, a fuzzy expert system based on the Mamdani fuzzy inference system (MFIS) is proposed to diagnose type 2 diabetes effectively. To evaluate the proposed system, a comparative study was carried out against machine learning algorithms, specifically the J48 decision tree (DT), multilayer perceptron (MLP), support vector machine (SVM), Naïve Bayes (NB), and fusion and mixed-fusion-based methods. The proposed fuzzy expert system (FES) and the machine learning algorithms are validated with real data from the UCI machine learning datasets. Furthermore, the performance of the fuzzy expert system is assessed by comparing it to related work that used the MFIS to detect the presence of type 2 diabetes. Objective: This survey paper presents a review of recent advances in machine-learning-based classification models for the diagnosis of diabetes. Methods: This paper presents an extensive review of machine-learning-based classification models for the diagnosis of type 2 diabetes, in which modified fusions of machine learning methods are compared with the basic models, i.e., radial basis function, K-nearest neighbor, support vector machine, J48, logistic regression, and classification and regression trees, based on training and testing performance. Results: Figures 3 and 4 summarize the prediction accuracy of each classifier on the training and testing sets. Conclusion: The fuzzy expert system is the best among its rival classifiers; SVM performs very poorly, with a very low true positive rate, i.e., a very high number of positive cases misclassified as negative (non-diabetic). Based on the evaluation, it is clear that the fuzzy expert system has the highest precision value. However, J48 is the least accurate classifier; it has the highest number of false positives relative to the other classifiers on the testing set. The results show that the fuzzy expert system has the highest values for both precision and recall, and thus the highest F-measure on both the training and testing datasets. J48 is considered the second-best classifier for the training dataset, whereas Naïve Bayes ranks second on the testing dataset.
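As a rough companion to the comparison this review describes, the sketch below trains a few of the baseline classifiers (MLP, SVM, and Naïve Bayes) and reports precision, recall, and F1 on a held-out test set. It does not implement the Mamdani fuzzy expert system itself, and the dataset file and column names are hypothetical placeholders.

```python
import pandas as pd
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical CSV with Pima-style numeric features and a binary outcome
# column (1 = diabetic).
df = pd.read_csv("pima_diabetes.csv")
X, y = df.drop(columns=["outcome"]), df["outcome"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

classifiers = {
    "MLP": make_pipeline(StandardScaler(), MLPClassifier(max_iter=2000, random_state=0)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Naive Bayes": GaussianNB(),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    # classification_report prints precision, recall, and F1 per class.
    print(name)
    print(classification_report(y_test, clf.predict(X_test)))
```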

