A Framework for Type-II Diabetes Prediction Using Machine Learning Approaches

Author(s): Mohammed Faim Uddin Bhuiyan, Md. Tanzim Rahman, Mehfuj Ahmed Anik, Musharrat Khan
2020, Vol 2020, pp. 1-12
Author(s): Mingyue Xue, Yinxia Su, Chen Li, Shuxia Wang, Hua Yao

Background. An estimated 425 million people globally have diabetes, accounting for 12% of the world’s health expenditures, and the number continues to grow, placing a huge burden on healthcare systems, especially in remote, underserved areas. Methods. A total of 584,168 adult subjects who had participated in the national physical examination were enrolled in this study. The risk factors for type II diabetes mellitus (T2DM) were identified by p values and odds ratios, using logistic regression (LR) on variables from physical measurements and a questionnaire. Using the risk factors selected by LR, we applied a decision tree, a random forest, AdaBoost with a decision tree (AdaBoost), and an extreme gradient boosting decision tree (XGBoost) to identify individuals with T2DM, compared the performance of the four machine learning classifiers, and used the best-performing classifier to output variable importance scores for T2DM. Results. XGBoost had the best performance (accuracy=0.906, precision=0.910, recall=0.902, F1=0.906, and AUC=0.968). The variable importance scores from XGBoost showed that BMI was the most significant feature, followed by age, waist circumference, systolic pressure, ethnicity, smoking amount, fatty liver, hypertension, physical activity, drinking status, dietary ratio (meat to vegetables), drink amount, smoking status, and diet habit (oil loving). Conclusions. We proposed an LR-XGBoost classifier that uses fourteen easily obtained, noninvasive patient variables as predictors to identify potential cases of T2DM. The classifier can accurately screen for the risk of diabetes in the early phase, and the variable importance scores offer clues for preventing the onset of diabetes.
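
Because F1 is the harmonic mean of precision and recall, the reported XGBoost figures can be checked for internal consistency. The following is a minimal sketch in Python; the function names and the confusion-matrix counts are illustrative assumptions and do not come from the study.

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1_from(precision, recall)


# Precision and recall as reported in the abstract for XGBoost.
reported_precision = 0.910
reported_recall = 0.902
print(round(f1_from(reported_precision, reported_recall), 3))  # 0.906

# Hypothetical counts, used only to show the calculation.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=10)
print(p, r, f1)
```

The recomputed F1 matches the reported value of 0.906 to three decimal places.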


Various computational techniques and tools are available for data analysis. In this work, we take advantage of those available technologies to improve the effectiveness of a prediction model for Type-2 diabetic patients. We aim to investigate how diabetes incidence is affected by patients' characteristics and measurements. An effective prediction model is required for medical researchers. Until recently, Type II diabetes was considered rare in children. The disease is, however, emerging among young people in populations with high rates of Type II diabetes in adults. This work presents the effectiveness of the Gradient Boosted Classifier, which has not been explored in previous works. It is compared with two machine learning algorithms, namely Neural Networks and Random Forest. These algorithms are applied to the Pima Indians Diabetes Database (PIDD), which is sourced from the UCI Machine Learning Repository. The resulting models are evaluated with standard metrics such as AUC, recall, and accuracy. As the results show, the Gradient Boosted Classifier outperforms the other two classifiers on all performance measures.
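
The three evaluation metrics named above can be computed directly from a classifier's predicted scores. The sketch below is an assumption-laden illustration, not the authors' evaluation code: the labels and scores are toy values rather than PIDD data, the 0.5 threshold is arbitrary, and AUC is computed with the pairwise-ranking definition (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one).

```python
def auc(labels, scores):
    """AUC via pairwise comparison of positives and negatives; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def recall_and_accuracy(labels, scores, threshold=0.5):
    """Recall and accuracy after thresholding the scores (threshold is an assumption)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    correct = sum(1 for y, p in zip(labels, preds) if y == p)
    return tp / (tp + fn), correct / len(labels)


# Toy labels (1 = diabetic) and predicted scores, for illustration only.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]

print(auc(labels, scores))
print(recall_and_accuracy(labels, scores))
```

Unlike recall and accuracy, AUC does not depend on a decision threshold, which is one reason studies commonly report it alongside the other two.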


Diabetes is among the most common chronic diseases in the world. Early prediction of the disease helps physicians provide improved treatment. Machine learning approaches are widely used to predict the disease at an early stage. However, selecting significant features and a suitable classifier remains difficult, and poor choices reduce diagnostic accuracy. In this paper, PCA-based feature transformation and a hybrid random forest classifier are used for diabetes prediction. PCA attempts to identify the best subset of transformed components, which greatly improves the classification result. The system is compared with prior machine learning approaches to evaluate its efficiency. The experimental results show that the present study improves prediction accuracy.
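
To make the feature-transformation step concrete, the sketch below projects centered data onto its first principal component. It is a minimal illustration under stated assumptions: the data are toy values, the helper names are hypothetical, the component is found with plain power iteration rather than any particular library routine, and the hybrid random forest described in the abstract is not shown.

```python
import math


def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iterations=200):
    """Leading eigenvector of the covariance matrix via power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d  # arbitrary unit-length starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(centered, v):
    """Coordinate of each centered row along direction v."""
    return [sum(a * b for a, b in zip(r, v)) for r in centered]


# Toy data with two strongly correlated features, for illustration only.
rows = [[2.0, 1.9], [0.0, 0.1], [1.0, 1.1], [3.0, 2.9]]
centered = center(rows)
v = first_component(covariance(centered))
print(v)
print(project(centered, v))
```

The projected values would then serve as the transformed input to a downstream classifier; retaining more components follows the same pattern with further eigenvectors.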

