Supervised Machine Learning based Ensemble Model for Accurate Prediction of Type 2 Diabetes

Author(s):  
Ramya Akula ◽  
Ni Nguyen ◽  
Ivan Garibay
2021 ◽  
Vol 33 (2) ◽  
pp. 93-114
Author(s):  
Mallika G.C. ◽  
Abeer Alsadoon ◽  
Duong Thu Hang Pham ◽  
Salma Hameedi Abdullah ◽  
Ha Thi Mai ◽  
...  

Type 2 Diabetes (T2DM) makes up about 90% of diabetes cases, and the strict demands of continuous monitoring and detection have become one of the key aspects of T2DM. This research aims to develop an ensemble of several machine learning and deep learning models for early detection of T2DM with high accuracy. Because its member models are highly diverse, the ensemble is expected to perform better than any single model. Methodology: The proposed system is a modified, enhanced ensemble of machine learning models for T2DM prediction. It combines Logistic Regression, Random Forest, SVM and Deep Neural Network models into a modified ensemble model. Results: The output of each model in the modified ensemble is used to determine the final output of the system. The datasets used for these models are Practice Fusion EHR, the Pima Indians diabetes data, the UCI AIM94 dataset and CA Diabetes Prevalence 2014. Compared with previous solutions, the proposed ensemble model improves accuracy, sensitivity, and specificity. On average, it raises accuracy from 83.51% to 87.5%, sensitivity from 29.59% to 35.8%, and specificity from 96.27% to 98.9%. The processing time of the proposed solution, 96.6 ms, is slightly shorter than the 97.5 ms of the state of the art. Conclusion: The proposed modified, enhanced system improves the overall prediction capability for T2DM by using an ensemble of several machine learning and deep learning models. A majority voting scheme combines the outputs of the individual models to make the final prediction. The regularization function is modified to include the regularization of all models in the ensemble, which helps prevent overfitting and improves the generalization capacity of the proposed system.
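The majority voting step and the three reported metrics can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy labels, and the rule that a tie among the four models counts as a positive prediction are all assumptions made for illustration, since the abstract does not say how ties are resolved.

```python
from typing import Dict, List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine binary predictions from several models by majority vote.

    predictions[m][i] is the label (0 or 1) that model m gives sample i.
    Assumption: a tie (possible with an even number of models) is
    resolved toward the positive class.
    """
    n_models = len(predictions)
    n_samples = len(predictions[0])
    combined = []
    for i in range(n_samples):
        positives = sum(model[i] for model in predictions)
        combined.append(1 if 2 * positives >= n_models else 0)
    return combined


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, sensitivity (recall on positives) and specificity."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Toy data (hypothetical): outputs of four models on eight samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
model_outputs = [
    [1, 0, 0, 0, 0, 0, 1, 0],  # stands in for logistic regression
    [1, 1, 0, 0, 0, 0, 0, 0],  # stands in for random forest
    [1, 0, 0, 0, 0, 0, 0, 0],  # stands in for SVM
    [0, 1, 1, 0, 1, 0, 0, 0],  # stands in for the deep neural network
]
ensemble = majority_vote(model_outputs)
print(ensemble)
print(confusion_metrics(y_true, ensemble))
```

On imbalanced data such as this toy set, accuracy and specificity can be high while sensitivity stays low, which is the same pattern as the figures reported above.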


2020 ◽  
Author(s):  
Benjamin Lam ◽  
Michael Catt ◽  
Sophie Cassidy ◽  
Jaume Bacardit ◽  
Philip Darke ◽  
...  

BACKGROUND Between 2013 and 2015, the UK Biobank collected accelerometer traces using wrist-worn triaxial accelerometers for 103,712 volunteers aged between 40 and 69, for one week each. This dataset has been used in the past to verify that individuals with chronic diseases exhibit reduced activity levels compared to healthy populations. Yet, the dataset is likely to be noisy, as the devices were allocated to participants without a specific set of inclusion criteria, and the traces reflect uncontrolled free-living conditions. OBJECTIVE To determine the extent to which accelerometer traces can be used to distinguish individuals with Type-2 Diabetes (T2D) from normoglycaemic controls, and to quantify their limitations. METHODS Supervised machine learning classifiers were trained using different sets of features, to separate T2D-positive individuals from normoglycaemic individuals. Multiple criteria, based on a combination of self-assessment UK Biobank variables and primary care health records linked to the participants in UK Biobank, were used to identify 3,103 individuals in this population who have T2D. The remaining 19,852 non-diabetic participants were further scored on their physical activity impairment severity levels based on other conditions found in their primary care data, and those likely to have been physically impaired at the time were excluded. Physical activity features were first extracted from the raw accelerometer traces dataset for each participant, using an algorithm that extends the previously developed Biobank Accelerometry Analysis toolkit from Oxford University [1]. These features were complemented by a selected collection of socio-demographic and lifestyle features available from UK Biobank. RESULTS Three types of classifiers were tested, with AUC close to [0.86; 95% CI: .85-.87] for all three, and F1 scores in the range [.80,.82] for T2D positives and [.73,.74] for controls.
Results obtained using non-physically impaired controls were compared to those obtained using highly physically impaired controls, to test the hypothesis that non-diabetes conditions reduce classifier performance. Models built using a training set that includes highly impaired controls with other conditions had worse performance: AUC [.75-.77; 95% CI: .74-.78] and F1 in the range [.76-.77] (positives) and [.63,.65] (controls). CONCLUSIONS Granular measures of free-living physical activity can be used to successfully train machine learning models that are able to discriminate between T2D and normoglycaemic controls, albeit with limitations due to the intrinsic noise in the datasets. From a broader clinical perspective, these findings motivate further research into the use of physical activity traces as a means to screen individuals at risk of diabetes and for early detection, in conjunction with routinely used risk scores, provided that appropriate quality control is enforced on the data collection protocol in order to improve the signal-to-noise ratio.
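The two kinds of figures reported above, AUC and per-class F1, can be computed as in the following minimal sketch. It is not the study's evaluation code: the function names, the toy scores, and the 0.5 decision threshold are assumptions for illustration, and the confidence intervals quoted in the abstract are not reproduced here because the abstract does not say how they were obtained.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Tied scores count as half a win (the usual rank-based definition).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> float:
    """F1 score treating `label` as the class of interest."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (hypothetical): 1 = T2D positive, 0 = normoglycaemic control.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(roc_auc(y_true, scores))
print(f1_for_class(y_true, y_pred, label=1))  # F1 for T2D positives
print(f1_for_class(y_true, y_pred, label=0))  # F1 for controls
```

Reporting F1 separately for positives and controls, as the abstract does, shows when a classifier is noticeably weaker on one of the two groups, which a single pooled score would hide.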

