A combined strategy of feature selection and machine learning to identify predictors of prediabetes

2019 ◽  
Vol 27 (3) ◽  
pp. 396-406 ◽  
Author(s):  
Kushan De Silva ◽  
Daniel Jönsson ◽  
Ryan T Demmer

Abstract

Objective: To identify predictors of prediabetes using feature selection and machine learning on a nationally representative sample of the US population.

Materials and Methods: We analyzed n = 6346 men and women enrolled in the National Health and Nutrition Examination Survey 2013–2014. Prediabetes was defined using American Diabetes Association guidelines. The sample was randomly partitioned into training (n = 3174) and internal validation (n = 3172) sets. Feature selection algorithms were run on training data containing 156 preselected exposure variables. Four machine learning algorithms were applied to 46 exposure variables in original and resampled training datasets built using 4 resampling methods. Predictive models were tested on internal validation data (n = 3172) and external validation data (n = 3000) prepared from National Health and Nutrition Examination Survey 2011–2012. Model performance was evaluated using area under the receiver operating characteristic curve (AUROC). Predictors were assessed by odds ratios in logistic models and variable importance in others. The Centers for Disease Control (CDC) prediabetes screening tool was the benchmark to compare model performance.

Results: Prediabetes prevalence was 23.43%. The CDC prediabetes screening tool produced 64.40% AUROC. Seven optimal (≥ 70% AUROC) models identified 25 predictors including 4 potentially novel associations; 20 by both logistic and other nonlinear/ensemble models and 5 solely by the latter. All optimal models outperformed the CDC prediabetes screening tool (P < 0.05).

Discussion: Combined use of feature selection and machine learning increased predictive performance, outperforming the recommended screening tool. A range of predictors of prediabetes was identified.

Conclusion: This work demonstrated the value of combining feature selection with machine learning to identify a wide range of predictors that could enhance prediabetes prediction and clinical decision-making.
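As a rough illustration of the evaluation metric named in this abstract, the sketch below computes AUROC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auroc` and all data values are hypothetical and chosen only for illustration; none of them come from the study, and this is not the authors' pipeline.

```python
def auroc(labels, scores):
    """Pairwise AUROC: share of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = prediabetes, 0 = no prediabetes.
labels = [0, 0, 1, 1, 0, 1]
# Hypothetical predicted probabilities from a fitted model.
model_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
# Hypothetical integer points from a questionnaire-style benchmark tool.
benchmark_scores = [1, 3, 2, 3, 2, 2]

print(f"model AUROC:     {auroc(labels, model_scores):.3f}")
print(f"benchmark AUROC: {auroc(labels, benchmark_scores):.3f}")
```

The pairwise form is quadratic in sample size, which is fine for a toy example; for samples of the size described above, a rank-based implementation from a statistics library would be the more practical choice.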

Biosensors ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 228
Author(s):  
Min-Jeong Kim

Smartwatches have the potential to support health care in everyday life by enabling self-monitoring of health conditions and personal activities. This paper aims to develop a model that predicts the prevalence of cardiovascular disease using health-related data that can be easily measured by smartwatch users. To this end, the data corresponding to the health-related variables provided by the smartwatch are selected from the Korea National Health and Nutrition Examination Survey. To classify the prevalence of cardiovascular disease with these selected variables, we apply three machine learning classification techniques (logistic regression, artificial neural network, and support vector machine) and compare their suitability using classification performance indicators. The prediction model using support vector machine showed the highest accuracy. Next, we analyze which structures or parameters of the support vector machine contribute to increasing accuracy and derive the importance of input variables. Because early and accurate diagnosis of cardiovascular disease is very important, we expect this model to be useful as a tool for predicting whether cardiovascular disease will develop.


Healthcare ◽  
2021 ◽  
Vol 9 (9) ◽  
pp. 1138
Author(s):  
Eunhee Cho ◽  
Deulle Min ◽  
Hye Sun Lee

Approximately half of the population worldwide suffers from under/undiagnosed diabetes. In South Korea, 27.7% of people aged over 30 years have type 2 diabetes and are unaware of their condition because they have not been diagnosed. Optimal tools for identifying risk factors of undiagnosed diabetes, which is associated with multiple complications, are currently lacking. Secondary data analysis was conducted using the 2010–2016 Korean National Health and Nutrition Examination Survey. This study aimed to identify the risk factors in individuals not diagnosed with type 2 diabetes, using glycated hemoglobin (HbA1c) as the diagnostic standard. Furthermore, we aimed to develop an accurate screening tool for diabetes using HbA1c values by analyzing the data of 12,843 adults (aged ≥ 20 years) not diagnosed with type 2 diabetes. Age, gender, family history of diabetes, hypertension diagnosis, waist-to-height ratio, smoking, and health check-ups were identified as significant risk factors for undiagnosed type 2 diabetes. A screening tool with a total score of 13 points and a cutoff score of 7 points was developed; it had a sensitivity of 82.7% and a specificity of 58.2%. The developed screening tool appears to be a simple and cost-effective method for detecting undiagnosed type 2 diabetes.
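To make the cutoff-based screening idea concrete, the following minimal sketch shows how sensitivity and specificity are computed for a point score once a cutoff is fixed. It assumes that a score at or above the cutoff counts as a positive screen, which the abstract does not state explicitly. The function name, the scores, and the diagnoses are hypothetical illustrations, not data or item weights from the study.

```python
def sensitivity_specificity(scores, has_diabetes, cutoff=7):
    """Return (sensitivity, specificity) for a point-score screening tool.

    Assumption: a score >= cutoff is treated as a positive screen.
    """
    tp = fn = tn = fp = 0
    for score, diseased in zip(scores, has_diabetes):
        flagged = score >= cutoff
        if diseased and flagged:
            tp += 1  # true positive: diseased and flagged
        elif diseased:
            fn += 1  # false negative: diseased but missed
        elif flagged:
            fp += 1  # false positive: healthy but flagged
        else:
            tn += 1  # true negative: healthy and not flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both diseased and non-diseased cases")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores on a 0-13 point scale and HbA1c-based diagnoses.
scores = [3, 8, 7, 5, 10, 6, 9, 2]
has_diabetes = [False, True, True, False, True, True, False, False]

sens, spec = sensitivity_specificity(scores, has_diabetes, cutoff=7)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Lowering the cutoff flags more people, which typically raises sensitivity at the expense of specificity; a screening tool that favors sensitivity, as the reported 82.7% versus 58.2% suggests, accepts more false positives in order to miss fewer cases.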


2021 ◽  
Vol 3 ◽  
Author(s):  
Gregory M. Ellis ◽  
Pamela E. Souza

Even before the COVID-19 pandemic, there was mounting interest in remote testing solutions for audiology. The ultimate goal of such work was to improve access to hearing healthcare for individuals who might be unable or reluctant to seek audiological help in a clinic. In 2015, Diane Van Tasell patented a method for measuring an audiogram when the precise signal level was unknown (patent US 8,968,209 B2). In this method, the slope between pure-tone thresholds measured at 2 and 4 kHz is calculated and combined with questionnaire information in order to reconstruct the most likely audiograms from a database of options. An approach like the Van Tasell method is desirable because it is quick and feasible to do in a patient's home where exact stimulus levels are unknown. The goal of the present study was to use machine learning to assess the effectiveness of such audiogram-estimation methods. The National Health and Nutrition Examination Survey (NHANES), a database of audiologic and demographic information, was used to train and test several machine learning algorithms. Overall, 9,256 cases were analyzed. Audiometric data were classified using the Wisconsin Age-Related Hearing Impairment Classification Scale (WARHICS), a method that places hearing loss into one of eight categories. Of the algorithms tested, a random forest machine learning algorithm provided the best fit with only a few variables: the slope between 2 and 4 kHz; gender; age; military experience; and self-reported hearing ability. Using this method, 54.79% of the individuals were correctly classified, 34.40% were predicted to have a milder loss than measured, and 10.82% were predicted to have a more severe loss than measured. Although accuracy was low, it is unlikely audibility would be severely affected if classifications were used to apply gains. Based on audibility calculations, underamplification still provided sufficient gain to achieve ~95% correct (Speech Intelligibility Index ≥ 0.45) for sentence materials for 88% of individuals. Fewer than 1% of individuals were overamplified by 10 dB for any audiometric frequency. Given these results, this method presents a promising direction toward remote assessment; however, further refinement is needed before use in clinical fittings.
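The two quantities this abstract relies on, the 2-to-4 kHz threshold slope and the split of predictions into correct, milder, and more severe, can be sketched as follows. This is an illustrative sketch only. It assumes the slope is the simple difference between the two thresholds (2 to 4 kHz spans one octave, so this equals dB per octave) and that hearing-loss categories are coded as integers in which a larger number means a more severe loss. Neither convention, nor the function names and data, is taken from the study or the patent.

```python
def slope_2k_4k(threshold_2k_db, threshold_4k_db):
    """Threshold change from 2 kHz to 4 kHz in dB (one octave apart).

    Assumption: slope is defined as the 4 kHz threshold minus the 2 kHz one.
    """
    return threshold_4k_db - threshold_2k_db


def tally_outcomes(measured, predicted):
    """Fractions of cases classified correctly, as milder, or as more severe.

    Assumption: categories are integers where larger means more severe.
    """
    if not measured or len(measured) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    counts = {"correct": 0, "milder": 0, "more_severe": 0}
    for m, p in zip(measured, predicted):
        if p == m:
            counts["correct"] += 1
        elif p < m:
            counts["milder"] += 1  # predicted loss is milder than measured
        else:
            counts["more_severe"] += 1  # predicted loss is more severe
    return {name: n / len(measured) for name, n in counts.items()}


# Hypothetical thresholds (dB HL) and hypothetical category codes.
print(slope_2k_4k(25, 55))
measured = [1, 2, 3, 4, 2, 5, 3, 1]
predicted = [1, 1, 3, 3, 2, 5, 4, 1]
print(tally_outcomes(measured, predicted))
```

Separating the two error directions matters here because they have different consequences when classifications drive gain: a milder prediction leads to underamplification, while a more severe prediction leads to overamplification.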

