Understanding Early Childhood Obesity via Interpretation of Machine Learning Model Predictions

Author(s):  
Xueqin Pang
Christopher B. Forrest
Felice Le-Scherban
Aaron J. Masino
2016
Vol 07 (03)
pp. 693-706
Author(s):  
Cassandra Brady
Bahram Namjou
Stephanie Kennebeck
Jonathan Bickel
Nandan Patibandla
...  

Summary
The objective of this study is to develop an algorithm to accurately identify children with severe early-onset childhood obesity (ages 1–5.99 years) using structured and unstructured data from the electronic health record (EHR).

Childhood obesity increases risk factors for cardiovascular morbidity and vascular disease. Accurately defining a high-precision phenotype with a standardized tool is critical to the success of large-scale genomic studies and to validating rare monogenic variants that cause severe early-onset obesity.

Rule-based and machine learning-based algorithms were developed using structured and unstructured data from two EHR databases, at Boston Children’s Hospital (BCH) and Cincinnati Children’s Hospital and Medical Center (CCHMC). Exclusion criteria, such as medications or comorbid diagnoses, were defined. Machine learning algorithms were developed using cross-site training and testing, and natural language processing features were also explored.

Precision was emphasized to obtain a high-fidelity cohort. The rule-based algorithm performed the best overall, 0.895 (CCHMC) and 0.770 (BCH). The best feature set for machine learning employed Unified Medical Language System (UMLS) concept unique identifiers (CUIs), ICD-9 codes, and RxNorm codes.

Detecting severe early childhood obesity is essential for intervening in children at the highest long-term risk of developing obesity-related comorbidities. Excluding patients with underlying pathological and non-syndromic causes of obesity helps build a high-precision cohort for genetic study. Further, such phenotyping efforts inform future practical applications in health care environments that use clinical decision support.

Citation: Lingren T, Thaker V, Brady C, Namjou B, Kennebeck S, Bickel J, Patibandla N, Ni Y, Van Driest SL, Chen L, Roach A, Cobb B, Kirby J, Denny J, Bailey-Davis L, Williams MS, Marsolo K, Solti I, Holm IA, Harley J, Kohane IS, Savova G, Crimmins N.
Developing an algorithm to detect early childhood obesity in two tertiary pediatric medical centers.
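The rule-based approach summarized above (inclusion rules, exclusion criteria, and evaluation by precision) can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Patient` fields, the precomputed `severe_obesity_flag`, the codes in `EXCLUSION_CODES`, and the chart-review labels are placeholders, not the study's actual criteria or data. Only the 1–5.99-year age window comes from the summary.

```python
from dataclasses import dataclass, field

@dataclass
class Patient:
    # Hypothetical, simplified EHR-derived record (not the study's data model).
    patient_id: str
    age_years: float
    severe_obesity_flag: bool  # assumed precomputed upstream, e.g. from growth measurements
    codes: frozenset = field(default_factory=frozenset)  # diagnosis/medication codes on file

# Hypothetical exclusion list standing in for "medications or comorbid diagnoses".
EXCLUSION_CODES = frozenset({"EXCL_MED_A", "EXCL_DX_B"})

def rule_based_case(p: Patient) -> bool:
    """Flag a case only if every inclusion rule holds and no exclusion code is present."""
    in_age_window = 1.0 <= p.age_years < 6.0  # ages 1-5.99 years
    excluded = bool(p.codes & EXCLUSION_CODES)
    return in_age_window and p.severe_obesity_flag and not excluded

def precision(patients: list, gold: dict) -> float:
    """Share of flagged patients confirmed as true cases by (hypothetical) chart review."""
    flagged = [p.patient_id for p in patients if rule_based_case(p)]
    if not flagged:
        return 0.0
    true_positives = sum(1 for pid in flagged if gold.get(pid, False))
    return true_positives / len(flagged)
```

For example, with four toy patients where one is excluded by a code, one is outside the age window, and only one of the two remaining flagged patients is confirmed on review, `precision` returns 0.5. Tightening the rules (more exclusions) tends to trade recall for precision, which matches the study's stated emphasis on a high-fidelity cohort.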


2015
Vol 06 (03)
pp. 506-520
Author(s):  
S. Mukhopadhyay
A. Carroll
S. Downs
T. M. Dugan

Summary
Objectives: This paper aims to predict childhood obesity after age two using only data collected before the second birthday by a clinical decision support system called CHICA.
Methods: Six machine learning methods (RandomTree, RandomForest, J48, ID3, Naïve Bayes, and Bayes) were trained on CHICA data; the analyses show that an accurate, sensitive model can be created.
Results: Of the methods analyzed, the ID3 model trained on the CHICA dataset had the best overall performance, with an accuracy of 85% and a sensitivity of 89%. Additionally, the ID3 model had a positive predictive value of 84% and a negative predictive value of 88%. The structure of the tree also gives insight into the strongest predictors of future obesity in children. Many of the strongest predictors in the ID3 model of the CHICA dataset have been independently validated in the literature as correlated with obesity, which supports the validity of the model.
Conclusions: This study demonstrated that data from a production clinical decision support system can be used to build an accurate machine learning model to predict obesity in children after age two.
Citation: Dugan TM, Mukhopadhyay S, Carroll AE, Downs SM. Machine learning techniques for prediction of early childhood obesity. Appl Clin Inform 2015; 6: 506–520. http://dx.doi.org/10.4338/ACI-2015-03-RA-0036
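ID3 grows a decision tree by choosing, at each node, the feature with the largest information gain, which is why the tree's upper splits point to the strongest predictors. The Python sketch below shows that split criterion alongside the four reported metrics (accuracy, sensitivity, PPV, NPV) computed from confusion-matrix counts. It is a textbook illustration, not the authors' implementation; the feature names and counts used with it are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction from splitting on one categorical feature (ID3's criterion)."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_split(rows, labels, features):
    """ID3 splits on the feature with the highest information gain."""
    return max(features, key=lambda f: information_gain(rows, labels, f))

def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, PPV and NPV from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # share of true obesity cases detected
        "ppv": tp / (tp + fp),          # share of predicted cases that are real
        "npv": tn / (tn + fn),          # share of predicted non-cases that are real
    }
```

On a toy dataset where a hypothetical `feature_a` separates the classes perfectly and `feature_b` is uninformative, `best_split` picks `feature_a` (gain 1.0 bit versus 0.0). Note that sensitivity and PPV answer different questions: a model can catch most future cases (high sensitivity) while still flagging many children who do not become obese (lower PPV).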


2018
Author(s):  
Steen Lysgaard
Paul C. Jennings
Jens Strabo Hummelshøj
Thomas Bligaard
Tejs Vegge

A machine learning model is used as a surrogate fitness evaluator in a genetic algorithm (GA) optimization of the atomic distribution of Pt-Au nanoparticles. The machine learning accelerated genetic algorithm (MLaGA) yields a 50-fold reduction in the number of required energy calculations compared with a traditional GA.
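The surrogate idea can be sketched on a toy problem: many offspring are screened with a cheap model, and only the most promising few receive the expensive energy calculation. In this minimal Python sketch the "nanoparticle" is a hypothetical 12-site ring (0 = Pt, 1 = Au), `expensive_energy` is a made-up stand-in for a costly energy calculation, and the surrogate is a k-nearest-neighbour regressor over Hamming distance, chosen for brevity. None of these reflect the paper's actual model, energy method, or GA settings.

```python
import random

N_SITES = 12  # hypothetical toy particle: a ring of 12 sites, 0 = Pt, 1 = Au

def expensive_energy(s, counter):
    """Toy stand-in for a costly energy calculation; counter[0] tracks the calls."""
    counter[0] += 1
    # Unlike neighbours lower the energy, like neighbours raise it; each Au adds 0.1.
    mixing = sum(-1.0 if s[i] != s[(i + 1) % len(s)] else 0.5 for i in range(len(s)))
    return mixing + 0.1 * sum(s)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def surrogate_predict(s, data, k=3):
    """Cheap k-NN regression over already-evaluated (structure, energy) pairs."""
    nearest = sorted(data, key=lambda item: hamming(s, item[0]))[:k]
    return sum(e for _, e in nearest) / len(nearest)

def offspring(parents, rng):
    """One-point crossover of two random parents, then a single bit-flip mutation."""
    a, b = rng.sample(parents, 2)
    cut = rng.randrange(1, N_SITES)
    child = list(a[:cut] + b[cut:])
    i = rng.randrange(N_SITES)
    child[i] = 1 - child[i]
    return tuple(child)

def mlaga(generations=20, pop_size=8, n_candidates=40, evals_per_gen=2, seed=0):
    rng = random.Random(seed)
    counter = [0]
    data = []  # every expensively evaluated structure, used to train the surrogate
    for _ in range(pop_size):
        s = tuple(rng.randint(0, 1) for _ in range(N_SITES))
        data.append((s, expensive_energy(s, counter)))
    population = sorted(data, key=lambda item: item[1])
    for _ in range(generations):
        parents = [s for s, _ in population]
        seen = {s for s, _ in data}
        candidates = {offspring(parents, rng) for _ in range(n_candidates)} - seen
        # Rank unevaluated candidates with the surrogate: no expensive calls here.
        ranked = sorted(candidates, key=lambda s: surrogate_predict(s, data))
        for s in ranked[:evals_per_gen]:
            data.append((s, expensive_energy(s, counter)))
        population = sorted(data, key=lambda item: item[1])[:pop_size]
    best_s, best_e = population[0]
    return best_s, best_e, counter[0]
```

With the defaults, a run makes at most 8 + 20 × 2 = 48 expensive calls, whereas evaluating every screened candidate would cost up to 8 + 20 × 40. That ratio follows only from these toy settings and says nothing about the 50-fold figure reported above.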

