Development of Prediction Models Using Machine Learning Algorithms for Girls with Suspected Central Precocious Puberty: Retrospective Study (Preprint)

2018 ◽  
Author(s):  
Liyan Pan ◽  
Guangjian Liu ◽  
Xiaojian Mao ◽  
Huixian Li ◽  
Jiexin Zhang ◽  
...  

BACKGROUND Central precocious puberty (CPP) in girls seriously affects their physical and mental development in childhood. The standard diagnostic method, the gonadotropin-releasing hormone (GnRH)–stimulation test or GnRH analogue (GnRHa)–stimulation test, is expensive and uncomfortable for patients because it requires repeated blood sampling. OBJECTIVE We aimed to combine multiple CPP-related features and construct machine learning models to predict response to the GnRHa-stimulation test. METHODS In this retrospective study, we analyzed clinical and laboratory data of 1757 girls who underwent a GnRHa test in order to develop XGBoost and random forest classifiers for prediction of response to the GnRHa test. The local interpretable model-agnostic explanations (LIME) algorithm was used with the black-box classifiers to increase their interpretability. We measured the sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) of the models. RESULTS Both the XGBoost and random forest models achieved good performance in distinguishing between positive and negative responses, with the AUC ranging from 0.88 to 0.90, sensitivity ranging from 77.91% to 77.94%, and specificity ranging from 84.32% to 87.66%. Basal serum luteinizing hormone, follicle-stimulating hormone, and insulin-like growth factor-I levels were found to be the three most important factors. In the interpretable models of LIME, the abovementioned variables made high contributions to the prediction probability. CONCLUSIONS The prediction models we developed can help diagnose CPP and may be used as a prescreening tool before the GnRHa-stimulation test.
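The three metrics named in this abstract can be sketched in a few lines. The following is a minimal illustration only, not the authors' code: the labels and scores are made up, the 0.5 decision threshold is an assumption, and the function names are hypothetical. AUC is computed here from its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one).

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = positive test response, 0 = negative response.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
model_auc = auc(y_true, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={model_auc:.4f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why studies of this kind typically report all three.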

Author(s):  
You-Hyun Park ◽  
Sung-Hwa Kim ◽  
Yoon-Young Choi

In this study, we developed machine learning-based prediction models for early childhood caries (ECC) and compared their performance with that of a traditional regression model. We analyzed the data of 4195 children aged 1–5 years from the Korea National Health and Nutrition Examination Survey (2007–2018). We developed prediction models using the XGBoost (version 1.3.1), random forest, and LightGBM (version 3.1.1) algorithms in addition to logistic regression. Two methods were applied for variable selection: regression-based backward elimination and random forest-based permutation importance. We compared the area under the receiver operating characteristic curve (AUROC) values and misclassification rates of the different models and observed that all four prediction models had AUROC values between 0.774 and 0.785, with no significant difference between them. Based on these results, both traditional logistic regression and ML-based models can show favorable performance and can be used to predict ECC, identify ECC high-risk groups, and implement active preventive treatments. However, further research is needed to improve the performance of the prediction models using recent methods, such as deep learning.


JAMIA Open ◽  
2020 ◽  
Author(s):  
Liyan Pan ◽  
Guangjian Liu ◽  
Xiaojian Mao ◽  
Huiying Liang

Abstract Objective The study aimed to develop simplified diagnostic models for identifying girls with central precocious puberty (CPP), without the expensive and cumbersome gonadotropin-releasing hormone (GnRH) stimulation test, which is the gold standard for CPP diagnosis. Materials and methods Female patients who had secondary sexual characteristics before 8 years old and had taken a GnRH analog (GnRHa) stimulation test at a medical center in Guangzhou, China were enrolled. Data from clinical visits, laboratory tests, and medical imaging examinations were collected. We first extracted features from unstructured data such as clinical reports and medical images. Then, models based on single-source or multisource data were developed with an Extreme Gradient Boosting (XGBoost) classifier to classify patients as CPP or non-CPP. Results The best performance, an area under the curve (AUC) of 0.88 and a Youden index of 0.64, was achieved by the model based on multisource data. The performance of single-source models based on data from basal laboratory tests and the feature importance of each variable showed that the basal hormone test had the highest diagnostic value for a CPP diagnosis. Conclusion We developed three simplified models that use easily accessed clinical data before the GnRH stimulation test to identify girls who are at high risk of CPP. These models are tailored to the needs of patients in different clinical settings. Machine learning technologies and multisource data fusion can help to make a better diagnosis than traditional methods.
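For readers unfamiliar with the Youden index reported above, its standard definition is given below. The numeric example is purely illustrative: the abstract does not state the sensitivity and specificity behind the reported 0.64, so the pair shown is a hypothetical combination that would produce it.

```latex
J = \text{sensitivity} + \text{specificity} - 1 = \mathrm{TPR} - \mathrm{FPR},
\qquad -1 \le J \le 1

% Hypothetical example (values not taken from the study):
% sensitivity = 0.80, specificity = 0.84
J = 0.80 + 0.84 - 1 = 0.64
```

A value of 0 corresponds to a test no better than chance and 1 to a perfect test.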


2021 ◽  
Vol 10 (5) ◽  
pp. 972
Author(s):  
Jeeyae Choi ◽  
Hee-Tae Jung ◽  
Anastasiya Ferrell ◽  
Seoyoon Woo ◽  
Linda Haddad

Despite the harmful effect on health, e-cigarette and hookah smoking in youth in the U.S. has increased. Developing tailored e-cigarette and hookah cessation programs for youth is imperative. The aim of this study was to identify predictor variables such as social, mental, and environmental determinants that cause nicotine addiction in youth e-cigarette or hookah users and build nicotine addiction prediction models using machine learning algorithms. A total of 6511 participants were identified as ever having used e-cigarettes or hookah from the National Youth Tobacco Survey (2019) datasets. Prediction models were built by Random Forest with ReliefF and Least Absolute Shrinkage and Selection Operator (LASSO). ReliefF identified important predictor variables, and the Davies–Bouldin clustering evaluation index selected the optimal number of predictors for Random Forest. A total of 193 predictor variables were included in the final analysis. Performance of prediction models was measured by Root Mean Square Error (RMSE) and Confusion Matrix. The results suggested high performance of prediction. Identified predictor variables were aligned with previous research. The novel predictors found, such as ‘witnessed e-cigarette use in their household’ and ‘perception of their tobacco use’, could be used in public awareness or targeted e-cigarette and hookah youth education and by policymakers.
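As a reminder of what the two performance measures named above compute, here is a minimal sketch. It is not the study's code: the outcome coding (1 = addicted, 0 = not addicted), the data, and the function names are assumptions made for illustration.

```python
import math
from collections import Counter


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def confusion_matrix(actual, predicted):
    """Count each (actual, predicted) pair; keys are tuples such as (1, 0)."""
    return Counter(zip(actual, predicted))


# Hypothetical labels: 1 = addicted, 0 = not addicted.
actual = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]

cm = confusion_matrix(actual, predicted)
error = rmse(actual, predicted)
print("true positives:", cm[(1, 1)], "false negatives:", cm[(1, 0)])
print("false positives:", cm[(0, 1)], "true negatives:", cm[(0, 0)])
print(f"RMSE: {error:.4f}")
```

For 0/1 labels the squared error of each case is either 0 or 1, so RMSE reduces to the square root of the misclassification rate; it is more informative when the model outputs probabilities or a graded score.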


2019 ◽  
Author(s):  
Ruilin Li ◽  
Xinyin Han ◽  
Liping Sun ◽  
Yannan Feng ◽  
Xiaolin Sun ◽  
...  

Abstract Precisely predicting the required pre-surgery blood volume (PBV) in surgical patients is a formidable challenge in China. Inaccurate estimation is associated with excessive costs, postponed surgeries, and adverse outcomes after surgery due to insufficient supply or inventory. This study aimed to predict the required PBV using machine learning techniques. A total of 181,027 medical documents spanning 6 years were cleaned, yielding 92,057 blood transfusion records. The blood transfusion and surgery-related factors of perioperative patients, surgeons' experience volumes, and the actual volumes of transfused red blood cells (RBCs) were extracted. Six machine learning algorithms were used to build prediction models. Surgical patients who received allogenic RBCs or no transfusion, had a total volume of less than 10 units, or had their latest pre-surgery laboratory examinations within 7 days were included, providing 118,823 data points. Thirty-nine predictive factors related to RBC transfusion were identified. With 90% of the data used for training, a random forest model was selected to predict the required PBV of RBCs; it achieved 72.9% accuracy, improving accuracy by 30.4% compared with surgeons' experience. We tested and demonstrated that both the data-driven models and the random forest model achieved higher accuracy than surgeons' experience. Furthermore, we developed a computational tool, PTRBC, to precisely estimate the required PBV in surgical patients, and we believe this tool will find more applications in assisting clinician decisions, not only in accurate prediction of pre-surgery blood requirements.


2021 ◽  
Vol 12 ◽  
Author(s):  
Ruixue Cao ◽  
Jinrong Liu ◽  
Pinguo Fu ◽  
Yonghai Zhou ◽  
Zhe Li ◽  
...  

Objective The present study aimed to assess the diagnostic utility of the luteinizing hormone (LH) levels and single 60-minute post gonadotropin-releasing hormone (GnRH) agonist stimulation test for idiopathic central precocious puberty (CPP) in girls. Methods Data from 1,492 girls diagnosed with precocious puberty who underwent GnRH agonist stimulation testing between January 1, 2016, and October 8, 2020, were retrospectively reviewed. LH levels and LH/follicle-stimulating hormone (FSH) ratios were measured by immuno-chemiluminescence assay before and at several timepoints after GnRH analogue stimulation testing. Mann–Whitney U test, Spearman’s correlation, χ2 test, and receiver operating characteristic (ROC) analyses were performed to determine the diagnostic utility of these hormone levels. Results The 1,492 subjects were split into two groups: an idiopathic CPP group (n = 518) and a non-CPP group (n = 974). Basal LH levels and LH/FSH ratios were significantly different between the two groups at 30, 60, 90, and 120 minutes after GnRH analogue stimulation testing. Spearman’s correlation analysis showed the strongest correlation between peak LH and LH levels at 60 minutes after GnRH agonist stimulation (r = 0.986, P < 0.001). ROC curve analysis revealed that the 60-minute LH/FSH ratio yielded the highest consistency, with an area under the ROC curve (AUC) of 0.988 (95% confidence interval [CI], 0.982–0.993) and a cut-off point of 0.603 mIU/L (sensitivity 97.3%, specificity 93.0%). The cut-off points of basal LH and LH/FSH were 0.255 mIU/L (sensitivity 68.9%, specificity 86.0%) and 0.07 (sensitivity 73.2%, specificity 89.5%), respectively, with AUCs of 0.823 (95% CI, 0.799–0.847) and 0.843 (95% CI, 0.819–0.867), respectively. Conclusions A basal LH value greater than 0.535 mIU/L can be used to diagnose CPP without a GnRH agonist stimulation test. A single 60-minute post-stimulus gonadotropin result of LH and LH/FSH can be used instead of a GnRH agonist stimulation test, or samples can be taken only at 0, 30, and 60 minutes after a GnRH agonist stimulation test. This reduces the number of blood draws required compared with the traditional stimulation test, while still achieving a high level of diagnostic accuracy.
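ROC-derived cut-off points such as those reported above are commonly chosen by maximizing the Youden index (sensitivity + specificity − 1) over candidate thresholds. The abstract does not state which criterion the authors used, so the sketch below is an assumption-laden illustration: the measurements are hypothetical, the function names are invented, and a value is treated as test-positive when it is greater than or equal to the cut-off.

```python
def youden(sensitivity, specificity):
    """Youden index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1


def best_cutoff(values, labels):
    """Return (cutoff, J) maximizing the Youden index.

    Every observed value is tried as a cut-off; a case is called positive
    when value >= cutoff. On ties, the lowest cut-off is kept.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_c, best_j = None, -1.0
    for c in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= c and y == 1)
        tn = sum(1 for v, y in zip(values, labels) if v < c and y == 0)
        j = youden(tp / n_pos, tn / n_neg)
        if j > best_j:
            best_c, best_j = c, j
    return best_c, best_j


# Hypothetical hormone ratios; 1 = CPP, 0 = non-CPP.
values = [0.1, 0.2, 0.3, 0.5, 0.7, 0.9, 1.2, 1.5]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, j = best_cutoff(values, labels)
print(f"cut-off={cutoff} J={j:.2f}")

# Youden index implied by the sensitivity and specificity reported
# in the abstract for the 60-minute LH/FSH ratio.
reported_j = youden(0.973, 0.930)
print(f"J for the reported 60-minute LH/FSH ratio: {reported_j:.3f}")
```

Other criteria, such as fixing a minimum sensitivity for a screening setting, would generally select a different threshold from the same ROC curve.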

