Machine learning identifies girls with central precocious puberty based on multisource data

JAMIA Open ◽

10.1093/jamiaopen/ooaa063 ◽

2020 ◽

Author(s):

Liyan Pan ◽

Guangjian Liu ◽

Xiaojian Mao ◽

Huiying Liang

Keyword(s):

Machine Learning ◽

Precocious Puberty ◽

Laboratory Tests ◽

Medical Center ◽

Central Precocious Puberty ◽

Stimulation Test ◽

Youden Index ◽

Gradient Boosting ◽

Single Source ◽

Extreme Gradient Boosting

Abstract Objective The study aimed to develop simplified diagnostic models for identifying girls with central precocious puberty (CPP), without the expensive and cumbersome gonadotropin-releasing hormone (GnRH) stimulation test, which is the gold standard for CPP diagnosis. Materials and methods Female patients who had secondary sexual characteristics before 8 years old and had taken a GnRH analog (GnRHa) stimulation test at a medical center in Guangzhou, China were enrolled. Data from clinical visiting, laboratory tests, and medical image examinations were collected. We first extracted features from unstructured data such as clinical reports and medical images. Then, models based on each single-source data or multisource data were developed with Extreme Gradient Boosting (XGBoost) classifier to classify patients as CPP or non-CPP. Results The best performance achieved an area under the curve (AUC) of 0.88 and Youden index of 0.64 in the model based on multisource data. The performance of single-source models based on data from basal laboratory tests and the feature importance of each variable showed that the basal hormone test had the highest diagnostic value for a CPP diagnosis. Conclusion We developed three simplified models that use easily accessed clinical data before the GnRH stimulation test to identify girls who are at high risk of CPP. These models are tailored to the needs of patients in different clinical settings. Machine learning technologies and multisource data fusion can help to make a better diagnosis than traditional methods.

Download Full-text

Development of Prediction Models Using Machine Learning Algorithms for Girls with Suspected Central Precocious Puberty: Retrospective Study (Preprint)

10.2196/preprints.11728 ◽

2018 ◽

Author(s):

Liyan Pan ◽

Guangjian Liu ◽

Xiaojian Mao ◽

Huixian Li ◽

Jiexin Zhang ◽

...

Keyword(s):

Machine Learning ◽

Retrospective Study ◽

Random Forest ◽

Precocious Puberty ◽

Prediction Models ◽

Central Precocious Puberty ◽

Machine Learning Algorithms ◽

Stimulation Test ◽

Gnrh Analogue ◽

Prediction Probability

BACKGROUND Central precocious puberty (CPP) in girls seriously affects their physical and mental development in childhood. The method of diagnosis—gonadotropin-releasing hormone (GnRH)–stimulation test or GnRH analogue (GnRHa)–stimulation test—is expensive and makes patients uncomfortable due to the need for repeated blood sampling. OBJECTIVE We aimed to combine multiple CPP–related features and construct machine learning models to predict response to the GnRHa-stimulation test. METHODS In this retrospective study, we analyzed clinical and laboratory data of 1757 girls who underwent a GnRHa test in order to develop XGBoost and random forest classifiers for prediction of response to the GnRHa test. The local interpretable model-agnostic explanations (LIME) algorithm was used with the black-box classifiers to increase their interpretability. We measured sensitivity, specificity, and area under receiver operating characteristic (AUC) of the models. RESULTS Both the XGBoost and random forest models achieved good performance in distinguishing between positive and negative responses, with the AUC ranging from 0.88 to 0.90, sensitivity ranging from 77.91% to 77.94%, and specificity ranging from 84.32% to 87.66%. Basal serum luteinizing hormone, follicle-stimulating hormone, and insulin-like growth factor-I levels were found to be the three most important factors. In the interpretable models of LIME, the abovementioned variables made high contributions to the prediction probability. CONCLUSIONS The prediction models we developed can help diagnose CPP and may be used as a prescreening tool before the GnRHa-stimulation test.

Download Full-text

An Interpretable Early Dynamic Sequential Predictor for Sepsis-Induced Coagulopathy Progression in the Real-World Using Machine Learning

Frontiers in Medicine ◽

10.3389/fmed.2021.775047 ◽

2021 ◽

Vol 8 ◽

Author(s):

Ruixia Cui ◽

Wenbo Hua ◽

Kai Qu ◽

Heran Yang ◽

Yingmu Tong ◽

...

Keyword(s):

Machine Learning ◽

Real World ◽

Time Series Data ◽

Time Window ◽

Medical Center ◽

Characteristic Curve ◽

Series Data ◽

Gradient Boosting ◽

Early Management ◽

Extreme Gradient Boosting

Sepsis-associated coagulation dysfunction greatly increases the mortality of sepsis. Irregular clinical time-series data remains a major challenge for AI medical applications. To early detect and manage sepsis-induced coagulopathy (SIC) and sepsis-associated disseminated intravascular coagulation (DIC), we developed an interpretable real-time sequential warning model toward real-world irregular data. Eight machine learning models including novel algorithms were devised to detect SIC and sepsis-associated DIC 8n (1 ≤ n ≤ 6) hours prior to its onset. Models were developed on Xi'an Jiaotong University Medical College (XJTUMC) and verified on Beth Israel Deaconess Medical Center (BIDMC). A total of 12,154 SIC and 7,878 International Society on Thrombosis and Haemostasis (ISTH) overt-DIC labels were annotated according to the SIC and ISTH overt-DIC scoring systems in train set. The area under the receiver operating characteristic curve (AUROC) were used as model evaluation metrics. The eXtreme Gradient Boosting (XGBoost) model can predict SIC and sepsis-associated DIC events up to 48 h earlier with an AUROC of 0.929 and 0.910, respectively, and even reached 0.973 and 0.955 at 8 h earlier, achieving the highest performance to date. The novel ODE-RNN model achieved continuous prediction at arbitrary time points, and with an AUROC of 0.962 and 0.936 for SIC and DIC predicted 8 h earlier, respectively. In conclusion, our model can predict the sepsis-associated SIC and DIC onset up to 48 h in advance, which helps maximize the time window for early management by physicians.

Download Full-text

Enhancing Robustness of Machine Learning Integration With Routine Laboratory Blood Tests to Predict Inpatient Mortality After Intracerebral Hemorrhage

Frontiers in Neurology ◽

10.3389/fneur.2021.790682 ◽

2022 ◽

Vol 12 ◽

Author(s):

Wei Chen ◽

Xiangkui Li ◽

Lu Ma ◽

Dong Li

Keyword(s):

Machine Learning ◽

Intracerebral Hemorrhage ◽

Laboratory Tests ◽

Model Building ◽

Laboratory Data ◽

Gradient Boosting ◽

Routine Laboratory ◽

Inpatient Mortality ◽

Laboratory Parameters ◽

Extreme Gradient Boosting

Objective: The accurate evaluation of outcomes at a personalized level in patients with intracerebral hemorrhage (ICH) is critical clinical implications. This study aims to evaluate how machine learning integrates with routine laboratory tests and electronic health records (EHRs) data to predict inpatient mortality after ICH.Methods: In this machine learning-based prognostic study, we included 1,835 consecutive patients with acute ICH between October 2010 and December 2018. The model building process incorporated five pre-implant ICH score variables (clinical features) and 13 out of 59 available routine laboratory parameters. We assessed model performance according to a range of learning metrics, such as the mean area under the receiver operating characteristic curve [AUROC]. We also used the Shapley additive explanation algorithm to explain the prediction model.Results: Machine learning models using laboratory data achieved AUROCs of 0.71–0.82 in a split-by-year development/testing scheme. The non-linear eXtreme Gradient Boosting model yielded the highest prediction accuracy. In the held-out validation set of development cohort, the predictive model using comprehensive clinical and laboratory parameters outperformed those using clinical alone in predicting in-hospital mortality (AUROC [95% bootstrap confidence interval], 0.899 [0.897–0.901] vs. 0.875 [0.872–0.877]; P <0.001), with over 81% accuracy, sensitivity, and specificity. We observed similar performance in the testing set.Conclusions: Machine learning integrated with routine laboratory tests and EHRs could significantly promote the accuracy of inpatient ICH mortality prediction. This multidimensional composite prediction strategy might become an intelligent assistive prediction for ICH risk reclassification and offer an example for precision medicine.

Download Full-text

A Pragmatic Machine Learning Model to Predict Carbapenem Resistance

Antimicrobial Agents and Chemotherapy ◽

10.1128/aac.00063-21 ◽

2021 ◽

Author(s):

Ryan J. McGuire ◽

Sean C. Yu ◽

Philip R. O. Payne ◽

Albert M. Lai ◽

M. Cristina Vazquez-Guillamet ◽

...

Keyword(s):

Machine Learning ◽

Predictive Value ◽

Medical Center ◽

Academic Medical Center ◽

Tertiary Care ◽

Carbapenem Resistance ◽

The United States ◽

Gradient Boosting ◽

Negative Case ◽

Extreme Gradient Boosting

Infection caused by carbapenem resistant (CR) organisms is a rising problem in the United States. While the risk factors for antibiotic resistance are well known, there remains a large need for the early identification of antibiotic resistant infections. Using machine learning (ML), we sought to develop a prediction model for carbapenem resistance. All patients >18 years of age admitted to a tertiary-care academic medical center between Jan 1, 2012 and Oct 10, 2017 with ≥1 bacterial culture were eligible for inclusion. All demographic, medication, vital sign, procedure, laboratory, and culture/sensitivity data was extracted from the electronic health record. Organisms were considered CR if a single isolate was reported as intermediate or resistant. CR and non-CR patients were temporally matched to maintain positive/negative case ratio. Extreme gradient boosting was used for model development. In total, 68,472 patients met inclusion criteria with 1,088 CR patients identified. Sixty-seven features were used for predictive modeling. The most important features were number of prior antibiotic days, recent central venous catheter placement, and inpatient surgery. After model training, the area under the receiver operating characteristic curve was 0.846. The sensitivity of the model was 30%, with a positive predictive value (PPV) of 30% and a negative predictive value of 99%. Using readily available clinical data, we were able to create a ML model capable of predicting CR infections at the time of culture collection with a high PPV.

Download Full-text

Predicting Undesired Treatment Outcome in Mental Healthcare: Machine Learning Study (Preprint)

10.2196/preprints.17235 ◽

2019 ◽

Author(s):

Kasper Van Mens ◽

Joran Lokkerbol ◽

Richard Janssen ◽

Robert de Lange ◽

Bea Tiemens

Keyword(s):

Machine Learning ◽

Treatment Outcome ◽

Mental Health Treatment ◽

Mental Healthcare ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Trade Off ◽

Trade Offs ◽

Outcome Monitoring ◽

Extreme Gradient Boosting

BACKGROUND It remains a challenge to predict which treatment will work for which patient in mental healthcare. OBJECTIVE In this study we compare machine algorithms to predict during treatment which patients will not benefit from brief mental health treatment and present trade-offs that must be considered before an algorithm can be used in clinical practice. METHODS Using an anonymized dataset containing routine outcome monitoring data from a mental healthcare organization in the Netherlands (n = 2,655), we applied three machine learning algorithms to predict treatment outcome. The algorithms were internally validated with cross-validation on a training sample (n = 1,860) and externally validated on an unseen test sample (n = 795). RESULTS The performance of the three algorithms did not significantly differ on the test set. With a default classification cut-off at 0.5 predicted probability, the extreme gradient boosting algorithm showed the highest positive predictive value (ppv) of 0.71(0.61 – 0.77) with a sensitivity of 0.35 (0.29 – 0.41) and area under the curve of 0.78. A trade-off can be made between ppv and sensitivity by choosing different cut-off probabilities. With a cut-off at 0.63, the ppv increased to 0.87 and the sensitivity dropped to 0.17. With a cut-off of at 0.38, the ppv decreased to 0.61 and the sensitivity increased to 0.57. CONCLUSIONS Machine learning can be used to predict treatment outcomes based on routine monitoring data.This allows practitioners to choose their own trade-off between being selective and more certain versus inclusive and less certain.

Download Full-text

Evaluation of Three Different Machine Learning Methods for Object-Based Artificial Terrace Mapping—A Case Study of the Loess Plateau, China

Remote Sensing ◽

10.3390/rs13051021 ◽

2021 ◽

Vol 13 (5) ◽

pp. 1021

Author(s):

Hu Ding ◽

Jiaming Na ◽

Shangjing Jiang ◽

Jie Zhu ◽

Kai Liu ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Loess Plateau ◽

Water Conservation ◽

Nearest Neighbor ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

The Loess Plateau ◽

Object Based ◽

Extreme Gradient Boosting

Artificial terraces are of great importance for agricultural production and soil and water conservation. Automatic high-accuracy mapping of artificial terraces is the basis of monitoring and related studies. Previous research achieved artificial terrace mapping based on high-resolution digital elevation models (DEMs) or imagery. As a result of the importance of the contextual information for terrace mapping, object-based image analysis (OBIA) combined with machine learning (ML) technologies are widely used. However, the selection of an appropriate classifier is of great importance for the terrace mapping task. In this study, the performance of an integrated framework using OBIA and ML for terrace mapping was tested. A catchment, Zhifanggou, in the Loess Plateau, China, was used as the study area. First, optimized image segmentation was conducted. Then, features from the DEMs and imagery were extracted, and the correlations between the features were analyzed and ranked for classification. Finally, three different commonly-used ML classifiers, namely, extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (KNN), were used for terrace mapping. The comparison with the ground truth, as delineated by field survey, indicated that random forest performed best, with a 95.60% overall accuracy (followed by 94.16% and 92.33% for XGBoost and KNN, respectively). The influence of class imbalance and feature selection is discussed. This work provides a credible framework for mapping artificial terraces.

Download Full-text

A Machine Learning Method for Predicting Vegetation Indices in China

Remote Sensing ◽

10.3390/rs13061147 ◽

2021 ◽

Vol 13 (6) ◽

pp. 1147

Author(s):

Xiangqian Li ◽

Wenping Yuan ◽

Wenjie Dong

Keyword(s):

Machine Learning ◽

Growing Season ◽

Crop Growth ◽

Spatiotemporal Distribution ◽

Coefficient Of Determination ◽

Gradient Boosting ◽

Severe Drought ◽

Vegetation Growth ◽

Extreme Gradient Boosting ◽

Boosting Method

To forecast the terrestrial carbon cycle and monitor food security, vegetation growth must be accurately predicted; however, current process-based ecosystem and crop-growth models are limited in their effectiveness. This study developed a machine learning model using the extreme gradient boosting method to predict vegetation growth throughout the growing season in China from 2001 to 2018. The model used satellite-derived vegetation data for the first month of each growing season, CO2 concentration, and several meteorological factors as data sources for the explanatory variables. Results showed that the model could reproduce the spatiotemporal distribution of vegetation growth as represented by the satellite-derived normalized difference vegetation index (NDVI). The predictive error for the growing season NDVI was less than 5% for more than 98% of vegetated areas in China; the model represented seasonal variations in NDVI well. The coefficient of determination (R2) between the monthly observed and predicted NDVI was 0.83, and more than 69% of vegetated areas had an R2 > 0.8. The effectiveness of the model was examined for a severe drought year (2009), and results showed that the model could reproduce the spatiotemporal distribution of NDVI even under extreme conditions. This model provides an alternative method for predicting vegetation growth and has great potential for monitoring vegetation dynamics and crop growth.

Download Full-text

Patient-Specific Predictive Antibiogram in Decision Support for Empiric Antibiotic Treatment

Infection Control and Hospital Epidemiology ◽

10.1017/ice.2020.1205 ◽

2020 ◽

Vol 41 (S1) ◽

pp. s521-s522

Author(s):

Debarka Sengupta ◽

Vaibhav Singh ◽

Seema Singh ◽

Dinesh Tewari ◽

Mudit Kapoor ◽

...

Keyword(s):

Machine Learning ◽

Antimicrobial Resistance ◽

Model Building ◽

Medical Center ◽

Bacterial Species ◽

Model Performance ◽

The United States ◽

Patient Specific ◽

Gradient Boosting ◽

Comparative Performance

Background: The rising trend of antibiotic resistance imposes a heavy burden on healthcare both clinically and economically (US$55 billion), with 23,000 estimated annual deaths in the United States as well as increased length of stay and morbidity. Machine-learning–based methods have, of late, been used for leveraging patient’s clinical history and demographic information to predict antimicrobial resistance. We developed a machine-learning model ensemble that maximizes the accuracy of such a drug-sensitivity versus resistivity classification system compared to the existing best-practice methods. Methods: We first performed a comprehensive analysis of the association between infecting bacterial species and patient factors, including patient demographics, comorbidities, and certain healthcare-specific features. We leveraged the predictable nature of these complex associations to infer patient-specific antibiotic sensitivities. Various base-learners, including k-NN (k-nearest neighbors) and gradient boosting machine (GBM), were used to train an ensemble model for confident prediction of antimicrobial susceptibilities. Base learner selection and model performance evaluation was performed carefully using a variety of standard metrics, namely accuracy, precision, recall, F1 score, and Cohen κ. Results: For validating the performance on MIMIC-III database harboring deidentified clinical data of 53,423 distinct patient admissions between 2001 and 2012, in the intensive care units (ICUs) of the Beth Israel Deaconess Medical Center in Boston, Massachusetts. From ~11,000 positive cultures, we used 4 major specimen types namely urine, sputum, blood, and pus swab for evaluation of the model performance. Figure 1 shows the receiver operating characteristic (ROC) curves obtained for bloodstream infection cases upon model building and prediction on 70:30 split of the data. We received area under the curve (AUC) values of 0.88, 0.92, 0.92, and 0.94 for urine, sputum, blood, and pus swab samples, respectively. Figure 2 shows the comparative performance of our proposed method as well as some off-the-shelf classification algorithms. Conclusions: Highly accurate, patient-specific predictive antibiogram (PSPA) data can aid clinicians significantly in antibiotic recommendation in ICU, thereby accelerating patient recovery and curbing antimicrobial resistance.Funding: This study was supported by Circle of Life Healthcare Pvt. Ltd.Disclosures: None

Download Full-text

Machine learning models to identify low adherence to influenza vaccination among Korean adults with cardiovascular disease

BMC Cardiovascular Disorders ◽

10.1186/s12872-021-01925-7 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Moojung Kim ◽

Young Jae Kim ◽

Sung Jin Park ◽

Kwang Gi Kim ◽

Pyung Chun Oh ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Influenza Vaccination ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Age Group ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Machine Learning Models

Abstract Background Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially in the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study is to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination Methods Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" to whether they had received seasonal influenza vaccinations in the past 12 months. The classification process was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups. Results The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared; for the ≥ 65 age group, XGB (84.7%) and RF (84.7%) have the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM has the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%). Conclusions The machine leaning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.

Download Full-text

Prediction of population behavior of Listeria monocytogenes in food using machine learning and a microbial growth and survival database

Scientific Reports ◽

10.1038/s41598-021-90164-z ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Satoko Hiura ◽

Shige Koseki ◽

Kento Koyama

Keyword(s):

Machine Learning ◽

Data Mining ◽

Listeria Monocytogenes ◽

Water Activity ◽

Bacterial Population ◽

Gradient Boosting ◽

Initial Cell ◽

Data Mining Approach ◽

Cell Counts ◽

Extreme Gradient Boosting

AbstractIn predictive microbiology, statistical models are employed to predict bacterial population behavior in food using environmental factors such as temperature, pH, and water activity. As the amount and complexity of data increase, handling all data with high-dimensional variables becomes a difficult task. We propose a data mining approach to predict bacterial behavior using a database of microbial responses to food environments. Listeria monocytogenes, which is one of pathogens, population growth and inactivation data under 1,007 environmental conditions, including five food categories (beef, culture medium, pork, seafood, and vegetables) and temperatures ranging from 0 to 25 °C, were obtained from the ComBase database (www.combase.cc). We used eXtreme gradient boosting tree, a machine learning algorithm, to predict bacterial population behavior from eight explanatory variables: ‘time’, ‘temperature’, ‘pH’, ‘water activity’, ‘initial cell counts’, ‘whether the viable count is initial cell number’, and two types of categories regarding food. The root mean square error of the observed and predicted values was approximately 1.0 log CFU regardless of food category, and this suggests the possibility of predicting viable bacterial counts in various foods. The data mining approach examined here will enable the prediction of bacterial population behavior in food by identifying hidden patterns within a large amount of data.

Download Full-text