Core Genome Allelic Profiles of Clinical Klebsiella pneumoniae Strains Using a Random Forest Algorithm Based on Multilocus Sequence Typing Scheme for Hypervirulence Analysis

Abstract Background Hypervirulent Klebsiella pneumoniae (hvKP) infections can have high morbidity and mortality rates owing to their invasiveness and virulence. However, there are no effective tools or biomarkers to discriminate between hvKP and nonhypervirulent K. pneumoniae (nhvKP) strains. We aimed to use a random forest algorithm to predict hvKP based on core-genome data. Methods In total, 272 K. pneumoniae strains were collected from 20 tertiary hospitals in China and divided into hvKP and nhvKP groups according to clinical criteria. Clinical data comparisons, whole-genome sequencing, virulence profile analysis, and core genome multilocus sequence typing (cgMLST) were performed. We then established a random forest predictive model based on the cgMLST scheme to prospectively identify hvKP. Random forest is an ensemble learning method that builds multiple decision trees during training; each tree outputs its own prediction for a given input. The predictive ability of the model was assessed by the area under the receiver operating characteristic curve. Results Patients in the hvKP group were younger than those in the nhvKP group (median age, 58.0 and 68.0 years, respectively; P < .001). More patients in the hvKP group had underlying diabetes mellitus (43.1% vs 20.1%; P < .001). Clinically, carbapenem-resistant K. pneumoniae was less common in the hvKP group (4.1% vs 63.8%; P < .001), whereas the K1/K2 serotype, sequence type (ST) 23, and positive string tests were significantly more common in the hvKP group. A cgMLST-based minimal spanning tree revealed that hvKP strains were scattered sporadically within nhvKP clusters. ST23 showed greater genome diversification than did ST11, according to cgMLST-based allelic differences. Primary virulence factors (rmpA, iucA, positive string test result, and the presence of virulence plasmid pLVPK) were poor predictors of the hypervirulence phenotype. The random forest model based on the core genome allelic profile showed excellent predictive power in both the training and validation sets (area under the receiver operating characteristic curve, 0.987 and 0.999, respectively). Conclusions A random forest predictive model based on the core genome allelic profiles of K. pneumoniae accurately identified hypervirulent isolates.
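
The ensemble idea described in the Methods (many trees, each casting its own prediction) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the allelic profiles are invented toy data, each "tree" is reduced to a single split on one randomly chosen locus, and the function names are hypothetical. It is not the model or data used in the study.

```python
import random
from collections import Counter


def train_stump(profiles, labels, locus):
    """Fit a one-split 'tree': map each allele seen at one locus to its majority label."""
    labels_by_allele = {}
    for profile, label in zip(profiles, labels):
        labels_by_allele.setdefault(profile[locus], []).append(label)
    fallback = Counter(labels).most_common(1)[0][0]  # used for alleles unseen in training
    table = {
        allele: Counter(seen).most_common(1)[0][0]
        for allele, seen in labels_by_allele.items()
    }
    return lambda profile: table.get(profile[locus], fallback)


def train_forest(profiles, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and one randomly chosen locus."""
    rng = random.Random(seed)
    n = len(profiles)
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        locus = rng.randrange(len(profiles[0]))      # random feature for this tree
        forest.append(
            train_stump([profiles[i] for i in rows], [labels[i] for i in rows], locus)
        )
    return forest


def predict(forest, profile):
    """Every tree votes; the most common label is the forest's prediction."""
    votes = Counter(tree(profile) for tree in forest)
    return votes.most_common(1)[0][0]


# Hypothetical allele numbers at three loci (toy data, not from the study).
PROFILES = [(3, 7, 1), (3, 7, 1), (3, 7, 1), (4, 9, 2), (4, 9, 2), (4, 9, 2)]
LABELS = ["hvKP", "hvKP", "hvKP", "nhvKP", "nhvKP", "nhvKP"]

forest = train_forest(PROFILES, LABELS)
prediction = predict(forest, (3, 7, 1))
```

A real cgMLST profile has thousands of loci and each tree makes many splits; the sketch keeps only the two ingredients named in the abstract, namely multiple trees built during training and one prediction per tree, combined by majority vote.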

Using The Random Forest Algorithm To Detect The Activity of Thyroid-Associated Ophthalmopathy

10.21203/rs.3.rs-787674/v1 ◽

2021 ◽

Author(s):

Minghui Wang ◽

Hanqiao Zhang ◽

Li Dong ◽

Yang Li ◽

Zhijia Hou ◽

...

Keyword(s):

Random Forest ◽

Predictive Value ◽

Diagnostic Performance ◽

Operating Characteristic ◽

Characteristic Curve ◽

Random Forest Model ◽

Random Forest Algorithm ◽

Forest Model ◽

Thyroid Associated Ophthalmopathy ◽

Operating Characteristic Curve

Abstract Objective: The aim of this study was to establish a random forest model to distinguish the active and quiescent phases of thyroid-associated ophthalmopathy (TAO) and to evaluate its diagnostic performance. Methods: A total of 146 patients (292 eyes) who were diagnosed with TAO and treated in the Ophthalmology Outpatient Clinic of Beijing TongRen Hospital were retrospectively included in the study. We took the clinical activity score of TAO as the target and used gender, age, smoking status, I-131 treatment history, thyroid nodules, thyromegaly, thyroid hormone, and TSH-receptor antibodies (TRAb) as predictor variables to establish a random forest model. The ratio of the training group to the testing group was 7:3. We analyzed the model's accuracy, precision, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F1 score, and out-of-bag (OOB) error, and compared its accuracy, Brier loss, and area under the receiver operating characteristic curve with those of a logistic regression model. Results: Our model had an accuracy of 0.93, a sensitivity of 0.88, a specificity of 0.96, a positive predictive value of 0.94, a negative predictive value of 0.93, an F1 score of 0.91, and an OOB error of 0.12. The accuracies of the random forest model and the logistic regression model were 0.93 and 0.79, respectively; the Brier losses were 0.06 and 0.20; and the areas under the receiver operating characteristic curve were 0.95 and 0.86. Conclusion: By integrating these high-risk factors, the random forest algorithm can be used as a complementary diagnostic method to determine the activity of TAO, showing prominent diagnostic performance.
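
For readers less familiar with the metrics listed above, the following sketch shows how they are conventionally derived from a 2x2 confusion matrix, together with the Brier loss. The counts and probabilities are hypothetical and the function names are illustrative; none of it is taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (positive class = active TAO)."""
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    ppv = tp / (tp + fp)                  # positive predictive value (same as precision)
    npv = tn / (tn + fn)                  # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


def brier_loss(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical counts and predictions, for illustration only.
metrics = diagnostic_metrics(tp=8, fp=1, tn=9, fn=2)
loss = brier_loss([1, 0], [0.8, 0.3])
```

Precision and positive predictive value are the same quantity, which is why the sketch computes them once. A lower Brier loss indicates better-calibrated probabilities.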

Machine learning for identification of surgeries with high risks of cancellation

Health Informatics Journal ◽

10.1177/1460458218813602 ◽

2018 ◽

Vol 26 (1) ◽

pp. 141-155 ◽

Cited By ~ 2

Author(s):

Li Luo ◽

Fengyi Zhang ◽

Yao Yao ◽

RenRong Gong ◽

Martina Fu ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Predictive Value ◽

Operating Characteristic ◽

Sampling Methods ◽

Characteristic Curve ◽

Support Vector ◽

Chi Square ◽

Stable Performance ◽

Operating Characteristic Curve

Surgery cancellations waste scarce operative resources and hinder patients’ access to operative services. In this study, the Wilcoxon and chi-square tests were used for predictor selection, and three machine learning models (random forest, support vector machine, and XGBoost) were used for the identification of surgeries with high risks of cancellation. The optimal performances of the identification models were as follows: sensitivity, 0.615; specificity, 0.957; positive predictive value, 0.454; negative predictive value, 0.904; accuracy, 0.647; and area under the receiver operating characteristic curve, 0.682. Of the three models, the random forest model achieved the best performance. Thus, the effective identification of surgeries with high risks of cancellation is feasible with stable performance. Models and sampling methods significantly affect the performance of identification. This study is a new application of machine learning for the identification of surgeries with high risks of cancellation and facilitation of surgery resource management.

Prediction of Nephrotoxicity Associated With Cisplatin-Based Chemotherapy in Testicular Cancer Patients

JNCI Cancer Spectrum ◽

10.1093/jncics/pkaa032 ◽

2020 ◽

Vol 4 (3) ◽

Author(s):

Sara L Garcia ◽

Jakob Lauritsen ◽

Zeyu Zhang ◽

Mikkel Bandak ◽

Marlene D Dalgaard ◽

...

Keyword(s):

Random Forest ◽

Testicular Cancer ◽

Receiver Operating Characteristic Curve ◽

Receiver Operating Characteristic ◽

Cancer Patients ◽

Clinical Data ◽

Operating Characteristic ◽

Characteristic Curve ◽

Patients At Risk ◽

Operating Characteristic Curve

Abstract Background Cisplatin-based chemotherapy may induce nephrotoxicity. This study presents a random forest predictive model that identifies testicular cancer patients at risk of nephrotoxicity before treatment. Methods Clinical data and DNA from saliva samples were collected for 433 patients. These were genotyped on Illumina HumanOmniExpressExome-8 v1.2 (964 193 markers). Clinical and genomics-based random forest models generated a risk score for each individual to develop nephrotoxicity defined as a 20% drop in isotopic glomerular filtration rate during chemotherapy. The area under the receiver operating characteristic curve was the primary measure to evaluate models. Sensitivity, specificity, and positive and negative predictive values were used to discuss model clinical utility. Results Of 433 patients assessed in this study, 26.8% developed nephrotoxicity after bleomycin-etoposide-cisplatin treatment. Genomic markers found to be associated with nephrotoxicity were located at NAT1, NAT2, and the intergenic region of CNTN6 and CNTN4. These, in addition to previously associated markers located at ERCC1, ERCC2, and SLC22A2, were found to improve predictions in a clinical feature–trained random forest model. Using only clinical data for training the model, an area under the receiver operating characteristic curve of 0.635 (95% confidence interval [CI] = 0.629 to 0.640) was obtained. Retraining the classifier by adding genomics markers increased performance to 0.731 (95% CI = 0.726 to 0.736) and 0.692 (95% CI = 0.688 to 0.696) on the holdout set. Conclusions A clinical and genomics-based machine learning algorithm improved the ability to identify patients at risk of nephrotoxicity compared with using clinical variables alone. Novel genetics associations with cisplatin-induced nephrotoxicity were found for NAT1, NAT2, CNTN6, and CNTN4 that require replication in larger studies before application to clinical practice.

Evaluation of corneal topographic, tomographic and biomechanical indices for detecting clinical and subclinical keratoconus: a comprehensive three-device study

International Journal of Ophthalmology ◽

10.18240/ijo.2021.02.08 ◽

2021 ◽

Vol 14 (2) ◽

pp. 228-239

Author(s):

Zahra Heidari ◽

Mehrdad Mohammadpour ◽

Kazem Amanzadeh ◽

Akbar Fotouhi ◽

...

Keyword(s):

Random Forest ◽

Early Detection ◽

Operating Characteristic ◽

Cross Validation ◽

Characteristic Curve ◽

Diagnostic Ability ◽

Test Study ◽

Biomechanical Parameters ◽

Operating Characteristic Curve ◽

Is Value

AIM: To evaluate the diagnostic ability of topographic and tomographic indices with Pentacam and Sirius as well as biomechanical parameters with Corvis ST for the detection of clinical and subclinical forms of keratoconus (KCN). METHODS: In this prospective diagnostic test study, 70 patients with clinical KCN, 79 patients with abnormal findings in topography and tomography maps with no evidence on clinical examination (subclinical KCN), and 68 normal control subjects were enrolled. The accuracy of topographic, tomographic, and biomechanical parameters was evaluated using the area under the receiver operating characteristic curve (AUC) and cross-validation analysis. The Delong method was used for comparing AUCs. RESULTS: In distinguishing KCN from normal, all parameters showed statistically significant differences between the two groups (P<0.001). Indices with the perfect diagnostic ability (AUC≥0.999) were Sirius KCN vertex of back (KVb), Pentacam random forest index (PRFI), Pentacam index of height decentration (IHD), and Corvis integrated tomographic/biomechanical index (TBI). In distinguishing subclinical KCN from normal, Sirius symmetry index of back (SIb; AUC=0.908), Pentacam inferior-superior difference (IS) value (AUC=0.862), PRFI (AUC=0.847), and Corvis TBI (AUC=0.820) performed best. There were no significant differences between the highest AUCs within keratoconic groups (DeLong, P>0.05). CONCLUSION: In clinical KCN, all topographic, tomographic, and biomechanical indices have acceptable outcomes in terms of sensitivity and specificity. However, in differentiating subclinical forms of KCN from normal corneas, curvature-based parameters (SIb and IS value) followed by integrated indices (PRFI and TBI) are the most powerful tools for early detection of KCN.

Can machine learning improve mortality prediction following cardiac surgery?

European Journal of Cardio-Thoracic Surgery ◽

10.1093/ejcts/ezaa229 ◽

2020 ◽

Vol 58 (6) ◽

pp. 1130-1136

Author(s):

Umberto Benedetto ◽

Shubhra Sinha ◽

Matt Lyon ◽

Arnaldo Dimagli ◽

Tom R Gaunt ◽

...

Keyword(s):

Machine Learning ◽

Cardiac Surgery ◽

Random Forest ◽

Hospital Mortality ◽

Receiver Operating Characteristic Curve ◽

Receiver Operating Characteristic ◽

Operating Characteristic ◽

Characteristic Curve ◽

Operating Characteristic Curve ◽

Receiver Operating

Abstract OBJECTIVES Interest in the clinical usefulness of machine learning for risk prediction has bloomed recently. Cardiac surgery patients are at high risk of complications and therefore presurgical risk assessment is of crucial relevance. We aimed to compare the performance of machine learning algorithms over traditional logistic regression (LR) model to predict in-hospital mortality following cardiac surgery. METHODS A single-centre data set of prospectively collected information from patients undergoing adult cardiac surgery from 1996 to 2017 was split into 70% training set and 30% testing set. Prediction models were developed using neural network, random forest, naive Bayes and retrained LR based on features included in the EuroSCORE. Discrimination was assessed using area under the receiver operating characteristic curve, and calibration analysis was undertaken using the calibration belt method. Model calibration drift was assessed by comparing Goodness of fit χ2 statistics observed in 2 equal bins from the testing sample ordered by procedure date. RESULTS A total of 28 761 cardiac procedures were performed during the study period. The in-hospital mortality rate was 2.7%. Retrained LR [area under the receiver operating characteristic curve 0.80; 95% confidence interval (CI) 0.77–0.83] and random forest model (0.80; 95% CI 0.76–0.83) showed the best discrimination. All models showed significant miscalibration. Retrained LR proved to have the weakest calibration drift. CONCLUSIONS Our findings do not support the hypothesis that machine learning methods provide advantage over LR model in predicting operative mortality after cardiac surgery.

Analysis of shoulder MR imaging using Receiver Operating Characteristic curve

Journal of the Korean Radiological Society ◽

10.3348/jkrs.1998.38.4.723 ◽

1998 ◽

Vol 38 (4) ◽

pp. 723

Author(s):

Yoon Joon Hwang ◽

Jin Suck Suh ◽

Jae Hyun Cho

Keyword(s):

Mr Imaging ◽

Receiver Operating Characteristic Curve ◽

Receiver Operating Characteristic ◽

Operating Characteristic ◽

Characteristic Curve ◽

Operating Characteristic Curve ◽

Receiver Operating

Serum miR-21 and miR-26a Levels Negatively Correlate with Severity of Cirrhosis in Patients with Chronic Hepatitis B

MicroRNA ◽

10.2174/2211536607666180821162850 ◽

2018 ◽

Vol 8 (1) ◽

pp. 86-92 ◽

Cited By ~ 2

Author(s):

Shili Jiang ◽

Wei Jiang ◽

Ying Xu ◽

Xiaoning Wang ◽

Yongping Mu ◽

...

Keyword(s):

Chronic Hepatitis ◽

Hepatitis B ◽

Chronic Hepatitis B ◽

Operating Characteristic ◽

Characteristic Curve ◽

Curve Analysis ◽

Pugh Class ◽

Operating Characteristic Curve ◽

Class C ◽

Circulating Levels

Background and Objective: Accurately evaluating the severity of liver cirrhosis is essential for clinical decision making and disease management. This study aimed to evaluate the value of circulating levels of microRNA (miR)-26a and miR-21 as novel noninvasive biomarkers in detecting severity of cirrhosis in patients with chronic hepatitis B. Methods: Thirty patients with clinically diagnosed chronic hepatitis B-related cirrhosis and 30 healthy individuals were selected. The serum levels of miR-26a and miR-21 were quantified by qRT-PCR. Receiver operating characteristic curve analysis was performed to evaluate the sensitivity and specificity of the miRNAs for detecting the severity of cirrhosis. Results: Serum miR-26a and miR-21 levels were found to be significantly downregulated in patients with severe cirrhosis scored at Child-Pugh class C in comparison to healthy controls (miR-26a p<0.01, and miR-21 p<0.001, respectively). The circulating miR-26a and miR-21 levels in patients were positively correlated with serum albumin concentration but negatively correlated with serum total bilirubin concentration and prothrombin time. Receiver operating characteristic curve analysis revealed that both serum miR-26a and miR-21 levels were associated with a high diagnostic accuracy for patients with cirrhosis scored at Child-Pugh class C (miR-26a Cut-off fold change at ≤0.4, Sensitivity: 84.62%, Specificity: 89.36%, P<0.0001; miR-21 Cut-off fold change at ≤0.6, Sensitivity: 84.62%, Specificity: 78.72%, P<0.0001). Our results indicate that the circulating levels of miR-26a and miR-21 are closely related to the extent of liver decompensation, and the decreased levels are capable of discriminating patients with cirrhosis at Child-Pugh class C from the whole cirrhosis cases.

A validated novel preoperative index to predict the extent of intraperitoneal contamination in patients with acute abdominal pathology: A cohort study

Journal of Perioperative Practice ◽

10.1177/1750458919875592 ◽

2019 ◽

Vol 30 (7-8) ◽

pp. 221-228

Author(s):

Shahab Hajibandeh ◽

Shahin Hajibandeh ◽

Nicholas Hobbs ◽

Jigar Shah ◽

Matthew Harris ◽

...

Keyword(s):

Receiver Operating Characteristic Curve ◽

Receiver Operating Characteristic ◽

Operating Characteristic ◽

Validation Cohort ◽

Characteristic Curve ◽

Curve Analysis ◽

Emergency Laparotomy ◽

Contamination Index ◽

Operating Characteristic Curve ◽

Receiver Operating

Aims To investigate whether an intraperitoneal contamination index (ICI) derived from combined preoperative levels of C-reactive protein, lactate, neutrophils, lymphocytes and albumin could predict the extent of intraperitoneal contamination in patients with acute abdominal pathology. Methods Patients aged over 18 who underwent emergency laparotomy for acute abdominal pathology between January 2014 and October 2018 were randomly divided into primary and validation cohorts. The proposed intraperitoneal contamination index was calculated for each patient in each cohort. Receiver operating characteristic curve analysis was performed to determine discrimination of the index and cut-off values of preoperative intraperitoneal contamination index that could predict the extent of intraperitoneal contamination. Results Overall, 468 patients were included in this study; 234 in the primary cohort and 234 in the validation cohort. The analyses identified intraperitoneal contamination index of 24.77 and 24.32 as cut-off values for purulent contamination in the primary cohort (area under the curve (AUC): 0.73, P < 0.0001; sensitivity: 84%, specificity: 60%) and validation cohort (AUC: 0.83, P < 0.0001; sensitivity: 91%, specificity: 69%), respectively. Receiver operating characteristic curve analysis also identified intraperitoneal contamination index of 33.70 and 33.41 as cut-off values for feculent contamination in the primary cohort (AUC: 0.78, P < 0.0001; sensitivity: 87%, specificity: 64%) and validation cohort (AUC: 0.79, P < 0.0001; sensitivity: 86%, specificity: 73%), respectively. Conclusions As a predictive measure which is derived purely from biomarkers, intraperitoneal contamination index may be accurate enough to predict the extent of intraperitoneal contamination in patients with acute abdominal pathology and to facilitate decision-making together with clinical and radiological findings.
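
As a rough illustration of what a receiver operating characteristic analysis of a continuous index involves, the sketch below computes the AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, and then picks a cut-off. The index values are hypothetical, and choosing the cut-off by Youden's J (sensitivity + specificity - 1) is an assumption; the abstract does not state which criterion the authors used.

```python
def auc(positives, negatives):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def sens_spec(positives, negatives, cutoff):
    """Call a case positive when its index is at or above the cut-off."""
    sensitivity = sum(p >= cutoff for p in positives) / len(positives)
    specificity = sum(n < cutoff for n in negatives) / len(negatives)
    return sensitivity, specificity


def best_cutoff(positives, negatives):
    """Assumed criterion: maximise Youden's J over all observed index values."""
    candidates = sorted(set(positives) | set(negatives))
    return max(candidates, key=lambda c: sum(sens_spec(positives, negatives, c)) - 1.0)


# Hypothetical index values, for illustration only (not study data).
contaminated = [30, 26, 25, 20]
clean = [22, 21, 27, 10]

area = auc(contaminated, clean)
cutoff = best_cutoff(contaminated, clean)
sensitivity, specificity = sens_spec(contaminated, clean, cutoff)
```

Raising the cut-off trades sensitivity for specificity, which is why a cut-off is normally reported together with both values, as in the Results above.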

Evaluation of factors that predict the success rate of trial of labor after the cesarean section

BMC Pregnancy and Childbirth ◽

10.1186/s12884-021-04004-z ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Yang Mi ◽

Pengfei Qu ◽

Na Guo ◽

Ruimiao Bai ◽

Jiayi Gao ◽

...

Keyword(s):

Logistic Regression ◽

Cesarean Section ◽

Receiver Operating Characteristic Curve ◽

Success Rate ◽

Operating Characteristic ◽

Characteristic Curve ◽

Predictive Ability ◽

Training Set ◽

History Of ◽

Operating Characteristic Curve

Abstract Background For most women who have had a previous cesarean section, vaginal birth after cesarean section (VBAC) is a reasonable and safe choice, but it increases the risk of adverse outcomes such as uterine rupture. In order to reduce the risk, we evaluated the factors that may affect VBAC and established a model for predicting the success rate of trial of labor after cesarean section (TOLAC). Methods All patients who gave birth at Northwest Women’s and Children’s Hospital from January 2016 to December 2018, had a history of cesarean section, and voluntarily chose TOLAC were recruited. Among them, 80% of the population was randomly assigned to the training set, while the remaining 20% were assigned to the external validation set. In the training set, univariate and multivariate logistic regression models were used to identify indicators related to successful TOLAC. A nomogram was constructed based on the results of multiple logistic regression analysis, and the selected variables included in the nomogram were used to predict the probability of successful TOLAC. The area under the receiver operating characteristic curve was used to judge the predictive ability of the model. Results A total of 778 pregnant women were included in this study. Among them, 595 (76.48%) successfully underwent TOLAC, whereas 183 (23.52%) failed and switched to cesarean section. In multi-factor logistic regression, parity = 1, pre-pregnancy BMI < 24 kg/m2, cervical score ≥ 5, a history of previous vaginal delivery and neonatal birthweight < 3300 g were associated with the success of TOLAC. The area under the receiver operating characteristic curve in the prediction and validation models was 0.815 (95% CI: 0.762–0.854) and 0.730 (95% CI: 0.652–0.808), respectively, indicating that the nomogram prediction model had medium discriminative power. Conclusion TOLAC was useful for reducing the cesarean section rate. Being primiparous, not overweight or obese, having a cervical score ≥ 5, a history of previous vaginal delivery or neonatal birthweight < 3300 g were protective indicators. In this study, the validated model had an acceptable predictive ability.

The gut hormone GLP-2 predicts cardiovascular risk in patients with acute myocardial infarction

European Heart Journal ◽

10.1093/ehjci/ehaa946.1592 ◽

2020 ◽

Vol 41 (Supplement_2) ◽

Author(s):

F Kahles ◽

R.W Mertens ◽

M.V Rueckbeil ◽

M.C Arrivas ◽

J Moellmann ◽

...

Keyword(s):

Myocardial Infarction ◽

Cardiovascular Disease ◽

Acute Myocardial Infarction ◽

Receiver Operating Characteristic Curve ◽

Operating Characteristic ◽

Characteristic Curve ◽

Funding Source ◽

Cardiovascular Prognosis ◽

Operating Characteristic Curve ◽

Hs Crp

Abstract Background GLP-1 and GLP-2 (glucagon-like peptide-1/2) are gut derived hormones that are co-secreted from intestinal L-cells in response to food intake. While GLP-1 is known to induce postprandial insulin secretion, GLP-2 enhances intestinal nutrient absorption and is clinically used for the treatment of patients with short bowel syndrome. The relevance of the GLP-2 system for cardiovascular disease is unknown. Purpose The aim of this study was to assess the predictive capacity of GLP-2 for cardiovascular prognosis in patients with myocardial infarction. Methods Total GLP-2 levels, NT-proBNP concentrations and the Global Registry of Acute Coronary Events (GRACE) score were assessed at time of admission in 918 patients with myocardial infarction, among them 597 patients with NSTEMI and 321 with STEMI. The primary composite outcome of the study was the first occurrence of cardiovascular death, nonfatal myocardial infarction, or nonfatal stroke (3-P-MACE) with a median follow-up of 311 days. Results Kaplan-Meier survival plots (separated by the median of GLP-2 with a cut-off value of 4.4 ng/mL) and univariable Cox regression analyses found GLP-2 values to be associated with adverse outcome (logarithmized GLP-2 values HR: 2.87; 95% CI: 1.75–4.68; p<0.0001). Further adjustment for age, sex, smoking, hypertension, hypercholesterolemia, diabetes mellitus, family history of cardiovascular disease, hs-Troponin T, NT-proBNP and hs-CRP levels did not affect the association of GLP-2 with poor prognosis (logarithmized GLP-2 values HR: 2.96; 95% CI: 1.38–6.34; p=0.0053). Receiver operating characteristic (ROC) curve analyses illustrated that GLP-2 is a strong indicator for cardiovascular events and proved to be comparable to other established risk markers (area under the curve of the combined endpoint at 6 months; GLP-2: 0.72; hs-Troponin: 0.56; NT-proBNP: 0.70; hs-CRP: 0.62). Adjustment of the GRACE risk estimate by GLP-2 increased the area under the receiver operating characteristic curve for the combined triple endpoint after 6 months from 0.70 (GRACE) to 0.75 (GRACE + GLP-2) in NSTEMI patients. Addition of GLP-2 to a model containing GRACE and NT-proBNP led to a further improvement in model performance (increase in AUC from 0.72 for GRACE + NT-proBNP to 0.77 for GRACE + NT-proBNP + GLP-2). Conclusions In patients admitted with acute myocardial infarction, GLP-2 levels are associated with adverse cardiovascular prognosis. This demonstrates a strong yet underappreciated crosstalk between the heart and the gut with relevance for cardiovascular outcome. Future studies are needed to further explore this crosstalk with the possibility of new treatment avenues for cardiovascular disease. Funding Acknowledgement Type of funding source: Public grant(s) – National budget only. Main funding source(s): German Society of Cardiology (DGK), German Research Foundation (DFG)
