recall curve
Recently Published Documents


Total documents: 40 (five years: 26)

H-index: 5 (five years: 2)

2021 ◽  
Author(s):  
Qiu-Xia Feng ◽  
Bo Tang ◽  
Xi-Sheng Liu

Abstract Background: The study aimed to evaluate the diagnostic performance of machine learning-based CT radiomics models for preoperatively predicting the recurrence and metastasis of gastrointestinal stromal tumors (GISTs). Methods: A total of 382 patients with histopathologically confirmed GISTs were retrospectively included. According to postoperative follow-up, patients were classified into a group without recurrence or metastasis (NRM) and a group with recurrence or metastasis (RM). Radiomics features were extracted from arterial and portal venous phase CT images. Four feature selection methods and ten machine learning techniques were used to train prediction models on the training cohort, with internal validation by 10-fold cross-validation. The F1 score was used to evaluate the performance of each classification model. The best models from the two phases were stacked to build an ensemble model. The area under the curve (AUC), recall, precision, accuracy, and F1 score were used to evaluate the models and to compare them with clinical criteria based on diameter. Results: Eighty machine learning models were built across the two phases. The ensemble model integrated two of them: the analysis of variance and Naive Bayes (ANOVA_NB) model in the arterial phase, which selected only 5 features and provided the highest F1 score of 0.560, and the Kruskal-Wallis and Adaptive Boosting (KW_AdaBoost) model in the venous phase, which selected only 4 features and provided the highest F1 score of 0.500. The AUC of the ensemble model did not differ significantly from that of the clinical criteria (0.866 vs 0.857; DeLong test, P = 0.865). However, the ensemble model had higher accuracy (0.961), recall (0.826), precision (0.905), F1 score (0.864), and area under the precision-recall curve (0.774; 95% CI, 0.552 - 0.917) than the clinical criteria, for which the accuracy was 0.942, recall was 0.367, precision was 0.478, the F1 score was 0.415, and the area under the precision-recall curve was 0.354 (95% CI, 0.552 - 0.917). Conclusions: Our findings highlight the potential of machine learning techniques based on CT radiomics for the preoperative prediction of recurrence and metastasis of GISTs.


2021 ◽  
Author(s):  
Qiu-Xia Feng ◽  
Lu-Lu Xu ◽  
Qiong Li ◽  
Xiao-Ting Jiang ◽  
Bo Tang ◽  
...  

Abstract Background The study aimed to evaluate the diagnostic performance of machine learning-based CT radiomics models for preoperatively predicting the recurrence and metastasis of gastrointestinal stromal tumors (GISTs). Methods A total of 382 patients with histopathologically confirmed GISTs were retrospectively included. According to postoperative follow-up, patients were classified into a group without recurrence or metastasis (NRM) and a group with recurrence or metastasis (RM). Radiomics features were extracted from arterial and portal venous phase CT images. Four feature selection methods and ten machine learning techniques were used to train prediction models on the training cohort, with internal validation by 10-fold cross-validation. The F1 score was used to evaluate the performance of each classification model. The best models from the two phases were stacked to build an ensemble model. The area under the curve (AUC), recall, precision, accuracy, and F1 score were used to evaluate the models and to compare them with clinical criteria based on diameter. Results Eighty machine learning models were built across the two phases. The ensemble model integrated two of them: the analysis of variance and Naive Bayes (ANOVA_NB) model in the arterial phase, which selected only 5 features and provided the highest F1 score of 0.560, and the Kruskal-Wallis and Adaptive Boosting (KW_AdaBoost) model in the venous phase, which selected only 4 features and provided the highest F1 score of 0.500. The AUC of the ensemble model did not differ significantly from that of the clinical criteria (0.866 vs 0.857; DeLong test, P = 0.865). However, the ensemble model had higher accuracy (0.961), recall (0.826), precision (0.905), F1 score (0.864), and area under the precision-recall curve (0.774; 95% CI, 0.552 - 0.917) than the clinical criteria, for which the accuracy was 0.942, recall was 0.367, precision was 0.478, the F1 score was 0.415, and the area under the precision-recall curve was 0.354 (95% CI, 0.552 - 0.917). Conclusions Our findings highlight the potential of machine learning techniques based on CT radiomics for the preoperative prediction of recurrence and metastasis of GISTs.
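
The F1 scores reported in the abstract above can be checked against the reported precision and recall, since F1 is their harmonic mean. The following is a minimal Python sketch of that arithmetic; the function and variable names are illustrative and do not come from the study. It also recomputes the model count under the assumption that every feature selection method was paired with every classifier in each CT phase.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
ensemble_f1 = f1_score(precision=0.905, recall=0.826)
clinical_f1 = f1_score(precision=0.478, recall=0.367)

# Assumed full grid: 4 feature selection methods x 10 classifiers x 2 phases.
n_models = 4 * 10 * 2

print(round(ensemble_f1, 3), round(clinical_f1, 3), n_models)
```

Rounded to three decimals, both values agree with the F1 scores quoted in the abstract (0.864 and 0.415), and the grid yields the eighty models mentioned in the Results.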


Author(s):  
Aaron S. Coyner ◽  
Jimmy S. Chen ◽  
Praveer Singh ◽  
Robert L. Schelonka ◽  
Brian K. Jordan ◽  
...  

BACKGROUND AND OBJECTIVES Retinopathy of prematurity (ROP) is a leading cause of childhood blindness. Screening and treatment reduce this risk but require multiple examinations of infants, most of whom will not develop severe disease. Previous work has suggested that artificial intelligence may be able to detect incident severe disease (treatment-requiring retinopathy of prematurity [TR-ROP]) before clinical diagnosis. We aimed to build a risk model that combined artificial intelligence with clinical demographics to reduce the number of examinations without missing cases of TR-ROP. METHODS Infants undergoing routine ROP screening examinations (1579 total eyes, 190 with TR-ROP) were recruited from 8 North American study centers. A vascular severity score (VSS) was derived from retinal fundus images obtained at 32 to 33 weeks’ postmenstrual age. Seven ElasticNet logistic regression models were trained on all combinations of birth weight, gestational age, and VSS. The area under the precision-recall curve was used to identify the highest-performing model. RESULTS The gestational age + VSS model had the highest performance (mean ± SD area under the precision-recall curve: 0.35 ± 0.11). On 2 different test data sets (n = 444 and n = 132), sensitivity was 100% (positive predictive value: 28.1% and 22.6%) and specificity was 48.9% and 80.8% (negative predictive value: 100.0%). CONCLUSIONS Using a single examination, this model identified all infants who developed TR-ROP, on average >1 month before diagnosis, with moderate to high specificity. This approach could lead to earlier identification of incident severe ROP, reducing late diagnosis and treatment while simultaneously reducing the number of ROP examinations and unnecessary physiologic stress for low-risk infants.
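
The count of seven models in the abstract above corresponds to every non-empty subset of the three predictors (2^3 - 1 = 7). A minimal Python sketch of that enumeration follows; the identifier names are illustrative and not taken from the study.

```python
from itertools import combinations

# Illustrative names for the three predictors described in the abstract.
predictors = ["birth_weight", "gestational_age", "vss"]

# Every non-empty subset: sizes 1, 2, and 3.
feature_sets = [
    subset
    for size in range(1, len(predictors) + 1)
    for subset in combinations(predictors, size)
]

for subset in feature_sets:
    print(" + ".join(subset))
```

The list includes the pair ("gestational_age", "vss"), which is the combination the abstract reports as the highest-performing model.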


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ruben Chevez-Guardado ◽  
Lourdes Peña-Castillo

Abstract Promoters are genomic regions where the transcription machinery binds to initiate the transcription of specific genes. Computational tools for identifying bacterial promoters have been around for decades. However, most of these tools were designed to recognize promoters in one or a few bacterial species. Here, we present Promotech, a machine-learning-based method for promoter recognition in a wide range of bacterial species. We compare Promotech’s performance with the performance of five other promoter prediction methods. Promotech outperforms these other programs in terms of area under the precision-recall curve (AUPRC) or precision at the same level of recall. Promotech is available at https://github.com/BioinformaticsLabAtMUN/PromoTech.


10.2196/29807 ◽  
2021 ◽  
Vol 9 (8) ◽  
pp. e29807
Author(s):  
Eunsaem Lee ◽  
Se Young Jung ◽  
Hyung Ju Hwang ◽  
Jaewoo Jung

Background Nationwide population-based cohorts provide a new opportunity to build automated risk prediction models at the patient level, and claim data are one of the more useful resources to this end. To avoid unnecessary diagnostic intervention after cancer screening tests, patient-level prediction models should be developed. Objective We aimed to develop cancer prediction models using nationwide claim databases with machine learning algorithms that are explainable and easily applicable in real-world environments. Methods As source data, we used the Korean National Insurance System Database. Every Korean aged ≥40 years undergoes a national health checkup every 2 years. We gathered all variables from the database, including demographic information, basic laboratory values, anthropometric values, and previous medical history. We applied conventional logistic regression methods, light gradient boosting methods, neural networks, survival analysis, and one-class embedding classifier methods to effectively analyze high-dimensional data based on deep learning–based anomaly detection. Performance was measured with the area under the curve and the area under the precision-recall curve. We validated our models externally with a health checkup database from a tertiary hospital. Results The one-class embedding classifier model achieved the highest area under the curve scores, with values of 0.868, 0.849, 0.798, 0.746, 0.800, 0.749, and 0.790 for liver, lung, colorectal, pancreatic, gastric, breast, and cervical cancers, respectively. For the area under the precision-recall curve, the light gradient boosting models had the highest scores, with values of 0.383, 0.401, 0.387, 0.300, 0.385, 0.357, and 0.296 for liver, lung, colorectal, pancreatic, gastric, breast, and cervical cancers, respectively. Conclusions Our results show that it is possible to easily develop applicable cancer prediction models with nationwide claim data using machine learning. The 7 models showed acceptable performance and explainability, and thus can be distributed easily in real-world environments.


2021 ◽  
Author(s):  
Ruben Chevez-Guardado ◽  
Lourdes Pena-Castillo

Promoters are genomic regions where the transcription machinery binds to initiate the transcription of specific genes. Computational tools for identifying bacterial promoters have been around for decades. However, most of these tools were designed to recognize promoters in one or a few bacterial species. Here, we present Promotech, a machine-learning-based method for promoter recognition in a wide range of bacterial species. We compared Promotech's performance with the performance of five other promoter prediction methods. Promotech outperformed these other programs in terms of area under the precision-recall curve (AUPRC) or precision at the same level of recall. Promotech is available at https://github.com/BioinformaticsLabAtMUN/PromoTech.


2021 ◽  
Vol 5 (1) ◽  
Author(s):  
Qian M. Zhou ◽  
Lu Zhe ◽  
Russell J. Brooke ◽  
Melissa M. Hudson ◽  
Yan Yuan

Abstract Background Incremental value (IncV) evaluates the performance change between an existing risk model and a new model. Different IncV metrics do not always agree with each other. For example, compared with a prescribed-dose model, an ovarian-dose model for predicting acute ovarian failure has a slightly lower area under the receiver operating characteristic curve (AUC) but increases the area under the precision-recall curve (AP) by 48%. This phenomenon of disagreement is not uncommon, and can create confusion when assessing whether the added information improves the model prediction accuracy. Methods In this article, we examine the analytical connections and differences between the AUC IncV (ΔAUC) and AP IncV (ΔAP). We also compare the true values of these two IncV metrics in a numerical study. Additionally, as both are semi-proper scoring rules, we compare them with a strictly proper scoring rule: the IncV of the scaled Brier score (ΔsBrS) in the numerical study. Results We demonstrate that ΔAUC and ΔAP are both weighted averages of the changes (from the existing model to the new one) in separating the risk score distributions between events and non-events. However, ΔAP assigns heavier weights to the changes in higher-risk regions, whereas ΔAUC weights the changes equally. Due to this difference, the two IncV metrics can disagree, and the numerical study shows that their disagreement becomes more pronounced as the event rate decreases. In the numerical study, we also find that ΔAP has a wide range, from negative to positive, but the range of ΔAUC is much smaller. In addition, ΔAP and ΔsBrS are highly consistent, but ΔAUC is negatively correlated with ΔsBrS and ΔAP when the event rate is low. Conclusions ΔAUC treats the wins and losses of a new risk model equally across different risk regions. When neither the existing nor the new model is the true model, this equality could attenuate a superior performance of the new model for a sub-region. In contrast, ΔAP accentuates the change in the prediction accuracy for higher-risk regions.
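
The contrast drawn in the abstract above, that AP weights the high-risk end of the ranking more heavily than AUC, can be illustrated on a small example. The following is a minimal Python sketch using textbook definitions (AUC as the fraction of event/non-event pairs ranked correctly, AP as the mean precision at the rank of each event). The data and both sets of scores are invented for illustration and are not from the article; the AP function assumes there are no tied scores.

```python
def roc_auc(labels, scores):
    """Fraction of (event, non-event) pairs in which the event scores higher.

    Ties count as half a win.
    """
    events = [s for y, s in zip(labels, scores) if y == 1]
    non_events = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((e > n) + 0.5 * (e == n) for e in events for n in non_events)
    return wins / (len(events) * len(non_events))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each event (no tied scores)."""
    ranked = sorted(zip(scores, labels), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy cohort: 2 events among 10 subjects (hypothetical data).
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# Model A ranks the events 1st and 6th; model B ranks them 3rd and 4th.
scores_a = [1.0, 0.5, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
scores_b = [0.8, 0.7, 1.0, 0.9, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]

auc_a, auc_b = roc_auc(labels, scores_a), roc_auc(labels, scores_b)
ap_a, ap_b = average_precision(labels, scores_a), average_precision(labels, scores_b)
print(f"AUC: A={auc_a:.3f} B={auc_b:.3f}")
print(f"AP:  A={ap_a:.3f} B={ap_b:.3f}")
```

In this toy case both models have the same AUC (0.75), but model A has the higher AP (about 0.667 versus 0.417) because it places one event at the very top of the ranking.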


Author(s):  
Makoto Mori ◽  
Thomas J.S. Durant ◽  
Chenxi Huang ◽  
Bobak J. Mortazavi ◽  
Andreas Coppi ◽  
...  

Background: Intraoperative data may improve models predicting postoperative events. We evaluated the effect of incorporating intraoperative variables into the existing preoperative model on the predictive performance of the model for coronary artery bypass graft. Methods: We analyzed 378 572 isolated coronary artery bypass graft cases performed across 1083 centers, using the national Society of Thoracic Surgeons Adult Cardiac Surgery Database between 2014 and 2016. Outcomes were operative mortality, 5 postoperative complications, and a composite representation of all events. We fitted models by logistic regression or extreme gradient boosting (XGBoost). For each modeling approach, we used preoperative only, intraoperative only, or pre+intraoperative variables. We developed 84 models with unique combinations of the 3 variable sets, 2 variable selection methods, 2 modeling approaches, and 7 outcomes. Each model was tested in 20 iterations of 70:30 stratified random splitting into development/testing samples. Model performance was evaluated on the testing dataset using the C statistic, area under the precision-recall curve, and calibration metrics, including the Brier score. Results: The mean patient age was 65.3 years, and 24.7% were women. Operative mortality, excluding intraoperative death, occurred in 1.9%. In all outcomes, models that considered pre+intraoperative variables demonstrated significantly improved Brier score and area under the precision-recall curve compared with models considering preoperative or intraoperative variables alone. XGBoost without external variable selection had the best C statistic, Brier score, and area under the precision-recall curve values in 4 of the 7 outcomes (mortality, renal failure, prolonged ventilation, and composite) compared with logistic regression models with or without variable selection. Based on the calibration plots, risk restratification for mortality showed that the logistic regression model underestimated the risk in 11 114 patients (9.8%) and overestimated it in 12 005 patients (10.6%). In contrast, the XGBoost model underestimated the risk in 7218 patients (6.4%) and overestimated it in 0 patients (0%). Conclusions: In isolated coronary artery bypass graft, adding intraoperative variables to preoperative variables resulted in improved predictions of all 7 outcomes. Risk models based on XGBoost may provide a better prediction of adverse events to guide clinical care.
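
The repeated 70:30 stratified random splitting described in the abstract above can be sketched as follows. This is a minimal standard-library Python illustration, not the authors' code: the helper name, the seeds, and the toy labels (a 2% event rate chosen only to resemble a rare outcome) are all assumptions.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split indices into development/testing sets, preserving class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    development, testing = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        testing.extend(shuffled[:n_test])
        development.extend(shuffled[n_test:])
    return sorted(development), sorted(testing)


# Hypothetical toy labels: 20 events among 1000 cases.
labels = [1] * 20 + [0] * 980

# 20 iterations, each with a different seed, as in the described design.
splits = [stratified_split(labels, seed=iteration) for iteration in range(20)]

for development, testing in splits[:3]:
    events_in_test = sum(labels[i] for i in testing)
    print(len(development), len(testing), events_in_test)
```

Because the split is performed within each class, every testing sample keeps the same event rate as the full toy cohort, which matters when the outcome is rare and metrics such as the area under the precision-recall curve depend on prevalence.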


2021 ◽  
Vol 28 (1) ◽  
pp. e100312
Author(s):  
Christos A Makridis ◽  
Tim Strebel ◽  
Vincent Marconi ◽  
Gil Alterovitz

Using administrative data on all Veterans who enter Department of Veterans Affairs (VA) medical centres throughout the USA, this paper uses artificial intelligence (AI) to predict mortality rates for patients with COVID-19 between March and August 2020. First, using comprehensive data on over 10 000 Veterans’ medical history, demographics and lab results, we estimate five AI models. Our XGBoost model performs the best, producing an area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve of 0.87 and 0.41, respectively. We show how focusing on the AUROC alone can lead to unreliable models. Second, through a unique collaboration with the Washington D.C. VA medical centre, we develop a dashboard that incorporates these risk factors and the contributing sources of risk, which we deploy across local VA medical centres throughout the country. Our results provide a concrete example of how AI recommendations can be made explainable and practical for clinicians and their interactions with patients.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Richard Zuech ◽  
John Hancock ◽  
Taghi M. Khoshgoftaar

Abstract Class imbalance is an important consideration for cybersecurity and machine learning. We explore classification performance in detecting web attacks in the recent CSE-CIC-IDS2018 dataset. This study considers a total of eight random undersampling (RUS) ratios: no sampling, 999:1, 99:1, 95:5, 9:1, 3:1, 65:35, and 1:1. Additionally, seven different classifiers are employed: Decision Tree (DT), Random Forest (RF), CatBoost (CB), LightGBM (LGB), XGBoost (XGB), Naive Bayes (NB), and Logistic Regression (LR). For classification performance metrics, Area Under the Receiver Operating Characteristic Curve (AUC) and Area Under the Precision-Recall Curve (AUPRC) are both utilized to answer the following three research questions. The first question asks: “Are various random undersampling ratios statistically different from each other in detecting web attacks?” The second question asks: “Are different classifiers statistically different from each other in detecting web attacks?” And our third question asks: “Is the interaction between different classifiers and random undersampling ratios significant for detecting web attacks?” Based on our experiments, the answer to all three research questions is “Yes”. To the best of our knowledge, we are the first to apply random undersampling techniques to web attacks from the CSE-CIC-IDS2018 dataset while exploring various sampling ratios.
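
Random undersampling to a target ratio, as explored in the abstract above, can be sketched in a few lines. This is a minimal standard-library Python illustration under two assumptions that the abstract does not state explicitly: the ratios are read as majority:minority, and all minority (attack) rows are kept while majority rows are sampled without replacement. The helper name and the toy data are hypothetical.

```python
import random


def random_undersample(labels, majority, minority, seed=0):
    """Return sorted indices after undersampling class 0 to majority:minority.

    Assumes label 1 is the minority (attack) class and label 0 the majority.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == 1]
    majority_idx = [i for i, y in enumerate(labels) if y == 0]

    # Number of majority rows needed to reach the requested ratio.
    target = round(len(minority_idx) * majority / minority)
    kept_majority = rng.sample(majority_idx, min(target, len(majority_idx)))
    return sorted(minority_idx + kept_majority)


# Hypothetical toy data: 10 attacks among 10 010 records.
labels = [1] * 10 + [0] * 10_000

for majority, minority in [(999, 1), (99, 1), (95, 5), (9, 1), (3, 1), (65, 35), (1, 1)]:
    kept = random_undersample(labels, majority, minority)
    n_attack = sum(labels[i] for i in kept)
    print(f"{majority}:{minority} -> {len(kept) - n_attack} normal, {n_attack} attack")
```

Undersampling of this kind is normally applied to training data only, so that AUC and AUPRC are still measured at the original class distribution.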

