A Modified AUC for Training Convolutional Neural Networks: Taking Confidence Into Account

The receiver operating characteristic (ROC) curve is an informative tool in binary classification, and the area under the ROC curve (AUC) is a popular metric for reporting the performance of binary classifiers. In this paper, we first present a comprehensive review of the ROC curve and the AUC metric. Next, we propose a modified version of AUC that takes the confidence of the model into account and, at the same time, incorporates AUC into the binary cross-entropy (BCE) loss used for training a convolutional neural network for classification tasks. We demonstrate this on three datasets: MNIST, prostate MRI, and brain MRI. Furthermore, we have published GenuineAI, a new Python library, which provides functions for the conventional AUC and the proposed modified AUC, along with metrics including sensitivity, specificity, recall, precision, and F1 for each point of the ROC curve.
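
As a point of reference for the conventional quantities named in this abstract, the sketch below computes sensitivity and specificity at each ROC threshold and the standard AUC as the fraction of correctly ranked positive/negative pairs. It is a minimal plain-Python illustration: the function names `roc_points` and `auc` and the toy data are assumptions, and it is neither the GenuineAI API nor the proposed modified AUC.

```python
def roc_points(labels, scores):
    """Return (threshold, sensitivity, specificity) for each distinct score used as a cutoff.

    A sample is predicted positive when its score is >= the threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        points.append((t, tp / pos, tn / neg))
    return points


def auc(labels, scores):
    """Conventional AUC: share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 3 positives, 3 negatives.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

for threshold, sensitivity, specificity in roc_points(labels, scores):
    print(f"t={threshold:.1f}  sensitivity={sensitivity:.2f}  specificity={specificity:.2f}")
print(f"AUC = {auc(labels, scores):.3f}")
```

Note that this AUC depends only on the ranking of the scores, not on how far they sit from the decision boundary, which is the gap a confidence-aware variant would aim to address.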

How Does Knowledge of the AUC Constrain the Set of Possible Ground-Truth Labelings?

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v33i01.33015425 ◽ 2019 ◽ Vol 33 ◽ pp. 5425-5432

Author(s): Jacob Whitehill

Keyword(s): Machine Learning ◽ Empirical Evidence ◽ Recent Work ◽ Roc Curve ◽ Binary Classification ◽ Ground Truth ◽ Mathematical Structure ◽ Test Set ◽ Classification Tasks ◽ N Vector

Recent work on privacy-preserving machine learning has considered how data-mining competitions such as Kaggle could potentially be “hacked”, either intentionally or inadvertently, by using information from an oracle that reports a classifier’s accuracy on the test set (Blum and Hardt 2015; Hardt and Ullman 2014; Zheng 2015; Whitehill 2016). For binary classification tasks in particular, one of the most common accuracy metrics is the Area Under the ROC Curve (AUC), and in this paper we explore the mathematical structure of how the AUC is computed from an n-vector of real-valued “guesses” with respect to the ground-truth labels. Under the assumption of perfect knowledge of the test set AUC c=p/q, we show how knowing c constrains the set W of possible ground-truth labelings, and we derive an algorithm both to compute the exact number of such labelings and to enumerate over them efficiently. We also provide empirical evidence that, surprisingly, the number of compatible labelings can actually decrease as n grows, until a test set-dependent threshold is reached. Finally, we show how W can be efficiently whittled down, through pairs of oracle queries, to infer all the ground-truth test labels with complete certainty.
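
To make the constraint concrete, the following sketch enumerates every labeling of a small guess vector and keeps those whose exact AUC equals a known value c. It is a brute-force illustration that is exponential in n, not the efficient counting and enumeration algorithm the paper derives; the helper names `exact_auc` and `compatible_labelings` and the example guesses are assumptions.

```python
from fractions import Fraction
from itertools import product


def exact_auc(labels, guesses):
    """AUC as an exact fraction p/q; returns None if a class is empty (AUC undefined)."""
    pos = [g for y, g in zip(labels, guesses) if y == 1]
    neg = [g for y, g in zip(labels, guesses) if y == 0]
    if not pos or not neg:
        return None
    # Count correctly ordered pairs twice so that ties can contribute one half.
    twice_wins = sum(2 if p > n else 1 if p == n else 0 for p in pos for n in neg)
    return Fraction(twice_wins, 2 * len(pos) * len(neg))


def compatible_labelings(guesses, c):
    """Brute-force the set W of binary labelings whose AUC equals c exactly."""
    return [
        labels
        for labels in product((0, 1), repeat=len(guesses))
        if exact_auc(labels, guesses) == c
    ]


# Hypothetical guesses for a 4-example test set and an oracle-reported AUC of 3/4.
guesses = [0.1, 0.4, 0.35, 0.8]
W = compatible_labelings(guesses, Fraction(3, 4))
print(W)  # [(0, 0, 1, 1)]
```

In this toy case a single AUC value already pins down the labeling, because with four distinct guesses only a 2-positive, 2-negative split can yield a denominator of 4.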

Sensitivity and Specificity of On-Field Visible Signs of Concussion in the National Football League

Neurosurgery ◽ 10.1093/neuros/nyaa072 ◽ 2020 ◽ Vol 87 (3) ◽ pp. 530-537 ◽ Cited By ~ 5

Author(s): Robert J Elbin ◽ Scott L Zuckerman ◽ Allen K Sills ◽ Jeff R Crandall ◽ David J Lessley ◽ ...

Keyword(s): Sensitivity And Specificity ◽ Roc Curve ◽ Operating Characteristic ◽ National Football League ◽ Area Under The Curve ◽ Clinical Decision ◽ Predictive Utility ◽ Frequency Sensitivity ◽ Video Footage ◽ Sensitivity Specificity

BACKGROUND: On-field visible signs (VS) are used to help identify sport-related concussion (SRC) in the National Football League (NFL). However, the predictive utility of a VS checklist for SRC is unknown. OBJECTIVE: To report the frequency, sensitivity, specificity, and predictive value of VS in a cohort of NFL athletes. METHODS: On-field VS ratings from 2 experts who independently reviewed video footage of a cohort of 251 injury plays that resulted in an SRC diagnosis (n = 211) and no diagnosis (n = 40) from the 2017 NFL season were examined. The frequency, sensitivity, specificity, and a receiver operating characteristic (ROC) curve with area under the curve (AUC) were calculated for each VS. RESULTS: Slow to get up (65.9%) and motor incoordination (28.4%) were the most frequent VS in concussed athletes, and slow to get up (60.0%) was the most common VS among nonconcussed athletes. The most sensitive VS was slow to get up (66%); the most specific signs in concussed NFL athletes were blank/vacant look and impact seizure (both 100%). Approximately 26% of concussed NFL players did not exhibit a VS, and the overall sensitivity and specificity for the VS checklist to detect SRC were 73% and 65%, respectively. The VS checklist demonstrated “poor” ability to discriminate between SRC and non-SRC groups (AUC = 0.66). CONCLUSION: In the NFL, the diagnosis of concussion cannot be made from on-field VS alone. The VS checklist is one part of the comprehensive sideline/acute evaluation of concussion, and the diagnosis remains a multimodal clinical decision.

Area under ROC curve, sensitivity, specificity of N-terminal probrain natriuretic peptide in predicting mortality in various subsets of patients with ischemic heart disease

Clinical Research in Cardiology ◽ 10.1007/s00392-007-0562-4 ◽ 2007 ◽ Vol 96 (10) ◽ pp. 763-765 ◽ Cited By ~ 5

Author(s): Gjin Ndrepepa ◽ Siegmund Braun ◽ Adnan Kastrati ◽ Albert Schömig

Keyword(s): Heart Disease ◽ Ischemic Heart Disease ◽ Natriuretic Peptide ◽ Roc Curve ◽ Ischemic Heart ◽ Area Under Roc Curve ◽ Sensitivity Specificity

Combination IETA Ultrasonographic Characteristics Simple Scoring Method With Tumor Biomarkers Effectively Improves the Differentiation Ability of Benign and Malignant Lesions in Endometrium and Uterine Cavity

Frontiers in Oncology ◽ 10.3389/fonc.2021.605847 ◽ 2021 ◽ Vol 11

Author(s): Dongmei Lin ◽ Liang Zhao ◽ Yunxiao Zhu ◽ Yujun Huang ◽ Kun Yuan ◽ ...

Keyword(s): Roc Curve ◽ Operating Characteristic ◽ Uterine Cavity ◽ Tumor Biomarkers ◽ Coincidence Rate ◽ Benign Lesions ◽ Scoring Method ◽ Cutoff Value ◽ Malignant Lesions ◽ Sensitivity Specificity

Objectives: To evaluate the International Endometrial Tumor Analysis (IETA) ultrasonographic characteristics simple scoring method and tumor biomarkers for the diagnosis of uterine cavity and endometrial lesions. Methods: We classified and scored the normalized description of IETA ultrasonic characteristics, according to IETA expert consensus literature, previous IETA-related research articles, and the previous research experience of this project group. We conducted a retrospective analysis of the ultrasound images of 594 patients enrolled from January 2017 to June 2020, scored them item by item, and finally calculated the total score of each case. Meanwhile, we combined the results of seven tumor biomarkers. The objective was to evaluate the sensitivity, specificity, coincidence rate, and the area under the receiver operating characteristic (ROC) curve of the IETA ultrasonographic characteristics simple scoring method and tumor biomarkers for benign and malignant uterine cavity or endometrial lesions. The diagnostic efficiency of the combined method and the single method was compared. Results: A total of 594 cases were confirmed by postoperative pathology or surgery records, including 475 benign lesions and 119 malignant lesions. In the simple ultrasound scoring method, the average score of benign lesions was 3.879 ± 1.279 and that of malignant lesions was 9.676 ± 4.491. If ≥6.5 points was taken as the cutoff value for the judgment of malignant lesions, the sensitivity, specificity, coincidence rate, and the area under the ROC curve (AUC) were 76.5%, 96.0%, 92.1%, and 0.935, respectively. The difference in tumor antigen 19-9 (CA19-9) and human epididymal protein 4 (HE4) between benign and malignant lesions was statistically significant (all p ≤ 0.01). The other five tumor biomarkers (CA125, CA15-3, SCC-Ag, AFP, and CEA) showed no statistically significant difference between benign and malignant lesions. If a CA19-9 value ≥13.96 U/ml was taken as the cutoff value, the sensitivity, specificity, and coincidence rate of the diagnosis of endometrial benign and malignant lesions were 54.8%, 74.7%, and 70.7%, respectively, and the AUC was 0.620. If an HE4 value ≥39.075 pmol/L was taken as the cutoff point, the sensitivity, specificity, coincidence rate, and AUC were 77.4%, 67.9%, 69.8%, and 0.796, respectively. The sensitivity increased to 97.6% and the AUC was 0.939 when the IETA ultrasound characteristics simple scoring method was combined with CA19-9 and HE4 in a parallel test. Conclusions: With ≥6.5 points as the cutoff value, the IETA ultrasound characteristics simple scoring method could quickly and accurately distinguish benign from malignant uterine cavity and endometrial lesions, with high diagnostic value. The diagnostic efficacy of all seven tumor biomarkers was mediocre. Combining the two methods could improve sensitivity and accuracy and reduce the risk of missed diagnosis.
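
The reported figures for the ultrasound score at the ≥6.5 cutoff can be cross-checked with a short calculation. The sketch below assumes that "coincidence rate" means overall agreement with pathology (accuracy) and uses confusion-matrix counts that are not stated in the abstract: 91 true positives and 456 true negatives are inferred here from the rounded percentages and the 119 malignant and 475 benign cases. The function name `diagnostic_metrics` is illustrative.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and overall agreement from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # malignant lesions correctly flagged
    specificity = tn / (tn + fp)   # benign lesions correctly cleared
    coincidence = (tp + tn) / (tp + fn + tn + fp)  # assumed to equal accuracy
    return sensitivity, specificity, coincidence


# Inferred (not reported) counts: 119 malignant = 91 TP + 28 FN, 475 benign = 456 TN + 19 FP.
sens, spec, coin = diagnostic_metrics(tp=91, fn=28, tn=456, fp=19)
print(f"sensitivity={sens:.1%}  specificity={spec:.1%}  coincidence={coin:.1%}")
```

Under these assumed counts the three values round to 76.5%, 96.0%, and 92.1%, which agrees with the abstract and supports reading the coincidence rate as accuracy.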

Empirical Comparison of Area under ROC curve (AUC) and Mathew Correlation Coefficient (MCC) for Evaluating Machine Learning Algorithms on Imbalanced Datasets for Binary Classification

Proceedings of the 3rd International Conference on Machine Learning and Soft Computing - ICMLSC 2019 ◽ 10.1145/3310986.3311023 ◽ 2019 ◽ Cited By ~ 3

Author(s): Chongomweru Halimu ◽ Asem Kasem ◽ S. H. Shah Newaz

Keyword(s): Machine Learning ◽ Correlation Coefficient ◽ Roc Curve ◽ Binary Classification ◽ Learning Algorithms ◽ Machine Learning Algorithms ◽ Empirical Comparison ◽ Imbalanced Datasets ◽ Area Under Roc Curve

A novel parameter derived from post-processing procedure of dual energy CT for identification of gout

Scientific Reports ◽ 10.1038/s41598-021-01100-0 ◽ 2021 ◽ Vol 11 (1)

Author(s): Chunlin Xiang ◽ Hongyan Zhang ◽ Gang Wu

Keyword(s): Lower Limit ◽ Roc Curve ◽ Operating Characteristic ◽ Dual Energy ◽ Dual Energy Ct ◽ Post Processing ◽ Optimal Cutoff ◽ Processing Procedure ◽ Area Under Roc Curve ◽ Display Window

ROI analysis is frequently used to obtain uric acid content on rapid-kV-switching dual energy CT (DECT), but it provides inadequate accuracy. We used a new parameter derived from the post-processing procedure, maximum lower limit with stain visible (MLLSV), to diagnose gout. Thirty gout patients and 20 healthy volunteers were analyzed using MLLSV. MLLSV was defined as the maximum lower limit of the display window allowing only one stained site to remain visible. Radiologists were asked to continuously increase the lower limit of the display window of uric acid, decreasing the number of stained sites until the last stained site disappeared. MLLSV obtained in this way was compared between gout patients and volunteers. A receiver operating characteristic (ROC) curve was used to determine performance. MLLSV of gout patients was significantly higher than that of volunteers (1373.3 ± 23.0 mg/cm³ vs. 1315.4 ± 20.7 mg/cm³, p = 0.000). The area under the ROC curve of MLLSV was 0.993 in identifying gout. When using the optimal cutoff of 1342 mg/cm³, the sensitivity and specificity of MLLSV in the identification of gout were 96.7% and 95%, respectively. MLLSV derived from the post-processing procedure of DECT is useful in discriminating gout patients from healthy people.

A case-control clinical trial on the diagnostic performance for Alzheimer’s Disease of a deep learning-based classification system using brain magnetic resonance imaging

10.21203/rs.3.rs-754254/v1 ◽ 2021

Author(s): Jong Bin Bae ◽ Subin Lee ◽ Hyunwoo Oh ◽ Jinkyeong Sung ◽ Dongsoo Lee ◽ ...

Keyword(s): Predictive Value ◽ Diagnostic Performance ◽ Operating Characteristic ◽ Control Clinical Trial ◽ Characteristic Curve ◽ Brain Mri ◽ Case Control ◽ Operating Characteristic Curve ◽ Sensitivity Specificity ◽ Normal Controls

Objective: To investigate the diagnostic performance of a deep learning-based classification system using structural brain MRI (DLCS) for Alzheimer’s disease (AD). Methods: A single-center, case-control clinical trial was conducted. T1-weighted brain MRI scans of 188 patients with mild cognitive impairment or dementia due to AD and 162 cognitively normal controls were retrospectively collected. The patients were amyloid beta (Aβ)-positive, whereas the controls were Aβ-negative, on 18F-florbetaben positron emission tomography. Sensitivity, specificity, positive predictive value, negative predictive value, and area under the receiver operating characteristic curve were calculated to evaluate the performance of DLCS in the classification of Aβ-positive AD patients from Aβ-negative controls. Results: The DLCS was excellent in classifying AD patients from normal controls; sensitivity, specificity, positive predictive value, negative predictive value, and area under the receiver operating characteristic curve for AD were 85.6% (95% CI, 79.8–90), 90.1% (95% CI, 84.5–94.2), 91.0% (95% CI, 86.3–94.1), 84.4% (95% CI, 79.2–88.5), and 0.937 (95% CI, 0.911–0.963), respectively. Conclusion: The DLCS shows promise in clinical settings where it may improve early detection of AD in any individual who has undergone an MRI scan regardless of purpose. Trial registration: Korean Clinical Trials Registry, KCT0004758. Registered 21 February 2020, https://cris.nih.go.kr/cris/search/detailSearch.do/17665.
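
Because predictive values depend on the case mix as well as on sensitivity and specificity, the reported PPV and NPV reflect the 188:162 ratio of patients to controls in this case-control sample. The sketch below applies Bayes' rule to show that dependence. The counts behind the sensitivity and specificity (161 of 188 and 146 of 162) are inferred from the rounded percentages rather than reported, and the 10% prevalence is a purely hypothetical value chosen for illustration; `predictive_values` is an illustrative helper name.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive value at a given disease prevalence (Bayes' rule)."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Inferred (not reported) counts: 161/188 patients and 146/162 controls classified correctly.
sens = 161 / 188
spec = 146 / 162

# Case mix of the study sample: 188 patients out of 350 participants.
ppv, npv = predictive_values(sens, spec, prevalence=188 / 350)
print(f"study mix:       PPV={ppv:.1%}  NPV={npv:.1%}")

# Hypothetical 10% prevalence, for illustration only.
ppv_low, npv_low = predictive_values(sens, spec, prevalence=0.10)
print(f"10% prevalence:  PPV={ppv_low:.1%}  NPV={npv_low:.1%}")
```

With the study's own case mix the inferred counts give the 91.0% and 84.4% stated in the abstract; at a lower prevalence the same classifier would show a lower PPV and a higher NPV.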

CSF markers for diagnosis of bacterial meningitis in neurosurgical postoperative patients

Arquivos de Neuro-Psiquiatria ◽ 10.1590/s0004-282x2006000400012 ◽ 2006 ◽ Vol 64 (3a) ◽ pp. 592-595 ◽ Cited By ~ 23

Author(s): Wagner Malagó Tavares ◽ Andre Guelman Machado ◽ Hamilton Matushita ◽ Jose Pindaro P. Plese

Keyword(s): Bacterial Meningitis ◽ Prospective Study ◽ Receiver Operating Characteristic ◽ Aseptic Meningitis ◽ Roc Curve ◽ Cerebral Spinal Fluid ◽ Operating Characteristic ◽ Neurosurgical Patients ◽ Area Under Roc Curve ◽ Diagnostic Usefulness

OBJECTIVE: To evaluate the diagnostic usefulness of cerebral spinal fluid (CSF) cellularity, protein, neutrophils, glucose and lactate for detection of postoperative bacterial meningitis. METHOD: This prospective study was conducted in 28 postoperative neurosurgical patients from 2002 to 2005 at the University of São Paulo. The CSF markers were plotted in a receiver operating characteristic (ROC) curve to evaluate their accuracy. RESULTS: Based on the area under the ROC curve, CSF glucose, cellularity, and lactate were considered good tests. Polymorphonuclear cells and protein did not achieve enough accuracy to be used clinically. CONCLUSION: CSF glucose, lactate, and cellularity can be used for the diagnosis of bacterial meningitis. Moreover, they can help to differentiate bacterial from aseptic meningitis.

Diagnostic Value of Clinical Prediction Scores for Acute Appendicitis in Children Younger than 4 Years

European Journal of Pediatric Surgery ◽ 10.1055/s-0041-1722860 ◽ 2021

Author(s): Ricardo Rassi ◽ Florencia Muse ◽ José Sánchez-Martínez ◽ Eduardo Cuestas

Keyword(s): Acute Appendicitis ◽ Roc Curve ◽ Operating Characteristic ◽ Diagnostic Value ◽ Clinical Prediction ◽ Alvarado Score ◽ Predictive Values ◽ Optimal Criterion ◽ Area Under Roc Curve ◽ Appendicitis Score

Introduction: Acute appendicitis can be difficult to diagnose, especially in children < 4 years old. The aim of the present study was to assess the diagnostic value of the Alvarado score (AS), the appendicitis inflammatory response (AIR) score, and the pediatric appendicitis score (PAS) in children younger than 4 years. Materials and Methods: All children younger than 4 years who underwent appendicectomy between 2005 and 2019 were included retrospectively. The diagnostic performance of the scores was analyzed using the area under the receiver-operating characteristic (ROC) curve and by calculating the diagnostic performances at optimal criterion value cutoff points. Results: In this study, 100 children were included (58 boys and 42 girls) with a median age of 39.5 (12–47) months. Ninety children were diagnosed with pathologically proven acute appendicitis. The area under the ROC curve was 0.73 for AS, 0.79 for AIR score, and 0.69 for PAS (p > 0.05, respectively). In children with low risk of acute appendicitis, negative predictive values were 75.0% for AS, 50.0% for AIR score, and 66.7% for PAS (p < 0.05, respectively). The positive predictive values in children with high risk of acute appendicitis were 92.7% for AS, 92.6% for AIR score, and 93.6% for PAS (p > 0.05, respectively). AS, AIR score, and PAS plus positive ultrasonography had areas under the ROC curve of 0.58, 0.49, and 0.88, respectively. Conclusion: The three scores can be of assistance in the suspicion of acute appendicitis. PAS improved markedly when combined with positive ultrasonography, but none can be used in setting the diagnosis of acute appendicitis in young children.

Bases for Strategy Choice and Strategy Transitions in Binary Classification Tasks

PsycEXTRA Dataset ◽ 10.1037/e537052012-284 ◽ 2004

Author(s): Lyle E. Bourne ◽ Alice F. Healy ◽ James A. Kole ◽ William D. Raymond

Keyword(s): Binary Classification ◽ Strategy Choice ◽ Classification Tasks
