An Extension of the Receiver Operating Characteristic Curve and AUC-Optimal Classification

2012 ◽  
Vol 24 (10) ◽  
pp. 2789-2824 ◽  
Author(s):  
Takashi Takenouchi ◽  
Osamu Komori ◽  
Shinto Eguchi

While most proposed methods for solving classification problems focus on minimization of the classification error rate, we are interested in the receiver operating characteristic (ROC) curve, which provides more information about classification performance than the error rate does. The area under the ROC curve (AUC) is a natural measure for overall assessment of a classifier based on the ROC curve. We discuss a class of concave functions for AUC maximization in which a boosting-type algorithm including RankBoost is considered, and the Bayesian risk consistency and the lower bound of the optimum function are discussed. A procedure derived by maximizing a specific optimum function has high robustness, based on gross error sensitivity. Additionally, we focus on the partial AUC, which is the partial area under the ROC curve. For example, in medical screening, a high true-positive rate at a fixed low false-positive rate is preferable, and thus the partial AUC corresponding to lower false-positive rates is much more important than the remaining AUC. We extend the class of concave optimum functions for partial AUC optimality with the boosting algorithm. We investigated the validity of the proposed method through several experiments with data sets in the UCI repository.
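The AUC and partial AUC described in this abstract can be illustrated with a minimal sketch. It is not the authors' boosting procedure: it only computes the empirical ROC curve, the trapezoidal area up to a chosen false-positive rate, and the pairwise (Mann-Whitney) form of the AUC that ranking methods such as RankBoost target. The function names, the `max_fpr` parameter, and the toy scores are assumptions made for illustration.

```python
def roc_points(scores, labels):
    """ROC points (FPR, TPR) from (0, 0) to (1, 1); labels are 1 (positive) or 0."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def partial_auc(points, max_fpr=1.0):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr] (unnormalized)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the last segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def auc_pairwise(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): classifier scores and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
print(partial_auc(curve))        # full AUC
print(partial_auc(curve, 0.25))  # partial AUC for FPR <= 0.25
print(auc_pairwise(scores, labels))
```

With `max_fpr=1.0` the trapezoidal area agrees with the pairwise count, which is why a pairwise ranking objective can stand in for the AUC; restricting `max_fpr` keeps only the low false-positive region that the screening example emphasizes.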

1991 ◽  
Vol 124 (3) ◽  
pp. 295-306 ◽  
Author(s):  
A. D. Genazzani ◽  
D. Rodbard

Abstract. We utilize the "Receiver Operating Characteristic" to describe the relationship between sensitivity and specificity as the threshold for peak detection is varied systematically, to provide objective comparison of the performance of methods for detection of episodic hormonal secretion. A computer program was used to generate synthetic data with peaks with variable durations, with constant or variable height, shape and/or interpulse interval. This approach was used to compare the CLUSTER and DETECT programs. For both programs, the observed false positive rates estimated using signal-free data were in good agreement with the nominal rates, but in the presence of signal the observed false positive rates were systematically lower. Sensitivity increases with increasing signal/noise ratio, as expected. Program DETECT, using its standard options, provided excellent sensitivity (90-100%) with very low false positive rate under all conditions tested. Its performance could be further improved by the use of a more stringent definition of a peak requiring the presence of "UP" followed by a "DOWN". The CLUSTER program was found to have very poor sensitivity when using the "local variance" option. Use of the true fixed standard deviation or percent coefficient of variation resulted in a modest improvement. Optimal performance of program CLUSTER was obtained by the use of the best of 3 variance models, testing 12 different cluster sizes (from 1×1 to 4×4) and selecting the best among these: under these conditions it can achieve high sensitivity (90-100%) for very low observed false positive rate, such that its performance was comparable to that of DETECT.
The methods developed and illustrated here should permit the definitive characterization and validation of the performance of any one method, the objective comparison of the relative performance of two or more methods for analysis of pulsatile hormone levels for episodic hormone secretion, and lead to the improvement of algorithms for peak detection.


1981 ◽  
Vol 27 (9) ◽  
pp. 1569-1574 ◽  
Author(s):  
E A Robertson ◽  
M H Zweig

Abstract. The usefulness of an analytical system in patient care is ultimately judged not by its analytical performance but by its clinical performance, i.e., its ability to separate apparently similar patients into two subgroups, one of which has a particular clinically important condition and another subgroup which does not. This clinical performance can be studied with the tools of signal detectability theory, originally developed to analyze the performance of radar and data-transmission systems. Each classification made by an analytical system may be categorized as a true-positive, true-negative, false-positive, or false-negative decision. For laboratory tests the proportion of decisions in each category depends on the biological overlap between the two subgroups, the analytical performance of the system, and the decision level chosen. The clinical performance of the analytical system for all possible decision levels is represented by the receiver operating characteristic curve, which plots the true-positive rate against the false-positive rate. The use of these curves permits comparison of alternative analytical techniques at equal true-positive rates and at all possible decision levels. These comparisons show the effect of analytical improvements on clinical performance.
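The four decision categories and the resulting ROC point can be sketched as follows. This is a minimal illustration, assuming that a test value at or above the decision level is called positive; the function names and the toy measurements are hypothetical and do not come from the paper.

```python
def decision_counts(values, has_condition, level):
    """Count (TP, FP, TN, FN) when 'value >= level' is called positive."""
    tp = fp = tn = fn = 0
    for value, condition in zip(values, has_condition):
        called_positive = value >= level
        if called_positive and condition:
            tp += 1
        elif called_positive and not condition:
            fp += 1
        elif not called_positive and condition:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def roc_point(values, has_condition, level):
    """(false-positive rate, true-positive rate) at one decision level."""
    tp, fp, tn, fn = decision_counts(values, has_condition, level)
    return fp / (fp + tn), tp / (tp + fn)


# Hypothetical test results; True marks patients who have the condition.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
has_condition = [False, False, True, False, True, True]

# Sweeping the decision level over every observed value traces the ROC curve.
for level in sorted(set(values), reverse=True):
    print(level, roc_point(values, has_condition, level))
```

Each decision level contributes one point; moving the level changes how the overlap between the two subgroups is split into false positives and false negatives.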


PEDIATRICS ◽  
1991 ◽  
Vol 87 (5) ◽  
pp. 670-674 ◽  
Author(s):  
David M. Jaffe ◽  
Gary R. Fleisher

This study was designed to quantify more precisely the accuracy of magnitude of rectal temperature and total white blood cell (WBC) count as indicators of bacteremia in children with an obvious focal bacterial infection. A total of 955 children, aged 3 to 36 months, who had rectal temperature ≥39.0°C and were seeking care at either of two urban pediatric emergency departments had blood drawn for culture; 885 had blood drawn for WBC count. Twenty-seven had bacteremia. Various combinations of temperature and WBC count were selected to construct receiver-operating-characteristic curves by plotting sensitivity vs false-positive rate (1 - specificity). The receiver-operating-characteristic curve of WBC count provided significantly better diagnostic information than the curve for temperature increments above 39.0°C. Each increment of 0.5°C led to large decrements in sensitivity and false-positive rates. At a WBC count cutoff of 10 000/mm³, the sensitivity was 92% while the false-positive rate was 57%. Using this cutoff point, the clinician could have avoided performing 368 of 955 blood cultures and missed only 2 of 26 children with bacteremia. Receiver-operating-characteristic curves combining WBC count and temperature increments above 39.0°C provided no better diagnostic information than that of WBC count at a temperature cutoff of 39.0°C. It is concluded that increments in temperature above 39.0°C provided additional diagnostic specificity for bacteremia only at the expense of unacceptable decreases in sensitivity. Total WBC count provided better information. A WBC count cutoff of 10 000/mm³ increased specificity with minimal decrease in sensitivity. Receiver-operating-characteristic curve analysis allows selection of cutoff criteria by individual practitioners based on the prevalence of bacteremia in their communities and on the perceived risks of bacteremia.


2021 ◽  
pp. 096228022110605
Author(s):  
Luigi Lavazza ◽  
Sandro Morasca

Receiver Operating Characteristic curves have been widely used to represent the performance of diagnostic tests. The corresponding area under the curve, widely used to evaluate their performance quantitatively, has been criticized in several respects. Several proposals have been introduced to improve area under the curve by taking into account only specific regions of the Receiver Operating Characteristic space, that is, the plane to which Receiver Operating Characteristic curves belong. For instance, a region of interest can be delimited by setting specific thresholds for the true positive rate or the false positive rate. Different ways of setting the borders of the region of interest may result in completely different, even opposing, evaluations. In this paper, we present a method to define a region of interest in a rigorous and objective way, and compute a partial area under the curve that can be used to evaluate the performance of diagnostic tests. The method was originally conceived in the Software Engineering domain to evaluate the performance of methods that estimate the defectiveness of software modules. We compare this method with previous proposals. Our method allows the definition of regions of interest by setting acceptability thresholds on any kind of performance metric, and not just false positive rate and true positive rate: for instance, the region of interest can be determined by imposing that φ (also known as the Matthews Correlation Coefficient) is above a given threshold. We also show how to delimit the region of interest corresponding to acceptable costs, whenever the individual cost of false positives and false negatives is known. Finally, we demonstrate the effectiveness of the method by applying it to the Wisconsin Breast Cancer Data. We provide Python and R packages supporting the presented method.
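The idea of delimiting a region of interest by a threshold on the Matthews Correlation Coefficient can be sketched as below. This is not the authors' package and does not reproduce their partial-area computation; it only evaluates the standard Matthews Correlation Coefficient formula at a point of ROC space for a given prevalence and keeps the points that meet a threshold. The function names, the prevalence, the threshold, and the sample ROC points are all assumptions for illustration.

```python
import math


def mcc_at(fpr, tpr, prevalence):
    """Matthews Correlation Coefficient at ROC point (fpr, tpr).

    Confusion-matrix cells are expressed as fractions of the whole sample,
    so only the prevalence (share of actual positives) is needed.
    """
    tp = prevalence * tpr
    fn = prevalence * (1.0 - tpr)
    fp = (1.0 - prevalence) * fpr
    tn = (1.0 - prevalence) * (1.0 - fpr)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0.0:
        return 0.0  # convention used here when a margin is empty
    return (tp * tn - fp * fn) / denom


def region_of_interest(points, prevalence, min_mcc):
    """Keep the ROC points whose coefficient reaches the acceptability threshold."""
    return [(x, y) for x, y in points if mcc_at(x, y, prevalence) >= min_mcc]


# Hypothetical ROC points (FPR, TPR) and settings.
points = [(0.0, 0.0), (0.05, 0.5), (0.1, 0.8), (0.3, 0.9), (0.6, 0.97), (1.0, 1.0)]
print(region_of_interest(points, prevalence=0.3, min_mcc=0.5))
```

Because the coefficient depends on prevalence, the same ROC point can fall inside the region for one data set and outside it for another, which is one reason a threshold on such a metric behaves differently from a fixed cut on the false positive rate.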


Author(s):  
Mario A. Cleves

The area under the receiver operating characteristic (ROC) curve is often used to summarize and compare the discriminatory accuracy of a diagnostic test or modality, and to evaluate the predictive power of statistical models for binary outcomes. Parametric maximum likelihood methods for fitting of the ROC curve provide direct estimates of the area under the ROC curve and its variance. Nonparametric methods, on the other hand, provide estimates of the area under the ROC curve, but do not directly estimate its variance. Three algorithms for computing the variance for the area under the nonparametric ROC curve are commonly used, although ambiguity exists about their behavior under diverse study conditions. Using simulated data, we found similar asymptotic performance between these algorithms when the diagnostic test produces results on a continuous scale, but found notable differences in small samples, and when the diagnostic test yields results on a discrete diagnostic scale.
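A nonparametric AUC estimate and one variance approximation can be sketched as follows. The abstract does not name the three algorithms it compares, so the choice here is an assumption: the sketch uses the well-known Hanley and McNeil (1982) approximation, which may or may not be among them. Ties between a positive and a negative result count as half a concordant pair, which matters for tests reported on a discrete scale. Function names and data are hypothetical.

```python
import math


def auc_nonparametric(pos, neg):
    """Mann-Whitney estimate of the AUC; ties count as half a concordant pair."""
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def hanley_mcneil_variance(auc, n_pos, n_neg):
    """Hanley-McNeil (1982) approximation to the variance of the AUC estimate."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    numerator = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    )
    return numerator / (n_pos * n_neg)


# Hypothetical ratings on a discrete 1-5 scale.
diseased = [5, 4, 4, 3, 5, 2, 4]
healthy = [1, 2, 3, 2, 1, 4, 2]
auc = auc_nonparametric(diseased, healthy)
se = math.sqrt(hanley_mcneil_variance(auc, len(diseased), len(healthy)))
print(auc, se)
```

The approximation depends on the data only through the AUC and the two group sizes, so it cannot reflect how heavily tied a discrete scale is; estimators built from the individual pairwise comparisons can, which is one way small-sample and discrete-scale differences between methods can arise.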


2020 ◽  
Vol 11 (02) ◽  
pp. 261-266 ◽  
Author(s):  
Ramdas S. Ransing ◽  
Neha Gupta ◽  
Girish Agrawal ◽  
Nilima Mahapatro

Abstract. Objective: Panic disorder (PD) is associated with changes in platelet and red blood cell (RBC) indices. However, the diagnostic or predictive value of these indices is unknown. This study assessed the diagnostic and discriminating value of platelet and RBC indices in patients with PD. Materials and Methods: In this cross-sectional study including patients with PD (n = 98) and healthy controls (n = 102), we compared the following blood indices: mean platelet volume (MPV), platelet distribution width (PDW), and RBC distribution width (RDW). The receiver operating characteristic (ROC) curve was used to calculate the area under the ROC curve (AUC), sensitivity, specificity, and likelihood ratio for the platelet and RBC indices. Results: Statistically significant increases in PDW (17.01 ± 0.91 vs. 14.8 ± 2.06; p < 0.0001) and RDW (16.56 ± 2.32 vs. 15.12 ± 2.43; p < 0.0001) were observed in patients with PD. PDW and mean corpuscular hemoglobin concentration had larger AUC (0.89 and 0.74, respectively) and Youden’s index (0.65 and 0.39, respectively), indicating their higher predictive capacity as well as higher sensitivity in discriminating patients with PD from healthy controls. Conclusion: PDW can be considered a “good” diagnostic or predictive marker in patients with PD.
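Youden's index, reported in this abstract alongside the AUC, is sensitivity plus specificity minus one, usually maximized over candidate cutoffs. The sketch below shows that calculation on made-up values; it does not use the study's data, and the function names, the "value >= cutoff is positive" rule, and the sample measurements are assumptions.

```python
def youden_index(sensitivity, specificity):
    """Youden's J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def best_cutoff(values, is_case):
    """Return (cutoff, J) maximizing Youden's index, calling value >= cutoff positive."""
    n_case = sum(is_case)
    n_control = len(is_case) - n_case
    best = None
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, is_case) if y and v >= cutoff)
        tn = sum(1 for v, y in zip(values, is_case) if not y and v < cutoff)
        j = youden_index(tp / n_case, tn / n_control)
        if best is None or j > best[1]:
            best = (cutoff, j)
    return best


# Hypothetical marker values; True marks patients, False marks healthy controls.
values = [13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0]
is_case = [False, False, False, True, True, False, True, True]
print(best_cutoff(values, is_case))
```

A single maximum of the index identifies the cutoff where the ROC curve is farthest above the chance diagonal, which is why it is often quoted together with the AUC.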


2000 ◽  
Vol 23 (2) ◽  
pp. 134-139 ◽  
Author(s):  
Vinod Shidham ◽  
Dilip Gupta ◽  
Lorenzo M. Galindo ◽  
Marian Haber ◽  
Carolyn Grotkowski ◽  
...  
