Human Versus Machine: How Do We Know Who Is Winning? ROC Analysis for Comparing Human and Machine Performance under Varying Cost-Prevalence Assumptions

Author(s):  
Michael Merry ◽  
Patricia Jean Riddle ◽  
Jim Warren

Abstract Background: Receiver operating characteristic (ROC) analysis is commonly used for comparing models and humans; however, the exact analytical techniques vary and some are flawed. Objectives: The aim of the study is to identify common flaws in ROC analysis for human versus model performance, and to address them. Methods: We review current use and identify common errors. We also review the ROC analysis literature for more appropriate techniques. Results: We identify concerns in three techniques: (1) using mean human sensitivity and specificity; (2) assuming humans can be approximated by ROCs; and (3) matching sensitivity and specificity. We identify a technique from Provost et al. using dominance tables and cost-prevalence gradients that can be adapted to address these concerns. Conclusion: Dominance tables and cost-prevalence gradients provide far greater detail when comparing performances of models and humans, and address common failings in other approaches. This should be the standard method for such analyses moving forward.
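A rough illustration of the cost-prevalence idea follows. It is a minimal sketch, not the authors' implementation. It assumes the iso-performance gradient commonly used in the ROC literature, m = ((1 - prevalence) * cost_FP) / (prevalence * cost_FN), and treats the operating point with the largest TPR - m * FPR as the winner for that gradient. The operating points and their names (`human_A`, `model_t1`, and so on) are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def gradient(prevalence: float, cost_fp: float, cost_fn: float) -> float:
    """Slope of an iso-performance line in ROC space for given costs and prevalence."""
    return ((1.0 - prevalence) * cost_fp) / (prevalence * cost_fn)


def best_operator(points: Dict[str, Point], m: float) -> str:
    """Name of the operating point that maximizes TPR - m * FPR."""
    return max(points, key=lambda name: points[name][1] - m * points[name][0])


def dominance_table(points: Dict[str, Point], gradients: List[float]) -> List[Tuple[float, str]]:
    """For each gradient, record which human or model operating point wins."""
    return [(m, best_operator(points, m)) for m in gradients]


# Hypothetical operating points, not taken from the study.
points = {
    "human_A": (0.10, 0.70),
    "human_B": (0.30, 0.95),
    "model_t1": (0.05, 0.65),  # model at a strict threshold
    "model_t2": (0.20, 0.88),  # model at a lenient threshold
}

table = dominance_table(points, [0.25, 1.0, 4.0])
for m, winner in table:
    print(f"gradient {m}: {winner}")
```

With these made-up points the winner changes as the gradient changes, which is the kind of detail a single summary number would hide.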

2020 ◽  
Vol 98 (Supplement_3) ◽  
pp. 7-8
Author(s):  
Miriam S Martin ◽  
Michael Kleinhenz ◽  
Karen Schwartzkopf-Genswein ◽  
Johann Coetzee

Abstract Biomarkers are commonly used to assess pain and analgesic drug efficacy in livestock. However, the diagnostic sensitivity and specificity of these biomarkers for different pain conditions over time have not been described. Receiver operating characteristic (ROC) curves are graphical plots that illustrate the diagnostic ability of a test as its discrimination threshold is varied. The objective of this analysis was to use area under the curve (AUC) values derived from ROC analysis to assess the predictive value of pain biomarkers at specific timepoints. The biomarkers included in the analysis were blood cortisol, salivary cortisol, hair cortisol, infrared thermography (IRT), mechanical nociceptive threshold (MNT), substance P, and outcomes from a pressure/force measurement system and visual analog scale. A total of 7,992 biomarker outcomes, collected from 6 pain studies involving pain associated with castration, dehorning, lameness, and surgery, were included in the analysis. Each study consisted of three treatments: pain, no pain, and analgesia. All statistics were performed using statistical software (JMP Pro 14.0, SAS Institute, Inc., Cary, NC). Results comparing analgesia versus pain yielded good diagnostic accuracy (AUC > 0.7; 95% CI: 0.40 to 0.99) for blood cortisol (timepoints 1.5, 2, and 6 hours); IRT (timepoints 6, 8, 12, and 72 hours); and MNT (timepoints 6, 25, and 49 hours). These results indicate that ROC analysis can be a useful indicator of the predictive value of pain biomarkers, and that certain timepoints seem to yield good diagnostic accuracy while many do not.
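To make the AUC concrete, here is a minimal sketch that computes it as the probability that a randomly chosen reading from one group exceeds a randomly chosen reading from the other, with ties counted as one half (the Mann-Whitney formulation). The study itself used JMP Pro; the function name `auc_mann_whitney` and the readings below are hypothetical.

```python
from typing import Sequence


def auc_mann_whitney(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUC as the fraction of (pos, neg) pairs where pos > neg; ties count 0.5."""
    if not pos or not neg:
        raise ValueError("both groups need at least one observation")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical biomarker readings at one timepoint, not data from the study.
pain = [42.0, 55.0, 61.0, 38.0]
analgesia = [30.0, 41.0, 35.0, 38.0]

auc = auc_mann_whitney(pain, analgesia)
print(f"AUC = {auc:.3f}")
```

Repeating the calculation per biomarker and per timepoint would give a grid of AUC values comparable in spirit to the ones the abstract summarizes.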


2019 ◽  
Vol 25 (11) ◽  
pp. 1117-1126
Author(s):  
Christina S. Foley ◽  
Edwina C. Moore ◽  
Mira Milas ◽  
Eren Berber ◽  
Joyce Shin ◽  
...  

Objective: While intraoperative parathyroid hormone (IOPTH) monitoring with a ≥50% drop commonly guides the extent of exploration for primary hyperparathyroidism (pHPT), receiver operating characteristic (ROC) analysis has not been performed to determine whether other criteria yield better sensitivity and specificity. The aim of this study was to identify the optimum percent change of IOPTH following removal of the abnormal parathyroid pathology, in order to predict biochemical cure. Secondary aims were to identify patient subgroups with increased area under the ROC curve (AUC) and the need for moderated criteria. Methods: A retrospective review was performed on patients undergoing primary parathyroid surgery for sporadic pHPT between 1999 and 2010 at a tertiary center for endocrine surgery. Eight hundred and ninety-six patients with primary hyperparathyroidism were included. Multigland disease (MGD) was defined as the intraoperative detection of more than 1 enlarged hypercellular gland or persistent disease after single gland excision. ROC analysis was used to determine the value with the best performance at predicting MGD, following bilateral exploration. Results: MGD was diagnosed in 174 patients (19.4%). ROC analysis demonstrated an AUC of 0.69. An IOPTH drop of 72% was the point of optimal discrimination with a sensitivity of 55% and specificity of 76% for predicting MGD. Subgroup analysis by preoperative calcium, preoperative PTH, localization studies, or pre- and post-excision IOPTH, did not identify any factors associated with an improved AUC. Conclusion: To our knowledge, this is the first study to use ROC analysis in a large patient cohort. An IOPTH drop of 72% was found to have optimal discriminating ability. We failed to identify a subset of patients for whom there was substantial improvement in the AUC, sensitivity, or specificity. 
Abbreviations: AUC = area under the ROC curve; BE = bilateral neck exploration; FE = focal parathyroid exploration; IOPTH = intraoperative parathyroid hormone; MGD = multigland disease; MIBI = Tc99m-sestamibi I-123 subtraction single-photon emission computed tomography/computed tomography; pHPT = primary hyperparathyroidism; ROC = receiver operating characteristic; SGD = single gland disease; US = surgeon-performed neck ultrasound
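The sensitivity and specificity reported for a given IOPTH cut-off can be illustrated with a short sketch. It assumes, for illustration only, that a percent drop below the cut-off is read as a prediction of MGD; the abstract does not spell out the direction. The function name `sens_spec` and the data are hypothetical.

```python
from typing import Sequence, Tuple


def sens_spec(drops: Sequence[float], has_mgd: Sequence[bool], cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when a drop below `cutoff` is flagged as MGD (assumed direction)."""
    tp = fn = tn = fp = 0
    for drop, mgd in zip(drops, has_mgd):
        flagged = drop < cutoff
        if mgd and flagged:
            tp += 1
        elif mgd:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical percent IOPTH drops and MGD status, not data from the study.
drops = [60.0, 80.0, 65.0, 90.0, 75.0, 85.0, 70.0, 95.0]
has_mgd = [True, True, True, False, False, False, False, False]

sensitivity, specificity = sens_spec(drops, has_mgd, cutoff=72.0)
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```

Sweeping `cutoff` over a range of values and recording each pair traces out the ROC curve from which an optimal point is chosen.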


2017 ◽  
Vol 20 (2) ◽  
pp. 122-127 ◽  
Author(s):  
Saverio Paltrinieri ◽  
Marco Fossati ◽  
Valentina Menaballi

Objectives The objective of this study was to evaluate the diagnostic performances of manual and instrumental measurement of reticulocyte percentage (Ret%), reticulocyte number (Ret#) and reticulocyte production index (RPI) to differentiate regenerative anaemia (RA) from non-regenerative anaemia (NRA) in cats. Methods Data from 106 blood samples from anaemic cats with manual counts (n = 74; 68 NRA, six RA) or instrumental counts of reticulocytes (n = 32; 25 NRA, seven RA) collected between 1995 and 2013 were retrospectively analysed. Sensitivity, specificity and positive likelihood ratio (LR+) were calculated using either cut-offs reported in the literature or cut-offs determined from receiver operating characteristic (ROC) curves. Results All the reticulocyte parameters were significantly higher in cats with RA than in cats with NRA. All the ROC curves were significantly different ( P <0.001) from the line of no discrimination, without significant differences between the three parameters. Using the cut-offs published in literature, the Ret% (cut-off: 0.5%) was sensitive (100%) but not specific (<75%), the RPI (cut-off: 1.0) was specific (>92%) but not sensitive (<15%), and the Ret# (cut-off: 50 × 10³/µl) had a sensitivity and specificity >80% and the highest LR+ (manual count: 14; instrumental count: 6). For all the parameters, sensitivity and specificity approached 100% using the cut-offs determined by the ROC curves. These cut-offs were higher than those reported in the literature for Ret% (manual: 1.70%; instrumental: 3.06%), lower for RPI (manual: 0.39; instrumental: 0.59) and variably different, depending on the method (manual: 41 × 10³/µl; instrumental: 57 × 10³/µl), for Ret#. Using these cut-offs, the RPI had the highest LR+ (manual: 22.7; instrumental: 12.5). 
Conclusions and relevance This study indicated that all the reticulocyte parameters may confirm regeneration when the pretest probability is high, whereas when this probability is moderate, RA should be identified using the RPI, provided that cut-offs <1.0 are used.
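The link between LR+ and pretest probability can be shown with a small worked sketch. It uses the standard definitions LR+ = sensitivity / (1 - specificity) and post-test odds = pretest odds × LR+. The pretest probabilities chosen below are assumptions for illustration; only the LR+ of 22.7 comes from the abstract.

```python
def positive_lr(sensitivity: float, specificity: float) -> float:
    """Positive likelihood ratio: sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


def post_test_probability(pretest: float, lr: float) -> float:
    """Convert a pretest probability to a post-test probability via odds."""
    pretest_odds = pretest / (1.0 - pretest)
    post_odds = pretest_odds * lr
    return post_odds / (1.0 + post_odds)


# LR+ reported for the manual RPI with the ROC-derived cut-off.
lr_rpi_manual = 22.7

# Assumed pretest probabilities (illustrative, not from the study).
for pretest in (0.2, 0.5, 0.8):
    post = post_test_probability(pretest, lr_rpi_manual)
    print(f"pretest {pretest:.1f} -> post-test {post:.3f}")
```

A high LR+ shifts even a moderate pretest probability substantially, which is consistent with the abstract's recommendation to rely on the RPI when the pretest probability is moderate.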


2017 ◽  
Vol 2017 ◽  
pp. 1-9 ◽  
Author(s):  
Cheng-Hong Yang ◽  
Sin-Hua Moi ◽  
Li-Yeh Chuang ◽  
Shyng-Shiou F. Yuan ◽  
Ming-Feng Hou ◽  
...  

The interaction between the meiotic recombination 11 homolog A (MRE11) oncoprotein and breast cancer recurrence status remains unclear. The aim of this study was to assess the interaction between MRE11 and clinicopathologic variables in breast cancer. A dataset for 254 subjects with breast cancer (220 nonrecurrent and 34 recurrent) was used in individual and cumulated receiver operating characteristic (ROC) analyses of MRE11 and 12 clinicopathologic variables for predicting breast cancer recurrence. In individual ROC analysis, the area under curve (AUC) for each predictor of breast cancer recurrence was smaller than 0.7. In cumulated ROC analysis, however, the AUC value for each predictor improved. Ten relevant variables in breast cancer recurrence were used to find the optimal prognostic indicators. The presence of any six of the following ten variables had a high (79%) sensitivity and a high (70%) specificity for predicting breast cancer recurrence: tumor size ≥ 2.4 cm, tumor stage II/III, therapy other than hormone therapy, age ≥ 52 years, MRE11 positive cells > 50%, body mass index ≥ 24, lymph node metastasis, positivity for progesterone receptor, positivity for epidermal growth factor receptor, and negativity for estrogen receptor. In conclusion, this study revealed that these 10 clinicopathologic variables are the minimum discriminators needed for optimal discriminant effectiveness in predicting breast cancer recurrence.
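The "any six of ten" rule described above amounts to counting how many risk indicators are present and comparing the count with a threshold. The sketch below is one hedged way to express that; the dictionary keys are hypothetical labels for the ten variables listed in the abstract, and the patient record is invented.

```python
from typing import Dict

# Hypothetical keys standing in for the ten variables named in the abstract.
INDICATORS = (
    "tumor_size_ge_2_4_cm",
    "tumor_stage_ii_iii",
    "non_hormone_therapy",
    "age_ge_52",
    "mre11_positive_gt_50pct",
    "bmi_ge_24",
    "lymph_node_metastasis",
    "pr_positive",
    "egfr_positive",
    "er_negative",
)


def predicts_recurrence(patient: Dict[str, bool], min_present: int = 6) -> bool:
    """Flag recurrence when at least `min_present` of the ten indicators are present."""
    present = sum(1 for name in INDICATORS if patient.get(name, False))
    return present >= min_present


# Invented patient record for illustration only.
patient = {name: False for name in INDICATORS}
for name in INDICATORS[:6]:
    patient[name] = True

flagged = predicts_recurrence(patient)
print("recurrence predicted:", flagged)
```

Varying `min_present` from 1 to 10 and recording sensitivity and specificity at each value is one way such a count could be turned into a cumulated ROC curve, though the abstract does not describe the authors' exact procedure.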


2021 ◽  
Vol 19 (1) ◽  
pp. 2-15
Author(s):  
Stan Lipovetsky ◽  
Michael W. Conklin

An approach to finding key drivers in regression modeling via Bayesian sensitivity-specificity and receiver operating characteristic analysis is suggested, and clearly interpretable results are obtained. Numerical comparisons with other techniques show that this methodology can be useful in practical statistical modeling and analysis, helping researchers and managers make meaningful decisions.


2013 ◽  
Vol 17 (4) ◽  
pp. 861-869 ◽  
Author(s):  
Carolina Avila Vianna ◽  
Rogério da Silva Linhares ◽  
Renata Moraes Bielemann ◽  
Eduardo Coelho Machado ◽  
David Alejandro González-Chica ◽  
...  

Abstract Objective: To evaluate the adequacy and accuracy of cut-off values currently recommended by the WHO for assessment of cardiovascular risk in southern Brazil. Design: Population-based study aimed at determining the predictive ability of waist circumference for cardiovascular risk based on the use of previous medical diagnosis for hypertension, diabetes mellitus and/or dyslipidaemia. Descriptive analysis was used for the adequacy of current cut-off values of waist circumference, receiver operating characteristic curves were constructed and the most accurate criteria according to the Youden index and points of optimal sensitivity and specificity were identified. Setting: Pelotas, southern Brazil. Subjects: Individuals (n = 2112) aged ≥20 years living in the city were selected by multistage sampling, since these individuals did not report the presence of previous myocardial infarction, angina pectoris or stroke. Results: The cut-off values currently recommended by WHO were more appropriate in men than women, with overestimation of cardiovascular risk in women. The area under the receiver operating characteristic curve showed moderate predictive ability of waist circumference in men (0·74, 95 % CI 0·71, 0·76) and women (0·75, 95 % CI 0·73, 0·77). The method of optimal sensitivity and specificity showed better performance in assessing the accuracy, identifying the values of 95 cm in men and 87 cm in women as the best cut-off values of waist circumference to assess cardiovascular risk. Conclusions: The cut-off values currently recommended for waist circumference are not suitable for women. Longitudinal studies should be conducted to evaluate the consistency of the findings.
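For readers unfamiliar with the Youden index, it is J = sensitivity + specificity - 1, and the cut-off with the largest J is selected. The sketch below shows that selection under the assumption that a waist circumference at or above the cut-off is read as "at risk". The function name `youden_cutoff`, the candidate cut-offs, and the measurements are hypothetical.

```python
from typing import Sequence, Tuple


def youden_cutoff(values: Sequence[float], at_risk: Sequence[bool],
                  candidates: Sequence[float]) -> Tuple[float, float]:
    """Return (cut-off, J) with the largest Youden index; the first maximum wins ties."""
    n_pos = sum(at_risk)
    n_neg = len(at_risk) - n_pos
    best_cut, best_j = candidates[0], float("-inf")
    for cut in candidates:
        tp = sum(1 for v, r in zip(values, at_risk) if r and v >= cut)
        tn = sum(1 for v, r in zip(values, at_risk) if not r and v < cut)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical waist circumferences (cm) and risk status, not data from the study.
waist = [82.0, 88.0, 97.0, 96.0, 99.0, 104.0, 85.0, 89.0]
at_risk = [False, False, False, True, True, True, False, True]

cutoff, j = youden_cutoff(waist, at_risk, candidates=[85.0, 89.0, 95.0, 100.0])
print(f"best cut-off = {cutoff} cm, J = {j:.2f}")
```

The Youden index weights sensitivity and specificity equally, so a different criterion, such as the point of optimal sensitivity and specificity mentioned in the abstract, can select a different cut-off from the same curve.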

