The evaluation of binary classification tasks in economical prediction

Author(s):  
Martin Pokorný

In economic classification tasks, accuracy maximization is often used to evaluate classifier performance. Accuracy maximization (or error rate minimization) rests on the assumption that false positive and false negative errors have equal costs. Furthermore, accuracy cannot express true classifier performance under a skewed class distribution. Because of these limitations, the use of accuracy in real tasks is questionable. In a real binary classification task, the difference between the costs of false positive and false negative errors is usually critical. To overcome this issue, the Receiver Operating Characteristic (ROC) method can be used together with decision-analytic principles. One essential advantage of this method is that classifier performance can be visualized by means of a ROC graph. This paper presents concrete examples of binary classification in which the inadequacy of accuracy as an evaluation metric is shown, and the ROC method is then applied to the same examples. From the set of possible classification models, the probabilistic classifier with continuous output is considered. Two main questions are addressed. The first is the selection of the best classifier from a set of possible classifiers. For example, the accuracy metric rates two classifiers almost equivalently (87.7 % and 89.3 %), whereas decision analysis (via cost minimization) or ROC analysis reveals different performance under the target conditions of unequal false positive and false negative error costs. The second is the setting of an optimal decision threshold at the classifier’s output. For example, accuracy maximization places the optimal threshold at the classifier’s output at a value of 0.597, but the optimal threshold that respects the higher cost of false negatives, found by cost minimization or ROC analysis, is substantially lower (0.477).
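A small threshold sweep can show how accuracy maximization and cost minimization differ. The sketch below is hypothetical. The scores, the labels and the 5:1 cost ratio are invented for illustration and do not come from the paper. With equal costs, the sweep reduces to minimizing the error count, which is the same as maximizing accuracy. Raising the false negative cost pushes the chosen threshold down. This matches the qualitative effect the abstract reports (0.597 vs 0.477). Minimizing this cost is equivalent to picking the ROC point touched by an iso-cost line of slope C_FP·N / (C_FN·P).

```python
def confusion_at(threshold, scores, labels):
    """Count false positives and false negatives when 'positive' means score >= threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn


def best_threshold(scores, labels, cost_fp=1.0, cost_fn=1.0):
    """Return (threshold, total_cost) minimizing cost_fp*FP + cost_fn*FN.

    With cost_fp == cost_fn this is plain error minimization, i.e. accuracy maximization.
    Candidate thresholds are the observed scores plus +inf ("never predict positive").
    """
    candidates = sorted(set(scores)) + [float("inf")]
    best_t, best_cost = None, float("inf")
    for t in candidates:
        fp, fn = confusion_at(t, scores, labels)
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:  # strict '<' keeps the lowest threshold on ties
            best_t, best_cost = t, cost
    return best_t, best_cost


# Hypothetical classifier outputs (probability of the positive class) and true labels.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 0, 1, 1, 0, 1]

print(best_threshold(scores, labels))               # → (0.6, 2.0)  equal costs
print(best_threshold(scores, labels, cost_fn=5.0))  # → (0.3, 3.0)  false negatives 5x as costly
```

Accuracy picks 0.6 here. Once a missed positive costs five times as much as a false alarm, the cheapest operating point moves to 0.3 and accepts more false positives.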

2019, Vol 152 (Supplement_1), pp. S35-S36
Author(s):
Hadrian Mendoza
Christopher Tormey
Alexa Siddon

In the evaluation of bone marrow (BM) and peripheral blood (PB) for hematologic malignancy, positive immunoglobulin heavy chain (IG) or T-cell receptor (TCR) gene rearrangement results may be detected despite unrevealing results from morphologic, flow cytometric, immunohistochemical (IHC), and/or cytogenetic studies. The significance of positive rearrangement studies in the context of otherwise normal ancillary findings is unknown, and as such, we hypothesized that gene rearrangement studies may be predictive of an emerging B- or T-cell clone in the absence of other abnormal laboratory tests. Data from all patients who underwent IG or TCR gene rearrangement testing at the authors’ affiliated VA hospital between January 1, 2013, and July 6, 2018, were extracted from the electronic medical record. Date of testing; specimen source; and morphologic, flow cytometric, IHC, and cytogenetic characterization of the tissue source were recorded from pathology reports. Gene rearrangement results were categorized as true positive, false positive, false negative, or true negative. Lastly, patient records were reviewed for subsequent diagnosis of hematologic malignancy in patients with positive gene rearrangement results with negative ancillary testing. A total of 136 patients, who had 203 gene rearrangement studies (50 PB and 153 BM), were analyzed. In TCR studies, there were 2 false positives and 1 false negative in 47 PB assays, as well as 7 false positives and 1 false negative in 54 BM assays. Regarding IG studies, 3 false positives and 12 false negatives in 99 BM studies were identified. Sensitivity and specificity, respectively, were calculated for PB TCR studies (94% and 93%), BM IG studies (71% and 95%), and BM TCR studies (92% and 83%). Analysis of PB IG gene rearrangement studies was not performed due to the small number of tests (3; all true negative).
None of the 12 patients with false-positive IG/TCR gene rearrangement studies later developed a lymphoproliferative disorder, although 2 patients were later diagnosed with acute myeloid leukemia. Of the 14 false negatives, 10 (71%) were related to a diagnosis of plasma cell neoplasms. Results from the present study suggest that positive IG/TCR gene rearrangement studies are not predictive of lymphoproliferative disorders in the context of otherwise negative BM or PB findings. As such, when faced with equivocal pathology reports, clinicians can be practically advised that isolated positive IG/TCR gene rearrangement results may not indicate the need for closer surveillance.
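Sensitivity and specificity follow directly from the four cells of a 2×2 table. The abstract reports only the false positive and false negative counts. The true positive and true negative counts used below (16 and 28 for the PB TCR assays) are therefore an assumption, not numbers taken from the study. They are one combination consistent with the reported 47 assays, 2 false positives, 1 false negative and the rounded 94%/93% figures.

```python
def sensitivity(tp, fn):
    """Proportion of truly positive specimens that the assay calls positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of truly negative specimens that the assay calls negative."""
    return tn / (tn + fp)


# PB TCR assays: FP = 2 and FN = 1 are reported; TP = 16 and TN = 28 are assumed.
tp, fn, fp, tn = 16, 1, 2, 28
assert tp + fn + fp + tn == 47  # total PB TCR assays reported in the abstract

print(f"sensitivity = {sensitivity(tp, fn):.0%}")  # → sensitivity = 94%
print(f"specificity = {specificity(tn, fp):.0%}")  # → specificity = 93%
```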


2021, Vol 15 (02), pp. 241-262
Author(s):
Wasif Bokhari
Ajay Bansal

In medical disease diagnosis, the cost of a false negative could greatly outweigh the cost of a false positive, because the former could cost a life, whereas the latter may only cause medical costs and stress to the patient. This asymmetry highlights the need for asymmetric error control in binary classification applications. In this domain, traditional machine learning classifiers may not be ideal, as they provide no way to keep the number of false negatives below a certain threshold. This paper proposes a novel tree-based binary classification algorithm, based on the Neyman–Pearson (NP) lemma, that can control the number of false negatives with a mathematical guarantee. The classifier is evaluated on data obtained from different heart studies. It predicts the risk of cardiac disease with comparable accuracy and AUC-ROC score, and it also gives full control over the number of false negatives. The methodology used to construct this classifier can be extended to many more use cases, both in medical disease diagnosis and beyond, as shown by analyses of diverse datasets.
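An order-statistic threshold rule, of the kind used in Neyman–Pearson "umbrella" algorithms, can illustrate this type of guarantee. It is not the authors' tree-based algorithm. It is a generic sketch under these assumptions:

- a held-out set of risk scores from diseased patients;
- continuous, independent scores;
- a patient is flagged as diseased when the score is at or above the threshold.

Under those assumptions, the returned threshold keeps the false negative rate at or below alpha with probability at least 1 − delta. The function name and default parameters are hypothetical.

```python
from math import comb


def np_threshold(diseased_scores, alpha=0.10, delta=0.05):
    """Highest order-statistic threshold whose false-negative rate exceeds alpha
    with probability at most delta (assumes continuous, i.i.d. scores).

    Rule: flag 'diseased' when score >= threshold.
    If the threshold is the k-th smallest of n held-out diseased scores, the true
    false-negative rate exceeds alpha exactly when fewer than k of the n scores lie
    at or below the alpha-quantile, so
        P(FNR > alpha) = sum_{j < k} C(n, j) * alpha**j * (1 - alpha)**(n - j).
    """
    r = sorted(diseased_scores)
    n = len(r)
    best_k = None
    for k in range(1, n + 1):
        violation = sum(comb(n, j) * alpha**j * (1 - alpha) ** (n - j) for j in range(k))
        if violation > delta:
            break  # violation only grows with k, so stop at the first failure
        best_k = k  # a higher threshold means fewer false positives, so keep going
    if best_k is None:
        raise ValueError("too few diseased samples for this alpha/delta")
    return r[best_k - 1]


scores = [i / 100 for i in range(100)]  # hypothetical held-out diseased-patient scores
print(np_threshold(scores))  # → 0.04
```

The rule is conservative. With 100 held-out scores, alpha = 0.10 and delta = 0.05, only the 4 lowest scores fall below the threshold. This is because the bound must hold with high probability, not merely on average.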


1990, Vol 15 (1), pp. 39-52
Author(s):
Huynh Huynh

False positive and false negative error rates are studied for competency testing in which examinees are permitted to retake the test if they fail. Formulae are provided for the beta-binomial and Rasch models, and estimates based on these two models are compared for several typical situations. Rasch estimates are expected to be more accurate than beta-binomial estimates, but the differences between them are found not to be substantial in a number of practical situations. Under relatively general conditions, when test retaking is permitted, the probability of making a false negative error is zero. In the same situation, given that an examinee is a true nonmaster, the conditional probability of making a false positive error for this examinee is one.
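A deliberately simplified model can show the limiting result. It is not the paper's beta-binomial or Rasch formulation. The sketch assumes that:

- each attempt is an independent binomial test of n items;
- an examinee with true domain score π answers each item correctly with probability π;
- the examinee passes an attempt when the number correct reaches a cut score c.

The test length, cut score and true scores in the usage lines are hypothetical. As the number of permitted attempts grows, a master's chance of failing every attempt shrinks towards zero, while a nonmaster's chance of eventually passing grows towards one.

```python
from math import comb


def pass_probability(true_score, n_items, cut):
    """P(X >= cut) for X ~ Binomial(n_items, true_score): chance of passing one attempt."""
    return sum(
        comb(n_items, x) * true_score**x * (1 - true_score) ** (n_items - x)
        for x in range(cut, n_items + 1)
    )


def error_after_retakes(true_score, n_items, cut, attempts, is_master):
    """Classification error after up to `attempts` independent tries.

    A master is misclassified (false negative) only by failing every attempt;
    a nonmaster is misclassified (false positive) by passing any attempt.
    """
    p = pass_probability(true_score, n_items, cut)
    fail_all = (1 - p) ** attempts
    return fail_all if is_master else 1 - fail_all


# Hypothetical 20-item test with a cut score of 14 correct answers.
for attempts in (1, 3, 10):
    fn = error_after_retakes(0.80, 20, 14, attempts, is_master=True)
    fp = error_after_retakes(0.60, 20, 14, attempts, is_master=False)
    print(f"attempts={attempts:2d}  false negative={fn:.4f}  false positive={fp:.4f}")
```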


2018, Vol 29 (4), pp. 435-441
Author(s):
Kazuyoshi Kobayashi
Kei Ando
Ryuichi Shinjo
Kenyu Ito
Mikito Tsushima
...

OBJECTIVE: Monitoring of brain evoked muscle-action potentials (Br[E]-MsEPs) is a sensitive method that provides accurate periodic assessment of neurological status. However, the method occasionally gives a relatively high rate of false positives, which hinders surgery. The alarm point is often defined by a particular decrease in the amplitude of a Br(E)-MsEP waveform, but waveform latency has not been widely examined. The purpose of this study was to evaluate onset latency in Br(E)-MsEP monitoring during spinal surgery and to examine the efficacy of an alarm point that combines amplitude and latency.
METHODS: A single-center, retrospective study was performed in 83 patients who underwent spine surgery with intraoperative Br(E)-MsEP monitoring. A total of 1726 muscles in the extremities were chosen for monitoring, and acceptable baseline Br(E)-MsEP responses were obtained from 1640 (95%). Onset latency was defined as the period from stimulation until the waveform was detected. The study examined how postoperative motor deficit related to onset latency alone and to onset latency combined with a decrease in amplitude of ≥ 70% from baseline.
RESULTS: Nine of the 83 patients had postoperative motor deficits. The delay of onset latency compared to the control waveform differed significantly between patients with and without these deficits (1.09% ± 0.06% vs 1.31% ± 0.14%, p < 0.01). In ROC analysis, an intraoperative 15% delay in latency from baseline had a sensitivity of 78% and a specificity of 96% for prediction of postoperative motor deficit. In further ROC analysis, a combination of a decrease in amplitude of ≥ 70% and a delay in onset latency of ≥ 10% from baseline had the following performance for this prediction:
- sensitivity of 100%;
- specificity of 93%;
- false positive rate of 7%;
- false negative rate of 0%;
- positive predictive value of 64%;
- negative predictive value of 100%.
CONCLUSIONS: In spinal cord monitoring with intraoperative Br(E)-MsEP, an alarm point using a decrease in amplitude of ≥ 70% and a delay in onset latency of ≥ 10% from baseline has high specificity, which reduces false positive results.
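Bayes' theorem can cross-check the reported predictive values against the sensitivity, specificity and prevalence. The sketch assumes that the predictive values refer to patients, which the abstract does not state explicitly. Prevalence is then the 9 patients with deficits out of 83. Under that assumption the computed PPV is about 63.5%, close to the reported 64%. The small gap is consistent with the specificity having been rounded to 93%.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from Bayes' theorem for a test applied at a given prevalence."""
    tp = sensitivity * prevalence                # detected deficits per unit population
    fn = (1 - sensitivity) * prevalence          # missed deficits
    fp = (1 - specificity) * (1 - prevalence)    # false alarms
    tn = specificity * (1 - prevalence)          # correctly silent monitoring
    return tp / (tp + fp), tn / (tn + fn)


# Combined amplitude + latency alarm; prevalence assumed to be 9/83 patients.
ppv, npv = predictive_values(sensitivity=1.00, specificity=0.93, prevalence=9 / 83)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")  # → PPV = 63.5%, NPV = 100.0%
```

With 100% sensitivity the NPV is exactly 100% regardless of prevalence. The PPV, however, falls as deficits become rarer, because false alarms come from the much larger unaffected group.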


2020, pp. jclinpath-2020-206726
Author(s):
Cornelia Margaret Szecsei
Jon D Oxley

Aim: To examine the effects of specialist reporting on error rates in prostate core biopsy diagnosis.
Method: Biopsies were reported by eight specialist uropathologists over 3 years. New cancer diagnoses were double-reported, and all biopsies were reviewed for the multidisciplinary team (MDT) meeting. Diagnostic alterations were recorded in supplementary reports, and error rates were compared with those from a decade previously.
Results: A total of 2600 biopsies were reported, of which 64.1% contained adenocarcinoma, a 19.7% increase. The false-positive error rate had fallen from 0.4% to 0.06%. The false-negative error rate had risen from 1.5% to 1.8% but represented fewer absolute errors because of the increased cancer incidence.
Conclusions: Specialisation and double-reporting have reduced false-positive errors. MDT review of negative cores continues to identify a very low number of false-negative errors. Our data represent a ‘gold standard’ for prostate biopsy diagnostic error rates. Increased use of MRI-targeted biopsies may alter error rates and their future clinical significance.


1977, Vol 25 (7), pp. 689-695
Author(s):
R S Poulsen
L H Oliver
R L Cahn
C Louis
G Toussaint

This paper presents preliminary results of research toward the development of a high-resolution analysis stage for a dual-resolution, image processing-based prescreening device for cervical cytology. Experiments using both manual and automatic methods for cell segmentation are described. In both cases, 1500 cervical cells were analyzed and classified as normal or abnormal (dysplastic or malignant). Classification used a minimum Mahalanobis distance classifier with eight subclasses of normal cells and five subclasses of abnormal cells. With manual segmentation, false positive and false negative error rates of 2.98% and 7.73%, respectively, were obtained. Similar experiments using automatic cell segmentation yielded false positive and false negative error rates of 3.90% and 11.56%, respectively. In both cases, independent training and testing data were used.
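A minimum Mahalanobis distance classifier with several subclasses per class can be sketched as follows. Several details are assumptions, not taken from the paper:

- a single covariance matrix pooled over all subclasses (the abstract does not say whether covariances were pooled or subclass-specific);
- two-dimensional synthetic features instead of real cell measurements;
- hypothetical subclass names.

Each cell is assigned to the nearest subclass mean in Mahalanobis distance and inherits that subclass's class (normal or abnormal). False positive and false negative rates are then counted on an independent test set, as in the study.

```python
import random
from collections import defaultdict


def invert(m):
    """Inverse of a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def fit(samples):
    """samples: list of (feature_vector, subclass). Returns subclass means and inverse pooled covariance."""
    groups = defaultdict(list)
    for x, s in samples:
        groups[s].append(x)
    means = {s: [sum(col) / len(rows) for col in zip(*rows)] for s, rows in groups.items()}
    d = len(samples[0][0])
    cov = [[0.0] * d for _ in range(d)]
    for s, rows in groups.items():
        for x in rows:
            diff = [xi - mi for xi, mi in zip(x, means[s])]
            for i in range(d):
                for j in range(d):
                    cov[i][j] += diff[i] * diff[j]
    dof = len(samples) - len(groups)  # pooled within-subclass degrees of freedom
    return means, invert([[v / dof for v in row] for row in cov])


def mahalanobis_sq(x, mu, cov_inv):
    """Squared Mahalanobis distance (x - mu)^T S^-1 (x - mu)."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    return sum(diff[i] * cov_inv[i][j] * diff[j] for i in range(len(diff)) for j in range(len(diff)))


def classify(x, means, cov_inv, subclass_to_class):
    """Assign x to the class of its nearest subclass mean."""
    nearest = min(means, key=lambda s: mahalanobis_sq(x, means[s], cov_inv))
    return subclass_to_class[nearest]


def synthetic(rng, centers, per_subclass):
    """Hypothetical Gaussian 'cells' around each subclass centre (unit variance)."""
    return [([rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)], s)
            for s, (cx, cy) in centers.items() for _ in range(per_subclass)]


# Two normal subclasses and one abnormal subclass (names are hypothetical).
centers = {"normal_a": (0.0, 0.0), "normal_b": (0.0, 6.0), "abnormal_a": (6.0, 0.0)}
to_class = {"normal_a": "normal", "normal_b": "normal", "abnormal_a": "abnormal"}

rng = random.Random(0)
means, cov_inv = fit(synthetic(rng, centers, 200))
test = synthetic(rng, centers, 200)  # independent test set

normal = [x for x, s in test if to_class[s] == "normal"]
abnormal = [x for x, s in test if to_class[s] == "abnormal"]
fp_rate = sum(classify(x, means, cov_inv, to_class) == "abnormal" for x in normal) / len(normal)
fn_rate = sum(classify(x, means, cov_inv, to_class) == "normal" for x in abnormal) / len(abnormal)
print(f"false positive rate = {fp_rate:.2%}, false negative rate = {fn_rate:.2%}")
```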


1974, Vol 22 (7), pp. 663-667
Author(s):
Dan H. Moore

A statistical model is developed that describes the population of women who are given a cytologic screening test for cervical cancer. The model is used to determine false positive and false negative rates as a function of three quantities: (a) the proportion of "positive" cells in women free from cancer and in those with cancer, (b) the number of cells examined, and (c) the minimal number of positive cells for a diagnosis of cancer. The model allows estimation of the minimal number of cells that must be examined to reduce both the false positive and the false negative rates below predetermined levels. An expected cost equation is derived that combines the cost of examining each cell with the costs of false positives and false negatives. It is shown how cancer detection can be optimized through the use of this cost equation. The method determines both the maximal permissible cost for examining each cell and the optimal number of cells to examine in order to reduce the overall expected cost below a predetermined level.
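A simplified binomial version can show the structure of such an expected-cost optimization. It is an assumption, not Moore's exact model. It assumes the number of positive cells among n examined is Binomial(n, p). Here p is p_healthy for women free from cancer and p_cancer for women with cancer, each treated as a fixed proportion. A diagnosis of cancer is made when at least m positive cells are seen. All parameter values are hypothetical. An exhaustive search returns the (n, m) pair with the lowest expected cost per woman screened.

```python
from math import comb


def at_least(n, m, p):
    """P(X >= m) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


def error_rates(n, m, p_healthy, p_cancer):
    """False positive and false negative rates when cancer is called at >= m positive cells out of n."""
    false_pos = at_least(n, m, p_healthy)
    false_neg = 1 - at_least(n, m, p_cancer)
    return false_pos, false_neg


def expected_cost(n, m, p_healthy, p_cancer, prevalence, cost_cell, cost_fp, cost_fn):
    """Per-woman cost: cells examined plus prevalence-weighted misclassification costs."""
    fp, fn = error_rates(n, m, p_healthy, p_cancer)
    return n * cost_cell + (1 - prevalence) * fp * cost_fp + prevalence * fn * cost_fn


def optimise(max_cells, **params):
    """Exhaustive search over n (cells examined) and m (positive cells needed for a diagnosis)."""
    best = None
    for n in range(1, max_cells + 1):
        for m in range(1, n + 1):
            cost = expected_cost(n, m, **params)
            if best is None or cost < best[0]:
                best = (cost, n, m)
    return best


# Hypothetical inputs: proportions of 'positive' cells, prevalence and unit costs.
params = dict(p_healthy=0.01, p_cancer=0.10, prevalence=0.01,
              cost_cell=0.01, cost_fp=10.0, cost_fn=1000.0)
cost, n, m = optimise(60, **params)
fp, fn = error_rates(n, m, params["p_healthy"], params["p_cancer"])
print(f"examine {n} cells, call cancer at >= {m} positive: "
      f"expected cost {cost:.3f}, FP rate {fp:.4f}, FN rate {fn:.4f}")
```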


2021, Vol 34 (Supplement_1)
Author(s):
Sergey Morozov
Vasily Kropochev
Alexey Artemov

Chicago Classification 4.0 recommends assessment of no fewer than ten wet swallows in the primary test position for high-resolution oesophageal manometry (HREM); however, this required number of measurements is not sufficiently supported.
Aim: To evaluate the number of wet swallows necessary for correct interpretation of the lower esophageal sphincter integrated relaxation pressure (IRP) with a low probability of type I and type II errors.
Methods: Patients referred for HREM were enrolled. A solid-state 10Fr catheter and Solar (Laborie) software were used. A minimum of 10 swallows of 5 mL water were obtained. These were analysed for cumulative means of IRP after 1 to 9 measurements, and the conclusion reached at each step was compared with the one based on 10 measurements. The results were classified as true/false positive/negative to calculate diagnostic accuracy. To exclude the influence of the sample, a Monte Carlo simulation of sequential decision-making was performed using the sequential probability ratio test. The association between diagnostic accuracy and recall was studied using receiver operating characteristic (ROC) curve analysis.
Results: One hundred subjects were enrolled (25 with disorders of EGJ outflow). During the simulation, the probability that decisions based on fewer measurements matched those based on all 10 was high. ROC analysis showed that the actual probability of obtaining false-positive results was half the ‘allowed’ rate of 5%. The probability of false-negative results did not exceed 10% for any number of measurements. The probability that the conclusions made after 2 and after 10 measurements match was 0.9584 in those with disorders of EGJ outflow and 0.9652 in those without (figure 1).
Conclusion: The standard number of measurements required to support the presence of disorders of EGJ outflow during evaluation of 5 mL wet swallows in the primary position is excessive. IRP values after 2 swallows allow a decision similar to that after 10 swallows with >95% probability. This would allow fewer wet swallows to be assessed in the primary position. The time saved could be used for assessments in an alternative position or for provocation tests.
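Wald's sequential probability ratio test, mentioned in the methods, can be sketched for a normally distributed measurement with known standard deviation, using Wald's approximate boundaries. Everything specific here is an assumption. The two hypothesised mean IRP values, the standard deviation and the error targets are illustrative, not the authors' settings. The error targets (alpha = 0.05, beta = 0.10) were chosen to echo the 5% and 10% figures in the abstract. The authors' Monte Carlo procedure is not reproduced.

```python
from math import log


def sprt_normal(values, mu0, mu1, sigma, alpha=0.05, beta=0.10):
    """Wald SPRT for H0: mean = mu0 vs H1: mean = mu1, normal data with known sigma.

    Returns (decision, number_of_measurements_used).
    """
    upper = log((1 - beta) / alpha)  # crossing it -> accept H1 (e.g. outflow disorder)
    lower = log(beta / (1 - alpha))  # crossing it -> accept H0 (normal relaxation)
    llr = 0.0
    for i, x in enumerate(values, start=1):
        # Log-likelihood-ratio increment for one normal observation.
        llr += (mu1 - mu0) / sigma**2 * (x - (mu0 + mu1) / 2)
        if llr >= upper:
            return "H1", i
        if llr <= lower:
            return "H0", i
    return "undecided", len(values)


# Hypothetical IRP readings (mmHg) and hypothesised means/SD.
print(sprt_normal([25, 24, 26], mu0=10, mu1=20, sigma=5))      # → ('H1', 1)
print(sprt_normal([12, 11, 13, 12], mu0=10, mu1=20, sigma=5))  # → ('H0', 2)
```

Readings far from the midpoint between the two hypotheses end the test after one or two swallows, while readings near it keep the test running. This is the mechanism that can make a fixed count of ten swallows unnecessary.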

