Cost-Sensitive Learning in Medicine

Author(s):  
Alberto Freitas ◽  
Pavel Brazdil ◽  
Altamiro Costa-Pereira

This chapter introduces cost-sensitive learning and its importance in medicine. Health managers and clinicians often need models that try to minimize several types of costs associated with healthcare, including attribute costs (e.g. the cost of a specific diagnostic test) and misclassification costs (e.g. the cost of a false negative test). In fact, as in other professional areas, both diagnostic tests and their associated misclassification errors can have significant financial or human costs, including the use of unnecessary resources and patient safety issues. This chapter presents some concepts related to cost-sensitive learning and cost-sensitive classification and their application to medicine. Different types of costs are also presented, with an emphasis on diagnostic tests and misclassification costs. In addition, an overview of research in the area of cost-sensitive learning is given, including current methodological approaches. Finally, current methods for the cost-sensitive evaluation of classifiers are discussed.
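As an illustration of cost-sensitive classification with misclassification costs, the sketch below picks the prediction with the lowest expected cost. It is a minimal example, not the chapter's method: the function name, the two-class setup, and the cost values are all assumptions made for illustration.

```python
def min_expected_cost_class(probs, cost):
    """Return (best_class, expected_costs) for one patient.

    probs[j]   : estimated probability that the true class is j
    cost[i][j] : cost of predicting class i when the true class is j
    """
    expected = [sum(c_ij * p_j for c_ij, p_j in zip(row, probs)) for row in cost]
    best = min(range(len(expected)), key=expected.__getitem__)
    return best, expected


# Hypothetical cost matrix; classes are 0 = healthy, 1 = diseased.
COST = [
    [0.0, 10.0],  # predict healthy: a false negative costs 10
    [1.0, 0.0],   # predict diseased: a false positive costs 1
]

# The patient is probably healthy (p = 0.8), yet the costly false negative
# makes "diseased" the cheaper prediction.
prediction, expected_costs = min_expected_cost_class([0.8, 0.2], COST)
```

An attribute cost (for example, the price of a diagnostic test) could be added to the expected cost of every decision that requires that test.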

2012 ◽  
pp. 1625-1641
Author(s):  
Alberto Freitas ◽  
Pavel Brazdil ◽  
Altamiro Costa-Pereira

This chapter introduces cost-sensitive learning and its importance in medicine. Health managers and clinicians often need models that try to minimize several types of costs associated with healthcare, including attribute costs (e.g. the cost of a specific diagnostic test) and misclassification costs (e.g. the cost of a false negative test). In fact, as in other professional areas, both diagnostic tests and their associated misclassification errors can have significant financial or human costs, including the use of unnecessary resources and patient safety issues. This chapter presents some concepts related to cost-sensitive learning and cost-sensitive classification and their application to medicine. Different types of costs are also presented, with an emphasis on diagnostic tests and misclassification costs. In addition, an overview of research in the area of cost-sensitive learning is given, including current methodological approaches. Finally, current methods for the cost-sensitive evaluation of classifiers are discussed.


Author(s):  
Aijun Xue ◽  
Xiaodan Wang

Many real-world applications involve multiclass cost-sensitive learning problems. However, some well-established binary cost-sensitive learning algorithms cannot be extended directly to multiclass cost-sensitive learning. It is therefore useful to decompose the complex multiclass cost-sensitive classification problem into a series of binary cost-sensitive classification problems. In this paper we propose an alternative and efficient decomposition framework that uses the original error-correcting output codes. The main problem in our framework is how to evaluate the binary costs for each binary cost-sensitive base classifier. To solve it, we propose computing the expected misclassification costs starting from the given multiclass cost matrix. Furthermore, general formulations for computing the binary costs are given. Experimental results on several synthetic and UCI datasets show that our method performs comparably to state-of-the-art methods.


1999 ◽  
Vol 45 (7) ◽  
pp. 934-941 ◽  
Author(s):  
Alan T Remaley ◽  
Maureen L Sampson ◽  
James M DeLeo ◽  
Nancy A Remaley ◽  
Beriuse D Farsi ◽  
...  

The clinical accuracy of diagnostic tests is commonly assessed by ROC analysis. ROC plots, however, do not directly incorporate the effect of prevalence or the value of the possible test outcomes on test performance, which are two important factors in the practical utility of a diagnostic test. We describe a new graphical method, referred to as a prevalence-value-accuracy (PVA) plot analysis, which includes, in addition to accuracy, the effect of prevalence and the cost of misclassifications (false positives and false negatives) in the comparison of diagnostic test performance. PVA plots are contour plots that display the minimum cost attributable to misclassifications (z-axis) at various optimum decision thresholds over a range of possible values for prevalence (x-axis) and the unit cost ratio (UCR; y-axis), which is an index of the cost of a false-positive vs a false-negative test result. Another index based on the cost of misclassifications can be derived from PVA plots for the quantitative comparison of test performance. Depending on the region of the PVA plot that is used to calculate the misclassification cost index, it can potentially lead to a different interpretation than the ROC area index on the relative value of different tests. A PVA-threshold plot, which is a variation of a PVA plot, is also described for readily identifying the optimum decision threshold at any given prevalence and UCR. In summary, the advantages of PVA plot analysis are the following: (a) it directly incorporates the effect of prevalence and misclassification costs in the analysis of test performance; (b) it yields a quantitative index based on the costs of misclassifications for comparing diagnostic tests; (c) it provides a way to restrict the comparison of diagnostic test performance to a clinically relevant range of prevalence and UCR; and (d) it can be used to directly identify an optimum decision threshold based on prevalence and misclassification costs.
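The quantity a PVA plot displays, the minimum misclassification cost over decision thresholds for each prevalence and UCR, can be sketched roughly as follows. This is an illustration under stated assumptions rather than the authors' procedure: the false-negative cost is fixed at 1 so that the false-positive cost equals the UCR, the function name is hypothetical, and the test scores are invented.

```python
def min_misclassification_cost(diseased, healthy, prevalence, ucr):
    """Return (min_cost, best_threshold); "score >= threshold" is read as positive.

    Assumption: false-negative cost = 1, false-positive cost = ucr.
    """
    thresholds = sorted(set(diseased) | set(healthy)) + [float("inf")]
    best = None
    for t in thresholds:
        fnr = sum(s < t for s in diseased) / len(diseased)   # missed cases
        fpr = sum(s >= t for s in healthy) / len(healthy)    # false alarms
        cost = prevalence * fnr * 1.0 + (1.0 - prevalence) * fpr * ucr
        if best is None or cost < best[0]:
            best = (cost, t)
    return best


# Hypothetical test scores for diseased and healthy subjects.
diseased = [4.0, 5.0, 6.0, 7.0]
healthy = [1.0, 2.0, 3.0, 5.0]

# One (min_cost, threshold) cell per (prevalence, UCR) pair; a contour plot
# of min_cost over a finer grid would resemble a PVA plot.
grid = {
    (p, u): min_misclassification_cost(diseased, healthy, p, u)
    for p in (0.1, 0.5)
    for u in (0.1, 1.0)
}
```

With these invented scores, lowering the prevalence from 0.5 to 0.1 at a UCR of 1 moves the cost-minimizing threshold upward, which is the kind of dependence a PVA-threshold plot is meant to expose.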


2021 ◽  
Author(s):  
Philipp Sterner ◽  
David Goretzko ◽  
Florian Pargent

Psychology has seen an increase in machine learning (ML) methods. In many applications, observations are classified into one of two groups (binary classification). Off-the-shelf classification algorithms assume that the costs of a misclassification (false-positive or false-negative) are equal. Because this is often not reasonable (e.g., in clinical psychology), cost-sensitive learning (CSL) methods can take different cost ratios into account. We present the mathematical foundations and introduce a taxonomy of the most commonly used CSL methods, before demonstrating their application and usefulness on psychological data, i.e., the drug consumption dataset ($N = 1885$) from the UCI Machine Learning Repository. In our example, all demonstrated CSL methods noticeably reduce mean misclassification costs compared to regular ML algorithms. We discuss the necessity for researchers to perform small benchmarks of CSL methods for their own practical application. Thus, our open materials provide R code, demonstrating how CSL methods can be applied within the mlr3 framework (https://osf.io/cvks7/).
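One common CSL idea, moving the classification threshold away from 0.5 according to the cost ratio, can be sketched as below. This is not the authors' R/mlr3 code; it assumes calibrated probabilities and zero cost for correct decisions, and the function names are hypothetical.

```python
def cost_threshold(cost_fp, cost_fn):
    """Probability threshold that minimizes expected cost.

    Predicting positive costs (1 - p) * cost_fp in expectation, predicting
    negative costs p * cost_fn, so positive is cheaper once
    p >= cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def classify(prob_positive, cost_fp, cost_fn):
    """Return True (positive) when that is the lower-cost decision."""
    return prob_positive >= cost_threshold(cost_fp, cost_fn)


# With equal costs the threshold is the familiar 0.5; when a false negative
# is four times as costly as a false positive it drops to 0.2.
equal_costs = cost_threshold(1.0, 1.0)
skewed_costs = cost_threshold(1.0, 4.0)
```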


2021 ◽  
Vol 15 (02) ◽  
pp. 241-262
Author(s):  
Wasif Bokhari ◽  
Ajay Bansal

In medical disease diagnosis, the cost of a false negative could greatly outweigh the cost of a false positive. This is because the former could cost a life, whereas the latter may only cause medical costs and stress to the patient. The unique nature of this problem highlights the need for asymmetric error control in binary classification applications. In this domain, traditional machine learning classifiers may not be ideal, as they do not provide a way to keep the number of false negatives below a certain threshold. This paper proposes a novel tree-based binary classification algorithm that can control the number of false negatives with a mathematical guarantee, based on the Neyman–Pearson (NP) Lemma. The classifier is evaluated on data obtained from different heart studies; it predicts the risk of cardiac disease not only with comparable accuracy and AUC-ROC score but also with full control over the number of false negatives. The methodology used to construct this classifier can be extended to many more use cases, in medical disease diagnosis and beyond, as shown by analysis of diverse datasets.
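The idea of capping false negatives can be illustrated with a much simpler device than the paper's tree-based algorithm: choosing a score threshold from held-out positive cases. The sketch below is an empirical plug-in rule only, it carries no mathematical guarantee on new data, and the function name and scores are hypothetical.

```python
import math


def threshold_for_fnr(positive_scores, alpha):
    """Largest threshold whose empirical false-negative rate is <= alpha.

    A case is predicted positive when score >= threshold, so positives
    scoring below the threshold are the false negatives.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    ordered = sorted(positive_scores)
    k = math.floor(alpha * len(ordered))  # number of positives we may miss
    return ordered[k]


# Hypothetical risk scores of ten patients known to have the disease.
scores = [0.2, 0.35, 0.4, 0.55, 0.6, 0.7, 0.75, 0.8, 0.9, 0.95]
threshold = threshold_for_fnr(scores, alpha=0.1)
missed = sum(s < threshold for s in scores) / len(scores)
```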


2017 ◽  
Vol 24 (5) ◽  
pp. 354 ◽  
Author(s):  
G.N. Honein-AbouHaidar ◽  
J.S. Hoch ◽  
M.J. Dobrow ◽  
T. Stuart-McEwan ◽  
D.R. McCready ◽  
...  

Objectives: Diagnostic assessment programs (DAPs) appear to improve the diagnosis of cancer, but evidence of their cost-effectiveness is lacking. Given that no earlier study used secondary financial data to estimate the cost of diagnostic tests in the province of Ontario, we explored how to use secondary financial data to retrieve the cost of key diagnostic test services in DAPs, and we tested the reliability of that cost-retrieving method with hospital-reported costs in preparation for future cost-effectiveness studies. Methods: We powered our sample at an alpha of 0.05, a power of 80%, and a margin of error of ±5%, and randomly selected a sample of eligible patients referred to a DAP for suspected breast cancer during 1 January–31 December 2012. Confirmatory diagnostic tests received by each patient were identified in medical records. Canadian Classification of Health Intervention procedure codes were used to search the secondary financial data Web portal at the Ontario Case Costing Initiative for an estimate of the direct, indirect, and total costs of each test. The hospital-reported cost of each test received was obtained from the host-hospital’s finance department. Descriptive statistics were used to calculate the cost of individual or group confirmatory diagnostic tests, and the Wilcoxon signed-rank test or the paired t-test was used to compare the Ontario Case Costing Initiative and hospital-reported costs. Results: For the 191 identified patients with suspected breast cancer, the estimated total cost of $72,195.50 was not significantly different from the hospital-reported total cost of $72,035.52 (p = 0.24). Costs differed significantly when multiple tests to confirm the diagnosis were completed during one patient visit and when confirmatory tests reported in hospital data and in medical records were discrepant. The additional estimated cost for non-salaried physicians delivering diagnostic services was $28,387.50. Conclusions: It was feasible to use secondary financial data to retrieve the cost of key diagnostic tests in a breast cancer DAP and to compare the reliability of the costs obtained by that estimation method with hospital-reported costs. We identified the strengths and challenges of each approach. Lessons learned from this study have to be taken into consideration in future cost-effectiveness studies.


2021 ◽  
Vol 9 (3) ◽  
pp. 276-291
Author(s):  
Mawaddah Mawaddah ◽  
Yandi Heryandi

This study aims to (1) identify the misconceptions students hold about similarity and congruence, using three-tier diagnostic tests based on open-ended questions, and (2) determine the percentage of students' misconceptions on this material using the same test. The research method was descriptive qualitative. The data collection instruments were clinical interviews and three-tier diagnostic tests based on open-ended questions. The study was conducted at SMP Negeri 2 Palimanan. The research subjects were selected by purposive sampling, which yielded 33 students out of 330. The analysis of the three-tier diagnostic test showed that (1) the misconceptions about similarity and congruence of 2D shapes included pure misconceptions, false positives, and false negatives, and (2) the overall percentage of misconceptions about similarity and congruence of 2D shapes was 50.2%, comprising pure misconceptions (32.4%), false positives (15.6%), and false negatives (2.2%).


2021 ◽  
Author(s):  
Indrajeet Kumar ◽  
Jyoti Rawat

Manual laboratory diagnostic tests for pandemic diseases such as COVID-19 are time-consuming and require skill and expertise from the person performing them to yield accurate results. They are also cost-ineffective, because test kits are expensive and well-equipped laboratories are needed to run them. Other means of diagnosing patients infected with SARS-CoV-2 (the virus responsible for COVID-19) must therefore be explored. Radiography, such as chest CT imaging, is one such means. The radiographic changes observed in CT images of COVID-19 patients support the development of a deep learning-based method that extracts image features and uses them for automated diagnosis ahead of laboratory-based testing. The proposed work suggests an artificial intelligence (AI) based technique for rapid diagnosis of COVID-19 from volumetric CT images of a patient's chest: visual features are extracted and passed to a deep learning module. The proposed convolutional neural network classifies subjects as SARS-CoV-2 infected or non-infected. The network uses 746 chest CT images, of which 349 belong to COVID-19-positive cases and the remaining 397 to COVID-19-negative cases. In the experiments it achieved an accuracy of 98.4%, a sensitivity of 98.5%, a specificity of 98.3%, a precision of 97.1%, and an F1 score of 97.8%. These results indicate strong performance in classifying COVID-19 infected and non-infected cases.
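For readers unfamiliar with the reported metrics, the sketch below shows how accuracy, sensitivity, specificity, precision, and F1 score follow from a binary confusion matrix. The counts are hypothetical and are not taken from the paper, and the function name is an assumption.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts."""
    sensitivity = tp / (tp + fn)   # recall: share of positives detected
    specificity = tn / (tn + fp)   # share of negatives correctly cleared
    precision = tp / (tp + fp)     # share of positive calls that are right
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for 100 positive and 100 negative scans.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
```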


2021 ◽  
Author(s):  
Paolo Frattini ◽  
Gianluca Sala ◽  
Camilla Lanfranconi ◽  
Giulia Rusconi ◽  
Giovanni Crosta

Rainfall is one of the most significant triggering factors for shallow landslides. Early warning for such phenomena requires the definition of a threshold based on a critical rainfall condition that may lead to diffuse landsliding. These thresholds are frequently developed through empirical or statistical approaches that aim to separate rainfall events that triggered landslides from those that did not. Such approaches present several problems related to the identification of the exact amount of rainfall that triggered landslides, the local geo-environmental conditions at the landslide site, and the minimum rainfall amount used to define the non-triggering events. Furthermore, these thresholds lead to misclassifications (false negatives or false positives) that always induce costs for society. The aim of this research is to address these limitations, accounting for classification costs in order to select the optimal thresholds for landslide risk management.
Starting from a database of shallow landslides that occurred during five regional-scale rainfall events in the Italian Central Alps, we extracted the triggering rainfall intensities by adjusting rain gauge data with weather radar data. This adjustment significantly improved the information on rainfall intensity at the landslide site, although some uncertainty about the exact timing of occurrence remained. We then identified the rainfall thresholds through the Receiver Operating Characteristic (ROC) approach, by identifying the optimal rainfall intensity that separates triggering and non-triggering events. To evaluate the effect of applying different minimum rainfall values for non-triggering events, we adopted three different values and obtained similar results, thus demonstrating that the ROC approach is not sensitive to the choice of the minimum rainfall threshold.
In order to include the effect of misclassification costs, we developed cost-sensitive rainfall threshold curves using the cost-curve approach (Drummond and Holte 2000). As far as we know, this is the first attempt to build a cost-sensitive rainfall threshold for landslides that explicitly accounts for misclassification costs. To develop the cost-sensitive threshold curve, we defined a reference cost scenario in which we quantified several cost items for both missed alarms and false alarms. In this scenario, the cost-sensitive rainfall threshold is lower than the ROC threshold, so as to minimize the missed alarms, whose costs are seven times greater than the false-alarm costs. Since the misclassification costs could vary according to different socio-economic contexts and emergency organization, we developed different extreme scenarios to evaluate the sensitivity of the rainfall thresholds to misclassification costs. In the scenario with maximum false-alarm cost and minimum missed-alarm cost, the rainfall threshold increases in order to minimize the false alarms. Conversely, the rainfall threshold decreases in the scenario with minimum false-alarm cost and maximum missed-alarm cost. We found that the range of variation between the curves of these extreme scenarios is as much as half an order of magnitude.
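The cost-curve quantities of Drummond and Holte can be sketched as below to show why a 7:1 missed-alarm to false-alarm cost ratio favours a lower threshold. Only the 7:1 ratio comes from the abstract; the two operating points, the 0.5 prior probability of a triggering event, and the function names are hypothetical.

```python
def probability_cost(p_pos, cost_fn, cost_fp):
    """Probability-cost value PC(+), the x-axis of a cost curve."""
    weighted_pos = p_pos * cost_fn
    return weighted_pos / (weighted_pos + (1.0 - p_pos) * cost_fp)


def normalized_expected_cost(fnr, fpr, pc_pos):
    """Normalized expected cost of one operating point, the y-axis of a cost curve."""
    return fnr * pc_pos + fpr * (1.0 - pc_pos)


# Hypothetical operating points (missed-alarm rate, false-alarm rate)
# for a low and a high rainfall threshold.
POINTS = {"low": (0.05, 0.40), "high": (0.30, 0.10)}


def cheapest_threshold(p_pos, cost_fn, cost_fp):
    """Return the name of the operating point with the lowest normalized cost."""
    pc = probability_cost(p_pos, cost_fn, cost_fp)
    return min(POINTS, key=lambda name: normalized_expected_cost(*POINTS[name], pc))


pc_reference = probability_cost(0.5, cost_fn=7.0, cost_fp=1.0)
choice_7_to_1 = cheapest_threshold(0.5, cost_fn=7.0, cost_fp=1.0)
choice_equal = cheapest_threshold(0.5, cost_fn=1.0, cost_fp=1.0)
```

Under these assumed numbers, the low threshold is cheaper when missed alarms cost seven times as much, while equal costs favour the high threshold.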


2021 ◽  
pp. 722-730
Author(s):  
Angela M. Parsons ◽  
Joseph F. Drazkowski

Correctly diagnosing seizures and seizurelike events is important for numerous reasons, including safety issues, social consequences, and therapy. Patients with transient neurologic events of unknown cause are commonly admitted to hospitals, and an estimated 10% of the people in the United States have a seizure in their lifetime. These facts highlight the importance of diagnostic accuracy. History taking is imperfect, but it is still a cornerstone in making the proper diagnosis of transient neurologic events. Focused, supporting diagnostic tests may add accuracy in arriving at the proper diagnosis, but even with a good history, diagnostic testing, and physical examination findings, the diagnosis may be inaccurate. Self-reports of seizure frequency are notoriously inaccurate and often miss more than 50% of focal-onset seizures, especially if the seizures begin in the dominant hemisphere (largely because the effects of the event cause an altered level of consciousness).

