Misclassification costs: Recently Published Documents

Total documents: 49 (last five years: 7)
H-index: 11 (last five years: 0)

Author(s): D. J. Hand, C. Anagnostopoulos

Abstract: The H-measure is a classifier performance measure that takes the context of application into account without requiring a rigid value of the relative misclassification costs to be set. Since its introduction in 2009 it has become widely adopted. This paper answers various queries that users have raised since its introduction, including questions about its interpretation, the choice of a weighting function, whether it is strictly proper, and its coherence, and it relates the measure to other work.
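For readers who want to try the measure, a minimal sketch follows, assuming the CRAN hmeasure package (maintained by the authors of the measure); the labels and scores are invented for illustration.

```r
# Minimal sketch, assuming the CRAN 'hmeasure' package; the labels and
# scores below are toy data, not from the paper.
library(hmeasure)

set.seed(1)
true.class <- rbinom(200, 1, 0.3)                      # 0 = negative, 1 = positive
scores <- data.frame(score = true.class + rnorm(200))  # noisy classifier scores

results <- HMeasure(true.class, scores)      # default prior over severity ratios
results$metrics[, c("H", "AUC")]             # H-measure alongside the AUC
```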


2021
Author(s): Philipp Sterner, David Goretzko, Florian Pargent

Psychology has seen an increase in the use of machine learning (ML) methods. In many applications, observations are classified into one of two groups (binary classification). Off-the-shelf classification algorithms assume that the costs of a misclassification (false positive or false negative) are equal. Because this is often not reasonable (e.g., in clinical psychology), cost-sensitive learning (CSL) methods can take different cost ratios into account. We present the mathematical foundations and introduce a taxonomy of the most commonly used CSL methods, before demonstrating their application and usefulness on psychological data, namely the drug consumption dataset ($N = 1885$) from the UCI Machine Learning Repository. In our example, all demonstrated CSL methods noticeably reduce mean misclassification costs compared with regular ML algorithms. We discuss the need for researchers to perform small benchmarks of CSL methods for their own practical applications. To that end, our open materials provide R code demonstrating how CSL methods can be applied within the mlr3 framework (https://osf.io/cvks7/).
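As a taste of what such an application can look like, here is a sketch under stated assumptions, not the authors' actual materials: mlr3's classif.costs measure evaluates a learner under an asymmetric cost matrix, with a built-in task and an illustrative 5:1 cost ratio standing in for the paper's setup.

```r
# Sketch, not the authors' code: mean misclassification cost under an
# asymmetric cost matrix with mlr3. The 5:1 cost ratio is illustrative.
library(mlr3)
library(mlr3learners)

task <- tsk("german_credit")                     # built-in binary task
costs <- matrix(c(0, 1, 5, 0), nrow = 2,         # off-diagonals: the two error costs
                dimnames = list(response = task$class_names,
                                truth    = task$class_names))
measure <- msr("classif.costs", costs = costs)   # cost-sensitive measure

learner <- lrn("classif.log_reg", predict_type = "prob")
rr <- resample(task, learner, rsmp("cv", folds = 5))
rr$aggregate(measure)                            # mean misclassification cost
```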


2021
Author(s): Viraj Kulkarni, Manish Gawali, Amit Kharat

The use of machine learning to develop intelligent software tools for the interpretation of radiology images has gained widespread attention in recent years. The development, deployment, and eventual adoption of these models in clinical practice, however, remain fraught with challenges. In this paper, we propose a list of key considerations that machine learning researchers must recognize and address to make their models accurate, robust, and usable in practice. Namely, we discuss: insufficient training data, decentralized datasets, the high cost of annotations, ambiguous ground truth, imbalance in class representation, asymmetric misclassification costs, relevant performance metrics, generalization of models to unseen datasets, model decay, adversarial attacks, explainability, fairness and bias, and clinical validation. We describe each consideration and identify techniques to address it. Although these techniques have been discussed in the prior research literature, by freshly examining them in the context of medical imaging and compiling them in the form of a laundry list, we hope to make them more accessible to researchers, software developers, radiologists, and other stakeholders.
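To make the "asymmetric misclassification costs" item concrete, here is a hypothetical illustration, not from the paper: with calibrated predicted probabilities, decision theory gives the cost-minimizing threshold t* = c_FP / (c_FP + c_FN), so a high cost for missed findings pushes the operating point toward flagging many more studies than the default 0.5 cutoff would.

```r
# Hypothetical illustration: cost-minimising decision threshold under
# asymmetric costs; the costs and probabilities are invented.
c_fp <- 1                               # cost of a false positive
c_fn <- 10                              # cost of a missed finding
t_star <- c_fp / (c_fp + c_fn)          # ~0.09, far below the default 0.5

p_hat <- c(0.04, 0.12, 0.45, 0.80)      # toy predicted P(abnormal)
ifelse(p_hat >= t_star, "abnormal", "normal")
```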


Electronics, 2021, Vol 10 (6), pp. 657
Author(s): Krzysztof Gajowniczek, Tomasz Ząbkowski

This paper presents two R packages, ImbTreeEntropy and ImbTreeAUC, for handling imbalanced data problems. ImbTreeEntropy's functionality includes the application of generalized entropy functions, such as Rényi, Tsallis, Sharma–Mittal, Sharma–Taneja and Kapur, to measure the impurity of a node. ImbTreeAUC provides non-standard measures for choosing the optimal split point for an attribute (as well as the optimal attribute for splitting) by employing local, semi-global and global AUC (Area Under the ROC Curve) measures. Both packages are applicable to binary and multiclass problems, and they support cost-sensitive learning, via a user-defined misclassification cost matrix, as well as weight-sensitive learning. The packages accept all types of attributes, including continuous, ordered and nominal, where the latter type is simplified for multiclass problems to reduce the computational overhead. Both applications enable optimization of the thresholds at which posterior probabilities determine the final class labels, so that misclassification costs are minimized. Model overfitting can be managed either during the growing phase or at the end using post-pruning. The packages are mainly implemented in R; however, some computationally demanding functions are written in plain C++. To speed up learning, parallel processing is supported as well.
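For intuition about what these generalized entropies look like, the following standalone re-implementations (illustrative only, not the package internals; the order parameter is written q here) both reduce to Shannon entropy as q approaches 1.

```r
# Illustrative re-implementations of two of the impurity functions the
# packages offer; not the package internals.
renyi   <- function(p, q = 2) log(sum(p^q)) / (1 - q)
tsallis <- function(p, q = 2) (1 - sum(p^q)) / (q - 1)
shannon <- function(p) -sum(p * log(p))          # the q -> 1 limit of both

p <- c(0.7, 0.2, 0.1)                            # class proportions in a node
c(renyi = renyi(p), tsallis = tsallis(p), shannon = shannon(p))
```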


2021
Author(s): Paolo Frattini, Gianluca Sala, Camilla Lanfranconi, Giulia Rusconi, Giovanni Crosta

Rainfall is one of the most significant triggering factors for shallow landslides. Early warning for such phenomena requires the definition of a threshold based on a critical rainfall condition that may lead to diffuse landsliding. The development of these thresholds is frequently done through empirical or statistical approaches that aim to identify thresholds separating rainfall events that triggered landslides from those that did not. Such approaches present several problems related to the identification of the exact amount of rainfall that triggered landslides, the local geo-environmental conditions at the landslide site, and the minimum rainfall amount used to define the non-triggering events. Furthermore, these thresholds lead to misclassifications (false negatives or false positives) that always induce costs for society. The aim of this research is to address these limitations, accounting for classification costs in order to select the optimal thresholds for landslide risk management.

Starting from a database of shallow landslides that occurred during five regional-scale rainfall events in the Italian Central Alps, we extracted the triggering rainfall intensities by adjusting rain gauge data with weather radar data. This adjustment significantly improved the information on the rainfall intensity at the landslide sites, although some uncertainty about the exact timing of occurrence remains. We then identified the rainfall thresholds through the Receiver Operating Characteristic (ROC) approach, by identifying the optimal rainfall intensity that separates triggering and non-triggering events. To evaluate the effect of applying different minimum rainfall values for non-triggering events, we adopted three different values and obtained similar results, demonstrating that the ROC approach is not sensitive to the choice of the minimum rainfall threshold. To include the effect of misclassification costs, we developed cost-sensitive rainfall threshold curves using the cost-curve approach (Drummond and Holte 2000). As far as we know, this is the first attempt to build a cost-sensitive rainfall threshold for landslides that explicitly accounts for misclassification costs. For the development of the cost-sensitive threshold curve, we defined a reference cost scenario in which we quantified several cost items for both missed alarms and false alarms. Under this scenario, the cost-sensitive rainfall threshold turns out to be lower than the ROC threshold, so as to minimize missed alarms, whose costs are seven times greater than the false-alarm costs. Since the misclassification costs may vary with the socio-economic context and emergency organization, we developed different extreme scenarios to evaluate the sensitivity of the rainfall thresholds to the misclassification costs. In the scenario with maximum false-alarm cost and minimum missed-alarm cost, the rainfall threshold increases in order to minimize false alarms. Conversely, the rainfall threshold decreases in the scenario with minimum false-alarm cost and maximum missed-alarm cost. We found that the range of variation between the curves of these extreme scenarios is as much as half an order of magnitude.
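The core of this kind of cost-sensitive threshold selection can be sketched in a few lines (invented data and R code of our own, not the study's; the 7:1 cost ratio mirrors the reference scenario described above):

```r
# Sketch with invented data: pick the rainfall-intensity threshold that
# minimises total misclassification cost when a missed alarm costs seven
# times a false alarm.
set.seed(42)
intensity <- c(runif(80, 0, 20), runif(20, 10, 40))  # mm/h, toy values
triggered <- rep(c(0, 1), c(80, 20))                 # 1 = landslides occurred

c_missed <- 7; c_false <- 1
candidates <- sort(unique(intensity))
total_cost <- sapply(candidates, function(t) {
  miss <- sum(triggered == 1 & intensity < t)        # missed alarms
  fa   <- sum(triggered == 0 & intensity >= t)       # false alarms
  c_missed * miss + c_false * fa
})
candidates[which.min(total_cost)]                    # cost-sensitive threshold
```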


2019, Vol 20 (S25)
Author(s): Huijuan Lu, Yige Xu, Minchao Ye, Ke Yan, Zhigang Gao, ...

Abstract
Background: Cost-sensitive algorithms are an effective strategy for solving imbalanced classification problems. However, the misclassification costs are usually determined empirically based on user expertise, which leads to unstable performance of cost-sensitive classification. Therefore, an efficient and accurate method is needed to calculate the optimal cost weights.
Results: In this paper, two approaches are proposed to search for the optimal cost weights, targeting the highest weighted classification accuracy (WCA): grid searching over the cost weights, and function fitting. Comparisons are made between the two approaches. In the experiments, we classify imbalanced gene expression data using an extreme learning machine to test the cost weights obtained by the two approaches.
Conclusions: Comprehensive experimental results show that the function fitting method is generally more efficient and can find the optimal cost weights with acceptable WCA.
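A minimal version of the grid-search idea can be sketched as follows (assumptions: rpart stands in for the paper's extreme learning machine, the data are synthetic, and WCA is taken here as an equally weighted mean of the per-class accuracies):

```r
# Sketch of cost-weight grid searching; rpart replaces the paper's ELM,
# and the data are synthetic.
library(rpart)

set.seed(7)
n  <- 500
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- factor(ifelse(x1 + x2 + rnorm(n) > 1.5, "pos", "neg"))  # imbalanced

wca <- function(truth, pred) {             # weighted classification accuracy
  mean(c(mean(pred[truth == "pos"] == "pos"),
         mean(pred[truth == "neg"] == "neg")))
}

grid <- 1:20                               # candidate minority-class cost weights
scores <- sapply(grid, function(w) {
  loss <- matrix(c(0, w, 1, 0), nrow = 2)  # missing a "pos" costs w
  fit  <- rpart(y ~ x1 + x2, parms = list(loss = loss))
  wca(y, predict(fit, type = "class"))
})
grid[which.max(scores)]                    # best cost weight on this grid
```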


2018, Vol 2018, pp. 1-13
Author(s): Na Liu, Jiang Shen, Man Xu, Dan Gan, Er-Shi Qi, ...

As one of the most prevalent cancers among women worldwide, breast cancer has attracted much attention from researchers. It has been verified that accurate and early detection of breast cancer increases the chances that patients receive the right treatment plan and survive for a long time. Numerous classification methods have been utilized for breast cancer diagnosis. However, most of these classification models have concentrated on maximizing classification accuracy and have failed to take into account the unequal misclassification costs of breast cancer diagnosis: misclassifying a cancerous patient as non-cancerous has a much higher cost than misclassifying a non-cancerous patient as cancerous. Consequently, in order to address this deficiency and further improve the classification accuracy of breast cancer diagnosis, we propose an improved cost-sensitive support vector machine classifier (ICS-SVM). The proposed approach takes full account of the unequal misclassification costs of intelligent breast cancer diagnosis and provides more reasonable results than previous works and conventional classification models. To evaluate the performance of the proposed approach, the Wisconsin Breast Cancer (WBC) and Wisconsin Diagnostic Breast Cancer (WDBC) datasets obtained from the University of California at Irvine (UCI) Machine Learning Repository are studied. The experimental results demonstrate that the proposed hybrid algorithm outperforms all the existing methods. Promisingly, the proposed method could serve as a useful clinical tool for breast cancer diagnosis and could also be applied to the diagnosis of other illnesses.
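For intuition, a cost-sensitive SVM can be approximated with per-class weights; the sketch below uses e1071's class.weights argument on synthetic data (a generic stand-in, not the paper's ICS-SVM or the WBC/WDBC data), making errors on the malignant class ten times more expensive.

```r
# Sketch: class-weighted SVM via e1071; the synthetic data and the 10:1
# weight are illustrative, and this is not the paper's ICS-SVM.
library(e1071)

set.seed(3)
n <- 300
x <- matrix(rnorm(2 * n), ncol = 2)
y <- factor(ifelse(x[, 1] + x[, 2] + rnorm(n, sd = 0.5) > 1,
                   "malignant", "benign"))

fit <- svm(x, y, kernel = "radial",
           class.weights = c(benign = 1, malignant = 10))  # costly misses
table(truth = y, predicted = predict(fit, x))              # fewer false negatives
```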

