cost sensitive learning Latest Research Papers

Instance-dependent cost-sensitive learning: do we really need it?

10.24251/hicss.2022.191 ◽

2022 ◽

Author(s):

Toon Vanderschueren ◽

Wouter Verbeke ◽

Bart Baesens ◽

Tim Verdonck

Keyword(s):

Cost Sensitive Learning

Cost-sensitive learning strategies for high-dimensional and imbalanced data: a comparative study

PeerJ Computer Science ◽

10.7717/peerj-cs.832 ◽

2021 ◽

Vol 7 ◽

pp. e832

Author(s):

Barbara Pes ◽

Giuseppina Lai

Keyword(s):

Feature Selection ◽

Comparative Study ◽

Learning Strategies ◽

Class Imbalance ◽

Imbalanced Data ◽

High Dimensionality ◽

Problem Instance ◽

High Dimensional ◽

Cost Sensitive Learning ◽

Interesting Insight

High dimensionality and class imbalance have been largely recognized as important issues in machine learning. A vast amount of literature has indeed investigated suitable approaches to address the multiple challenges that arise when dealing with high-dimensional feature spaces (where each problem instance is described by a large number of features). As well, several learning strategies have been devised to cope with the adverse effects of imbalanced class distributions, which may severely impact on the generalization ability of the induced models. Nevertheless, although both the issues have been largely studied for several years, they have mostly been addressed separately, and their combined effects are yet to be fully understood. Indeed, little research has been so far conducted to investigate which approaches might be best suited to deal with datasets that are, at the same time, high-dimensional and class-imbalanced. To make a contribution in this direction, our work presents a comparative study among different learning strategies that leverage both feature selection, to cope with high dimensionality, as well as cost-sensitive learning methods, to cope with class imbalance. Specifically, different ways of incorporating misclassification costs into the learning process have been explored. Also different feature selection heuristics have been considered, both univariate and multivariate, to comparatively evaluate their effectiveness on imbalanced data. The experiments have been conducted on three challenging benchmarks from the genomic domain, gaining interesting insight into the beneficial impact of combining feature selection and cost-sensitive learning, especially in the presence of highly skewed data distributions.

Few-Shot Learning for Post-Earthquake Urban Damage Detection

Remote Sensing ◽

10.3390/rs14010040 ◽

2021 ◽

Vol 14 (1) ◽

pp. 40

Author(s):

Eftychia Koukouraki ◽

Leonardo Vanneschi ◽

Marco Painho

Keyword(s):

Damage Assessment ◽

Binary Classification ◽

Training Data ◽

Classification Problems ◽

Deep Convolutional Neural Networks ◽

Damage Classification ◽

Emergency Relief ◽

Cost Sensitive Learning ◽

Urban Structures ◽

Address Data

Among natural disasters, earthquakes are recorded to have the highest rates of human loss in the past 20 years. Their unexpected nature has severe consequences on both human lives and material infrastructure, demanding urgent action to be taken. For effective emergency relief, it is necessary to gain awareness about the level of damage in the affected areas. The use of remotely sensed imagery is popular in damage assessment applications; however, it requires a considerable amount of labeled data, which are not always easy to obtain. Taking into consideration the recent developments in the fields of Machine Learning and Computer Vision, this study investigates and employs several Few-Shot Learning (FSL) strategies in order to address data insufficiency and imbalance in post-earthquake urban damage classification. While small datasets have been tested against binary classification problems, which usually divide the urban structures into collapsed and non-collapsed, the potential of limited training data in multi-class classification has not been fully explored. To tackle this gap, four models were created, following different data balancing methods, namely cost-sensitive learning, oversampling, undersampling and Prototypical Networks. After a quantitative comparison among them, the best performing model was found to be the one based on Prototypical Networks, and it was used for the creation of damage assessment maps. The contribution of this work is twofold: we show that oversampling is the most suitable data balancing method for training Deep Convolutional Neural Networks (CNN) when compared to cost-sensitive learning and undersampling, and we demonstrate the appropriateness of Prototypical Networks in the damage classification context.

Klasifikasi Resiko Kehamilan Menggunakan Ensemble Learning berbasis Classification Tree

INFORMAL: Informatics Journal ◽

10.19184/isj.v6i3.28396 ◽

2021 ◽

Vol 6 (3) ◽

pp. 177

Author(s):

Muhamad Arief Hidayat

Keyword(s):

Ensemble Learning ◽

Classification Tree ◽

Subset Selection ◽

Feature Subset Selection ◽

Feature Subset ◽

Pregnancy Risk ◽

Risk Status ◽

Cost Sensitive Learning ◽

Best Value ◽

Selection Stage

In health science there is a technique to determine the level of risk of pregnancy, namely the Poedji Rochyati score technique. In this evaluation technique, the level of pregnancy risk is calculated from the values of 22 parameters obtained from pregnant women. Under certain conditions, some parameter values are unknown. This causes the level of risk of pregnancy can not be calculated. For that we need a way to predict pregnancy risk status in cases of incomplete attribute values. There are several studies that try to overcome this problem. The research "classification of pregnancy risk using cost sensitive learning" [3] applies cost sensitive learning to the process of classifying the level of pregnancy risk. In this study, the best classification accuracy achieved was 73% and the best value was 77.9%. To increase the accuracy and recall of predicting pregnancy risk status, in this study several improvements were proposed. 1) Using ensemble learning based on classification tree 2) using the SVMattributeEvaluator evaluator to optimize the feature subset selection stage. In the trials conducted using the classification tree-based ensemble learning method and the SVMattributeEvaluator at the feature subset selection stage, the best value for accuracy was up to 76% and the best value for recall was up to 89.5%

Everything has its Price: Foundations of Cost-Sensitive Learning and its Application in Psychology

10.31234/osf.io/7asgz ◽

2021 ◽

Author(s):

Philipp Sterner ◽

David Goretzko ◽

Florian Pargent

Keyword(s):

Machine Learning ◽

False Positive ◽

Binary Classification ◽

False Negative ◽

Classification Algorithms ◽

Practical Application ◽

Cost Sensitive Learning ◽

Misclassification Costs ◽

Mathematical Foundations ◽

Open Materials

Psychology has seen an increase in machine learning (ML) methods. In many applications, observations are classified into one of two groups (binary classification). Off-the-shelf classification algorithms assume that the costs of a misclassification (false-positive or false-negative) are equal. Because this is often not reasonable (e.g., in clinical psychology), cost-sensitive learning (CSL) methods can take different cost ratios into account. We present the mathematical foundations and introduce a taxonomy of the most commonly used CSL methods, before demonstrating their application and usefulness on psychological data, i.e., the drug consumption dataset ($N = 1885$) from the UCI Machine Learning Repository. In our example, all demonstrated CSL methods noticeably reduce mean misclassification costs compared to regular ML algorithms. We discuss the necessity for researchers to perform small benchmarks of CSL methods for their own practical application. Thus, our open materials provide R code, demonstrating how CSL methods can be applied within the mlr3 framework (https://osf.io/cvks7/).

Cost-Sensitive Learning and Ensemble BERT for Identifying and Categorizing Offensive Language in Social Media

10.1109/icaicta53211.2021.9640280 ◽

2021 ◽

Author(s):

Fajar Muslim ◽

Ayu Purwarianti ◽

Fariska Z Ruskanda

Keyword(s):

Social Media ◽

Cost Sensitive Learning ◽

Offensive Language

A random forest classifier with cost-sensitive learning to extract urban landmarks from an imbalanced dataset

International Journal of Geographical Information Science ◽

10.1080/13658816.2021.1977814 ◽

2021 ◽

pp. 1-18

Author(s):

Mengjun Kang ◽

Yue Liu ◽

Mengqi Wang ◽

Lin Li ◽

Min Weng

Keyword(s):

Random Forest ◽

Random Forest Classifier ◽

Imbalanced Dataset ◽

Cost Sensitive Learning

Cost-sensitive learning for semi-supervised hit-and-run analysis

Accident Analysis & Prevention ◽

10.1016/j.aap.2021.106199 ◽

2021 ◽

Vol 158 ◽

pp. 106199

Author(s):

Siying Zhu ◽

Jianwu Wan

Keyword(s):

Cost Sensitive Learning ◽

Hit And Run

Performance analysis of cost-sensitive learning methods with application to imbalanced medical data

Informatics in Medicine Unlocked ◽

10.1016/j.imu.2021.100690 ◽

2021 ◽

pp. 100690

Author(s):

Ibomoiye Domor Mienye ◽

Yanxia Sun

Keyword(s):

Performance Analysis ◽

Medical Data ◽

Learning Methods ◽

Cost Sensitive Learning

Density-based weighting for imbalanced regression

Machine Learning ◽

10.1007/s10994-021-06023-5 ◽

2021 ◽

Author(s):

Michael Steininger ◽

Konstantin Kobs ◽

Padraig Davidson ◽

Anna Krause ◽

Andreas Hotho

Keyword(s):

Model Performance ◽

Imbalanced Data ◽

Extreme Rainfall ◽

Weighting Scheme ◽

Extreme Rainfall Events ◽

Rainfall Events ◽

Cost Sensitive Learning ◽

Data Points ◽

Sample Weighting ◽

Model Training

AbstractIn many real world settings, imbalanced data impedes model performance of learning algorithms, like neural networks, mostly for rare cases. This is especially problematic for tasks focusing on these rare occurrences. For example, when estimating precipitation, extreme rainfall events are scarce but important considering their potential consequences. While there are numerous well studied solutions for classification settings, most of them cannot be applied to regression easily. Of the few solutions for regression tasks, barely any have explored cost-sensitive learning which is known to have advantages compared to sampling-based methods in classification tasks. In this work, we propose a sample weighting approach for imbalanced regression datasets called DenseWeight and a cost-sensitive learning approach for neural network regression with imbalanced data called DenseLoss based on our weighting scheme. DenseWeight weights data points according to their target value rarities through kernel density estimation (KDE). DenseLoss adjusts each data point’s influence on the loss according to DenseWeight, giving rare data points more influence on model training compared to common data points. We show on multiple differently distributed datasets that DenseLoss significantly improves model performance for rare data points through its density-based weighting scheme. Additionally, we compare DenseLoss to the state-of-the-art method SMOGN, finding that our method mostly yields better performance. Our approach provides more control over model training as it enables us to actively decide on the trade-off between focusing on common or rare cases through a single hyperparameter, allowing the training of better models for rare data points.

cost sensitive learning
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

Instance-dependent cost-sensitive learning: do we really need it?

Cost-sensitive learning strategies for high-dimensional and imbalanced data: a comparative study

Few-Shot Learning for Post-Earthquake Urban Damage Detection

Klasifikasi Resiko Kehamilan Menggunakan Ensemble Learning berbasis Classification Tree

Everything has its Price: Foundations of Cost-Sensitive Learning and its Application in Psychology

Cost-Sensitive Learning and Ensemble BERT for Identifying and Categorizing Offensive Language in Social Media

A random forest classifier with cost-sensitive learning to extract urban landmarks from an imbalanced dataset

Cost-sensitive learning for semi-supervised hit-and-run analysis

Performance analysis of cost-sensitive learning methods with application to imbalanced medical data

Density-based weighting for imbalanced regression

Export Citation Format

cost sensitive learningRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

Instance-dependent cost-sensitive learning: do we really need it?

Cost-sensitive learning strategies for high-dimensional and imbalanced data: a comparative study

Few-Shot Learning for Post-Earthquake Urban Damage Detection

Klasifikasi Resiko Kehamilan Menggunakan Ensemble Learning berbasis Classification Tree

Everything has its Price: Foundations of Cost-Sensitive Learning and its Application in Psychology

Cost-Sensitive Learning and Ensemble BERT for Identifying and Categorizing Offensive Language in Social Media

A random forest classifier with cost-sensitive learning to extract urban landmarks from an imbalanced dataset

Cost-sensitive learning for semi-supervised hit-and-run analysis

Performance analysis of cost-sensitive learning methods with application to imbalanced medical data

Density-based weighting for imbalanced regression

cost sensitive learning
Recently Published Documents