Class Imbalance: Latest Research Papers

HCBST: An Efficient Hybrid Sampling Technique for Class Imbalance Problems

ACM Transactions on Knowledge Discovery from Data, Vol 16 (3), pp. 1-37, 2022. DOI: 10.1145/3488280

Author(s): Robert A. Sowah, Bernard Kuditchar, Godfrey A. Mills, Amevi Acakpovi, Raphael A. Twum, ...

Keyword(s): Geometric Mean, Class Imbalance, Sampling Technique, Data Repository, Support Vector, Classification Algorithms, Class Imbalance Problem, Imbalance Problem, High Degree, Hybrid Sampling

The class imbalance problem is prevalent in many real-world domains and has become an active area of research. In binary classification, imbalanced learning refers to learning from a dataset that is highly skewed toward the negative class. This skew causes classification algorithms to perform poorly when predicting the positive class for new examples. Data resampling, which manipulates the training data before standard classification techniques are applied, is among the most commonly used techniques for dealing with class imbalance. This article presents a new hybrid sampling technique that significantly improves the overall performance of classification algorithms on the class imbalance problem. The proposed method, called the Hybrid Cluster-Based Undersampling Technique (HCBST), combines a cluster undersampling technique, which under-samples the majority instances, with an oversampling technique derived from Sigma Nearest Oversampling based on Convex Combination, which oversamples the minority instances, to solve the class imbalance problem with a high degree of accuracy and reliability. The performance of the proposed algorithm was tested on 11 datasets with varying degrees of imbalance from the National Aeronautics and Space Administration (NASA) Metric Data Program data repository and the University of California Irvine (UCI) Machine Learning Repository. Results were compared using classification algorithms such as K-nearest neighbours, support vector machines, decision tree, random forest, neural network, AdaBoost, naïve Bayes, and quadratic discriminant analysis. Test results revealed that, on the same datasets, HCBST performed better, with average scores of 0.73, 0.67, and 0.35 for area under the curve (AUC), geometric mean, and Matthews correlation coefficient (MCC), respectively, across all the classifiers used in this study.
HCBST has the potential to improve classification performance on class-imbalanced problems, which, by extension, can benefit the many applications that rely on such classifiers.
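To make the hybrid idea concrete, here is a minimal Python sketch that undersamples the majority class by clustering and oversamples the minority class by convex combination. It is not the authors' HCBST: the plain k-means step, the rule of keeping the real sample nearest each centroid, the random-weight convex combination of a seed point and its `sigma` nearest minority neighbours (a simplification in the spirit of SNOCC), and the default target of meeting halfway between the two class sizes are all assumptions, and every function name is hypothetical.

```python
import random


def _dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def cluster_undersample(majority, k, iters=20, rng=random):
    """Hypothetical step: reduce the majority class to k real samples by running
    plain k-means (Lloyd's algorithm) and keeping the sample nearest each centroid."""
    centroids = rng.sample(majority, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in majority:
            nearest = min(range(k), key=lambda c: _dist2(p, centroids[c]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [_mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    remaining, kept = list(majority), []
    for c in centroids:
        best = min(remaining, key=lambda p: _dist2(p, c))
        kept.append(best)
        remaining.remove(best)  # never pick the same sample twice
    return kept


def convex_oversample(minority, n_new, sigma=3, rng=random):
    """Hypothetical step: create n_new synthetic points, each a random convex
    combination of a seed minority point and its `sigma` nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        seed = rng.choice(minority)
        others = sorted((p for p in minority if p is not seed),
                        key=lambda p: _dist2(p, seed))
        group = [seed] + others[:sigma]
        weights = [rng.random() for _ in group]
        total = sum(weights)  # dividing by the total makes the weights sum to 1
        synthetic.append([sum(w * p[d] for w, p in zip(weights, group)) / total
                          for d in range(len(seed))])
    return synthetic


def hybrid_resample(majority, minority, target=None, rng=random):
    """Undersample the majority and oversample the minority to a common size
    (assumed default: halfway between the two class sizes)."""
    if target is None:
        target = (len(majority) + len(minority)) // 2
    maj = (cluster_undersample(majority, target, rng=rng)
           if target < len(majority) else list(majority))
    mino = list(minority) + convex_oversample(
        minority, max(0, target - len(minority)), rng=rng)
    return maj, mino


rng = random.Random(0)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
minority = [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(10)]
maj, mino = hybrid_resample(majority, minority, rng=rng)
print(len(maj), len(mino))  # 50 50
```

Meeting in the middle limits both the number of majority samples discarded and the number of synthetic minority points created, and because every synthetic point is a convex combination, it stays inside the region already spanned by real minority samples.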

Corrigendum to “Dual Focal Loss to address class imbalance in semantic segmentation” [Neurocomputing 462 (2021) 69-87]

Neurocomputing, Vol 477, pp. 61, 2022. DOI: 10.1016/j.neucom.2022.01.009

Author(s): Md Sazzad Hossain, John M. Betts, Andrew P. Paplinski

Keyword(s): Class Imbalance, Semantic Segmentation

Novel regularization method for the class imbalance problem

Expert Systems with Applications, Vol 188, pp. 115974, 2022. DOI: 10.1016/j.eswa.2021.115974

Author(s): Bosung Kim, Youngjoong Ko, Jungyun Seo

Keyword(s): Regularization Method, Class Imbalance, Class Imbalance Problem, Imbalance Problem

A classification method to classify bone marrow cells with class imbalance problem

Biomedical Signal Processing and Control, Vol 72, pp. 103296, 2022. DOI: 10.1016/j.bspc.2021.103296

Author(s): Liang Guo, Peiduo Huang, Dehao Huang, Zilan Li, Chenglong She, ...

Keyword(s): Bone Marrow, Bone Marrow Cells, Class Imbalance, Classification Method, Class Imbalance Problem, Imbalance Problem, Marrow Cells

Sleep Apnea Detection Based on Multi-Scale Residual Network

Life, Vol 12 (1), pp. 119, 2022. DOI: 10.3390/life12010119

Author(s): Hengyang Fang, Changhua Lu, Feng Hong, Weiwei Jiang, Tao Wang

Keyword(s): Sleep Apnea, Loss Function, Physiological Mechanism, Class Imbalance, Residual Network, Residual Structure, Multi Scale, Signal Features, Low Sensitivity, The Impact

Because traditional convolutional neural networks cannot effectively extract signal features in complex application scenarios, we propose a sleep apnea (SA) detection method based on multi-scale residual networks. First, we analyze the physiological mechanism of SA; on that basis, the method uses the RR interval signals and R-peak signals derived from electrocardiogram (ECG) signals as input. Then, a multi-scale residual network extracts features from the original signals to capture sensitive characteristics from multiple perspectives. Because the model uses a residual structure, the problem of model degradation can be avoided. Finally, a fully connected layer performs SA detection. To overcome the impact of class imbalance, a focal loss function replaces the traditional cross-entropy loss, making the model focus more on learning difficult samples during training. Experimental results on the Apnea-ECG dataset show that the proposed multi-scale residual network achieves an accuracy, sensitivity, and specificity of 86.0%, 84.1%, and 87.1%, respectively. These results indicate that the proposed method not only achieves higher recognition accuracy than other methods but also effectively resolves the problem of low sensitivity caused by class imbalance.
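The focal loss that replaces cross-entropy can be sketched in a few lines of Python. This follows the common binary form FL(p_t) = -α_t (1 - p_t)^γ log(p_t) from Lin et al. (2017); the defaults γ = 2 and α = 0.25 are the values commonly used in that paper, not settings reported for this sleep apnea model, and the function names are hypothetical.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy; probs are predicted probabilities of class 1."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -math.log(p if y == 1 else 1 - p)
    return total / len(probs)


def binary_focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).
    With gamma=0 and alpha=None it reduces to binary cross-entropy."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)
        p_t = p if y == 1 else 1 - p  # probability assigned to the true class
        if alpha is None:
            a_t = 1.0
        else:
            a_t = alpha if y == 1 else 1 - alpha  # class-balancing weight
        # (1 - p_t)**gamma shrinks the loss of well-classified (easy) samples.
        total += -a_t * (1 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


# An easy sample (p_t = 0.95) is down-weighted far more than a hard one (p_t = 0.3).
easy = binary_focal_loss([0.95], [1], alpha=None) / binary_cross_entropy([0.95], [1])
hard = binary_focal_loss([0.3], [1], alpha=None) / binary_cross_entropy([0.3], [1])
print(round(easy, 4), round(hard, 2))  # 0.0025 0.49
```

The ratio to cross-entropy is exactly (1 - p_t)^γ, so confidently classified majority-class samples contribute almost nothing and the gradient is dominated by hard, often minority-class, samples.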

RDPVR: Random Data Partitioning with Voting Rule for Machine Learning from Class-Imbalanced Datasets

Electronics, Vol 11 (2), pp. 228, 2022. DOI: 10.3390/electronics11020228

Author(s): Ahmad B. Hassanat, Ahmad S. Tarawneh, Samer Subhi Abed, Ghada Awad Altarawneh, Malek Alrashidi, ...

Keyword(s): Machine Learning, Linear Time, Class Imbalance, Data Partitioning, Majority Voting, Random Data, Imbalanced Datasets, Resampling Methods, Voting Rule, Probability Of Overfitting

Because most classifiers are biased toward the dominant class, class imbalance is a challenging problem in machine learning. The most popular approaches to this problem are oversampling minority examples and undersampling majority examples. Oversampling may increase the probability of overfitting, whereas undersampling eliminates examples that may be crucial to the learning process. To address both concerns, we present a linear-time resampling method based on random data partitioning and a majority voting rule: an imbalanced dataset is partitioned into a number of small subdatasets, each of which must be class balanced. A classifier is then trained on each subdataset, and the final classification is obtained by applying the majority voting rule to the outputs of all the trained models. We compared the proposed method with some of the best-known oversampling and undersampling methods, using a range of classifiers, on 33 benchmark class-imbalanced machine learning datasets. Classifiers trained on data generated by the proposed method produced results comparable to most of the resampling methods tested, with the exception of SMOTEFUNA, an oversampling method that increases the probability of overfitting. The proposed method also produced results comparable to the Easy Ensemble (EE) undersampling method. We therefore advocate using either EE or our method for machine learning from class-imbalanced datasets.
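The partition-and-vote scheme can be sketched in Python as follows. This is a hypothetical illustration, not the authors' code: it assumes a binary task, splits the shuffled majority class round-robin into about n_majority / n_minority slices, pairs each slice with all minority examples, and uses a toy nearest-centroid model as the base classifier in place of whatever classifier is actually chosen.

```python
import random
from collections import Counter


def random_balanced_partitions(X, y, rng=random):
    """Randomly split a binary imbalanced dataset into roughly class-balanced
    subsets: each pairs one slice of the shuffled majority class with all
    minority examples, so no majority example is discarded."""
    counts = Counter(y)
    minority_label, majority_label = sorted(counts, key=counts.get)
    minority = [x for x, t in zip(X, y) if t == minority_label]
    majority = [x for x, t in zip(X, y) if t == majority_label]
    rng.shuffle(majority)
    k = max(1, len(majority) // len(minority))  # number of subsets
    subsets = []
    for i in range(k):
        chunk = majority[i::k]  # round-robin slices differ in size by at most 1
        subsets.append((chunk + minority,
                        [majority_label] * len(chunk) + [minority_label] * len(minority)))
    return subsets


def train_nearest_centroid(X, y):
    """Hypothetical base learner: one mean vector per class."""
    model = {}
    for label in set(y):
        pts = [x for x, t in zip(X, y) if t == label]
        model[label] = [sum(col) / len(pts) for col in zip(*pts)]
    return model


def predict_nearest_centroid(model, x):
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, model[lab])))


def majority_vote(models, x):
    """Final label is the most common prediction; ties go to the label seen first."""
    votes = Counter(predict_nearest_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]


rng = random.Random(1)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(95)]
     + [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(10)])
y = [0] * 95 + [1] * 10
subsets = random_balanced_partitions(X, y, rng=rng)
models = [train_nearest_centroid(Xs, ys) for Xs, ys in subsets]
print(len(models))  # 9
print(majority_vote(models, [4.0, 4.0]))
```

Each example is touched a constant number of times while partitioning, consistent with the linear-time claim, and because every majority example lands in exactly one subset, the ensemble as a whole still sees all of the data.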

A Framework for Pedestrian Attribute Recognition Using Deep Learning

Applied Sciences, Vol 12 (2), pp. 622, 2022. DOI: 10.3390/app12020622

Author(s): Saadman Sakib, Kaushik Deb, Pranab Kumar Dhar, Oh-Jin Kwon

Keyword(s): Deep Learning, Transfer Learning, Class Imbalance, Recognition Task, Fine Tuning, Class Imbalance Problem, Imbalance Problem, Technological Advances, Learning Techniques, Attribute Recognition

The pedestrian attribute recognition task is becoming increasingly popular because of its significant role in surveillance scenarios. With recent technological advances, deep learning has come to the forefront of computer vision, and previous works have applied it in different ways to recognize pedestrian attributes. The results are satisfactory, but there is still room for improvement. Transfer learning is becoming more popular because of its strong performance in reducing computational cost and coping with data scarcity. This paper proposes a framework for recognizing pedestrian attributes in surveillance scenarios. A Mask R-CNN object detector extracts the pedestrians. We then apply transfer learning to different CNN architectures, namely Inception ResNet v2, Xception, ResNet 101 v2, and ResNet 152 v2. The main contribution of this paper is fine-tuning the ResNet 152 v2 architecture by freezing different sets of layers: the last 4, 8, 12, 14, or 20 layers, none, and all. Moreover, a data balancing technique, oversampling, is applied to resolve the class imbalance problem of the dataset, and its usefulness is analyzed. Our proposed framework outperforms state-of-the-art methods, achieving a mean accuracy (mA) of 93.41% and 89.24% on the RAP v2 and PARSE100K datasets, respectively.
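The mA figures are mean accuracy, which in pedestrian attribute recognition is usually computed per attribute as the average of the true positive rate and the true negative rate, then averaged over attributes. The Python sketch below assumes that standard label-based definition and binary 0/1 attribute labels; the function name and the toy data are hypothetical. It shows why mA suits imbalanced attributes: always predicting "absent" for a rare attribute earns 0.75 plain accuracy but only 0.5 mA.

```python
def mean_accuracy(y_true, y_pred):
    """Label-based mean accuracy (mA) for multi-label attribute recognition:
    for each attribute, average the accuracy on positives (TPR) and on
    negatives (TNR), then average over all attributes."""
    n_attr = len(y_true[0])
    total = 0.0
    for j in range(n_attr):
        pos = sum(1 for t in y_true if t[j] == 1)
        neg = len(y_true) - pos
        if pos == 0 or neg == 0:
            raise ValueError(f"attribute {j} needs both positive and negative examples")
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        tn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 0)
        total += 0.5 * (tp / pos + tn / neg)
    return total / n_attr


# A rare attribute (1 positive in 4 pedestrians) predicted as always absent.
y_true = [[1], [0], [0], [0]]
y_pred = [[0], [0], [0], [0]]
plain_acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_acc, mean_accuracy(y_true, y_pred))  # 0.75 0.5
```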

Guidelines for the validation of machine learning predictions of species interactions

2022. DOI: 10.32942/osf.io/aty7n

Author(s): Timothée Poisot

Keyword(s): Predictive Models, Species Interactions, Specific Problem, Class Imbalance, Training Dataset, Training Set, Interaction Prediction, Mathematical Arguments, Data Volume, Binary Classifiers

1. The prediction of species interactions is gaining momentum as a way to circumvent limitations in data volume. Yet ecological networks are challenging to predict because they are typically small and sparse. Extreme class imbalance is a challenge for most binary classifiers, and there are currently no guidelines on how predictive models should be trained for this specific problem.

2. Using simple mathematical arguments and numerical experiments in which a variety of supervised classifiers are trained on simulated networks, we develop a series of guidelines on the choice of measures for model selection and on the degree of unbiasing to apply to the training dataset.

3. Neither classifier accuracy nor ROC-AUC is an informative measure of performance for interaction prediction; PR-AUC is a fairer assessment. In some cases, even standard measures can lead to selecting a more biased classifier, because the effect of connectance is strong. The amount of correction to apply to the training dataset depends on network connectance and on the measure being optimized, and only weakly on the classifier.

4. These results show that training machines to predict networks is a challenging task and that, in virtually all cases, the composition of the training set needs to be experimented with before the actual training is performed. We discuss these consequences in the context of low data volume.
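The gap between ROC-AUC and PR-AUC on sparse data can be illustrated with a small Python sketch. The scores are made up: 20 candidate species pairs, 2 of which interact (a positive fraction of 0.1, standing in for connectance), with the true interactions ranked 2nd and 4th. Average precision is used as the PR-AUC estimate, which is a common choice but an assumption here, and the function names are hypothetical.

```python
def roc_auc(scores, labels):
    """ROC-AUC as the probability that a random positive is scored above a
    random negative (ties count one half)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Average precision, a common PR-AUC estimate: the mean of the precision
    measured at the rank of each positive (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy sparse network: 20 candidate pairs scored 20..1; true interactions ranked 2nd and 4th.
scores = list(range(20, 0, -1))
labels = [0] * 20
labels[1] = labels[3] = 1
all_negative_accuracy = labels.count(0) / len(labels)  # model that predicts "no interaction"
print(all_negative_accuracy, round(roc_auc(scores, labels), 3), average_precision(scores, labels))
# 0.9 0.917 0.5
```

A model that never predicts an interaction already reaches 0.9 accuracy, and ROC-AUC looks strong at about 0.92 because negatives are so plentiful, whereas average precision of 0.5 reflects that only half of the pairs ranked at or above each true interaction are real.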

Automatic Deep Learning-Based Consolidation/Collapse Classification in Lung Ultrasound Images for COVID-19 Induced Pneumonia

2022. DOI: 10.36227/techrxiv.17912387

Author(s): Nabeel Durrani, Damjan Vukovic, Maria Antico, Jeroen van der Burgt, Ruud JG van Sloun, ...

Keyword(s): Deep Learning, Class Imbalance, Ultrasound Images, Pleural Effusions, Data Set, Label Noise, Set Size, Video Frames, Two Factors, The Impact

Our automated deep learning-based approach identifies consolidation/collapse in lung ultrasound (LUS) images to aid in the diagnosis of late stages of COVID-19 induced pneumonia, where consolidation/collapse is one of the possible associated pathologies. A common challenge in training such models is that annotating each frame of an ultrasound video requires a high labelling effort, which in practice becomes prohibitive for large ultrasound datasets. To understand the impact of different degrees of labelling precision, we compare labelling strategies for training fully supervised models (frame-based method, higher labelling effort) and inaccurately supervised models (video-based methods, lower labelling effort), both of which yield frame-by-frame binary predictions for LUS videos. We also introduce a novel sampled quaternary method, which randomly samples only 10% of the frames in an LUS video and then assigns an (ordinal) categorical label to all frames in the video based on the fraction of positively annotated samples. This method outperformed the inaccurately supervised video-based method from our previous work on pleural effusions. More surprisingly, despite being a form of inaccurate learning, it also outperformed the supervised frame-based approach on metrics suited to the class imbalance of our dataset, such as the precision-recall area under the curve (PR-AUC) and F1 score. This may be due to the combination of a significantly smaller dataset than in our previous work and the higher complexity of consolidation/collapse compared with pleural effusion, two factors that contribute to label noise and overfitting; specifically, we argue that our video-based method is more robust to label noise and mitigates overfitting in a manner similar to label smoothing.
Using clinical expert feedback, we developed separate criteria for excluding data from the training and test sets; ten-fold cross-validation then yielded a PR-AUC of 73% and an accuracy of 89%. Although the efficacy of our classifier using the sampled quaternary method must still be verified on a larger consolidation/collapse dataset, given the complexity of the pathology, the proposed classifier using the sampled quaternary video-based method is clinically comparable with trained experts and improves on the video-based method of our previous work on pleural effusions.
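The sampled quaternary labelling step can be sketched in Python as below. Only the 10% sampling fraction comes from the text; the four bin edges, the meaning given to each category, the function name, and the `annotate` callback (a stand-in for an expert labelling a single frame) are hypothetical.

```python
import random


def sampled_quaternary_labels(n_frames, annotate, fraction=0.1, rng=random):
    """Annotate a random `fraction` of a video's frames, then give every frame
    the same ordinal category (0-3) derived from the positive fraction.
    The bin edges below are hypothetical, not taken from the paper."""
    k = max(1, round(fraction * n_frames))
    sampled = rng.sample(range(n_frames), k)
    pos_frac = sum(annotate(i) for i in sampled) / k
    if pos_frac == 0:
        category = 0  # no sampled frame shows the finding
    elif pos_frac < 0.5:
        category = 1  # finding in a minority of sampled frames
    elif pos_frac < 1:
        category = 2  # finding in most sampled frames
    else:
        category = 3  # finding in every sampled frame
    return [category] * n_frames, pos_frac


# `annotate` stands in for an expert (1 = consolidation/collapse visible in frame i).
labels, frac = sampled_quaternary_labels(120, lambda i: 1 if i < 30 else 0,
                                         rng=random.Random(0))
print(len(labels), frac)
```

Only 12 of the 120 frames are inspected here, yet all 120 receive a label; the ordinal categories carry how confident the video-level evidence is, which is one way to read the label-smoothing analogy in the abstract.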

Vehicle Interaction Behavior Prediction with Self-Attention

Sensors, Vol 22 (2), pp. 429, 2022. DOI: 10.3390/s22020429

Author(s): Linhui Li, Xin Sui, Jing Lian, Fengning Yu, Yafu Zhou

Keyword(s): Cluster Structure, Class Imbalance, Behavior Prediction, Time To Event, Future Behavior, Interaction Behavior, Proposed Model, High Interaction, Fully Connected, High Uncertainty

Structured roads involve a high degree of interaction between vehicles, but because vehicle behavior is highly uncertain, predicting vehicle interaction behavior remains a challenge. Such prediction is important for controlling the ego-vehicle. To improve prediction performance, we propose an interaction behavior prediction model based on a vehicle cluster (VC) with self-attention (VC-Attention). First, a five-vehicle cluster structure is designed to extract interactive features between the ego-vehicle and the target vehicle, such as the Deceleration Rate to Avoid a Crash (DRAC) and the lane gap. In addition, the proposed model uses a sliding window algorithm to extract VC behavior information. The temporal characteristics of the three interactive features are then each captured by a two-layer self-attention encoder with six heads. Finally, the target vehicle's future behavior is predicted by a sub-network consisting of a fully connected layer and a softmax module. Experimental results on the Next Generation Simulation (NGSIM) dataset show that the method achieves accuracy, precision, recall, and F1 scores of more than 92% and a time to event of 2.9 s. It accurately predicts interactive behaviors under class imbalance and adapts to various driving scenarios.
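In the abstract, a sliding window extracts behavior information and self-attention encoders capture its temporal characteristics. A minimal pure-Python sketch of one scaled dot-product attention head applied to a sliding window is shown below. It is an illustration, not the VC-Attention model: it uses a single head and layer, identity matrices in place of learned projections, no positional encoding, and made-up feature values, and the function names are hypothetical. A six-head layer would run six such heads with different projections and concatenate their outputs.

```python
import math


def matmul(A, B):
    """Multiply matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sequence X
    (T time steps x d features): softmax(Q K^T / sqrt(d_k)) V."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d_k = len(Wk[0])
    K_T = [list(col) for col in zip(*K)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, K_T)]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, V), weights


def sliding_windows(seq, width):
    """All contiguous windows of `width` time steps."""
    return [seq[i:i + width] for i in range(len(seq) - width + 1)]


# Hypothetical per-time-step features, e.g. [DRAC, lane gap]; the values are made up.
seq = [[0.1, 3.0], [0.2, 2.8], [0.4, 2.5], [0.3, 2.6], [0.6, 2.1]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder for learned projection matrices
windows = sliding_windows(seq, 4)
out, attn = self_attention(windows[0], identity, identity, identity)
print(len(windows), len(out), len(out[0]))  # 2 4 2
```

Each output row is a weighted average of the value vectors in the window, so every time step can draw on the whole window at once rather than only on its neighbours, which is the usual motivation for attention over recurrent encoders.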
