class imbalance problem
Recently Published Documents





2022, Vol 16 (3), pp. 1-37
Robert A. Sowah, Bernard Kuditchar, Godfrey A. Mills, Amevi Acakpovi, Raphael A. Twum, et al.

The class imbalance problem is prevalent in many real-world domains and has become an active area of research. In binary classification, imbalance learning refers to learning from a dataset that is highly skewed towards the negative class. This skew causes classification algorithms to perform poorly when predicting the positive class for new examples. Data resampling, which manipulates the training data before standard classification techniques are applied, is among the most commonly used approaches to the class imbalance problem. This article presents a new hybrid sampling technique that significantly improves the overall performance of classification algorithms on imbalanced data. The proposed method, the Hybrid Cluster-Based Undersampling Technique (HCBST), combines a cluster-based undersampling technique, which undersamples the majority instances, with an oversampling technique derived from Sigma Nearest Oversampling based on Convex Combination, which oversamples the minority instances, to solve the class imbalance problem with a high degree of accuracy and reliability. The performance of the proposed algorithm was tested on 11 datasets with varying degrees of imbalance from the National Aeronautics and Space Administration (NASA) Metric Data Program data repository and the University of California Irvine (UCI) Machine Learning Repository. Results were compared using classification algorithms such as K-nearest neighbours, support vector machines, decision tree, random forest, neural network, AdaBoost, naïve Bayes, and quadratic discriminant analysis. Test results revealed that, on the same datasets, HCBST performed better, with average values of 0.73, 0.67, and 0.35 for area under the curve, geometric mean, and Matthews Correlation Coefficient, respectively, across all the classifiers used in this study.
HCBST has the potential to improve classification performance on imbalanced data, which, by extension, could benefit the many applications that rely on such classification.
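The abstract describes HCBST only at a high level. As a rough illustration of the general hybrid idea (shrink the majority class by clustering, grow the minority class with convex combinations), here is a minimal standard-library sketch (Python 3.8+ for `math.dist`). It is not the authors' HCBST: the helper names `kmeans`, `cluster_undersample`, `convex_oversample` and `hybrid_resample` are hypothetical, the majority class is simply replaced by k-means centroids, and the oversampling step draws random convex weights over a minority point and its nearest minority neighbours instead of reproducing Sigma Nearest Oversampling based on Convex Combination.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            c = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[c].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def cluster_undersample(majority, n_keep, seed=0):
    """Hypothetical: shrink the majority class to n_keep k-means centroids."""
    return kmeans(majority, n_keep, seed=seed)

def convex_oversample(minority, n_new, k=2, seed=0):
    """Hypothetical: each synthetic point is a random convex combination of a
    minority point and its k nearest minority neighbours (weights >= 0, sum 1)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        group = [x] + sorted(others, key=lambda p: math.dist(x, p))[:k]
        weights = [rng.expovariate(1.0) for _ in group]  # normalised below
        total = sum(weights)
        synthetic.append([
            sum(w * p[d] for w, p in zip(weights, group)) / total
            for d in range(len(x))
        ])
    return synthetic

def hybrid_resample(majority, minority, target, seed=0):
    """Balance both classes to `target` samples each.
    Assumes len(minority) <= target <= len(majority)."""
    new_majority = cluster_undersample(majority, target, seed=seed)
    new_minority = list(minority) + convex_oversample(
        minority, target - len(minority), seed=seed)
    return new_majority, new_minority

# Toy data: 40 majority points near (0, 0), 5 minority points near (3, 3).
rng = random.Random(1)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
minority = [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(5)]
maj, mino = hybrid_resample(majority, minority, target=15)
print(len(maj), len(mino))  # 15 15
```

Because every synthetic point is a convex combination of real minority points, it always lies inside their convex hull; the trade-off is that this step never creates minority samples outside the region the data already covers.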

2022, Vol 72, pp. 103296
Liang Guo, Peiduo Huang, Dehao Huang, Zilan Li, Chenglong She, et al.

2022, Vol 12 (2), pp. 622
Saadman Sakib, Kaushik Deb, Pranab Kumar Dhar, Oh-Jin Kwon

The pedestrian attribute recognition task is gaining popularity because of its significant role in surveillance scenarios. With recent technological advances, deep learning has come to the forefront of computer vision, and previous works have applied it in different ways to recognize pedestrian attributes. Their results are satisfactory, but there is still room for improvement. Transfer learning is becoming increasingly popular because it reduces computational cost and mitigates data scarcity across tasks. This paper proposes a framework for recognizing pedestrian attributes in surveillance scenarios. A Mask R-CNN object detector extracts the pedestrians. Additionally, we applied transfer learning to different CNN architectures, namely Inception ResNet v2, Xception, ResNet 101 v2, and ResNet 152 v2. The main contribution of this paper is fine-tuning the ResNet 152 v2 architecture under several layer-freezing settings: freezing the last 4, 8, 12, 14, or 20 layers, no layers, or all layers. Moreover, a data balancing technique, oversampling, is applied to resolve the class imbalance problem of the dataset, and its usefulness is analyzed in this paper. Our proposed framework outperforms state-of-the-art methods, achieving a mean accuracy (mA) of 93.41% and 89.24% on the RAP v2 and PARSE100K datasets, respectively.
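The abstract does not say which oversampling variant was used. A minimal sketch of plain random oversampling, assuming a single-label setting (pedestrian attribute recognition is usually multi-label, which needs per-attribute handling not shown here); `random_oversample` and the image file names are hypothetical:

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of every smaller class (drawn with
    replacement) until each class matches the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_x.extend(extra)
        out_y.extend([cls] * len(extra))
    return out_x, out_y

# Hypothetical data: 2 images show an attribute (label 1), 8 do not (label 0).
images = [f"img{i}.jpg" for i in range(10)]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
bal_x, bal_y = random_oversample(images, labels)
print(sorted(Counter(bal_y).items()))  # [(0, 8), (1, 8)]
```

Random oversampling only repeats existing samples, so it can encourage overfitting to them; it is normally applied to the training split only.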

2022, Vol 10 (1), pp. 0-0

Heterogeneous cross-project defect prediction (HCPDP) attempts to forecast defects in a software application that has insufficient historical defect data. From the perspective of the Class Imbalance Problem (CIP), however, one needs a clear view of the data distribution in the training dataset; otherwise, the trained model will produce biased classification results. Class Imbalance Learning (CIL) aims to achieve a balanced ratio between the two classes in an imbalanced dataset. A range of effective solutions exists for managing CIP, such as resampling techniques like Over-Sampling (OS) and Under-Sampling (US). The proposed research work employs the Synthetic Minority Oversampling TEchnique (SMOTE) and the Random Under Sampling (RUS) technique to handle CIP. In addition, the paper proposes a novel four-phase HCPDP model and compares the performance of the basic HCPDP model with CIP present against its performance after CIP is handled using SMOTE and RUS, across three prediction pairs. Results show that SMOTE substantially improves training performance, whereas RUS yields varying results for HCPDP across all three prediction pairs.
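SMOTE and RUS balance the classes from opposite directions. A minimal standard-library sketch of both, assuming numeric feature vectors (Python 3.8+); `smote` and `random_undersample` are hypothetical helper names, and library implementations such as those in imbalanced-learn add details omitted here:

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling: each synthetic point lies on the segment
    between a minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # needs at least 2 minority samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # random position in [0, 1) along x -> nn
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic

def random_undersample(majority, n_keep, seed=0):
    """RUS: keep a random subset of n_keep majority samples (no replacement)."""
    return random.Random(seed).sample(majority, n_keep)

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
majority = [[float(i), 5.0] for i in range(12)]
oversampled = minority + smote(minority, len(majority) - len(minority), k=2)
undersampled = random_undersample(majority, len(minority))
print(len(oversampled), len(undersampled))  # 12 4
```

RUS discards majority samples and therefore loses information, whereas SMOTE keeps all of them and adds synthetic minority points instead.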

Energies, 2021, Vol 15 (1), pp. 212
Ajit Kumar, Neetesh Saxena, Souhwan Jung, Bong Jun Choi

Critical infrastructures have recently been integrated with digital controls to support intelligent decision making. Although this integration provides various benefits and improvements, it also exposes these systems to new cyberattacks. In particular, injecting false data and commands into communications is one of the most common and damaging cyberattacks on critical infrastructures. Hence, in this paper, we investigate the effectiveness of machine-learning algorithms in detecting False Data Injection Attacks (FDIAs), focusing on two of the most widely used critical infrastructures: power systems and water treatment plants. This study tackles two key technical issues: (1) finding the best set of features under different combinations of techniques and (2) resolving the class imbalance problem using oversampling methods. We evaluate the performance of each algorithm in terms of time complexity and detection accuracy to meet the time-critical requirements of critical infrastructures. Moreover, we address the inherently skewed distribution and data imbalance commonly found in critical infrastructure datasets. Our results show that the considered minority oversampling techniques can improve the Area Under the Curve (AUC) of GradientBoosting, AdaBoost, and kNN by 10–12%.
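AUC is a common choice for imbalanced data because plain accuracy is misleading when one class dominates. A minimal from-scratch sketch of the metrics reported in this listing (AUC here; geometric mean and MCC in the HCBST abstract); the function names are illustrative, and AUC is computed from its pairwise (Mann-Whitney) definition, which gives the same value as the area under the ROC curve with ties handled by linear interpolation:

```python
import math

def roc_auc(labels, scores):
    """AUC as the probability that a random positive scores higher than a
    random negative (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos) * len(neg))

def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)

def mcc(tp, fn, tn, fp):
    """Matthews correlation coefficient; returns 0.0 if any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
# A "classifier" that labels all 100 samples negative (95 neg, 5 pos):
# accuracy is 0.95, yet AUC, G-mean and MCC all show it has learned nothing.
print(roc_auc([0] * 95 + [1] * 5, [0.0] * 100))      # 0.5
print(g_mean(tp=0, fn=5, tn=95, fp=0), mcc(tp=0, fn=5, tn=95, fp=0))  # 0.0 0.0
```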

2021, Vol 12 (4), pp. 267
Naoui Mohamed, Flah Aymen, Mohammed Alqarni

The effectiveness of inductive power transfer (IPT) is a serious challenge for improving overall recharging system performance. An electric vehicle (EV) needs to be charged rapidly and to receive maximum power when it is charged wirelessly. Previous research indicates that the performance of this recharging system depends on several parameters, one of which is the resonance frequency. In this paper, we explore the relationship between the obtained power and the input signal frequency when charging a lithium battery, solve the class imbalance problem, and determine the maximum allowed frequency. A mathematical model was first developed to describe this relationship; the dynamic model was then validated and tested on the MATLAB Simulink platform. The performance of the overall wireless recharging system as a function of frequency is summarized in a graph.
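To illustrate why the input frequency matters, here is a sketch that assumes the IPT link behaves like a single series RLC circuit driven by a sinusoidal source. This is a strong simplification, not the paper's mathematical or Simulink model, and the component values are arbitrary. Power delivered to the resistive load peaks at the resonance frequency f0 = 1/(2π√(LC)), where the inductive and capacitive reactances cancel:

```python
import math

def resonant_frequency(L, C):
    """Resonance frequency f0 = 1 / (2*pi*sqrt(L*C)) in hertz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(L * C))

def load_power(f, V, R, L, C):
    """Average power (W) dissipated in R for a series RLC circuit driven by a
    sinusoid of RMS voltage V (volts) at frequency f (hertz)."""
    w = 2.0 * math.pi * f
    reactance = w * L - 1.0 / (w * C)  # net reactance in ohms
    return V**2 * R / (R**2 + reactance**2)

# Arbitrary illustrative values: 100 uH, 100 nF, 5 ohm load, 10 V RMS.
L, C, R, V = 100e-6, 100e-9, 5.0, 10.0
f0 = resonant_frequency(L, C)  # roughly 50.3 kHz for these values
for ratio in (0.8, 0.9, 1.0, 1.1, 1.2):
    f = ratio * f0
    print(f"{f / 1e3:6.1f} kHz -> {load_power(f, V, R, L, C):6.2f} W")
```

At resonance the net reactance vanishes, so the load receives the maximum V²/R (20 W here); detuning in either direction lowers the delivered power.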

Changxu Dong, Yanna Zhao, Gaobo Zhang, Mingrui Xue, Dengyu Chu, et al.

Epilepsy is a chronic brain disease caused by lesions of the central nervous system, which lead to recurrent seizures in patients. Automatic seizure detection from electroencephalogram (EEG) recordings has made great progress. However, existing methods pay little attention to the topological relationships between different EEG electrodes, although recent neuroscience research has demonstrated connectivity between different brain regions. In addition, class imbalance is a common problem in EEG-based seizure detection, because epileptic EEG signals are much shorter in duration than normal signals. To deal with these two challenges, we propose to model multi-channel EEG data using an Attention-based Graph ResNet (AGRN). Each channel of the EEG signal is a node of the graph, and the inter-channel relations are modeled via the graph's adjacency matrix. The loss function of the AGRN model is redesigned using focal loss to cope with the class imbalance problem. The proposed AGRN with focal loss learns discriminative features from raw EEG data. Experiments were carried out on the CHB-MIT dataset. The proposed model achieves an average accuracy of 98.70%, a sensitivity of 97.94%, a specificity of 98.66%, and a precision of 98.62%. The Area Under the ROC Curve (AUC) is 98.69%.
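Focal loss copes with imbalance by scaling cross-entropy with a factor (1 - p_t)^γ, so confidently classified (easy, mostly majority-class) examples contribute little. A minimal per-example sketch of the binary form, assuming the commonly used defaults γ = 2 and α = 0.25; the abstract does not state the values used with AGRN, and the helper names are hypothetical. For an easy positive with p = 0.95 the modulating factor is 0.05² = 0.0025 (a 400x reduction before α), while for a hard positive with p = 0.1 it is 0.9² = 0.81:

```python
import math

def cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy for one example; p = P(seizure), y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(max(p_t, eps))

def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-weighting term
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))

# Easy (confident, correct) vs hard (badly wrong) positive example.
for p in (0.95, 0.1):
    print(p, round(cross_entropy(p, 1), 4), round(focal_loss(p, 1), 4))
```

With γ = 0 and α = 0.5 the focal loss reduces to half the ordinary cross-entropy, which is a convenient sanity check when tuning the two parameters.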

Dilshad Jahin, Israt Jahan Emu, Subrina Akter, Muhammed J.A. Patwary, Mohammad Arif Sobhan Bhuiyan, et al.
