Effect of Feature Selection, SMOTE and under Sampling on Class Imbalance Classification

Author(s):  
Nadeem Qazi ◽  
Kamran Raza

2021 ◽  
Author(s):  
Rekha G ◽  
Krishna Reddy V ◽  
Chandrashekar Jatoth ◽  
Ugo Fiore

Abstract Class imbalance problems have attracted considerable attention from the research community, but few works have focused on feature selection for imbalanced datasets. To handle class imbalance problems, we developed a novel fitness function for feature selection using the chaotic salp swarm optimization algorithm, an efficient meta-heuristic that has been successfully applied to a wide range of optimization problems. This paper proposes an AdaBoost algorithm with chaotic salp swarm optimization: the most discriminative features are selected by salp swarm optimization, and AdaBoost classifiers are then trained on the selected features. Experiments demonstrate the ability of the proposed technique to find optimal feature subsets while maximizing AdaBoost performance.
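To make the wrapper-style setup concrete, the following is a minimal sketch of a fitness function that scores a candidate binary feature mask with an AdaBoost classifier. The weighting term alpha, the 0.5 mask threshold, and the omitted optimizer loop (the paper uses chaotic salp swarm optimization) are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y, alpha=0.99):
    """Score a binary feature mask (lower is better).

    Combines AdaBoost classification error with the fraction of selected
    features, a common wrapper fitness of the form
    alpha * error + (1 - alpha) * |selected| / |total|.
    """
    selected = np.flatnonzero(mask)
    if selected.size == 0:              # an empty subset is invalid
        return 1.0
    clf = AdaBoostClassifier(n_estimators=50, random_state=0)
    error = 1.0 - cross_val_score(clf, X[:, selected], y, cv=5).mean()
    return alpha * error + (1 - alpha) * selected.size / X.shape[1]

# A candidate produced by the (omitted) salp swarm optimizer would be a
# real-valued position vector, thresholded into a 0/1 mask before scoring:
#   mask = (position > 0.5).astype(int)
#   score = fitness(mask, X, y)
```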


2021 ◽  
Author(s):  
Yizhang Zou ◽  
Xuegang Hu ◽  
Peipei Li ◽  
Junlong Li

IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 191803-191814
Author(s):  
Shudong Liu ◽  
Ke Zhang

2020 ◽  
Vol 10 (14) ◽  
pp. 4945
Author(s):  
R. G. Gayathri ◽  
Atul Sajjanhar ◽  
Yong Xiang

Cybersecurity attacks can arise from internal and external sources. Attacks perpetrated by internal sources are also referred to as insider threats; they are a cause of serious concern to organizations because of the significant damage that malicious insiders can inflict. In this paper, we propose an approach to insider threat classification that is motivated by the effectiveness of pre-trained deep convolutional neural networks (DCNNs) for image classification. In the proposed approach, we extract features from the usage patterns of insiders and represent these features as images; that is, images encode the resource access patterns of employees within an organization. After the images are constructed, we use pre-trained DCNNs for anomaly detection, with the aim of identifying malicious insiders. Random under-sampling is used to mitigate the class imbalance problem. The proposed approach is evaluated using the MobileNetV2, VGG19, and ResNet50 pre-trained models and a benchmark dataset. Experimental results show that the proposed method is effective and outperforms other state-of-the-art methods.
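A rough sketch of this pipeline (usage features rendered as images, a frozen pre-trained backbone, and random under-sampling of the majority class) is shown below. The 64x64 image size, the choice of MobileNetV2 as a frozen feature extractor with a small classification head, and the use of imbalanced-learn's RandomUnderSampler are assumptions for illustration; the paper does not publish this code.

```python
import tensorflow as tf
from imblearn.under_sampling import RandomUnderSampler

def build_model(input_shape=(64, 64, 3)):
    # Pre-trained MobileNetV2 used purely as a frozen feature extractor.
    base = tf.keras.applications.MobileNetV2(
        include_top=False, weights="imagenet", input_shape=input_shape)
    base.trainable = False
    return tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # malicious vs. benign
    ])

def train(X_img, y):
    """X_img: per-user activity features rendered as images (already scaled
    to the [-1, 1] range MobileNetV2 expects); y: 0 = benign, 1 = malicious."""
    n, h, w, c = X_img.shape

    # Random under-sampling works on flat vectors; reshape back afterwards.
    rus = RandomUnderSampler(random_state=0)
    X_flat, y_bal = rus.fit_resample(X_img.reshape(n, -1), y)
    X_bal = X_flat.reshape(-1, h, w, c)

    model = build_model((h, w, c))
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.fit(X_bal, y_bal, epochs=5, batch_size=32)
    return model
```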


Author(s):  
Shaojian Qiu ◽  
Lu Lu ◽  
Siyu Jiang ◽  
Yang Guo

Machine-learning-based software defect prediction (SDP) methods are receiving great attention from researchers in intelligent software engineering. Most existing SDP methods operate in a within-project setting. However, a new SDP task often has little or no within-project training data from which to learn a supervised prediction model. Cross-project defect prediction (CPDP), which uses labeled data from source projects to learn a defect predictor for a target project, was therefore proposed as a practical SDP solution. In real CPDP tasks, the class imbalance problem is ubiquitous and strongly affects the performance of CPDP models. Unlike previous studies that focus on subsampling and individual methods, this study investigated 15 imbalanced learning methods for CPDP tasks, with particular attention to the effectiveness of imbalanced ensemble learning (IEL) methods. We evaluated the 15 methods in extensive experiments on 31 open-source projects drawn from five datasets. By analyzing a total of 37,504 results, we found that in most cases the IEL methods that combine under-sampling with bagging are more effective than the other investigated methods.
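As a concrete point of reference for the "under-sampling plus bagging" family the abstract highlights, the snippet below uses imbalanced-learn's BalancedBaggingClassifier, which randomly under-samples the majority class inside each bootstrap bag before fitting the default decision-tree base learner. The cross-project train/evaluate split and the evaluation metric are placeholders, not the study's experimental protocol.

```python
from imblearn.ensemble import BalancedBaggingClassifier
from sklearn.metrics import balanced_accuracy_score

def cross_project_under_bagging(X_source, y_source, X_target, y_target):
    """Train on labeled source-project data, evaluate on the target project.

    Each of the 10 bags is balanced by under-sampling the majority
    (non-defective) class, i.e. under-sampling combined with bagging.
    """
    clf = BalancedBaggingClassifier(n_estimators=10, random_state=0)
    clf.fit(X_source, y_source)
    y_pred = clf.predict(X_target)
    return balanced_accuracy_score(y_target, y_pred)
```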

