Effect of Feature Selection, SMOTE and under Sampling on Class Imbalance Classification

Author(s):  
Nadeem Qazi ◽  
Kamran Raza

2021 ◽  
Author(s):  
Rekha G ◽  
Krishna Reddy V ◽  
Chandrashekar Jatoth ◽  
Ugo Fiore

Abstract Class imbalance problems have attracted considerable attention from the research community, but few works have focused on feature selection for imbalanced datasets. To handle class imbalance problems, we developed a novel fitness function for feature selection using the chaotic salp swarm optimization algorithm, an efficient meta-heuristic that has been successfully applied to a wide range of optimization problems. This paper proposes an AdaBoost algorithm with chaotic salp swarm optimization: the most discriminative features are selected by salp swarm optimization, and AdaBoost classifiers are then trained on the selected features. Experiments demonstrate the ability of the proposed technique to find optimal feature subsets while maximizing AdaBoost performance.
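To make the wrapper-style setup concrete, the following is a minimal sketch of a fitness function that scores a candidate binary feature mask with an AdaBoost classifier. The weighting term alpha, the 0.5 mask threshold, and the omitted optimizer loop (the paper uses chaotic salp swarm optimization) are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y, alpha=0.99):
    """Score a binary feature mask (lower is better).

    Combines AdaBoost classification error with the fraction of selected
    features, a common wrapper fitness of the form
    alpha * error + (1 - alpha) * |selected| / |total|.
    """
    selected = np.flatnonzero(mask)
    if selected.size == 0:              # an empty subset is invalid
        return 1.0
    clf = AdaBoostClassifier(n_estimators=50, random_state=0)
    error = 1.0 - cross_val_score(clf, X[:, selected], y, cv=5).mean()
    return alpha * error + (1 - alpha) * selected.size / X.shape[1]

# A candidate produced by the (omitted) salp swarm optimizer would be a
# real-valued position vector, thresholded into a 0/1 mask before scoring:
#   mask = (position > 0.5).astype(int)
#   score = fitness(mask, X, y)
```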


2021 ◽  
Author(s):  
Yizhang Zou ◽  
Xuegang Hu ◽  
Peipei Li ◽  
Junlong Li

IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 191803-191814
Author(s):  
Shudong Liu ◽  
Ke Zhang

2020 ◽  
Vol 10 (14) ◽  
pp. 4945
Author(s):  
R. G. Gayathri ◽  
Atul Sajjanhar ◽  
Yong Xiang

Cybersecurity attacks can arise from internal and external sources. Attacks perpetrated by internal sources are also referred to as insider threats; they are a cause of serious concern to organizations because of the significant damage that malicious insiders can inflict. In this paper, we propose an approach to insider threat classification that is motivated by the effectiveness of pre-trained deep convolutional neural networks (DCNNs) for image classification. In the proposed approach, we extract features from the usage patterns of insiders and represent these features as images; that is, images encode the resource access patterns of employees within an organization. After the images are constructed, we use pre-trained DCNNs for anomaly detection, with the aim of identifying malicious insiders. Random under-sampling is used to mitigate the class imbalance problem. The proposed approach is evaluated using the MobileNetV2, VGG19, and ResNet50 pre-trained models and a benchmark dataset. Experimental results show that the proposed method is effective and outperforms other state-of-the-art methods.
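A rough sketch of this pipeline (usage features rendered as images, a frozen pre-trained backbone, and random under-sampling of the majority class) is shown below. The 64x64 image size, the choice of MobileNetV2 as a frozen feature extractor with a small classification head, and the use of imbalanced-learn's RandomUnderSampler are assumptions for illustration; the paper does not publish this code.

```python
import tensorflow as tf
from imblearn.under_sampling import RandomUnderSampler

def build_model(input_shape=(64, 64, 3)):
    # Pre-trained MobileNetV2 used purely as a frozen feature extractor.
    base = tf.keras.applications.MobileNetV2(
        include_top=False, weights="imagenet", input_shape=input_shape)
    base.trainable = False
    return tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # malicious vs. benign
    ])

def train(X_img, y):
    """X_img: per-user activity features rendered as images (already scaled
    to the [-1, 1] range MobileNetV2 expects); y: 0 = benign, 1 = malicious."""
    n, h, w, c = X_img.shape

    # Random under-sampling works on flat vectors; reshape back afterwards.
    rus = RandomUnderSampler(random_state=0)
    X_flat, y_bal = rus.fit_resample(X_img.reshape(n, -1), y)
    X_bal = X_flat.reshape(-1, h, w, c)

    model = build_model((h, w, c))
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.fit(X_bal, y_bal, epochs=5, batch_size=32)
    return model
```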


Author(s):  
Shaojian Qiu ◽  
Lu Lu ◽  
Siyu Jiang ◽  
Yang Guo

Machine-learning-based software defect prediction (SDP) methods are receiving great attention from researchers in intelligent software engineering. Most existing SDP methods operate in a within-project setting. However, a new SDP task often has little or no within-project training data from which to learn a supervised prediction model. Cross-project defect prediction (CPDP), which uses labeled data from source projects to learn a defect predictor for a target project, was therefore proposed as a practical SDP solution. In real CPDP tasks, the class imbalance problem is ubiquitous and strongly affects the performance of CPDP models. Unlike previous studies that focus on subsampling and individual methods, this study investigated 15 imbalanced learning methods for CPDP tasks, with particular attention to the effectiveness of imbalanced ensemble learning (IEL) methods. We evaluated the 15 methods in extensive experiments on 31 open-source projects drawn from five datasets. By analyzing a total of 37,504 results, we found that in most cases the IEL methods that combine under-sampling with bagging are more effective than the other investigated methods.
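As a concrete point of reference for the "under-sampling plus bagging" family the abstract highlights, the snippet below uses imbalanced-learn's BalancedBaggingClassifier, which randomly under-samples the majority class inside each bootstrap bag before fitting the default decision-tree base learner. The cross-project train/evaluate split and the evaluation metric are placeholders, not the study's experimental protocol.

```python
from imblearn.ensemble import BalancedBaggingClassifier
from sklearn.metrics import balanced_accuracy_score

def cross_project_under_bagging(X_source, y_source, X_target, y_target):
    """Train on labeled source-project data, evaluate on the target project.

    Each of the 10 bags is balanced by under-sampling the majority
    (non-defective) class, i.e. under-sampling combined with bagging.
    """
    clf = BalancedBaggingClassifier(n_estimators=10, random_state=0)
    clf.fit(X_source, y_source)
    y_pred = clf.predict(X_target)
    return balanced_accuracy_score(y_target, y_pred)
```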

