Community-Based Feature Selection for Credit Card Default Prediction

Author(s):  
Qiucheng Wang ◽  
Yanmei Hu ◽  
Jun Li
IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 201173-201198 ◽  
Author(s):  
Talha Mahboob Alam ◽  
Kamran Shaukat ◽  
Ibrahim A. Hameed ◽  
Suhuai Luo ◽  
Muhammad Umer Sarwar ◽  
...  

2021 ◽  
Vol 5 (2) ◽  
pp. 20-25
Author(s):  
Azhi Abdalmohammed Faraj ◽  
Didam Ahmed Mahmud ◽  
Bilal Najmaddin Rashid

Credit card defaults pose a business-critical threat to banking systems, so prompt detection of defaulters is a crucial and challenging research problem. Machine learning algorithms must deal with a heavily skewed dataset, since the ratio of defaulters to non-defaulters is very small. The purpose of this research is to apply different ensemble methods and compare their performance in predicting the probability of default on customers' credit card payments in Taiwan, using a dataset from the UCI Machine Learning Repository. This is done on both the original skewed dataset and a balanced dataset. Although several studies have shown the superiority of neural networks over traditional machine learning algorithms, the results of our study show that ensemble methods consistently outperform neural networks and other machine learning algorithms in terms of F1 score and area under the receiver operating characteristic curve, regardless of whether the dataset is balanced or the imbalance is ignored.
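
The abstract above compares models by F1 score and area under the ROC curve rather than accuracy. The sketch below is not taken from the paper; it is a minimal, standard-library illustration of why those two metrics are more informative on a skewed dataset. The toy labels, scores, and function names are assumptions made for the example.

```python
def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and/or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 8 non-defaulters (0) and 2 defaulters (1).
y_true = [0] * 8 + [1] * 2

# A model that never predicts default looks good on accuracy only.
always_negative = [0] * 10
print(accuracy(y_true, always_negative))  # 0.8
print(f1_score(y_true, always_negative))  # 0.0

# Hypothetical predicted default probabilities from some classifier.
scores = [0.10, 0.20, 0.15, 0.30, 0.05, 0.40, 0.25, 0.35, 0.60, 0.32]
thresholded = [1 if s >= 0.5 else 0 for s in scores]
print(f1_score(y_true, thresholded), roc_auc(y_true, scores))
```

Accuracy rewards the trivial all-negative model, while F1 exposes that it finds no defaulters. AUC is computed from the ranking of scores, so it does not depend on the chosen decision threshold.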


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Ying Chen ◽  
Ruirui Zhang

To address the problem that the credit card default data of financial institutions are imbalanced, which leads to unsatisfactory prediction results, this paper proposes a prediction model based on k-means SMOTE and a BP neural network. In this model, the k-means SMOTE algorithm is used to change the data distribution; the importance of the data features is then calculated using random forest and substituted into the initial weights of the BP neural network for prediction. The model effectively solves the problem of sample data imbalance. The paper also constructs five common machine learning models (KNN, logistic regression, SVM, random forest, and decision tree) and compares the classification performance of these six prediction models. The experimental results show that the proposed algorithm can greatly improve the prediction performance of the model, raising its AUC value from 0.765 to 0.929. Moreover, when the importance of features is taken as the initial weight of the BP neural network, the accuracy of model prediction is also slightly improved. In addition, compared with the other five prediction models, the overall prediction performance of the BP neural network is better.
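
The oversampling step mentioned above can be illustrated with a small sketch. It is not the paper's implementation: it shows only the plain SMOTE interpolation idea (a synthetic minority sample placed on the segment between a minority point and one of its nearest minority neighbours) and omits the k-means clustering that k-means SMOTE performs beforehand, as well as the random forest and BP network stages. The function name, parameters, and toy points are assumptions.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority point and one of its k nearest minority neighbours.

    Simplified sketch of the SMOTE interpolation step only; the clustering
    stage of k-means SMOTE is deliberately left out.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        mate = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, mate)))
    return synthetic


# Hypothetical two-feature minority samples (e.g. scaled bill and payment).
minority_points = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.05), (0.7, 0.3)]
new_points = smote_like_oversample(minority_points, n_new=6)
print(len(new_points))
```

Because each synthetic point lies between two existing minority points, the new samples stay inside the region the minority class already occupies, rather than being exact duplicates as in random oversampling.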


Author(s):  
ABU H. M. KAMAL ◽  
XINGQUAN ZHU ◽  
ABHIJIT PANDYA ◽  
SAM HSU ◽  
RAMASWAMY NARAYANAN

Feature selection for supervised learning concerns the problem of selecting a number of important features (w.r.t. the class labels) for the purpose of training accurate prediction models. Traditional feature selection methods, however, fail to take the sample distributions into consideration, which may lead to poor prediction for minority class examples. Due to the sophistication and cost of the data collection process, many applications, such as biomedical research, commonly face biased data collections in which one class of examples (e.g., diseased samples) is significantly smaller than the other classes (e.g., normal samples). For these applications, the minority class examples, such as diseased samples, credit card frauds, and network intrusions, are only a small portion of the data but deserve full attention for accurate prediction. In this paper, we propose three filtering techniques, Higher Weight (HW), Differential Minority Repeat (DMR) and Balanced Minority Repeat (BMR), to identify important features from datasets with biased sample distributions. Experimental comparisons with the ReliefF method on five datasets demonstrate the effectiveness of the proposed methods in selecting informative features for accurate prediction of minority class examples.

