Community-Based Feature Selection for Credit Card Default Prediction

Credit card defaults pose a business-critical threat to banking systems, so prompt detection of defaulters is a crucial and challenging research problem. Machine learning algorithms must deal with a heavily skewed dataset because the ratio of defaulters to non-defaulters is very small. The purpose of this research is to apply different ensemble methods and compare their performance in predicting the probability of default on customers' credit card payments in Taiwan, using the dataset from the UCI Machine Learning Repository. This is done first on the original skewed dataset and then on a balanced dataset. Although several studies have shown the superiority of neural networks over traditional machine learning algorithms, the results of our study show that ensemble methods consistently outperform neural networks and other machine learning algorithms in terms of F1 score and area under the receiver operating characteristic curve, regardless of whether the dataset is balanced or the imbalance is ignored.
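The two metrics named in the abstract can be illustrated with a short sketch. The code below is not taken from the paper; it is a minimal pure-Python version of the standard F1 and ROC AUC definitions, and the function names and toy labels are hypothetical, chosen only to show why plain accuracy can mislead on a skewed dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    which is fine for a toy example.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy labels: 2 defaulters (1) among 10 customers.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_non_default = [0] * len(y_true)

print(accuracy(y_true, always_non_default))
print(f1_score(y_true, always_non_default))
```

On these toy labels, a model that never predicts a default reaches an accuracy of 0.8 while its F1 score is 0, which is why studies on imbalanced default data tend to report F1 and AUC rather than accuracy alone.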

Credit Card Default Prediction as a Classification Problem

Lecture Notes in Computer Science - Recent Trends and Future Technology in Applied Intelligence, 2018, pp. 88-100. DOI: 10.1007/978-3-319-92058-0_9

Author(s): Makram Soui, Salima Smiti, Salma Bribech, Ines Gasmi

Keyword(s): Credit Card, Classification Problem, Default Prediction, Credit Card Default

Research on Credit Card Default Prediction Based on k-Means SMOTE and BP Neural Network

Complexity, 2021, Vol 2021, pp. 1-13. DOI: 10.1155/2021/6618841

Author(s): Ying Chen, Ruirui Zhang

Keyword(s): Neural Network, Random Forest, BP Neural Network, Credit Card, Prediction Models, Financial Institution, Classification Performance, Default Prediction, Credit Card Default, Default Data

To address the problem that the credit card default data of a financial institution is unbalanced, which leads to unsatisfactory prediction results, this paper proposes a prediction model based on k-means SMOTE and a BP neural network. In this model, the k-means SMOTE algorithm is used to change the data distribution, the importance of each data feature is then calculated with a random forest, and these importance values are used as the initial weights of the BP neural network for prediction. The model effectively solves the problem of sample data imbalance. The paper also constructs five common machine learning models (KNN, logistic regression, SVM, random forest, and decision tree) and compares the classification performance of all six prediction models. The experimental results show that the proposed algorithm greatly improves the prediction performance of the model, raising its AUC value from 0.765 to 0.929. Moreover, when feature importance is used as the initial weight of the BP neural network, the prediction accuracy of the model also improves slightly. In addition, the overall prediction performance of the BP neural network is better than that of the other five prediction models.
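The oversampling idea behind this abstract can be sketched briefly. The code below is not the authors' implementation: it shows only the SMOTE interpolation step that k-means SMOTE applies to the minority samples inside a chosen cluster, and it assumes the clustering and cluster selection have already been done. The function name `smote_within_cluster`, its parameters, and the toy points are all hypothetical.

```python
import random


def smote_within_cluster(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    `minority` is a list of feature vectors (lists of floats) assumed to be
    the minority-class samples of ONE cluster; the k-means clustering and
    cluster filtering that precede this step in k-means SMOTE are not shown.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base sample (excluding itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Pick a random point on the segment between base and neighbour.
        lam = rng.random()
        synthetic.append([b + lam * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy cluster of three minority samples with two features each.
cluster = [[0.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
new_points = smote_within_cluster(cluster, n_new=5)
print(len(new_points))
```

Because every synthetic sample lies on a segment between two existing minority samples, the new points stay inside the region the cluster already occupies. Generating samples only within such clusters, rather than across the whole minority class, is what distinguishes k-means SMOTE from plain SMOTE.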

FEATURE SELECTION FOR DATASETS WITH IMBALANCED CLASS DISTRIBUTIONS

International Journal of Software Engineering and Knowledge Engineering, 2010, Vol 20 (02), pp. 113-137. DOI: 10.1142/s0218194010004645. Cited by ~6

Author(s): Abu H. M. Kamal, Xingquan Zhu, Abhijit Pandya, Sam Hsu, Ramaswamy Narayanan

Keyword(s): Feature Selection, Credit Card, Prediction Models, Accurate Prediction, Minority Class, Sample Distribution, Data Collection Process, Selection For, Class Labels, Data Collections

Feature selection for supervised learning concerns the problem of selecting a number of important features (w.r.t. the class labels) for the purpose of training accurate prediction models. Traditional feature selection methods, however, fail to take the sample distribution into consideration, which may lead to poor prediction for minority class examples. Due to the sophistication and cost of the data collection process, many applications, such as biomedical research, commonly face biased data collections in which one class of examples (e.g., diseased samples) is significantly smaller than the other classes (e.g., normal samples). For these applications, the minority class examples, such as diseased samples, credit card frauds, and network intrusions, are only a small portion of the data but deserve full attention for accurate prediction. In this paper, we propose three filtering techniques, Higher Weight (HW), Differential Minority Repeat (DMR), and Balanced Minority Repeat (BMR), to identify important features in datasets with a biased sample distribution. Experimental comparisons with the ReliefF method on five datasets demonstrate the effectiveness of the proposed methods in selecting informative features for accurate prediction of minority class examples.
