Improved credit scoring model using XGBoost with Bayesian hyper-parameter optimization

Wirot Yotsawat; Pakaket Wattuya; Anongnart Srivihok

doi:10.11591/ijece.v11i6.pp5477-5487


International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v11i6.pp5477-5487 ◽

2021 ◽

Vol 11 (6) ◽

pp. 5477

Author(s):

Wirot Yotsawat ◽

Pakaket Wattuya ◽

Anongnart Srivihok

Keyword(s):

Parameter Optimization ◽

Missing Values ◽

Credit Scoring ◽

Gradient Boosting ◽

Support Vector ◽

Scoring Model ◽

Ensemble Models ◽

Proposed Model ◽

Extreme Gradient Boosting ◽

Credit Scoring Model

Several credit-scoring models have been developed using ensemble classifiers in order to improve the accuracy of assessment. However, among the ensemble models, little attention has been paid to tuning the hyper-parameters of the base learners, although these are crucial to constructing ensemble models. This study proposes an improved credit scoring model based on the extreme gradient boosting (XGB) classifier using Bayesian hyper-parameter optimization (XGB-BO). The model comprises two steps. First, data pre-processing is used to handle missing values and scale the data. Second, Bayesian hyper-parameter optimization is applied to tune the hyper-parameters of the XGB classifier, and the tuned settings are used to train the model. The model is evaluated on four widely used public datasets, i.e., the German, Australian, Lending Club, and Polish datasets. Several state-of-the-art classification algorithms are implemented for predictive comparison with the proposed method. The proposed model showed promising results, with an improvement in accuracy of 4.10%, 3.03%, and 2.76% on the German, Lending Club, and Australian datasets, respectively. According to the evaluation results, the proposed model outperformed commonly used techniques, e.g., decision tree, support vector machine, neural network, logistic regression, random forest, and bagging. The experimental results confirmed that the XGB-BO model is suitable for assessing the creditworthiness of applicants.
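
The first step of the abstract above (handling missing values and scaling the data) can be illustrated with a minimal sketch. The abstract does not say which imputation or scaling method the authors use, so mean imputation and min-max scaling are assumptions made here purely for illustration; the function name `impute_and_scale` and the toy applicant rows are hypothetical.

```python
from statistics import mean


def impute_and_scale(rows):
    """Mean-impute missing values (None), then min-max scale each column to [0, 1].

    Both choices are illustrative assumptions, not the paper's stated method.
    """
    n_cols = len(rows[0])
    columns = []
    for j in range(n_cols):
        # Column mean over the observed (non-missing) entries only.
        observed = [r[j] for r in rows if r[j] is not None]
        fill = mean(observed)
        col = [fill if r[j] is None else r[j] for r in rows]
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column is mapped to 0.0 to avoid division by zero.
        columns.append([(v - lo) / span if span else 0.0 for v in col])
    # Transpose back to one list per applicant.
    return [list(row) for row in zip(*columns)]


# Hypothetical applicants: [age, loan amount], with two missing entries.
applicants = [
    [25, 1000.0],
    [None, 3000.0],
    [45, None],
]
scaled = impute_and_scale(applicants)
print(scaled)
```

The scaled rows would then be passed to whichever classifier is being tuned; the tuning step itself is not sketched here.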


HyP-ABC: A Novel Automated Hyper-Parameter Tuning Algorithm Using Evolutionary Optimization

10.36227/techrxiv.14714508.v2 ◽

2021 ◽

Author(s):

Leila Zahedi ◽

Farid Ghareh Mohammadi ◽

M. Hadi Amini

Keyword(s):

Parameter Optimization ◽

Real World ◽

Optimization Problems ◽

State Of The Art ◽

Parameter Tuning ◽

Gradient Boosting ◽

Support Vector ◽

Wide Range ◽

Extreme Gradient Boosting ◽

Art Techniques

Machine learning (ML) techniques are promising decision-making and analytic tools in a wide range of applications. Different ML algorithms have various hyper-parameters. In order to tailor an ML model towards a specific application, a large number of hyper-parameters should be tuned. Tuning the hyper-parameters directly affects the performance (accuracy and run-time). However, for large-scale search spaces, efficiently exploring the large number of hyper-parameter combinations is computationally challenging. Existing automated hyper-parameter tuning techniques suffer from high time complexity. In this paper, we propose HyP-ABC, an innovative automatic hybrid hyper-parameter optimization algorithm using the modified artificial bee colony approach, to measure the classification accuracy of three ML algorithms, namely random forest, extreme gradient boosting, and support vector machine. Compared to the state-of-the-art techniques, HyP-ABC is more efficient and has a limited number of parameters to be tuned, making it worthwhile for real-world hyper-parameter optimization problems. We further compare our proposed HyP-ABC algorithm with state-of-the-art techniques. In order to ensure the robustness of the proposed method, the algorithm takes a wide range of feasible hyper-parameter values, and is tested using a real-world educational dataset.


A credit scoring model using support vector machine

Fifth World Congress on Intelligent Control and Automation (IEEE Cat. No.04EX788) ◽

10.1109/wcica.2004.1341919 ◽

2004 ◽

Author(s):

Xiang Tian ◽

Feiqi Deng

Keyword(s):

Support Vector Machine ◽

Credit Scoring ◽

Support Vector ◽

Scoring Model ◽

Credit Scoring Model


A Hybrid Credit Scoring Model Based on Genetic Programming and Support Vector Machines

2008 Fourth International Conference on Natural Computation ◽

10.1109/icnc.2008.205 ◽

2008 ◽

Cited By ~ 13

Author(s):

Defu Zhang ◽

Mhand Hifi ◽

Qingshan Chen ◽

Weiguo Ye

Keyword(s):

Support Vector Machines ◽

Genetic Programming ◽

Credit Scoring ◽

Support Vector ◽

Scoring Model ◽

Model Based ◽

Vector Machines ◽

Credit Scoring Model


Multi-Layer Hybrid Credit Scoring Model Based on Feature Selection, Ensemble Learning, and Ensemble Classifier

Handbook of Research on Emerging Trends and Applications of Machine Learning - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-5225-9643-1.ch021 ◽

2020 ◽

pp. 444-460

Author(s):

Diwakar Tripathi ◽

Alok Kumar Shukla ◽

Ramchandra Reddy B. ◽

Ghanshyam S. Bopche

Keyword(s):

Ensemble Learning ◽

Missing Values ◽

Credit Scoring ◽

Predictive Performance ◽

Ensemble Classifier ◽

Learning Approaches ◽

Ensemble Classifiers ◽

Scoring Model ◽

Second Stage ◽

Credit Scoring Model

Credit scoring is a process to calculate the risk associated with a credit product, and it directly affects the profitability of that industry. Periodically, financial institutions apply credit scoring in various steps. The main focus of this study is to improve the predictive performance of the credit scoring model. To that end, this study proposes a multi-layer hybrid credit scoring model. The first stage concerns pre-processing, which includes treatment for missing values, data transformation, and reduction of irrelevant and noisy features, because these may affect the predictive performance of the model. The second stage applies various ensemble learning approaches such as Bagging, AdaBoost, etc. At the last layer, it applies an ensemble classifier approach, which combines three heterogeneous classifiers for classification, namely random forest (RF), logistic regression (LR), and sequential minimal optimization (SMO). Further, the proposed multi-layer model is validated on various real-world credit scoring datasets.


Credit Scoring Model based on Kernel Density Estimation and Support Vector Machine for Group Feature Selection

2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI) ◽

10.1109/icacci.2018.8554524 ◽

2018 ◽

Author(s):

Xingzhi Zhang ◽

Zhurong Zhou

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Density Estimation ◽

Kernel Density Estimation ◽

Credit Scoring ◽

Kernel Density ◽

Support Vector ◽

Scoring Model ◽

Model Based ◽

Credit Scoring Model


Multi-Class Support Vector Machine for Credit Scoring

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.235.419 ◽

2012 ◽

Vol 235 ◽

pp. 419-422 ◽

Cited By ~ 1

Author(s):

Bo Tang ◽

Sai Bing Qiu

Keyword(s):

Support Vector Machine ◽

Credit Scoring ◽

Real Life ◽

Assessment Model ◽

Behavior Assessment ◽

Support Vector ◽

Classification Problems ◽

Scoring Model ◽

Multiple Classification ◽

Credit Scoring Model

The general credit scoring model addresses two-class (binary) classification problems, but in real life we often encounter multi-class classification problems. This paper proposes a multi-class support vector machine, which can solve multi-class classification problems in the behavior assessment model.


A COST-SENSITIVE LOGISTIC REGRESSION CREDIT SCORING MODEL BASED ON MULTI-OBJECTIVE OPTIMIZATION APPROACH

Technological and Economic Development of Economy ◽

10.3846/tede.2019.11337 ◽

2019 ◽

Vol 26 (2) ◽

pp. 405-429 ◽

Cited By ~ 2

Author(s):

Feng Shen ◽

Run Wang ◽

Yu Shen

Keyword(s):

Logistic Regression ◽

Credit Scoring ◽

Classification Error ◽

Optimization Approach ◽

Multi Objective Optimization ◽

Scoring Model ◽

Multi Objective ◽

Proposed Model ◽

The Cost ◽

Credit Scoring Model

Credit scoring is an important process for peer-to-peer (P2P) lending companies as it determines whether loan applicants are likely to default. The aim of most credit scoring models is to minimize the classification error rate, which implies that all classification errors bear the same cost; in reality, however, credit scoring is a strongly cost-sensitive problem. Therefore, in this paper, a new cost-sensitive logistic regression credit scoring model based on a multi-objective optimization approach is proposed that has two objectives in the cost-sensitive logistic regression process. The cost-sensitive logistic regression parameters are solved using a multiple objective particle swarm optimization (MOPSO) algorithm. In the empirical analysis, the proposed model was applied to the credit scoring of a well-known Chinese P2P company. Compared with other common credit scoring models, the proposed model was able to effectively reduce type II error rates and total classification error costs, and to improve the AUC, the F1 value (the harmonic mean of recall and precision), and the G-mean. The proposed model was compared with other multi-objective optimization algorithms to further demonstrate that MOPSO is the best approach for cost-sensitive logistic regression credit scoring models.
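
The evaluation measures named in the abstract above can be made concrete with a short worked sketch. It computes the F1 value, the G-mean, and a total misclassification cost from a binary confusion matrix. Treating a defaulting applicant as the positive class, the cost weights `cost_fn` and `cost_fp`, and the confusion-matrix counts are all illustrative assumptions, not values from the paper.

```python
from math import sqrt


def credit_metrics(tp, fn, fp, tn, cost_fn=5.0, cost_fp=1.0):
    """Metrics from a binary confusion matrix.

    Assumption: the positive class is "applicant defaults". The cost
    weights are hypothetical and only show how unequal costs enter.
    """
    recall = tp / (tp + fn)        # share of actual defaulters that are caught
    precision = tp / (tp + fp)     # share of flagged applicants that default
    specificity = tn / (tn + fp)   # share of good applicants that are accepted
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    g_mean = sqrt(recall * specificity)                 # geometric mean
    total_cost = cost_fn * fn + cost_fp * fp            # cost-weighted errors
    return {"f1": f1, "g_mean": g_mean, "total_cost": total_cost}


# Hypothetical counts for 300 applicants.
m = credit_metrics(tp=80, fn=20, fp=40, tn=160)
print(m)
```

With unequal weights, two models with the same error rate can have different total costs, which is the motivation for a cost-sensitive objective.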


A XGBoost Model with Weather Similarity Analysis and Feature Engineering for Short-Term Wind Power Forecasting

Applied Sciences ◽

10.3390/app9153019 ◽

2019 ◽

Vol 9 (15) ◽

pp. 3019 ◽

Cited By ~ 5

Author(s):

Huan Zheng ◽

Yanghui Wu

Keyword(s):

Wind Power ◽

Gradient Boosting ◽

Support Vector ◽

Similarity Analysis ◽

Feature Engineering ◽

Short Term ◽

Wind Power Forecasting ◽

Proposed Model ◽

Extreme Gradient Boosting ◽

Power Forecasting

Large-scale wind power access may cause a series of safety and stability problems. Wind power forecasting (WPF) makes it possible to plan dispatch in advance. In this paper, a new extreme gradient boosting (XGBoost) model with weather similarity analysis and feature engineering is proposed for short-term wind power forecasting. Based on the similarity among historical days' weather, the k-means clustering algorithm is used to divide the samples into several categories. We also create some time features and drop unimportant features through feature engineering. For each category, we make predictions using XGBoost. The results of the proposed model are compared with the back propagation neural network (BPNN), classification and regression tree (CART), random forests (RF), support vector regression (SVR), and a single XGBoost model. It is shown that the proposed model produces the highest forecasting accuracy among all these models.


Feature Selection in a Credit Scoring Model

Mathematics ◽

10.3390/math9070746 ◽

2021 ◽

Vol 9 (7) ◽

pp. 746

Author(s):

Juan Laborda ◽

Seyong Ryoo

Keyword(s):

Feature Selection ◽

Credit Scoring ◽

Superior Performance ◽

Filter Method ◽

Support Vector ◽

Classification Algorithms ◽

Scoring Model ◽

Stepwise Selection ◽

Forward Stepwise ◽

Credit Scoring Model

This paper proposes using different classification algorithms (logistic regression, support vector machine, K-nearest neighbors, and random forest) to identify which candidates are likely to default in a credit scoring model. Three different feature selection methods are used to mitigate overfitting and the curse of dimensionality in these classification algorithms: one filter method (Chi-squared test and correlation coefficients) and two wrapper methods (forward stepwise selection and backward stepwise selection). The performances of these three methods are discussed using two measures, the mean absolute error and the number of selected features. The methodology is applied to a valuable database from Taiwan. The results suggest that forward stepwise selection yields superior performance in each of the classification algorithms used. The conclusions obtained are related to those in the literature, and their managerial implications are analyzed.
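
Forward stepwise selection, one of the wrapper methods named in the abstract above, can be sketched as a greedy loop. This is a generic illustration rather than the authors' implementation: `error_fn` stands in for whatever validation error is used (for example the mean absolute error of a fitted classifier), and the feature names and `toy_error` function below are hypothetical.

```python
def forward_stepwise(candidates, error_fn):
    """Greedy forward selection.

    Starting from the empty subset, repeatedly add the single feature that
    lowers error_fn the most, and stop when no addition improves it.
    """
    selected = []
    best_error = error_fn(selected)
    remaining = list(candidates)
    while remaining:
        # Score every one-feature extension of the current subset.
        trials = [(error_fn(selected + [f]), f) for f in remaining]
        trial_error, best_feature = min(trials)
        if trial_error >= best_error:
            break  # no candidate improves the error
        selected.append(best_feature)
        remaining.remove(best_feature)
        best_error = trial_error
    return selected, best_error


# Hypothetical stand-in for a validation error: each feature removes a fixed
# amount of error, and every added feature carries a small penalty.
TOY_GAIN = {"income": 0.20, "age": 0.05, "noise": 0.0}


def toy_error(features):
    return 0.5 - sum(TOY_GAIN[f] for f in features) + 0.01 * len(features)


selected, err = forward_stepwise(["income", "age", "noise"], toy_error)
print(selected, round(err, 2))
```

Backward stepwise selection follows the same pattern in reverse: start from all features and repeatedly drop the one whose removal lowers the error most.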


An Improved Support Vector Machine for Credit Scoring

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.513-517.4407 ◽

2014 ◽

Vol 513-517 ◽

pp. 4407-4410 ◽

Cited By ~ 2

Author(s):

Bo Tang ◽

Sai Bing Qiu

Keyword(s):

Support Vector Machine ◽

Credit Scoring ◽

Real Life ◽

Assessment Model ◽

Support Vector ◽

Classification Problems ◽

Scoring Model ◽

Multiple Classification ◽

Improved Support Vector Machine ◽

Credit Scoring Model

With the development of China's economy, credit scoring has become important. The general credit scoring model addresses two-class (binary) classification problems, but in real life we often encounter multi-class classification problems. This paper proposes a multi-class support vector machine based on a genetic algorithm, which can solve multi-class classification problems in the behavior assessment model.
