Improved credit scoring model using XGBoost with Bayesian hyper-parameter optimization

Author(s):  
Wirot Yotsawat ◽  
Pakaket Wattuya ◽  
Anongnart Srivihok

Several credit-scoring models have been developed using ensemble classifiers in order to improve the accuracy of assessment. However, among ensemble models, little attention has been paid to hyper-parameter tuning of the base learners, although it is crucial to constructing ensemble models. This study proposes an improved credit scoring model based on the extreme gradient boosting (XGB) classifier using Bayesian hyper-parameter optimization (XGB-BO). The model comprises two steps. First, data pre-processing is used to handle missing values and scale the data. Second, Bayesian hyper-parameter optimization is applied to tune the hyper-parameters of the XGB classifier, and the tuned hyper-parameters are then used to train the model. The model is evaluated on four widely used public datasets, i.e., the German, Australian, Lending Club, and Polish datasets. Several state-of-the-art classification algorithms are implemented for predictive comparison with the proposed method. The proposed model showed promising results, with accuracy improvements of 4.10%, 3.03%, and 2.76% on the German, Lending Club, and Australian datasets, respectively. According to the evaluation results, the proposed model outperformed commonly used techniques, e.g., decision tree, support vector machine, neural network, logistic regression, random forest, and bagging. The experimental results confirm that the XGB-BO model is suitable for assessing the creditworthiness of applicants.
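The first step of the pipeline can be illustrated with a minimal sketch. The abstract does not say which imputation or scaling method is used, so mean imputation and min-max scaling are assumptions here, and the function name `impute_and_scale` is hypothetical. The second step (Bayesian optimization of the XGB hyper-parameters) would wrap a cross-validated XGBoost model and is not shown.

```python
from typing import List, Optional


def impute_and_scale(rows: List[List[Optional[float]]]) -> List[List[float]]:
    """Fill missing values (None) with the column mean, then min-max scale each column to [0, 1]."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        # Mean over observed (non-missing) values only; 0.0 if a column is entirely missing.
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    filled = [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]

    lows = [min(r[j] for r in filled) for j in range(n_cols)]
    highs = [max(r[j] for r in filled) for j in range(n_cols)]
    # Constant columns are mapped to 0.0 to avoid dividing by zero.
    return [
        [(r[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
         for j in range(n_cols)]
        for r in filled
    ]


# Two hypothetical numeric applicant features with gaps.
print(impute_and_scale([[1.0, 10.0], [None, 20.0], [3.0, None]]))
# [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
```

Scaling is fitted on the same rows here for brevity; in a real evaluation the means and ranges would be computed on the training split only.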

2021 ◽  
Author(s):  
Leila Zahedi ◽  
Farid Ghareh Mohammadi ◽  
M. Hadi Amini

Machine learning (ML) techniques have emerged as promising decision-making and analytic tools in a wide range of applications. Different ML algorithms have different hyper-parameters, and tailoring an ML model to a specific application requires tuning a large number of them. Hyper-parameter tuning directly affects performance (accuracy and run-time). However, for large search spaces, efficiently exploring the vast number of hyper-parameter combinations is computationally challenging, and existing automated hyper-parameter tuning techniques suffer from high time complexity. In this paper, we propose HyP-ABC, an automatic hybrid hyper-parameter optimization algorithm based on a modified artificial bee colony approach, and use it to tune three ML algorithms, namely random forest, extreme gradient boosting, and support vector machine, measuring their classification accuracy. Compared with state-of-the-art techniques, HyP-ABC is more efficient and has a limited number of parameters to be tuned, making it well suited to real-world hyper-parameter optimization problems. To ensure the robustness of the proposed method, the algorithm is run over a wide range of feasible hyper-parameter values and tested on a real-world educational dataset.
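For readers unfamiliar with the artificial bee colony (ABC) method, the following sketch shows the canonical ABC loop (employed, onlooker, and scout phases) minimizing a toy objective. It is not HyP-ABC: the paper's modifications are not described in the abstract, and `toy_error` is a hypothetical stand-in for a validation error over two continuous hyper-parameters.

```python
import random
from typing import Callable, List, Sequence, Tuple


def abc_minimize(
    f: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_sources: int = 10,
    limit: int = 20,
    iterations: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Canonical artificial bee colony minimization over a box (not HyP-ABC)."""
    rng = random.Random(seed)
    dim = len(bounds)

    def random_source() -> List[float]:
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    sources = [random_source() for _ in range(n_sources)]
    costs = [f(s) for s in sources]
    trials = [0] * n_sources
    best_i = min(range(n_sources), key=costs.__getitem__)
    best_x, best_cost = sources[best_i][:], costs[best_i]

    def consider(x: List[float], c: float) -> None:
        # Keep a copy of the best solution ever seen, even if scouts later discard it.
        nonlocal best_x, best_cost
        if c < best_cost:
            best_x, best_cost = x[:], c

    def try_improve(i: int) -> None:
        # Move one coordinate of source i relative to a random partner k != i.
        k = rng.choice([x for x in range(n_sources) if x != i])
        j = rng.randrange(dim)
        cand = sources[i][:]
        cand[j] += rng.uniform(-1.0, 1.0) * (sources[i][j] - sources[k][j])
        cand[j] = min(max(cand[j], bounds[j][0]), bounds[j][1])
        c = f(cand)
        if c < costs[i]:  # greedy selection
            sources[i], costs[i], trials[i] = cand, c, 0
            consider(cand, c)
        else:
            trials[i] += 1

    for _ in range(iterations):
        for i in range(n_sources):  # employed bees: one step per source
            try_improve(i)
        # Onlooker bees favour low-cost sources (fitness 1 / (1 + cost) for cost >= 0).
        fitness = [1.0 / (1.0 + c) if c >= 0 else 1.0 + abs(c) for c in costs]
        for _ in range(n_sources):
            try_improve(rng.choices(range(n_sources), weights=fitness)[0])
        for i in range(n_sources):  # scout bees abandon exhausted sources
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i] = f(sources[i])
                trials[i] = 0
                consider(sources[i], costs[i])
    return best_x, best_cost


def toy_error(x: List[float]) -> float:
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2


best_x, best_cost = abc_minimize(toy_error, [(-5.0, 5.0), (-5.0, 5.0)])
print(best_x, best_cost)
```

In real hyper-parameter tuning, each call to `f` would train and validate a model, so the number of evaluations (roughly `2 * n_sources * iterations`) dominates the run-time that the abstract identifies as the bottleneck.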


Author(s):  
Diwakar Tripathi ◽  
Alok Kumar Shukla ◽  
Ramchandra Reddy B. ◽  
Ghanshyam S. Bopche

Credit scoring is the process of calculating the risk associated with a credit product, and it directly affects the profitability of the credit industry. Financial institutions apply credit scoring periodically and at various stages. The main focus of this study is to improve the predictive performance of credit scoring models, and to this end it proposes a multi-layer hybrid credit scoring model. The first layer performs pre-processing, which includes treatment of missing values, data transformation, and removal of irrelevant and noisy features, since these can degrade the model's predictive performance. The second layer applies various ensemble learning approaches, such as Bagging and AdaBoost. The last layer applies an ensemble classifier approach that combines three heterogeneous classifiers, namely random forest (RF), logistic regression (LR), and sequential minimal optimization (SMO). In addition, the proposed multi-layer model is validated on various real-world credit scoring datasets.
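The abstract does not state how the last layer combines RF, LR, and SMO. This sketch assumes hard majority voting, and the three rule-based functions are hypothetical stand-ins for trained models, not the paper's classifiers.

```python
from collections import Counter
from typing import Callable, List, Sequence

Classifier = Callable[[Sequence[float]], int]  # returns 1 (bad credit) or 0 (good credit)


def majority_vote(classifiers: List[Classifier], x: Sequence[float]) -> int:
    """Hard voting: return the label predicted by most base classifiers."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for trained RF, LR and SMO models on two scaled features.
def rf_stub(x: Sequence[float]) -> int:
    return int(x[0] > 0.5)


def lr_stub(x: Sequence[float]) -> int:
    return int(0.8 * x[0] + 0.2 * x[1] > 0.6)


def smo_stub(x: Sequence[float]) -> int:
    return int(x[1] > 0.7)


models = [rf_stub, lr_stub, smo_stub]
print(majority_vote(models, [0.9, 0.1]))  # 1: RF and LR outvote SMO
print(majority_vote(models, [0.2, 0.9]))  # 0: RF and LR outvote SMO
```

With three voters and two labels a tie cannot occur, which is one reason an odd number of heterogeneous base classifiers is convenient for hard voting.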


2012 ◽  
Vol 235 ◽  
pp. 419-422 ◽  
Author(s):  
Bo Tang ◽  
Sai Bing Qiu

A general credit scoring model solves a two-class (binary) classification problem, but in practice we often encounter multi-class classification problems. This paper proposes a multi-class support vector machine that can solve multi-class classification problems in the behavior assessment model.
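The abstract does not specify how the binary SVM is extended to several classes. One common construction, assumed here, is one-vs-rest: train one binary SVM per class and assign the class whose decision function is largest. The linear weights below are hypothetical, as if already learned.

```python
from typing import List, Sequence, Tuple

LinearSVM = Tuple[List[float], float]  # (weights w, bias b) of one binary SVM


def decision(model: LinearSVM, x: Sequence[float]) -> float:
    """Signed decision value w . x + b of a linear binary SVM."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict_ovr(models: List[LinearSVM], x: Sequence[float]) -> int:
    """One-vs-rest: pick the class whose 'this class vs the rest' SVM scores highest."""
    scores = [decision(m, x) for m in models]
    return max(range(len(models)), key=scores.__getitem__)


# Hypothetical, already-trained models for three behaviour classes 0, 1, 2.
models = [([-1.0, 0.0], 0.5), ([0.0, 1.0], -0.5), ([1.0, 0.0], -0.5)]
print(predict_ovr(models, [0.1, 0.2]))  # 0
print(predict_ovr(models, [0.2, 0.9]))  # 1
print(predict_ovr(models, [0.9, 0.1]))  # 2
```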


2019 ◽  
Vol 26 (2) ◽  
pp. 405-429 ◽  
Author(s):  
Feng Shen ◽  
Run Wang ◽  
Yu Shen

Credit scoring is an important process for peer-to-peer (P2P) lending companies, as it determines whether loan applicants are likely to default. Most credit scoring models aim to minimize the classification error rate, which implicitly assumes that all classification errors bear the same cost; in reality, however, credit scoring is a strongly cost-sensitive problem. Therefore, this paper proposes a new cost-sensitive logistic regression credit scoring model based on a multi-objective optimization approach, with two objectives in the cost-sensitive logistic regression process. The parameters of the cost-sensitive logistic regression are solved using a multi-objective particle swarm optimization (MOPSO) algorithm. In the empirical analysis, the proposed model was applied to credit scoring at a well-known Chinese P2P company. Compared with other common credit scoring models, the proposed model effectively reduced the type II error rate and the total misclassification cost, and improved the AUC, the F1 score (the harmonic mean of recall and precision), and the G-mean. The proposed model was also compared with other multi-objective optimization algorithms, further demonstrating that MOPSO is the best approach for cost-sensitive logistic regression credit scoring models.
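The evaluation measures named above can be computed from a confusion matrix. This sketch assumes that the positive class is "default", that a type II error means a defaulter classified as a good applicant (conventions differ across papers), and that the 5:1 cost ratio is purely illustrative; none of these values come from the paper.

```python
import math
from typing import Dict


def credit_metrics(tp: int, fn: int, fp: int, tn: int,
                   cost_fn: float = 5.0, cost_fp: float = 1.0) -> Dict[str, float]:
    """Metrics for a binary credit model where the positive class is 'default'.

    Assumed conventions: a type II error is a defaulter accepted as good (false
    negative); cost_fn and cost_fp are illustrative per-error costs.
    """
    recall = tp / (tp + fn)        # share of defaulters caught
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)   # share of good applicants accepted
    return {
        "type_ii_rate": fn / (tp + fn),
        "total_cost": cost_fn * fn + cost_fp * fp,
        "f1": 2 * precision * recall / (precision + recall),  # harmonic mean
        "g_mean": math.sqrt(recall * specificity),
    }


print(credit_metrics(tp=40, fn=10, fp=20, tn=130))
# type_ii_rate 0.2, total_cost 70.0, f1 = 8/11 (about 0.727), g_mean about 0.833
```

Because `cost_fn` exceeds `cost_fp`, a model that trades a few extra false positives for fewer missed defaulters lowers `total_cost` even if its plain error rate rises, which is the motivation for a cost-sensitive objective.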


2019 ◽  
Vol 9 (15) ◽  
pp. 3019 ◽  
Author(s):  
Huan Zheng ◽  
Yanghui Wu

Large-scale grid integration of wind power may cause a series of safety and stability problems. Wind power forecasting (WPF) helps plan dispatch in advance. In this paper, a new extreme gradient boosting (XGBoost) model with weather similarity analysis and feature engineering is proposed for short-term wind power forecasting. Based on the similarity of historical days' weather, the k-means clustering algorithm is used to divide the samples into several categories. We also create time features and drop unimportant features through feature engineering. For each category, we make predictions using XGBoost. The results of the proposed model are compared with those of a back-propagation neural network (BPNN), classification and regression trees (CART), random forests (RF), support vector regression (SVR), and a single XGBoost model. The results show that the proposed model achieves the highest forecasting accuracy among all these models.
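The clustering step can be sketched with plain Lloyd's k-means on daily weather vectors. The two features (mean wind speed, temperature) and the sample values are hypothetical; in the described model, a separate XGBoost regressor would then be trained on each cluster's samples.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iterations: int = 50,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means: returns a cluster label per point and the final centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k distinct points as seeds
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each day goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical daily weather: (mean wind speed in m/s, temperature in deg C).
days = [(2.0, 10.0), (2.5, 11.0), (3.0, 9.5), (12.0, 2.0), (11.5, 3.0), (13.0, 1.5)]
labels, _ = kmeans(days, k=2)
print(labels)  # calm days share one label, windy days the other
```

Features on very different scales would normally be standardized before clustering, since squared Euclidean distance weights each feature by its range.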


Mathematics ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 746
Author(s):  
Juan Laborda ◽  
Seyong Ryoo

This paper applies several classification algorithms (logistic regression, support vector machine, K-nearest neighbors, and random forest) to a credit scoring model in order to identify which candidates are likely to default. Three feature selection methods are used to mitigate overfitting caused by the curse of dimensionality in these classification algorithms: one filter method (Chi-squared test and correlation coefficients) and two wrapper methods (forward stepwise selection and backward stepwise selection). The performance of these three methods is compared using two measures: the mean absolute error and the number of selected features. The methodology is applied to a credit database from Taiwan. The results suggest that forward stepwise selection yields superior performance for each of the classification algorithms used. The conclusions are related to those reported in the literature, and their managerial implications are analyzed.
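Forward stepwise selection, the wrapper method reported to perform best, greedily adds the feature that most improves a score until no addition helps. In this sketch, `score` is a hypothetical stand-in for a validation error (lower is better, e.g. mean absolute error) of a classifier refitted on each candidate subset; the feature names and error values are made up.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def forward_stepwise(features: Sequence[str],
                     score: Callable[[List[str]], float],
                     max_features: Optional[int] = None) -> Tuple[List[str], float]:
    """Greedy forward selection; `score` is an error to minimise (e.g. validation MAE)."""
    selected: List[str] = []
    best = score(selected)  # error of the model with no features
    remaining = list(features)
    cap = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < cap:
        # Try each remaining feature and keep the one giving the lowest error.
        candidate = min(remaining, key=lambda f: score(selected + [f]))
        cand_score = score(selected + [candidate])
        if cand_score >= best:
            break  # no single addition improves the error any more
        selected.append(candidate)
        remaining.remove(candidate)
        best = cand_score
    return selected, best


# Hypothetical error: two informative features, plus a small penalty per feature used.
GAIN = {"repayment_status": 0.15, "credit_limit": 0.05}


def toy_error(subset: List[str]) -> float:
    return 0.30 - sum(GAIN.get(f, 0.0) for f in subset) + 0.01 * len(subset)


print(forward_stepwise(["age", "credit_limit", "repayment_status", "sex"], toy_error))
# (['repayment_status', 'credit_limit'], ~0.12)
```

Backward stepwise selection is the mirror image: start from all features and repeatedly drop the one whose removal lowers the error most.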


2014 ◽  
Vol 513-517 ◽  
pp. 4407-4410 ◽  
Author(s):  
Bo Tang ◽  
Sai Bing Qiu

With the development of China's economy, credit scoring has become increasingly important. A general credit scoring model solves a two-class (binary) classification problem, but in practice we often encounter multi-class classification problems. This paper proposes a multi-class support vector machine based on a genetic algorithm, which can solve multi-class classification problems in the behavior assessment model.
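One common way to combine a genetic algorithm (GA) with an SVM, assumed here, is to let the GA search the SVM hyper-parameters such as the penalty C and the kernel width gamma. The sketch uses a real-valued encoding of (log10 C, log10 gamma), binary tournament selection, arithmetic crossover, Gaussian mutation, and elitism; none of these choices are taken from the paper, and `toy_accuracy` is a hypothetical surrogate for cross-validated accuracy.

```python
import random
from typing import Callable, List, Sequence, Tuple


def genetic_search(fitness: Callable[[List[float]], float],
                   bounds: Sequence[Tuple[float, float]],
                   pop_size: int = 20, generations: int = 40,
                   mutation_sd: float = 0.3, seed: int = 0) -> Tuple[List[float], float]:
    """Real-coded GA that maximises `fitness` inside box `bounds`."""
    rng = random.Random(seed)

    def tournament(pop: List[List[float]], scores: List[float]) -> List[float]:
        a, b = rng.sample(range(len(pop)), 2)  # binary tournament
        return pop[a] if scores[a] >= scores[b] else pop[b]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        elite = pop[max(range(pop_size), key=scores.__getitem__)]
        children = [elite[:]]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop, scores), tournament(pop, scores)
            alpha = rng.random()  # arithmetic crossover weight
            child = [alpha * g1 + (1 - alpha) * g2 for g1, g2 in zip(p1, p2)]
            # Gaussian mutation, clipped back into the search box.
            child = [min(max(g + rng.gauss(0.0, mutation_sd), lo), hi)
                     for g, (lo, hi) in zip(child, bounds)]
            children.append(child)
        pop = children
    scores = [fitness(ind) for ind in pop]
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Hypothetical surrogate for cross-validated accuracy over genes [log10 C, log10 gamma].
def toy_accuracy(genes: List[float]) -> float:
    log_c, log_gamma = genes
    return 0.9 - 0.05 * ((log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2)


best_genes, best_acc = genetic_search(toy_accuracy, [(-2.0, 3.0), (-4.0, 1.0)])
print(best_genes, best_acc)
```

Searching on a log scale is a design choice: C and gamma typically matter over several orders of magnitude, so equal steps in log10 space cover that range evenly.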

