Gradient boosting survival tree with applications in credit scoring

Application of Ensemble Models in Credit Scoring Models

Business Perspectives and Research, Vol 6 (2), pp. 129-141, 2018. DOI: 10.1177/2278533718765531

Author(s): Anjali Chopra, Priyanka Bhilare

Keyword(s): Decision Tree, Empirical Analysis, Credit Scoring, Bank Loan, Machine Learning Techniques, Risk Scores, Gradient Boosting, Loan Default, Linear Discriminant, Learning Techniques

Loan default is a serious problem in the banking industry. Banking systems have strong processes in place for identifying customers with poor credit risk scores; however, most credit scoring models need to be updated constantly with newer variables and statistical techniques to improve accuracy. While totally eliminating default is almost impossible, loan risk teams can nevertheless minimize the rate of default, thereby protecting banks from its adverse effects. Credit scoring models have used logistic regression and linear discriminant analysis to identify potential defaulters. Newer machine learning techniques have the ability to outperform these classic techniques. This article conducts an empirical analysis on a publicly available bank loan dataset to study loan default, using a decision tree as the base learner and comparing it with ensemble tree learning techniques such as bagging, boosting, and random forests. The results of the empirical analysis suggest that the gradient boosting model outperforms the base decision tree learner, indicating that ensemble models work better than individual models. The study recommends that risk teams adopt newer techniques to achieve better accuracy, resulting in more effective loan recovery strategies.
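The abstract contrasts a single decision tree with boosted tree ensembles. The sketch below illustrates the boosting idea in plain Python under simplifying assumptions that are not taken from the study: depth-1 trees (stumps), squared-error loss on 0/1 default labels, and a fixed learning rate. Each round fits a stump to the current residuals and adds a shrunken copy of it to the ensemble. The toy applicant data and all function names are hypothetical.

```python
# Hypothetical sketch: least-squares gradient boosting with decision stumps.
# Not the study's configuration; features and labels below are made up.

def fit_stump(xs, residuals):
    """Least-squares depth-1 tree: returns (feature, threshold, left_value, right_value)."""
    n = len(residuals)
    mean_all = sum(residuals) / n
    best = (None, None, mean_all, mean_all)  # fallback: constant prediction, no split
    best_sse = sum((r - mean_all) ** 2 for r in residuals)
    for j in range(len(xs[0])):
        for thr in sorted({x[j] for x in xs}):
            left = [r for x, r in zip(xs, residuals) if x[j] <= thr]
            right = [r for x, r in zip(xs, residuals) if x[j] > thr]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if sse < best_sse:
                best, best_sse = (j, thr, lv, rv), sse
    return best


def stump_predict(stump, x):
    j, thr, lv, rv = stump
    if j is None:
        return lv
    return lv if x[j] <= thr else rv


def boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Fit the ensemble; returns a model dict plus the training MSE after each round."""
    base = sum(ys) / len(ys)  # initial prediction: the overall default rate
    preds = [base] * len(ys)
    stumps, mse_history = [], []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient equals the residual y - F(x).
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x) for p, x in zip(preds, xs)]
        mse_history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    model = {"base": base, "lr": learning_rate, "stumps": stumps}
    return model, mse_history


def predict(model, x):
    return model["base"] + sum(model["lr"] * stump_predict(s, x) for s in model["stumps"])


# Toy applicants: [income (hypothetical units), debt ratio]; label 1 = default.
xs = [[1.0, 0.9], [2.0, 0.8], [3.0, 0.7], [4.0, 0.2],
      [5.0, 0.3], [6.0, 0.1], [2.5, 0.2], [5.5, 0.9]]
ys = [1, 1, 1, 0, 0, 0, 0, 1]
model, mse_history = boost(xs, ys)
print(mse_history[0], mse_history[-1])
```

Because each stump is a least-squares fit to the residuals, adding it with a learning rate between 0 and 2 cannot increase the training squared error, which the returned history makes visible. Libraries such as XGBoost instead use the logistic loss with second-order leaf values and deeper trees.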


Step-wise multi-grained augmented gradient boosting decision trees for credit scoring

Engineering Applications of Artificial Intelligence, Vol 97, pp. 104036, 2021. DOI: 10.1016/j.engappai.2020.104036

Author(s): Wanan Liu, Hong Fan, Min Xia

Keyword(s): Decision Trees, Credit Scoring, Gradient Boosting


Comparative analysis of a deep learning approach with various classification techniques for credit score computation

Recent Advances in Computer Science and Communications, Vol 13, 2020. DOI: 10.2174/2666255813999200721004720

Author(s): Arvind Pandey, Shipra Shukla, Krishna Kumar Mohbey

Keyword(s): Machine Learning, Deep Learning, Comparative Analysis, Predictive Accuracy, Credit Scoring, Gradient Boosting, Learning Approach, Probit Regression, Credit Score, Machine Learning Approach

Background: Large financial companies are perpetually creating and updating customer scoring techniques. From a risk management perspective, the predictive accuracy of the estimated probability of default is of greater importance than the traditional binary classification result, i.e., credible versus non-credible customers. Customers' default payments in Taiwan are explored as the case study. Objective: The aim is to compare the predictive accuracy of the probability of default obtained with various statistical and machine learning techniques. Method: Nine predictive models are compared, of which the results of six are taken into consideration. A comparative analysis is performed of deep learning-based H2O, XGBoost, logistic regression, gradient boosting, naïve Bayes, logit model, and probit regression. Software tools such as R and SAS (university edition) are employed for machine learning and statistical model evaluation. Results: The experimental study demonstrates that XGBoost performs better than the other AI and ML algorithms. Conclusion: Machine learning approaches such as XGBoost can be used effectively for credit scoring, among other data mining and statistical approaches.
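The abstract stresses the accuracy of predicted default probabilities rather than only the binary credible/non-credible label. A minimal sketch of three common probability-oriented metrics follows. The study does not say which metrics it used, so the choice of the Brier score, log loss and ROC AUC here is an assumption, and the labels and probabilities are illustrative.

```python
import math


def brier_score(y_true, pd_pred):
    """Mean squared error between predicted PD and the 0/1 default outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, pd_pred)) / len(y_true)


def log_loss(y_true, pd_pred, eps=1e-15):
    """Average negative log-likelihood of the outcomes under the predicted PDs."""
    total = 0.0
    for y, p in zip(y_true, pd_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so log(0) never occurs
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def roc_auc(y_true, pd_pred):
    """Probability that a random defaulter gets a higher PD than a random non-defaulter (ties count 1/2)."""
    pos = [p for y, p in zip(y_true, pd_pred) if y == 1]
    neg = [p for y, p in zip(y_true, pd_pred) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
pd_pred = [0.1, 0.4, 0.35, 0.8]
print(brier_score(y_true, pd_pred), log_loss(y_true, pd_pred), roc_auc(y_true, pd_pred))
```

Thresholding these PDs at 0.5 labels three of the four customers correctly, which hides how far each probability is from the outcome. The Brier score and log loss measure that distance, while the AUC (0.75 here) measures only how well defaulters are ranked above non-defaulters.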


Improved credit scoring model using XGBoost with Bayesian hyper-parameter optimization

International Journal of Electrical and Computer Engineering (IJECE), Vol 11 (6), pp. 5477-5487, 2021. DOI: 10.11591/ijece.v11i6.pp5477-5487

Author(s): Wirot Yotsawat, Pakaket Wattuya, Anongnart Srivihok

Keyword(s): Parameter Optimization, Missing Values, Credit Scoring, Gradient Boosting, Support Vector, Scoring Model, Ensemble Models, Proposed Model, Extreme Gradient Boosting, Credit Scoring Model

Several credit scoring models have been developed using ensemble classifiers to improve the accuracy of assessment. However, little attention has been paid to tuning the hyper-parameters of the base learners in these ensemble models, although they are crucial to constructing ensemble models. This study proposes an improved credit scoring model based on the extreme gradient boosting (XGB) classifier using Bayesian hyper-parameter optimization (XGB-BO). The model comprises two steps. First, data pre-processing is used to handle missing values and scale the data. Second, Bayesian hyper-parameter optimization is applied to tune the hyper-parameters of the XGB classifier, and the tuned classifier is used to train the model. The model is evaluated on four widely used public datasets: the German, Australian, Lending Club, and Polish datasets. Several state-of-the-art classification algorithms are implemented for predictive comparison with the proposed method. The proposed model showed promising results, with accuracy improvements of 4.10%, 3.03%, and 2.76% on the German, Lending Club, and Australian datasets, respectively. According to the evaluation results, the proposed model outperformed commonly used techniques, e.g., decision tree, support vector machine, neural network, logistic regression, random forest, and bagging. The experimental results confirm that the XGB-BO model is suitable for assessing the creditworthiness of applicants.
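The first step of XGB-BO handles missing values and scales the data. The abstract does not specify the exact procedures, so the sketch below assumes mean imputation followed by min-max scaling, with the statistics learned on the training rows only and then reused on new applicants. Missing values are represented as `None`; `fit_preprocessor` and `transform` are hypothetical names.

```python
def fit_preprocessor(rows):
    """Learn per-column mean, min and max from training rows (None marks a missing value).

    Assumes every column has at least one observed value.
    """
    params = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        params.append((sum(observed) / len(observed), min(observed), max(observed)))
    return params


def transform(rows, params):
    """Impute missing values with the training mean, then min-max scale each column."""
    out = []
    for r in rows:
        new_row = []
        for v, (mean, lo, hi) in zip(r, params):
            v = mean if v is None else v
            span = hi - lo
            new_row.append((v - lo) / span if span else 0.0)  # constant column -> 0.0
        out.append(new_row)
    return out


train = [[1.0, 10.0], [None, 20.0], [3.0, None]]
params = fit_preprocessor(train)
print(transform(train, params))          # -> [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
print(transform([[5.0, None]], params))  # -> [[2.0, 0.5]]
```

The new applicant's first feature scales to 2.0, outside [0, 1], because it exceeds the training maximum; fitting the statistics on training data only keeps information from evaluation data out of the preprocessing step.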


An Explanation Framework for Interpretable Credit Scoring

International Journal of Artificial Intelligence & Applications, Vol 12 (1), pp. 19-38, 2021. DOI: 10.5121/ijaia.2021.12102

Author(s): Lara Marie Demajo, Vince Vella, Alexiei Dingli

Keyword(s): Credit Scoring, Imbalanced Data, Scoring Systems, Gradient Boosting, Home Equity, General Data Protection Regulation, Black-Box Models, Extreme Gradient Boosting, Feature-Based, Right to Explanation

With the recent surge of enthusiasm in Artificial Intelligence (AI) and Financial Technology (FinTech), applications such as credit scoring have gained substantial academic interest. However, despite ever-growing achievements, the biggest obstacle in most AI systems is their lack of interpretability. This lack of transparency limits their application in various domains, including credit scoring. Credit scoring systems help financial experts decide whether or not to accept a loan application, so that loans with a high probability of default are not accepted. Apart from the noisy and highly imbalanced data that such credit scoring models face, recent regulations such as the 'right to explanation' introduced by the General Data Protection Regulation (GDPR) and the Equal Credit Opportunity Act (ECOA) have added the need for model interpretability, to ensure that algorithmic decisions are understandable and coherent. A recently introduced concept is eXplainable AI (XAI), which focuses on making black-box models more interpretable. In this work, we present a credit scoring model that is both accurate and interpretable. For classification, state-of-the-art performance on the Home Equity Line of Credit (HELOC) and Lending Club (LC) datasets is achieved using the Extreme Gradient Boosting (XGBoost) model. The model is then enhanced with a 360-degree explanation framework, which provides the different explanations (i.e., global, local feature-based and local instance-based) required by different people in different situations. Evaluation through functionally-grounded, application-grounded and human-grounded analysis shows that the explanations provided are simple and consistent, as well as correct, effective, easy to understand, sufficiently detailed and trustworthy.
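Local feature-based explanations attribute a single applicant's score to individual features. One widely used formulation is the Shapley value; the sketch below computes it exactly by enumerating feature coalitions, replacing absent features with a single baseline applicant. The abstract does not say that the authors' framework computes its explanations this way, and the toy scoring function `toy_score` is hypothetical. Exact enumeration needs on the order of 2^n model calls, so it only suits a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attributions of model(x) - model(baseline).

    Features outside a coalition take their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [k for k in range(n) if k != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S not containing i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(z):
    # Hypothetical score over [debt ratio, missed payments, years employed].
    return 0.5 * z[0] + 0.3 * z[1] * z[0] - 0.1 * z[2]


applicant = [0.8, 2.0, 5.0]
baseline = [0.3, 0.0, 3.0]  # e.g. a typical applicant
phi = shapley_values(toy_score, applicant, baseline)
print(phi, sum(phi), toy_score(applicant) - toy_score(baseline))
```

The attributions sum to the gap between the applicant's score and the baseline score (the efficiency property), so each feature's share of the decision can be read off additively.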


Explainable AI for Interpretable Credit Scoring

2020. DOI: 10.5121/csit.2020.101516

Author(s): Lara Marie Demajo, Vince Vella, Alexiei Dingli

Keyword(s): Credit Scoring, Imbalanced Data, Gradient Boosting, Home Equity, General Data Protection Regulation, Black-Box Models, Extreme Gradient Boosting, Explainable AI, Feature-Based, Right to Explanation

With the ever-growing achievements in Artificial Intelligence (AI) and the recent surge of enthusiasm in Financial Technology (FinTech), applications such as credit scoring have gained substantial academic interest. Credit scoring helps financial experts decide whether or not to accept a loan application, so that loans with a high probability of default are not accepted. Apart from the noisy and highly imbalanced data that such credit scoring models face, recent regulations such as the 'right to explanation' introduced by the General Data Protection Regulation (GDPR) and the Equal Credit Opportunity Act (ECOA) have added the need for model interpretability, to ensure that algorithmic decisions are understandable and coherent. An interesting, recently introduced concept is eXplainable AI (XAI), which focuses on making black-box models more interpretable. In this work, we present a credit scoring model that is both accurate and interpretable. For classification, state-of-the-art performance on the Home Equity Line of Credit (HELOC) and Lending Club (LC) datasets is achieved using the Extreme Gradient Boosting (XGBoost) model. The model is then enhanced with a 360-degree explanation framework, which provides the different explanations (i.e., global, local feature-based and local instance-based) required by different people in different situations. Evaluation through functionally-grounded, application-grounded and human-grounded analysis shows that the explanations provided are simple and consistent, and that they satisfy the six predetermined hypotheses testing for correctness, effectiveness, ease of understanding, detail sufficiency and trustworthiness.
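Local instance-based explanations justify a decision by pointing to comparable past cases. A simple version retrieves the training applicants nearest to the one being scored. The sketch assumes Euclidean distance on features that are already on comparable scales; this is an illustrative choice, not necessarily the framework's method, and the data and function name are hypothetical.

```python
import math


def nearest_instances(query, train_rows, train_labels, k=3):
    """Return the k training applicants closest to `query` as (distance, index, label) tuples.

    Assumes features are already on comparable scales (e.g. min-max scaled).
    """
    scored = [
        (math.dist(query, row), idx, label)
        for idx, (row, label) in enumerate(zip(train_rows, train_labels))
    ]
    scored.sort()  # ascending distance; ties broken by training index
    return scored[:k]


train_rows = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.0, 2.0]]
train_labels = [0, 0, 1, 0]  # 1 = defaulted
for dist, idx, label in nearest_instances([0.9, 0.1], train_rows, train_labels, k=2):
    print(f"applicant #{idx}: distance {dist:.3f}, defaulted={label}")
```

Showing that the closest past applicants repaid (or defaulted) gives a loan officer a case-based rationale that complements feature attributions. `math.dist` requires Python 3.8 or later.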


Multi-grained and multi-layered gradient boosting decision tree for credit scoring

Applied Intelligence, 2021. DOI: 10.1007/s10489-021-02715-6

Author(s): Wan’an Liu, Hong Fan, Min Xia

Keyword(s): Decision Tree, Credit Scoring, Gradient Boosting


Credit scoring based on tree-enhanced gradient boosting decision trees

Expert Systems with Applications, pp. 116034, 2021. DOI: 10.1016/j.eswa.2021.116034

Author(s): Wanan Liu, Hong Fan, Meng Xia

Keyword(s): Decision Trees, Credit Scoring, Gradient Boosting


A dynamic credit scoring model based on survival gradient boosting decision tree approach

Technological and Economic Development of Economy, pp. 1-24, 2020. DOI: 10.3846/tede.2020.13997

Author(s): Yufei Xia, Lingyun He, Yinguo Li, Yating Fu, Yixin Xu

Keyword(s): Survival Analysis, Decision Tree, Credit Scoring, Classification Problem, Survival Models, Gradient Boosting, Dynamic Prediction, Misclassification Cost, Scoring Model, Credit Scoring Model

Credit scoring, which is typically framed as a classification problem, is a powerful tool for managing credit risk because it forecasts the probability of default (PD) of a loan application. However, there is a growing trend of integrating survival analysis into credit scoring to provide dynamic predictions of PD over time and a clear treatment of censoring. A novel dynamic credit scoring model (SurvXGBoost) is proposed based on a survival gradient boosting decision tree (GBDT) approach. Our proposal, which combines survival analysis with the GBDT approach, is expected to improve predictability relative to statistical survival models. The proposed method is compared with several common benchmark models on a real-world consumer loan dataset. The results of out-of-sample and out-of-time validation indicate that SurvXGBoost outperforms the benchmarks in terms of predictability and misclassification cost. Incorporating macroeconomic variables can further improve the performance of survival models. SurvXGBoost also retains some interpretability, since it provides information on feature importance.
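Survival GBDT models replace the classification loss with a survival loss that also uses censored loans. The abstract does not name SurvXGBoost's exact objective; a common choice, also offered by XGBoost as the `survival:cox` objective, is the negative Cox partial log-likelihood. The sketch below computes that loss together with the per-loan gradient and diagonal Hessian that a second-order boosting library consumes, handling tied times Breslow-style. The function name and toy data are hypothetical.

```python
import math


def cox_loss_grad_hess(times, events, scores):
    """Negative Cox partial log-likelihood with per-loan gradient and diagonal Hessian.

    times:  time to default or censoring for each loan
    events: 1 if default was observed, 0 if the loan was censored
    scores: current ensemble outputs eta (log relative hazard)
    Ties use the Breslow risk set: every loan with time >= the event time.
    O(n^2) loops, kept for clarity.
    """
    n = len(times)
    exp_s = [math.exp(s) for s in scores]
    loss, grad, hess = 0.0, [0.0] * n, [0.0] * n
    for i in range(n):
        if not events[i]:
            continue  # censored loans contribute only through other loans' risk sets
        risk_set = [j for j in range(n) if times[j] >= times[i]]
        denom = sum(exp_s[j] for j in risk_set)
        loss -= scores[i] - math.log(denom)
        grad[i] -= 1.0
        for j in risk_set:
            w = exp_s[j] / denom  # share of the risk set's hazard held by loan j
            grad[j] += w
            hess[j] += w * (1.0 - w)
    return loss, grad, hess


times = [1.0, 2.0, 3.0, 4.0]
events = [1, 0, 1, 0]  # the second and fourth loans are censored
scores = [0.2, -0.1, 0.4, 0.0]
loss, grad, hess = cox_loss_grad_hess(times, events, scores)
print(loss, grad, hess)
```

The gradients sum to zero across loans, and a censored loan that was still at risk when some default occurred receives a positive gradient, which is how censoring information reaches the trees.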


Loan default prediction of Chinese P2P market: a machine learning methodology

Scientific Reports, Vol 11 (1), 2021. DOI: 10.1038/s41598-021-98361-6

Author(s): Junhui Xu, Zekai Lu, Ying Xie

Keyword(s): Machine Learning, Credit Scoring, Gradient Boosting, Education Verification, Risk Regulation, Loan Default, Sustainable Development, Factors Affecting Repayment, Extreme Gradient Boosting, Kappa Value

Repayment failures by borrowers have greatly affected the sustainable development of the peer-to-peer (P2P) lending industry. The latest literature reveals that existing risk evaluation systems may ignore important signals and risk factors affecting P2P repayment. In our study, we applied four machine learning methods (random forest (RF), extreme gradient boosting tree (XGBT), gradient boosting model (GBM), and neural network (NN)) to predict important factors affecting repayment, using data from Renrendai.com in China from January 1, 2015, to June 30, 2015. The results showed that borrowers who have passed video, mobile phone, job, residence or education level verification are more likely to default on loan repayment, whereas those who have passed identity and asset certification are less likely to default. The accuracy and kappa value of the four methods all exceed 90%, and RF is superior to the other classification models. Our findings demonstrate important techniques for borrower screening by P2P companies and for risk regulation by regulatory agencies. Our methodology and findings will help regulators, banks and creditors combat current financial disasters caused by the coronavirus disease 2019 (COVID-19) pandemic by addressing various financial risks and translating credit scoring improvements.
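The abstract reports both accuracy and the kappa value. Assuming the usual Cohen's kappa, the sketch below shows why the two can diverge on imbalanced default data: kappa discounts the agreement a classifier would reach by chance given the label frequencies. The labels are made up for illustration, and returning 1.0 in the degenerate single-label case is a convention chosen here.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    # Chance agreement: probability both sides pick the same label independently.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    if p_expected == 1.0:
        return 1.0  # degenerate case: one label everywhere, so agreement is trivially perfect
    return (p_observed - p_expected) / (1.0 - p_expected)


# 9 repaid loans and 1 default; a model that always predicts "repaid".
y_true = [0] * 9 + [1]
y_pred = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, cohens_kappa(y_true, y_pred))  # -> 0.9 0.0
```

A 90% accurate model can therefore have a kappa of zero, which is why reporting kappa alongside accuracy is informative when defaults are rare.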
