Credit scoring based on tree-enhanced gradient boosting decision trees

2021, pp. 116034. Author(s): Wanan Liu, Hong Fan, Meng Xia

2021, Vol 2021, pp. 1-18. Author(s): Chao Qin, Yunfeng Zhang, Fangxun Bao, Caiming Zhang, Peide Liu, ...

Personal credit scoring is a challenging issue. In recent years, research has shown that machine learning achieves satisfactory performance in credit scoring. Because of their advantages in feature combination and feature selection, decision trees can match credit data, which have high dimension and complex correlations. However, decision trees tend to overfit. eXtreme Gradient Boosting (XGBoost) is an advanced gradient boosted tree method that overcomes this shortcoming by integrating tree models. The structure of the model is determined by hyperparameters; since manual tuning is time-consuming and laborious, an optimization method is employed for tuning. Because particle swarm optimization describes the particle state and its motion law as continuous real numbers, the hyperparameters of eXtreme Gradient Boosting can be optimized in a continuous search space. However, classical particle swarm optimization tends to fall into local optima. To solve this problem, this paper proposes an eXtreme Gradient Boosting credit scoring model based on adaptive particle swarm optimization. A swarm split based on the clustering idea, together with two kinds of learning strategies, is employed to guide the particles and improve the diversity of the subswarms, in order to prevent the algorithm from falling into a local optimum. In the experiment, several traditional machine learning algorithms and popular ensemble learning classifiers, as well as four hyperparameter optimization methods (grid search, random search, the tree-structured Parzen estimator, and particle swarm optimization), are considered for comparison. Experiments were performed on four credit datasets and seven KEEL benchmark datasets over five popular evaluation measures: accuracy, error rate (type I error and type II error), Brier score, and F1 score. Results demonstrate that the proposed model outperforms the other models on average. Moreover, adaptive particle swarm optimization performs better than the other hyperparameter optimization strategies.
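The general tuning loop described above can be illustrated with a minimal sketch: a plain (non-adaptive) particle swarm searching a continuous XGBoost hyperparameter space. This is not the paper's adaptive variant with swarm splitting; the dataset, search bounds, and swarm settings are illustrative assumptions.

```python
# Sketch: tuning XGBoost hyperparameters with a basic particle swarm.
# NOT the paper's adaptive PSO with swarm splitting; it only illustrates
# searching a continuous hyperparameter space.
import numpy as np
from sklearn.datasets import load_breast_cancer   # stand-in for a credit dataset
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)

# Continuous search space: (learning_rate, max_depth, subsample)
lo = np.array([0.01, 2.0, 0.5])
hi = np.array([0.30, 10.0, 1.0])

def fitness(p):
    model = XGBClassifier(
        learning_rate=float(p[0]),
        max_depth=int(round(p[1])),      # depth is rounded to an integer
        subsample=float(p[2]),
        n_estimators=100,
        eval_metric="logloss",
    )
    return cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()

rng = np.random.default_rng(0)
n_particles, n_iter = 10, 20
pos = rng.uniform(lo, hi, size=(n_particles, 3))
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_val.argmax()].copy()

for _ in range(n_iter):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    vals = np.array([fitness(p) for p in pos])
    improved = vals > pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmax()].copy()

print("best hyperparameters:", gbest, "cv accuracy:", pbest_val.max())
```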


2014, Vol 26 (4), pp. 781-817. Author(s): Ching-Pei Lee, Chih-Jen Lin

Linear rankSVM is one of the widely used methods for learning to rank. Although its performance may be inferior to that of nonlinear methods such as kernel rankSVM and gradient boosting decision trees, linear rankSVM is useful for quickly producing a baseline model. Furthermore, following its recent development for classification, linear rankSVM may give competitive performance for large and sparse data. A great deal of work has studied linear rankSVM, with a focus on computational efficiency when the number of preference pairs is large. In this letter, we systematically study existing works, discuss their advantages and disadvantages, and propose an efficient algorithm. We discuss different implementation issues and extensions with detailed experiments. Finally, we develop a robust linear rankSVM tool for public use.
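The pairwise idea underlying rankSVM can be sketched as follows: preference pairs within each query are converted into difference vectors, and a linear SVM is fit on them. This is the classic pairwise transform, not the efficient solver developed in the letter; the data and query identifiers are synthetic assumptions.

```python
# Sketch: linear rankSVM via the classic pairwise transform.
# For each query, every pair of items with different relevance labels yields
# a difference vector; a linear SVM then separates "better minus worse" from
# "worse minus better". Not the optimized solver described in the letter.
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))          # synthetic features
y = rng.integers(0, 3, size=200)        # relevance grades 0..2
q = rng.integers(0, 20, size=200)       # query ids

diffs, signs = [], []
for qid in np.unique(q):
    idx = np.where(q == qid)[0]
    for i, j in combinations(idx, 2):
        if y[i] == y[j]:
            continue                    # no preference, skip the pair
        diffs.append(X[i] - X[j])
        signs.append(1 if y[i] > y[j] else -1)

svm = LinearSVC(C=1.0).fit(np.array(diffs), np.array(signs))
w = svm.coef_.ravel()                   # ranking scores are X @ w
print("learned weight vector (first 5 entries):", w[:5])
```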


2018, Vol 6 (2), pp. 129-141. Author(s): Anjali Chopra, Priyanka Bhilare

Loan default is a serious problem in the banking industry. Banking systems have strong processes in place for identifying customers with poor credit risk scores; however, most credit scoring models need to be constantly updated with newer variables and statistical techniques for improved accuracy. While totally eliminating default is almost impossible, loan risk teams nevertheless minimize the rate of default, thereby protecting banks from its adverse effects. Credit scoring models have traditionally used logistic regression and linear discriminant analysis to identify potential defaulters. Newer, contemporary machine learning techniques have the ability to outperform these classical techniques. This article conducts an empirical analysis on a publicly available bank loan dataset to study loan default, using a decision tree as the base learner and comparing it with ensemble tree learning techniques such as bagging, boosting, and random forests. The results of the empirical analysis suggest that the gradient boosting model outperforms the base decision tree learner, indicating that an ensemble model works better than individual models. The study recommends that risk teams adopt newer contemporary techniques to achieve better accuracy, resulting in effective loan recovery strategies.
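A minimal sketch of the comparison described above, using scikit-learn's off-the-shelf estimators on a synthetic imbalanced "default" dataset (the article's actual bank loan data and preprocessing are not reproduced here):

```python
# Sketch: a single decision tree vs. bagging, random forest, and gradient
# boosting on a synthetic binary default/no-default dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a bank loan dataset with a rare default class.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9, 0.1],
                           random_state=0)

models = {
    "decision tree":     DecisionTreeClassifier(random_state=0),
    "bagging":           BaggingClassifier(random_state=0),
    "random forest":     RandomForestClassifier(random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name:18s} mean ROC AUC = {auc:.3f}")
```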


2018, Vol 51, pp. 02004. Author(s): Stanislav Eroshenko, Alexandra Khalyasmaa, Denis Snegirev

The paper presents an operational model for very-short-term forecasting of solar power station (SPS) generation developed by the authors, based on weather information and built into an existing software product as a separate module for SPS operational forecasting. It was found that one of the optimal mathematical methods for operational forecasting of SPS generation is gradient boosting on decision trees. The paper describes the basic principles of operational forecasting based on boosted decision trees and the main advantages and disadvantages of implementing this algorithm. Moreover, the paper illustrates the implementation of this algorithm through the analysis of data from, and generation forecasting for, an existing SPS.
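A minimal sketch of such a setup, fitting a gradient boosted regressor to weather-style features; the feature set and synthetic data are assumptions for illustration, not the authors' operational model.

```python
# Sketch: very-short-term SPS generation forecasting with gradient boosting
# on decision trees. Features and data are synthetic placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# Assumed weather inputs: irradiance, cloud cover, air temperature, hour of day.
irradiance = rng.uniform(0, 1000, n)
clouds = rng.uniform(0, 1, n)
temp = rng.uniform(-10, 35, n)
hour = rng.integers(0, 24, n)
X = np.column_stack([irradiance, clouds, temp, hour])
# Toy generation signal: roughly proportional to irradiance, reduced by clouds.
y = 0.8 * irradiance * (1 - 0.6 * clouds) + rng.normal(0, 20, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05,
                                  max_depth=3, random_state=0)
model.fit(X_tr, y_tr)
print("MAE on held-out data:", mean_absolute_error(y_te, model.predict(X_te)))
```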


2019, Vol 116 (40), pp. 19887-19893. Author(s): José Marcio Luna, Efstathios D. Gennatas, Lyle H. Ungar, Eric Eaton, Eric S. Diffenderfer, ...

The expansion of machine learning to high-stakes application domains such as medicine, finance, and criminal justice, where making informed decisions requires a clear understanding of the model, has increased interest in interpretable machine learning. The widely used Classification and Regression Trees (CART) have played a major role in the health sciences, due to their simple and intuitive explanation of predictions. Ensemble methods like gradient boosting can improve the accuracy of decision trees, but at the expense of the interpretability of the generated model. Additive models, such as those produced by gradient boosting, and full-interaction models, such as CART, have largely been investigated in isolation. We show that these models exist along a spectrum, revealing previously unseen connections between these approaches. This paper introduces a rigorous formalization of the additive tree, an empirically validated learning technique for creating a single decision tree, and shows that this method can produce models equivalent to CART or gradient boosted stumps at the extremes by varying a single parameter. Although the additive tree is designed primarily to provide both the model interpretability and predictive performance needed for high-stakes applications like medicine, it can also produce decision trees, represented by hybrid models between CART and boosted stumps, that can outperform either of these approaches.
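The two extremes of this spectrum can be sketched with off-the-shelf scikit-learn models: a single unrestricted tree (CART-style, full interactions) versus gradient boosted depth-1 stumps (purely additive). The additive tree itself, which interpolates between them, is not implemented here, and the dataset is an assumed stand-in.

```python
# Sketch: the two extremes of the spectrum discussed above, a single CART-style
# tree versus gradient boosted stumps (depth-1 trees). The additive tree that
# interpolates between them is not part of scikit-learn and is not shown.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)   # stand-in for a clinical dataset

cart = DecisionTreeClassifier(random_state=0)            # full-interaction model
stumps = GradientBoostingClassifier(max_depth=1,         # purely additive model
                                    n_estimators=200, random_state=0)

for name, model in [("CART", cart), ("boosted stumps", stumps)]:
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:15s} mean accuracy = {acc:.3f}")
```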

