Building a Credit Scoring Model Based on Data Mining Approaches

Author(s):  
Jasmina Nalić ◽  
Goran Martinovic

One of the biggest challenges in the banking sector today is assessing a client's creditworthiness. To improve decision-making and risk management, banks use data mining techniques to recognize hidden patterns in large volumes of data. The main objective of this study is to build a high-performance customized credit scoring model. The model, named Reliable client, is based on a bank's real dataset and was originally built by applying four different classification algorithms: decision tree (DT), naive Bayes (NB), generalized linear model (GLM) and support vector machine (SVM). Because it produced the best results and also appeared to be the most appropriate algorithm, GLM was adopted for the final model. The results of this model are presented using several performance measures, which showed high predictive confidence and accuracy; the study also demonstrates a significant impact of data pre-processing on model performance. Statistical analysis of the model identified the parameters with the most significant effect on the model outcome. Finally, the credit scoring model was evaluated on another set of real data from the same bank.

2017 ◽  
Vol 13 (1) ◽  
pp. 51 ◽  
Author(s):  
Oriol Amat ◽  
Raffaele Manini ◽  
Marcos Antón Renart

Purpose: This study develops and tests a credit scoring model that can help financial institutions assess credit requests. Design/methodology/approach: The empirical study aims to answer two questions: (1) Which ratios best discriminate between solvent and insolvent companies? and (2) What is the relative importance of these ratios? To this end, several statistical techniques with a multifactorial focus were used (Multivariate Analysis of Variance, Linear Discriminant Analysis, Logit and Probit Models). Several samples of companies were used to obtain and test the model. Findings: Through the application of several statistical techniques, the credit scoring model proved effective in discriminating between good and bad creditors. Research limitations: This study focuses on manufacturing, commercial and services companies of all sizes in Spain; therefore, the conclusions may differ for other geographical locations. Practical implications: Because credit is one of the main drivers of growth, a solid credit scoring model can help financial institutions decide to whom to grant credit. Social implications: Because of the growing importance of credit for society and the fear of granting it after the latest financial turmoil, a solid credit scoring model can strengthen trust in financial institutions' assessments. Originality/value: There is already a stream of literature on credit scoring. However, this paper focuses on Spanish firms and validates the results of the model on real data. The application of the model to detect the probability of default in loans is original.
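As a rough illustration of how the Logit and Probit models named above differ, the sketch below maps the same linear score of financial ratios to a probability through each link function. The ratio values, weights and intercept are hypothetical placeholders, not figures from the study.

```python
import math


def linear_score(ratios, weights, intercept):
    """Weighted sum of financial ratios plus an intercept."""
    return intercept + sum(w * x for w, x in zip(weights, ratios))


def logit_probability(score):
    """Logit model: probability from the logistic function."""
    return 1.0 / (1.0 + math.exp(-score))


def probit_probability(score):
    """Probit model: probability from the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(score / math.sqrt(2.0)))


# Hypothetical ratios and coefficients, chosen only for illustration.
ratios = [0.8, 1.5]
weights = [-1.2, 0.4]
intercept = 0.1

score = linear_score(ratios, weights, intercept)
p_logit = logit_probability(score)
p_probit = probit_probability(score)
print(f"score={score:.2f} logit={p_logit:.4f} probit={p_probit:.4f}")
```

Both links agree at a score of zero (probability 0.5) and differ mainly in how quickly the tails approach 0 or 1, which is why fitted coefficients from the two models are not directly comparable in scale.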


2021 ◽  
Vol 73 (7) ◽  
pp. 41-44
Author(s):  
Y.S. Zhieru

The final stage of constructing a logistic regression model is checking its validity and testing it on real data. The validity of a logistic regression model is evidenced by its ability to classify borrowers correctly, that is, to distinguish "good" borrowers from "bad" borrowers.
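One common way to quantify this ability, sketched here under assumptions, is to compare predicted default probabilities with known outcomes using a confusion matrix, accuracy, and the area under the ROC curve (AUC). The labels, probabilities and the 0.5 threshold below are invented for illustration; the abstract does not state which measures the author used.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (TP, FP, TN, FN), treating label 1 as a "bad" borrower."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_bad = p >= threshold
        if predicted_bad and y == 1:
            tp += 1
        elif predicted_bad and y == 0:
            fp += 1
        elif not predicted_bad and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of borrowers classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def auc(y_true, y_prob):
    """Probability that a random "bad" borrower gets a higher score than a
    random "good" one; ties count as one half."""
    bad = [p for y, p in zip(y_true, y_prob) if y == 1]
    good = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not bad or not good:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if b > g else 0.5 if b == g else 0.0
               for b in bad for g in good)
    return wins / (len(bad) * len(good))


# Hypothetical hold-out sample: 1 = "bad" borrower, 0 = "good" borrower.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_prob = [0.10, 0.20, 0.35, 0.60, 0.55, 0.80, 0.90, 0.40]

counts = confusion_counts(y_true, y_prob)
acc = accuracy(*counts)
area = auc(y_true, y_prob)
print(counts, acc, round(area, 4))
```

Accuracy depends on the chosen threshold, whereas AUC summarizes how well the model ranks "bad" above "good" borrowers across all thresholds.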


2012 ◽  
Vol 235 ◽  
pp. 419-422 ◽  
Author(s):  
Bo Tang ◽  
Sai Bing Qiu

A typical credit scoring model solves a two-class (binary) classification problem, but in practice multi-class classification problems often arise. This paper proposes a multi-class support vector machine that can solve multi-class classification problems in the behavior assessment model.


Author(s):  
Wirot Yotsawat ◽  
Pakaket Wattuya ◽  
Anongnart Srivihok

Several credit scoring models have been developed using ensemble classifiers to improve the accuracy of assessment. However, among the ensemble models, little attention has been paid to tuning the hyper-parameters of the base learners, although these are crucial to constructing ensemble models. This study proposes an improved credit scoring model based on the extreme gradient boosting (XGB) classifier with Bayesian hyper-parameter optimization (XGB-BO). The model comprises two steps. First, data pre-processing handles missing values and scales the data. Second, Bayesian hyper-parameter optimization tunes the hyper-parameters of the XGB classifier, and the tuned hyper-parameters are then used to train the model. The model is evaluated on four widely used public datasets: the German, Australian, Lending Club, and Polish datasets. Several state-of-the-art classification algorithms are implemented for predictive comparison with the proposed method. The proposed model showed promising results, with an improvement in accuracy of 4.10%, 3.03%, and 2.76% on the German, Lending Club, and Australian datasets, respectively. According to the evaluation results, it outperformed commonly used techniques such as decision tree, support vector machine, neural network, logistic regression, random forest, and bagging. The experimental results confirmed that the XGB-BO model is suitable for assessing the creditworthiness of applicants.
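The first of the two steps, handling missing values and scaling, could look roughly like the sketch below. The abstract does not say which imputation or scaling method the authors used, so median imputation and min-max scaling are assumptions here, and the sample rows are invented. The Bayesian tuning step is not shown.

```python
from statistics import median


def impute_median(column):
    """Replace missing entries (None) with the median of the observed values.
    Median imputation is an assumption, not the paper's stated method."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def min_max_scale(column):
    """Rescale a column to the range [0, 1]; a constant column maps to 0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def preprocess(rows):
    """Impute, then scale, each feature column of a row-major table."""
    columns = [list(col) for col in zip(*rows)]
    cleaned = [min_max_scale(impute_median(col)) for col in columns]
    return [list(row) for row in zip(*cleaned)]


# Hypothetical applicants: [age, loan amount], with None marking a gap.
rows = [
    [25, 1000.0],
    [35, None],
    [None, 3000.0],
    [45, 5000.0],
]
processed = preprocess(rows)
print(processed)
```

In a real pipeline the median and the min/max would be computed on the training split only and then reused on the test split, so that no information leaks from the evaluation data.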


Mathematics ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 746
Author(s):  
Juan Laborda ◽  
Seyong Ryoo

This paper applies different classification algorithms (logistic regression, support vector machine, K-nearest neighbors, and random forest) to identify which candidates are likely to default, for use in a credit scoring model. Three feature selection methods are used to mitigate the overfitting that the curse of dimensionality causes in these classification algorithms: one filter method (Chi-squared test and correlation coefficients) and two wrapper methods (forward stepwise selection and backward stepwise selection). The performance of these three methods is discussed using two measures, the mean absolute error and the number of selected features. The methodology is applied to a valuable database from Taiwan. The results suggest that forward stepwise selection yields superior performance for each of the classification algorithms used. The conclusions are related to those in the literature, and their managerial implications are analyzed.
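Forward stepwise selection, the wrapper method the abstract reports as performing best, can be sketched as a greedy loop that adds one feature at a time for as long as the validation error falls. The nearest-centroid classifier and the tiny dataset below are hypothetical stand-ins, not the paper's classifiers or the Taiwan data; only the use of mean absolute error as the criterion follows the abstract.

```python
def nearest_centroid_mae(train, valid, features):
    """Mean absolute error of 0/1 predictions from a nearest-centroid
    classifier that only looks at the given feature indices."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows)
                            for f in features]
    total = 0
    for x, y in valid:
        dist = {label: sum((x[f] - c) ** 2
                           for f, c in zip(features, centroids[label]))
                for label in (0, 1)}
        predicted = min(dist, key=dist.get)
        total += abs(predicted - y)
    return total / len(valid)


def forward_stepwise(n_features, error_fn, tol=1e-12):
    """Greedily add the feature that lowers the error most; stop as soon as
    no remaining feature improves it."""
    selected, best_error = [], float("inf")
    remaining = list(range(n_features))
    while remaining:
        err, feature = min((error_fn(selected + [f]), f) for f in remaining)
        if err >= best_error - tol:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_error = err
    return selected, best_error


# Hypothetical data: (feature vector, label), with 1 = default.
train = [([1.0, 7.0, 2.0], 0), ([2.0, 1.0, 3.0], 0),
         ([5.0, 6.0, 3.0], 1), ([6.0, 4.0, 4.0], 1)]
valid = [([1.5, 2.0, 3.4], 0), ([2.5, 6.0, 2.0], 0),
         ([5.5, 1.0, 3.6], 1), ([4.5, 7.0, 2.6], 1)]

selected, best_error = forward_stepwise(
    3, lambda feats: nearest_centroid_mae(train, valid, feats))
print(selected, best_error)
```

Backward stepwise selection runs the same loop in reverse, starting from all features and removing the one whose removal lowers the error most.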

