An Ensemble Model of Outlier Detection with Random Tree Data Classification for Financial Credit Scoring Prediction System

2019 ◽  
Vol 8 (3) ◽  
pp. 7108-7114

Financial Credit Scoring (FCS) has recently become an essential process in the finance industry for assessing the creditworthiness of individuals or financial firms. Several artificial intelligence (AI) models have already been presented for the classification of financial data. However, credit and financial data generally contain unwanted and repetitive features, which lead to inefficient classification performance. To overcome this issue, this paper develops a new FCS prediction model that incorporates an outlier detection (OD) step (i.e., misclassified instance removal) prior to data classification. The presented FCS model involves two main phases: misclassified instance removal using a Naïve Bayes (NB) Tree, and Random Tree (RT) based data classification. The presented NB-RT model is validated on the benchmark German Credit dataset under different validation parameters. In extensive experiments, the proposed NB-RT model achieved a maximum classification accuracy of 90.3%.
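The first phase described above (removing instances that a classifier itself misclassifies) can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps in a plain categorical Naïve Bayes with add-one smoothing instead of an NB Tree, it omits the Random Tree stage, and the function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict
import math

def train_nb(rows, labels):
    """Fit a categorical Naive Bayes model (counts only; smoothing is applied at prediction)."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            feature_counts[(j, y)][v] += 1
            values[j].add(v)
    return class_counts, feature_counts, values

def predict_nb(model, row):
    """Return the class with the highest log-posterior, using add-one smoothing."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y, n_y in class_counts.items():
        score = math.log(n_y / total)
        for j, v in enumerate(row):
            score += math.log((feature_counts[(j, y)][v] + 1) / (n_y + len(values[j])))
        if score > best_score:
            best, best_score = y, score
    return best

def remove_misclassified(rows, labels):
    """Keep only the training instances that the fitted model classifies correctly."""
    model = train_nb(rows, labels)
    kept = [(r, y) for r, y in zip(rows, labels) if predict_nb(model, r) == y]
    return [r for r, _ in kept], [y for _, y in kept]

# Hypothetical toy data: the last row contradicts the pattern of the others.
rows = [("high", "own"), ("high", "own"), ("high", "rent"),
        ("low", "rent"), ("low", "rent"), ("low", "own"), ("high", "own")]
labels = ["good", "good", "good", "bad", "bad", "bad", "bad"]
clean_rows, clean_labels = remove_misclassified(rows, labels)
```

The filtered set would then be passed to the downstream classifier; in the paper that role is played by a Random Tree.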

Author(s):  
Varadharajan. Veeramanikandan ◽  
Mohan Jeyakarthic

Background: Financial credit scoring (CS) is currently one of the most active research topics in the financial sector, as it helps determine the creditworthiness of individuals and organizations. Data mining approaches are useful in banking, where they help identify suitable products or services for customers with minimal risk. Credit risk, meaning the risk of loss from loan defaults, is the main source of risk in the banking sector.
Aim: This paper aims to present an effective credit score prediction model that helps banks identify credible customers among loan applicants.
Methods: An optimal deep neural network (DNN) based framework using stacked autoencoders (SA) is employed for credit score data classification. The SA extracts features from the dataset, and a softmax layer performs the classification. The network is also tuned in a supervised way on the training dataset using truncated backpropagation through time (TBPTT).
Results: The proposed model is tested on the benchmark German Credit dataset, which includes the variables needed to determine the credit score of a loan applicant. The presented SADNN model offers maximum classification performance, with a reported accuracy of 96.10%, F-score of 97.25% and accuracy of 90.52%.
Conclusion: The experimental results indicate that the proposed model attains the best classification performance on all the aspects considered. The proposed method helps determine a borrower's capability to repay a loan and compute credit risk properly.


2021 ◽  
pp. 1-16
Author(s):  
Fang He ◽  
Wenyu Zhang ◽  
Zhijia Yan

Credit scoring has become increasingly important for financial institutions. With the advancement of artificial intelligence, machine learning methods, especially ensemble learning methods, have become increasingly popular for credit scoring. However, the problems of imbalanced data distribution and underutilized feature information have not been sufficiently addressed. To make the credit scoring model more adaptable to imbalanced datasets, the original model-based synthetic sampling method is extended herein to balance the datasets by generating appropriate minority samples to alleviate class overlap. To enable the credit scoring model to extract inherent correlations from features, a new bagging-based feature transformation method is proposed, which transforms features using a tree-based algorithm and selects features using the chi-square statistic. Furthermore, a two-layer ensemble method that combines the advantages of dynamic ensemble selection and stacking is proposed to improve the classification performance of the proposed multi-stage ensemble model. Finally, four standardized datasets are used to evaluate the performance of the proposed ensemble model using six evaluation metrics. The experimental results confirm that the proposed ensemble model is effective in improving classification performance and is superior to other benchmark models.
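The chi-square feature selection step mentioned above can be sketched as follows. This is only an illustration of Pearson's chi-square statistic between one categorical feature and the class label, not the paper's bagging-based transformation; the function name and sample values are hypothetical.

```python
from collections import Counter

def chi_square(feature, labels):
    """Pearson chi-square statistic for a feature/label contingency table."""
    n = len(labels)
    f_counts = Counter(feature)
    y_counts = Counter(labels)
    observed = Counter(zip(feature, labels))
    stat = 0.0
    for f, n_f in f_counts.items():
        for y, n_y in y_counts.items():
            expected = n_f * n_y / n  # count expected if feature and label were independent
            stat += (observed[(f, y)] - expected) ** 2 / expected
    return stat

# Hypothetical features: one aligned with the label, one unrelated to it.
labels = [1, 1, 0, 0]
aligned = chi_square(["a", "a", "b", "b"], labels)
unrelated = chi_square(["a", "b", "a", "b"], labels)
```

Features would then be ranked by this statistic and the highest-scoring ones retained; how many to keep is a tuning choice the abstract does not specify.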


2020 ◽  
Author(s):  
Luiz Felipe Vercosa ◽  
Rodrigo Lira ◽  
Rodrigo Monteiro ◽  
Kleber Silva ◽  
Jailson Magalhaes ◽  
...  

Standard features used for credit scoring mainly include customers' registration and financial data. However, exploring new features is of great interest to financial companies, since slight improvements in a person's score directly impact company revenue. In this work, we categorize features from open credit scoring datasets and compare them with the features found in a real company dataset. The company dataset contains unusual feature groups such as historical, geolocation, web behavior, and demographic data. We performed bivariate tests using the Kolmogorov-Smirnov metric and features to assess the performance of the particular feature groups. We also generated a good-payer score using the AdaBoost, Multilayer Perceptron, and XGBoost algorithms. Then, we analyzed the results with different metrics and compared them with the real company results. Our main finding was that these features added a small improvement to current datasets. We also identified the most promising feature groups and noticed that the tuned XGBoost performed better than the company solution in three out of four deployed metrics.
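As an illustration of the Kolmogorov-Smirnov metric used above, the sketch below computes the two-sample KS statistic, the largest gap between two empirical distribution functions. Applying it to the values of one feature for good payers versus bad payers is an assumption about how the test was set up; the function name and sample values are hypothetical.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum absolute gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    for x in sorted(set(a) | set(b)):
        # Advance each pointer past every value <= x to get the ECDF at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

# Hypothetical feature values for two groups of customers.
good_payers = [1, 2, 3, 4]
bad_payers = [3, 4, 5, 6]
separation = ks_statistic(good_payers, bad_payers)
```

A value near 1 means the feature separates the two groups almost completely, while a value near 0 means the two distributions are nearly identical.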


2021 ◽  
Vol 2082 (1) ◽  
pp. 012021
Author(s):  
Bingsen Guo

Data classification is one of the most critical issues in data mining, with a large number of real-life applications. In many practical classification problems, the real dataset contains various forms of anomalies. For example, the training set may contain outliers, often enough to confuse the classifier and reduce its ability to learn from the data. In this paper, we propose a new data classification improvement approach based on kernel clustering. The proposed method improves classification performance by optimizing the training set. We first use an existing kernel clustering method to cluster the training set and optimize it based on the similarity between the training samples in each class and the corresponding class center. Then, a standard classifier is trained on the optimized, reliable training set in the kernel space to classify each query sample. Extensive performance analysis shows that the proposed method achieves high performance, thus improving the classifier's effectiveness.
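The idea of pruning training samples by their similarity to the class center in kernel space can be sketched as below. This is an assumption-laden illustration rather than the paper's method: it skips the clustering step, assumes an RBF kernel, and uses a fixed hypothetical threshold. With an RBF kernel, the inner product between a sample and the class mean in feature space equals the average kernel value between that sample and all class members.

```python
import math

def rbf(x, y, gamma):
    """RBF kernel value exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def center_similarity(x, members, gamma):
    """Inner product of x with the class mean in kernel feature space."""
    return sum(rbf(x, m, gamma) for m in members) / len(members)

def prune_class(members, gamma=1.0, threshold=0.5):
    """Keep the samples whose similarity to the class center reaches the threshold."""
    return [x for x in members if center_similarity(x, members, gamma) >= threshold]

# Hypothetical class with three nearby points and one outlier.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
reliable = prune_class(points)
```

In practice both `gamma` and `threshold` would need tuning per dataset; the values here are chosen only to make the toy example separate cleanly.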


2018 ◽  
Vol 5 (2) ◽  
pp. 175-185
Author(s):  
Akhmad Syukron ◽  
Agus Subekti

Credit scoring has become one of the main ways for a financial institution to assess credit risk, improve cash flow, reduce the possibility of risk, and make managerial decisions. One of the problems faced in credit scoring is an imbalanced class distribution in the dataset. Class imbalance can be addressed by resampling: oversampling, undersampling, or a hybrid that combines both sampling approaches. The method proposed in this study applies Random Over-Under Sampling with Random Forest to improve classification accuracy for credit scoring on the German Credit dataset. The test results show that classification without resampling yields an average accuracy of 70% across all classifiers. Random Forest achieves better accuracy than the other methods tested, with an accuracy of 0.76 (76%). Applying Random Over-Under Sampling with Random Forest raises accuracy by 14.1 percentage points, to 0.901 (90.1%). The results show that resampling with the Random Over-Under Sampling method in the Random Forest algorithm can effectively improve accuracy on imbalanced classification for credit scoring on the German Credit dataset. Keywords: Imbalance Class, Credit Scoring, Random Forest, Classification, Resampling
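A minimal sketch of hybrid random over-under sampling follows. It is not the authors' code: the function name, the target size of 500 per class, and the seed are hypothetical, and the Random Forest stage is omitted. The 700/300 class split used in the example matches the commonly cited composition of the German Credit dataset.

```python
import random
from collections import Counter

def random_over_under_sample(rows, labels, target_per_class, seed=0):
    """Resample every class to the same size: shrink large classes, grow small ones."""
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        if len(members) >= target_per_class:
            # Undersample: draw without replacement from the larger class.
            chosen = rng.sample(members, target_per_class)
        else:
            # Oversample: keep all originals and add random duplicates.
            extra = [rng.choice(members) for _ in range(target_per_class - len(members))]
            chosen = members + extra
        out_rows.extend(chosen)
        out_labels.extend([y] * target_per_class)
    return out_rows, out_labels

# Hypothetical stand-in data with a 700/300 class split.
rows = list(range(1000))
labels = ["good"] * 700 + ["bad"] * 300
balanced_rows, balanced_labels = random_over_under_sample(rows, labels, 500)
class_sizes = Counter(balanced_labels)
```

Resampling should be applied to the training split only, so that duplicated minority rows do not leak into the evaluation data.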


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Pranith Kumar Roy ◽  
Krishnendu Shaw

Small- and medium-sized enterprises (SMEs) have a crucial influence on the economic development of every nation, but access to formal finance remains a barrier. Similarly, financial institutions encounter challenges in the assessment of SMEs’ creditworthiness for the provision of financing. Financial institutions employ credit scoring models to identify potential borrowers and to determine loan pricing and collateral requirements. SMEs are perceived as unorganized in terms of financial data management compared to large corporations, making the assessment of credit risk based on inadequate financial data a cause for financial institutions’ concern. The majority of existing models are data-driven and have faced criticism for failing to meet their assumptions. To address the issue of limited financial record keeping, this study developed and validated a system to predict SMEs’ credit risk by introducing a multicriteria credit scoring model. The model was constructed using a hybrid best–worst method (BWM) and the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS). Initially, the BWM determines the weight criteria, and TOPSIS is applied to score SMEs. A real-life case study was examined to demonstrate the effectiveness of the proposed model, and a sensitivity analysis varying the weight of the criteria was performed to assess robustness against unpredictable financial situations. The findings indicated that SMEs’ credit history, cash liquidity, and repayment period are the most crucial factors in lending, followed by return on capital, financial flexibility, and integrity. The proposed credit scoring model outperformed the existing commercial model in terms of its accuracy in predicting defaults. This model could assist financial institutions, providing a simple means for identifying potential SMEs to grant credit, and advance further research using alternative approaches.
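The TOPSIS scoring step can be sketched as follows, assuming the criterion weights have already been obtained (in the paper, from the BWM). The weights, criteria, and decision matrix below are hypothetical, and the sketch assumes no criterion column is all zeros and that the alternatives are not all identical.

```python
import math

def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] per alternative (higher is better)."""
    n_crit = len(weights)
    # Vector-normalize each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    cols = list(zip(*weighted))
    # Ideal point: best value per criterion; anti-ideal point: worst value per criterion.
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, ideal)
        d_worst = math.dist(row, anti)
        scores.append(d_worst / (d_best + d_worst))
    return scores

# Hypothetical SMEs scored on credit history, liquidity (benefit criteria)
# and repayment period (treated here as a cost criterion).
matrix = [[9, 8, 2], [5, 5, 5], [1, 2, 9]]
weights = [0.5, 0.3, 0.2]
benefit = [True, True, False]
scores = topsis(matrix, weights, benefit)
```

Ranking the SMEs by score in descending order gives the lending priority; rerunning with perturbed weights is one simple way to reproduce the kind of sensitivity analysis the abstract describes.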

