Credit scoring with an ensemble deep learning classification methods – comparison with tradicional methods

Credit scoring attracts special attention from financial institutions. In recent years, deep learning methods have drawn particular interest. In this paper, we compare the performance of ensemble deep learning methods based on decision trees with the best traditional method, logistic regression, and with the benchmark machine learning method, support vector machines (SVM). For each method, we test several different algorithms and evaluate them with several performance indicators. The research focuses on two standard datasets for this type of classification, the Australian and German credit datasets. According to the Matthews correlation coefficient (MCC), the best method is the ensemble method with boosted decision trees. On average, ensemble methods also prove more successful than SVM.
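The MCC indicator used to rank the methods can be computed directly from the four confusion-matrix counts. The following is a minimal sketch, not the authors' code; the function name `mcc` and the example counts are hypothetical, and returning 0.0 for an empty row or column is a common convention assumed here.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: undefined cases (an empty row/column) map to 0.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for a binary good/bad credit classifier.
    print(mcc(tp=90, tn=70, fp=30, fn=10))
```

Unlike plain accuracy, the MCC uses all four cells of the confusion matrix, which is why it is often preferred when the good and bad credit classes are imbalanced.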

Boosted Decision Trees for Credit Scoring

10.4018/978-1-7998-8609-9.ch013 ◽

2022 ◽

pp. 270-292

Author(s):

Luca Di Persio ◽

Alberto Borelli

Keyword(s):

Support Vector Machines ◽

Decision Trees ◽

Prediction Accuracy ◽

Credit Scoring ◽

Support Vector ◽

Scoring Model ◽

Vector Machines ◽

Boosted Decision Trees ◽

The One ◽

Credit Scoring Model

The chapter develops a tree-based method for credit scoring, which helps lenders decide whether to grant or reject credit to applicants. In particular, it proposes a credit scoring model based on boosted decision trees, a technique that combines an ensemble of several decision trees into a single classifier. The analysis uses three publicly available datasets and compares the prediction accuracy of boosted decision trees with that of support vector machines.
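The idea of combining several decision trees into a single classifier can be illustrated with a small boosting sketch. The abstract does not say which boosting variant the chapter uses; the code below assumes AdaBoost with one-split trees (decision stumps) and labels in {-1, +1}, and all function names and the toy data are hypothetical.

```python
import math


def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[j] <= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] <= thr else -sign


def adaboost_fit(X, y, n_rounds=10):
    """Fit a list of (alpha, stump) pairs; y must contain -1/+1 labels."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        # Clip the error so the logarithm below stays finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Toy one-feature data that no single stump can separate.
    X = [[1], [2], [3], [4], [5], [6]]
    y = [1, 1, -1, -1, 1, 1]
    model = adaboost_fit(X, y, n_rounds=3)
    print([adaboost_predict(model, row) for row in X])
```

Each round reweights the training samples so that the next stump concentrates on the cases the current ensemble gets wrong; the final classifier is a weighted vote of all stumps.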

A New Hybrid Support Vector Machine Ensemble Classification Model for Credit Scoring

Journal of Information Technology Research ◽

10.4018/jitr.2019010106 ◽

2019 ◽

Vol 12 (1) ◽

pp. 77-88

Author(s):

Jian-Rong Yao ◽

Jia-Rui Chen

Keyword(s):

Credit Scoring ◽

Ensemble Methods ◽

Ensemble Classification ◽

Classification Model ◽

Support Vector ◽

Ensemble Model ◽

Financial Industry ◽

K Nearest Neighbors ◽

Regression Methods ◽

Vector Machines

Credit scoring plays an important role in the financial industry. Many approaches are used in the field: traditional statistical methods such as logistic regression, discriminant analysis, and linear regression, and machine learning methods such as neural networks, k-nearest neighbors, genetic algorithms, support vector machines (SVM), and decision trees. SVM has demonstrated good classification performance. This paper proposes a new hybrid RF-SVM ensemble model, which uses random forest to select important variables and employs ensemble methods (bagging and boosting) to aggregate single base models (SVM) into a robust classifier. The experimental results suggest that the new model achieves an effective improvement and has promising potential in the field of credit scoring.

Similarity-Based Summarization of Music Files for Support Vector Machines

Complexity ◽

10.1155/2018/1935938 ◽

2018 ◽

Vol 2018 ◽

pp. 1-10 ◽

Cited By ~ 2

Author(s):

Jan Jakubik ◽

Halina Kwaśnicka

Keyword(s):

Neural Network ◽

Deep Learning ◽

Support Vector Machines ◽

Loss Function ◽

Expert Knowledge ◽

Feature Learning ◽

Compact Representation ◽

Support Vector ◽

Learning Methods ◽

Vector Machines

Automatic retrieval of music information is an active area of research, covering problems such as automatically assigning genres or descriptors of emotional content to music. Recent advances in the area rely on deep learning, which allows researchers to operate on a low-level description of the music. Deep neural network architectures can learn to build feature representations that summarize music files from the data itself, rather than from expert knowledge. In this paper, a novel approach to applying feature learning in combination with support vector machines (SVM) to musical data is presented. A spectrogram of the music file, which is too complex to be processed by an SVM, is first reduced to a compact representation by a recurrent neural network. An adjustment to the loss function of the network is proposed so that the network learns to build a representation space that replicates a certain notion of similarity between annotations, rather than to make predictions explicitly. We evaluate the approach on five datasets, focusing on emotion recognition and complementing it with genre classification. In experiments, the proposed loss function adjustment is shown to improve results in classification and regression tasks, but only when the learned similarity notion corresponds to a kernel function employed within the SVM. These results suggest that adjusting deep learning methods to build data representations that target a specific classifier or regressor can open up new perspectives for the use of standard machine learning methods in the music domain.

A Comparison of Scaling Methods to Obtain Calibrated Probabilities of Activity for Ligand-Target Predictions

10.26434/chemrxiv.11526132 ◽

2020 ◽

Author(s):

Lewis Mervin ◽

Avid M. Afzal ◽

Ola Engkvist ◽

Andreas Bender

Keyword(s):

Target Prediction ◽

Support Vector ◽

Machine Learning Method ◽

Learning Method ◽

Protein Target ◽

Bioactivity Prediction ◽

Vector Machines ◽

Scaling Methods ◽

Data Points ◽

Compound Target

In the context of bioactivity prediction, the question of how to calibrate a score produced by a machine learning method into a reliable probability of binding to a protein target has not yet been satisfactorily addressed. In this study, we compared the performance of three such methods, namely Platt Scaling, Isotonic Regression and Venn-ABERS, in calibrating prediction scores for ligand-target prediction models built with the Naïve Bayes, Support Vector Machines and Random Forest algorithms, using bioactivity data available at AstraZeneca (40 million compound-target data points across 2112 targets). Performance was assessed using Stratified Shuffle Split (SSS) and Leave 20% of Scaffolds Out (L20SO) validation.
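Of the three scaling methods named above, isotonic regression is the simplest to sketch: it fits a non-decreasing step function from raw scores to observed outcomes. The code below is an illustrative pool-adjacent-violators implementation, not the study's pipeline; the function names and toy scores are hypothetical, and it assumes distinct score values and 0/1 activity labels.

```python
import bisect


def isotonic_fit(scores, labels):
    """Pool-adjacent-violators fit; returns (sorted_scores, fitted_values)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    xs = [scores[i] for i in order]
    blocks = []  # each block is [sum_of_labels, count]
    for i in order:
        blocks.append([float(labels[i]), 1])
        # Merge backwards while the previous block mean exceeds the current one.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return xs, fitted


def isotonic_predict(xs, fitted, score):
    """Calibrated probability for a new score (left-continuous step function)."""
    k = bisect.bisect_right(xs, score) - 1
    return fitted[max(k, 0)]


if __name__ == "__main__":
    # Hypothetical classifier scores and binary activity labels.
    xs, fitted = isotonic_fit([0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
                              [0, 1, 0, 1, 1, 1])
    print(fitted)
    print(isotonic_predict(xs, fitted, 0.25))
```

Platt Scaling would instead fit a logistic curve to the scores, and Venn-ABERS builds on two isotonic fits to return a probability interval; neither is shown here.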

Combining Market and Accounting-Based Models for Credit Scoring Using a Classification Scheme Based on Support Vector Machines

SSRN Electronic Journal ◽

10.2139/ssrn.2156220 ◽

2012 ◽

Cited By ~ 1

Author(s):

Dimitrios Niklis ◽

Michael Doumpos ◽

C. Zopounidis

Keyword(s):

Support Vector Machines ◽

Classification Scheme ◽

Credit Scoring ◽

Support Vector ◽

Vector Machines

Identifying Cancer Targets Based on Machine Learning Methods via Chou’s 5-steps Rule and General Pseudo Components

Current Topics in Medicinal Chemistry ◽

10.2174/1568026619666191016155543 ◽

2019 ◽

Vol 19 (25) ◽

pp. 2301-2317 ◽

Cited By ~ 2

Author(s):

Ruirui Liang ◽

Jiayang Xie ◽

Chi Zhang ◽

Mengying Zhang ◽

Hai Huang ◽

...

Keyword(s):

Machine Learning ◽

Growth Rate ◽

Big Data ◽

Human Genome Project ◽

Genome Project ◽

Support Vector ◽

Successful Implementation ◽

Learning Methods ◽

Machine Learning Methods ◽

Vector Machines

In recent years, the successful implementation of the Human Genome Project has made people realize that genetic, environmental and lifestyle factors should be studied together because of the complexity and varied forms of cancer. The increasing availability and growth rate of ‘big data’ derived from various omics open a new window for the study and therapy of cancer. In this paper, we introduce the application of machine learning methods to cancer big data, including artificial neural networks, support vector machines, ensemble learning and naïve Bayes classifiers.

Why Deep Learning Is More Efficient than Support Vector Machines, and How it is Related to Sparsity Techniques in Signal Processing

Proceedings of the 2020 4th International Conference on Intelligent Systems, Metaheuristics & Swarm Intelligence ◽

10.1145/3396474.3396478 ◽

2020 ◽

Author(s):

Laxman Bokati ◽

Olga Kosheleva ◽

Vladik Kreinovich ◽

Anibal Sosa

Keyword(s):

Signal Processing ◽

Deep Learning ◽

Support Vector Machines ◽

Support Vector ◽

Vector Machines

A Hybrid Prognostics Deep Learning Model for Remaining Useful Life Prediction

Electronics ◽

10.3390/electronics10010039 ◽

2020 ◽

Vol 10 (1) ◽

pp. 39

Author(s):

Zhiyuan Xie ◽

Shichang Du ◽

Jun Lv ◽

Yafei Deng ◽

Shiyao Jia

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Learning Model ◽

Recurrent Network ◽

Remaining Useful Life ◽

Support Vector ◽

Second Phase ◽

Learning Methods ◽

Useful Life ◽

Deep Learning Model

Remaining Useful Life (RUL) prediction is significant in indicating the health status of sophisticated equipment, and because of its complexity it requires historical data. The number and complexity of environmental parameters such as vibration and temperature can cause non-linear states in the data, making prediction tremendously difficult. Conventional machine learning models such as support vector machine (SVM), random forest, and back propagation neural network (BPNN), however, have limited capacity to predict accurately. In this paper, a two-phase deep learning model, the attention-convolutional forget-gate recurrent network (AM-ConvFGRNET), is proposed for RUL prediction. The first phase, a forget-gate convolutional recurrent network (ConvFGRNET), is based on a one-dimensional analog of long short-term memory (LSTM) that removes all gates except the forget gate and uses chrono-initialized biases. The second phase is the attention mechanism, which enables the model to extract more specific features for generating an output, compensating for the black-box nature of FGRNET and improving interpretability. The performance and effectiveness of AM-ConvFGRNET for RUL prediction are validated by comparing it with other machine learning and deep learning methods on the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) dataset and on a dataset from a ball screw experiment.

Comparative analysis of Kernel-based versus BFGS-ANN and deep learning methods in monthly reference evaporation estimation

10.5194/hess-2020-224 ◽

2020 ◽

Author(s):

Mohammad Taghi Sattari ◽

Halit Apaydin ◽

Shahab Shamshirband ◽

Amir Mosavi

Keyword(s):

Deep Learning ◽

Minimum Temperature ◽

Meteorological Parameters ◽

Short Term Memory ◽

Sunshine Duration ◽

Gaussian Process Regression ◽

Support Vector ◽

Ann Model ◽

Learning Methods ◽

Average Maximum

Abstract. Proper estimation of the reference evapotranspiration (ET0) amount is indispensable for agricultural water management and the efficient use of water. The aim of this study is to estimate ET0 with different machine learning and deep learning methods, using a minimum of meteorological parameters, in the Corum region of Turkey, an important agricultural center with an arid and semi-arid climate. Monthly values of average, maximum and minimum temperature, sunshine duration, wind speed, and average, maximum and minimum relative humidity are used as input data. Two kernel-based methods (Gaussian Process Regression (GPR) and Support Vector Regression (SVR)), BFGS-ANN and long short-term memory (LSTM) models were used to estimate ET0 in 10 different input combinations. All four methods predicted ET0 with acceptable accuracy and error levels. The BFGS-ANN model was more successful than the others. For the kernel-based GPR and SVR methods, the Pearson VII function-based universal kernel was the most successful kernel function. Among the scenarios tested, the temperature-related scenario comprising average, maximum and minimum temperature and sunshine duration gave the best results. The second-best scenario was the one that uses only sunshine duration. In this case, the ANN model optimized with the BFGS method (BFGS-ANN), using only sunshine duration, can estimate ET0 with a correlation coefficient of 0.971 without the need for other meteorological parameters.
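For readers unfamiliar with the Pearson VII function-based universal kernel (PUK) mentioned above, the following sketch shows its commonly cited form, K(x, y) = 1 / [1 + (2 ||x - y|| sqrt(2^(1/omega) - 1) / sigma)^2]^omega. This is an illustration only: the abstract does not give the formula or the parameter values used in the study, so the function name, the default `sigma` and `omega`, and the sample inputs are all assumptions.

```python
import math


def puk_kernel(x, y, sigma=1.0, omega=1.0):
    """Pearson VII universal kernel between two equal-length vectors.

    sigma controls the width of the peak and omega its tailing behaviour;
    the defaults are placeholders, not values taken from the study.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    scaled = (2.0 * math.sqrt(sq_dist)
              * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma)
    return 1.0 / (1.0 + scaled ** 2) ** omega


if __name__ == "__main__":
    # Hypothetical standardized inputs, e.g. (sunshine duration, mean temperature).
    print(puk_kernel([0.2, 0.5], [0.4, 0.1], sigma=1.0, omega=1.0))
```

The kernel equals 1 for identical inputs and decays towards 0 as the distance between them grows, with `omega` changing the shape of that decay.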

Developing a Process for the Analysis of User Journeys and the Prediction of Dropout in Digital Health Interventions: Machine Learning Approach (Preprint)

10.2196/preprints.17738 ◽

2020 ◽

Author(s):

Vincent Bremer ◽

Philip I Chow ◽

Burkhardt Funk ◽

Frances P Thorndike ◽

Lee M Ritterband

Keyword(s):

Machine Learning ◽

Decision Trees ◽

Behavioral Therapy ◽

Digital Health ◽

Area Under The Curve ◽

Prediction Performance ◽

Health Interventions ◽

Drop Out ◽

Support Vector ◽

Boosted Decision Trees

BACKGROUND User dropout is a widespread concern in the delivery and evaluation of digital (ie, web and mobile apps) health interventions. Researchers have yet to fully realize the potential of the large amount of data generated by these technology-based programs. Of particular interest is the ability to predict who will drop out of an intervention. This may be possible through the analysis of user journey data, self-reported as well as system-generated, produced by the path (or journey) an individual takes to navigate through a digital health intervention. OBJECTIVE The purpose of this study is to provide a step-by-step process for the analysis of user journey data and eventually to predict dropout in the context of digital health interventions. The process is applied to data from an internet-based intervention for insomnia as a way to illustrate its use. The completion of the program is contingent upon completing 7 sequential cores, which include an initial tutorial core. Dropout is defined as not completing the seventh core. METHODS Steps of user journey analysis, including data transformation, feature engineering, and statistical model analysis and evaluation, are presented. Dropouts were predicted based on data from 151 participants from a fully automated web-based program (Sleep Healthy Using the Internet) that delivers cognitive behavioral therapy for insomnia. Logistic regression with L1 and L2 regularization, support vector machines, and boosted decision trees were used and evaluated based on their predictive performance. Relevant features from the data are reported that predict user dropout. RESULTS Accuracy of predicting dropout (area under the curve [AUC] values) varied depending on the program core and the machine learning technique. After model evaluation, boosted decision trees achieved AUC values ranging between 0.6 and 0.9. Additional handcrafted features, including time to complete certain steps of the intervention, time to get out of bed, and days since the last interaction with the system, contributed to the prediction performance. CONCLUSIONS The results support the feasibility and potential of analyzing user journey data to predict dropout. Theory-driven handcrafted features increased the prediction performance. The ability to predict dropout at an individual level could be used to enhance decision making for researchers and clinicians as well as inform dynamic intervention regimens.
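The AUC values reported above can be read as the probability that a randomly chosen dropout receives a higher predicted score than a randomly chosen completer. A minimal sketch of that pairwise definition follows; it is not the study's evaluation code, and the function name and example data are hypothetical. Labels are assumed to be 1 for dropout and 0 for completion, and ties count as half a win.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical dropout labels and predicted dropout scores.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The quadratic loop is adequate for a sample of the size described here (151 participants); a rank-based formulation would be preferable for large datasets.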
