A New Hybrid Support Vector Machine Ensemble Classification Model for Credit Scoring

Jian-Rong Yao; Jia-Rui Chen

doi:10.4018/jitr.2019010106

Journal of Information Technology Research ◽

2019 ◽

Vol 12 (1) ◽

pp. 77-88

Author(s):

Jian-Rong Yao ◽

Jia-Rui Chen

Keyword(s):

Credit Scoring ◽

Ensemble Methods ◽

Ensemble Classification ◽

Classification Model ◽

Support Vector ◽

Ensemble Model ◽

Financial Industry ◽

K Nearest Neighbors ◽

Regression Methods ◽

Vector Machines

Credit scoring plays an important role in the financial industry. Many methods are used in credit scoring, including traditional statistical techniques such as logistic regression, discriminant analysis, and linear regression, and machine learning methods such as neural networks, k-nearest neighbors, genetic algorithms, support vector machines (SVM), and decision trees. SVM has been shown to perform well in classification. This paper proposes a new hybrid RF-SVM ensemble model, which uses random forest (RF) to select important variables and employs ensemble methods (bagging and boosting) to aggregate single SVM base models into a robust classifier. The experimental results suggest that the new model improves classification performance and has promising potential in the field of credit scoring.
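The RF-SVM pipeline described above can be sketched in Python with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (generated with `make_classification`) rather than a real credit dataset; the number of retained features (`top_k=8`), the RBF kernel, and the ensemble sizes are arbitrary choices; and AdaBoost is used as one possible boosting scheme. The helper names `select_features_rf`, `build_svm_ensembles`, and `run_demo` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def select_features_rf(X, y, top_k, seed=0):
    """Return the indices of the top_k features ranked by random-forest importance."""
    rf = RandomForestClassifier(n_estimators=200, random_state=seed)
    rf.fit(X, y)
    ranked = np.argsort(rf.feature_importances_)[::-1]  # most important first
    return np.sort(ranked[:top_k])


def build_svm_ensembles(seed=0):
    """Bagged and boosted RBF-SVM ensembles (assumed settings); inputs are standardized first."""
    bagged = make_pipeline(
        StandardScaler(),
        BaggingClassifier(SVC(kernel="rbf", gamma="scale"), n_estimators=10, random_state=seed),
    )
    # probability=True lets older scikit-learn versions (SAMME.R boosting) call predict_proba.
    boosted = make_pipeline(
        StandardScaler(),
        AdaBoostClassifier(
            SVC(kernel="rbf", gamma="scale", probability=True, random_state=seed),
            n_estimators=10,
            random_state=seed,
        ),
    )
    return {"bagged_svm": bagged, "boosted_svm": boosted}


def run_demo(seed=0, top_k=8):
    # Synthetic stand-in for an applicant-by-attribute credit table (hypothetical data).
    X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                               n_redundant=2, random_state=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y,
                                              random_state=seed)
    cols = select_features_rf(X_tr, y_tr, top_k, seed)  # selection uses training rows only
    scores = {}
    for name, model in build_svm_ensembles(seed).items():
        model.fit(X_tr[:, cols], y_tr)
        scores[name] = model.score(X_te[:, cols], y_te)
    return cols, scores


if __name__ == "__main__":
    cols, scores = run_demo()
    print("selected features:", cols)
    for name, acc in scores.items():
        print(f"{name}: test accuracy {acc:.3f}")
```

Feature selection is fitted on the training split only, so the held-out accuracy is not inflated by information from the test rows.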


Credit scoring with an ensemble deep learning classification methods – comparison with traditional methods

Facta Universitatis Series Economics and Organization ◽

10.22190/fueo201028001r ◽

2021 ◽

Author(s):

Ognjen Radović ◽

Srđan Marinković ◽

Jelena Radojičić

Keyword(s):

Deep Learning ◽

Decision Trees ◽

Performance Indicators ◽

Credit Scoring ◽

Ensemble Methods ◽

Support Vector ◽

Machine Learning Method ◽

Learning Methods ◽

Vector Machines ◽

Boosted Decision Trees

Credit scoring attracts particular attention from financial institutions. In recent years, deep learning methods have attracted special interest. In this paper, we compare the performance of ensemble deep learning methods based on decision trees with the best traditional method, logistic regression, and with the machine learning benchmark, support vector machines. For each method, several different algorithms are tested, and several performance indicators are used. The research uses standard datasets for this type of classification: the Australian and German credit datasets. According to the Matthews correlation coefficient (MCC), the best method is the ensemble method with boosted decision trees. On average, ensemble methods also prove more successful than SVM.
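MCC, the indicator used above to rank the methods, is computed from the binary confusion matrix as MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). It ranges from -1 to +1 and, unlike accuracy, stays at 0 for a classifier that always predicts the majority class. The plain-Python sketch below is illustrative; the function names are hypothetical, and returning 0 when a confusion-matrix marginal is empty is a convention (scikit-learn's `matthews_corrcoef` does the same).

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for a binary problem."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def mcc(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Matthews correlation coefficient computed from the confusion counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is 0 when any row or column of the confusion matrix is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    y_true = [1] * 30 + [0] * 70   # 1 = bad credit (minority class), hypothetical sample
    always_good = [0] * 100        # trivial classifier that never flags bad credit
    acc = sum(t == p for t, p in zip(y_true, always_good)) / len(y_true)
    print(f"accuracy={acc:.2f}, MCC={mcc(y_true, always_good):.2f}")  # accuracy=0.70, MCC=0.00
```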


Tree-based homogeneous ensemble model with feature selection for diabetic retinopathy prediction

Jurnal Teknologi dan Sistem Komputer ◽

10.14710/jtsiskom.2020.13669 ◽

2020 ◽

Vol 8 (4) ◽

pp. 297-303

Author(s):

Tamunopriye Ene Dagogo-George ◽

Hammed Adeleye Mojeed ◽

Abdulateef Oluwagbemiga Balogun ◽

Modinat Abolore Mabayoje ◽

Shakirat Aderonke Salihu

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Diabetic Retinopathy ◽

Ensemble Methods ◽

Predictive Performance ◽

Ensemble Classification ◽

Support Vector ◽

Learning Approaches ◽

Vector Machines ◽

Homogeneous Ensemble

Diabetic Retinopathy (DR) is a condition that arises from prolonged diabetes and causes severe damage to the eyes. Early diagnosis is imperative, as late diagnosis may be fatal. Existing studies have applied machine learning approaches, with Support Vector Machines (SVM) achieving the highest performance in most analyses and Decision Trees (DT) the lowest. However, SVM is known to suffer from parameter and kernel selection problems, which undermine its predictive capability. Hence, this study presents homogeneous ensemble classification methods with DT as the base classifier to optimize predictive performance. Boosting and Bagging ensemble methods with feature selection were employed, and experiments were carried out using the Python scikit-learn library on DR datasets from the UCI Machine Learning Repository. Experimental results showed that Bagged and Boosted DT outperformed SVM. Specifically, Bagged DT achieved the best f-score and AUC (accuracy 65.38%, f-score 0.664, AUC 0.731), followed by Boosted DT (accuracy 65.42%, f-score 0.655, AUC 0.724), compared with SVM (accuracy 65.16%, f-score 0.652, AUC 0.721). These results indicate that homogeneous ensemble methods can optimize DT's predictive performance to the point of outperforming SVM in predicting DR.
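The homogeneous-ensemble setup can be sketched with scikit-learn pipelines. This is an illustration under assumptions, not the study's code: the abstract does not say which feature selection method was used, so ANOVA F-test selection (`SelectKBest` with `f_classif`, `k=10`) is swapped in, and the boosting tree depth, ensemble sizes, and synthetic data are likewise arbitrary. The helper names `build_models` and `evaluate` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def _selector(k):
    """Assumed feature selection step: keep the k features with the highest ANOVA F-score."""
    return SelectKBest(f_classif, k=k)


def build_models(k=10, seed=0):
    return {
        # Bagging: deep, high-variance trees trained on bootstrap samples.
        "bagged_dt": make_pipeline(
            _selector(k),
            BaggingClassifier(DecisionTreeClassifier(random_state=seed),
                              n_estimators=50, random_state=seed)),
        # Boosting: shallow trees reweighted sequentially by AdaBoost.
        "boosted_dt": make_pipeline(
            _selector(k),
            AdaBoostClassifier(DecisionTreeClassifier(max_depth=3, random_state=seed),
                               n_estimators=50, random_state=seed)),
        # SVM baseline for comparison.
        "svm": make_pipeline(StandardScaler(), _selector(k), SVC(kernel="rbf", gamma="scale")),
    }


def evaluate(seed=0):
    # Synthetic stand-in for a DR feature table (hypothetical data).
    X, y = make_classification(n_samples=800, n_features=20, n_informative=6,
                               n_redundant=3, flip_y=0.1, random_state=seed)
    metrics = ("accuracy", "f1", "roc_auc")
    results = {}
    for name, model in build_models(seed=seed).items():
        cv = cross_validate(model, X, y, cv=5, scoring=metrics)
        results[name] = {m: cv[f"test_{m}"].mean() for m in metrics}
    return results


if __name__ == "__main__":
    for name, scores in evaluate().items():
        print(name, {m: round(v, 3) for m, v in scores.items()})
```

Boosting uses shallow trees (`max_depth=3`) because a fully grown tree can fit the training data perfectly, leaving AdaBoost nothing to reweight; bagging, in contrast, benefits from deep, high-variance trees.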


Comparison of SVM, RF and SGD Methods for Determination of Programmer's Performance Classification Model in Social Media Activities

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i2.1770 ◽

2020 ◽

Vol 4 (2) ◽

pp. 329-335

Author(s):

Rusydi Umar ◽

Imam Riadi ◽

Purwono

Keyword(s):

Social Media ◽

Gradient Descent ◽

Classification Model ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Svm Algorithm ◽

Vector Machines ◽

Performance Patterns ◽

A Company

Most startups in Indonesia fail because of team performance that is neither solid nor competent. Programmers are an integral part of a startup team. Social media can be used as a strategic tool for recruiting the best programmer candidates in a company. This tool takes the form of an automatic system that classifies the social media posts of prospective programmers. The classification results are expected to predict each candidate's performance pattern, labeled as either good or bad performance. An effective tool requires the classification method with the best accuracy, so several methods must be compared. This study compares three classification methods: the Support Vector Machines (SVM) algorithm, Random Forest (RF), and Stochastic Gradient Descent (SGD). Using 10-fold cross-validation, SVM reaches an accuracy of 81.3%, RF 74.4%, and SGD 80.1%, so SVM is chosen as the model for classifying programmer performance from social media activity.
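The 10-fold comparison can be sketched as follows. The real study classifies social media posts; here a synthetic numeric matrix stands in for features extracted from those posts, and the hyperparameters are defaults or arbitrary choices. The function name `compare_models` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(X, y, n_splits=10, seed=0):
    """Score SVM, RF and SGD with stratified k-fold CV and return (scores, best model name)."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    models = {
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale")),
        "RF": RandomForestClassifier(n_estimators=200, random_state=seed),
        # Hinge loss makes SGDClassifier a linear SVM trained by stochastic gradient descent.
        "SGD": make_pipeline(StandardScaler(), SGDClassifier(loss="hinge", random_state=seed)),
    }
    scores = {name: cross_val_score(m, X, y, cv=cv, scoring="accuracy")
              for name, m in models.items()}
    best = max(scores, key=lambda name: scores[name].mean())  # highest mean fold accuracy
    return scores, best


if __name__ == "__main__":
    # Synthetic stand-in for features extracted from candidates' posts (1 = good, 0 = bad).
    X, y = make_classification(n_samples=500, n_features=30, n_informative=8, random_state=0)
    scores, best = compare_models(X, y)
    for name, s in scores.items():
        print(f"{name}: mean accuracy {s.mean():.3f} (+/- {s.std():.3f})")
    print("selected model:", best)
```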


Combining Market and Accounting-Based Models for Credit Scoring Using a Classification Scheme Based on Support Vector Machines

SSRN Electronic Journal ◽

10.2139/ssrn.2156220 ◽

2012 ◽

Cited By ~ 1

Author(s):

Dimitrios Niklis ◽

Michael Doumpos ◽

C. Zopounidis

Keyword(s):

Support Vector Machines ◽

Classification Scheme ◽

Credit Scoring ◽

Support Vector ◽

Vector Machines


Persian Handwritten Number Recognition Using Adapted Framing Feature and Support Vector Machines

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026816500048 ◽

2016 ◽

Vol 15 (01) ◽

pp. 1650004 ◽

Cited By ~ 3

Author(s):

Hedieh Sajedi ◽

Mehran Bahador

Keyword(s):

Support Vector Machines ◽

Recognition Rate ◽

Nearest Neighbors ◽

Polynomial Kernel ◽

Support Vector ◽

K Nearest Neighbors ◽

New Approach ◽

Number Recognition ◽

Vector Machines

In this paper, a new approach for the segmentation and recognition of Persian handwritten numbers is presented. The method combines the framing feature technique with an outer profile feature; we call this combination the adapted framing feature. In the proposed approach, numbers are segmented into digits automatically. In the classification stage, Support Vector Machines (SVM) and k-Nearest Neighbors (k-NN) are used. Experiments are conducted on the IFHCDB database, consisting of 17,740 numeral images, and the HODA database, consisting of 102,352 numeral images. At the isolated-digit level, SVM with a polynomial kernel achieves a recognition rate of 99.27% on IFHCDB and 99.07% on HODA. The experiments show that the proposed method achieves higher accuracy than previous research.
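The classification stage (SVM with a polynomial kernel versus k-NN) can be sketched with scikit-learn. The adapted framing feature is not reproduced here: as a stand-in, the raw 8x8 pixels of scikit-learn's bundled `load_digits` dataset (Western digits, not Persian ones) are used, and the kernel degree, `coef0`, and `n_neighbors` values are assumptions. The function name `compare_digit_classifiers` is hypothetical.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_digit_classifiers(seed=0):
    """Return held-out accuracy of a polynomial-kernel SVM and a k-NN classifier."""
    X, y = load_digits(return_X_y=True)  # 8x8 images flattened to 64 pixel features
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y,
                                              random_state=seed)
    models = {
        # Assumed settings: cubic polynomial kernel (gamma * <x, x'> + coef0) ** 3.
        "svm_poly": make_pipeline(StandardScaler(),
                                  SVC(kernel="poly", degree=3, gamma="scale", coef0=1.0)),
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3)),
    }
    # Pipeline.fit returns the fitted pipeline, so score can be chained directly.
    return {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}


if __name__ == "__main__":
    for name, acc in compare_digit_classifiers().items():
        print(f"{name}: {acc:.4f}")
```

In the paper's setting, the 64 pixel columns would be replaced by the adapted framing feature vector of each segmented digit; the classifier code itself would not need to change.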


Text Mining Drug-Protein Interactions using an Ensemble of BERT, Sentence BERT and T5 models

10.1101/2021.10.26.465944 ◽

2021 ◽

Author(s):

Xin Sui ◽

Wanjing Wang ◽

Jinfeng Zhang

Keyword(s):

Protein Interactions ◽

Clustering Algorithm ◽

Data Augmentation ◽

Majority Vote ◽

Classification Model ◽

Ensemble Model ◽

K Nearest Neighbors ◽

Test Dataset ◽

Improved Performance ◽

Using Data

In this work, we trained an ensemble model that predicts drug-protein interactions within a sentence based only on the sentence's semantics. The ensemble was built from three separate models: 1) a classification model using a fine-tuned BERT model; 2) a fine-tuned Sentence BERT model that embeds every sentence into a vector; and 3) another classification model using a fine-tuned T5 model. In all models, we further improved performance using data augmentation. For model 2, we predicted the label of a sentence by applying k-nearest neighbors to its embedding vector. We explored two ways to ensemble the three models: a) majority voting; and b) training another ensemble model, based on the HDBSCAN clustering algorithm, that uses features from all the models to make decisions. Our best model achieved an F1 score of 0.753 on the BioCreative VII Track 1 test dataset.
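Two of the combination steps named above, k-NN labeling from sentence embeddings and majority voting across models, can be sketched with NumPy. This is a hypothetical illustration: cosine similarity as the distance measure, `k=3`, and falling back to the first model's label on a tie are assumptions, and the function names and relation labels (`REL_A`, `REL_B`, `NO_REL`) are invented for the example. The HDBSCAN-based second ensemble is not shown.

```python
from collections import Counter

import numpy as np


def _unit_rows(m):
    """Scale each row to unit L2 norm so that dot products become cosine similarities."""
    m = np.asarray(m, dtype=float)
    return m / np.linalg.norm(m, axis=1, keepdims=True)


def knn_predict(train_emb, train_labels, query_emb, k=3):
    """Label each query embedding by majority vote of its k most cosine-similar training rows."""
    sims = _unit_rows(query_emb) @ _unit_rows(train_emb).T  # shape (n_query, n_train)
    nearest = np.argsort(-sims, axis=1)[:, :k]               # most similar first
    labels = np.asarray(train_labels)
    # On a tie, Counter keeps first-seen order, so the tied label seen nearest wins.
    return [Counter(labels[row].tolist()).most_common(1)[0][0] for row in nearest]


def majority_vote(*model_predictions):
    """Combine per-sentence labels from several models; a tie falls back to the first model."""
    combined = []
    for votes in zip(*model_predictions):
        counts = Counter(votes).most_common()
        top_count = counts[0][1]
        winners = [label for label, c in counts if c == top_count]
        combined.append(winners[0] if len(winners) == 1 else votes[0])
    return combined


if __name__ == "__main__":
    bert = ["REL_A", "NO_REL", "REL_B"]
    knn = ["NO_REL", "REL_B", "REL_A"]
    t5 = ["REL_A", "REL_B", "NO_REL"]
    print(majority_vote(bert, knn, t5))  # ['REL_A', 'REL_B', 'REL_B']
```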


Informative Patterns for Credit Scoring: Support Vector Machines Preselect Data Subsets for Linear Discriminant Analysis

Studies in Classification, Data Analysis, and Knowledge Organization - Classification — the Ubiquitous Challenge ◽

10.1007/3-540-28084-7_52 ◽

2005 ◽

pp. 450-457 ◽

Cited By ~ 3

Author(s):

Ralf Stecking ◽

Klaus B. Schebesch

Keyword(s):

Support Vector Machines ◽

Discriminant Analysis ◽

Linear Discriminant Analysis ◽

Credit Scoring ◽

Support Vector ◽

Linear Discriminant ◽

Vector Machines


Rule Extraction from Neural Networks and Support Vector Machines for Credit Scoring

Intelligent Systems Reference Library - Data Mining: Foundations and Intelligent Paradigms ◽

10.1007/978-3-642-23151-3_13 ◽

2012 ◽

pp. 299-320 ◽

Cited By ~ 2

Author(s):

Rudy Setiono ◽

Bart Baesens ◽

David Martens

Keyword(s):

Neural Networks ◽

Support Vector Machines ◽

Credit Scoring ◽

Rule Extraction ◽

Support Vector ◽

Vector Machines


Recognition of Gait Activities Using Acceleration Data from A Smartphone and A Wearable Device

Proceedings ◽

10.3390/proceedings2019031060 ◽

2019 ◽

Vol 31 (1) ◽

pp. 60 ◽

Cited By ~ 1

Author(s):

Irvin Hussein Lopez-Nava ◽

Matias Garcia-Constantino ◽

Jesus Favela

Keyword(s):

Assisted Living ◽

Inertial Sensor ◽

Ambient Assisted Living ◽

Human Gait ◽

Support Vector ◽

K Nearest Neighbors ◽

Acceleration Data ◽

Vector Machines ◽

Young Subjects ◽

Physical Spaces

Activity recognition is an important task in many fields, such as ambient intelligence, pervasive healthcare, and surveillance. In particular, recognizing human gait can help identify characteristics of the places or physical spaces in which people move, such as whether a person is walking on level ground or walking down stairs. For example, ascending or descending stairs can be risky for older adults because of a possible fall, which can have more severe consequences than a fall on a flat surface. While portable and wearable devices have been widely used to detect Activities of Daily Living (ADLs), few studies in the literature have focused specifically on characterizing human gait activities. In the present study, we introduce a method for recognizing gait activities using acceleration data obtained from a smartphone and from a wearable inertial sensor placed on the ankle. The acceleration signals were segmented based on the automatic detection of strides, also called gait cycles. A feature vector was then extracted from each segment and used to train four classifiers: Naive Bayes, C4.5, Support Vector Machines, and K-Nearest Neighbors. Data were collected from seven young subjects who performed five gait activities: (i) going down an incline, (ii) going up an incline, (iii) walking on level ground, (iv) going down stairs, and (v) going up stairs. The results demonstrate the viability of the proposed method and technologies in ambient assisted living contexts.
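The stride segmentation and feature-extraction steps can be sketched with NumPy. The abstract does not describe the stride detector or the features, so this sketch assumes a simple peak detector on the acceleration magnitude (local maxima above mean + 0.5 standard deviations, at least 0.6 s apart) and a small set of per-stride statistics; `detect_strides` and `stride_features` are hypothetical names. Each resulting row would then serve as one training example for classifiers such as Naive Bayes, C4.5, SVM, or k-NN.

```python
import numpy as np


def detect_strides(signal, fs, min_stride_s=0.6):
    """Return sample indices of peaks assumed to mark stride boundaries.

    Assumed detector: local maxima above mean + 0.5 * std, at least
    min_stride_s seconds apart (the higher peak wins when two are too close).
    """
    x = np.asarray(signal, dtype=float)
    threshold = x.mean() + 0.5 * x.std()
    min_gap = int(min_stride_s * fs)
    peaks = []
    for i in range(1, len(x) - 1):
        if not (x[i] > x[i - 1] and x[i] >= x[i + 1] and x[i] > threshold):
            continue
        if peaks and i - peaks[-1] < min_gap:
            if x[i] > x[peaks[-1]]:
                peaks[-1] = i  # replace a weaker peak that is too close
        else:
            peaks.append(i)
    return peaks


def stride_features(signal, fs, peaks):
    """One row per stride (peak to next peak): duration [s], mean, std, min, max, range."""
    x = np.asarray(signal, dtype=float)
    rows = []
    for start, end in zip(peaks[:-1], peaks[1:]):
        seg = x[start:end]
        rows.append([(end - start) / fs, seg.mean(), seg.std(),
                     seg.min(), seg.max(), seg.max() - seg.min()])
    return np.array(rows, dtype=float).reshape(-1, 6)


if __name__ == "__main__":
    fs = 40                                         # assumed sampling rate in Hz
    t = np.arange(10 * fs) / fs                     # 10 s of data
    magnitude = 1.0 + np.sin(2 * np.pi * 1.0 * t)   # idealized 1 Hz gait signal
    peaks = detect_strides(magnitude, fs)
    features = stride_features(magnitude, fs, peaks)
    print(features.shape)  # (9, 6)
```

Real accelerometer data are noisy, so a practical detector would usually low-pass filter the magnitude before peak picking; the thresholds above are placeholders to be tuned per device and sensor placement.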
