A New Hybrid Support Vector Machine Ensemble Classification Model for Credit Scoring

2019 ◽  
Vol 12 (1) ◽  
pp. 77-88
Author(s):  
Jian-Rong Yao ◽  
Jia-Rui Chen

Credit scoring plays important role in the financial industry. There are different ways employed in the field of credit scoring, such as the traditional logistic regression, discriminant analysis, and linear regression; methods used in the field of machine learning include neural network, k-nearest neighbors, genetic algorithm, support vector machines (SVM), decision tree, and so on. SVM has been demonstrated with good performance in classification. This paper proposes a new hybrid RF-SVM ensemble model, which uses random forest to select important variables, and employs ensemble methods (bagging and boosting) to aggregate single base models (SVM) as a robust classifier. The experimental results suggest that this new model could achieve effective improvement, and has promising potential in the field of credit scoring.

Author(s):  
Ognjen Radović ◽  
Srđan Marinković ◽  
Jelena Radojičić

Credit scoring attracts special attention of financial institutions. In recent years, deep learning methods have been particularly interesting. In this paper, we compare the performance of ensemble deep learning methods based on decision trees with the best traditional method, logistic regression, and the machine learning method benchmark, support vector machines. Each method tests several different algorithms. We use different performance indicators. The research focuses on standard datasets relevant for this type of classification, the Australian and German datasets. The best method, according to the MCC indicator, proves to be the ensemble method with boosted decision trees. Also, on average, ensemble methods prove to be more successful than SVM.


2020 ◽  
Vol 8 (4) ◽  
pp. 297-303
Author(s):  
Tamunopriye Ene Dagogo-George ◽  
Hammed Adeleye Mojeed ◽  
Abdulateef Oluwagbemiga Balogun ◽  
Modinat Abolore Mabayoje ◽  
Shakirat Aderonke Salihu

Diabetic Retinopathy (DR) is a condition that emerges from prolonged diabetes, causing severe damages to the eyes. Early diagnosis of this disease is highly imperative as late diagnosis may be fatal. Existing studies employed machine learning approaches with Support Vector Machines (SVM) having the highest performance on most analyses and Decision Trees (DT) having the lowest. However, SVM has been known to suffer from parameter and kernel selection problems, which undermine its predictive capability. Hence, this study presents homogenous ensemble classification methods with DT as the base classifier to optimize predictive performance. Boosting and Bagging ensemble methods with feature selection were employed, and experiments were carried out using Python Scikit Learn libraries on DR datasets extracted from UCI Machine Learning repository. Experimental results showed that Bagged and Boosted DT were better than SVM. Specifically, Bagged DT performed best with accuracy 65.38 %, f-score 0.664, and AUC 0.731, followed by Boosted DT with accuracy 65.42 %, f-score 0.655, and AUC 0.724 when compared to SVM (accuracy 65.16 %, f-score 0.652, and AUC 0.721). These results indicate that DT's predictive performance can be optimized by employing the homogeneous ensemble methods to outperform SVM in predicting DR.


2020 ◽  
Vol 4 (2) ◽  
pp. 329-335
Author(s):  
Rusydi Umar ◽  
Imam Riadi ◽  
Purwono

The failure of most startups in Indonesia is caused by team performance that is not solid and competent. Programmers are an integral profession in a startup team. The development of social media can be used as a strategic tool for recruiting the best programmer candidates in a company. This strategic tool is in the form of an automatic classification system of social media posting from prospective programmers. The classification results are expected to be able to predict the performance patterns of each candidate with a predicate of good or bad performance. The classification method with the best accuracy needs to be chosen in order to get an effective strategic tool so that a comparison of several methods is needed. This study compares classification methods including the Support Vector Machines (SVM) algorithm, Random Forest (RF) and Stochastic Gradient Descent (SGD). The classification results show the percentage of accuracy with k = 10 cross validation for the SVM algorithm reaches 81.3%, RF at 74.4%, and SGD at 80.1% so that the SVM method is chosen as a model of programmer performance classification on social media activities.


Author(s):  
Hedieh Sajedi ◽  
Mehran Bahador

In this paper, a new approach for segmentation and recognition of Persian handwritten numbers is presented. This method utilizes the framing feature technique in combination with outer profile feature that we named this the adapted framing feature. In our proposed approach, segmentation of the numbers into digits has been carried out automatically. In the classification stage of the proposed method, Support Vector Machines (SVM) and k-Nearest Neighbors (k-NN) are used. Experimentations are conducted on the IFHCDB database consisting 17,740 numeral images and HODA database consisting 102,352 numeral images. In isolated digit level on IFHCDB, the recognition rate of 99.27%, is achieved by using SVM with polynomial kernel. Furthermore, in isolated digit level on HODA, the recognition rate of 99.07% is achieved by using SVM with polynomial kernel. The experiments illustrate that applying our proposed method resulted higher accuracy compared to previous researches.


2021 ◽  
Author(s):  
Xin Sui ◽  
Wanjing Wang ◽  
Jinfeng Zhang

In this work, we trained an ensemble model for predicting drug-protein interactions within a sentence based on only its semantics. Our ensembled model was built using three separate models: 1) a classification model using a fine-tuned BERT model; 2) a fine-tuned sentence BERT model that embeds every sentence into a vector; and 3) another classification model using a fine-tuned T5 model. In all models, we further improved performance using data augmentation. For model 2, we predicted the label of a sentence using k-nearest neighbors with its embedded vector. We also explored ways to ensemble these 3 models: a) we used the majority vote method to ensemble these 3 models; and b) based on the HDBSCAN clustering algorithm, we trained another ensemble model using features from all the models to make decisions. Our best model achieved an F-1 score of 0.753 on the BioCreative VII Track 1 test dataset.


Proceedings ◽  
2019 ◽  
Vol 31 (1) ◽  
pp. 60 ◽  
Author(s):  
Irvin Hussein Lopez-Nava ◽  
Matias Garcia-Constantino ◽  
Jesus Favela

Activity recognition is an important task in many fields, such as ambient intelligence, pervasive healthcare, and surveillance. In particular, the recognition of human gait can be useful to identify the characteristics of the places or physical spaces, such as whether the person is walking on level ground or walking down stairs in which people move. For example, ascending or descending stairs can be a risky activity for older adults because of a possible fall, which can have more severe consequences than if it occurred on a flat surface. While portable and wearable devices have been widely used to detect Activities of Daily Living (ADLs), few research works in the literature have focused on characterizing only actions of human gait. In the present study, a method for recognizing gait activities using acceleration data obtained from a smartphone and a wearable inertial sensor placed on the ankle of people is introduced. The acceleration signals were segmented based on the automatic detection of strides, also called gait cycles. Subsequently, a feature vector of the segmented signals was extracted, which was used to train four classifiers using the Naive Bayes, C4.5, Support Vector Machines, and K-Nearest Neighbors algorithms. Data was collected from seven young subjects who performed five gait activities: (i) going down an incline, (ii) going up an incline, (iii) walking on level ground, (iv) going down stairs, and (v) going up stairs. The results demonstrate the viability of using the proposed method and technologies in ambient assisted living contexts.


Sign in / Sign up

Export Citation Format

Share Document