Comparison of the Hybrid Credit Scoring Models Based on Various Classifiers

Author(s):  
Fei-Long Chen ◽  
Feng-Chia Li

Credit scoring is an important topic for businesses and socio-economic establishments that collect huge amounts of data with the aim of avoiding wrong decisions. In this paper, the authors propose four approaches that combine four well-known classifiers: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Back-Propagation Network (BPN) and Extreme Learning Machine (ELM). These classifiers are used to find a suitable hybrid combination of feature selection and classifier that retains sufficient information for classification purposes. To this end, different credit scoring combinations are constructed by pairing the four feature selection approaches with the classifiers. Two credit data sets from the University of California, Irvine (UCI), are chosen to evaluate the accuracy of the various hybrid feature selection models. The procedures that make up the proposed approaches are described and then evaluated for their performance.
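
The idea of pairing feature selection with a classifier can be illustrated with a minimal sketch. The abstract does not say which selection approaches the authors use, so the code below assumes a simple wrapper-style exhaustive search scored by leave-one-out KNN accuracy. The function names (`knn_predict`, `loo_accuracy`, `best_subset`) and the toy data are hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import combinations
from math import dist


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest neighbors."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, features, k=3):
    """Leave-one-out accuracy of KNN restricted to the given feature indices."""
    projected = [[row[f] for f in features] for row in X]
    hits = 0
    for i, sample in enumerate(projected):
        rest_X = projected[:i] + projected[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, sample, k) == y[i]
    return hits / len(projected)


def best_subset(X, y, size, k=3):
    """Assumed wrapper search: try every feature subset of the given size."""
    candidates = combinations(range(len(X[0])), size)
    return max(candidates, key=lambda fs: loo_accuracy(X, y, fs, k))


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 5.1], [1.1, 0.9], [1.2, 9.1]]
y = [0, 0, 0, 1, 1, 1]
print(best_subset(X, y, size=1))
```

An exhaustive search is only feasible for a handful of features. Real credit data sets would need a cheaper selection strategy, and the classifier could be swapped for any of the four named in the abstract.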


2021 ◽  
Vol 87 (6) ◽  
pp. 445-455
Author(s):  
Yi Ma ◽  
Zezhong Zheng ◽  
Yutang Ma ◽  
Mingcang Zhu ◽  
Ran Huang ◽  
...  

Many manifold learning algorithms conduct an eigenvector analysis on a data-similarity matrix with a size of N×N, where N is the number of data points. Thus, the memory complexity of the analysis is no less than O(N^2). We present in this article an incremental manifold learning approach to handle large hyperspectral data sets for land use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained with the training data set. A local curvature variation algorithm is utilized to sample a subset of data points as landmarks. Then a manifold skeleton is identified based on the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k-nearest-neighbor classifier and achieving the second-best performance with support vector machine.
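
To make the O(N^2) memory bound concrete, the following sketch computes the storage needed for a dense N×N similarity matrix. It assumes 8-byte (double-precision) entries, which the abstract does not state. The function name `similarity_matrix_bytes` and the sample values of N are illustrative only.

```python
def similarity_matrix_bytes(n_points: int, bytes_per_entry: int = 8) -> int:
    """Bytes needed to store a dense n_points x n_points matrix."""
    return n_points * n_points * bytes_per_entry


# Memory grows quadratically: 10x more points needs 100x more memory.
for n in (1_000, 10_000, 100_000):
    gib = similarity_matrix_bytes(n) / 2**30
    print(f"N = {n:>7,}: {gib:8.3f} GiB")
```

Under these assumptions, the matrix alone takes tens of gigabytes at N = 100,000, which is why sampling a smaller set of landmark points is attractive for large hyperspectral images.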


Author(s):  
Norsyela Muhammad Noor Mathivanan ◽  
Nor Azura Md.Ghani ◽  
Roziah Mohd Janor

Online business development through e-commerce platforms is a phenomenon that is changing how products are promoted and sold in the 21st century. Product title classification is an important task that helps retailers and sellers list a product in a suitable category. It is a part of the text classification problem, but the properties of product titles differ from those of general documents. This study aims to evaluate the performance of five supervised learning models on data sets consisting of e-commerce product titles, which are very short descriptions and incomplete sentences. The supervised learning models involved in the study are Naïve Bayes, K-Nearest Neighbor (KNN), Decision Tree, Support Vector Machine (SVM) and Random Forest. The results show that the KNN model is the best model, with the highest accuracy and the fastest computation time on the data used in the study. Hence, the KNN model is a good approach for classifying e-commerce products.


Author(s):  
Khairul Anam ◽  
Adel Al-Jumaily

Myoelectric pattern recognition (MPR) is used to detect the user’s intention in order to achieve smooth interaction between human and machine. The performance of MPR is influenced by the features extracted and the classifier employed. The kernel extreme learning machine, especially the radial basis function extreme learning machine (RBF-ELM), has emerged as one of the potential classifiers for MPR. However, RBF-ELM should be optimized to work efficiently. This paper proposes an optimization of the RBF-ELM parameters using a hybridization of particle swarm optimization (PSO) and a wavelet function. The proposed system is employed to classify finger movements of amputees and able-bodied subjects using electromyography signals. The experimental results show that the accuracy of the optimized RBF-ELM is 95.71% and 94.27% for the healthy subjects and the amputees, respectively. Meanwhile, the optimization using PSO alone attained average accuracies of 95.53% and 92.55% for the healthy subjects and the amputees, respectively. The experimental results also show that SW-RBF-ELM achieved better accuracy than other well-known classifiers such as support vector machine (SVM), linear discriminant analysis (LDA) and k-nearest neighbor (kNN).


Diagnostics ◽  
2020 ◽  
Vol 10 (3) ◽  
pp. 136 ◽  
Author(s):  
Raúl Santiago-Montero ◽  
Humberto Sossa ◽  
David A. Gutiérrez-Hernández ◽  
Víctor Zamudio ◽  
Ignacio Hernández-Bautista ◽  
...  

Breast cancer is a disease that has emerged as the second leading cause of cancer deaths in women worldwide. The annual mortality rate is estimated to continue growing. Cancer detection at an early stage could significantly reduce breast cancer death rates in the long term. Many investigators have studied different breast diagnostic approaches, such as mammography, magnetic resonance imaging, ultrasound, computerized tomography, positron emission tomography and biopsy. However, these techniques have limitations, such as being expensive, time-consuming and not suitable for women of all ages. Proposing techniques that support the effective medical diagnosis of this disease has undoubtedly become a priority for the government, for health institutions and for civil society in general. In this paper, an associative pattern classifier (APC) was used for the diagnosis of breast cancer. The rate of efficiency obtained on the Wisconsin breast cancer database was 97.31%. The APC’s performance was compared with the performance of a support vector machine (SVM) model, back-propagation neural networks, C4.5, naive Bayes, k-nearest neighbor (k-NN) and minimum distance classifiers. According to our results, the APC performed best. The APC algorithm, as well as the experiments and the comparisons between algorithms, was written and executed on a Java platform.


An electrocardiogram (ECG) records the electrical activity of the heart over a period of time. Detailed information about the condition of the heart is obtained by analyzing the ECG signal. The wavelet transform and the fast Fourier transform are among the methods used to identify cardiac disease. The paper presents a survey of ECG signal analysis and related studies on arrhythmic and non-arrhythmic data. Here we discuss an efficient feature extraction process for the electrocardiogram, in which the six best P-QRS-T fragments are studied based on position and priority. This survey examines the outcome of the system when various machine learning classification algorithms are used for feature extraction and analysis of ECG signals. Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and Artificial Neural Network (ANN) are the most important algorithms used for this purpose. Several publicly available data sets are used for arrhythmia analysis, and among them the MIT-BIH ECG-ID database is the most widely used. The drawbacks and limitations are also discussed, and from these the future challenges and concluding remarks are drawn.


2020 ◽  
Vol 2020 ◽  
pp. 1-8
Author(s):  
Yinglin Yang ◽  
Xin Zhang ◽  
Jianwei Yin ◽  
Xiangyang Yu

The classification of plastic waste before recycling is of great significance for effective recycling. In order to achieve rapid, nondestructive, on-site detection, a portable near-infrared (NIR) spectrometer was used in this study to obtain the diffuse reflectance spectra of both standard and commercial plastics made of ABS, PC, PE, PET, PP, PS, and PVC. After a series of pretreatments, principal component analysis (PCA) was used to analyze the cluster trend. K-nearest neighbor (KNN), support vector machine (SVM), and back propagation neural network (BPNN) classification models were then developed and evaluated. The results showed that, after pretreatment, the different plastics could be well separated in the space of the top three principal components, and that the classification models achieved excellent classification results and high generalization capability. This study indicated that the portable NIR spectrometer, integrated with chemometrics, could achieve excellent performance and has great potential in the field of commercial plastic identification.


Author(s):  
Keke Zhang ◽  
Lei Zhang ◽  
Qiufeng Wu

Cherry leaves infected by Podosphaera pannosa suffer from powdery mildew, a serious disease threatening the cherry production industry. In order to identify diseased cherry leaves at an early stage, the authors formulate the identification of infected cherry leaves as a classification problem and propose a fully automatic identification method based on a convolutional neural network (CNN). GoogLeNet is used as the backbone of the CNN. Transfer learning techniques are then applied to fine-tune the CNN from a GoogLeNet pre-trained on the ImageNet dataset. This article compares the proposed method against three traditional machine learning methods, i.e., support vector machine (SVM), k-nearest neighbor (KNN) and back propagation (BP) neural network. Quantitative evaluations conducted on a data set of 1,200 images collected by smart phones demonstrate that the CNN achieves the best performance in identifying diseased cherry leaves, with a testing accuracy of 99.6%. Thus, a CNN can be used effectively to identify diseased cherry leaves.


Author(s):  
Mohamed Alloghani ◽  
Ahmed Aljaaf ◽  
Abir Hussain ◽  
Thar Baker ◽  
Jamila Mustafina ◽  
...  

Background: Machine learning is a branch of Artificial Intelligence that is concerned with the design and development of algorithms, and it enables today’s computers to have the property of learning. Machine learning is gradually growing and becoming a critical approach in many domains such as health, education, and business. Methods: In this paper, we applied machine learning to the diabetes dataset with the aim of recognizing patterns and combinations of factors that characterize or explain re-admission among diabetes patients. The classifiers used include Linear Discriminant Analysis, Random Forest, k-Nearest Neighbor, Naïve Bayes, J48 and Support Vector Machine. Results: Of the 100,000 cases, 78,363 were diabetic and over 47% were readmitted. Based on the classes that the models produced, diabetic patients who are more likely to be readmitted are either women, or Caucasians, or outpatients, or those who undergo less rigorous lab procedures and treatment procedures, or those who receive less medication and are thus discharged without proper improvement or administration of insulin despite having tested positive for HbA1c. Conclusion: Diabetic patients who do not undergo rigorous lab assessments, diagnosis and medication are more likely to be readmitted when discharged without improvement and without receiving insulin, especially if they are women, Caucasians, or both.


2014 ◽  
Vol 01 (04) ◽  
pp. 1450037 ◽  
Author(s):  
Sulin Pang ◽  
Shuqing Li ◽  
Jinwang Xiao

Considering the question of personal credit rating, this paper proposes a hybrid method for credit assessment based on an improved Support Vector Data Description (SVDD) algorithm combined with the particle swarm optimization (PSO) algorithm. First, the paper carries out data preprocessing. It then solves two problems, parameter optimization and feature selection, at the same time using the PSO algorithm combined with the improved SVDD algorithm, and assesses the credit data using the optimized parameters and features. Finally, the constructed method is tested on two real-world data sets, and the results show that the hybrid method achieves higher classification accuracy than some other existing credit scoring methods.

