Comparison of the Hybrid Credit Scoring Models Based on Various Classifiers

Credit scoring is an important topic for businesses and socio-economic establishments that collect huge amounts of data, with the aim of eliminating wrong decisions. In this paper, the authors propose four approaches that combine four well-known classifiers, namely K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Back-Propagation Network (BPN) and Extreme Learning Machine (ELM). These classifiers are used to find a suitable hybrid combination of classifier and feature selection that retains sufficient information for classification purposes. To this end, different credit scoring combinations are constructed by pairing the four feature-selection approaches with the classifiers. Two credit data sets from the University of California, Irvine (UCI) repository are used to evaluate the accuracy of the various hybrid feature selection models. This paper describes the procedures that make up the proposed approaches and then evaluates their performance.
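The abstract does not spell out how its four feature-selection approaches work. As a rough illustration of the general hybrid idea, a feature-selection step wrapped around a classifier, here is a minimal sketch that uses greedy sequential forward selection scored by leave-one-out KNN accuracy. Forward selection, the function names and the toy data are assumptions for illustration, not the authors' procedure.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training rows closest to `query` (Euclidean distance)."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loo_accuracy(xs, ys, features, k=3):
    """Leave-one-out KNN accuracy using only the listed feature indices."""
    def project(row):
        return [row[f] for f in features]

    correct = 0
    for i in range(len(xs)):
        train_x = [project(x) for j, x in enumerate(xs) if j != i]
        train_y = [y for j, y in enumerate(ys) if j != i]
        if knn_predict(train_x, train_y, project(xs[i]), k) == ys[i]:
            correct += 1
    return correct / len(xs)


def forward_select(xs, ys, k=3):
    """Greedy forward selection: repeatedly add the feature that most improves accuracy."""
    selected, best_acc = [], 0.0
    remaining = list(range(len(xs[0])))
    while remaining:
        # Score each candidate together with the already-chosen features;
        # ties are broken in favour of the lower feature index.
        acc, f = max(
            ((loo_accuracy(xs, ys, selected + [f], k), f) for f in remaining),
            key=lambda t: (t[0], -t[1]),
        )
        if acc <= best_acc:
            break  # no remaining feature improves the criterion
        selected.append(f)
        remaining.remove(f)
        best_acc = acc
    return selected, best_acc


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
xs = [[0.1, 5.0], [0.2, 1.0], [0.15, 3.0], [0.9, 4.0], [1.0, 2.0], [0.95, 0.5]]
ys = [0, 0, 0, 1, 1, 1]
print(forward_select(xs, ys))  # ([0], 1.0)
```

In a hybrid scheme of this kind, any of the other classifiers (SVM, BPN, ELM) could replace `knn_predict` as the scoring model; KNN is used here only because it needs no training step.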

Novel Mathematical Model of Breast Cancer Diagnostics Using an Associative Pattern Classification

Diagnostics, 2020, Vol 10 (3), pp. 136. DOI: 10.3390/diagnostics10030136. Cited by ~2.

Author(s): Raúl Santiago-Montero, Humberto Sossa, David A. Gutiérrez-Hernández, Víctor Zamudio, Ignacio Hernández-Bautista, ...

Keyword(s): Breast Cancer, Nearest Neighbor, Early Stage, Back Propagation, Cancer Diagnostics, Support Vector, K Nearest Neighbor, Breast Cancer Death, Positron Emission, The Government

Breast cancer has emerged as the second leading cause of cancer deaths in women worldwide, and its annual mortality rate is expected to keep growing. Detecting cancer at an early stage could significantly reduce breast cancer death rates in the long term. Many investigators have studied different breast diagnostic approaches, such as mammography, magnetic resonance imaging, ultrasound, computerized tomography, positron emission tomography and biopsy. However, these techniques have limitations: they can be expensive and time-consuming, and they are not suitable for women of all ages. Proposing techniques that support the effective medical diagnosis of this disease has become a priority for the government, for health institutions and for civil society in general. In this paper, an associative pattern classifier (APC) was used for the diagnosis of breast cancer. The efficiency rate obtained on the Wisconsin breast cancer database was 97.31%. The APC's performance was compared with that of a support vector machine (SVM) model, back-propagation neural networks, C4.5, naive Bayes, k-nearest neighbor (k-NN) and minimum distance classifiers. According to our results, the APC performed best. The APC algorithm, as well as the experiments comparing the algorithms, was implemented and executed on a Java platform.
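The APC itself is not described here in enough detail to reproduce. For comparison, here is a minimal sketch of one of the baselines the abstract names, the minimum distance classifier, which assigns each sample to the class with the nearest mean vector. The function names and the toy data are hypothetical and are not drawn from the Wisconsin database.

```python
import math


def fit_centroids(xs, ys):
    """Return {label: mean feature vector} for every class in the training data."""
    grouped = {}
    for x, y in zip(xs, ys):
        grouped.setdefault(y, []).append(x)
    return {
        y: [sum(col) / len(rows) for col in zip(*rows)]
        for y, rows in grouped.items()
    }


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


def accuracy(centroids, xs, ys):
    """Fraction of samples whose predicted class matches the true class."""
    hits = sum(predict(centroids, x) == y for x, y in zip(xs, ys))
    return hits / len(xs)


# Hypothetical two-feature toy data, for illustration only.
train_x = [[1, 1], [2, 1], [1, 2], [8, 9], [9, 8], [9, 9]]
train_y = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]
model = fit_centroids(train_x, train_y)
print(predict(model, [2, 2]))             # benign
print(accuracy(model, train_x, train_y))  # 1.0
```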

Rapid and Nondestructive On-Site Classification Method for Consumer-Grade Plastics Based on Portable NIR Spectrometer and Machine Learning

Journal of Spectroscopy, 2020, Vol 2020, pp. 1-8. DOI: 10.1155/2020/6631234.

Author(s): Yinglin Yang, Xin Zhang, Jianwei Yin, Xiangyang Yu

Keyword(s): Near Infrared, Nearest Neighbor, Diffuse Reflectance Spectrum, Back Propagation, Principal Component, Back Propagation Neural Network, Support Vector, Site Classification, Classification Models, K Nearest Neighbor

The classification of plastic waste before recycling is of great significance for achieving effective recycling. To achieve rapid, nondestructive, on-site detection, a portable near-infrared (NIR) spectrometer was used in this study to obtain diffuse reflectance spectra for both standard and commercial plastics made of ABS, PC, PE, PET, PP, PS, and PVC. After a series of pretreatments was applied, principal component analysis (PCA) was used to analyze the cluster trend. K-nearest neighbor (KNN), support vector machine (SVM), and back propagation neural network (BPNN) classification models were then developed and evaluated. The results showed that the different plastics could be well separated in the space of the top three principal components after pretreatment, and the classification models achieved excellent classification results and high generalization capability. This study indicates that the portable NIR spectrometer, integrated with chemometrics, can achieve excellent performance and has great potential in the field of commercial plastic identification.
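The abstract uses PCA to inspect how the pretreated spectra cluster in the space of the first three principal components. Below is a dependency-free sketch of that projection, assuming each spectrum is an equal-length list of absorbance values. It finds the components by power iteration with deflation, one standard way to obtain the leading eigenvectors of a covariance matrix, but not necessarily what the authors' software used. The pretreatment steps are not shown because the abstract does not name them, and the toy "spectra" are hypothetical.

```python
import math
import random


def pca_scores(rows, n_components=3, iters=500, seed=0):
    """Project mean-centred rows onto their top principal components.

    Components come from power iteration on the covariance matrix, deflating
    after each one so the next run converges to the next-largest eigenvalue.
    If the data have fewer non-zero-variance directions than n_components,
    the extra components are arbitrary.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]

    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(d)]  # strictly positive start vector
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break  # no variance left in the remaining directions
            v = [x / norm for x in w]
        # Rayleigh quotient v^T C v is the eigenvalue for the unit vector v.
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: subtract lam * v v^T so this direction is not found again.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]

    return [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in centred]


# Hypothetical 3-channel "spectra" with variance 1.6, 0.4 and 0.1 along the axes.
rows = [[-2, 0, 0], [2, 0, 0], [0, -1, 0], [0, 1, 0], [0, 0, -0.5], [0, 0, 0.5]]
for score in pca_scores(rows):
    print([round(s, 3) for s in score])
```

Plotting the three score columns against each other, coloured by plastic type, would give the kind of cluster view the abstract describes.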

Identification of Cherry Leaf Disease Infected by Podosphaera Pannosa via Convolutional Neural Network

International Journal of Agricultural and Environmental Information Systems, 2019, Vol 10 (2), pp. 98-110. DOI: 10.4018/ijaeis.2019040105. Cited by ~3.

Author(s): Keke Zhang, Lei Zhang, Qiufeng Wu

Keyword(s): Neural Network, Convolutional Neural Network, Nearest Neighbor, Early Stage, Back Propagation, Support Vector, Automatic Identification, K Nearest Neighbor, Data Set, Leaf Disease

Cherry leaves infected by Podosphaera pannosa develop powdery mildew, a serious disease that threatens the cherry production industry. To identify diseased cherry leaves at an early stage, the authors formulate the identification of infected cherry leaves as a classification problem and propose a fully automatic identification method based on a convolutional neural network (CNN). GoogLeNet is used as the backbone of the CNN, and transfer learning techniques are applied to fine-tune the CNN from a GoogLeNet model pre-trained on the ImageNet dataset. This article compares the proposed method against three traditional machine learning methods, namely support vector machine (SVM), k-nearest neighbor (KNN) and back propagation (BP) neural network. Quantitative evaluations conducted on a data set of 1,200 images collected by smartphones demonstrate that the CNN achieves the best performance in identifying diseased cherry leaves, with a testing accuracy of 99.6%. Thus, a CNN can be used effectively to identify diseased cherry leaves.

Machine Learning Models Combined with Virtual Screening and Molecular Docking to Predict Human Topoisomerase I Inhibitors

Molecules, 2019, Vol 24 (11), pp. 2107. DOI: 10.3390/molecules24112107. Cited by ~3.

Author(s): Bingke Li, Xiaokang Kang, Dan Zhao, Yurong Zou, Xudong Huang, ...

Keyword(s): Virtual Screening, Topoisomerase I, Nearest Neighbor, Binding Energies, Support Vector, Features Selection, K Nearest Neighbor, Autodock Vina, Relative Probability, C4.5 Decision Tree

In this work, random forest (RF), support vector machine, k-nearest neighbor and C4.5 decision tree were used to build classification models that predict whether an unknown molecule is an inhibitor of the human topoisomerase I (Top1) protein. All of these models achieved satisfactory results, with total prediction accuracies from 89.70% to 97.12%. Comparative analysis showed that the RF model gave the best predictions, and its parameters were further optimized to generate the best-performing RF model. At the same time, feature selection was carried out through a special RF procedure to choose, from 189 molecular descriptors, the properties most relevant to Top1 inhibition. Subsequently, a ligand-based virtual screening of the Maybridge database was performed with the optimal RF model, and 596 hits were picked out. Based on these screening results, 67 molecules with relative probability scores over 0.7 were selected and docked to Top1 using AutoDock Vina. Finally, six top-ranked molecules with binding energies less than −10.0 kcal/mol were screened out, and a common backbone, entirely different from those of existing Top1 inhibitors reported in the literature, was found.
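The last two filtering steps of the screening workflow (relative probability above 0.7, then binding energy below −10.0 kcal/mol) can be sketched as below. This assumes the RF probabilities and the AutoDock Vina binding energies have already been computed elsewhere; the `Candidate` record, the molecule names and all numbers are hypothetical, and no docking is performed here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    name: str
    rf_probability: float                   # RF-predicted probability of Top1 inhibition
    binding_energy: Optional[float] = None  # kcal/mol from docking; None if not docked


def screening_funnel(
    hits: List[Candidate], prob_cutoff: float = 0.7, energy_cutoff: float = -10.0
) -> Tuple[List[Candidate], List[Candidate]]:
    """Return (molecules sent to docking, final molecules ranked by binding energy)."""
    to_dock = [c for c in hits if c.rf_probability > prob_cutoff]
    finalists = [
        c for c in to_dock
        if c.binding_energy is not None and c.binding_energy < energy_cutoff
    ]
    # A more negative binding energy means stronger predicted binding, so rank it first.
    finalists.sort(key=lambda c: c.binding_energy)
    return to_dock, finalists


# Hypothetical hits with made-up scores, for illustration only.
hits = [
    Candidate("mol-A", 0.90, -11.2),
    Candidate("mol-B", 0.65),         # below the probability cutoff, never docked
    Candidate("mol-C", 0.80, -9.5),
    Candidate("mol-D", 0.75, -10.4),
]
to_dock, finalists = screening_funnel(hits)
print([c.name for c in to_dock])    # ['mol-A', 'mol-C', 'mol-D']
print([c.name for c in finalists])  # ['mol-A', 'mol-D']
```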

Traffic Status Prediction of Arterial Roads Based on the Deep Recurrent Q-Learning

Journal of Advanced Transportation, 2020, Vol 2020, pp. 1-17. DOI: 10.1155/2020/8831521.

Author(s): Wei Hao, Donglei Rong, Kefu Yi, Qiang Zeng, Zhibo Gao, ...

Keyword(s): Nearest Neighbor, Short Term Memory, Road Traffic, Back Propagation, Memory Storage, Support Vector, K Nearest Neighbor, Continuous Training, System Sensitivity, Q Learning

To cope with the exponential growth of traffic data and the complexity of traffic conditions, and to store and analyse data effectively so that valid information can be fed back, this paper proposes an urban road traffic status prediction model based on an optimized deep recurrent Q-learning method. The model uses an optimized Long Short-Term Memory (LSTM) algorithm to handle the explosive growth of Q-table data, which not only avoids exploding and vanishing gradients but also provides efficient storage and analysis. Continuous training and memory storage of the training sets are used to improve system sensitivity, and the test sets are then predicted from the accumulated experience pool to obtain high-precision prediction results. Traffic flow data from Wanjiali Road to Shuangtang Road in Changsha City are used as a case study. The results show that the predictions of the traffic delay index fall within a reasonable interval and are significantly better than those of traditional prediction methods such as LSTM, K-Nearest Neighbor (KNN), Support Vector Machines (SVM), exponential smoothing and Back Propagation (BP) neural networks, which shows that the proposed model is feasible for practical application.
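The abstract combines Q-learning with an experience pool that is replayed during training. A minimal sketch of those two ingredients follows. Unlike the paper, which approximates the Q-function with an optimized LSTM, this sketch uses a plain lookup table, and the states, actions, rewards, learning rate `alpha` and discount `gamma` are all hypothetical.

```python
import random
from collections import defaultdict, deque


class ReplayBuffer:
    """Bounded experience pool of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)  # oldest transitions are evicted first

    def push(self, transition):
        self.memory.append(transition)

    def sample(self, batch_size, rng):
        return rng.sample(list(self.memory), min(batch_size, len(self.memory)))


def q_update(q, transition, actions, alpha=0.1, gamma=0.9):
    """Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    s, a, r, s_next = transition
    target = r + gamma * max(q[(s_next, a2)] for a2 in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])


def train(buffer, actions, steps=200, batch_size=8, seed=0):
    """Repeatedly replay random mini-batches drawn from the experience pool."""
    rng = random.Random(seed)
    q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
    for _ in range(steps):
        for transition in buffer.sample(batch_size, rng):
            q_update(q, transition, actions)
    return q


# Hypothetical toy problem: in state "congested", predicting a high delay
# index is rewarded and predicting a low one is not.
actions = ["predict_high", "predict_low"]
buffer = ReplayBuffer(capacity=1000)
buffer.push(("congested", "predict_high", 1.0, "congested"))
buffer.push(("congested", "predict_low", 0.0, "congested"))
q = train(buffer, actions)
print(q[("congested", "predict_high")] > q[("congested", "predict_low")])  # True
```

Sampling at random from the pool, rather than replaying transitions in arrival order, is the usual motivation for experience replay: it breaks the correlation between consecutive training updates.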

Pattern Recognition of DC Partial Discharge on XLPE Cable Based on ADAM-DBN

Energies, 2020, Vol 13 (17), pp. 4566. DOI: 10.3390/en13174566. Cited by ~1.

Author(s): Zhe Li, Yongpeng Xu, Xiuchen Jiang

Keyword(s): Pattern Recognition, Nearest Neighbor, Recognition Accuracy, Partial Discharge, Back Propagation, Training Sample, Support Vector, K Nearest Neighbor, Set Size, Sample Set

Pattern recognition of DC partial discharge (PD) receives plenty of attention, and recent research has mainly focused on the static characteristics of PD signals. To improve recognition accuracy for DC cable and to extract information from PD waveforms, a modified deep belief network (DBN) with supervised fine-tuning by the adaptive moment estimation (ADAM) algorithm is proposed to recognize four typical insulation defects of DC cable from their PD pulse waveforms. The effect of the training sample set size on recognition accuracy is also analyzed. Compared with naive Bayes (NB), K-nearest neighbor (KNN), support vector machine (SVM), and back propagation neural networks (BPNN), the ADAM-DBN method achieves higher accuracy on all four defect types, owing to its strong ability to extract features from PD pulse waveforms. Moreover, increasing the training sample set size increases recognition accuracy within a certain range.
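ADAM is the optimizer used to fine-tune the DBN. Below is a minimal sketch of the standard Adam update rule (exponential moving averages of the gradient and of its square, with bias correction), applied to a toy quadratic rather than to a DBN. The defaults beta1 = 0.9, beta2 = 0.999 and eps = 1e-8 are the commonly used values; the learning rate of 0.1 and the toy objective are assumptions chosen for illustration.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a function of a parameter vector using the Adam update rule."""
    x = list(x0)
    m = [0.0] * len(x)  # first-moment (mean) estimate of the gradient
    v = [0.0] * len(x)  # second-moment (uncentred variance) estimate
    for t in range(1, steps + 1):
        g = grad(x)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            # Bias correction compensates for m and v starting at zero.
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x, y) = (x - 3)^2 + (y + 1)^2, whose minimum is at (3, -1).
def grad(p):
    return [2 * (p[0] - 3), 2 * (p[1] + 1)]


print(adam_minimize(grad, [0.0, 0.0]))  # approximately [3, -1]
```

Because each parameter's step is scaled by its own running gradient magnitude, Adam needs less manual learning-rate tuning across layers than plain gradient descent, which is one common reason for choosing it for network fine-tuning.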

Performance Examination and Feature Selection on Sybil User Data using Recursive Feature Elimination

International Journal of Innovative Technology and Exploring Engineering - Special Issue, 2019, Vol 8 (9S4), pp. 48-56. DOI: 10.35940/ijitee.i1108.0789s419. Cited by ~1.

Keyword(s): Social Networks, Feature Selection, Online Social Networks, Nearest Neighbor, Classification Model, Recursive Feature Elimination, Support Vector, Actual Behavior, Features Selection, K Nearest Neighbor

Machine Learning (ML) research greatly helps in predicting model-based outcomes with high accuracy, based on training and testing the models on datasets. Social networks are one domain where ML can be used effectively to ensure the authenticity and security of valid users. With the increasing use of Online Social Networks (OSNs), spam and malicious activity have become abundant, and Sybil nodes pose one such safety and security hazard. Sybil accounts are difficult to detect because they closely mimic the actual behavior of human accounts. In this paper, we look at Sybil accounts on one OSN, Twitter, where machine learning models are trained on existing datasets so that these malicious users can be detected before they harm the normal communication of genuine users. Because the datasets used are so vast, feature selection has been carried out on them as a pre-processing step before the actual classification, as it helps enhance model performance. Support Vector Machine–Recursive Feature Elimination (SVM-RFE) and Logistic Regression–Recursive Feature Elimination (LR-RFE) techniques have been used in this study to select significant features. The classification model is trained on the selected features using Random Forest (RF) and K-Nearest Neighbor (KNN) algorithms. We also analyzed the biasing effects of fake accounts on the human-account datasets during feature selection and classification. The results show that the RF algorithm outperformed KNN on the feature sets selected through both SVM-RFE and LR-RFE.
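SVM-RFE and LR-RFE both repeatedly fit a linear model, rank the features by the magnitude of their weights, and eliminate the weakest one. Here is a minimal sketch of the LR-RFE variant using plain batch gradient-descent logistic regression. The learning rate, epoch count, one-feature-per-round elimination and toy data are assumptions; weight magnitudes are only comparable when the features are on similar scales, so real data would normally be standardized first.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.5, epochs=300):
    """Unregularized logistic regression by batch gradient descent; bias is the last weight."""
    d, n = len(xs[0]), len(xs)
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for x, y in zip(xs, ys):
            # zip stops at len(x), so the bias weight w[-1] is added separately.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1]) - y
            for j in range(d):
                grad[j] += err * x[j]
            grad[-1] += err
        w = [wi - lr * gi / n for wi, gi in zip(w, grad)]
    return w


def rfe(xs, ys, n_keep):
    """Recursive feature elimination: refit, then drop the feature with the smallest |weight|."""
    remaining = list(range(len(xs[0])))
    while len(remaining) > n_keep:
        sub = [[x[f] for f in remaining] for x in xs]
        w = fit_logistic(sub, ys)
        weakest = min(range(len(remaining)), key=lambda j: abs(w[j]))
        remaining.pop(weakest)
    return remaining


# Hypothetical toy accounts: feature 0 tracks the label, features 1 and 2 are noise.
xs = [[-1.0, 0.5, -0.3], [-0.8, -0.5, 0.3], [-1.2, 0.4, 0.2], [-0.9, -0.4, -0.2],
      [1.0, 0.5, -0.3], [0.8, -0.5, 0.3], [1.2, 0.4, 0.2], [0.9, -0.4, -0.2]]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
print(rfe(xs, ys, n_keep=1))  # [0]
```

Swapping `fit_logistic` for a linear SVM trainer and ranking by its weights would give the SVM-RFE variant; the elimination loop stays the same.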

Automatic Detection and Staging of Lung Tumors using Locational Features and Double-Staged Classifications

Applied Sciences, 2019, Vol 9 (11), pp. 2329. DOI: 10.3390/app9112329. Cited by ~3.

Author(s): May Phu Paing, Kazuhiko Hamamoto, Supan Tungjitkusolmun, Chuchart Pintavirooj

Keyword(s): Lung Cancer, Nearest Neighbor, Treatment Options, Back Propagation, Back Propagation Neural Network, Classification Performance, Clinical Staging, Support Vector, K Nearest Neighbor, Experimental Findings

Lung cancer is a life-threatening disease with the highest morbidity and mortality rates of any cancer worldwide. Clinical staging of lung cancer can significantly reduce the mortality rate, because effective treatment options strongly depend on the specific stage of the cancer. Unfortunately, manual staging remains a challenge due to the intensive effort required. This paper presents a computer-aided diagnosis (CAD) method for detecting and staging lung cancer from computed tomography (CT) images. The CAD works in three fundamental phases: segmentation, detection, and staging. In the first phase, lung anatomical structures in the input tomography scans are segmented using gray-level thresholding. In the second, tumor nodules inside the lungs are detected using features extracted from the segmented tumor candidates. In the last phase, the clinical stages of the detected tumors are determined by extracting locational features. For accurate and robust predictions, our CAD applies a double-staged classification: the first stage detects tumors and the second stages them. In both classification stages, five alternative classifiers, namely the Decision Tree (DT), K-nearest neighbor (KNN), Support Vector Machine (SVM), Ensemble Tree (ET), and Back Propagation Neural Network (BPNN), are applied and compared to ensure high classification performance. Average accuracies of 92.8% for detection and 90.6% for staging are achieved using BPNN. Experimental findings reveal that the proposed CAD method provides better results than previous methods; thus, it is applicable as a clinical diagnostic tool for lung cancer.
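The first phase segments lung structures by gray-level thresholding. The abstract does not say how the threshold is chosen, so the sketch below assumes Otsu's method, a common automatic choice that picks the gray level maximizing the between-class variance. The image is a hypothetical 8-bit list of lists, and treating dark pixels as lung is an assumption based on air appearing dark in CT.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t maximizing between-class variance (Otsu's method).

    Pixels must be integers in [0, levels). Values <= t form one class, > t the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_b = 0    # number of pixels at or below the candidate threshold
    sum_b = 0  # intensity sum of those pixels
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w_b += hist[t]
        sum_b += t * hist[t]
        w_f = total - w_b
        if w_b == 0 or w_f == 0:
            continue  # one class is empty, so the split is undefined
        mean_b = sum_b / w_b
        mean_f = (sum_all - sum_b) / w_f
        between = w_b * w_f * (mean_b - mean_f) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def segment(image, threshold):
    """Binary mask: 1 where a pixel is at or below the threshold (assumed lung/air)."""
    return [[1 if p <= threshold else 0 for p in row] for row in image]


# Hypothetical 8-bit "CT slice": dark air-like pixels (10) and bright tissue (200).
image = [[10, 200], [200, 10]]
t = otsu_threshold([p for row in image for p in row])
print(t, segment(image, t))  # 10 [[1, 0], [0, 1]]
```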

Optimizing Error Rate in Intrusion Detection System Using Artificial Neural Network Algorithm

International Journal of Emerging Research in Management and Technology, 2018, Vol 6 (9), pp. 152. DOI: 10.23956/ijermt.v6i9.102.

Author(s): S. Vijaya Rani, G. N. K. Suresh Babu

Keyword(s): Neural Network, Artificial Neural Network, Intrusion Detection, Error Rate, Learning Process, Nearest Neighbor, Detection System, Support Vector, K Nearest Neighbor, Artificial Neural

Illegal hackers penetrate the servers and networks of corporate and financial institutions to gain money and extract vital information. Attacks range from a single computing system to many systems. Hackers gain access by sending malicious packets into the network through viruses, worms, Trojan horses, etc., and they scan a network with various tools to collect information about the network and its hosts. Hence it is essential to detect attacks as they enter a network. Methods available for intrusion detection include Naive Bayes, Decision Tree, Support Vector Machine, K-Nearest Neighbor and Artificial Neural Networks. A neural network consists of interconnected processing units that can store information and make it available for use. It acts like a human brain and acquires knowledge from the environment through a training and learning process, for which many algorithms are available. This work analyzes malicious packets and predicts the error rate in detecting injured packets using artificial neural network algorithms.
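The abstract does not name the specific neural network algorithms, so the sketch below uses the simplest one, a single-layer perceptron, to show how a trained model yields an error rate (the fraction of packets it misclassifies). The two numeric features per packet and all values are hypothetical; real intrusion-detection features would come from packet or flow statistics.

```python
def predict(w, b, x):
    """Label 1 (malicious) if the weighted sum is positive, else 0 (normal)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(xs, ys, epochs=20, lr=1.0):
    """Classic perceptron learning rule; weights change only on mistakes."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            update = lr * (y - predict(w, b, x))  # zero when the prediction is correct
            w = [wi + update * xi for wi, xi in zip(w, x)]
            b += update
    return w, b


def error_rate(w, b, xs, ys):
    """Fraction of packets whose predicted label differs from the true label."""
    wrong = sum(predict(w, b, x) != y for x, y in zip(xs, ys))
    return wrong / len(xs)


# Hypothetical packet features (already scaled to [0, 1]) and labels.
packets = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
labels = [0, 0, 1, 1]
w, b = train_perceptron(packets, labels)
print(error_rate(w, b, packets, labels))  # 0.0
```

A perceptron only separates classes with a straight line (a hyperplane), so multi-layer networks trained with back-propagation are the usual next step when attack and normal traffic are not linearly separable; the error-rate calculation stays the same.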
