Default Probability Prediction of Credit Applicants Using a New Fuzzy KNN Method with Optimal Weights

Author(s):  
Abbas Keramati ◽  
Niloofar Yousefi ◽  
Amin Omidvar

Credit scoring has become a very important issue due to the recent growth of the credit industry. As its first objective, this chapter provides an academic database of literature between and proposes a classification scheme to classify the articles. The second objective is to suggest employing the Optimally Weighted Fuzzy K-Nearest Neighbor (OWFKNN) algorithm for credit scoring. To show the performance of this method, two real-world datasets from the UCI database are used. In the classification task, the empirical results demonstrate that OWFKNN outperforms the conventional KNN and fuzzy KNN methods as well as the other methods compared. OWFKNN also shows the best performance among the compared methods in the predictive accuracy of the probability of default. The results in this chapter suggest that the OWFKNN approach is effective in estimating default probabilities and is a promising method for classification.
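As a rough illustration of how a fuzzy KNN classifier can yield a class membership that is read as a default probability, the following sketch uses inverse-distance weighting over the k nearest neighbours. It is a generic fuzzy KNN with crisp training labels, not the optimally weighted OWFKNN variant proposed in the chapter. The function name, the fuzzifier `m`, and the toy data are assumptions made for this example.

```python
import math
from collections import defaultdict


def fuzzy_knn_memberships(train_points, train_labels, query, k=3, m=2.0):
    """Return {label: membership} for `query` (hypothetical helper).

    Each of the k nearest neighbours votes for its own class with weight
    1 / distance^(2 / (m - 1)); the weights are normalised to sum to 1.
    """
    # Distance from the query to every training point, nearest first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    neighbours = distances[:k]

    exponent = 2.0 / (m - 1.0)
    weights = defaultdict(float)
    for distance, label in neighbours:
        # A small epsilon avoids division by zero for exact matches.
        weights[label] += 1.0 / (distance ** exponent + 1e-9)

    total = sum(weights.values())
    return {label: weight / total for label, weight in weights.items()}


# Toy applicants described by two made-up numeric features.
train_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
train_labels = ["good", "good", "good", "default", "default"]

memberships = fuzzy_knn_memberships(train_points, train_labels, (5.0, 4.0))
print(memberships)
```

The membership assigned to the `"default"` class can be interpreted as an estimated default probability for the queried applicant. How the chapter chooses its optimal weights is not described in the abstract and is not modelled here.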

2018 ◽  
pp. 1838-1874
Author(s):  
Abbas Keramati ◽  
Niloofar Yousefi ◽  
Amin Omidvar


Author(s):  
Xiao He ◽  
Francesco Alesiani ◽  
Ammar Shaker

Many real-world large-scale regression problems can be formulated as Multi-task Learning (MTL) problems with a massive number of tasks, as in retail and transportation domains. However, existing MTL methods still fail to offer both the generalization performance and the scalability for such problems. Scaling up MTL methods to problems with a tremendous number of tasks is a big challenge. Here, we propose a novel algorithm, named Convex Clustering Multi-Task regression Learning (CCMTL), which integrates with convex clustering on the k-nearest neighbor graph of the prediction models. Further, CCMTL efficiently solves the underlying convex problem with a newly proposed optimization method. CCMTL is accurate, efficient to train, and empirically scales linearly in the number of tasks. On both synthetic and real-world datasets, the proposed CCMTL outperforms seven state-of-the-art (SoA) multi-task learning methods in terms of prediction accuracy as well as computational efficiency. On a real-world retail dataset with 23,812 tasks, CCMTL requires only around 30 seconds to train on a single thread, while the SoA methods need up to hours or even days.


Author(s):  
Nan Yan ◽  
Subin Huang ◽  
Chao Kong

Discovering entity synonymous relations is an important task for many entity-based applications. Existing entity synonymous relation extraction approaches are mainly based on lexical patterns or distributional corpus-level statistics and ignore the context semantics between entities. For example, the contexts around "apple" determine whether "apple" refers to a kind of fruit or to Apple Inc. In this paper, an entity synonymous relation extraction approach is proposed using context-aware permutation invariance. Specifically, a triplet network is used to obtain the permutation invariance between entities and to learn whether two given entities possess a synonymous relation. To capture more synonymous features, the relational context semantics and entity representations are integrated into the triplet network, which can improve the performance of extracting entity synonymous relations. The proposed approach is evaluated on three real-world datasets. Experimental results demonstrate that the approach performs better than the other compared approaches on the entity synonymous relation extraction task.


Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1692 ◽  
Author(s):  
Iván Silva ◽  
José Eugenio Naranjo

Identifying driving styles using classification models with in-vehicle data can provide automated feedback to drivers on their driving behavior, particularly on whether they are driving safely. Although several classification models have been developed for this purpose, there is no consensus on which classifier performs better at identifying driving styles. Therefore, more research is needed to evaluate classification models by comparing performance metrics. In this paper, a data-driven machine-learning methodology for classifying driving styles is introduced. This methodology is grounded in well-established machine-learning (ML) methods and literature related to driving-styles research. The methodology is illustrated through a study involving data collected from 50 drivers from two different cities in a naturalistic setting. Five features were extracted from the raw data. Fifteen experts were involved in the data labeling to derive the ground truth of the dataset. The dataset was fed to five different models: Support Vector Machines (SVM), Artificial Neural Networks (ANN), fuzzy logic, k-Nearest Neighbor (kNN), and Random Forests (RF). These models were evaluated in terms of a set of performance metrics and statistical tests. The experimental results from performance metrics showed that SVM outperformed the other four models, achieving an average accuracy of 0.96, F1-Score of 0.9595, Area Under the Curve (AUC) of 0.9730, and Kappa of 0.9375. In addition, Wilcoxon tests indicated that ANN predicts differently from the other four models. These promising results demonstrate that the proposed methodology may support researchers in making informed decisions about which ML model performs better for driving-styles classification.
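For readers unfamiliar with the metrics reported above, the sketch below shows one way to compute accuracy, F1-Score, and Cohen's Kappa from predicted and true labels. It is simplified to the binary case for brevity, whereas the study's own evaluation setup is not detailed in the abstract. The function name and the toy labels are assumptions for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, F1 and Cohen's kappa for a binary task (illustrative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )

    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    return {"accuracy": accuracy, "f1": f1, "kappa": kappa}


# Made-up labels: 1 = "safe" driving style, 0 = "unsafe".
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Kappa is lower than accuracy here because it discounts the agreement that a classifier would reach by chance given the class frequencies.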


2018 ◽  
Vol 189 ◽  
pp. 03008
Author(s):  
Xiaoshuang Qiao ◽  
Hui Wang ◽  
Gongde Guo ◽  
Yuanyuan Liu

This paper explores a new ensemble approach for novelty detection called Ensemble Probability Distribution Novelty Detection (EPDND). The proposed ensemble approach provides a metric to characterize different classes. Experimental results on four real-world datasets show that, in many cases, EPDND performs competitively with two common novelty detection approaches, Support Vector Domain Description and Gaussian Mixture Models, in terms of accuracy, recall, and F1 score.


2018 ◽  
Vol 1025 ◽  
pp. 012114
Author(s):  
M A Mukid ◽  
T Widiharih ◽  
A Rusgiyono ◽  
A Prahutama

Author(s):  
Fei-Long Chen ◽  
Feng-Chia Li

Credit scoring is an important topic for businesses and socio-economic establishments that collect huge amounts of data with the aim of avoiding wrong decisions. In this paper, the authors propose four feature selection approaches combined with four well-known classifiers: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Back-Propagation Network (BPN) and Extreme Learning Machine (ELM). These classifiers are used to find a suitable hybrid combination of feature selection and classifier that retains sufficient information for classification purposes. To this end, different credit scoring combinations are constructed by pairing the four feature selection approaches with the classifiers. Two credit data sets from the University of California, Irvine (UCI), are chosen to evaluate the accuracy of the various hybrid feature selection models. The procedures that make up the proposed approaches are described and their performance is evaluated.


2021 ◽  
Vol 25 (6) ◽  
pp. 1349-1368
Author(s):  
Chung-Chian Hsu ◽  
Wei-Cyun Tsao ◽  
Arthur Chang ◽  
Chuan-Yu Chang

Most real-world datasets are of mixed type, including both numeric and categorical attributes. Unlike numbers, operations on categorical values are limited, and the degree of similarity between distinct values cannot be measured directly. To analyze mixed-type data properly, dedicated methods for handling categorical values are needed. The limitation of most existing methods is the lack of appropriate numeric representations of categorical values. Consequently, some analysis algorithms cannot be applied. In this paper, we address this deficiency by transforming categorical values into numeric representations so as to facilitate various analyses of mixed-type data. In particular, the proposed transformation method preserves the semantics of categorical values with respect to the other values in the dataset, resulting in better performance on data analyses including classification and clustering. The proposed method is verified and compared with other methods on extensive real-world datasets.



2018 ◽  
Vol 7 (4.44) ◽  
pp. 194
Author(s):  
Intan Nurma Yulita ◽  
Mohamad Ivan Fanany ◽  
Aniati Murni Arymurthy

Autism is a brain development disorder that affects the patient's ability to communicate and interact with others. Most people with autism have sleep disorders, and because they have difficulty communicating, the problem tends to get worse. One alternative is to detect sleep disorders through polysomnography. One purpose of this test is to classify the sleep stages, which takes doctors a long time. This paper presents an automatic sleep stage classification. The classification was based on shallow classifiers, namely naive Bayes, k-nearest neighbor (KNN), multi-layer perceptron (MLP), and C4.5 (a type of decision tree). The dataset has a class imbalance problem, which this study addressed by resampling. The results show that d, a measure of the uniformity of the data distribution (0 <= d <= 1), greatly influenced the classification performance. The higher d is, the more uniform the distribution of data. The performance with d = 1 was higher than with d = 0. KNN was the best classifier. The highest accuracy and F-measure were 83.07 and 82.80, respectively.
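The role of d in the resampling step can be pictured with the sketch below. The abstract does not say how d enters the procedure, so the sketch assumes a linear interpolation between the original class proportions (d = 0) and a uniform class distribution (d = 1), followed by sampling with replacement. The function names and the toy sleep-stage labels are hypothetical.

```python
import random
from collections import Counter


def target_class_counts(labels, d):
    """Per-class sample counts for a uniformity bias d in [0, 1] (assumed scheme)."""
    counts = Counter(labels)
    n = len(labels)
    uniform_share = 1.0 / len(counts)
    targets = {}
    for label, count in counts.items():
        original_share = count / n
        # d = 0 keeps the original share, d = 1 gives every class the same share.
        share = (1.0 - d) * original_share + d * uniform_share
        targets[label] = round(share * n)
    return targets


def resample(rows, labels, d, seed=0):
    """Draw a resampled dataset whose class balance follows target_class_counts."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    resampled = []
    for label, target in target_class_counts(labels, d).items():
        # Sampling with replacement lets minority classes grow.
        picks = rng.choices(by_class[label], k=target)
        resampled.extend((row, label) for row in picks)
    return resampled


# Toy imbalanced data: eight "wake" epochs and two "rem" epochs.
labels = ["wake"] * 8 + ["rem"] * 2
rows = list(range(len(labels)))

print(target_class_counts(labels, 0.0))
print(target_class_counts(labels, 1.0))
balanced = resample(rows, labels, d=1.0)
```

Under this assumed scheme, d = 1 hands the classifier an equal number of examples per sleep stage, which is one plausible reading of why performance with d = 1 exceeded performance with d = 0.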

