Using machine learning techniques to develop prediction models for detecting unpaid credit card customers

2020 ◽  
Vol 39 (5) ◽  
pp. 6073-6087
Author(s):  
Meltem Yontar ◽  
Özge Hüsniye Namli ◽  
Seda Yanik

Customer behavior prediction has recently been gaining importance in the banking sector, as in other sectors. This study aims to propose a model to predict whether credit card users will pay their debts or not. Using the proposed model, potential non-payment risks can be predicted and necessary actions can be taken in time. To predict customers’ payment status for the next month, we use Artificial Neural Network (ANN), Support Vector Machine (SVM), Classification and Regression Tree (CART) and C4.5, which are widely used artificial intelligence and decision tree algorithms. Our dataset includes 10,713 customer records obtained from a well-known bank in Taiwan. These records consist of customer information such as the amount of credit, gender, education level, marital status, age, past payment records, invoice amount and amount of credit card payments. We apply cross-validation and hold-out methods to divide our dataset into training and test sets. Then we evaluate the algorithms with the proposed performance metrics. We also optimize the parameters of the algorithms to improve prediction performance. The results show that the model built with the CART algorithm, one of the decision tree algorithms, provides high accuracy (about 86%) in predicting customers’ payment status for the next month. When the algorithm parameters are optimized, classification accuracy and performance increase.
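
The two splitting schemes mentioned in this abstract can be sketched in a few lines of plain Python. This is only an illustration: the 30% test fraction, the 10 folds and the function names are assumptions, not values reported by the authors.

```python
import random


def hold_out_split(n_samples, test_fraction=0.3, seed=0):
    """Shuffle the sample indices once and cut them into (train, test)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


# Assumed usage on a dataset of the size quoted in the abstract.
train_idx, test_idx = hold_out_split(10713)
cv_folds = list(k_fold_splits(10713, k=10))
```

Hold-out gives a single train/test estimate, while k-fold averages over k estimates at the cost of training k models.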

2018 ◽  
Vol 2018 ◽  
pp. 1-9 ◽  
Author(s):  
Ya-Han Hu ◽  
Chun-Tien Tai ◽  
Chih-Fong Tsai ◽  
Min-Wei Huang

Digoxin is a high-alert medication because of its narrow therapeutic range and high drug-to-drug interactions (DDIs). Approximately 50% of digoxin toxicity cases are preventable, which motivated us to improve the treatment outcomes of digoxin. The objective of this study is to apply machine learning techniques to predict the appropriateness of initial digoxin dosage. A total of 307 inpatients who were treated with digoxin between 2004 and 2013 at a medical center in Taiwan were included in the study. Ten independent variables, including demographic information, laboratory data, and whether the patients had CHF, were also recorded. A patient whose serum digoxin concentration was controlled at 0.5–0.9 ng/mL after the initial digoxin dosage was defined as having an appropriate use of digoxin; otherwise, the patient was defined as having an inappropriate use of digoxin. Weka 3.7.3, an open source machine learning software package, was adopted to develop prediction models. Six machine learning techniques were considered: decision tree (C4.5), k-nearest neighbors (kNN), classification and regression tree (CART), random forest (RF), multilayer perceptron (MLP), and logistic regression (LGR). In the non-DDI group, the area under ROC curve (AUC) of RF (0.912) was excellent, followed by that of MLP (0.813), CART (0.791), and C4.5 (0.784); the remaining classifiers performed poorly. For the DDI group, the AUC of RF (0.892) was the best, followed by CART (0.795), MLP (0.777), and C4.5 (0.774); the other classifiers’ performances were less than ideal. The decision tree-based approaches and MLP exhibited markedly superior accuracy performance, regardless of DDI status. Although digoxin is a high-alert medication, its initial dose can be accurately determined by using data mining techniques such as decision tree-based and MLP approaches. Developing a dosage decision support system may serve as a supplementary tool for clinicians and also increase drug safety in clinical practice.
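
As a rough illustration of the labelling rule and the AUC metric used in this abstract, the sketch below encodes the 0.5–0.9 ng/mL criterion and computes AUC by pairwise comparison. Treating both bounds as inclusive is an assumption, as are the function names; the study itself used Weka, not this code.

```python
def is_appropriate(serum_concentration_ng_ml):
    """Label from the abstract: 0.5-0.9 ng/mL counts as appropriate use.

    Inclusive bounds are an assumption; the abstract does not say.
    """
    return 0.5 <= serum_concentration_ng_ml <= 0.9


def roc_auc(labels, scores):
    """AUC as the chance that a random positive scores above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of samples, which is acceptable for a cohort of a few hundred patients but not for large datasets.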


Analysis of credit scoring is an effective credit risk assessment technique, which is one of the major research fields in the banking sector. Machine learning has a variety of applications in the banking sector and it has been widely used for data analysis. Modern techniques such as machine learning have provided a self-regulating process to analyze the data using classification techniques. The classification method is a supervised learning process in which the computer learns from the input data provided and makes use of this information to classify new data. This research paper presents a comparison of various machine learning techniques used to evaluate credit risk. Models that decide whether a credit transaction should be accepted or rejected are trained and implemented on the dataset using different machine learning algorithms. The techniques are implemented on the German credit dataset taken from the UCI repository, which has 1000 instances and 21 attributes, on the basis of which the transactions are either accepted or rejected. This paper compares algorithms such as Support Vector Network, Neural Network, Logistic Regression, Naive Bayes, Random Forest, and Classification and Regression Trees (CART), and the results obtained show that the Random Forest algorithm was able to predict credit risk with higher accuracy.


2013 ◽  
Vol 864-867 ◽  
pp. 2782-2786
Author(s):  
Bao Hua Yang ◽  
Shuang Li

This paper studies a decision-tree-based classification algorithm for remote sensing images. The experimental area is located in the Xiangyang district, and the data source is 2010 satellite imagery fused from SPOT and TM. The decision-tree-based classification method is optimized with the help of the RuleGen module and applied to the remote sensing image of the region of interest. The precision of the maximum likelihood ratio method is 95.15 percent, and that of CART is 94.82 percent. Experimental results show that the classification method based on the classification and regression tree performs about as well as the traditional one.


2020 ◽  
Vol 14 (2) ◽  
pp. 273-284
Author(s):  
Reni Pratiwi ◽  
Memi Nor Hayati ◽  
Surya Prangga

A decision tree is an algorithm used as a reasoning procedure to obtain answers to the problems entered. Many methods can be used to build decision trees, including the C5.0 algorithm and Classification and Regression Tree (CART). The C5.0 algorithm is a non-binary decision tree in which a node can have more than two branches, while the CART algorithm is a binary decision tree in which each node has exactly two branches. This research aims to determine the classification results of the C5.0 and CART algorithms and to compare the classification accuracy of the two methods. The variables used in this research are the average monthly income (Y), employment (X1), number of family members (X2), last education (X3) and gender (X4). The analysis shows that the accuracy rate of the C5.0 algorithm is 79.17%, while the accuracy rate of CART is 84.63%. It can therefore be said that CART is the better method for classifying the average income of the people of Teluk Baru Village in Muara Ancalong District in 2019, compared with the C5.0 algorithm.
Keywords: C5.0 Algorithm, CART, Classification, Decision Tree.
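
To make the "binary branch" property of CART concrete, here is a minimal sketch of how a single two-way split on a numeric feature can be chosen by Gini impurity. The toy values and labels are hypothetical and are not the study's data; C5.0's multiway, information-gain-based splitting is not shown.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_binary_split(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' split.

    The threshold is None when no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels), key=lambda pair: pair[0])
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: number of family members (X2) against income class (Y).
family_members = [1, 2, 3, 4, 5, 6]
income_class = ["low", "low", "low", "high", "high", "high"]
split = best_binary_split(family_members, income_class)
```

A full CART tree applies this search recursively to each resulting branch and over every feature, then prunes.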


Author(s):  
K Sumanth Reddy ◽  
Gaddam Pranith ◽  
Karre Varun ◽  
Thipparthy Surya Sai Teja

The compressive strength of concrete plays an important role in determining the durability and performance of concrete. Due to rapid growth in material engineering, finalizing an appropriate mix proportion to obtain the desired compressive strength has become a cumbersome and laborious task. The problem becomes more complex still when one tries to obtain a rational relation between the concrete materials used and the strength obtained. Developments in computational methods make it possible to obtain such a relation using machine learning techniques, which reduce the influence of outliers and unwanted variables on the determination of compressive strength. In this paper, basic machine learning techniques, namely multilayer perceptron neural network (MLP), Support Vector Machines (SVM), linear regression (LR) and Classification and Regression Tree (CART), have been used to develop a model for determining the compressive strength for two different sets of data (ingredients). Among the techniques used, SVM provides better results than the others. However, SVM cannot be regarded as a universal model, because recent literature has shown that such models need more data and that the dynamics of the attributes involved play an important role in determining the efficacy of the model.


Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1578 ◽  
Author(s):  
Guanghui Hu ◽  
Weizhi Zhang ◽  
Hong Wan ◽  
Xinxin Li

In pedestrian inertial navigation, multi-sensor fusion is often used to obtain accurate heading estimates. As a widely distributed signal source, the geomagnetic field can conveniently provide sufficiently accurate heading angles. Unfortunately, artificial magnetic perturbations are widespread in indoor environments, which makes geomagnetic correction difficult. In this paper, by analyzing the spatial distribution model of the magnetic interference field on the geomagnetic field, two quantitative features have been found to be crucial in distinguishing normal magnetic data from anomalies. By leveraging these two features and the classification and regression tree (CART) algorithm, we trained a decision tree that is capable of extracting magnetic data from distorted measurements. Furthermore, this well-trained decision tree can be used as a reject gate in a Kalman filter. By combining the decision tree and Kalman filter, a high-precision indoor pedestrian navigation system based on a magnetically assisted inertial system is proposed. This system is then validated in a real indoor environment, and the results show that our system delivers state-of-the-art positioning performance. Compared to other baseline algorithms, an improvement of over 70% in the positioning accuracy is achieved.
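
The idea of using a classifier as a reject gate in a Kalman filter can be sketched with a one-dimensional heading filter. Everything specific here is an assumption for illustration: the two feature names, the thresholds in `gate_accepts` (a hand-written stand-in for the trained tree), the noise values `q` and `r`, and the omission of angle wrap-around. It is not the authors' system.

```python
def gate_accepts(features):
    """Hypothetical stand-in for the trained CART tree.

    Returns True when a magnetometer sample looks undistorted. The two
    features and both thresholds are invented for this sketch.
    """
    feature_a, feature_b = features
    return feature_a < 5.0 and feature_b < 3.0


def fuse_heading(gyro_deltas, mag_headings, mag_features,
                 q=0.01, r=4.0, heading=0.0, variance=1.0):
    """Scalar Kalman filter: gyro propagates the heading, gated magnetometer corrects it."""
    estimates = []
    for delta, measured, features in zip(gyro_deltas, mag_headings, mag_features):
        # Predict: integrate the gyro increment and grow the uncertainty.
        heading += delta
        variance += q
        # Update only when the gate classifies the magnetic sample as normal.
        if gate_accepts(features):
            gain = variance / (variance + r)
            heading += gain * (measured - heading)
            variance *= 1.0 - gain
        estimates.append(heading)
    return estimates
```

When a sample is rejected the filter simply coasts on the gyro prediction, so a distorted magnetic reading never pulls the heading estimate.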


2021 ◽  
Vol 25 (4) ◽  
pp. 929-948
Author(s):  
Shuang Yu ◽  
Xiongfei Li ◽  
Hancheng Wang ◽  
Xiaoli Zhang ◽  
Shiping Chen

In classification, the decision tree is a common model because of its simple structure and ease of interpretation. Most decision tree algorithms assume that all instances in a dataset have the same degree of confidence, so they use the same generation and pruning strategies for all training instances. In fact, instances with a greater degree of confidence are more useful than those with a lower degree of confidence in the same dataset. Therefore, instances should be treated differently according to their confidence degrees when training classifiers. In this paper, we investigate the impact and significance of the degree of confidence of instances on the classification performance of decision tree algorithms, taking the classification and regression tree (CART) algorithm as an example. First, the degree of confidence of instances is quantified from a statistical perspective. Then, a CART variant named C_CART is proposed by introducing the confidence of instances into the generation and pruning processes of the CART algorithm. Finally, we conduct experiments to evaluate the performance of the C_CART algorithm. The experimental results show that our C_CART algorithm can significantly improve generalization performance and avoid the over-fitting problem to a certain extent.
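
One simple way to let per-instance confidence influence tree generation is to weight each instance's contribution to the node impurity. The sketch below shows a confidence-weighted Gini impurity under that assumption; it is an illustration of the general idea only and should not be read as the definition of C_CART, which the abstract does not spell out.

```python
from collections import defaultdict


def weighted_gini(labels, confidences):
    """Gini impurity where each instance counts in proportion to its confidence.

    With all confidences equal this reduces to the ordinary Gini impurity.
    """
    total = sum(confidences)
    if total == 0:
        return 0.0
    mass = defaultdict(float)
    for label, confidence in zip(labels, confidences):
        mass[label] += confidence
    return 1.0 - sum((m / total) ** 2 for m in mass.values())
```

Under this weighting, a low-confidence instance of a minority class barely raises a node's impurity, so the tree is less inclined to carve out a branch just for it.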


Information ◽  
2021 ◽  
Vol 12 (9) ◽  
pp. 344
Author(s):  
Neda Rostamzadeh ◽  
Sheikh S. Abdullah ◽  
Kamran Sedig ◽  
Amit X. Garg ◽  
Eric McArthur

The use of data analysis techniques in electronic health records (EHRs) offers great promise in improving predictive risk modeling. Although useful, these analysis techniques often suffer from a lack of interpretability and transparency, especially when the data is high-dimensional. The emergence of a type of computational system known as visual analytics has the potential to address these issues by integrating data analysis techniques with interactive visualizations. This paper introduces a visual analytics system called VERONICA that utilizes the natural classification of features in EHRs to identify the group of features with the strongest predictive power. VERONICA incorporates a representative set of supervised machine learning techniques (namely, classification and regression tree, C5.0, random forest, support vector machines, and naive Bayes) to support users in developing predictive models using EHRs. It then makes the analytics results accessible through an interactive visual interface. By integrating different sampling strategies, analytics algorithms, visualization techniques, and human-data interaction, VERONICA assists users in comparing prediction models in a systematic way. To demonstrate the usefulness and utility of our proposed system, we use the clinical dataset stored at ICES to identify the best representative feature groups in detecting patients who are at high risk of developing acute kidney injury.


The scope of this research work is to identify an efficient machine learning algorithm for predicting the behavior of a student from the student performance dataset. We applied Support Vector Machines, K-Nearest Neighbor, Decision Tree and Naïve Bayes algorithms to predict the grade of a student and compared their prediction results in terms of various performance metrics. Students who visited many resources for reference, took part in academic discussions and interactions in the classroom, were absent for few days, and were cared for by their parents showed great improvement in the final grade. Among the machine learning techniques we used, SVM showed the highest accuracy in terms of four important attributes. The accuracy rate of SVM after tuning is 0.80. KNN and decision tree achieve accuracies of 0.64 and 0.65, respectively, whereas Naïve Bayes achieves 0.77.


2020 ◽  
Vol 6 (1) ◽  
Author(s):  
Maisa Cardoso Aniceto ◽  
Flavio Barboza ◽  
Herbert Kimura

Credit risk evaluation plays a relevant role for financial institutions, since lending may result in real and immediate losses. In particular, default prediction is one of the most challenging activities in managing credit risk. This study analyzes the adequacy of borrower classification models using a Brazilian bank’s loan database and explores machine learning techniques. We develop Support Vector Machine, Decision Trees, Bagging, AdaBoost and Random Forest models, and compare their predictive accuracy with a benchmark based on a Logistic Regression model. Comparisons are analyzed based on the usual classification performance metrics. Our results show that Random Forest and AdaBoost perform better than the other models. Moreover, Support Vector Machine models show poor performance using both linear and nonlinear kernels. Our findings suggest that there are value-creating opportunities for banks to improve default prediction models by exploring machine learning techniques.
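
The abstract does not list which performance metrics were used; a minimal sketch of four common ones for a binary default/non-default problem is given below. The choice of metrics, the function name and the convention that label 1 means "default" are assumptions.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary problem."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Because defaults are usually a small minority of loans, recall and precision on the default class tend to be more informative than accuracy alone when comparing models.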

