Data Attribute Selection with Information Gain to Improve Credit Approval Classification Performance using K-Nearest Neighbor Algorithm

Author(s):  
Ivandari Ivandari ◽  
Tria Titiani Chasanah ◽  
Sattriedi Wahyu Binabar ◽  
M. Adib Adib Al Karomi

Credit is a common feature of modern economic life. In practice, credit means either borrowing a sum of money or purchasing goods through gradual payments within an agreed timeframe. Unfavorable economic conditions and high household needs lead many people to buy goods on credit. Unfortunately, these needs are not always matched by the ability to make payments in accordance with the initial agreement. When the payment process is disrupted in this way, the result is known as “bad credit”. This research uses a public credit card dataset from the UCI repository and a private credit approval dataset from a local bank. The information gain algorithm is used to calculate a weight for each attribute. The calculation shows that the attributes have different weights, and the study concludes that not all data attributes influence the classification result. For example, attribute A1 in the UCI dataset and the loan type attribute in the local dataset both have an information gain weight of 0 (zero). Classification with the K-Nearest Neighbors algorithm shows an improvement of 7.53% for the UCI dataset and 3.26% for the local dataset after feature selection.
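The attribute weighting described above can be sketched in a few lines. This is a minimal illustration, assuming categorical attributes and Shannon entropy in bits; the toy data, the attribute names `loan_type` and `income`, and the "keep weights above zero" rule are hypothetical and are not taken from either dataset in the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Label entropy minus the weighted label entropy after splitting on one attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data for illustration only.
labels = ["good", "good", "bad", "bad"]
attributes = {
    "loan_type": ["A", "A", "A", "A"],          # constant, so it carries no information
    "income": ["high", "high", "low", "low"],   # separates the two classes perfectly
}

weights = {name: information_gain(column, labels) for name, column in attributes.items()}
selected = [name for name, weight in weights.items() if weight > 0]
print(weights)
print(selected)
```

An attribute that takes the same value in every record cannot reduce the label entropy, which is one way an information gain weight of exactly zero can arise.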

Author(s):  
Tsehay Admassu Assegie*

Phishing causes many problems in the business sector. Electronic commerce and electronic banking, such as mobile banking, involve a large number of online transactions. To secure these transactions, the features that distinguish legitimate websites from phishing websites must be identified. In this study, we collected data from the PhishTank public data repository and propose a K-Nearest Neighbors (KNN) based model for phishing attack detection. The proposed model detects phishing attacks through URL classification. The performance of the model is tested empirically and the results are analyzed. Experimental results on the test set show that the model is efficient at phishing attack detection. Furthermore, the value of K that gives the best accuracy is determined in order to improve detection performance. Overall, the average accuracy of the proposed model is 85.08%.
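Choosing the K value by held-out accuracy, as the abstract describes, might look like the sketch below. It assumes numeric URL features and Euclidean distance; the two features (URL length and number of dots), the toy records, and the helper names are hypothetical, and the abstract does not say which features or distance the authors used.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, held_out, k):
    """Fraction of held-out examples that KNN labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in held_out)
    return hits / len(held_out)


def best_k(train, held_out, candidates):
    """Candidate k with the highest held-out accuracy (ties go to the smaller k)."""
    return max(sorted(candidates), key=lambda k: accuracy(train, held_out, k))


# Hypothetical features: (URL length, number of dots). One training label is noisy.
train = [
    ((20, 1), "legitimate"), ((25, 2), "legitimate"), ((22, 1), "legitimate"),
    ((80, 5), "phishing"), ((90, 6), "phishing"), ((85, 4), "phishing"),
    ((87, 5), "legitimate"),  # mislabeled point that misleads k = 1
]
held_out = [((24, 1), "legitimate"), ((88, 5), "phishing")]

print(best_k(train, held_out, [1, 3, 5]))
```

With a single noisy neighbor, k = 1 copies the wrong label, while a larger odd k outvotes it. This is the usual motivation for tuning K rather than fixing it in advance.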


2018 ◽  
Vol 150 ◽  
pp. 06006 ◽  
Author(s):  
Rozlini Mohamed ◽  
Munirah Mohd Yusof ◽  
Noorhaniza Wahidi

Feature selection is the process of selecting the best features from the large number of features in a dataset. The challenge is to select a subset that gives better performance under a given classifier. To produce better classification results, feature selection has been applied in many classification studies as a preprocessing step, in which only a subset of the features is used rather than all the features of a particular dataset. This procedure not only removes irrelevant features but in some cases also increases classification performance, because the sample size is finite. In this study, Chi-Square (CH), Information Gain (IG) and Bat Algorithm (BA) are used to obtain feature subsets on fourteen well-known datasets from various applications. To measure the performance of the selected features, three benchmark classifiers are used: k-Nearest Neighbor (kNN), Naïve Bayes (NB) and Decision Tree (DT). The paper then analyzes the performance of all classifiers with feature selection in terms of accuracy, sensitivity, F-Measure and ROC. The objective of the study is to identify which feature selection techniques, conventional or heuristic, perform best across various applications.


Information ◽  
2019 ◽  
Vol 11 (1) ◽  
pp. 12 ◽  
Author(s):  
Radmila Janković

Image classification is one of the most important tasks in the digital era. In terms of cultural heritage, it is important to develop classification methods that obtain good accuracy, but also are less computationally intensive, as image classification usually uses very large sets of data. This study aims to train and test four classification algorithms: (i) the multilayer perceptron, (ii) averaged one dependence estimators, (iii) forest by penalizing attributes, and (iv) the k-nearest neighbor rough sets and analogy based reasoning, and compares these with the results obtained from the Convolutional Neural Network (CNN). Three types of features were extracted from the images: (i) the edge histogram, (ii) the color layout, and (iii) the JPEG coefficients. The algorithms were tested before and after applying the attribute selection, and the results indicated that the best classification performance was obtained for the multilayer perceptron in both cases.


2020 ◽  
Vol 17 (1) ◽  
pp. 319-328
Author(s):  
Ade Muchlis Maulana Anwar ◽  
Prihastuti Harsani ◽  
Aries Maesya

Population data is individual or aggregate data that is structured as a result of population registration and civil registration activities. A birth certificate is a civil registration deed that records the birth of a baby; the birth is reported so that it can be registered on the Family Card and assigned a Population Identification Number (NIK), which is the basis for obtaining other community services. Of the 570,637 integrated birth certificate reports in the 2018 Population Administration Information System (SIAK), 503,946 were reported late and only 66,691 were reported publicly. Clustering is a method that groups similar data together and places dissimilar data in other groups. K-Nearest Neighbor is a method for classifying objects based on the training data closest to the test data. k-means is a method that divides a set of objects into groups by looking at the midpoint of each group. In the data mining preprocessing stage, the data is cleaned by filling in blank values with the most frequent value, and attributes are selected using the information gain method. Using 10,000 birth certificate records from 2019, the k-nearest neighbor method for predicting late reporting and the k-means method for classifying priority service areas perform reasonably well: the predictions reach an accuracy of 74.00%, and k-means with K = 2 yields a Davies-Bouldin index of 1,179.
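For readers unfamiliar with the Davies-Bouldin index reported above, the sketch below computes it under its standard definition: the scatter of a cluster is the mean Euclidean distance of its points to the centroid, and each cluster contributes its worst ratio of summed scatters to centroid separation. The two toy clusters are hypothetical and unrelated to the birth certificate data.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (lower means better separated)."""
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance from each point to its own cluster centroid.
    scatter = [
        sum(math.dist(p, m) for p in c) / len(c)
        for c, m in zip(clusters, centers)
    ]
    total = 0.0
    for i in range(len(clusters)):
        # Worst-case similarity between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(len(clusters))
            if j != i
        )
    return total / len(clusters)


# Hypothetical, well-separated clusters for illustration.
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 0.0), (12.0, 0.0)]]
print(davies_bouldin(clusters))
```

Compact clusters that sit far apart give a value near zero, and the index grows as the clusters become more spread out or move closer together.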


Mathematics ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 779
Author(s):  
Ruriko Yoshida

A tropical ball is a ball defined by the tropical metric over the tropical projective torus. In this paper we show several properties of tropical balls over the tropical projective torus and also over the space of phylogenetic trees with a given set of leaf labels. Then we discuss its application to the K nearest neighbors (KNN) algorithm, a supervised learning method used to classify a high-dimensional vector into given categories by looking at a ball centered at the vector, which contains K vectors in the space.
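As a small illustration of the objects involved, the sketch below uses the usual definition of the tropical metric, d(x, y) = max_i(x_i - y_i) - min_i(x_i - y_i), which does not change when a constant is added to every coordinate of a vector. The helper names and sample vectors are hypothetical, and the neighbor search is a plain sort, not the paper's construction.

```python
def tropical_distance(x, y):
    """Tropical metric: max_i(x_i - y_i) - min_i(x_i - y_i)."""
    diffs = [a - b for a, b in zip(x, y)]
    return max(diffs) - min(diffs)


def in_tropical_ball(center, radius, point):
    """True if `point` lies in the closed tropical ball around `center`."""
    return tropical_distance(center, point) <= radius


def k_nearest(points, query, k):
    """The k points closest to `query` under the tropical metric."""
    return sorted(points, key=lambda p: tropical_distance(p, query))[:k]


x = (0, 0, 0)
y = (1, 2, 4)
print(tropical_distance(x, y))          # differences are -1, -2, -4
print(tropical_distance((5, 5, 5), y))  # x shifted by a constant vector
```

Both calls return the same distance, because shifting `x` by a multiple of the all-ones vector leaves it in the same class of the tropical projective torus.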


Author(s):  
Aishwarya Priyadarshini ◽  
Sanhita Mishra ◽  
Debani Prasad Mishra ◽  
Surender Reddy Salkuti ◽  
Ramakanta Mohanty

Nowadays, fraudulent or deceitful activities associated with financial transactions, predominantly using credit cards, have been increasing at an alarming rate and are one of the most prevalent activities in finance industries, corporate companies, and other government organizations. It is therefore essential to incorporate a fraud detection system that mainly consists of intelligent fraud detection techniques to keep in view the consumer and clients’ welfare alike. Numerous fraud detection procedures, techniques, and systems in the literature have been implemented by employing a myriad of intelligent techniques, including algorithms and frameworks, to detect fraudulent and deceitful transactions. This paper initially analyses the data through exploratory data analysis and then proposes various classification models that are implemented using intelligent soft computing techniques to predictively classify fraudulent credit card transactions. Classification algorithms such as K-Nearest Neighbor (K-NN), decision tree, random forest (RF), and logistic regression (LR) have been implemented to critically evaluate their performances. The proposed model is computationally efficient, light-weight and can be used for credit card fraudulent transaction detection with better accuracy.


Author(s):  
Wei Yan

In cloud computing environments, parallel kNN queries over big data are an important issue. The k nearest neighbor query (kNN query), designed to find the k nearest neighbors in a dataset S for every object in another dataset R, is a primitive operator widely adopted by many applications, including knowledge discovery, data mining, and spatial databases. This chapter proposes a parallel method for kNN queries over big data using the MapReduce programming model. First, the chapter proposes an approximate algorithm based on mapping multi-dimensional data sets into two-dimensional data sets and transforming kNN queries into a sequence of two-dimensional point searches. Then, in two-dimensional space, the chapter proposes a partitioning method using a Voronoi diagram, which incorporates the Voronoi diagram into an R-tree. Furthermore, the chapter proposes an efficient algorithm for processing kNN queries based on the R-tree using the MapReduce programming model. Finally, the chapter presents the results of extensive experimental evaluations, which indicate the efficiency of the proposed approach.


Author(s):  
Wei Yan

Parallel k nearest neighbor queries over massive spatial data are an important issue. The k nearest neighbor query (kNN query), designed to find the k nearest neighbors in a dataset S for every point in another dataset R, is a useful tool widely adopted by many applications, including knowledge discovery, data mining, and spatial databases. In cloud computing environments, the MapReduce programming model is a well-accepted framework for data-intensive applications over clusters of computers. This chapter proposes a parallel method for kNN queries based on clusters in the MapReduce programming model. First, the chapter proposes a method for partitioning spatial data using a Voronoi diagram. Then, the chapter clusters the data points after partitioning using the k-means method. Furthermore, the chapter proposes an efficient algorithm for processing kNN queries based on k-means clusters using the MapReduce programming model. Finally, extensive experiments evaluate the efficiency of the proposed approach.
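The kNN query defined in these two chapters can be stated as a brute-force baseline. This sketch is single-machine and exhaustive, so it is not the MapReduce, Voronoi, R-tree or k-means method the chapters propose; it only pins down the result that such methods aim to compute faster. Euclidean distance and the sample points are assumptions.

```python
import heapq
import math


def knn_join(R, S, k):
    """For every point r in R, return the k points of S closest to r (brute force)."""
    return {
        r: heapq.nsmallest(k, S, key=lambda s: math.dist(r, s))
        for r in R
    }


# Hypothetical 2-D points for illustration.
R = [(0, 0), (10, 10)]
S = [(1, 0), (0, 2), (9, 9), (5, 5)]

result = knn_join(R, S, 2)
print(result[(0, 0)])
print(result[(10, 10)])
```

The baseline compares every point of R with every point of S, which is the quadratic cost that partitioning and indexing schemes are designed to avoid.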


Sensors ◽  
2019 ◽  
Vol 19 (12) ◽  
pp. 2814 ◽  
Author(s):  
Xiaoguang Liu ◽  
Huanliang Li ◽  
Cunguang Lou ◽  
Tie Liang ◽  
Xiuling Liu ◽  
...  

Falls are the major cause of fatal and non-fatal injury among people aged more than 65 years. Due to the grave consequences of falls, thorough research on them is necessary. This paper presents a method for fall detection using surface electromyography (sEMG) based on an improved dual parallel channels convolutional neural network (IDPC-CNN). The proposed IDPC-CNN model is designed to identify falls from daily activities using the spectral features of sEMG. First, the classification accuracies of time domain features and spectrograms are compared using linear discriminant analysis (LDA), k-nearest neighbor (KNN) and support vector machine (SVM). Results show that spectrograms provide a richer way to extract pattern information and better classification performance. Therefore, the spectrogram features of sEMG are selected as the input of IDPC-CNN to distinguish between daily activities and falls. Finally, the IDPC-CNN is compared with SVM and three CNNs of different structure under the same conditions. Experimental results show that the proposed IDPC-CNN achieves 92.55% accuracy, 95.71% sensitivity and 91.7% specificity. Overall, the IDPC-CNN is more effective than the comparison methods in accuracy, efficiency, training and generalization.
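The three reported measures follow from a binary confusion matrix, as in the sketch below. It assumes "fall" is the positive class; the label strings and the five toy predictions are hypothetical and do not reproduce the paper's results.

```python
def binary_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity and specificity for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # falls detected
    tn = sum(t != positive and p != positive for t, p in pairs)  # daily activity kept
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed falls
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical labels for illustration.
y_true = ["fall", "fall", "fall", "daily", "daily"]
y_pred = ["fall", "fall", "daily", "daily", "fall"]

metrics = binary_metrics(y_true, y_pred, positive="fall")
print(metrics)
```

Sensitivity is the share of real falls that are detected, and specificity is the share of daily activities that are not flagged as falls, which is why fall detection studies report both alongside accuracy.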

