Data Attribute Selection with Information Gain to Improve Credit Approval Classification Performance using K-Nearest Neighbor Algorithm

Credit is a common feature of modern economic life. In practice, credit means either borrowing a sum of money or purchasing goods and paying for them in installments within an agreed timeframe. Difficult economic conditions and high household needs lead many people to buy goods on credit. Unfortunately, these needs are not always matched by the ability to repay according to the initial agreement. When that happens, the payment process is disrupted, a situation known as “bad credit”. This research uses a public credit card dataset from the UCI repository and a private credit approval dataset from a local bank. The information gain algorithm is used to calculate a weight for each attribute. The results show that the attributes have different weights and that not all attributes influence the classification result. For example, attribute A1 in the UCI dataset and the loan type attribute in the local dataset both have an information gain weight of 0 (zero). Classification with the K-Nearest Neighbors algorithm shows an increase in classification performance of 7.53% for the UCI dataset and 3.26% for the local dataset after feature selection.
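
The attribute weighting described in this abstract can be illustrated with a minimal sketch of information gain for categorical attributes. The toy records, the attribute names `income` and `loan_type`, and the rule of keeping only attributes with a weight above zero are assumptions made for illustration; they are not taken from either dataset or from the authors' implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on one attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data, not taken from either dataset in the study.
approved = ["yes", "yes", "no", "no"]
income = ["high", "high", "low", "low"]   # perfectly separates the classes
loan_type = ["a", "b", "a", "b"]          # carries no information about the class

weights = {
    "income": information_gain(income, approved),
    "loan_type": information_gain(loan_type, approved),
}

# Assumed selection rule: drop attributes whose weight is zero.
selected = [name for name, weight in weights.items() if weight > 0]
```

In this toy case `loan_type` receives a weight of zero and is dropped, which mirrors the kind of attribute the abstract reports as having no influence on the classification result.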


K-Nearest Neighbor Based URL Identification Model for Phishing Attack Detection

Indian Journal of Artificial Intelligence and Neural Networking ◽

10.35940/ijainn.b1019.041221 ◽

2021 ◽

Vol 1 (2) ◽

pp. 18-21

Author(s):

Tsehay Admassu Assegie

Keyword(s):

Nearest Neighbor ◽

Attack Detection ◽

Experimental Result ◽

Data Repository ◽

K Nearest Neighbor ◽

K Nearest Neighbors ◽

K Value ◽

Proposed Model ◽

Public Data ◽

Public Data Repository

Phishing causes many problems in business. Electronic commerce and electronic banking, such as mobile banking, involve a large number of online transactions. To keep such transactions secure, we have to identify the features that discriminate legitimate websites from phishing websites. In this study, we collected data from the PhishTank public data repository and propose a K-Nearest Neighbors (KNN) based model for phishing attack detection. The proposed model detects phishing attacks through URL classification. The performance of the model is tested empirically and the results are analyzed. Experimental results on the test set show that the model is efficient at phishing attack detection. Furthermore, the K value that gives the best accuracy is determined in order to improve detection performance. Overall, the average accuracy of the proposed model is 85.08%.
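
As a rough illustration of KNN classification and of comparing several K values on a held-out set, the sketch below uses plain Euclidean distance and majority voting. The two URL features (scaled length and scaled dot count), the sample points, and the candidate K values are all hypothetical; the abstract does not state which features or K values the study used.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to query (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


# Hypothetical features per URL: (length / 100, number of dots / 10).
train = [
    ((0.20, 0.1), "legitimate"), ((0.30, 0.2), "legitimate"), ((0.25, 0.1), "legitimate"),
    ((0.90, 0.5), "phishing"), ((0.80, 0.6), "phishing"), ((0.85, 0.4), "phishing"),
]
test = [((0.22, 0.15), "legitimate"), ((0.88, 0.55), "phishing")]

# Evaluate a few candidate K values and keep the one with the highest accuracy.
scores = {k: accuracy(train, test, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
```

On real data the scores would normally differ between K values; odd values of K are a common choice for two-class problems because they avoid tied votes.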


K-Nearest Neighbor Based URL Identification Model for Phishing Attack Detection

Indian Journal of Artificial Intelligence and Neural Networking ◽

10.54105/ijainn.b1019.041221 ◽

2021 ◽

pp. 18-21

Author(s):

Tsehay Admassu Assegie ◽

Keyword(s):

Nearest Neighbor ◽

Attack Detection ◽

Experimental Result ◽

Data Repository ◽

K Nearest Neighbor ◽

K Nearest Neighbors ◽

K Value ◽

Proposed Model ◽

Public Data ◽

Public Data Repository


A Comparative Study of Feature Selection Techniques for Bat Algorithm in Various Applications

MATEC Web of Conferences ◽

10.1051/matecconf/201815006006 ◽

2018 ◽

Vol 150 ◽

pp. 06006 ◽

Cited By ~ 2

Author(s):

Rozlini Mohamed ◽

Munirah Mohd Yusof ◽

Noorhaniza Wahidi

Keyword(s):

Feature Selection ◽

Nearest Neighbor ◽

Information Gain ◽

Bat Algorithm ◽

Classification Performance ◽

Finite Sample ◽

K Nearest Neighbor ◽

Chi Square ◽

Finite Sample Size ◽

Feature Selection Techniques

Feature selection is the process of selecting the best features from the large number of features in a dataset. The difficulty lies in selecting a subset that performs better under a given classifier. To produce better classification results, feature selection has been applied in many classification works as a preprocessing step, where only a subset of features is used rather than all the features of a particular dataset. This procedure not only removes irrelevant features but in some cases also increases classification performance, because the sample size is finite. In this study, Chi-Square (CH), Information Gain (IG) and Bat Algorithm (BA) are used to obtain feature subsets on fourteen well-known datasets from various applications. To measure the performance of the selected features, three benchmark classifiers are used: k-Nearest Neighbor (kNN), Naïve Bayes (NB) and Decision Tree (DT). This paper then analyzes the performance of all classifiers with feature selection in terms of accuracy, sensitivity, F-Measure and ROC. The objective of this study is to identify which feature selection techniques perform best, among conventional and heuristic techniques, in various applications.


Machine Learning Models for Cultural Heritage Image Classification: Comparison Based on Attribute Selection

Information ◽

10.3390/info11010012 ◽

2019 ◽

Vol 11 (1) ◽

pp. 12 ◽

Cited By ~ 3

Author(s):

Radmila Janković

Keyword(s):

Cultural Heritage ◽

Image Classification ◽

Multilayer Perceptron ◽

Nearest Neighbor ◽

Classification Performance ◽

Attribute Selection ◽

K Nearest Neighbor ◽

Large Sets ◽

Before And After ◽

Computationally Intensive

Image classification is one of the most important tasks in the digital era. In terms of cultural heritage, it is important to develop classification methods that obtain good accuracy, but also are less computationally intensive, as image classification usually uses very large sets of data. This study aims to train and test four classification algorithms: (i) the multilayer perceptron, (ii) averaged one dependence estimators, (iii) forest by penalizing attributes, and (iv) the k-nearest neighbor rough sets and analogy based reasoning, and compares these with the results obtained from the Convolutional Neural Network (CNN). Three types of features were extracted from the images: (i) the edge histogram, (ii) the color layout, and (iii) the JPEG coefficients. The algorithms were tested before and after applying the attribute selection, and the results indicated that the best classification performance was obtained for the multilayer perceptron in both cases.


PENENTUAN DAERAH PRIORITAS PELAYANAN AKTA KELAHIRAN DENGAN METODE K-NN DAN K-MEANS [Determining Priority Areas for Birth Certificate Services Using the K-NN and K-Means Methods]

Komputasi: Jurnal Ilmiah Ilmu Komputer dan Matematika ◽

10.33751/komputasi.v17i1.1735 ◽

2020 ◽

Vol 17 (1) ◽

pp. 319-328

Author(s):

Ade Muchlis Maulana Anwar ◽

Prihastuti Harsani ◽

Aries Maesya

Keyword(s):

Nearest Neighbor ◽

Information Gain ◽

Birth Certificate ◽

Population Data ◽

Community Services ◽

Birth Certificates ◽

Similar Data ◽

K Nearest Neighbor ◽

Civil Registration ◽

The Family

Population data is individual or aggregate data structured as a result of Population Registration and Civil Registration activities. A birth certificate is a civil registration deed that records the birth of a baby; the birth is reported so that it can be registered on the Family Card and the baby given a Population Identification Number (NIK), which is the basis for obtaining other community services. Of the 570,637 integrated birth certificate reports in the 2018 Population Administration Information System (SIAK), 503,946 were reported late and only 66,691 were reported publicly. Clustering is a method for placing similar data together in one group and dissimilar data in other groups. K-Nearest Neighbor is a method for classifying objects based on the training data closest to the test data. K-means is a method for dividing a number of objects into groups based on existing categories by looking at the midpoint of each group. In the data mining preprocessing step, the data is cleaned by filling in blank values with the most frequent value, and attributes are selected using the information gain method. The k-nearest neighbor method is used to predict delays in reporting and the k-means method to group priority service areas, using 10,000 birth certificate records from 2019. Performance is good enough to produce predictions with an accuracy of 74.00%, and k-means with K = 2 produces a Davies-Bouldin index of 1.179.
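
The Davies-Bouldin index reported above measures how compact and well separated a set of clusters is, with lower values being better. The following is a minimal sketch of the standard definition (average, over clusters, of the worst ratio of combined within-cluster scatter to centroid distance) using Euclidean distance. The two small clusters are hypothetical points chosen for illustration, not birth certificate data.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points)."""
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance from each point to its own cluster centroid.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    worst = []
    for i in range(len(clusters)):
        ratios = [
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(len(clusters))
            if j != i
        ]
        worst.append(max(ratios))
    return sum(worst) / len(clusters)


# Hypothetical clustering with K = 2.
clusters = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
index = davies_bouldin(clusters)
```

With K = 2 the index reduces to a single ratio: the sum of the two cluster scatters divided by the distance between the two centroids.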


Tropical Balls and Its Applications to K Nearest Neighbor over the Space of Phylogenetic Trees

Mathematics ◽

10.3390/math9070779 ◽

2021 ◽

Vol 9 (7) ◽

pp. 779

Author(s):

Ruriko Yoshida

Keyword(s):

Supervised Learning ◽

Phylogenetic Trees ◽

Nearest Neighbor ◽

Nearest Neighbors ◽

High Dimensional ◽

Learning Method ◽

Dimensional Vector ◽

K Nearest Neighbor ◽

K Nearest Neighbors

A tropical ball is a ball defined by the tropical metric over the tropical projective torus. In this paper we show several properties of tropical balls over the tropical projective torus and also over the space of phylogenetic trees with a given set of leaf labels. Then we discuss its application to the K nearest neighbors (KNN) algorithm, a supervised learning method used to classify a high-dimensional vector into given categories by looking at a ball centered at the vector, which contains K vectors in the space.
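
To make the idea concrete, the sketch below uses the usual definition of the tropical metric on the tropical projective torus, d(u, v) = max_i(u_i - v_i) - min_i(u_i - v_i), and plugs it into a plain majority-vote KNN classifier. The formula is the standard one rather than a quotation from the paper, and the labelled vectors are hypothetical; they are not real phylogenetic tree encodings.

```python
from collections import Counter


def tropical_distance(u, v):
    """Tropical metric: max minus min of the coordinate-wise differences."""
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)


def tropical_knn(train, query, k):
    """Majority vote among the k training vectors nearest to query in the tropical metric."""
    nearest = sorted(train, key=lambda item: tropical_distance(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical labelled vectors.
train = [
    ((0, 0, 0), "x"), ((0, 1, 0), "x"), ((0, 0, 1), "x"),
    ((0, 5, 9), "y"), ((0, 6, 9), "y"),
]
predicted = tropical_knn(train, (0, 1, 1), 3)
```

Adding the same constant to every coordinate of a vector leaves the distance unchanged, which is why the metric is well defined on the projective torus: `tropical_distance((0, 0, 0), (1, 3, 2))` and `tropical_distance((5, 5, 5), (1, 3, 2))` give the same value.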


Fraudulent credit card transaction detection using soft computing techniques

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v23.i3.pp1634-1642 ◽

2021 ◽

Vol 23 (3) ◽

pp. 1634

Author(s):

Aishwarya Priyadarshini ◽

Sanhita Mishra ◽

Debani Prasad Mishra ◽

Surender Reddy Salkuti ◽

Ramakanta Mohanty

Keyword(s):

Soft Computing ◽

Credit Card ◽

Nearest Neighbor ◽

Detection System ◽

Fraud Detection ◽

K Nearest Neighbor ◽

Computationally Efficient ◽

Detection Techniques ◽

Government Organizations ◽

Soft Computing Techniques

Nowadays, fraudulent activities associated with financial transactions, predominantly those using credit cards, are increasing at an alarming rate and are among the most prevalent problems in the finance industry, corporate companies, and government organizations. It is therefore essential to incorporate a fraud detection system built mainly on intelligent fraud detection techniques to protect the welfare of consumers and clients alike. Numerous fraud detection procedures, techniques, and systems in the literature employ a myriad of intelligent techniques, including algorithms and frameworks, to detect fraudulent transactions. This paper first analyses the data through exploratory data analysis and then proposes various classification models, implemented using intelligent soft computing techniques, to classify fraudulent credit card transactions. Classification algorithms such as K-Nearest neighbor (K-NN), decision tree, random forest (RF), and logistic regression (LR) have been implemented and their performance critically evaluated. The proposed model is computationally efficient and lightweight, and can be used for credit card fraudulent transaction detection with better accuracy.


Parallel kNN Queries for Big Data Based on Voronoi Diagram Using MapReduce

Advances in Data Mining and Database Management - Handbook of Research on Innovative Database Query Processing Techniques ◽

10.4018/978-1-4666-8767-7.ch014 ◽

2015 ◽

pp. 392-414

Author(s):

Wei Yan

Keyword(s):

Big Data ◽

Voronoi Diagram ◽

Spatial Databases ◽

Nearest Neighbor ◽

Programming Model ◽

Dimensional Space ◽

Data Sets ◽

Two Dimensional ◽

K Nearest Neighbor ◽

K Nearest Neighbors

In cloud computing environments, parallel kNN queries over big data are an important issue. The k nearest neighbor query (kNN query), designed to find the k nearest neighbors from a dataset S for every object in another dataset R, is a primitive operator widely adopted by many applications, including knowledge discovery, data mining, and spatial databases. This chapter proposes a parallel method for kNN queries over big data using the MapReduce programming model. First, it proposes an approximate algorithm based on mapping multi-dimensional data sets into two-dimensional data sets and transforming kNN queries into a sequence of two-dimensional point searches. Then, in two-dimensional space, it proposes a partitioning method using the Voronoi diagram, which incorporates the Voronoi diagram into an R-tree. Furthermore, it proposes an efficient algorithm for processing kNN queries based on the R-tree using the MapReduce programming model. Finally, it presents the results of extensive experimental evaluations, which indicate the efficiency of the proposed approach.
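
The kNN query defined in this abstract (for every object in R, find its k nearest neighbors in S) can be written as a brute-force, single-machine reference. This sketch is only a baseline for what the operator computes; it is not the chapter's MapReduce, Voronoi, or R-tree method, and the sample points are hypothetical.

```python
import heapq
import math


def knn_join(R, S, k):
    """For each r in R, return the k points of S nearest to r (Euclidean), closest first."""
    return {r: heapq.nsmallest(k, S, key=lambda s: math.dist(r, s)) for r in R}


# Hypothetical two-dimensional datasets.
R = [(0.0, 0.0), (5.0, 5.0)]
S = [(1.0, 0.0), (0.0, 2.0), (4.0, 4.0), (6.0, 5.0), (9.0, 9.0)]

result = knn_join(R, S, 2)
```

The brute-force version compares every pair of points, so its cost grows with the product of the two dataset sizes, which is the scaling problem that partitioned and parallel approaches aim to reduce.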


Parallel Queries of Cluster-Based k Nearest Neighbor in MapReduce

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Managing Big Data in Cloud Computing Environments ◽

10.4018/978-1-4666-9834-5.ch007 ◽

2016 ◽

pp. 163-182

Author(s):

Wei Yan

Keyword(s):

Spatial Data ◽

Spatial Databases ◽

Nearest Neighbor ◽

Programming Model ◽

K Nearest Neighbor ◽

K Nearest Neighbors ◽

Data Intensive ◽

Parallel Queries ◽

Massive Spatial Data ◽

Nearest Neighbor Queries

Parallel k nearest neighbor queries over massive spatial data are an important issue. The k nearest neighbor query (kNN query), designed to find the k nearest neighbors from a dataset S for every point in another dataset R, is a useful tool widely adopted by many applications, including knowledge discovery, data mining, and spatial databases. In cloud computing environments, the MapReduce programming model is a well-accepted framework for data-intensive applications over clusters of computers. This chapter proposes a parallel method for kNN queries based on clusters in the MapReduce programming model. First, it proposes a method for partitioning spatial data using the Voronoi diagram. Then, it clusters the data points after partitioning using the k-means method. Furthermore, it proposes an efficient algorithm for processing kNN queries based on k-means clusters using the MapReduce programming model. Finally, extensive experiments evaluate the efficiency of the proposed approach.
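
One way to picture the Voronoi-based partitioning step is to assign each point to the cell of its nearest pivot, so that each cell can later be processed independently. The sketch below is a single-machine illustration under that assumption; the pivot choice, the sample points, and the function name `voronoi_partition` are hypothetical and do not reproduce the chapter's MapReduce implementation.

```python
import math
from collections import defaultdict


def voronoi_partition(points, pivots):
    """Group points by the index of their nearest pivot (their Voronoi cell)."""
    cells = defaultdict(list)
    for p in points:
        nearest = min(range(len(pivots)), key=lambda i: math.dist(p, pivots[i]))
        cells[nearest].append(p)
    return dict(cells)


# Hypothetical pivots and spatial points.
pivots = [(0.0, 0.0), (10.0, 10.0)]
points = [(1.0, 1.0), (9.0, 8.0), (2.0, 0.0), (7.0, 9.0)]

cells = voronoi_partition(points, pivots)
```

In a MapReduce setting the cell index would naturally serve as the key emitted by the map phase, so that all points of one cell reach the same reducer.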


A New Approach to Fall Detection Based on Improved Dual Parallel Channels Convolutional Neural Network

Sensors ◽

10.3390/s19122814 ◽

2019 ◽

Vol 19 (12) ◽

pp. 2814 ◽

Cited By ~ 2

Author(s):

Xiaoguang Liu ◽

Huanliang Li ◽

Cunguang Lou ◽

Tie Liang ◽

Xiuling Liu ◽

...

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Nearest Neighbor ◽

Fall Detection ◽

Classification Performance ◽

Daily Activities ◽

Support Vector ◽

K Nearest Neighbor ◽

Linear Discriminant ◽

Parallel Channels

Falls are the major cause of fatal and non-fatal injury among people aged more than 65 years. Because of the grave consequences of falls, thorough research on them is necessary. This paper presents a method for fall detection using surface electromyography (sEMG) based on an improved dual parallel channels convolutional neural network (IDPC-CNN). The proposed IDPC-CNN model is designed to distinguish falls from daily activities using the spectral features of sEMG. First, the classification accuracy of time domain features and spectrograms is compared using linear discriminant analysis (LDA), k-nearest neighbor (KNN) and support vector machine (SVM). Results show that spectrograms provide a richer way to extract pattern information and better classification performance. Therefore, the spectrogram features of sEMG are selected as the input of IDPC-CNN to distinguish between daily activities and falls. Finally, the IDPC-CNN is compared with SVM and three CNNs with different structures under the same conditions. Experimental results show that the proposed IDPC-CNN achieves 92.55% accuracy, 95.71% sensitivity and 91.7% specificity. Overall, the IDPC-CNN is more effective than the compared methods in accuracy, efficiency, training and generalization.
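
The three figures reported above follow the usual confusion-matrix definitions, which the short sketch below spells out for a binary fall / non-fall classifier. The counts passed in are hypothetical and are not the study's results.

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp: falls detected as falls          fn: falls missed
    tn: daily activities kept as such    fp: daily activities flagged as falls
    """
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=45, fn=5, tn=40, fp=10)
```

Sensitivity is the share of real falls that are caught, and specificity is the share of daily activities that are correctly left unflagged, so a detector can score well on one while doing poorly on the other.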
