GEOMETRIC PROXIMITY GRAPHS FOR IMPROVING NEAREST NEIGHBOR METHODS IN INSTANCE-BASED LEARNING AND DATA MINING

2005 ◽  
Vol 15 (02) ◽  
pp. 101-150 ◽  
Author(s):  
GODFRIED TOUSSAINT

In the typical nonparametric approach to classification in instance-based learning and data mining, random data (the training set of patterns) are collected and used to design a decision rule (classifier). One of the most well-known such rules is the k-nearest-neighbor decision rule (also known as lazy learning), in which an unknown pattern is classified into the majority class among its k nearest neighbors in the training set. Several questions related to this rule have received considerable attention over the years. Such questions include the following. How can the storage of the training set be reduced without degrading the performance of the decision rule? How should the reduced training set be selected to represent the different classes? How large should k be? How should the value of k be chosen? Should all k neighbors be equally weighted when used to decide the class of an unknown pattern? If not, how should the weights be chosen? Should all the features (attributes) be weighted equally, and if not, how should the feature weights be chosen? What distance metric should be used? How can the rule be made robust to overlapping classes or noise present in the training data? How can the rule be made invariant to scaling of the measurements? How can the nearest neighbors of a new point be computed efficiently? What is the smallest neural network that can implement nearest neighbor decision rules? Geometric proximity graphs such as Voronoi diagrams and their many relatives provide elegant solutions to these problems, as well as to other related data mining problems such as outlier detection. After a non-exhaustive review of some of the classical canonical approaches to these problems, the methods that use proximity graphs are discussed, some new observations are made, and open problems are listed.
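
As a concrete point of reference for the decision rule discussed above, here is a minimal sketch of the basic k-nearest-neighbor rule with an unweighted majority vote and a plain Euclidean metric; the function and data names are illustrative, not taken from the paper.

```python
from collections import Counter

import numpy as np

def knn_classify(x, train_X, train_y, k=3):
    """Classify x into the majority class among its k nearest training patterns."""
    dists = np.linalg.norm(train_X - x, axis=1)   # Euclidean distance to every training pattern
    nearest = np.argsort(dists)[:k]               # indices of the k closest patterns
    return Counter(train_y[nearest]).most_common(1)[0][0]  # unweighted majority vote

# Toy usage with two 2-D classes
train_X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
train_y = np.array(["A", "A", "B", "B"])
print(knn_classify(np.array([0.2, 0.1]), train_X, train_y, k=3))  # -> "A"
```

Most of the questions listed in the abstract (choice of k, neighbor and feature weighting, choice of metric, condensing the training set) are refinements of exactly this baseline.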

2015 ◽  
Vol 2 (1) ◽  
pp. 1
Author(s):  
Agung Nugroho ◽  
Kusrini Kusrini ◽  
M. Rudyanto Arief

Many factors and variables affect credit risk in decision making for People's Business Credit (Kredit Usaha Rakyat, KUR). The factors used as the basis for KUR assessment at PT Bank Rakyat Indonesia Unit Kaliangkrik follow the basic principle known as the "5 C's of Credit": Character, Capacity, Capital, Condition, and Collateral. From these assessment factors, a classification-rule mining method is used to build a decision support system for granting KUR. Several data mining algorithms can be used for classification; one of them is the k-nearest neighbor algorithm. The proposed decision support system classifies an object based on the training data closest to it and, given input from the user, recommends which customers are eligible to receive KUR using the k-nearest neighbors (kNN) method. Payment transaction data of existing customers are used as training data, after their classes have first been determined. Class assignment is carried out by classifying the data according to customer status categories based on the amount of overdue credit payments. From the similarity computed between a prospective new customer and the existing customers (the training data) using the K-Nearest Neighbor algorithm, the result with the highest value serves as the reference for the decision maker.
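
As an illustration of the similarity step described above, the sketch below ranks historical customers by Euclidean distance to a new applicant over numerically encoded "5 C's" scores; the encoding, scores, and labels are invented for the example and are not the study's data.

```python
import numpy as np

# Hypothetical numeric scores for (character, capacity, capital, condition, collateral)
# of past customers, together with their known repayment outcome.
train_X = np.array([
    [4, 3, 5, 4, 4],
    [2, 2, 1, 3, 2],
    [5, 4, 4, 4, 5],
    [1, 2, 2, 2, 1],
])
train_y = ["eligible", "not eligible", "eligible", "not eligible"]

def rank_by_similarity(applicant, train_X, train_y):
    """Return past cases ordered from most to least similar (smallest distance first)."""
    dists = np.linalg.norm(train_X - applicant, axis=1)
    order = np.argsort(dists)
    return [(train_y[i], round(float(dists[i]), 3)) for i in order]

# The closest historical cases serve as the reference for the decision maker.
print(rank_by_similarity(np.array([4, 4, 4, 3, 4]), train_X, train_y))
```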


Respati ◽  
2018 ◽  
Vol 13 (2) ◽  
Author(s):  
Eri Sasmita Susanto ◽  
Kusrini Kusrini ◽  
Hanif Al Fatta

This research focuses on testing the feasibility of predicting student graduation at Universitas AMIKOM Yogyakarta. The K-Nearest Neighbors (K-NN) algorithm was chosen because it can handle numeric data and does not require a complicated iterative parameter-estimation scheme, which means it can be applied to large datasets. The input of the system is sample data in the form of student records from 2014-2015. Testing in this research uses two sets, testing data and training data. The criteria used in this study are the grade point index (IP) for semesters 1-4, credits (SKS) earned, and graduation status. The output of the system is a graduation prediction, divided into two classes: on-time and not on-time. The test results show that k = 14 with 5-fold cross-validation gives the best performance in predicting student graduation with the K-Nearest Neighbor method using the four-semester grade point index, with accuracy = 98.46%, precision = 99.53%, and recall = 97.64%. Keywords: K-Nearest Neighbors Algorithm, Graduation Prediction, Testing Data, Training Data
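
A minimal sketch of the evaluation setup reported above (k = 14 with 5-fold cross-validation, scored on accuracy, precision, and recall), using scikit-learn on synthetic stand-in data, since the study's student records are not public:

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Stand-in features: grade point index of semesters 1-4 and credits (SKS) earned.
X = rng.uniform([2.0, 2.0, 2.0, 2.0, 18], [4.0, 4.0, 4.0, 4.0, 24], size=(200, 5))
y = (X[:, :4].mean(axis=1) > 3.0).astype(int)   # 1 = on-time graduation (synthetic labeling rule)

scores = cross_validate(
    KNeighborsClassifier(n_neighbors=14),        # k = 14 as reported in the abstract
    X, y, cv=5,                                  # 5-fold cross-validation
    scoring=["accuracy", "precision", "recall"],
)
print({m: scores[f"test_{m}"].mean() for m in ["accuracy", "precision", "recall"]})
```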


2019 ◽  
Vol 5 ◽  
pp. e194 ◽  
Author(s):  
Hyukjun Gweon ◽  
Matthias Schonlau ◽  
Stefan H. Steiner

The k nearest neighbor (kNN) approach is a simple and effective nonparametric algorithm for classification. One of the drawbacks of kNN is that the method can only give coarse estimates of class probabilities, particularly for low values of k. To avoid this drawback, we propose a new nonparametric classification method based on nearest neighbors conditional on each class (kCNN): the proposed approach calculates the distance between a new instance and the kth nearest neighbor from each class, estimates posterior probabilities of class membership using the distances, and assigns the instance to the class with the largest posterior. We prove that the proposed approach converges to the Bayes classifier as the size of the training data increases. Further, we extend the proposed approach to an ensemble method. Experiments on benchmark data sets show that both the proposed approach and its ensemble version on average outperform kNN, weighted kNN, probabilistic kNN and two similar algorithms (LMkNN and MLM-kHNN) in terms of error rate. A simulation shows that kCNN may be useful for estimating posterior probabilities when the class distributions overlap.
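
The sketch below illustrates the general idea of class-conditional kth-nearest-neighbor scoring: measure the distance to the kth nearest neighbor within each class and convert those distances into normalized, posterior-like scores. The kNN-density-style estimator used here is an assumption for illustration and is not necessarily the exact estimator of the paper.

```python
import numpy as np

def class_conditional_knn_scores(x, X, y, k=3):
    """Posterior-like scores from the distance to the k-th nearest neighbor within each class."""
    d = X.shape[1]
    scores = {}
    for c in np.unique(y):
        dists = np.sort(np.linalg.norm(X[y == c] - x, axis=1))
        r_k = dists[min(k, len(dists)) - 1]       # distance to the k-th neighbor of class c
        scores[c] = 1.0 / (r_k ** d + 1e-12)      # closer k-th neighbor => higher density-style score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}   # normalize so the scores sum to one

X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9], [0.9, 1.2]])
y = np.array([0, 0, 1, 1, 1])
print(class_conditional_knn_scores(np.array([0.1, 0.1]), X, y, k=2))  # class 0 dominates
```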


Author(s):  
Andri Wijaya ◽  
Abba Suganda Girsang

This article discusses the analysis of customer loyalty using three data mining methods, the C4.5, Naive Bayes, and nearest neighbor algorithms, on real-world empirical data. The data contain ten attributes related to customer loyalty and were obtained from a national multimedia company in Indonesia; the dataset contains 2269 records. The study also evaluates the effect of training data size on classification accuracy. The results suggest that the C4.5 algorithm produces the highest classification accuracy, on the order of 81%, followed by Naive Bayes with 76% and nearest neighbor with 55%. In addition, the numerical evaluation suggests that a proportion of 80% is optimal for the training set.
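
A hedged sketch of this kind of comparison, using scikit-learn stand-ins (CART for C4.5, Gaussian Naive Bayes, and a 1-nearest-neighbor classifier) on synthetic data of the same shape as the loyalty dataset; the real records are not public and the numbers produced here will not match the study's.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 2269-record, ten-attribute loyalty data.
X, y = make_classification(n_samples=2269, n_features=10, random_state=0)

models = {
    "decision tree (C4.5 stand-in)": DecisionTreeClassifier(random_state=0),
    "naive Bayes": GaussianNB(),
    "1-nearest neighbor": KNeighborsClassifier(n_neighbors=1),
}

# Vary the training proportion to see its effect on accuracy.
for train_frac in (0.5, 0.6, 0.7, 0.8, 0.9):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=train_frac, random_state=0)
    accs = {name: round(m.fit(X_tr, y_tr).score(X_te, y_te), 3) for name, m in models.items()}
    print(train_frac, accs)
```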


2021 ◽  
Vol 40 (1) ◽  
pp. 521-533
Author(s):  
Junhai Zhai ◽  
Jiaxing Qi ◽  
Sufang Zhang

The condensed nearest neighbor (CNN) is a pioneering instance selection algorithm for the 1-nearest neighbor rule. Many variants of CNN for K-nearest neighbor have been proposed by different researchers. However, few studies have been conducted on condensed fuzzy K-nearest neighbor. In this paper, we present a condensed fuzzy K-nearest neighbor (CFKNN) algorithm that starts from an initial instance set S and iteratively selects informative instances from the training set T, moving them from T to S. Specifically, CFKNN consists of three steps. First, for each instance x ∈ T, it finds the K nearest neighbors in S and calculates the fuzzy membership degrees of those K nearest neighbors using S rather than T. Second, it computes the fuzzy membership degrees of x using the fuzzy K-nearest neighbor algorithm. Finally, it calculates the information entropy of x and decides whether to select the instance according to the calculated value. Extensive experiments on 11 datasets are conducted to compare CFKNN with four state-of-the-art algorithms (CNN, edited nearest neighbor (ENN), Tomek links, and one-sided selection) regarding the number of selected instances, the testing accuracy, and the compression ratio. The experimental results show that CFKNN provides excellent performance and outperforms the other four algorithms.
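
A rough sketch of the selection loop as described in the abstract: compute fuzzy memberships of each training instance with respect to the currently selected set S, take the information entropy of those memberships, and move ambiguous (high-entropy) instances into S. The seeding of S, the inverse-distance membership weighting, and the entropy threshold are assumptions made for illustration, not the paper's exact procedure.

```python
import numpy as np

def fuzzy_memberships(x, S_X, S_y, classes, k=3, m=2.0):
    """Fuzzy kNN memberships of x computed from the selected set S (inverse-distance weighting)."""
    dists = np.linalg.norm(S_X - x, axis=1)
    idx = np.argsort(dists)[:min(k, len(dists))]
    w = 1.0 / (dists[idx] ** (2.0 / (m - 1.0)) + 1e-12)
    u = np.array([np.sum(w[S_y[idx] == c]) for c in classes])
    return u / u.sum()

def cfknn_select(T_X, T_y, k=3, n_seed=1, entropy_threshold=0.3):
    """Move informative (high-entropy) instances from T into S and return the selected indices."""
    classes = np.unique(T_y)
    seed = np.concatenate([np.where(T_y == c)[0][:n_seed] for c in classes])  # seed S per class
    selected = set(seed.tolist())
    for i in range(len(T_X)):
        if i in selected:
            continue
        S_idx = np.array(sorted(selected))
        u = fuzzy_memberships(T_X[i], T_X[S_idx], T_y[S_idx], classes, k=k)
        entropy = -np.sum(u * np.log(u + 1e-12))     # information entropy of the membership vector
        if entropy > entropy_threshold:              # ambiguous with respect to S => keep it
            selected.add(i)
    return np.array(sorted(selected))

rng = np.random.default_rng(1)
T_X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
T_y = np.array([0] * 50 + [1] * 50)
print("selected", len(cfknn_select(T_X, T_y)), "of", len(T_X), "instances")
```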


JURTEKSI ◽  
2021 ◽  
Vol 7 (2) ◽  
pp. 195-202
Author(s):  
Sri Ayu Rizky ◽  
Rolly Yesputra ◽  
Santoso Santoso

Abstract: In this research, a prediction system was developed to predict whether a prospective borrower's repayments will be smooth or not. For each prospective borrower, the data that meet the criteria are entered by an office clerk into the prediction application, where they are processed with a data mining method, the K-Nearest Neighbor algorithm, implemented with the CodeIgniter 3 framework. The Euclidean distances between the training data and the testing data, computed on the predetermined criteria, are displayed in a table sorted from smallest to largest and containing the 9 nearest neighbors, in accordance with the chosen value K = 9. The dominant category among these nine neighbors is then taken, and this dominant category can serve as a guideline that makes it easier for management to make a decision about the next prospective borrower. Keywords: Data Mining; Euclidean; K-Nearest Neighbor; Prospective Borrowers
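
The ranking step described above can be sketched as follows: compute the Euclidean distance from the applicant to every training record, sort the table from smallest to largest, keep the K = 9 nearest neighbors, and take their dominant category. The two criteria and all values below are invented for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative training records scored on two made-up criteria.
train = pd.DataFrame({
    "income":   [3.0, 1.5, 4.0, 2.0, 3.5, 1.0, 2.5, 3.0, 4.5, 2.0, 1.8, 3.2],
    "arrears":  [0,   2,   0,   1,   0,   3,   1,   0,   0,   2,   2,   0],
    "category": ["smooth", "not smooth", "smooth", "smooth", "smooth", "not smooth",
                 "not smooth", "smooth", "smooth", "not smooth", "not smooth", "smooth"],
})

applicant = np.array([2.8, 1])                                      # the new borrower's criteria
features = train[["income", "arrears"]].to_numpy()
train["distance"] = np.linalg.norm(features - applicant, axis=1)    # Euclidean distance

nearest9 = train.sort_values("distance").head(9)                    # K = 9 nearest neighbors
print(nearest9)
print("dominant category:", nearest9["category"].mode()[0])
```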


Author(s):  
I Wayan Agus Surya Darma

Balinese script is an important aspect of Balinese culture that has been passed down over time and continues to develop alongside technological advances. Balinese script consists of three types, (1) Wrésastra, (2) Swalalita, and (3) Modre, which have different characters. The Wrésastra and Swalalita scripts are the Balinese scripts used for writing in everyday life. In this research, the zoning method is implemented in the feature extraction process to produce distinctive features of Balinese script characters, which are then used in the classification process to recognize them. The zoning method divides the image area of each Balinese script character into several regions, enriching the features of each character. The extracted features are stored as training data for the classification process, and K-Nearest Neighbors is implemented to classify the distinctive features of the Balinese script characters. Based on the test results, the highest accuracy was obtained with K=3 and reference=10, giving a Balinese script recognition accuracy of 97.5%.
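
A small sketch of zoning-based feature extraction as described above: the character image is split into a grid of zones and the ink density of each zone becomes one feature, which can then be fed to a K-Nearest Neighbors classifier (K = 3 in the study). The grid size and the toy image are illustrative assumptions.

```python
import numpy as np

def zoning_features(char_img, rows=4, cols=4):
    """Split a binary character image into rows x cols zones; each zone's ink density is one feature."""
    h, w = char_img.shape
    feats = []
    for i in range(rows):
        for j in range(cols):
            zone = char_img[i * h // rows:(i + 1) * h // rows,
                            j * w // cols:(j + 1) * w // cols]
            feats.append(zone.mean())        # fraction of "ink" pixels in the zone
    return np.array(feats)

# Toy 16x16 "character": a diagonal stroke standing in for a Balinese glyph image.
img = np.eye(16)
print(zoning_features(img, rows=4, cols=4))  # 16 zone densities, used as the kNN feature vector
```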


2021 ◽  
Vol 3 (2) ◽  
pp. 140-148
Author(s):  
Hermanto Hermanto

Currently, failure in higher education, on-time graduation, and the factors that cause them are still interesting research topics (C. Marquez-Vera, C. Romero and S. Ventura, 2011). This study compares three data mining classification algorithms, Naive Bayes, Decision Tree, and K-Nearest Neighbor, to predict graduation and dropout risk for students, in order to improve the quality of higher education and to identify the most accurate algorithm for building graduation and dropout predictions. The best algorithm for predicting graduation and dropout is the decision tree, with a best accuracy of 99.15% at a training data ratio of 30%. Keywords: Data Mining; Naive Bayes Algorithm; Decision Tree; K-Nearest Neighbor; Graduation Prediction; Drop Out.


Author(s):  
Tikaridha Hardiani

The number of students at Universitas ‘Aisyiyah Yogyakarta has been increasing, including the number of students in the Faculty of Health Sciences; in 2016 the total number of UNISA students was 1851. The growing number of students each year leads to large amounts of data stored in the university database. These data provide useful information for the university to predict student graduation, that is, the student study period: whether students graduate on time with a study period of 4 years or late with a study period of more than 4 years. This can be handled with a data mining technique, namely classification. The data needed for classification are the data of students who have graduated, as training data, and the data of students who are still studying, as testing data. The training data comprised 501 records with 10 goals, and the testing data comprised 428 records. The data mining process followed the Cross-Industry Standard Process for Data Mining (CRISP-DM). The algorithms used in this study were Naive Bayes, K-Nearest Neighbor (KNN), and Decision Tree; the three algorithms were compared on accuracy using RapidMiner. Based on accuracy, the K-NN algorithm was the best at predicting student graduation, with an accuracy of 91.82%. The K-NN algorithm predicted that 100% of the students of the Nursing study program of Universitas ‘Aisyiyah Yogyakarta will graduate on time.


2015 ◽  
Vol 1 (4) ◽  
pp. 270
Author(s):  
Muhammad Syukri Mustafa ◽  
I. Wayan Simpen

This research is intended to predict whether new students can complete their studies on time, using data mining analysis to mine the accumulated historical data with the K-Nearest Neighbor (KNN) algorithm. The application produced in this research uses various attributes classified in the data mining process, including national examination (Ujian Nasional, UN) scores, school/region of origin, gender, parents' occupation and income, number of siblings, and others, so that by applying KNN analysis a prediction can be made from the proximity of the existing historical data to the new data: whether or not the student is likely to complete the study on time. From the test results of applying the KNN algorithm, using alumni data from graduation years 2004 to 2010 as old cases and alumni data from graduation year 2011 as new cases, an accuracy rate of 83.36% was obtained.
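
Because the attributes listed above mix numeric values (UN scores) with categorical ones (gender, school/region of origin, parents' occupation), a practical sketch is a preprocessing-plus-kNN pipeline; the column names, sample values, and scikit-learn pipeline are illustrative assumptions, not the study's implementation.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny made-up sample with the kinds of attributes named in the abstract.
data = pd.DataFrame({
    "un_score":      [42.5, 38.0, 45.1, 30.2, 40.0, 35.5],
    "gender":        ["M", "F", "F", "M", "F", "M"],
    "school_region": ["A", "B", "A", "C", "B", "A"],
    "on_time":       [1, 1, 1, 0, 1, 0],
})

pre = ColumnTransformer([
    ("num", StandardScaler(), ["un_score"]),                                       # scale numeric attributes
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["gender", "school_region"]),  # encode categoricals
])
model = Pipeline([("prep", pre), ("knn", KNeighborsClassifier(n_neighbors=3))])
model.fit(data.drop(columns="on_time"), data["on_time"])
print(model.predict(data.drop(columns="on_time").head(2)))   # predicted on-time status for two students
```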

