GEOMETRIC PROXIMITY GRAPHS FOR IMPROVING NEAREST NEIGHBOR METHODS IN INSTANCE-BASED LEARNING AND DATA MINING

2005 ◽  
Vol 15 (02) ◽  
pp. 101-150 ◽  
Author(s):  
GODFRIED TOUSSAINT

In the typical nonparametric approach to classification in instance-based learning and data mining, random data (the training set of patterns) are collected and used to design a decision rule (classifier). One of the most well-known such rules is the k-nearest-neighbor decision rule (also known as lazy learning), in which an unknown pattern is classified into the majority class among its k nearest neighbors in the training set. Several questions related to this rule have received considerable attention over the years. Such questions include the following. How can the storage of the training set be reduced without degrading the performance of the decision rule? How should the reduced training set be selected to represent the different classes? How large should k be? How should the value of k be chosen? Should all k neighbors be equally weighted when used to decide the class of an unknown pattern? If not, how should the weights be chosen? Should all the features (attributes) be weighted equally, and if not, how should the feature weights be chosen? What distance metric should be used? How can the rule be made robust to overlapping classes or noise present in the training data? How can the rule be made invariant to scaling of the measurements? How can the nearest neighbors of a new point be computed efficiently? What is the smallest neural network that can implement nearest neighbor decision rules? Geometric proximity graphs such as Voronoi diagrams and their many relatives provide elegant solutions to these problems, as well as to other related data mining problems such as outlier detection. After a non-exhaustive review of some of the classical canonical approaches to these problems, the methods that use proximity graphs are discussed, some new observations are made, and open problems are listed.
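
As a concrete point of reference for the decision rule discussed above, here is a minimal sketch of the basic k-nearest-neighbor rule with an unweighted majority vote and a plain Euclidean metric; the function and data names are illustrative, not taken from the paper.

```python
from collections import Counter

import numpy as np

def knn_classify(x, train_X, train_y, k=3):
    """Classify x into the majority class among its k nearest training patterns."""
    dists = np.linalg.norm(train_X - x, axis=1)   # Euclidean distance to every training pattern
    nearest = np.argsort(dists)[:k]               # indices of the k closest patterns
    return Counter(train_y[nearest]).most_common(1)[0][0]  # unweighted majority vote

# Toy usage with two 2-D classes
train_X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
train_y = np.array(["A", "A", "B", "B"])
print(knn_classify(np.array([0.2, 0.1]), train_X, train_y, k=3))  # -> "A"
```

Most of the questions listed in the abstract (choice of k, neighbor and feature weighting, choice of metric, condensing the training set) are refinements of exactly this baseline.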

2015 ◽  
Vol 2 (1) ◽  
pp. 1
Author(s):  
Agung Nugroho ◽  
Kusrini Kusrini ◽  
M. Rudyanto Arief

Many factors and variables affect credit risk in decision making for People's Business Credit (Kredit Usaha Rakyat, KUR). The factors used as the basis for KUR assessment at PT Bank Rakyat Indonesia Unit Kaliangkrik follow the basic principle known as the "5 C's of Credit": Character, Capacity, Capital, Condition, and Collateral. From these assessment factors, a classification-rule mining method is used to build a decision support system for granting KUR. Several data mining algorithms can be used for classification; one of them is the k-nearest neighbor algorithm. The proposed decision support system classifies an object based on the training data closest to it and, given input from the user, recommends which customers are eligible to receive KUR using the k-nearest neighbors (kNN) method. Payment transaction data of existing customers are used as training data, after their classes have first been determined. Class assignment is carried out by classifying the data according to customer status categories based on the amount of overdue credit payments. From the similarity computed between a prospective new customer and the existing customers (the training data) using the K-Nearest Neighbor algorithm, the result with the highest value serves as the reference for the decision maker.
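
As an illustration of the similarity step described above, the sketch below ranks historical customers by Euclidean distance to a new applicant over numerically encoded "5 C's" scores; the encoding, scores, and labels are invented for the example and are not the study's data.

```python
import numpy as np

# Hypothetical numeric scores for (character, capacity, capital, condition, collateral)
# of past customers, together with their known repayment outcome.
train_X = np.array([
    [4, 3, 5, 4, 4],
    [2, 2, 1, 3, 2],
    [5, 4, 4, 4, 5],
    [1, 2, 2, 2, 1],
])
train_y = ["eligible", "not eligible", "eligible", "not eligible"]

def rank_by_similarity(applicant, train_X, train_y):
    """Return past cases ordered from most to least similar (smallest distance first)."""
    dists = np.linalg.norm(train_X - applicant, axis=1)
    order = np.argsort(dists)
    return [(train_y[i], round(float(dists[i]), 3)) for i in order]

# The closest historical cases serve as the reference for the decision maker.
print(rank_by_similarity(np.array([4, 4, 4, 3, 4]), train_X, train_y))
```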


Respati ◽  
2018 ◽  
Vol 13 (2) ◽  
Author(s):  
Eri Sasmita Susanto ◽  
Kusrini Kusrini ◽  
Hanif Al Fatta

This research focuses on testing the feasibility of predicting student graduation at Universitas AMIKOM Yogyakarta. The K-Nearest Neighbors (K-NN) algorithm was chosen because it can handle numeric data and does not require a complicated iterative parameter-estimation scheme, which means it can be applied to large datasets. The input of the system is sample data in the form of student records from 2014-2015. Testing in this research uses two sets, testing data and training data. The criteria used in this study are the grade point index (IP) for semesters 1-4, credits (SKS) earned, and graduation status. The output of the system is a graduation prediction, divided into two classes: on-time and not on-time. The test results show that k = 14 with 5-fold cross-validation gives the best performance in predicting student graduation with the K-Nearest Neighbor method using the four-semester grade point index, with accuracy = 98.46%, precision = 99.53%, and recall = 97.64%. Keywords: K-Nearest Neighbors Algorithm, Graduation Prediction, Testing Data, Training Data
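
A minimal sketch of the evaluation setup reported above (k = 14 with 5-fold cross-validation, scored on accuracy, precision, and recall), using scikit-learn on synthetic stand-in data, since the study's student records are not public:

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Stand-in features: grade point index of semesters 1-4 and credits (SKS) earned.
X = rng.uniform([2.0, 2.0, 2.0, 2.0, 18], [4.0, 4.0, 4.0, 4.0, 24], size=(200, 5))
y = (X[:, :4].mean(axis=1) > 3.0).astype(int)   # 1 = on-time graduation (synthetic labeling rule)

scores = cross_validate(
    KNeighborsClassifier(n_neighbors=14),        # k = 14 as reported in the abstract
    X, y, cv=5,                                  # 5-fold cross-validation
    scoring=["accuracy", "precision", "recall"],
)
print({m: scores[f"test_{m}"].mean() for m in ["accuracy", "precision", "recall"]})
```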


2019 ◽  
Vol 5 ◽  
pp. e194 ◽  
Author(s):  
Hyukjun Gweon ◽  
Matthias Schonlau ◽  
Stefan H. Steiner

The k nearest neighbor (kNN) approach is a simple and effective nonparametric algorithm for classification. One of the drawbacks of kNN is that the method can only give coarse estimates of class probabilities, particularly for low values of k. To avoid this drawback, we propose a new nonparametric classification method based on nearest neighbors conditional on each class (kCNN): the proposed approach calculates the distance between a new instance and the kth nearest neighbor from each class, estimates posterior probabilities of class membership using the distances, and assigns the instance to the class with the largest posterior. We prove that the proposed approach converges to the Bayes classifier as the size of the training data increases. Further, we extend the proposed approach to an ensemble method. Experiments on benchmark data sets show that both the proposed approach and its ensemble version on average outperform kNN, weighted kNN, probabilistic kNN and two similar algorithms (LMkNN and MLM-kHNN) in terms of error rate. A simulation shows that kCNN may be useful for estimating posterior probabilities when the class distributions overlap.
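
The sketch below illustrates the general idea of class-conditional kth-nearest-neighbor scoring: measure the distance to the kth nearest neighbor within each class and convert those distances into normalized, posterior-like scores. The kNN-density-style estimator used here is an assumption for illustration and is not necessarily the exact estimator of the paper.

```python
import numpy as np

def class_conditional_knn_scores(x, X, y, k=3):
    """Posterior-like scores from the distance to the k-th nearest neighbor within each class."""
    d = X.shape[1]
    scores = {}
    for c in np.unique(y):
        dists = np.sort(np.linalg.norm(X[y == c] - x, axis=1))
        r_k = dists[min(k, len(dists)) - 1]       # distance to the k-th neighbor of class c
        scores[c] = 1.0 / (r_k ** d + 1e-12)      # closer k-th neighbor => higher density-style score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}   # normalize so the scores sum to one

X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9], [0.9, 1.2]])
y = np.array([0, 0, 1, 1, 1])
print(class_conditional_knn_scores(np.array([0.1, 0.1]), X, y, k=2))  # class 0 dominates
```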


Author(s):  
Andri Wijaya ◽  
Abba Suganda Girsang

This article discusses the analysis of customer loyalty using three data mining methods, the C4.5, Naive Bayes, and nearest neighbor algorithms, on real-world empirical data. The data contain ten attributes related to customer loyalty and were obtained from a national multimedia company in Indonesia; the dataset contains 2269 records. The study also evaluates the effect of training data size on classification accuracy. The results suggest that the C4.5 algorithm produces the highest classification accuracy, on the order of 81%, followed by Naive Bayes with 76% and nearest neighbor with 55%. In addition, the numerical evaluation suggests that a proportion of 80% is optimal for the training set.
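
A hedged sketch of this kind of comparison, using scikit-learn stand-ins (CART for C4.5, Gaussian Naive Bayes, and a 1-nearest-neighbor classifier) on synthetic data of the same shape as the loyalty dataset; the real records are not public and the numbers produced here will not match the study's.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 2269-record, ten-attribute loyalty data.
X, y = make_classification(n_samples=2269, n_features=10, random_state=0)

models = {
    "decision tree (C4.5 stand-in)": DecisionTreeClassifier(random_state=0),
    "naive Bayes": GaussianNB(),
    "1-nearest neighbor": KNeighborsClassifier(n_neighbors=1),
}

# Vary the training proportion to see its effect on accuracy.
for train_frac in (0.5, 0.6, 0.7, 0.8, 0.9):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=train_frac, random_state=0)
    accs = {name: round(m.fit(X_tr, y_tr).score(X_te, y_te), 3) for name, m in models.items()}
    print(train_frac, accs)
```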


2021 ◽  
Vol 40 (1) ◽  
pp. 521-533
Author(s):  
Junhai Zhai ◽  
Jiaxing Qi ◽  
Sufang Zhang

The condensed nearest neighbor (CNN) is a pioneering instance selection algorithm for the 1-nearest neighbor rule. Many variants of CNN for K-nearest neighbor have been proposed by different researchers. However, few studies have been conducted on condensed fuzzy K-nearest neighbor. In this paper, we present a condensed fuzzy K-nearest neighbor (CFKNN) algorithm that starts from an initial instance set S and iteratively selects informative instances from the training set T, moving them from T to S. Specifically, CFKNN consists of three steps. First, for each instance x ∈ T, it finds the K nearest neighbors in S and calculates the fuzzy membership degrees of those K nearest neighbors using S rather than T. Second, it computes the fuzzy membership degrees of x using the fuzzy K-nearest neighbor algorithm. Finally, it calculates the information entropy of x and decides whether to select the instance according to the calculated value. Extensive experiments on 11 datasets are conducted to compare CFKNN with four state-of-the-art algorithms (CNN, edited nearest neighbor (ENN), Tomek links, and one-sided selection) regarding the number of selected instances, the testing accuracy, and the compression ratio. The experimental results show that CFKNN provides excellent performance and outperforms the other four algorithms.
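
A rough sketch of the selection loop as described in the abstract: compute fuzzy memberships of each training instance with respect to the currently selected set S, take the information entropy of those memberships, and move ambiguous (high-entropy) instances into S. The seeding of S, the inverse-distance membership weighting, and the entropy threshold are assumptions made for illustration, not the paper's exact procedure.

```python
import numpy as np

def fuzzy_memberships(x, S_X, S_y, classes, k=3, m=2.0):
    """Fuzzy kNN memberships of x computed from the selected set S (inverse-distance weighting)."""
    dists = np.linalg.norm(S_X - x, axis=1)
    idx = np.argsort(dists)[:min(k, len(dists))]
    w = 1.0 / (dists[idx] ** (2.0 / (m - 1.0)) + 1e-12)
    u = np.array([np.sum(w[S_y[idx] == c]) for c in classes])
    return u / u.sum()

def cfknn_select(T_X, T_y, k=3, n_seed=1, entropy_threshold=0.3):
    """Move informative (high-entropy) instances from T into S and return the selected indices."""
    classes = np.unique(T_y)
    seed = np.concatenate([np.where(T_y == c)[0][:n_seed] for c in classes])  # seed S per class
    selected = set(seed.tolist())
    for i in range(len(T_X)):
        if i in selected:
            continue
        S_idx = np.array(sorted(selected))
        u = fuzzy_memberships(T_X[i], T_X[S_idx], T_y[S_idx], classes, k=k)
        entropy = -np.sum(u * np.log(u + 1e-12))     # information entropy of the membership vector
        if entropy > entropy_threshold:              # ambiguous with respect to S => keep it
            selected.add(i)
    return np.array(sorted(selected))

rng = np.random.default_rng(1)
T_X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
T_y = np.array([0] * 50 + [1] * 50)
print("selected", len(cfknn_select(T_X, T_y)), "of", len(T_X), "instances")
```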


JURTEKSI ◽  
2021 ◽  
Vol 7 (2) ◽  
pp. 195-202
Author(s):  
Sri Ayu Rizky ◽  
Rolly Yesputra ◽  
Santoso Santoso

Abstract: In this research, a prediction system was developed to predict whether a prospective borrower's repayments will be smooth or not. For each prospective borrower, the data that meet the criteria are entered by an office clerk into the prediction application, where they are processed with a data mining method, the K-Nearest Neighbor algorithm, implemented with the CodeIgniter 3 framework. The Euclidean distances between the training data and the testing data, computed on the predetermined criteria, are displayed in a table sorted from smallest to largest and containing the 9 nearest neighbors, in accordance with the chosen value K = 9. The dominant category among these nine neighbors is then taken, and this dominant category can serve as a guideline that makes it easier for management to make a decision about the next prospective borrower. Keywords: Data Mining; Euclidean; K-Nearest Neighbor; Prospective Borrowers
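
The ranking step described above can be sketched as follows: compute the Euclidean distance from the applicant to every training record, sort the table from smallest to largest, keep the K = 9 nearest neighbors, and take their dominant category. The two criteria and all values below are invented for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative training records scored on two made-up criteria.
train = pd.DataFrame({
    "income":   [3.0, 1.5, 4.0, 2.0, 3.5, 1.0, 2.5, 3.0, 4.5, 2.0, 1.8, 3.2],
    "arrears":  [0,   2,   0,   1,   0,   3,   1,   0,   0,   2,   2,   0],
    "category": ["smooth", "not smooth", "smooth", "smooth", "smooth", "not smooth",
                 "not smooth", "smooth", "smooth", "not smooth", "not smooth", "smooth"],
})

applicant = np.array([2.8, 1])                                      # the new borrower's criteria
features = train[["income", "arrears"]].to_numpy()
train["distance"] = np.linalg.norm(features - applicant, axis=1)    # Euclidean distance

nearest9 = train.sort_values("distance").head(9)                    # K = 9 nearest neighbors
print(nearest9)
print("dominant category:", nearest9["category"].mode()[0])
```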


Author(s):  
I Wayan Agus Surya Darma

Balinese script is an important aspect of Balinese culture that has been passed down over time and continues to develop alongside technological advances. Balinese script consists of three types, (1) Wrésastra, (2) Swalalita, and (3) Modre, which have different characters. The Wrésastra and Swalalita scripts are the Balinese scripts used for writing in everyday life. In this research, the zoning method is implemented in the feature extraction process to produce distinctive features of Balinese script characters, which are then used in the classification process to recognize them. The zoning method divides the image area of each Balinese script character into several regions, enriching the features of each character. The extracted features are stored as training data for the classification process, and K-Nearest Neighbors is implemented to classify the distinctive features of the Balinese script characters. Based on the test results, the highest accuracy was obtained with K=3 and reference=10, giving a Balinese script recognition accuracy of 97.5%.
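
A small sketch of zoning-based feature extraction as described above: the character image is split into a grid of zones and the ink density of each zone becomes one feature, which can then be fed to a K-Nearest Neighbors classifier (K = 3 in the study). The grid size and the toy image are illustrative assumptions.

```python
import numpy as np

def zoning_features(char_img, rows=4, cols=4):
    """Split a binary character image into rows x cols zones; each zone's ink density is one feature."""
    h, w = char_img.shape
    feats = []
    for i in range(rows):
        for j in range(cols):
            zone = char_img[i * h // rows:(i + 1) * h // rows,
                            j * w // cols:(j + 1) * w // cols]
            feats.append(zone.mean())        # fraction of "ink" pixels in the zone
    return np.array(feats)

# Toy 16x16 "character": a diagonal stroke standing in for a Balinese glyph image.
img = np.eye(16)
print(zoning_features(img, rows=4, cols=4))  # 16 zone densities, used as the kNN feature vector
```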


2021 ◽  
Vol 3 (2) ◽  
pp. 140-148
Author(s):  
Hermanto Hermanto

Currently, failure in higher education, on-time graduation, and the factors that cause them are still interesting research topics (C. Marquez-Vera, C. Romero and S. Ventura, 2011). This study compares three data mining classification algorithms, Naive Bayes, Decision Tree, and K-Nearest Neighbor, to predict graduation and dropout risk for students, in order to improve the quality of higher education and to identify the most accurate algorithm for building graduation and dropout predictions. The best algorithm for predicting graduation and dropout is the decision tree, with a best accuracy of 99.15% at a training data ratio of 30%. Keywords: Data Mining; Naive Bayes Algorithm; Decision Tree; K-Nearest Neighbor; Graduation Prediction; Drop Out.


Author(s):  
Tikaridha Hardiani

The number of students at Universitas ‘Aisyiyah Yogyakarta has been increasing, including the number of students in the Faculty of Health Sciences; in 2016 the total number of UNISA students was 1851. The growing number of students each year leads to large amounts of data stored in the university database. These data provide useful information for the university to predict student graduation, that is, the student study period: whether students graduate on time with a study period of 4 years or late with a study period of more than 4 years. This can be handled with a data mining technique, namely classification. The data needed for classification are the data of students who have graduated, as training data, and the data of students who are still studying, as testing data. The training data comprised 501 records with 10 goals, and the testing data comprised 428 records. The data mining process followed the Cross-Industry Standard Process for Data Mining (CRISP-DM). The algorithms used in this study were Naive Bayes, K-Nearest Neighbor (KNN), and Decision Tree; the three algorithms were compared on accuracy using RapidMiner. Based on accuracy, the K-NN algorithm was the best at predicting student graduation, with an accuracy of 91.82%. The K-NN algorithm predicted that 100% of the students of the Nursing study program of Universitas ‘Aisyiyah Yogyakarta will graduate on time.


2015 ◽  
Vol 1 (4) ◽  
pp. 270
Author(s):  
Muhammad Syukri Mustafa ◽  
I. Wayan Simpen

This research is intended to predict whether new students can complete their studies on time, using data mining analysis to mine the accumulated historical data with the K-Nearest Neighbor (KNN) algorithm. The application produced in this research uses various attributes classified in the data mining process, including national examination (Ujian Nasional, UN) scores, school/region of origin, gender, parents' occupation and income, number of siblings, and others, so that by applying KNN analysis a prediction can be made from the proximity of the existing historical data to the new data: whether or not the student is likely to complete the study on time. From the test results of applying the KNN algorithm, using alumni data from graduation years 2004 to 2010 as old cases and alumni data from graduation year 2011 as new cases, an accuracy rate of 83.36% was obtained.
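
Because the attributes listed above mix numeric values (UN scores) with categorical ones (gender, school/region of origin, parents' occupation), a practical sketch is a preprocessing-plus-kNN pipeline; the column names, sample values, and scikit-learn pipeline are illustrative assumptions, not the study's implementation.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny made-up sample with the kinds of attributes named in the abstract.
data = pd.DataFrame({
    "un_score":      [42.5, 38.0, 45.1, 30.2, 40.0, 35.5],
    "gender":        ["M", "F", "F", "M", "F", "M"],
    "school_region": ["A", "B", "A", "C", "B", "A"],
    "on_time":       [1, 1, 1, 0, 1, 0],
})

pre = ColumnTransformer([
    ("num", StandardScaler(), ["un_score"]),                                       # scale numeric attributes
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["gender", "school_region"]),  # encode categoricals
])
model = Pipeline([("prep", pre), ("knn", KNeighborsClassifier(n_neighbors=3))])
model.fit(data.drop(columns="on_time"), data["on_time"])
print(model.predict(data.drop(columns="on_time").head(2)))   # predicted on-time status for two students
```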

