Efficient k-Nearest Neighbor Searches for Parallel Multidimensional Index Structures

We present a multidimensional indexing approach for fast sequence similarity search in DNA and protein databases. In particular, we propose effective transformations of subsequences into numerical vector domains and build efficient index structures on the transformed vectors. We then define distance functions in the transformed domain and examine their properties. We experimentally compare these distances on (a) approximation quality for k-nearest neighbor (k-NN) queries, and on (b) pruning ability and (c) approximation quality for ε-range queries. The k-NN results presented here show that our proposed distances FD2 and WD2 (the Frequency and Wavelet Distance functions for 2-grams) perform significantly better than the other distances. We then develop index structures based on R-trees and scalar quantization on top of the transformed vectors and distance functions. Experiments on real biosequence data sets give promising results.
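To make the idea of a transformed domain concrete, here is a minimal Python sketch, assuming DNA sequences over the alphabet ACGT. Each subsequence is mapped to a 16-dimensional vector of 2-gram counts, and k-NN and ε-range queries are answered by a linear scan in that domain. The distance `frequency_distance` (the larger of the total count surplus and total count deficit between two vectors) is an assumption modeled on frequency-vector distances in general; the abstract does not define FD2 or WD2, so this is not the authors' formula. The R-tree and scalar-quantization index is omitted, and all function names are hypothetical.

```python
from collections import Counter
from itertools import product
import heapq

# Hypothetical sketch: fixed order of the 16 DNA 2-grams so vectors line up.
BIGRAMS = ["".join(p) for p in product("ACGT", repeat=2)]


def bigram_vector(seq: str) -> list[int]:
    """Map a DNA (sub)sequence to its 16-dimensional 2-gram count vector."""
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    return [counts[g] for g in BIGRAMS]


def frequency_distance(u: list[int], v: list[int]) -> int:
    """Assumed distance: the larger of total surplus and total deficit of u vs v."""
    pos = sum(a - b for a, b in zip(u, v) if a > b)
    neg = sum(b - a for a, b in zip(u, v) if b > a)
    return max(pos, neg)


def knn(query: str, database: list[str], k: int) -> list[tuple[int, str]]:
    """Return the k closest (distance, sequence) pairs by linear scan."""
    q = bigram_vector(query)
    scored = ((frequency_distance(q, bigram_vector(s)), s) for s in database)
    return heapq.nsmallest(k, scored)


def range_query(query: str, database: list[str], eps: int) -> list[str]:
    """Return every sequence within distance eps of the query (ε-range query)."""
    q = bigram_vector(query)
    return [s for s in database if frequency_distance(q, bigram_vector(s)) <= eps]
```

For example, `knn("ACGTACGA", ["ACGTACGT", "AAAAAAAA", "ACGTTCGT", "GGGGCCCC"], 2)` returns `[(1, 'ACGTACGT'), (3, 'ACGTTCGT')]`. An index structure would replace the linear scan with pruning over these vectors, which is where a distance's pruning ability matters.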

Navigating K-Nearest Neighbor Graphs to Solve Nearest Neighbor Searches

Advances in Pattern Recognition, Lecture Notes in Computer Science, 2010, pp. 270-280. DOI: 10.1007/978-3-642-15992-3_29. Cited by ~4.

Author(s): Edgar Chávez, Eric Sadit Tellez

Keyword(s): Nearest Neighbor, K Nearest Neighbor, Nearest Neighbor Searches

Machine Learning Verdict of EEG Signals in Brain Computer Interface

International Journal of Scientific Research in Computer Science Engineering and Information Technology, 2018, pp. 429-441. DOI: 10.32628/cseit1838114.

Author(s): M. Jeyanthi, C. Velayutham

Keyword(s): Nearest Neighbor, Technology Development, Vital Role, SVM Classifier, K Nearest Neighbor, Data Mining Technique, Data Set, EEG Data, Irrelevant Attributes

Brain-computer interfaces (BCIs) play a vital role in scientific and technological research. Classification is a data mining technique used to predict group membership for data instances. Analyzing BCI data is challenging because feature extraction and classification are more difficult for these data than when applied to raw data. In this paper, we extract statistical Haralick features from the raw EEG data. The features are then normalized, and binning is applied to improve the accuracy of the predictive models by reducing noise and eliminating some irrelevant attributes. Classification is then performed on the BCI data set with several techniques, including Naïve Bayes, the k-nearest neighbor (k-NN) classifier, and the SVM classifier. Finally, we propose the SVM classification algorithm for the BCI data set.
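The normalize, bin, and classify steps can be sketched in plain Python as follows. This is a hypothetical illustration, not the paper's code: Haralick feature extraction is not shown (each row stands in for an already extracted feature vector), and min-max normalization, equal-width binning, Euclidean distance, and the values of `k` and `n_bins` are all assumptions. Of the compared classifiers, only k-NN is sketched.

```python
from collections import Counter
import math


def fit_min_max(rows):
    """Per-feature minimum and maximum over the training rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def normalize(row, lo, hi):
    """Min-max scale to [0, 1], clipping values outside the training range."""
    return [
        min(max((x - l) / (h - l), 0.0), 1.0) if h > l else 0.0
        for x, l, h in zip(row, lo, hi)
    ]


def to_bins(row, n_bins):
    """Equal-width binning: map each value in [0, 1] to a bin index 0..n_bins-1."""
    return [min(int(x * n_bins), n_bins - 1) for x in row]


def classify(train_x, train_y, x, k=3, n_bins=4):
    """Normalize and bin all rows, then take a k-NN majority vote (Euclidean)."""
    lo, hi = fit_min_max(train_x)  # statistics come from the training set only
    binned = [to_bins(normalize(r, lo, hi), n_bins) for r in train_x]
    query = to_bins(normalize(x, lo, hi), n_bins)
    nearest = sorted(range(len(binned)), key=lambda i: math.dist(binned[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

For instance, with made-up training rows `[[1.0, 10.0], [1.2, 11.0], [5.0, 50.0], [5.5, 52.0]]` labeled `["rest", "rest", "task", "task"]`, `classify(train_x, train_y, [1.1, 10.5])` returns `'rest'`. Fitting the normalization on the training rows only keeps test data from leaking into the preprocessing.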

Determining Priority Areas for Birth Certificate Services Using the k-NN and k-Means Methods

Komputasi: Jurnal Ilmiah Ilmu Komputer dan Matematika, 2020, Vol. 17 (1), pp. 319-328. DOI: 10.33751/komputasi.v17i1.1735.

Author(s): Ade Muchlis Maulana Anwar, Prihastuti Harsani, Aries Maesya

Keyword(s): Nearest Neighbor, Information Gain, Birth Certificate, Population Data, Community Services, Birth Certificates, Similar Data, K Nearest Neighbor, Civil Registration, The Family

Population data are individual or aggregate data structured as a result of population registration and civil registration activities. A birth certificate is a civil registration deed produced by recording the birth of a baby, whose birth is registered on the Family Card and who is given a Population Identification Number (NIK) as the basis for obtaining other community services. Of the 570,637 integrated birth certificate reports in the 2018 Population Administration Information System (SIAK), 503,946 were reported late and only 66,691 were reported publicly. Clustering groups similar data together and places dissimilar data in other groups. k-nearest neighbor (k-NN) classifies an object based on the training data closest to it. k-means divides a set of objects into groups by assigning each object to the nearest group midpoint (centroid). During data mining preprocessing, the data are cleaned by filling in missing values with the most frequent value, and attributes are selected using information gain. Using k-NN to predict reporting delays and k-means to group service areas by priority on 10,000 birth certificate records from 2019, the approach performs reasonably well: k-NN predicts with an accuracy of 74.00%, and k-means with K = 2 yields a Davies-Bouldin index of 1,179.
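The clustering step and its quality score can be illustrated with a small pure-Python sketch of Lloyd's k-means and the Davies-Bouldin index (lower values indicate more compact, better separated clusters). The initial centroids are passed in explicitly to keep the result deterministic; the Euclidean distance, the toy two-dimensional points, and the function names are assumptions, and the paper's preprocessing (most-frequent-value imputation and information-gain attribute selection) is not shown.

```python
import math


def kmeans(points, init, max_iter=100):
    """Lloyd's k-means from the given initial centroids; returns (labels, centroids)."""
    centroids = [list(map(float, c)) for c in init]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def davies_bouldin(points, labels, centroids):
    """Davies-Bouldin index; assumes non-empty clusters with distinct centroids."""
    k = len(centroids)
    # Scatter S_j: mean distance from cluster j's members to its centroid.
    scatter = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        scatter.append(sum(math.dist(p, centroids[j]) for p in members) / len(members))
    # Average, over clusters, of the worst-case ratio (S_i + S_j) / d(c_i, c_j).
    return sum(
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ) / k
```

On the points `[[0, 0], [0, 2], [10, 0], [10, 2]]` with K = 2 and initial centroids `[[0, 0], [10, 0]]`, the labels are `[0, 0, 1, 1]`, the centroids are `[[0.0, 1.0], [10.0, 1.0]]`, and the Davies-Bouldin index is 0.2.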