OP-KNN: Method and Applications

This paper presents a methodology called Optimally Pruned K-Nearest Neighbors (OP-KNN), which performs competitively with state-of-the-art methods while remaining fast. It builds a one-hidden-layer feedforward neural network that uses K-Nearest Neighbors as kernels to perform regression. Multiresponse Sparse Regression (MRSR) is used to rank each kth nearest neighbor, and Leave-One-Out (LOO) estimation is then used to select the optimal number of neighbors and to estimate the generalization performance. Because the computational time of this method is small, this paper also presents a strategy that uses OP-KNN to perform variable selection, which is tested successfully on eight real-life data sets from different application fields. In summary, the most significant characteristic of this method is that it provides good performance and a comparatively simple model at extremely high learning speed.
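Below is a minimal NumPy sketch of the model-selection idea. It rests on simplifying assumptions:

- Column j of the hidden layer holds the target of each training sample's j-th nearest neighbor, with the sample itself excluded.
- The output layer is fit by least squares.
- The leave-one-out error is computed in closed form with the PRESS statistic.

MRSR ranking is not implemented here. Neighbors are simply kept in distance order, so the sketch only shows how LOO picks the number of neighbors. All function names are illustrative.

```python
import numpy as np

def knn_hidden_layer(X, y, k_max):
    """Column j holds the target of each sample's (j+1)-th nearest neighbor."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # a sample is never its own neighbor
    order = np.argsort(d, axis=1)[:, :k_max]
    return y[order]                        # shape (n_samples, k_max)

def press_loo_mse(H, y):
    """Exact leave-one-out MSE of a least-squares fit y ~ H w (PRESS statistic)."""
    P = H @ np.linalg.pinv(H)              # hat matrix of the linear output layer
    resid = y - P @ y
    loo_resid = resid / (1.0 - np.diag(P))
    return float(np.mean(loo_resid ** 2))

def select_num_neighbors(X, y, k_max=10):
    """Try the first 1..k_max neighbors (distance order, NOT MRSR) and keep the LOO-best."""
    H = knn_hidden_layer(X, y, k_max)
    errors = [press_loo_mse(H[:, :k], y) for k in range(1, k_max + 1)]
    return int(np.argmin(errors)) + 1, errors
```

The PRESS shortcut avoids refitting the model n times. That is one reason an LOO-based selection like this can stay cheap.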


Tropical Balls and Its Applications to K Nearest Neighbor over the Space of Phylogenetic Trees

Mathematics ◽

10.3390/math9070779 ◽

2021 ◽

Vol 9 (7) ◽

pp. 779

Author(s):

Ruriko Yoshida

Keyword(s):

Supervised Learning ◽

Phylogenetic Trees ◽

Nearest Neighbor ◽

Nearest Neighbors ◽

High Dimensional ◽

Learning Method ◽

Dimensional Vector ◽

K Nearest Neighbor ◽

K Nearest Neighbors

A tropical ball is a ball defined by the tropical metric over the tropical projective torus. In this paper we show several properties of tropical balls over the tropical projective torus and also over the space of phylogenetic trees with a given set of leaf labels. We then discuss their application to the K nearest neighbors (KNN) algorithm. KNN is a supervised learning method that classifies a high-dimensional vector into given categories by looking at a ball centered at the vector that contains K vectors in the space.
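The tropical metric on the tropical projective torus is usually written d_tr(u, v) = max_i(u_i - v_i) - min_i(u_i - v_i). A tropical ball of radius r around x is the set of points y with d_tr(x, y) <= r. The sketch below plugs that metric into a majority-vote KNN classifier. It assumes points are plain coordinate vectors; phylogenetic trees are not handled. Function names are hypothetical.

```python
from collections import Counter
import numpy as np

def tropical_distance(u, v):
    """Tropical metric: invariant to adding the same constant to every coordinate."""
    diff = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return float(diff.max() - diff.min())

def in_tropical_ball(x, center, radius):
    """True if x lies in the closed tropical ball of the given radius around center."""
    return tropical_distance(x, center) <= radius

def tropical_knn_predict(X_train, labels, x, k):
    """Majority label among the k training points tropically closest to x."""
    dists = np.array([tropical_distance(row, x) for row in X_train])
    nearest = np.argsort(dists, kind="stable")[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]
```

The k nearest training points are exactly those inside the smallest tropical ball around x that contains k of them.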


A Reformed K-Nearest Neighbors Algorithm for Big Data Sets

Journal of Computer Science ◽

10.3844/jcssp.2018.1213.1225 ◽

2018 ◽

Vol 14 (9) ◽

pp. 1213-1225 ◽

Cited By ~ 2

Author(s):

Vo Ngoc Phu ◽

Vo Thi Ngoc Tran

Keyword(s):

Big Data ◽

Nearest Neighbors ◽

Data Sets ◽

K Nearest Neighbors


Parallel kNN Queries for Big Data Based on Voronoi Diagram Using MapReduce

Advances in Data Mining and Database Management - Handbook of Research on Innovative Database Query Processing Techniques ◽

10.4018/978-1-4666-8767-7.ch014 ◽

2015 ◽

pp. 392-414

Author(s):

Wei Yan

Keyword(s):

Big Data ◽

Voronoi Diagram ◽

Spatial Databases ◽

Nearest Neighbor ◽

Programming Model ◽

Dimensional Space ◽

Data Sets ◽

Two Dimensional ◽

K Nearest Neighbor ◽

K Nearest Neighbors

In cloud computing environments, parallel kNN queries for big data are an important issue. The k nearest neighbor queries (kNN queries) find the k nearest neighbors from a dataset S for every object in another dataset R. They are a primitive operator widely adopted by many applications, including knowledge discovery, data mining, and spatial databases. This chapter proposes a parallel method for kNN queries over big data using the MapReduce programming model. First, it proposes an approximate algorithm that maps multi-dimensional data sets into two-dimensional data sets and transforms kNN queries into a sequence of two-dimensional point searches. Then, in two-dimensional space, it proposes a partitioning method that incorporates the Voronoi diagram into an R-tree. Furthermore, it proposes an efficient algorithm for processing kNN queries based on the R-tree using the MapReduce programming model. Finally, it presents the results of extensive experimental evaluations, which indicate the efficiency of the proposed approach.
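Here is a simplified, single-machine sketch of Voronoi-based partitioning for a kNN join. The "map" step sends every object of R and S to the Voronoi cell of its nearest pivot. The "reduce" step then answers each R object's kNN query inside its own cell.

It assumes 2-D points stored as tuples and a hand-picked pivot list. It omits the chapter's dimensionality mapping and R-tree indexing. S objects near a cell boundary are not replicated to neighboring cells, so the results are approximate. All names are hypothetical.

```python
import heapq
import math
from collections import defaultdict

def nearest_pivot(p, pivots):
    """Index of the pivot whose Voronoi cell contains p."""
    return min(range(len(pivots)), key=lambda i: math.dist(p, pivots[i]))

def map_phase(R, S, pivots):
    """Emit (cell_id, (tag, point)) pairs, one per object of R and S."""
    for r in R:
        yield nearest_pivot(r, pivots), ("R", r)
    for s in S:
        yield nearest_pivot(s, pivots), ("S", s)

def reduce_phase(pairs, k):
    """Group by cell, then answer each R object's kNN query within its cell only."""
    cells = defaultdict(lambda: {"R": [], "S": []})
    for cell_id, (tag, p) in pairs:
        cells[cell_id][tag].append(p)
    result = {}
    for cell in cells.values():
        for r in cell["R"]:
            # Approximate: S objects in neighboring cells are not considered.
            result[r] = heapq.nsmallest(k, cell["S"], key=lambda s: math.dist(r, s))
    return result
```

An exact variant would also copy each S object into every cell whose R objects it could still be among the k nearest of, at the cost of extra data shuffling.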


Efficient Shared Execution Processing of k-Nearest Neighbor Joins in Road Networks

Mobile Information Systems ◽

10.1155/2018/1243289 ◽

2018 ◽

Vol 2018 ◽

pp. 1-17 ◽

Cited By ~ 1

Author(s):

Hyung-Ju Cho

Keyword(s):

Euclidean Distance ◽

Nearest Neighbor ◽

Real Life ◽

Road Networks ◽

Nearest Neighbors ◽

Superior Performance ◽

K Nearest Neighbor ◽

Wide Range ◽

Primitive Operation ◽

Nested Loop

We investigate the k-nearest neighbor (kNN) join in road networks, which determines the k nearest neighbors (NNs) from a dataset S for every object in another dataset R. The kNN join is a primitive operation widely used in many data mining applications. However, it is expensive because it combines the kNN query with the join operation. Moreover, most existing methods assume the Euclidean distance metric. We instead consider the problem of processing kNN joins in road networks, where the distance between two points is the length of the shortest path connecting them. We propose a shared execution-based approach called the group-nested loop (GNL) method, which efficiently evaluates kNN joins in road networks by exploiting grouping and shared execution. The GNL method can be easily implemented using existing kNN query algorithms. Extensive experiments using several real-life roadmaps confirm the superior performance and effectiveness of the proposed method in a wide range of problem settings.
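For contrast with GNL, here is a sketch of the plain nested-loop baseline it improves on. It runs one Dijkstra search per object of R and stops as soon as k objects of S are settled. Distances are therefore shortest-path lengths rather than Euclidean ones.

Objects are assumed to sit on graph nodes. The graph format, an adjacency dict of (neighbor, weight) lists, is also an assumption. The grouping and shared execution that define GNL are not shown.

```python
import heapq
from collections import defaultdict

def build_graph(edges):
    """Undirected road graph from (u, v, length) triples."""
    g = defaultdict(list)
    for u, v, w in edges:
        g[u].append((v, w))
        g[v].append((u, w))
    return g

def network_knn(graph, source, s_nodes, k):
    """Up to k (distance, node) pairs for the S objects nearest to source by shortest path."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    visited, found = set(), []
    while heap and len(found) < k:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u in s_nodes:                 # settled nodes come out in distance order
            found.append((d, u))
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return found

def knn_join_nested_loop(graph, r_nodes, s_nodes, k):
    """Baseline kNN join: one independent network search per object of R."""
    s_set = set(s_nodes)
    return {r: network_knn(graph, r, s_set, k) for r in r_nodes}
```

Nearby R objects repeat much of the same search here. That redundancy is what grouping and shared execution are meant to remove.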


An Efficient Clustering Method for Hyperspectral Optimal Band Selection via Shared Nearest Neighbor

Remote Sensing ◽

10.3390/rs11030350 ◽

2019 ◽

Vol 11 (3) ◽

pp. 350 ◽

Cited By ~ 7

Author(s):

Qiang Li ◽

Qi Wang ◽

Xuelong Li

Keyword(s):

Nearest Neighbor ◽

Hyperspectral Image ◽

Local Density ◽

Computational Time ◽

Band Selection ◽

Data Sets ◽

Selection Methods ◽

Clustering Method ◽

Slope Change ◽

Shared Nearest Neighbor

A hyperspectral image (HSI) has many bands, and adjacent bands are highly correlated, so it is necessary to find representative subsets before further analysis. Band selection, which removes redundant bands from an HSI, is considered an effective approach to this issue. Many band selection methods have recently been proposed. However, most of them have very poor accuracy when only a small number of bands is selected and require multiple iterations, which runs counter to the purpose of band selection. We therefore propose an efficient clustering method based on shared nearest neighbor (SNNC) for hyperspectral optimal band selection, with the following contributions: (1) the local density of each band is obtained from shared nearest neighbors, which more accurately reflects the local distribution characteristics; (2) to acquire a band subset that contains a large amount of information, information entropy is used as one of the weight factors; (3) a method for automatically selecting the optimal band subset is designed based on the slope change. The experimental results show that, compared with other methods, the proposed method has competitive computational time, and the selected bands achieve higher overall classification accuracy on different data sets, especially when the number of bands is small.
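The shared-nearest-neighbor similarity between two bands is commonly defined as the number of bands that appear in both of their k-nearest-neighbor lists. The sketch below treats each band as a vector of pixel values. It computes that SNN matrix and a histogram-based Shannon entropy per band.

The exact SNNC density and weighting formulas are not given in this abstract. `snn_density` is therefore a hypothetical stand-in, not the paper's definition.

```python
import numpy as np

def knn_sets(bands, k):
    """bands: (n_bands, n_pixels). Set of the k nearest other bands (Euclidean) per band."""
    d = np.linalg.norm(bands[:, None, :] - bands[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # a band is not its own neighbor
    return [set(row) for row in np.argsort(d, axis=1)[:, :k]]

def snn_matrix(neigh):
    """SNN similarity: number of bands shared by two kNN lists."""
    n = len(neigh)
    S = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            S[i, j] = S[j, i] = len(neigh[i] & neigh[j])
    return S

def snn_density(S, neigh):
    """Hypothetical local density: summed SNN similarity to a band's own kNN."""
    return np.array([sum(S[i, j] for j in neigh[i]) for i in range(len(neigh))])

def band_entropy(band, bins=32):
    """Shannon entropy (bits) of a band's pixel-value histogram."""
    counts, _ = np.histogram(band, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())
```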


A Homomorphic Neural Network for Modeling and Prediction

Neural Computation ◽

10.1162/neco.2008.12-06-418 ◽

2008 ◽

Vol 20 (4) ◽

pp. 1042-1064

Author(s):

Maciej Pedzisz ◽

Danilo P. Mandic

Keyword(s):

Real Life ◽

Volterra Model ◽

Data Sets ◽

Feedforward Network ◽

Optimal Learning ◽

Modeling And Prediction ◽

Real Life Data ◽

Gradient Based ◽

Hidden Layer ◽

Nonlinear Adaptive Filtering

A homomorphic feedforward network (HFFN) for nonlinear adaptive filtering is introduced. This is achieved by a two-layer feedforward architecture with an exponential hidden layer and logarithmic preprocessing step. This way, the overall input-output relationship can be seen as a generalized Volterra model, or as a bank of homomorphic filters. Gradient-based learning for this architecture is introduced, together with some practical issues related to the choice of optimal learning parameters and weight initialization. The performance and convergence speed are verified by analysis and extensive simulations. For rigor, the simulations are conducted on artificial and real-life data, and the performances are compared against those obtained by a sigmoidal feedforward network (FFN) with identical topology. The proposed HFFN proved to be a viable alternative to FFNs, especially in the critical case of online learning on small- and medium-scale data sets.
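Read literally, a logarithmic preprocessing step followed by an exponential hidden unit gives h_j = exp(sum_i a_ji log x_i + b_j) = e^(b_j) * prod_i x_i^(a_ji). The output y = sum_j w_j h_j is then a sum of generalized monomials, which is what connects the network to a generalized Volterra model.

The sketch assumes strictly positive inputs, a bias per hidden unit, and a linear output. It shows one plain gradient step on squared error. The paper's exact parameterization, learning-rate choice, and initialization may differ.

```python
import numpy as np

def hffn_forward(x, A, b, w):
    """x: positive inputs (n,), A: hidden weights (m, n), b: biases (m,), w: output weights (m,)."""
    z = np.log(x)                  # logarithmic preprocessing (requires x > 0)
    h = np.exp(A @ z + b)          # exponential hidden layer
    return float(w @ h), h

def hffn_sgd_step(x, target, A, b, w, lr=0.01):
    """One gradient step on 0.5 * (y - target)**2; returns updated copies of A, b, w."""
    z = np.log(x)
    y, h = hffn_forward(x, A, b, w)
    err = y - target
    grad_w = err * h
    grad_pre = err * w * h         # d loss / d (A @ z + b), since d exp(u)/du = exp(u)
    grad_A = np.outer(grad_pre, z)
    return A - lr * grad_A, b - lr * grad_pre, w - lr * grad_w
```

With A = [[1, 0], [0, 2]], b = 0, w = [1, 1] and x = [2, 3], the hidden units equal x_1 and x_2^2, so the output is 2 + 9 = 11.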


Data Mining Techniques for Identification and Classification of Various Diseases in Plants

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b1110.1292s19 ◽

2019 ◽

Vol 9 (2S) ◽

pp. 676-680

Keyword(s):

Neural Network ◽

Data Mining ◽

Nearest Neighbors ◽

Crop Productivity ◽

Vital Role ◽

Support Vector ◽

Data Sets ◽

K Nearest Neighbors ◽

Data Mining Techniques

Data mining is currently used in many applications and plays a vital role in the research community. This paper describes data mining techniques for the preprocessing and classification of various plant diseases. Different plants have different diseases, so each has its own data sets and its own objectives for knowledge discovery. Applied to plants, data mining techniques help segment and classify diseased plants, avoid oral inspection, and help increase crop productivity. This paper covers several techniques, such as K-Nearest Neighbors, Support Vector Machine, Principal Component Analysis, and Neural Network. Among these techniques, the neural network is found to be effective for disease detection in plants.


Comparative Analysis of Chest X-ray Image Classification Algorithms for Covid-19 Detection

Teknika ◽

10.34148/teknika.v10i2.331 ◽

2021 ◽

Vol 10 (2) ◽

pp. 96-103

Author(s):

Mohammad Farid Naufal ◽

Selvia Ferdiana Kusuma ◽

Kevin Christian Tanus ◽

Raynaldy Valentino Sukiwun ◽

Joseph Kristiano ◽

...

Keyword(s):

Neural Network ◽

Support Vector Machine ◽

Cross Validation ◽

Nearest Neighbor ◽

Nearest Neighbors ◽

Support Vector ◽

K Nearest Neighbor ◽

K Nearest Neighbors ◽

X Ray ◽

Chest X Ray

The global Covid-19 pandemic, which emerged at the end of 2019, has become a major problem for every country in the world. Covid-19 is a virus that attacks the lungs and can cause death. Many Covid-19 patients have been treated in hospitals, so chest X-ray images of infected patients' lungs are available. Many studies have already classified chest X-ray images using Convolutional Neural Networks (CNN) to distinguish healthy lungs, Covid-19-infected lungs, and other lung diseases. However, no study has yet compared CNN with classical machine learning algorithms such as Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) to determine the gap in performance and the execution time required. This study compares the performance and execution time of the K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and CNN classification algorithms for detecting Covid-19 from chest X-ray images. In testing with 5-fold cross validation, CNN had the best average performance: accuracy 0.9591, precision 0.9592, recall 0.9591, and F1 score 0.959, with an average execution time of 3102.562 seconds.


Methods based on k-nearest neighbor regression in the prediction of basal area diameter distribution

Canadian Journal of Forest Research ◽

10.1139/x98-085 ◽

1998 ◽

Vol 28 (8) ◽

pp. 1107-1115 ◽

Cited By ~ 61

Author(s):

Matti Maltamo ◽

Annika Kangas

Keyword(s):

Nearest Neighbor ◽

Basal Area ◽

Nearest Neighbors ◽

Volume Estimation ◽

Diameter Distribution ◽

K Nearest Neighbor ◽

K Nearest Neighbors ◽

Stand Growth ◽

Weighted Averages ◽

Growing Stock

In the Finnish compartmentwise inventory systems, growing stock is described by species with means and sums of tree characteristics, such as mean height and basal area. In the calculations, growing stock is described in a treewise manner using a diameter distribution predicted from stand variables. The treewise description is needed for several reasons, e.g., for predicting log volumes or stand growth and for analyzing forest structure. In this study, methods for predicting the basal area diameter distribution based on k-nearest neighbor (k-nn) regression are compared with methods based on parametric distributions. In the k-nn method, the predicted values of the variables of interest are obtained as weighted averages of the values of neighboring observations. Using k-nn based methods, the basal area diameter distribution of a stand is predicted as a weighted average of the distributions of its k nearest neighbors. The methods tested in this study include weighted averages of (i) Weibull distributions of the k nearest neighbors, (ii) distributions of the k nearest neighbors smoothed with the kernel method, and (iii) empirical distributions of the k nearest neighbors. These methods are compared for the accuracy of stand volume estimation, stand structure description, and stand growth prediction. Methods based on k-nn regression gave a more accurate description of the stand than the parametric methods.
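Here is a sketch of variant (iii) above. The predicted basal area diameter distribution is a weighted average of the empirical distributions of the k stands nearest in stand-variable space. Here, each empirical distribution is given as basal area shares per diameter class.

Inverse-distance weighting is an assumption made for illustration. The study's actual weights and variable scaling are not stated in the abstract.

```python
import numpy as np

def knn_distribution(query, stand_vars, distributions, k=3, eps=1e-9):
    """Weighted average of the k nearest stands' diameter distributions.

    stand_vars: (n_stands, n_vars) stand variables, e.g. mean height and basal area.
    distributions: (n_stands, n_classes) basal area share per diameter class.
    """
    d = np.linalg.norm(stand_vars - query, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + eps)       # inverse-distance weights (assumption)
    w /= w.sum()                   # weights sum to 1, so shares still sum to 1
    return w @ distributions[idx]
```

In practice the stand variables would be scaled before computing distances, so that no single variable dominates the neighbor search.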


Web-Based Classification of Junior High Schools/Equivalent in the Bireuen Region Using the K-Nearest Neighbors Algorithm

Computer Engineering Science and System Journal ◽

10.24114/cess.v5i1.14962 ◽

2020 ◽

Vol 5 (1) ◽

pp. 33

Author(s):

Rozzi Kesuma Dinata ◽

Fajriana Fajriana ◽

Zulfa Zulfa ◽

Novia Hasdyna

Keyword(s):

Euclidean Distance ◽

Nearest Neighbor ◽

Nearest Neighbors ◽

K Nearest Neighbor ◽

K Nearest Neighbors

In this study, the K-Nearest Neighbor algorithm is implemented to classify junior high schools (SMP) and equivalent schools based on the interests of prospective students. The aim is to make it easier for users to find an SMP or equivalent school based on 8 school criteria: accreditation, room facilities, sports facilities, laboratories, extracurricular activities, cost, grade levels, and study hours. The data used in this study were obtained from the Dinas Pendidikan Pemuda dan Olahraga Kabupaten Bireuen (Department of Education, Youth and Sports of Bireuen Regency). Using K-NN with the Euclidean distance and k = 3, the study obtained a precision of 63.67%, a recall of 68.95%, and an accuracy of 79.33%.
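Below is a minimal sketch of the classifier described: Euclidean distance over the 8 school criteria, k = 3, and a majority vote. It assumes each criterion has already been encoded as a number. The encoding and the class labels are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train, labels, query, k=3):
    """Majority label among the k training rows closest to query (Euclidean)."""
    nearest = sorted(range(len(train)), key=lambda i: math.dist(train[i], query))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(predicted, actual)) / len(actual)
```

Precision and recall would be computed per class from the same predictions. Reported figures should come from held-out schools, not from the training rows.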
