OP-KNN: Method and Applications

2010 ◽  
Vol 2010 ◽  
pp. 1-6 ◽  
Author(s):  
Qi Yu ◽  
Yoan Miche ◽  
Antti Sorjamaa ◽  
Alberto Guillen ◽  
Amaury Lendasse ◽  
...  

This paper presents a methodology named Optimally Pruned K-Nearest Neighbors (OP-KNN), which competes with state-of-the-art methods while remaining fast. It builds a single-hidden-layer feedforward neural network that uses K-Nearest Neighbors as kernels to perform regression. Multiresponse Sparse Regression (MRSR) is used to rank each kth nearest neighbor, and Leave-One-Out estimation is then used to select the optimal number of neighbors and to estimate the generalization performance. Because the computational time of the method is small, this paper also presents a strategy that uses OP-KNN to perform Variable Selection, which is tested successfully on eight real-life data sets from different application fields. In summary, the most significant characteristic of this method is that it provides good performance and a comparatively simple model at extremely high learning speed.
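The Leave-One-Out step described in the abstract can be illustrated with a minimal sketch. It is not the OP-KNN method itself: it skips the MRSR ranking and the network's output layer, and simply averages the k nearest targets, using the Leave-One-Out error only to choose k. The function names `loo_knn_error` and `select_k` and the toy data are assumptions made for illustration.

```python
import math

def loo_knn_error(X, y, k):
    """Mean squared leave-one-out error of a plain k-NN averaging regressor."""
    n = len(X)
    total = 0.0
    for i in range(n):
        # Rank every other sample by Euclidean distance to the held-out sample i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )
        # Predict the held-out target as the mean of its k nearest targets.
        pred = sum(y[j] for j in others[:k]) / k
        total += (y[i] - pred) ** 2
    return total / n

def select_k(X, y, k_max):
    """Return the k in 1..k_max with the lowest leave-one-out error, plus all errors."""
    errors = {k: loo_knn_error(X, y, k) for k in range(1, k_max + 1)}
    best_k = min(errors, key=errors.get)
    return best_k, errors

# Toy data (hypothetical): a noiseless line y = 2x sampled at x = 0..9.
X = [(float(i),) for i in range(10)]
y = [2.0 * x[0] for x in X]
best_k, errors = select_k(X, y, k_max=4)
print(best_k, errors)
```

On this toy line, two neighbors reproduce interior points exactly, so the error is expected to be lowest at k = 2.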

Mathematics ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 779
Author(s):  
Ruriko Yoshida

A tropical ball is a ball defined by the tropical metric over the tropical projective torus. In this paper we show several properties of tropical balls over the tropical projective torus and also over the space of phylogenetic trees with a given set of leaf labels. Then we discuss its application to the K nearest neighbors (KNN) algorithm, a supervised learning method used to classify a high-dimensional vector into given categories by looking at a ball centered at the vector, which contains K vectors in the space.
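As a rough illustration of KNN under the tropical metric, the sketch below uses the usual definition d(x, y) = max_i(x_i - y_i) - min_i(x_i - y_i), which is unchanged when a constant is added to every coordinate. The classifier, the function names, and the toy vectors are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter

def tropical_distance(x, y):
    """Tropical metric: spread between the largest and smallest coordinate difference."""
    diffs = [a - b for a, b in zip(x, y)]
    return max(diffs) - min(diffs)

def knn_classify(train, query, k):
    """Majority vote among the k training vectors closest to `query`.

    `train` is a list of (vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: tropical_distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy labelled vectors (hypothetical).
train = [
    ((0, 0, 0), "a"),
    ((0, 1, 0), "a"),
    ((0, 5, 5), "b"),
    ((0, 6, 5), "b"),
]
label = knn_classify(train, (0, 1, 1), k=3)
print(label)
```

The k nearest vectors are exactly the contents of the smallest tropical ball around the query that holds k training points, which is the picture the abstract describes.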


2018 ◽  
Vol 14 (9) ◽  
pp. 1213-1225 ◽  
Author(s):  
Vo Ngoc Phu ◽  
Vo Thi Ngoc Tran

Author(s):  
Wei Yan

In cloud computing environments, parallel processing of kNN queries for big data is an important issue. The k nearest neighbor query (kNN query), designed to find the k nearest neighbors from a dataset S for every object in another dataset R, is a primitive operator widely adopted by many applications, including knowledge discovery, data mining, and spatial databases. This chapter proposes a parallel method for kNN queries on big data using the MapReduce programming model. First, it proposes an approximate algorithm based on mapping multi-dimensional data sets into two-dimensional data sets and transforming kNN queries into a sequence of two-dimensional point searches. Then, in two-dimensional space, it proposes a partitioning method that incorporates the Voronoi diagram into an R-tree. Furthermore, it proposes an efficient algorithm for processing kNN queries based on the R-tree using the MapReduce programming model. Finally, it presents the results of extensive experimental evaluations, which indicate the efficiency of the proposed approach.


2018 ◽  
Vol 2018 ◽  
pp. 1-17 ◽  
Author(s):  
Hyung-Ju Cho

We investigate the k-nearest neighbor (kNN) join in road networks, which determines the k nearest neighbors (NNs) from a dataset S for every object in another dataset R. The kNN join is a primitive operation that is widely used in many data mining applications. However, it is an expensive operation because it combines the kNN query and the join operation, and most existing methods assume the use of the Euclidean distance metric. We instead consider the problem of processing kNN joins in road networks, where the distance between two points is the length of the shortest path connecting them. We propose a shared execution-based approach called the group-nested loop (GNL) method, which efficiently evaluates kNN joins in road networks by exploiting grouping and shared execution. The GNL method can be easily implemented using existing kNN query algorithms. Extensive experiments using several real-life roadmaps confirm the superior performance and effectiveness of the proposed method in a wide range of problem settings.
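To make the problem concrete, here is a naive baseline for a kNN join under shortest-path distance. It runs one Dijkstra search per object in R and is not the GNL method, which the abstract says gains efficiency through grouping and shared execution. The adjacency-list graph format, the function names, and the toy network are assumptions made for illustration.

```python
import heapq

def shortest_path_lengths(graph, source):
    """Dijkstra: shortest-path length from `source` to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def knn_join(graph, R, S, k):
    """For each node r in R, list the k nodes of S closest by network distance."""
    result = {}
    for r in R:
        dist = shortest_path_lengths(graph, r)
        reachable = sorted((dist[s], s) for s in S if s in dist)
        result[r] = [s for _, s in reachable[:k]]
    return result

# Toy undirected road network (hypothetical): node -> [(neighbor, edge length)].
graph = {
    "A": [("B", 1.0), ("C", 4.0)],
    "B": [("A", 1.0), ("C", 2.0), ("D", 5.0)],
    "C": [("A", 4.0), ("B", 2.0), ("D", 1.0)],
    "D": [("B", 5.0), ("C", 1.0)],
}
joined = knn_join(graph, R=["A", "D"], S=["B", "C"], k=1)
print(joined)
```

Each object in R triggers its own full search here, which is the repeated work that a shared-execution approach aims to avoid.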


2019 ◽  
Vol 11 (3) ◽  
pp. 350 ◽  
Author(s):  
Qiang Li ◽  
Qi Wang ◽  
Xuelong Li

A hyperspectral image (HSI) has many bands, which leads to high correlation between adjacent bands, so it is necessary to find representative subsets before further analysis. To address this issue, band selection is considered an effective approach that removes redundant bands from an HSI. Many band selection methods have been proposed recently, but the majority of them have extremely poor accuracy when only a small number of bands is selected and require multiple iterations, which defeats the purpose of band selection. Therefore, we propose an efficient clustering method based on shared nearest neighbor (SNNC) for hyperspectral optimal band selection, with the following contributions: (1) the local density of each band is obtained by shared nearest neighbor, which reflects the local distribution characteristics more accurately; (2) to acquire a band subset containing a large amount of information, the information entropy is taken as one of the weight factors; (3) a method for automatically selecting the optimal band subset is designed based on the slope change. The experimental results reveal that, compared with other methods, the proposed method has competitive computational time and the selected bands achieve higher overall classification accuracy on different data sets, especially when the number of bands is small.


2008 ◽  
Vol 20 (4) ◽  
pp. 1042-1064
Author(s):  
Maciej Pedzisz ◽  
Danilo P. Mandic

A homomorphic feedforward network (HFFN) for nonlinear adaptive filtering is introduced. This is achieved by a two-layer feedforward architecture with an exponential hidden layer and logarithmic preprocessing step. This way, the overall input-output relationship can be seen as a generalized Volterra model, or as a bank of homomorphic filters. Gradient-based learning for this architecture is introduced, together with some practical issues related to the choice of optimal learning parameters and weight initialization. The performance and convergence speed are verified by analysis and extensive simulations. For rigor, the simulations are conducted on artificial and real-life data, and the performances are compared against those obtained by a sigmoidal feedforward network (FFN) with identical topology. The proposed HFFN proved to be a viable alternative to FFNs, especially in the critical case of online learning on small- and medium-scale data sets.


Data mining is currently used in various applications and plays a vital role in the research community. This paper describes data mining techniques for the preprocessing and classification of various plant diseases. Because different plants have different diseases, each has different data sets and different objectives for knowledge discovery. Data mining techniques applied to plants help in the segmentation and classification of diseased plants, avoid oral inspection, and help to increase crop productivity. This paper covers various classification techniques, such as K-Nearest Neighbors, Support Vector Machine, Principal Component Analysis, and Neural Network. Among these techniques, the neural network is effective for disease detection in plants.


Teknika ◽  
2021 ◽  
Vol 10 (2) ◽  
pp. 96-103
Author(s):  
Mohammad Farid Naufal ◽  
Selvia Ferdiana Kusuma ◽  
Kevin Christian Tanus ◽  
Raynaldy Valentino Sukiwun ◽  
Joseph Kristiano ◽  
...  

The global Covid-19 pandemic that emerged at the end of 2019 has become a major problem for every country in the world. Covid-19 is a virus that attacks the lungs and can cause death. Many Covid-19 patients have been treated in hospitals, so chest X-ray image data of the lungs of infected patients are available. Many studies have already classified chest X-ray images using a Convolutional Neural Network (CNN) to distinguish healthy lungs, lungs infected with Covid-19, and other lung diseases, but no study has yet compared the performance of CNN with classical machine learning algorithms such as Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) to determine the gap in performance and required execution time. This study aims to compare the performance and execution time of the K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and CNN classification algorithms for detecting Covid-19 from chest X-ray images. Based on tests using 5-fold cross-validation, CNN is the algorithm with the best average performance, with an accuracy of 0.9591, precision of 0.9592, recall of 0.9591, and F1 score of 0.959, at an average execution time of 3102.562 seconds.


1998 ◽  
Vol 28 (8) ◽  
pp. 1107-1115 ◽  
Author(s):  
Matti Maltamo ◽  
Annika Kangas

In the Finnish compartmentwise inventory systems, growing stock is described with means and sums of tree characteristics, such as mean height and basal area, by tree species. In the calculations, growing stock is described in a treewise manner using a diameter distribution predicted from stand variables. The treewise description is needed for several reasons, e.g., for predicting log volumes or stand growth and for analyzing the forest structure. In this study, methods for predicting the basal area diameter distribution based on k-nearest neighbor (k-nn) regression are compared with methods based on parametric distributions. In the k-nn method, the predicted values for the variables of interest are obtained as weighted averages of the values of neighboring observations. Using k-nn based methods, the basal area diameter distribution of a stand is predicted with a weighted average of the distributions of the k nearest neighbors. The methods tested in this study include weighted averages of (i) Weibull distributions of the k nearest neighbors, (ii) distributions of the k nearest neighbors smoothed with the kernel method, and (iii) empirical distributions of the k nearest neighbors. These methods are compared for the accuracy of stand volume estimation, stand structure description, and stand growth prediction. Methods based on k-nn regression proved to give a more accurate description of the stand than the parametric methods.
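The weighted-average prediction at the core of k-nn regression can be sketched as follows. The abstract does not state the weighting scheme, so inverse-distance weights are an assumption here, as are the function name `knn_weighted_predict`, the Euclidean distance, and the toy values, which are not forest inventory data.

```python
import math

def knn_weighted_predict(X, y, query, k, eps=1e-9):
    """Predict a target as the inverse-distance weighted mean of the k nearest targets."""
    # Keep the k observations closest to the query in Euclidean distance.
    ranked = sorted(zip(X, y), key=lambda pair: math.dist(pair[0], query))[:k]
    # eps avoids division by zero when the query coincides with an observation.
    weights = [1.0 / (math.dist(x, query) + eps) for x, _ in ranked]
    weighted_sum = sum(w * target for w, (_, target) in zip(weights, ranked))
    return weighted_sum / sum(weights)

# Toy observations (hypothetical): one predictor variable and one target each.
X = [(0.0,), (2.0,), (10.0,)]
y = [10.0, 20.0, 100.0]
pred = knn_weighted_predict(X, y, query=(1.0,), k=2)
print(pred)
```

The same weights could be applied to whole distributions instead of single values, which is how the abstract describes the prediction of a stand's diameter distribution.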


2020 ◽  
Vol 5 (1) ◽  
pp. 33
Author(s):  
Rozzi Kesuma Dinata ◽  
Fajriana Fajriana ◽  
Zulfa Zulfa ◽  
Novia Hasdyna

This study implements the K-Nearest Neighbor algorithm to classify junior high schools (Sekolah Menengah Pertama, SMP) and equivalent schools based on the preferences of prospective students. The aim of the study is to make it easier for users to find a junior high school or equivalent based on eight school criteria: accreditation, room facilities, sports facilities, laboratories, extracurricular activities, cost, grade levels, and study time. The data used in this study were obtained from the Dinas Pendidikan Pemuda dan Olahraga Kabupaten Bireuen (the Bireuen Regency Office of Education, Youth and Sports). Using K-NN with the Euclidean distance and k=3, the study obtained a precision of 63.67%, a recall of 68.95%, and an accuracy of 79.33%.

