Parallel kNN Queries for Big Data Based on Voronoi Diagram Using MapReduce

Geospatial Research

10.4018/978-1-4666-9845-1.ch029

2016

pp. 644-665

Author(s): Wei Yan

Keyword(s): Big Data, Voronoi Diagram, Spatial Databases, Nearest Neighbor, Programming Model, Dimensional Space, Data Sets, Two Dimensional, K Nearest Neighbor, K Nearest Neighbors

In cloud computing environments, parallel kNN queries over big data are an important issue. The k nearest neighbor queries (kNN queries), designed to find the k nearest neighbors in a dataset S for every object in another dataset R, are a primitive operator widely adopted by many applications, including knowledge discovery, data mining, and spatial databases. This chapter proposes a parallel method of kNN queries for big data using the MapReduce programming model. First, it proposes an approximate algorithm that maps multi-dimensional data sets into two-dimensional data sets and transforms kNN queries into a sequence of two-dimensional point searches. Then, in two-dimensional space, it proposes a partitioning method that incorporates the Voronoi diagram into an R-tree. Furthermore, it proposes an efficient algorithm for processing kNN queries based on the R-tree using the MapReduce programming model. Finally, the chapter presents the results of extensive experimental evaluations, which indicate the efficiency of the proposed approach.
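Voronoi-based partitioning boils down to two operations: assigning every point to the cell of its nearest pivot, and bounding how close a point in another cell can possibly be. The sketch below is a minimal illustration in Euclidean 2-D space, assuming the pivots are already chosen; the function names `assign_cells` and `hyperplane_lower_bound` are hypothetical, and the chapter's R-tree integration and dimensionality mapping are not shown.

```python
import math

def assign_cells(points, pivots):
    """Voronoi partition: map each point to the index of its nearest pivot.
    Ties go to the lowest pivot index."""
    cells = {i: [] for i in range(len(pivots))}
    for p in points:
        nearest = min(range(len(pivots)), key=lambda i: math.dist(p, pivots[i]))
        cells[nearest].append(p)
    return cells

def hyperplane_lower_bound(r, pi, pj):
    """Lower bound on dist(r, s) for every s in the cell of pivot pj,
    given that r lies in the cell of pivot pi. It is the distance from r
    to the perpendicular bisector of pi and pj (Euclidean space only)."""
    return (math.dist(r, pj) ** 2 - math.dist(r, pi) ** 2) / (2 * math.dist(pi, pj))
```

For example, with pivots (0, 0) and (10, 0), the query point (1, 1) is at least 4.0 away from anything in the second cell, so if its current k-th candidate is closer than 4.0, that whole cell can be skipped without computing any distances inside it.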

Voronoi-Based kNN Queries Using K-Means Clustering in MapReduce

Advances in Data Mining and Database Management - Emerging Technologies and Applications in Data Processing and Management

10.4018/978-1-5225-8446-9.ch011

2019

pp. 220-241

Author(s): Wei Yan

Keyword(s): Big Data, Voronoi Diagram, Spatial Databases, Nearest Neighbor, Programming Model, K Nearest Neighbor, K Nearest Neighbors, Data Intensive, Clustering Approach, Spatial Big Data

kNN queries are a special type of query for massive spatial big data. The k-nearest neighbor queries (kNN queries), designed to find the k nearest neighbors in a dataset S for every point in another dataset R, are useful tools widely adopted by many applications, including knowledge discovery, data mining, and spatial databases. In cloud computing environments, the MapReduce programming model is a well-accepted framework for data-intensive applications over clusters of computers. This chapter proposes a method of kNN queries based on Voronoi diagram-based partitioning with k-means clusters in the MapReduce programming model. First, it proposes a Voronoi diagram-based partitioning approach for massive spatial big data. Then, it presents a k-means clustering approach for the object points based on the Voronoi diagram. Furthermore, it proposes a parallel algorithm for processing kNN queries over massive spatial big data based on k-means clusters in the MapReduce programming model. Finally, extensive experiments demonstrate the efficiency of the proposed approach.
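One natural way to combine k-means with Voronoi partitioning is to use the k-means centroids as pivots: after convergence, each cluster is exactly the Voronoi cell of its centroid. The following is a minimal sketch of plain Lloyd's k-means in Python, assuming small in-memory point lists; the function name `kmeans` and its parameters are hypothetical, and it is not the chapter's MapReduce implementation.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means. Returns k centroids; the clusters they induce are the
    Voronoi cells of those centroids (nearest-centroid regions)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    for _ in range(iters):
        # Assignment step: each point joins the cell of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its cell
        new = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if new == centroids:  # converged: assignments will not change
            break
        centroids = new
    return centroids
```

In a MapReduce setting, the assignment step is a natural map (point to nearest centroid) and the update step a natural reduce (average per centroid), which is why k-means is commonly run as a sequence of MapReduce jobs.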

Parallel Queries of Cluster-Based k Nearest Neighbor in MapReduce

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Managing Big Data in Cloud Computing Environments

10.4018/978-1-4666-9834-5.ch007

2016

pp. 163-182

Author(s): Wei Yan

Keyword(s): Spatial Data, Spatial Databases, Nearest Neighbor, Programming Model, K Nearest Neighbor, K Nearest Neighbors, Data Intensive, Parallel Queries, Massive Spatial Data, Nearest Neighbor Queries

Parallel k nearest neighbor queries over massive spatial data are an important issue. The k nearest neighbor queries (kNN queries), designed to find the k nearest neighbors in a dataset S for every point in another dataset R, are a useful tool widely adopted by many applications, including knowledge discovery, data mining, and spatial databases. In cloud computing environments, the MapReduce programming model is a well-accepted framework for data-intensive applications over clusters of computers. This chapter proposes a parallel method of kNN queries based on clusters in the MapReduce programming model. First, it proposes a partitioning method for spatial data using the Voronoi diagram. Then, it clusters the data points after partitioning using the k-means method. Furthermore, it proposes an efficient algorithm for processing kNN queries based on k-means clusters using the MapReduce programming model. Finally, extensive experiments evaluate the efficiency of the proposed approach.
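The correctness of a partitioned kNN join rests on one fact: the global k nearest neighbors of r are the k smallest among the union of the per-partition top-k lists. The sketch below simulates that two-phase MapReduce pattern in plain Python, assuming S is already split into partitions (for example, Voronoi cells). It is a naive baseline in which every partition sees all of R; the chapter's cluster-based pruning is not reproduced, and `knn_join` is a hypothetical name.

```python
import heapq
import math
from collections import defaultdict

def knn_join(R, S_partitions, k):
    """Exact kNN join: for each index ri of R, the k nearest points of S.

    Phase 1 (one "reducer" per S partition): local top-k candidates for every r.
    Phase 2 (one "reducer" per r): merge local lists into the global top-k.
    """
    local = defaultdict(list)  # r index -> list of (distance, s) candidates
    for part in S_partitions:
        for ri, r in enumerate(R):
            local[ri].extend(heapq.nsmallest(k, ((math.dist(r, s), s) for s in part)))
    # The global top-k is the top-k of the union of the local top-k lists
    return {ri: [s for _, s in heapq.nsmallest(k, cands)] for ri, cands in local.items()}
```

Pruning schemes such as distance bounds between Voronoi cells aim to shrink Phase 1 so that each r is only sent to partitions that could contain one of its neighbors; the merge in Phase 2 stays the same.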

COMBINING FEATURE SELECTION WITH EXTRACTION: UNSUPERVISED FEATURE SELECTION BASED ON PRINCIPAL COMPONENT ANALYSIS

International Journal of Artificial Intelligence Tools

10.1142/s0218213009000445

2009

Vol 18 (06)

pp. 883-904

Author(s): YUN LI, BAO-LIANG LU, TENG-FEI ZHANG

Keyword(s): Feature Selection, Principal Components, Nearest Neighbor, Dimensional Space, Image Data, Principal Component, Data Sets, K Nearest Neighbor, Unsupervised Feature Selection, Linear Feature

Principal component analysis (PCA) is a popular linear feature extractor and is widely used in signal processing, face recognition, and related fields. However, the axes of the lower-dimensional space, i.e., the principal components, are a set of new variables with no clear physical meaning. We therefore propose unsupervised feature selection algorithms based on eigenvector analysis to identify the critical original features for the principal components. The presented algorithms use the k-nearest neighbor rule to find the predominant row components, and eight new measures are proposed to compute the correlation between row components in the transformation matrix. Experiments on benchmark data sets and on facial image data sets for gender classification demonstrate the superiority of the proposed algorithms.
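The "row components" here are the rows of the PCA transformation matrix: row j holds the loadings of original feature j on the retained components. As a much simpler stand-in for the paper's kNN-based selection with eight correlation measures, the sketch below ranks original features by the norm of their loading row over the top q components. It is pure Python with power iteration plus deflation, assuming small dense data; `covariance`, `top_components`, and `rank_features` are hypothetical names.

```python
import math
import random

def covariance(X):
    """Sample covariance matrix of the rows of X (list of equal-length lists)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    C = [[0.0] * d for _ in range(d)]
    for row in X:
        c = [row[j] - means[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                C[a][b] += c[a] * c[b] / (n - 1)
    return C

def top_components(C, q, iters=500, seed=0):
    """Top-q eigenvectors of a symmetric PSD matrix by power iteration with deflation."""
    rng = random.Random(seed)
    d = len(C)
    A = [row[:] for row in C]
    comps = []
    for _ in range(q):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(A[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:  # remaining variance is zero
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(A[a][b] * v[b] for b in range(d)) for a in range(d))
        comps.append(v)
        # Deflation: remove the found component before searching for the next one
        A = [[A[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return comps

def rank_features(X, q):
    """Original feature indices sorted by the norm of their loadings on the top q PCs."""
    comps = top_components(covariance(X), q)
    scores = [math.sqrt(sum(c[j] ** 2 for c in comps)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])
```

Keeping the top-ranked original features preserves their physical meaning, which is the motivation the abstract gives for selecting rather than extracting features.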

Two dimensional Large Margin Nearest Neighbor for Matrix Classification

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence

10.24963/ijcai.2017/383

2017

Author(s): Kun Song, Feiping Nie, Junwei Han

Keyword(s): Nearest Neighbor, Metric Learning, Parameter Determination, Data Sets, Two Dimensional, K Nearest Neighbor, Large Margin, Wide Range, The Matrix, Projection Matrices

Matrices are a common form of data encountered in a wide range of real applications, and how to classify such data is an important research topic. In this paper, we propose a novel distance metric learning method named two dimensional large margin nearest neighbor (2DLMNN) for improving the performance of the k nearest neighbor (KNN) classifier in matrix classification. In the proposed method, left and right projection matrices are employed to define the matrix-based Mahalanobis distance, which is used to construct an objective aimed at separating points in different classes by a large margin. The two projection matrices have far fewer parameters than the projection matrix of the vector-based counterpart, so our method reduces the risk of overfitting. We also introduce a framework for solving the proposed 2DLMNN, and we analyze its convergence behavior, initialization, and parameter determination. Compared with vector-based methods, 2DLMNN performs better in matrix data classification. Promising experimental results on several data sets demonstrate the effectiveness of our method.
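The parameter saving can be made concrete with a short derivation. Assuming the matrix-based distance takes the usual bilinear form with a left projection $U \in \mathbb{R}^{m\times p}$ and a right projection $V \in \mathbb{R}^{n\times q}$ for $m \times n$ data matrices (the symbols are illustrative, not taken from the paper), the identity $\operatorname{vec}(ABC) = (C^\top \otimes A)\operatorname{vec}(B)$ shows that it is an ordinary vector Mahalanobis distance with a Kronecker-structured projection:

```latex
d_{U,V}(X_i, X_j) = \left\| U^\top (X_i - X_j) V \right\|_F^2
                  = \left\| (V \otimes U)^\top \operatorname{vec}(X_i - X_j) \right\|_2^2,
\qquad
L = (V \otimes U)^\top \in \mathbb{R}^{pq \times mn}, \quad M = L^\top L .

\text{Free parameters: } mp + nq \ \ (\text{matrix form}) \quad \text{vs.} \quad pq \cdot mn \ \ (\text{unconstrained } L).
```

For $32 \times 32$ images with $p = q = 10$, that is $640$ parameters against $102{,}400$, which is the source of the reduced overfitting risk the abstract mentions.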

Tropical Balls and Its Applications to K Nearest Neighbor over the Space of Phylogenetic Trees

Mathematics

10.3390/math9070779

2021

Vol 9 (7)

pp. 779

Author(s): Ruriko Yoshida

Keyword(s): Supervised Learning, Phylogenetic Trees, Nearest Neighbor, Nearest Neighbors, High Dimensional, Learning Method, Dimensional Vector, K Nearest Neighbor, K Nearest Neighbors

A tropical ball is a ball defined by the tropical metric over the tropical projective torus. In this paper, we show several properties of tropical balls over the tropical projective torus and over the space of phylogenetic trees with a given set of leaf labels. We then discuss their application to the K nearest neighbors (KNN) algorithm, a supervised learning method that classifies a high-dimensional vector into given categories by looking at a ball centered at the vector that contains K vectors in the space.
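On the tropical projective torus R^n / R(1, ..., 1), the tropical metric is d_tr(u, v) = max_i(u_i - v_i) - min_i(u_i - v_i), so adding the same constant to every coordinate does not change distances. The sketch below shows that metric, membership in a tropical ball, and a majority-vote KNN classifier that uses it; the function names are hypothetical, and the encoding of phylogenetic trees as vectors is not shown.

```python
from collections import Counter

def tropical_distance(u, v):
    """Tropical metric on R^n / R(1,...,1): max_i(u_i - v_i) - min_i(u_i - v_i)."""
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)

def in_tropical_ball(x, center, radius):
    """True if x lies in the closed tropical ball of the given radius around center."""
    return tropical_distance(x, center) <= radius

def knn_classify(x, train, k):
    """train: list of (vector, label) pairs. Majority vote among the k nearest
    training vectors under the tropical metric, i.e. those inside the smallest
    tropical ball centered at x that contains k training vectors."""
    neighbors = sorted(train, key=lambda item: tropical_distance(x, item[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]
```

For instance, (1, 2, 3) and (6, 7, 8) are at tropical distance 0 because they differ by a constant shift, so they represent the same point of the torus.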

An Enhanced Performance of K-Nearest Neighbor (K-NN) Classifier to Meet New Big Data Necessities

IOP Conference Series: Materials Science and Engineering

10.1088/1757-899x/928/3/032013

2020

Vol 928

pp. 032013

Author(s): Ihab L. Hussein Alsammak, Humam M. Abdul Sahib, Wasan H. Itwee

Keyword(s): Big Data, Nearest Neighbor, K Nearest Neighbor

Precision Pig Farming Image Analysis Using Random Forest and Boruta Predictive Big Data Analysis Using Neural Network and K- Nearest Neighbor

2021 2nd International Conference on Intelligent Engineering and Management (ICIEM)

10.1109/iciem51511.2021.9445328

2021

Author(s): S. A. Shaik Mazhar, G. Suseendran

Keyword(s): Neural Network, Image Analysis, Big Data, Data Analysis, Random Forest, Nearest Neighbor, Big Data Analysis, K Nearest Neighbor, Pig Farming

A Reformed K-Nearest Neighbors Algorithm for Big Data Sets

Journal of Computer Science

10.3844/jcssp.2018.1213.1225

2018

Vol 14 (9)

pp. 1213-1225

Cited By ~ 2

Author(s): Vo Ngoc Phu, Vo Thi Ngoc Tran

Keyword(s): Big Data, Nearest Neighbors, Data Sets, K Nearest Neighbors

Determination of Reactivity Ratios from Binary Copolymerization Using the k-Nearest Neighbor Non-Parametric Regression

Polymers

10.3390/polym13213811

2021

Vol 13 (21)

pp. 3811

Author(s): Iosif Sorin Fazakas-Anca, Arina Modrea, Sorin Vlase

Keyword(s): Experimental Data, Nearest Neighbor, Optimization Method, Reactivity Ratios, Data Sets, K Nearest Neighbor, Integration Algorithm, Data Set, Parametric Regression, Non Parametric

This paper proposes a new method for calculating monomer reactivity ratios for binary copolymerization based on the terminal model. The original optimization method combines a numerical integration algorithm with an optimization algorithm based on k-nearest neighbour non-parametric regression. The method has been tested on simulated and experimental data sets at low (<10%), medium (10–35%), and high (>40%) conversions, yielding reactivity ratios in good agreement with the usual methods, such as intersection, Fineman–Ross, reverse Fineman–Ross, Kelen–Tüdös, extended Kelen–Tüdös, and the error-in-variables method. The experimental data sets used in this comparative analysis are the copolymerization of 2-(N-phthalimido) ethyl acrylate with 1-vinyl-2-pyrrolidone for low conversion, of isoprene with glycidyl methacrylate for medium conversion, and of N-isopropylacrylamide with N,N-dimethylacrylamide for high conversion. The paper also shows that experimental errors can be estimated from a single experimental data set of n data points.
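The regression primitive named here is simple: predict the response at a new input as the average response of the k closest training inputs, with no parametric model assumed. The sketch below shows only that primitive for a single scalar predictor; how the paper couples it with numerical integration and the optimization of reactivity ratios is not reproduced, and `knn_regress` is a hypothetical name.

```python
def knn_regress(x, X, y, k):
    """k-nearest-neighbour non-parametric regression for a scalar predictor.

    Returns the mean of y over the k training inputs in X closest to x.
    Assumes 1 <= k <= len(X).
    """
    order = sorted(range(len(X)), key=lambda i: abs(X[i] - x))
    return sum(y[i] for i in order[:k]) / k
```

Small k follows the data closely but is sensitive to noise, while large k smooths more and biases the estimate, so k is usually tuned on held-out data.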