Testing Performance (Time Analysis) of Nearest Neighbour (NN) Search Algorithms on K-d Trees

A k-d tree (k-dimensional tree), also known as a multidimensional binary search tree, is a space-partitioning data structure for organizing points in a k-dimensional space. It is useful for several applications involving a multidimensional search key (e.g., range search and nearest neighbour search), and k-d trees are a special case of binary space partitioning trees. KNN search on a k-d tree is a searching algorithm with complexity O(N log N), where N is the number of data points. It performs better than brute-force search, whose complexity is O(N*k), where k is the number of neighbours searched, provided N >> 2^D, where D is the dimensionality of the tree. Furthermore, parallel KNN search is more efficient still, as it harnesses the parallel processing capabilities of computers and thus achieves better search times. This paper tests the time performance of KNN search and parallel KNN search and compares them by plotting the results on a 3D graph. A more detailed comparison uses 2D graphs for each dimension (from 2 to 20).
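One way to observe the gap between the two searches in practice is a small timing harness; the sketch below assumes SciPy's cKDTree and NumPy, and its dataset size, dimensionality, and neighbour count are illustrative rather than the paper's benchmark configuration.

# Minimal timing sketch (not the paper's benchmark code), assuming SciPy and NumPy.
import time
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
N, D, k = 100_000, 3, 5                       # illustrative sizes: N points, D dimensions, k neighbours
points = rng.random((N, D))
queries = rng.random((100, D))

# k-d tree search: build the tree once, then answer all queries.
tree = cKDTree(points)
t0 = time.perf_counter()
_ = tree.query(queries, k=k)
kd_time = time.perf_counter() - t0

# Brute-force search: compute the distance to every point for each query.
t0 = time.perf_counter()
for q in queries:
    dist = np.linalg.norm(points - q, axis=1)
    _ = np.argpartition(dist, k)[:k]
brute_time = time.perf_counter() - t0

print(f"k-d tree: {kd_time:.4f} s   brute force: {brute_time:.4f} s")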

2020 ◽  
pp. 000370282097751
Author(s):  
Xin Wang ◽  
Xia Chen

Many spectra have a polynomial-like baseline. Iterative polynomial fitting (IPF) is one of the most popular methods for baseline correction of these spectra. However, the baseline estimated by IPF may have substantial error when the spectrum contains significantly strong peaks or has strong peaks located at the endpoints. First, IPF uses a temporary baseline estimated from the current spectrum to identify peak data points. If the current spectrum contains strong peaks, the temporary baseline deviates substantially from the true baseline; good baseline data points of the spectrum might be mistakenly identified as peak data points and artificially re-assigned a low value. Second, if a strong peak is located at an endpoint of the spectrum, the endpoint region of the estimated baseline might have significant error due to overfitting. This study proposes a search algorithm-based baseline correction method (SA) that compresses the raw spectrum into a dataset with a small number of data points and then converts the peak removal process into a search problem, in the artificial intelligence (AI) sense, of minimizing an objective function by deleting peak data points. First, the raw spectrum is smoothed by the moving-average method to reduce noise and then divided into dozens of unequally spaced sections based on Chebyshev nodes. The minimum point of each section is then collected to form the dataset on which peak removal is performed by the search algorithm. SA selects the mean absolute error (MAE) as the objective function because of its sensitivity to overfitting and its rapid calculation. The baseline correction performance of SA is compared with that of three other baseline correction methods: the Lieber and Mahadevan–Jansen method, the adaptive iteratively reweighted penalized least squares method, and the improved asymmetric least squares method. Simulated and real FTIR and Raman spectra with polynomial-like baselines are employed in the experiments. Results show that for these spectra, the baseline estimated by SA has smaller error than those estimated by the three other methods.
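A minimal sketch of the compression stage described above (moving-average smoothing, Chebyshev-node sectioning, and collection of section minima), assuming NumPy; the function name, window size, and section count are illustrative and this is not the authors' implementation.

# Sketch of the compression step: smooth, cut into Chebyshev-node sections, keep section minima.
import numpy as np

def compress_spectrum(x, y, n_sections=40, window=9):
    # Moving-average smoothing to reduce noise.
    kernel = np.ones(window) / window
    y_smooth = np.convolve(y, kernel, mode="same")

    # Section boundaries derived from Chebyshev nodes on [x.min(), x.max()]
    # (nodes are denser near the endpoints, which helps constrain the baseline there).
    k = np.arange(1, n_sections)
    a, b = x.min(), x.max()
    nodes = (a + b) / 2 + (b - a) / 2 * np.cos((2 * k - 1) * np.pi / (2 * (n_sections - 1)))
    edges = np.concatenate(([a], np.sort(nodes), [b]))

    # Keep the minimum point of each section as a candidate baseline point.
    xs, ys = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (x >= lo) & (x <= hi)
        if mask.any():
            i = np.argmin(y_smooth[mask])
            xs.append(x[mask][i])
            ys.append(y_smooth[mask][i])
    return np.array(xs), np.array(ys)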


2021 ◽  
Author(s):  
Danila Piatov ◽  
Sven Helmer ◽  
Anton Dignös ◽  
Fabio Persia

We develop a family of efficient plane-sweeping interval join algorithms for evaluating a wide range of interval predicates such as Allen’s relationships and parameterized relationships. Our technique is based on a framework, components of which can be flexibly combined in different manners to support the required interval relation. In temporal databases, our algorithms can exploit a well-known and flexible access method, the Timeline Index, thus expanding the set of operations it supports even further. Additionally, employing a compact data structure, the gapless hash map, we utilize the CPU cache efficiently. In an experimental evaluation, we show that our approach is several times faster and scales better than state-of-the-art techniques, while being much better suited for real-time event processing.
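For illustration, a generic plane-sweep overlap join in Python follows; this sketch only shows the basic sweep idea (sorted endpoint events and sets of currently open intervals) and does not use the Timeline Index or the gapless hash map from the paper.

# Generic plane-sweep sketch of an interval overlap join (illustration only).
def overlap_join(r, s):
    """r, s: lists of (id, start, end) with start < end. Yields overlapping (r, s) pairs."""
    # Build one sorted stream of endpoint events: (time, is_end, relation, record).
    events = []
    for rel, rows in (("r", r), ("s", s)):
        for rec in rows:
            events.append((rec[1], 0, rel, rec))   # start event
            events.append((rec[2], 1, rel, rec))   # end event
    events.sort()

    active = {"r": set(), "s": set()}              # intervals currently open on each side
    for _, is_end, rel, rec in events:
        if is_end:
            active[rel].discard(rec)
        else:
            other = "s" if rel == "r" else "r"
            for o in active[other]:                # every open interval of the other side overlaps rec
                yield (rec, o) if rel == "r" else (o, rec)
            active[rel].add(rec)

print(list(overlap_join([("a", 1, 5), ("b", 6, 9)], [("x", 4, 7)])))
# [(('a', 1, 5), ('x', 4, 7)), (('b', 6, 9), ('x', 4, 7))]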


Sensors ◽  
2019 ◽  
Vol 19 (20) ◽  
pp. 4454 ◽  
Author(s):  
Marek Piorecky ◽  
Vlastimil Koudelka ◽  
Jan Strobl ◽  
Martin Brunovsky ◽  
Vladimir Krajca

Simultaneous recordings of electroencephalogram (EEG) and functional magnetic resonance imaging (fMRI) are at the forefront of technologies of interest to physicians and scientists because they combine the benefits of both modalities: the better time resolution of EEG (hdEEG) and the better spatial resolution of fMRI. However, EEG measured in the scanner is contaminated by the electromagnetic field induced in the leads by gradient switching, slight head movements, and vibrations, and it is corrupted by changes in the measured potential due to the Hall phenomenon. The aim of this study is to design and test a methodology for inspecting hidden EEG structures with respect to artifacts. We propose a top-down strategy to obtain additional information that is not visible in a single recording. A time-domain independent component analysis (ICA) algorithm was employed to obtain independent components and spatial weights. A nonlinear dimension reduction technique, t-distributed stochastic neighbor embedding (t-SNE), was used to create a low-dimensional space, which was then partitioned using density-based spatial clustering of applications with noise (DBSCAN). The relationships between the found data structure and the criteria used were investigated. As a result, we were able to extract information from the data structure regarding electrooculographic, electrocardiographic, electromyographic, and gradient artifacts. This new methodology could facilitate the identification of artifacts and their residues in simultaneous EEG-fMRI recordings.
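A minimal sketch of the processing chain described above (ICA, then t-SNE, then DBSCAN), assuming scikit-learn; the placeholder data, component counts, and clustering parameters are illustrative and are not the authors' settings or pipeline.

# Sketch: ICA -> t-SNE embedding of component weights -> DBSCAN partitioning.
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.manifold import TSNE
from sklearn.cluster import DBSCAN

# eeg: array of shape (n_samples, n_channels); random placeholder data here.
eeg = np.random.randn(5000, 32)

ica = FastICA(n_components=20, random_state=0)
sources = ica.fit_transform(eeg)           # independent component time courses
weights = ica.mixing_                      # spatial weights (channels x components)

# Embed each component's feature vector (here: its spatial weights) in 2-D with t-SNE.
embedding = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(weights.T)

# Partition the low-dimensional space with DBSCAN; label -1 marks noise points.
labels = DBSCAN(eps=3.0, min_samples=2).fit_predict(embedding)
print(labels)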


2000 ◽  
Vol 36 (21) ◽  
pp. 1821 ◽  
Author(s):  
SeongJoon Baek ◽  
Koeng-Mo Sung

2004 ◽  
Vol 4 (3) ◽  
pp. 201-206
Author(s):  
L. Grover ◽  
T. Rudolph

Quantum search is a technique for searching $N$ possibilities for a desired target in $O(\sqrt{N})$ steps. It has been applied in the design of quantum algorithms for several structured problems. Many of these algorithms require a significant amount of quantum hardware. In this paper we propose the criterion that an algorithm which requires $O(S)$ hardware should be considered significant if it produces a speedup of better than $O\left(\sqrt{S}\right)$ over a simple quantum search algorithm. This is because a speedup of $O\left(\sqrt{S}\right)$ can be trivially obtained by dividing the search space into $S$ separate parts and handing the problem to $S$ independent processors that do a quantum search (in this paper we drop all logarithmic factors when discussing time/space complexity). Known algorithms for collision and element distinctness exactly saturate the criterion.
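For concreteness, the trivial speedup argument can be written out (logarithmic factors dropped). A single quantum search over $N$ items takes $T_1$ steps, while $S$ independent processors, each searching $N/S$ items, take $T_S$ steps:

$$T_1 = O\big(\sqrt{N}\big), \qquad T_S = O\big(\sqrt{N/S}\big) = O\big(\sqrt{N}/\sqrt{S}\big), \qquad \frac{T_1}{T_S} = O\big(\sqrt{S}\big).$$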


2021 ◽  
Author(s):  
ZEGOUR Djamel Eddine

Today, Red-Black trees are a popular data structure, typically used to implement dictionaries, associative arrays, and symbol tables within some compilers (C++, Java …) and many other systems. In this paper, we present an improvement of the delete algorithm for this kind of binary search tree. The proposed algorithm is very promising since it colors the tree differently while reducing color changes by about 29%. Moreover, the maintenance operations that re-establish the Red-Black tree balance properties are reduced by about 11%. As a consequence, the proposed algorithm saves about 4% of running time when insert and delete operations are used together, while preserving the search performance of the standard algorithm.


2017 ◽  
Vol 33 (3) ◽  
pp. 233-236 ◽  
Author(s):  
Kevin D. Dames ◽  
Jeremy D. Smith ◽  
Gary D. Heise

Gait data are commonly presented as an average of many trials or as an average across participants. Discrete data points (e.g., maxima or minima) are identified and used as dependent variables in subsequent statistical analyses. However, the approach used for obtaining average data from multiple trials is inconsistent and unclear in the biomechanics literature. This study compared the statistical outcomes of averaging peaks from multiple trials versus identifying a single peak from an average profile. A series of paired-samples t tests was used to determine whether the average dependent variables differed between these 2 methods. Identifying a peak value from the average profile resulted in significantly smaller magnitudes of the dependent variables than averaging the peaks from multiple trials. The disagreement between the 2 methods was due to temporal differences in the locations of the trial peaks. Sine curves generated in MATLAB confirmed this misrepresentation of trial peaks in the average profile when a phase shift was introduced. Based on these results, averaging individual trial peaks represents the actual data better than choosing a peak from an average trial profile.
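A minimal NumPy sketch of this effect with phase-shifted sine "trials" (the offsets are illustrative): each individual trial peaks at 1, but the peak of the averaged profile is smaller because the peaks are misaligned in time.

# Mean of per-trial peaks vs. peak of the mean profile for phase-shifted sines.
import numpy as np

t = np.linspace(0, 1, 1000)
phase_shifts = [0.00, 0.05, 0.10]                    # illustrative temporal offsets (fractions of a cycle)
trials = np.array([np.sin(2 * np.pi * (t - p)) for p in phase_shifts])

mean_of_peaks = trials.max(axis=1).mean()            # average the per-trial maxima
peak_of_mean = trials.mean(axis=0).max()             # maximum of the averaged profile

print(mean_of_peaks)   # approximately 1.0
print(peak_of_mean)    # less than 1.0 because the peaks are misaligned in time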


2013 ◽  
Vol 11 ◽  
pp. 25-36
Author(s):  
Eva Stopková

This proceeding deals with the development and testing of a module for GRASS GIS [1] based on Nearest Neighbour Analysis. The method can be useful for assessing whether points located in an area of interest are distributed randomly, in clusters, or separately. Its main principle consists of comparing the observed average distance between nearest neighbours, r_A, to the average distance between nearest neighbours, r_E, that is expected for randomly distributed points; the result should be statistically tested. The method differs for two- and three-dimensional space in how r_E is computed. The proceeding also describes an extension of the mathematical background that derives the standard deviation of r_E, which is needed in the statistical test of the analysis result. As the disposition of the phenomena (e.g., the distribution of birds' nests or plant species) and the test results suggest, an anisotropic function would represent relationships between points in three-dimensional space better than the isotropic function used in this work.
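A minimal sketch of the two-dimensional statistic (the Clark and Evans ratio r_A / r_E), assuming SciPy and NumPy; this is not the GRASS GIS module itself, and the three-dimensional expectation r_E differs, as noted above.

# Nearest Neighbour Analysis in 2-D: observed vs. expected mean nearest-neighbour distance.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
area = 100.0 * 100.0
points = rng.random((500, 2)) * 100.0                # illustrative point pattern in a 100 x 100 window

# Observed mean nearest-neighbour distance r_A (k=2: the first hit is the point itself).
d, _ = cKDTree(points).query(points, k=2)
r_A = d[:, 1].mean()

# Expected mean nearest-neighbour distance r_E for a random (Poisson) pattern in 2-D.
density = len(points) / area
r_E = 1.0 / (2.0 * np.sqrt(density))

R = r_A / r_E          # R ~ 1: random, R < 1: clustered, R > 1: dispersed
print(R)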


Author(s):  
Ben Gorte ◽  
Sisi Zlatanova

The paper presents a very straightforward and effective algorithm to convert a space partitioning, made up of polyhedral objects, into a 3D block of voxels that is fully occupied, i.e., in which every voxel has a value. In addition to walls, floors, etc., there are 'air' voxels, which in turn may be distinguished as indoor and outdoor air. The method is a 3D extension of a 2D polygon-to-raster conversion algorithm. The input of the algorithm is a set of non-overlapping, closed polyhedra, which can be nested or touching. The air volume is not necessarily represented explicitly as a polyhedron (it can be treated as 'background', leading to the 'default' voxel value). The approach consists of two stages, the first being object (boundary) based and the second scan-line based. In addition to planar faces, other primitives, such as ellipsoids, can be accommodated in the first stage without affecting the second.
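A minimal two-dimensional scan-line sketch of the polygon-to-raster conversion that the paper extends to 3D, assuming NumPy; it rasterizes a single simple polygon with an even-odd fill rule and only illustrates the idea behind the scan-line-based stage.

# 2-D scan-line polygon-to-raster sketch (even-odd fill rule, pixel-centre sampling).
import numpy as np

def rasterize(polygon, width, height):
    """polygon: list of (x, y) vertices in raster coordinates."""
    grid = np.zeros((height, width), dtype=np.uint8)
    n = len(polygon)
    for row in range(height):
        y = row + 0.5                                 # sample the scan line at pixel centres
        xs = []
        for i in range(n):                            # intersect the scan line with every edge
            (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
            if (y1 <= y < y2) or (y2 <= y < y1):      # half-open test avoids double-counting vertices
                xs.append(x1 + (y - y1) * (x2 - x1) / (y2 - y1))
        xs.sort()
        for x_in, x_out in zip(xs[0::2], xs[1::2]):   # even-odd rule: fill between intersection pairs
            lo = max(0, int(np.ceil(x_in - 0.5)))
            hi = min(width, int(np.floor(x_out - 0.5)) + 1)
            grid[row, lo:hi] = 1
    return grid

print(rasterize([(1, 1), (8, 2), (5, 8)], 10, 10))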


Author(s):  
Ping Deng ◽  
Qingkai Ma ◽  
Weili Wu

Clustering can be considered the most important unsupervised learning problem. It has been discussed thoroughly by both the statistics and database communities due to its numerous applications in problems such as classification, machine learning, and data mining. A summary of clustering techniques can be found in (Berkhin, 2002). Most known clustering algorithms, such as DBSCAN (Ester, Kriegel, Sander, & Xu, 1996) and CURE (Guha, Rastogi, & Shim, 1998), cluster data points based on full dimensions. As the dimensionality of the space grows, these algorithms lose their efficiency and accuracy because of the so-called "curse of dimensionality". It is shown in (Beyer, Goldstein, Ramakrishnan, & Shaft, 1999) that computing distances based on full dimensions is not meaningful in high-dimensional space, since the distance of a point to its nearest neighbor approaches the distance to its farthest neighbor as dimensionality increases. In fact, natural clusters might exist in subspaces, and data points in different clusters may be correlated with respect to different subsets of dimensions. To address this problem, feature selection (Kohavi & Sommerfield, 1995) and dimension reduction (Raymer, Punch, Goodman, Kuhn, & Jain, 2000) have been proposed to find the closely correlated dimensions for all the data and the clusters in those dimensions. Although both methods reduce the dimensionality of the space before clustering, they do not handle well the case where clusters exist in different subspaces of the full dimensions. Projected clustering has been proposed recently to deal effectively with high dimensionality. The objectives of projected clustering algorithms are to find the clusters and their relevant dimensions. Instead of projecting the entire dataset onto the same subspace, projected clustering finds a specific projection for each cluster such that the similarity is preserved as much as possible.
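A minimal NumPy sketch of the distance-concentration effect cited above: as the dimensionality grows, the ratio between the nearest and farthest distances from a query point approaches 1, so full-dimensional distances lose their discriminative power.

# Ratio of nearest to farthest distance from a random query point, by dimensionality.
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    points = rng.random((1000, d))
    query = rng.random(d)
    dist = np.linalg.norm(points - query, axis=1)
    print(d, dist.min() / dist.max())      # ratio approaches 1 as d increases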

