A Practical Index Structure Supporting Fréchet Proximity Queries among Trajectories

We present a scalable approach for range and k-nearest-neighbor queries under computationally expensive metrics, such as the continuous Fréchet distance on trajectory data. Based on clustering for metric indexes, we obtain a dynamic tree structure whose size is linear in the number of trajectories, regardless of the trajectories’ individual sizes or the spatial dimension, which allows one to exploit low “intrinsic dimensionality” of datasets for effective search space pruning. Since the distance computation is expensive, generic metric indexing methods are rendered impractical. We present strategies that (i) improve on known upper and lower bound computations, (ii) build cluster trees with few or no distance calls, and (iii) search using bounds for metric pruning, interval orderings for reduction, and randomized pivoting for reporting the final results. We analyze the efficiency and effectiveness of our methods with extensive experiments on diverse synthetic and real-world datasets. The results show improvement over state-of-the-art methods for exact queries, and even further speedups are achieved for queries that may return approximate results. Surprisingly, the majority of exact nearest-neighbor queries on real datasets are answered without any distance computations.
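As a rough illustration of bound-based pruning for a range query, the sketch below relies on two standard facts rather than on the improved bounds developed in the paper: the larger of the start-point and end-point distances never exceeds the continuous Fréchet distance, and the discrete Fréchet distance never falls below it. The function names and the three outcome labels are assumptions made for this example.

```python
import math

def endpoint_lower_bound(p, q):
    """Lower bound: start points and end points must be matched to each other."""
    return max(math.dist(p[0], q[0]), math.dist(p[-1], q[-1]))

def discrete_frechet(p, q):
    """Upper bound on the continuous Fréchet distance, via dynamic programming."""
    n, m = len(p), len(q)
    dp = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                dp[i][j] = d
            elif i == 0:
                dp[i][j] = max(dp[0][j - 1], d)
            elif j == 0:
                dp[i][j] = max(dp[i - 1][0], d)
            else:
                dp[i][j] = max(min(dp[i - 1][j], dp[i - 1][j - 1], dp[i][j - 1]), d)
    return dp[-1][-1]

def classify_for_range_query(query, trajectory, radius):
    """Decide membership from bounds alone where possible."""
    if endpoint_lower_bound(query, trajectory) > radius:
        return "reject"      # true distance is certainly larger than radius
    if discrete_frechet(query, trajectory) <= radius:
        return "accept"      # true distance is certainly at most radius
    return "undecided"       # an exact distance computation would be needed
```

Only the "undecided" trajectories would require the expensive exact distance computation, which is the general idea behind answering many queries from bounds alone.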

Pivot-based Maximal Biclique Enumeration

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/492 ◽

2020 ◽

Author(s):

Aman Abidi ◽

Rui Zhou ◽

Lu Chen ◽

Chengfei Liu

Keyword(s):

Real World ◽

Search Space ◽

Index Structure ◽

Performance Study ◽

Clique Enumeration ◽

Wide Range ◽

Real World Applications ◽

Real World Datasets ◽

Search Space Pruning ◽

Extensive Performance

Enumerating maximal bicliques in a bipartite graph is an important problem in data mining, with many real-world applications across domains such as web communities and bioinformatics. Although substantial research has been conducted on this problem, we find, surprisingly, that pivot-based search space pruning, which is quite effective in clique enumeration, has not been exploited in the biclique setting. In this paper, we therefore explore pivot-based pruning for biclique enumeration. We propose an algorithm that implements pivot-based pruning, powered by an effective index structure, the Containment Directed Acyclic Graph (CDAG). Meanwhile, the existing literature reports contradictory findings on the order of vertex selection in biclique enumeration. We therefore re-examine the problem and suggest an offline ordering of vertices that expedites the pivot pruning. We conduct an extensive performance study using real-world datasets from a wide range of domains. The experimental results demonstrate that our algorithm is more scalable than the existing algorithms, outperforms all of them across all datasets, and can achieve a significant speedup over them.
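For readers unfamiliar with pivot-based pruning in clique enumeration, the sketch below shows the classic Bron-Kerbosch algorithm with pivoting on an undirected graph. It is the clique version that the abstract refers to as prior art, not the biclique algorithm or the CDAG index proposed in the paper; the function name and the adjacency-dictionary input format are assumptions.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques of an undirected graph.

    adj maps each vertex to the set of its neighbours.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already processed vertices
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pick the pivot with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda u: len(p & adj[u]))
        # Neighbours of the pivot are skipped here: any maximal clique
        # containing one of them is found via the pivot or a non-neighbour.
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques
```

Skipping the pivot's neighbours is what prunes the search space: without a pivot, every candidate vertex would start its own recursive branch.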

Search Space Reductions for Nearest-Neighbor Queries

Lecture Notes in Computer Science - Theory and Applications of Models of Computation ◽

10.1007/978-3-540-79228-4_48 ◽

2008 ◽

pp. 554-567 ◽

Cited By ~ 2

Author(s):

Micah Adler ◽

Brent Heeringa

Keyword(s):

Nearest Neighbor ◽

Search Space ◽

Nearest Neighbor Queries

SPY-TEC+ : AN INTEGRATED INDEX STRUCTURE FOR k-NEAREST NEIGHBOR QUERIES WITH SEMANTIC PREDICATES IN MULTIMEDIA DATABASE

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194011005529 ◽

2011 ◽

Vol 21 (07) ◽

pp. 989-1011

Author(s):

DONG-JOO PARK ◽

DONG-HO LEE

Keyword(s):

Visual Information ◽

Nearest Neighbor ◽

High Dimensional Data ◽

Multimedia Retrieval ◽

Index Structure ◽

High Dimensional ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

Integrated Index ◽

Nearest Neighbor Queries

Advanced multimedia applications, such as geographic information systems and content-based multimedia retrieval systems, require efficient processing of k-nearest neighbor queries over large collections of multimedia objects. These queries usually include semantic information represented by text as well as visual information represented by a high-dimensional feature vector. Among the available techniques for processing such queries, the incremental nearest neighbor algorithm proposed by Hjaltason and Samet is known as the best choice. However, the R-tree used in their algorithm has no facility for partially pruning the candidate tuples that will turn out not to satisfy the semantic predicate. In addition, the R-tree does not perform sufficiently well on high-dimensional data, even though it provides good results on low- or medium-dimensional data. These drawbacks may lead to poor query processing performance. In this paper, we propose an integrated index structure, called SPY-TEC+, that efficiently indexes the visual and semantic features at the same time by combining the SPY-TEC, which was proposed for indexing high-dimensional data, with a signature file. We also propose an efficient incremental nearest neighbor algorithm for processing k-nearest neighbor queries with visual and semantic predicates on the SPY-TEC+. Finally, we show through various experiments that the SPY-TEC+ improves on the performance of the SPY-TEC for processing k-nearest neighbor queries with visual and semantic predicates.

Interval-Based Nearest Neighbor Queries over Sliding Windows from Trajectory Data

2009 Tenth International Conference on Mobile Data Management: Systems, Services and Middleware ◽

10.1109/mdm.2009.32 ◽

2009 ◽

Cited By ~ 1

Author(s):

Yan Huang ◽

Chengyang Zhang

Keyword(s):

Nearest Neighbor ◽

Trajectory Data ◽

Sliding Windows ◽

Nearest Neighbor Queries

How Good Are Modern Spatial Libraries?

Data Science and Engineering ◽

10.1007/s41019-020-00147-9 ◽

2020 ◽

Author(s):

Varun Pandey ◽

Alexander van Renen ◽

Andreas Kipf ◽

Alfons Kemper

Keyword(s):

Spatial Data ◽

High Performance ◽

Nearest Neighbor ◽

Vantage Point ◽

Nearest Neighbor Search ◽

Index Structure ◽

Spatial Query ◽

Neighbor Search ◽

Spatial Data Management ◽

Real World Datasets

Many applications today, such as Uber, Yelp, and Tinder, rely on spatial data or locations from their users. These applications and services either build their own spatial data management systems or rely on existing solutions. JTS Topology Suite (JTS), its C++ port GEOS, Google S2, ESRI Geometry API, and Java Spatial Index (JSI) are some of the spatial processing libraries that these systems build upon. These applications and services depend on the indexing capabilities available in these libraries for high-performance spatial query processing. In this work, we compare these libraries qualitatively and quantitatively based on four different spatial queries using two real-world datasets. We also compare these libraries with an open-source implementation of the Vantage Point Tree, an index structure that has been well studied in image retrieval and nearest-neighbor search algorithms for high-dimensional data. We found that Vantage Point Trees are very competitive and even outperform the aforementioned libraries in two queries.
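To make the Vantage Point Tree idea concrete, here is a minimal sketch of construction and exact nearest-neighbor search over points in Euclidean space. It is not the open-source implementation evaluated in the paper; the class and function names, the random choice of vantage point, and the median split are assumptions of this example.

```python
import math
import random

class VPNode:
    def __init__(self, point, radius, inside, outside):
        self.point = point      # vantage point stored at this node
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # subtree of points closer than radius
        self.outside = outside  # subtree of points at least radius away

def build_vp_tree(points, rng):
    points = list(points)
    if not points:
        return None
    vantage = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vantage, 0.0, None, None)
    dists = [math.dist(vantage, p) for p in points]
    radius = sorted(dists)[len(dists) // 2]
    inside = [p for p, d in zip(points, dists) if d < radius]
    outside = [p for p, d in zip(points, dists) if d >= radius]
    return VPNode(vantage, radius,
                  build_vp_tree(inside, rng), build_vp_tree(outside, rng))

def vp_nearest(node, query, best=(math.inf, None)):
    """Return (distance, point) of the nearest stored point to query."""
    if node is None:
        return best
    d = math.dist(query, node.point)
    if d < best[0]:
        best = (d, node.point)
    if d < node.radius:
        best = vp_nearest(node.inside, query, best)
        # Triangle inequality: outside points are at least radius - d away.
        if d + best[0] > node.radius:
            best = vp_nearest(node.outside, query, best)
    else:
        best = vp_nearest(node.outside, query, best)
        # Triangle inequality: inside points are more than d - radius away.
        if d - best[0] < node.radius:
            best = vp_nearest(node.inside, query, best)
    return best
```

The tree only ever calls the distance function, so the same structure works for any metric, which is one reason it is popular for high-dimensional and non-vector data.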

An index structure for efficient reverse nearest neighbor queries

Proceedings 17th International Conference on Data Engineering ◽

10.1109/icde.2001.914862 ◽

2002 ◽

Cited By ~ 35

Author(s):

Congyun Yang ◽

King-Ip Lin

Keyword(s):

Nearest Neighbor ◽

Index Structure ◽

Reverse Nearest Neighbor ◽

Nearest Neighbor Queries

Learning K-way D-dimensional Discrete Embedding for Hierarchical Data Visualization and Retrieval

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/411 ◽

2019 ◽

Author(s):

Xiaoyuan Liang ◽

Martin Renqiang Min ◽

Hongyu Guo ◽

Guiling Wang

Keyword(s):

Data Visualization ◽

Real World ◽

Nearest Neighbor ◽

Search Space ◽

High Accuracy ◽

Semantic Structure ◽

Hierarchical Data ◽

Data Points ◽

Real World Datasets ◽

Data Objects

Traditional embedding approaches associate a real-valued embedding vector with each symbol or data point, which is equivalent to applying a linear transformation to the "one-hot" encoding of discrete symbols or data objects. Despite their simplicity, these methods generate storage-inefficient representations and fail to effectively encode the internal semantic structure of data, especially when the number of symbols or data points and the dimensionality of the real-valued embedding vectors are large. In this paper, we propose a regularized autoencoder framework to learn a compact Hierarchical K-way D-dimensional (HKD) discrete embedding of symbols or data points, aiming to capture the essential semantic structures of data. Experimental results on synthetic and real-world datasets show that our proposed HKD embedding can effectively reveal the semantic structure of data via hierarchical data visualization and greatly reduce the search space of nearest neighbor retrieval while preserving high accuracy.
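The equivalence mentioned in the first sentence can be checked with a few lines of code: multiplying a one-hot vector by an embedding matrix selects exactly one row of that matrix. The matrix values and helper names below are made up for illustration and do not come from the paper.

```python
def one_hot(index, size):
    """Vector of zeros with a single 1.0 at the given index."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def vector_matrix_product(vector, matrix):
    """Row vector times matrix, with the matrix given as a list of rows."""
    columns = len(matrix[0])
    return [sum(v * row[c] for v, row in zip(vector, matrix))
            for c in range(columns)]

# Hypothetical embedding table: 3 symbols, 2 real-valued dimensions each.
embedding = [
    [0.1, -0.2],
    [0.7, 0.3],
    [-0.5, 0.9],
]

symbol = 1
looked_up = embedding[symbol]                                  # table lookup
projected = vector_matrix_product(one_hot(symbol, 3), embedding)  # linear map
```

Both `looked_up` and `projected` hold the same vector, so the table needs one real-valued row per symbol, which is the storage cost the abstract describes as inefficient for large vocabularies.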

Overview of handcrafted features and deep learning models for leaf recognition

Journal of Engineering Research ◽

10.36909/jer.v9i1.7737 ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

Şahin Işık ◽

Kemal Özkan

Keyword(s):

Deep Learning ◽

Nearest Neighbor ◽

Search Space ◽

Local Features ◽

Automated System ◽

K Nearest Neighbor ◽

Target Class ◽

Leaf Sample ◽

Global Features ◽

Efficiency And Effectiveness

In this study, we present an automated system for classifying leaf species based on global and local features, centered on a smart and unorthodox decision system. The 11 global features fall into two categories: gross shape features (7) and moment-based features (4). For the local features, only the curve points on Bézier curves are used as discriminative features. To reduce the search space and improve the performance of the system, the class label of a leaf object is first determined from the global features with respect to predefined threshold values. Once the target class is determined, the local features are used to validate the label of the leaf sample. In experiments with the K-Nearest Neighbor (K-NN) classifier and the Hausdorff distance, the system achieves an accuracy of 96.78% on the Flavia dataset and 94.66% on the Swedish dataset. Moreover, by applying a deep learning model, namely the Inception-v3 architecture, superior results of 99.11% and 98.95% were recorded when compared to state-of-the-art methods. One can therefore choose between our feature extraction and classification technique and the Inception-v3 model depending on the desired trade-off between efficiency and effectiveness.
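The combination of K-NN and the Hausdorff distance can be sketched as follows. This is only a toy illustration on small 2D point sets: the Bézier curve point extraction and the global-feature thresholding described in the abstract are not reproduced, and all function names and sample shapes are assumptions.

```python
import math
from collections import Counter

def directed_hausdorff(a, b):
    """Largest distance from a point of a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

def knn_classify(query, labeled_sets, k=1):
    """Majority label among the k point sets closest to query.

    labeled_sets is a list of (point_set, label) pairs.
    """
    ranked = sorted(labeled_sets, key=lambda item: hausdorff(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical reference shapes standing in for leaf contours.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
line = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
training = [(square, "square"), (line, "line")]

noisy_square = [(0.1, 0.0), (1.0, 0.1), (0.9, 1.0), (0.0, 0.9)]
predicted = knn_classify(noisy_square, training, k=1)
```

Because the Hausdorff distance compares whole point sets, the two shapes do not need the same number of points.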

Continuous K Nearest Neighbor Queries of Moving Objects in Road Networks

Chinese Journal of Computers ◽

10.3724/sp.j.1016.2010.01396 ◽

2010 ◽

Vol 33 (8) ◽

pp. 1396-1404 ◽

Cited By ~ 1

Author(s):

Liang ZHAO ◽

Luo CHEN ◽

Ning JING ◽

Wei LIAO

Keyword(s):

Moving Objects ◽

Nearest Neighbor ◽

Road Networks ◽

Nearest Neighbor Queries

Efficient Retrieval of Music Recordings Using Graph-Based Index Structures

Signals ◽

10.3390/signals2020021 ◽

2021 ◽

Vol 2 (2) ◽

pp. 336-352

Author(s):

Frank Zalkow ◽

Julian Brandner ◽

Meinard Müller

Keyword(s):

Nearest Neighbor ◽

Response Times ◽

Negative Impact ◽

Nearest Neighbor Search ◽

Index Structure ◽

Search Problem ◽

Index Structures ◽

Music Retrieval ◽

Retrieval Systems ◽

Music Recordings

Flexible retrieval systems are required for conveniently browsing through large music collections. In a particular content-based music retrieval scenario, the user provides a query audio snippet, and the retrieval system returns music recordings from the collection that are similar to the query. In this scenario, a fast response from the system is essential for a positive user experience. For realizing low response times, one requires index structures that facilitate efficient search operations. One such index structure is the K-d tree, which has already been used in music retrieval systems. As an alternative, we propose to use a modern graph-based index, denoted as Hierarchical Navigable Small World (HNSW) graph. As our main contribution, we explore its potential in the context of a cross-version music retrieval application. In particular, we report on systematic experiments comparing graph- and tree-based index structures in terms of the retrieval quality, disk space requirements, and runtimes. Despite the fact that the HNSW index provides only an approximate solution to the nearest neighbor search problem, we demonstrate that it has almost no negative impact on the retrieval quality in our application. As our main result, we show that the HNSW-based retrieval is several orders of magnitude faster. Furthermore, the graph structure also works well with high-dimensional index items, unlike the tree-based structure. Given these merits, we highlight the practical relevance of the HNSW graph for music information retrieval (MIR) applications.
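As a point of reference for the tree-based baseline, the following is a minimal sketch of a K-d tree with exact nearest-neighbor search. It does not reproduce the paper's retrieval pipeline or the HNSW graph; the dictionary-based node layout and the function names are assumptions of this example.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively split the points at the median along alternating axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return {
        "point": ordered[mid],
        "axis": axis,
        "left": build_kdtree(ordered[:mid], depth + 1),
        "right": build_kdtree(ordered[mid + 1:], depth + 1),
    }

def kd_nearest(node, query, best=None):
    """Return (distance, point) of the stored point nearest to query."""
    if node is None:
        return best
    d = math.dist(query, node["point"])
    if best is None or d < best[0]:
        best = (d, node["point"])
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = kd_nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the
    # best match found so far.
    if abs(diff) < best[0]:
        best = kd_nearest(far, query, best)
    return best
```

In high dimensions the splitting plane is almost always closer than the current best match, so the far side is rarely pruned; this is the usual explanation for why tree-based indexes degrade on high-dimensional items.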
