A Practical Index Structure Supporting Fréchet Proximity Queries among Trajectories

2021 ◽  
Vol 7 (3) ◽  
pp. 1-33
Author(s):  
Joachim Gudmundsson ◽  
Michael Horton ◽  
John Pfeifer ◽  
Martin P. Seybold

We present a scalable approach for range and k-nearest-neighbor queries under computationally expensive metrics, like the continuous Fréchet distance on trajectory data. Based on clustering for metric indexes, we obtain a dynamic tree structure whose size is linear in the number of trajectories, regardless of the individual trajectory sizes or the spatial dimension, which allows one to exploit the low “intrinsic dimensionality” of datasets for effective search space pruning. Since the distance computation is expensive, generic metric indexing methods are rendered impractical. We present strategies that (i) improve on known upper and lower bound computations, (ii) build cluster trees with few or no distance calls, and (iii) search using bounds for metric pruning, interval orderings for reduction, and randomized pivoting for reporting the final results. We analyze the efficiency and effectiveness of our methods with extensive experiments on diverse synthetic and real-world datasets. The results show improvement over state-of-the-art methods for exact queries, and even further speedups are achieved for queries that may return approximate results. Surprisingly, the majority of exact nearest-neighbor queries on real datasets are answered without any distance computations.
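
The idea of pruning with cheap bounds before paying for an expensive distance can be illustrated with a minimal sketch. It is not the index structure from the abstract: it swaps in the discrete Fréchet distance (the abstract concerns the continuous variant) and uses only the simplest known lower bound, the larger of the start-point and end-point distances. The function names and the linear scan are assumptions made for illustration.

```python
from math import dist, inf


def discrete_frechet(p, q):
    """Discrete Fréchet distance between point sequences p and q (row-wise DP)."""
    prev = [inf] * len(q)
    for i in range(len(p)):
        cur = [inf] * len(q)
        for j in range(len(q)):
            d = dist(p[i], q[j])
            if i == 0 and j == 0:
                cur[j] = d
            else:
                # Cheapest way to reach (i, j): advance on p, on q, or on both.
                best = min(
                    prev[j] if i > 0 else inf,
                    cur[j - 1] if j > 0 else inf,
                    prev[j - 1] if i > 0 and j > 0 else inf,
                )
                cur[j] = max(d, best)
        prev = cur
    return prev[-1]


def endpoint_lower_bound(p, q):
    """Start points and end points must be matched, so this never exceeds the true distance."""
    return max(dist(p[0], q[0]), dist(p[-1], q[-1]))


def range_query(trajectories, query, radius):
    """Return (indices within radius, number of full distance computations)."""
    matches, calls = [], 0
    for idx, traj in enumerate(trajectories):
        if endpoint_lower_bound(query, traj) > radius:
            continue  # pruned: the lower bound already exceeds the radius
        calls += 1
        if discrete_frechet(query, traj) <= radius:
            matches.append(idx)
    return matches, calls
```

Calling `range_query` on a collection in which most trajectories start or end far from the query shows the effect: the second return value counts how many candidates survived the bound and needed the quadratic-time computation.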

Author(s):  
Aman Abidi ◽  
Rui Zhou ◽  
Lu Chen ◽  
Chengfei Liu

Enumerating maximal bicliques in a bipartite graph is an important problem in data mining, with numerous real-world applications in domains such as web communities and bioinformatics. Although substantial research has been conducted on this problem, surprisingly, we find that pivot-based search space pruning, which is quite effective in clique enumeration, has not been exploited in the biclique scenario. Therefore, in this paper, we explore pivot-based pruning for biclique enumeration. We propose an algorithm for implementing pivot-based pruning, powered by an effective index structure, the Containment Directed Acyclic Graph (CDAG). Meanwhile, the existing literature reports contradictory findings on the order of vertex selection in biclique enumeration. As such, we re-examine the problem and suggest an offline ordering of vertices that expedites the pivot pruning. We conduct an extensive performance study using real-world datasets from a wide range of domains. The experimental results demonstrate that our algorithm is more scalable than all existing algorithms, outperforms them across all datasets, and can achieve a significant speedup over them.


Author(s):  
DONG-JOO PARK ◽  
DONG-HO LEE

Recently, advanced multimedia applications, such as geographic information systems and content-based multimedia retrieval systems, require the efficient processing of k-nearest neighbor queries over large collections of multimedia objects. These queries usually include semantic information that is represented by text, as well as visual information that is represented by a high-dimensional feature vector. Among the available techniques for processing such queries, the incremental nearest neighbor algorithm proposed by Hjaltason and Samet is known as the best choice. However, the R-tree used in their algorithm has no facility for partially pruning the candidate tuples that will turn out not to satisfy the semantic predicate. Also, the R-tree does not perform sufficiently well on high-dimensional data even though it provides good results on low- or middle-dimensional data. These drawbacks may lead to poor performance when processing the query. In this paper, we propose an integrated index structure, called SPY-TEC+, that provides an efficient method for indexing the visual and semantic features at the same time, using the SPY-TEC, which was proposed for indexing high-dimensional data, and the signature file. We also propose an efficient incremental nearest neighbor algorithm for processing k-nearest neighbor queries with visual and semantic predicates on the SPY-TEC+. Finally, we show through various experiments that the SPY-TEC+ enhances the performance of the SPY-TEC for processing k-nearest neighbor queries with visual and semantic predicates.


Author(s):  
Varun Pandey ◽  
Alexander van Renen ◽  
Andreas Kipf ◽  
Alfons Kemper

Many applications today, like Uber, Yelp, and Tinder, rely on spatial data or locations from their users. These applications and services either build their own spatial data management systems or rely on existing solutions. JTS Topology Suite (JTS), its C++ port GEOS, Google S2, ESRI Geometry API, and Java Spatial Index (JSI) are some of the spatial processing libraries that these systems build upon. These applications and services depend on the indexing capabilities available in these libraries for high-performance spatial query processing. In this work, we compare these libraries qualitatively and quantitatively based on four different spatial queries using two real-world datasets. We also compare these libraries with an open-source implementation of the Vantage Point Tree, an index structure that has been well studied in image retrieval and nearest-neighbor search algorithms for high-dimensional data. We found that Vantage Point Trees are very competitive and even outperform the aforementioned libraries in two queries.
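
For readers unfamiliar with the Vantage Point Tree, a minimal sketch may help. It is not the open-source implementation evaluated in the abstract: the class and function names are hypothetical, it uses Euclidean distance on plain coordinate tuples (real geographic data would likely call for a geodesic distance), and it answers only single nearest-neighbor queries.

```python
import random
from math import dist, inf


class VPNode:
    """A vantage point, a split radius, and the two subtrees it separates."""

    def __init__(self, point, radius, inside, outside):
        self.point = point
        self.radius = radius
        self.inside = inside    # points closer to the vantage point than radius
        self.outside = outside  # points at distance >= radius


def build_vp_tree(points, rng):
    """Recursively split points by their distance to a randomly chosen vantage point."""
    if not points:
        return None
    points = list(points)
    vantage = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vantage, 0.0, None, None)
    distances = sorted(dist(vantage, p) for p in points)
    radius = distances[len(distances) // 2]  # median distance as the split radius
    inside = [p for p in points if dist(vantage, p) < radius]
    outside = [p for p in points if dist(vantage, p) >= radius]
    return VPNode(vantage, radius, build_vp_tree(inside, rng), build_vp_tree(outside, rng))


def nearest(node, query, best=(inf, None)):
    """Return (distance, point) of the nearest neighbor, pruning via the triangle inequality."""
    if node is None:
        return best
    d = dist(query, node.point)
    if d < best[0]:
        best = (d, node.point)
    if d < node.radius:
        # Query lies inside the ball: search inside first, outside only if it can still help.
        best = nearest(node.inside, query, best)
        if d + best[0] >= node.radius:
            best = nearest(node.outside, query, best)
    else:
        best = nearest(node.outside, query, best)
        if d - best[0] < node.radius:
            best = nearest(node.inside, query, best)
    return best
```

A tree is built once with `build_vp_tree(points, random.Random(0))` and then queried with `nearest(tree, query)`. Because the structure relies only on the triangle inequality, the same sketch works for any metric substituted for `dist`.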


Author(s):  
Xiaoyuan Liang ◽  
Martin Renqiang Min ◽  
Hongyu Guo ◽  
Guiling Wang

Traditional embedding approaches associate a real-valued embedding vector with each symbol or data point, which is equivalent to applying a linear transformation to the "one-hot" encoding of discrete symbols or data objects. Despite their simplicity, these methods generate storage-inefficient representations and fail to effectively encode the internal semantic structure of data, especially when the number of symbols or data points and the dimensionality of the real-valued embedding vectors are large. In this paper, we propose a regularized autoencoder framework to learn a compact Hierarchical K-way D-dimensional (HKD) discrete embedding of symbols or data points, aiming at capturing essential semantic structures of data. Experimental results on synthetic and real-world datasets show that our proposed HKD embedding can effectively reveal the semantic structure of data via hierarchical data visualization and greatly reduce the search space of nearest neighbor retrieval while preserving high accuracy.
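
The equivalence stated in the first sentence of this abstract can be checked with a small sketch: multiplying a one-hot row vector by an embedding matrix selects exactly one row of that matrix. The matrix values and helper names below are made up for illustration and are not taken from the paper.

```python
def one_hot(index, size):
    """Row vector with a single 1.0 at position index."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def vector_times_matrix(vec, matrix):
    """Multiply a row vector (length n) by an n x d matrix given as a list of rows."""
    n_cols = len(matrix[0])
    return [
        sum(vec[r] * matrix[r][c] for r in range(len(matrix)))
        for c in range(n_cols)
    ]


# Hypothetical embedding table: 3 symbols, 2 dimensions each.
EMBEDDINGS = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
]


def embed_by_lookup(symbol):
    """The usual implementation: index into the table."""
    return EMBEDDINGS[symbol]


def embed_by_linear_map(symbol):
    """The equivalent view: a linear transformation of the one-hot encoding."""
    return vector_times_matrix(one_hot(symbol, len(EMBEDDINGS)), EMBEDDINGS)
```

Both functions return the same vector for every symbol, which is why the table must hold one real-valued row per symbol and grows with the vocabulary size.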


2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Şahin Işık ◽  
Kemal Özkan ◽  

In this study, an automated system for classifying leaf species based on global and local features is presented, built around a smart and unorthodox decision system. The global features consist of 11 features in two categories: gross shape features (7) and moment-based features (4). For the local features, only the curve points on Bézier curves are accepted as discriminative features. To reduce the search space and improve the performance of the system, the class label of a leaf object is first determined by comparing the global features against predefined threshold values. Once the target class is determined, the local features are used to validate the label of the leaf sample. In experiments with the K-Nearest Neighbor (K-NN) classifier and the Hausdorff distance, the system achieves an accuracy of 96.78% on the Flavia dataset and 94.66% on the Swedish dataset. Moreover, by applying a deep learning model, namely the Inception-v3 architecture, results of 99.11% and 98.95% were recorded, which are superior to those of state-of-the-art methods. One can therefore choose between our feature extraction and classification technique and the Inception-v3 model depending on the trade-off between efficiency and effectiveness.
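
The combination of K-NN with the Hausdorff distance mentioned in this abstract can be sketched as follows. This is a generic illustration, not the authors' pipeline: it treats each sample as a bare list of 2D points (the paper uses curve points on Bézier curves), omits the threshold-based global-feature stage, and all names are hypothetical.

```python
from collections import Counter
from math import dist


def directed_hausdorff(a, b):
    """Largest distance from any point of a to its closest point in b."""
    return max(min(dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def knn_classify(sample, labeled, k=1):
    """Majority label among the k labeled point sets closest to sample.

    labeled is a list of (point_set, label) pairs.
    """
    ranked = sorted(labeled, key=lambda item: hausdorff(sample, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Because the Hausdorff distance compares whole point sets, samples with different numbers of points can be compared directly, at a cost proportional to the product of the two set sizes per comparison.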


2010 ◽  
Vol 33 (8) ◽  
pp. 1396-1404 ◽  
Author(s):  
Liang ZHAO ◽  
Luo CHEN ◽  
Ning JING ◽  
Wei LIAO

Signals ◽  
2021 ◽  
Vol 2 (2) ◽  
pp. 336-352
Author(s):  
Frank Zalkow ◽  
Julian Brandner ◽  
Meinard Müller

Flexible retrieval systems are required for conveniently browsing through large music collections. In a particular content-based music retrieval scenario, the user provides a query audio snippet, and the retrieval system returns music recordings from the collection that are similar to the query. In this scenario, a fast response from the system is essential for a positive user experience. For realizing low response times, one requires index structures that facilitate efficient search operations. One such index structure is the K-d tree, which has already been used in music retrieval systems. As an alternative, we propose to use a modern graph-based index, denoted as Hierarchical Navigable Small World (HNSW) graph. As our main contribution, we explore its potential in the context of a cross-version music retrieval application. In particular, we report on systematic experiments comparing graph- and tree-based index structures in terms of the retrieval quality, disk space requirements, and runtimes. Despite the fact that the HNSW index provides only an approximate solution to the nearest neighbor search problem, we demonstrate that it has almost no negative impact on the retrieval quality in our application. As our main result, we show that the HNSW-based retrieval is several orders of magnitude faster. Furthermore, the graph structure also works well with high-dimensional index items, unlike the tree-based structure. Given these merits, we highlight the practical relevance of the HNSW graph for music information retrieval (MIR) applications.

