scholarly journals Lower Bounds on Performance of Metric Tree Indexing Schemes for Exact Similarity Search in High Dimensions

Algorithmica ◽  
2012 ◽  
Vol 66 (2) ◽  
pp. 310-328 ◽  
Author(s):  
Vladimir Pestov
2017 ◽  
Vol 27 (01n02) ◽  
pp. 85-119 ◽  
Author(s):  
Karl Bringmann ◽  
Marvin Künnemann

The Fréchet distance is a well studied and very popular measure of similarity of two curves. The best known algorithms have quadratic time complexity, which has recently been shown to be optimal assuming the Strong Exponential Time Hypothesis (SETH) [Bringmann, FOCS'14]. To overcome the worst-case quadratic time barrier, restricted classes of curves have been studied that attempt to capture realistic input curves. The most popular such class are [Formula: see text]-packed curves, for which the Fréchet distance has a [Formula: see text]-approximation in time [Formula: see text] [Driemel et al., DCG'12]. In dimension [Formula: see text] this cannot be improved to [Formula: see text] for any [Formula: see text] unless SETH fails [Bringmann, FOCS'14]. In this paper, exploiting properties that prevent stronger lower bounds, we present an improved algorithm with time complexity [Formula: see text]. This improves upon the algorithm by Driemel et al. for any [Formula: see text]. Moreover, our algorithm's dependence on [Formula: see text], [Formula: see text] and [Formula: see text] is optimal in high dimensions apart from lower order factors, unless SETH fails. Our main new ingredients are as follows: For filling the classical free-space diagram we project short subcurves onto a line, which yields one-dimensional separated curves with roughly the same pairwise distances between vertices. Then we tackle this special case in near-linear time by carefully extending a greedy algorithm for the Fréchet distance of one-dimensional separated curves.


2016 ◽  
Vol 2016 ◽  
pp. 1-12
Author(s):  
Chunyan Shuai ◽  
Hengcheng Yang ◽  
Xin Ouyang ◽  
Siqi Li ◽  
Zheng Chen

In high-dimensional spaces, accuracy and similarity search by low computing and storage costs are always difficult research topics, and there is a balance between efficiency and accuracy. In this paper, we propose a new structure Similar-PBF-PHT to represent items of a set with high dimensions and retrieve accurate and similar items. The Similar-PBF-PHT contains three parts: parallel bloom filters (PBFs), parallel hash tables (PHTs), and a bitmatrix. Experiments show that the Similar-PBF-PHT is effective in membership query and K-nearest neighbors (K-NN) search. With accurate querying, the Similar-PBF-PHT owns low hit false positive probability (FPP) and acceptable memory costs. With K-NN querying, the average overall ratio and rank-i ratio of the Hamming distance are accurate and ratios of the Euclidean distance are acceptable. It takes CPU time not I/O times to retrieve accurate and similar items and can deal with different data formats not only numerical values.


Author(s):  
Rahul Malik ◽  
Sangkyum Kim ◽  
Xin Jin ◽  
Chandrasekar Ramachandran ◽  
Jiawei Han ◽  
...  

2021 ◽  
Vol 50 (1) ◽  
pp. 42-49
Author(s):  
Martin Aumuller ◽  
Sariel Har-Peled ◽  
Sepideh Mahabadi ◽  
Rasmus Pagh ◽  
Francesco Silvestri

Similarity search is a fundamental algorithmic primitive, widely used in many computer science disciplines. Given a set of points S and a radius parameter r > 0, the rnear neighbor (r-NN) problem asks for a data structure that, given any query point q, returns a point p within distance at most r from q. In this paper, we study the r-NN problem in the light of individual fairness and providing equal opportunities: all points that are within distance r from the query should have the same probability to be returned. In the low-dimensional case, this problem was first studied by Hu, Qiao, and Tao (PODS 2014). Locality sensitive hashing (LSH), the theoretically strongest approach to similarity search in high dimensions, does not provide such a fairness guarantee.


Sign in / Sign up

Export Citation Format

Share Document