Lower Bounds on Performance of Metric Tree Indexing Schemes for Exact Similarity Search in High Dimensions

The Fréchet distance is a well-studied and very popular measure of similarity between two curves. The best known algorithms have quadratic time complexity, which has recently been shown to be optimal assuming the Strong Exponential Time Hypothesis (SETH) [Bringmann, FOCS'14]. To overcome the worst-case quadratic time barrier, restricted classes of curves have been studied that attempt to capture realistic input curves. The most popular such class is that of [Formula: see text]-packed curves, for which the Fréchet distance has a [Formula: see text]-approximation in time [Formula: see text] [Driemel et al., DCG'12]. In dimension [Formula: see text] this cannot be improved to [Formula: see text] for any [Formula: see text] unless SETH fails [Bringmann, FOCS'14]. In this paper, exploiting properties that prevent stronger lower bounds, we present an improved algorithm with time complexity [Formula: see text]. This improves upon the algorithm by Driemel et al. for any [Formula: see text]. Moreover, our algorithm's dependence on [Formula: see text], [Formula: see text] and [Formula: see text] is optimal in high dimensions apart from lower-order factors, unless SETH fails. Our main new ingredients are as follows. To fill the classical free-space diagram, we project short subcurves onto a line, which yields one-dimensional separated curves with roughly the same pairwise distances between vertices. We then tackle this special case in near-linear time by carefully extending a greedy algorithm for the Fréchet distance of one-dimensional separated curves.
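To make the quadratic baseline mentioned in the abstract concrete, here is a minimal sketch of the discrete Fréchet distance, a simpler relative of the continuous measure the abstract discusses. It is not the paper's algorithm; the function name `discrete_frechet` and the point-list input format are assumptions made for illustration. The dynamic program fills an n-by-m table, which is where the quadratic cost comes from.

```python
import math

def discrete_frechet(p, q):
    """Discrete Fréchet distance between two polylines (lists of points).

    Illustrative sketch only: this is the discrete variant, which only
    couples vertices, not the continuous distance from the abstract.
    """
    n, m = len(p), len(q)
    # dp[i][j] = discrete Fréchet distance between p[:i+1] and q[:j+1]
    dp = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                dp[i][j] = d
            elif i == 0:
                dp[i][j] = max(dp[0][j - 1], d)
            elif j == 0:
                dp[i][j] = max(dp[i - 1][0], d)
            else:
                # Advance on p, on q, or on both; keep the cheapest option.
                best_prev = min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
                dp[i][j] = max(best_prev, d)
    return dp[n - 1][m - 1]

# Two parallel horizontal segments one unit apart.
print(discrete_frechet([(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1), (2, 1)]))
```

The table has n·m cells and each is filled in constant time, so the running time is quadratic in the number of vertices.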


A Novel Accuracy and Similarity Search Structure Based on Parallel Bloom Filters

Computational Intelligence and Neuroscience, 2016, Vol 2016, pp. 1-12. DOI: 10.1155/2016/4075257

Author(s): Chunyan Shuai, Hengcheng Yang, Xin Ouyang, Siqi Li, Zheng Chen

Keyword(s): Similarity Search, Hamming Distance, Bloom Filters, Hash Tables, High Dimensions, Data Formats, Overall Ratio, Search Structure, And Storage, False Positive Probability

In high-dimensional spaces, accurate (exact-match) search and similarity search at low computing and storage cost remain difficult research topics, and there is a trade-off between efficiency and accuracy. In this paper, we propose a new structure, Similar-PBF-PHT, to represent the items of a high-dimensional set and to retrieve accurate and similar items. The Similar-PBF-PHT contains three parts: parallel Bloom filters (PBFs), parallel hash tables (PHTs), and a bit matrix. Experiments show that the Similar-PBF-PHT is effective for membership queries and K-nearest neighbors (K-NN) search. For accurate querying, the Similar-PBF-PHT has a low false positive probability (FPP) and acceptable memory costs. For K-NN querying, the average overall ratio and rank-i ratio of the Hamming distance are accurate, and the ratios of the Euclidean distance are acceptable. Retrieving accurate and similar items costs CPU time rather than I/O time, and the structure can handle different data formats, not only numerical values.
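For readers unfamiliar with the building block named in this abstract, the following is a minimal sketch of a single, standard Bloom filter answering membership queries. It is not the Similar-PBF-PHT structure, whose details the abstract does not give; the class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, and the SHA-256-based hashing are all assumptions chosen for illustration.

```python
import hashlib

class BloomFilter:
    """Standard Bloom filter sketch: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes positions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("item-42")
print(bf.might_contain("item-42"))
```

A query that returns `False` is always correct, while a `True` answer can be a false positive; that rate is the false positive probability (FPP) the abstract refers to, and it falls as `num_bits` grows relative to the number of stored items.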


MLR-Index: An Index Structure for Fast and Scalable Similarity Search in High Dimensions

Lecture Notes in Computer Science - Scientific and Statistical Database Management, 2009, pp. 167-184. DOI: 10.1007/978-3-642-02279-1_12. Cited By ~ 2

Author(s): Rahul Malik, Sangkyum Kim, Xin Jin, Chandrasekar Ramachandran, Jiawei Han, ...

Keyword(s): Similarity Search, Index Structure, High Dimensions


Practical Algorithms and Lower Bounds for Similarity Search in Massive Graphs

IEEE Transactions on Knowledge and Data Engineering, 2007, Vol 19 (5), pp. 585-598. DOI: 10.1109/tkde.2007.1008. Cited By ~ 10

Author(s): Daniel Fogaras, Balazs Racz

Keyword(s): Lower Bounds, Similarity Search, Practical Algorithms, Massive Graphs


Lower bounds for the weak type (1, 1) estimate for the maximal function associated to cubes in high dimensions

Mathematical Research Letters, 2013, Vol 20 (5), pp. 907-918. DOI: 10.4310/mrl.2013.v20.n5.a7. Cited By ~ 2

Author(s): A. S. Iakovlev, J.-O. Strömberg

Keyword(s): Lower Bounds, Maximal Function, Weak Type, High Dimensions


Distributed similarity search in high dimensions using locality sensitive hashing

Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology - EDBT '09, 2009. DOI: 10.1145/1516360.1516446. Cited By ~ 49

Author(s): Parisa Haghani, Sebastian Michel, Karl Aberer

Keyword(s): Similarity Search, Locality Sensitive Hashing, High Dimensions


Fair near neighbor search via sampling

ACM SIGMOD Record, 2021, Vol 50 (1), pp. 42-49. DOI: 10.1145/3471485.3471496

Author(s): Martin Aumuller, Sariel Har-Peled, Sepideh Mahabadi, Rasmus Pagh, Francesco Silvestri

Keyword(s): Similarity Search, Near Neighbor, Locality Sensitive Hashing, Query Point, Equal Opportunities, High Dimensions, Radius Parameter, Neighbor Search, Low Dimensional, Set Of Points

Similarity search is a fundamental algorithmic primitive, widely used in many computer science disciplines. Given a set of points S and a radius parameter r > 0, the r-near neighbor (r-NN) problem asks for a data structure that, given any query point q, returns a point p within distance at most r from q. In this paper, we study the r-NN problem in the light of individual fairness and providing equal opportunities: all points that are within distance r from the query should have the same probability of being returned. In the low-dimensional case, this problem was first studied by Hu, Qiao, and Tao (PODS 2014). Locality sensitive hashing (LSH), the theoretically strongest approach to similarity search in high dimensions, does not provide such a fairness guarantee.
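The fairness requirement stated in this abstract can be illustrated with a brute-force baseline: scan all points, keep those within distance r of the query, and return one uniformly at random. This is only a sketch of the problem definition, not the sampling data structure studied in the paper; the function name `fair_r_near_neighbor` and the optional `rng` parameter are assumptions made for illustration.

```python
import math
import random

def fair_r_near_neighbor(points, q, r, rng=random):
    """Return a uniformly random point within distance r of q, or None.

    Linear-scan baseline: every qualifying point is equally likely to be
    returned, which is the fairness property described in the abstract.
    """
    candidates = [p for p in points if math.dist(p, q) <= r]
    return rng.choice(candidates) if candidates else None

points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
print(fair_r_near_neighbor(points, (0.0, 0.0), 1.5))
```

The scan takes time linear in the number of points per query, which is what an index would aim to avoid while keeping the same uniform output distribution.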
