The Maximum Displacement for Linear Probing Hashing

2013 ◽  
Vol 22 (3) ◽  
pp. 455-476
Author(s):  
NICLAS PETERSSON

In this paper we study the maximum displacement for linear probing hashing. We use the standard probabilistic model together with the insertion policy known as First-Come-First-Served. The results are of asymptotic nature and focus on dense hash tables. That is, the number of occupied cells n and the size of the hash table m tend to infinity with ratio n/m → 1. We present distributions and moments for the size of the maximum displacement, as well as for the number of items with displacement larger than some critical value. This is done via process convergence of the (appropriately normalized) length of the largest block of consecutive occupied cells, when the total number of occupied cells n varies.
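
To make the quantity being studied concrete, the following minimal sketch (not from the paper) inserts keys with First-Come-First-Served linear probing and records each item's displacement, i.e. how far past its home cell it lands. The table size, key set, and use of Python's built-in hash are illustrative choices only.

```python
import random

def insert_all(keys, m, hash_fn=hash):
    """Insert keys into a table of m cells with First-Come-First-Served linear probing.
    Returns the table and, for each key, its displacement (probes beyond its home cell)."""
    table = [None] * m
    displacements = []
    for key in keys:
        home = hash_fn(key) % m
        d = 0
        while table[(home + d) % m] is not None:
            d += 1                      # FCFS: the incoming key keeps probing forward
        table[(home + d) % m] = key
        displacements.append(d)
    return table, displacements

# A dense table (n/m close to 1), which is the regime studied above.
keys = [random.getrandbits(64) for _ in range(990)]
_, disp = insert_all(keys, m=1000)
print(max(disp), sum(1 for d in disp if d > 10))   # max displacement; items displaced by more than 10
```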

Algorithms ◽  
2020 ◽  
Vol 13 (12) ◽  
pp. 338
Author(s):  
Ting Huang ◽  
Zhengping Weng ◽  
Gang Liu ◽  
Zhenwen He

To manage multidimensional point data more efficiently, this paper presents the HD-tree, an improvement of a previous indexing method called the D-tree. Both structures combine quadtree-like partitioning (implemented with integer shift operations, storing only leaf nodes rather than internal ones) with hash tables used to locate the stored nodes. The HD-tree, however, follows a new decomposition scheme called the half decomposition strategy. This strategy avoids generating nodes that contain only a small amount of data and avoids sequential searches of the hash table, so it saves storage space while offering faster I/O and better time performance when building the tree and querying data. The results demonstrate that the HD-tree outperforms the D-tree in both time and space on uniform as well as uneven data, and that its performance is less affected by the data distribution.
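
A rough sketch of the shared idea, quadtree-style cell codes computed with integer shifts and only leaf cells kept in a hash table, is given below. The code layout, coordinate width, and function names are assumptions for illustration, not the HD-tree's actual implementation.

```python
leaves = {}   # hash table: cell code -> points in that leaf cell; internal nodes are never stored

def cell_code(x, y, level, coord_bits=16):
    """Quadtree-style cell code via integer shifts: drop the low-order bits of each
    coordinate and concatenate, so points in the same level-`level` cell share one code."""
    shift = coord_bits - level
    return (level << (2 * coord_bits)) | ((x >> shift) << level) | (y >> shift)

def insert(point, level):
    leaves.setdefault(cell_code(point[0], point[1], level), []).append(point)

def query_cell(point, level):
    return leaves.get(cell_code(point[0], point[1], level), [])

insert((40000, 12000), level=4)
print(query_cell((40001, 12001), level=4))   # same level-4 cell -> [(40000, 12000)]
```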


2014 ◽  
Vol 644-650 ◽  
pp. 3365-3370
Author(s):  
Zhen Hong Guo ◽  
Lin Li ◽  
Qing Wang ◽  
Meng Lin ◽  
Rui Pan

With the rapid development of the Internet, the number of firewall rules keeps increasing. This enormous quantity of rules challenges packet classification, which has already become a bottleneck in firewalls. This paper proposes FMPC (Fast Multi-dimensional Packet Classification), a rapid multi-dimensional packet classification algorithm based on BSOL (Binary Search On Leaves). Unlike BSOL, FMPC cuts all dimensions at the same time to decompose rule spaces and stores leaf spaces in hash tables; FMPC constructs a Bloom filter for every hash table and stores them in embedded SRAM. When classifying a packet, FMPC queries the Bloom filters in parallel and uses the results to decide which hash tables to visit. Algorithm analysis and simulation results show that the average number of hash-table lookups per packet with FMPC is 1, which is much smaller than that of BSOL; in the worst case, the number of hash-table lookups of FMPC is O(log(wmax + 1)), which is also smaller than that of BSOL in a multi-dimensional environment, where wmax is the length, in bits, of the longest dimension.
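
The sketch below illustrates the filter-before-table idea described above: each hash table gets a Bloom filter, and a table is only probed when its filter answers "maybe". The data structures, key format, and parameters are illustrative assumptions, not FMPC's actual implementation (and the parallel SRAM queries are simulated by a plain loop).

```python
import hashlib

class BloomFilter:
    def __init__(self, nbits=1 << 16, nhashes=4):
        self.nbits, self.nhashes = nbits, nhashes
        self.bits = bytearray(nbits // 8)

    def _positions(self, key):
        for i in range(self.nhashes):
            digest = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.nbits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

# One (filter, hash table) pair per decomposed leaf space; the filters are cheap to query,
# so a hash table is only visited when its filter reports a possible match.
levels = [(BloomFilter(), {}) for _ in range(4)]

def add_rule(level, leaf_key, action):
    bloom, table = levels[level]
    bloom.add(leaf_key)
    table[leaf_key] = action

def classify(leaf_key):
    for bloom, table in levels:              # stands in for the parallel filter queries
        if bloom.might_contain(leaf_key) and leaf_key in table:
            return table[leaf_key]
    return None

add_rule(0, "leaf-0011", "accept")
print(classify("leaf-0011"), classify("leaf-1100"))   # -> accept None
```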


2002 ◽  
Vol 03 (03n04) ◽  
pp. 105-128 ◽  
Author(s):  
KUN SUK KIM ◽  
SARTAJ SAHNI

Waldvogel et al. [9] have proposed a collection of hash tables (CHT) organization for an IP router table. Each hash table in the CHT contains prefixes of the same length together with markers for longer-length prefixes. IP lookup can be done with O(log ldist) hash-table searches, where ldist is the number of distinct prefix lengths (also equal to the number of hash tables in the CHT). Srinivasan and Varghese [8] have proposed the use of controlled prefix expansion to reduce the value of ldist. The details of their algorithm to reduce the number of lengths are given in [7]. The complexity of this algorithm is O(nW^2), where n is the number of prefixes and W is the length of the longest prefix. The algorithm of [7] does not minimize the storage required by the prefixes and markers for the resulting set of prefixes. We develop an algorithm that minimizes the storage requirement but takes O(nW^3 + kW^4) time, where k is the desired number of distinct lengths. We also propose improvements to the heuristic of [7].
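
The following is a simplified sketch, under assumptions of our own, of the CHT idea summarized above: one hash table per distinct prefix length, markers placed on the binary-search path of longer prefixes, and O(log ldist) probes per lookup. Prefixes are modeled as bit strings, and each marker stores the best matching prefix of its own bit string to avoid backtracking; this is an illustration, not the authors' code.

```python
def best_matching(prefixes, s):
    """Longest stored prefix of bit-string s (returns its next hop), or None."""
    for l in range(len(s), 0, -1):
        if s[:l] in prefixes:
            return prefixes[s[:l]]
    return None

def build(prefixes):
    """prefixes: dict mapping a '0'/'1' string to its next hop."""
    lengths = sorted({len(p) for p in prefixes})
    tables = {l: {} for l in lengths}        # one hash table per distinct prefix length
    for p, hop in prefixes.items():
        tables[len(p)][p] = hop
    # Markers go on the binary-search path of every longer prefix; each marker pre-computes
    # the best matching prefix of its own bit-string so no backtracking is needed.
    for p in prefixes:
        lo, hi = 0, len(lengths) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            l = lengths[mid]
            if l < len(p):
                tables[l].setdefault(p[:l], best_matching(prefixes, p[:l]))
                lo = mid + 1
            elif l > len(p):
                hi = mid - 1
            else:
                break
    return lengths, tables

def lookup(addr_bits, lengths, tables):
    best, lo, hi = None, 0, len(lengths) - 1
    while lo <= hi:                          # O(log ldist) hash-table probes
        mid = (lo + hi) // 2
        l = lengths[mid]
        key = addr_bits[:l]
        if key in tables[l]:
            if tables[l][key] is not None:
                best = tables[l][key]
            lo = mid + 1                     # hit (prefix or marker): a longer match may exist
        else:
            hi = mid - 1                     # miss: only shorter lengths can still match
    return best

lengths, tables = build({"00": "A", "001111": "B", "1": "C"})
print(lookup("00110010", lengths, tables), lookup("10110010", lengths, tables))   # -> A C
```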


2020 ◽  
Vol 10 (6) ◽  
pp. 1915
Author(s):  
Tianqi Zheng ◽  
Zhibin Zhang ◽  
Xueqi Cheng

Hash tables are the fundamental data structure for analytical database workloads such as aggregation, joining, set filtering, and record deduplication. The performance characteristics of hash tables differ drastically depending on what kind of data is being processed and how many inserts, lookups, and deletes are performed. In this paper, we address some common use cases of hash tables: aggregating and joining over arbitrary string data. We designed a new hash table, SAHA, which is tightly integrated with modern analytical databases and optimized for string data with the following advantages: (1) it inlines short strings and saves hash values only for long strings; (2) it uses special memory loading techniques for quick dispatching and hash computation; and (3) it utilizes vectorized processing to batch hashing operations. Our evaluation results reveal that SAHA outperforms state-of-the-art hash tables, including Google's SwissTable and Facebook's F14Table, by one to five times in analytical workloads. It has been merged into the ClickHouse database and shows promising results in production.
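
A hedged illustration of advantage (1) follows: short strings are stored inline and compared directly, while long strings carry a saved hash that lets lookups reject mismatches before a full comparison. The class name, inline limit, and use of Python's built-in hash are assumptions; SAHA's actual memory layout differs.

```python
class StringKey:
    """Hash-table key for byte strings: short strings are kept inline and compared directly;
    long strings keep their hash so lookups can reject mismatches before a full comparison."""
    __slots__ = ("inline", "data", "saved_hash")
    INLINE_LIMIT = 16            # assumption: strings of up to 16 bytes count as "short"

    def __init__(self, s: bytes):
        self.inline = len(s) <= self.INLINE_LIMIT
        self.data = s
        self.saved_hash = None if self.inline else hash(s)   # hash saved only for long strings

    def __eq__(self, other):
        if self.inline and other.inline:
            return self.data == other.data                    # short strings: direct compare
        # Strings of different lengths are never equal, so mixed inline/non-inline pairs
        # fall through here and fail on the saved-hash comparison, which is intended.
        return self.saved_hash == other.saved_hash and self.data == other.data

    def __hash__(self):
        return hash(self.data) if self.inline else self.saved_hash

# Example: a simple aggregation keyed by StringKey.
counts = {}
for s in [b"apple", b"a string well beyond the inline limit", b"apple"]:
    k = StringKey(s)
    counts[k] = counts.get(k, 0) + 1
print(sorted(counts.values()))   # -> [1, 2]
```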


1992 ◽  
Vol 03 (01) ◽  
pp. 55-63
Author(s):  
FABRIZIO LUCCIO ◽  
ANDREA PIETRACAPRINA ◽  
GEPPINO PUCCI

The performance of hash tables is analyzed in a parallel context. Assuming that a hash table of fixed size is allocated in the shared memory of a PRAM with n processors, a Ph-step is defined as a PRAM computation in which each processor searches or inserts a key in the table. It is shown that the maximum number of table probes needed for a single key in a Ph-step is Ω(log_{1/α} n) and O(log_{1/α'} n) with high probability, where α and α' are the load factors before and after the execution of the Ph-step. However, a clever implementation of a Ph-step is proposed, which runs in time O((log_{1/α'} n)^{1/2}) with high probability. The algorithm exploits the fact that operations relative to different keys have different durations; hence, the processors in charge of shorter operations, once finished, are used to perform part of the longer ones.
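
As a small numerical illustration (not from the paper) of the log_{1/α} n scale appearing in these bounds, the sketch below inserts n keys into a table of size m with uniform (random) probing and compares the observed worst-case probe count against log_{1/α'} n, where α' is the final load factor; the table size and probing scheme are simplifying assumptions.

```python
import math, random

def max_probes(n_keys, table_size):
    """Insert n_keys with uniform (random) probing and return the largest number of probes
    any single insertion needed."""
    table = [False] * table_size
    worst = 0
    for _ in range(n_keys):
        pos, probes = random.randrange(table_size), 1
        while table[pos]:
            pos, probes = random.randrange(table_size), probes + 1
        table[pos] = True
        worst = max(worst, probes)
    return worst

n, m = 1 << 14, 1 << 15                      # final load factor alpha' = n/m = 0.5
alpha = n / m
print(max_probes(n, m), math.log(n, 1 / alpha))   # observed worst case vs. log_{1/alpha'} n
```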


2014 ◽  
Vol 1046 ◽  
pp. 504-507
Author(s):  
Kai Song ◽  
Hai Sheng Li

In this paper, a new scheduling algorithm, the priority bitmap and hash table (PBHT) algorithm, is put forward. Its key components, the priority bitmap scheduler and the hash tables, are analyzed, and the workflow and time complexity of the scheduling algorithm are described in detail. A series of experiments is designed and carried out, and the experimental results verify the feasibility, rationality, and completeness of the scheduling algorithm.
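
The sketch below illustrates the two building blocks named above in a generic form: a priority bitmap whose lowest set bit identifies the highest ready priority in O(1), paired with a hash table of per-priority ready queues. All names and the priority convention are assumptions for illustration, not the exact PBHT algorithm.

```python
from collections import deque

ready_bitmap = 0                 # bit i set <=> at least one ready task at priority i
ready_queues = {}                # hash table: priority -> deque of ready tasks

def make_ready(task, priority):
    global ready_bitmap
    ready_queues.setdefault(priority, deque()).append(task)
    ready_bitmap |= 1 << priority

def pick_next():
    """O(1) selection: the lowest set bit gives the highest-priority ready task
    (smaller number = higher priority in this sketch)."""
    global ready_bitmap
    if ready_bitmap == 0:
        return None
    priority = (ready_bitmap & -ready_bitmap).bit_length() - 1
    queue = ready_queues[priority]
    task = queue.popleft()
    if not queue:
        ready_bitmap &= ~(1 << priority)
    return task

make_ready("log_flush", 3); make_ready("net_rx", 0)
print(pick_next(), pick_next())   # -> net_rx log_flush
```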


2021 ◽  
Vol 8 (2) ◽  
pp. 1-17
Author(s):  
Oded Green

In this article, we introduce HashGraph, a new scalable approach for building hash tables that uses concepts taken from sparse graph representations, hence the name HashGraph. HashGraph introduces a new way to deal with hash collisions that uses neither "open addressing" nor "separate chaining", yet has the benefits of both approaches. HashGraph currently works for static inputs. Recent progress with dynamic graph data structures suggests that HashGraph might be extendable to dynamic inputs as well. We show that HashGraph can deal with a large number of hash values per entry without loss of performance. Lastly, we present a new querying algorithm for value lookups. We experimentally compare HashGraph to several state-of-the-art implementations and find that it outperforms them by 2× on average when the inputs are unique and by as much as 40× when the input contains duplicates. The implementation of HashGraph in this article targets NVIDIA GPUs. HashGraph can build a hash table at a rate of 2.5 billion keys per second on an NVIDIA GV100 GPU and can query at nearly the same rate.
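
The sketch below, a sequential Python stand-in rather than the CUDA implementation, shows the sparse-graph (CSR-like) flavor of such a layout: count keys per bucket, prefix-sum the counts into offsets, then scatter keys into one flat array. HashGraph's actual build and query algorithms differ in detail; this is only meant to illustrate the general idea.

```python
def build_hashgraph(keys, num_buckets):
    """Count keys per bucket, prefix-sum the counts into offsets, then scatter the keys
    into one flat array, the way a CSR graph stores its adjacency lists."""
    counts = [0] * num_buckets
    for k in keys:
        counts[hash(k) % num_buckets] += 1
    offsets = [0] * (num_buckets + 1)
    for b in range(num_buckets):             # exclusive prefix sum over bucket counts
        offsets[b + 1] = offsets[b] + counts[b]
    slots = [None] * len(keys)
    cursor = offsets[:-1]                    # next free slot in each bucket (a fresh copy)
    for k in keys:
        b = hash(k) % num_buckets
        slots[cursor[b]] = k
        cursor[b] += 1
    return offsets, slots

def contains(key, offsets, slots, num_buckets):
    b = hash(key) % num_buckets
    return any(slots[i] == key for i in range(offsets[b], offsets[b + 1]))

offsets, slots = build_hashgraph(["a", "b", "c", "a"], num_buckets=8)
print(contains("a", offsets, slots, 8), contains("z", offsets, slots, 8))   # -> True False
```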


2021 ◽  
Vol 14 (13) ◽  
pp. 3267-3280
Author(s):  
Huayi Wang ◽  
Jingfan Meng ◽  
Long Gong ◽  
Jun Xu ◽  
Mitsunori Ogihara

Approximate Nearest Neighbor Search (ANNS) is a fundamental algorithmic problem, with numerous applications in many areas of computer science. Locality-Sensitive Hashing (LSH) is one of the most popular solution approaches for ANNS. A common shortcoming of many LSH schemes is that since they probe only a single bucket in a hash table, they need to use a large number of hash tables to achieve a high query accuracy. For ANNS-L2, a multi-probe scheme was proposed to overcome this drawback by strategically probing multiple buckets in a hash table. In this work, we propose MP-RW-LSH, the first and so far only multi-probe LSH solution to ANNS in L1 distance, and show that it achieves a better tradeoff between scalability and query efficiency than all existing LSH-based solutions. We also explain why a state-of-the-art ANNS-L1 solution called Cauchy projection LSH (CP-LSH) is fundamentally not suitable for multi-probe extension. Finally, as a use case, we construct, using MP-RW-LSH as the underlying "ANNS-L1 engine", a new ANNS-E (E for edit distance) solution that beats the state of the art.
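
To show what "probing multiple buckets in one hash table" means, here is a generic multi-probe sketch for the classic L2 LSH family (project, add an offset, quantize): after visiting the home bucket, neighboring buckets are reached by perturbing one hash component at a time by ±1. MP-RW-LSH's random-walk probing for L1 works differently; the class name, parameters, and probe order below are illustrative assumptions.

```python
import random

class L2LSH:
    """k random projections quantized with width w; the bucket of x is the k-tuple of bins."""
    def __init__(self, dim, k=4, w=4.0, seed=0):
        rng = random.Random(seed)
        self.a = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(k)]
        self.b = [rng.uniform(0, w) for _ in range(k)]
        self.w = w

    def bucket(self, x):
        return tuple(int((sum(ai * xi for ai, xi in zip(a, x)) + b) // self.w)
                     for a, b in zip(self.a, self.b))

    def probe_sequence(self, x, extra=8):
        """Yield the home bucket, then nearby buckets obtained by perturbing one
        component at a time by +/-1, so a single table answers several probes."""
        base = self.bucket(x)
        yield base
        perturbations = [(i, d) for i in range(len(base)) for d in (-1, 1)]
        for i, d in perturbations[:extra]:
            probed = list(base)
            probed[i] += d
            yield tuple(probed)

lsh, table = L2LSH(dim=4), {}
vec = [0.1, 0.2, 0.3, 0.4]
table.setdefault(lsh.bucket(vec), []).append("item-1")
print([item for b in lsh.probe_sequence(vec) for item in table.get(b, [])])   # -> ['item-1']
```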


2021 ◽  
Vol 50 (1) ◽  
pp. 87-94
Author(s):  
Baotong Lu ◽  
Xiangpeng Hao ◽  
Tianzheng Wang ◽  
Eric Lo

Byte-addressable persistent memory (PM) brings hash tables the potential of low latency, cheap persistence and instant recovery. The recent advent of Intel Optane DC Persistent Memory Modules (DCPMM) further accelerates this trend. Many new hash table designs have been proposed, but most of them were based on emulation and perform sub-optimally on real PM. They were also piecewise and partial solutions that side-stepped many important properties, in particular good scalability, high load factor and instant recovery.


2021 ◽  
Vol 50 (1) ◽  
pp. 60-67
Author(s):  
Tim Gubner ◽  
Viktor Leis ◽  
Peter Boncz

Modern query engines rely heavily on hash tables for query processing. Overall query performance and memory footprint are often determined by how hash tables and the tuples within them are represented. In this work, we propose three complementary techniques to improve this representation: Domain-Guided Prefix Suppression bit-packs keys and values tightly to reduce the hash table record width. Optimistic Splitting decomposes values (and operations on them) into (operations on) frequently- and infrequently-accessed value slices; by removing the infrequently-accessed value slices from the hash table record, it improves cache locality. The Unique Strings Self-aligned Region (USSR) accelerates the handling of frequently occurring strings, which are widespread in real-world data sets, by creating an on-the-fly dictionary of the most frequent strings; this allows many string operations to be executed with integer logic and reduces memory pressure. We integrated these techniques into Vectorwise. On the TPC-H benchmark, our approach reduces peak memory consumption by 2-4× and improves performance by up to 1.5×. On a real-world BI workload, we measured a 2× improvement in performance, and in micro-benchmarks we observed speedups of up to 25×.
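
As a hedged illustration of the prefix-suppression idea: when a column's minimum and maximum are known, each key needs only enough bits to cover (max − min), so several key columns can be packed into one narrow word. The function names and layout below are assumptions for illustration, not Vectorwise's actual record format.

```python
def pack_keys(rows, domains):
    """rows: list of tuples; domains: per-column (lo, hi) bounds.
    Packs each row into one integer using only ceil(log2(hi - lo + 1)) bits per column."""
    widths = [max(1, (hi - lo).bit_length()) for lo, hi in domains]
    packed = []
    for row in rows:
        word, shift = 0, 0
        for value, (lo, _), width in zip(row, domains, widths):
            word |= (value - lo) << shift     # suppress the common prefix by subtracting lo
            shift += width
        packed.append(word)
    return packed, widths

# Example: two columns with small domains fit into a single narrow hash-table key.
rows = [(1995, 3), (2001, 7)]
packed, widths = pack_keys(rows, domains=[(1990, 2020), (0, 9)])
print(packed, widths)   # widths [5, 4] -> 9 bits per key instead of two full-width integers
```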

