Scaling Dynamic Hash Tables on Real Persistent Memory

Byte-addressable persistent memory (PM) brings hash tables the potential of low latency, cheap persistence and instant recovery. The recent advent of Intel Optane DC Persistent Memory Modules (DCPMM) further accelerates this trend. Many new hash table designs have been proposed, but most of them were based on emulation and perform sub-optimally on real PM. They were also piecewise and partial solutions that side-stepped many important properties, in particular good scalability, high load factor and instant recovery.

Persistent memory hash indexes

Proceedings of the VLDB Endowment, 2021, Vol 14 (5), pp. 785-798. DOI: 10.14778/3446095.3446101

Author(s): Daokun Hu, Zhiwen Chen, Jianbing Wu, Jianhua Sun, Hao Chen

Keyword(s): Future Development, High Performance, Performance Metrics, Comprehensive Evaluation, State Of The Art, Hash Tables, Trade Offs, Depth Analysis, Persistent Memory, Memory Modules

Persistent memory (PM) is increasingly being leveraged to build hash-based indexing structures featuring cheap persistence, high performance, and instant recovery, especially since the release of Intel Optane DC Persistent Memory Modules. However, most of these structures have been evaluated on DRAM-based emulators under unrealistic assumptions, or evaluated on a few specific metrics while other important properties are sidestepped. It is therefore essential to understand how well the proposed hash indexes perform on real PM and how they differ from one another when a wider range of performance metrics is considered. To this end, this paper provides a comprehensive evaluation of persistent hash tables. In particular, we evaluate six state-of-the-art hash tables, namely Level hashing, CCEH, Dash, PCLHT, Clevel, and SOFT, on real PM hardware. Our evaluation uses a unified benchmarking framework and representative workloads. Besides characterizing common performance properties, we also explore how hardware configurations (such as PM bandwidth, CPU instructions, and NUMA) affect the performance of PM-based hash tables. From our in-depth analysis, we identify design trade-offs and good paradigms in prior art, and suggest desirable optimizations and directions for the future development of PM-based hash tables.

Comparison on Search Failure between Hash Tables and a Functional Bloom Filter

Applied Sciences, 2020, Vol 10 (15), pp. 5218. DOI: 10.3390/app10155218
Cited By ~ 1

Author(s): Hayoung Byun, Hyesook Lim

Keyword(s): Failure Rate, Data Structures, Hash Table, Bloom Filter, Load Factor, Failure Rates, Hash Tables, Collision Problem, Simulation Results, Return Value

Hash-based data structures have been widely used in many applications. An intrinsic problem of hashing is collision, in which two or more elements are hashed to the same value. The more heavily a hash table is loaded, the more collisions occur. Elements that cannot be stored in a hash table because of collisions cause search failures. Many variant structures have been studied to reduce the number of collisions, but none of them completely solves the collision problem. In this paper, we claim that a functional Bloom filter (FBF) provides a lower search failure rate than hash tables when the hash table is heavily loaded. In other words, a hash table can be replaced with an FBF, because the FBF achieves a lower search failure rate than hash tables when storing a large amount of data in a limited amount of memory. While hash tables must store each input key in addition to its return value, a functional Bloom filter stores only return values, without the input keys, because the distinct combination of cell indices produced by each input key can be used to identify that key. We theoretically compare the search failure rate of the FBF with those of hash-based data structures such as the multi-hash table, cuckoo hash table, and d-left hash table. We also provide simulation results that confirm our theoretical analysis. The simulation results show that the search failure rates of the hash tables are higher than that of the functional Bloom filter when the load factor exceeds 0.6.
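
The key-free storage idea can be illustrated with a small Python sketch. It assumes positive integer return values, a hypothetical `CONFLICT` sentinel for cells that were written with different values, and double hashing over SHA-256 to produce each key's index combination; these are illustrative choices, not details taken from the paper. A key whose cells are all conflicted cannot be resolved, which is the kind of search failure the abstract compares against hash tables.

```python
import hashlib

EMPTY = 0
CONFLICT = -1  # hypothetical sentinel marking a cell written with different values


class FunctionalBloomFilter:
    """Minimal FBF sketch: cells hold return values only, never the keys."""

    def __init__(self, num_cells: int, num_hashes: int) -> None:
        self.cells = [EMPTY] * num_cells
        self.k = num_hashes

    def _indices(self, key: str):
        # Double hashing: derive k cell indices from a single SHA-256 digest.
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        m = len(self.cells)
        return [(h1 + i * h2) % m for i in range(self.k)]

    def insert(self, key: str, value: int) -> None:
        if value <= 0:
            raise ValueError("return values must be positive integers")
        for i in self._indices(key):
            if self.cells[i] == EMPTY:
                self.cells[i] = value
            elif self.cells[i] != value:
                self.cells[i] = CONFLICT  # keys with different values share this cell

    def search(self, key: str):
        """Return the stored value, None (negative), or "indeterminable"."""
        values = set()
        for i in self._indices(key):
            cell = self.cells[i]
            if cell == EMPTY:
                return None  # an inserted key never leaves one of its cells empty
            if cell != CONFLICT:
                values.add(cell)
        if not values:
            return "indeterminable"  # all cells conflicted: a search failure
        if len(values) == 1:
            return values.pop()
        return None  # disagreeing values: the key was never inserted
```

Because no key is stored, a key that was never inserted can still return a value if all of its non-conflicted cells happen to hold the same value; this sketch makes no attempt to bound that false-positive rate.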

Performance characterization of a DRAM-NVM hybrid memory architecture for HPC applications using Intel Optane DC persistent memory modules

Proceedings of the International Symposium on Memory Systems (MEMSYS '19), 2019. DOI: 10.1145/3357526.3357541
Cited By ~ 4

Author(s): Onkar Patil, Latchesar Ionkov, Jason Lee, Frank Mueller, Michael Lang

Keyword(s): Memory Architecture, Performance Characterization, Hybrid Memory, Persistent Memory, Memory Modules

PHPRX: An Efficient Hash Table for Persistent Memory

Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures, 2021. DOI: 10.1145/3409964.3461820

Author(s): Diego Cepeda, Wojciech Golab

Keyword(s): Hash Table, Persistent Memory

Octopus+: An RDMA-Enabled Distributed Persistent Memory File System

ACM Transactions on Storage, 2021, Vol 17 (3), pp. 1-25. DOI: 10.1145/3448418

Author(s): Bohong Zhu, Youmin Chen, Qing Wang, Youyou Lu, Jiwu Shu

Keyword(s): High Speed, High Performance, File System, Direct Memory Access, File Systems, Distributed File Systems, Persistent Memory, Memory Modules, Non Volatile Memory, Volatile Memory

Non-volatile memory and remote direct memory access (RDMA) provide extremely high performance in storage and network hardware. However, existing distributed file systems strictly isolate the file system and network layers, and their heavily layered software designs leave high-speed hardware under-exploited. In this article, we propose Octopus+, an RDMA-enabled distributed persistent memory file system that redesigns file system internal mechanisms by closely coupling non-volatile memory and RDMA features. For data operations, Octopus+ directly accesses a shared persistent memory pool to reduce memory copying overhead, and has clients actively fetch and push data to rebalance the load between the server and the network. For metadata operations, Octopus+ introduces self-identified remote procedure calls for immediate notification between the file system and networking, and an efficient distributed transaction mechanism for consistency. Octopus+ also supports replication to provide better availability. Evaluations on Intel Optane DC Persistent Memory Modules show that Octopus+ achieves nearly the raw bandwidth for large I/Os and orders of magnitude better performance than existing distributed file systems.

Analysis of Robin Hood and Other Hashing Algorithms Under the Random Probing Model, With and Without Deletions

Combinatorics, Probability and Computing, 2018, Vol 28 (4), pp. 600-617. DOI: 10.1017/s0963548318000408

Author(s): P. V. Poblete, A. Viola

Keyword(s): Search Algorithm, Recurrence Equation, Constant Time, Collision Resolution, Load Factor, Hash Tables, Search Cost, Robin Hood, Small Constant, Load Factors

Thirty years ago, the Robin Hood collision resolution strategy was introduced for open addressing hash tables, and a recurrence equation was found for the distribution of its search cost. Although this recurrence could not be solved analytically, it allowed numerical computations that, remarkably, suggested that the variance of the search cost approached a value of 1.883 when the table was full. Furthermore, with a non-standard mean-centred search algorithm, this would imply that searches could be performed in expected constant time even in a full table. Despite the time elapsed since these observations were made, no progress had been made in proving them. In this paper we introduce a technique that works around the intractability of the recurrence equation by solving an associated differential equation instead. While this does not provide an exact solution, it is sufficiently powerful to prove a bound of π²/3 for the variance, and thus to prove that the variance of Robin Hood is bounded by a small constant for load factors arbitrarily close to 1. As a corollary, the mean-centred search algorithm runs in expected constant time. We also use this technique to study the performance of Robin Hood hash tables under a long sequence of insertions and deletions, where deletions are implemented by marking elements as deleted. We prove that, in this case, the variance is bounded by 1/(1−α), where α is the load factor. To model the behaviour of these hash tables, we use a unified approach that we also apply to the First-Come-First-Served and Last-Come-First-Served collision resolution disciplines, both with and without deletions.
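
For readers unfamiliar with the strategy, the following Python sketch shows Robin Hood insertion: an incoming key takes the slot of any resident that sits closer to its own home slot, which keeps probe distances balanced. It assumes linear probing, keys-only storage, and no deletions; the paper analyses the random probing model (and deletions by marking), so the variance bounds quoted above are not claimed for this sketch, and the class and method names are hypothetical.

```python
class RobinHoodTable:
    """Keys-only open addressing with Robin Hood insertion over linear probing."""

    def __init__(self, capacity: int) -> None:
        self.keys = [None] * capacity
        self.dist = [0] * capacity  # displacement of each resident from its home slot
        self.size = 0

    def _home(self, key) -> int:
        return hash(key) % len(self.keys)

    def contains(self, key) -> bool:
        cap = len(self.keys)
        idx, d = self._home(key), 0
        while d < cap:
            resident = self.keys[idx]
            # Robin Hood invariant: once a resident is closer to home than our
            # current probe distance, the key cannot appear further along.
            if resident is None or self.dist[idx] < d:
                return False
            if resident == key:
                return True
            idx, d = (idx + 1) % cap, d + 1
        return False

    def insert(self, key) -> None:
        if key is None:
            raise ValueError("None is reserved for empty slots")
        if self.contains(key):
            return
        if self.size == len(self.keys):
            raise RuntimeError("table is full")
        cap = len(self.keys)
        idx, d = self._home(key), 0
        while self.keys[idx] is not None:
            if self.dist[idx] < d:
                # The "richer" resident yields its slot and continues probing.
                self.keys[idx], key = key, self.keys[idx]
                self.dist[idx], d = d, self.dist[idx]
            idx, d = (idx + 1) % cap, d + 1
        self.keys[idx], self.dist[idx] = key, d
        self.size += 1

    def displacements(self) -> list:
        return [self.dist[i] for i, k in enumerate(self.keys) if k is not None]
```

The empirical variance of the stored displacements can be computed with `statistics.pvariance(table.displacements())`. Note that this sketch uses the standard lookup that stops early, not the mean-centred search algorithm discussed in the abstract.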

NVLSM: A Persistent Memory Key-Value Store Using Log-Structured Merge Tree with Accumulative Compaction

ACM Transactions on Storage, 2021, Vol 17 (3), pp. 1-26. DOI: 10.1145/3453300

Author(s): Baoquan Zhang, David H. C. Du

Keyword(s): Computer Systems, Memory Storage, Low Latency, Latency Data, Persistent Memory, Write Amplification, Non Volatile Memory, Data Persistence, Average Latency, Volatile Memory

Computer systems utilizing byte-addressable Non-Volatile Memory (NVM) as memory/storage can provide low-latency data persistence. Key-value stores using the widely adopted Log-Structured Merge Tree (LSM-Tree) remain beneficial for NVM systems in terms of space and write efficiency. However, the significant write amplification introduced by the leveled compaction of the LSM-Tree degrades the write performance of the key-value store and shortens the lifetime of NVM devices. Existing studies propose new compaction methods to reduce write amplification; unfortunately, they incur relatively large read amplification. In this article, we propose NVLSM, a key-value store for NVM systems using an LSM-Tree with a new accumulative compaction. By fully utilizing the byte-addressability of NVM, accumulative compaction uses pointers to accumulate data into multiple floors in a logically sorted run, reducing the number of compactions required. We also propose a cascading search scheme for reads among the multiple floors to reduce read amplification. As a result, NVLSM reduces write amplification with only small increases in read amplification. We compare NVLSM with key-value stores using LSM-Trees with two other compaction methods: leveled compaction and fragmented compaction. Our evaluations show that NVLSM reduces write amplification by up to 67% compared with an LSM-Tree using leveled compaction, without significantly increasing read amplification. In write-intensive workloads, NVLSM reduces the average latency by 15.73%–41.2% compared to the other key-value stores.

Building blocks for persistent memory

The VLDB Journal, 2020, Vol 29 (6), pp. 1223-1241. DOI: 10.1007/s00778-020-00622-9

Author(s): Alexander van Renen, Lukas Vogel, Viktor Leis, Thomas Neumann, Alfons Kemper

Keyword(s): Performance Evaluation, Building Blocks, Database Systems, Performance Bottlenecks, Persistent Memory, Comprehensive Performance, Memory Modules, Real Hardware, Writing Block, Level Building

I/O latency and throughput are two of the major performance bottlenecks for disk-based database systems. Persistent memory (PMem) technologies, like Intel’s Optane DC persistent memory modules, promise to bridge the gap between NAND-based flash (SSD) and DRAM, and thus eliminate the I/O bottleneck. In this paper, we provide the first comprehensive performance evaluation of PMem on real hardware in terms of bandwidth and latency. Based on the results, we develop guidelines for efficient PMem usage and four optimized low-level building blocks for PMem applications: log writing, block flushing, in-place updates, and coroutines for write latency hiding.

The Maximum Displacement for Linear Probing Hashing

Combinatorics, Probability and Computing, 2013, Vol 22 (3), pp. 455-476. DOI: 10.1017/s0963548312000582

Author(s): Niclas Petersson

Keyword(s): Probabilistic Model, Hash Table, Critical Value, Hash Tables, Maximum Displacement, Asymptotic Nature, Process Convergence

In this paper we study the maximum displacement for linear probing hashing. We use the standard probabilistic model together with the insertion policy known as First-Come-First-Served. The results are asymptotic in nature and focus on dense hash tables; that is, the number of occupied cells n and the size of the hash table m tend to infinity with ratio n/m → 1. We present distributions and moments for the size of the maximum displacement, as well as for the number of items with displacement larger than some critical value. This is done via process convergence of the (appropriately normalized) length of the largest block of consecutive occupied cells, as the total number of occupied cells n varies.
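
The probabilistic model in this abstract can be explored empirically. The Python sketch below inserts n items with independent, uniformly random home cells into a table of m cells using First-Come-First-Served linear probing (an occupied cell keeps its item and the newcomer moves on) and returns every item's displacement. It is a Monte Carlo illustration under those assumptions, not the paper's asymptotic analysis; the function name `probe_displacements` is hypothetical.

```python
import random


def probe_displacements(n: int, m: int, seed: int = 0) -> list:
    """Insert n items into m cells by FCFS linear probing; return each item's displacement."""
    if not 0 <= n <= m:
        raise ValueError("need 0 <= n <= m")
    rng = random.Random(seed)
    occupied = [False] * m
    displacements = []
    for _ in range(n):
        pos = home = rng.randrange(m)  # uniformly random home cell
        while occupied[pos]:
            pos = (pos + 1) % m  # FCFS: the resident stays, the newcomer moves on
        occupied[pos] = True
        displacements.append((pos - home) % m)
    return displacements
```

For a dense table, `max(probe_displacements(m, m))` gives one sample of the maximum displacement, and `sum(d > c for d in probe_displacements(n, m))` counts the items whose displacement exceeds a critical value `c`.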

HD-Tree: An Efficient High-Dimensional Virtual Index Structure Using a Half Decomposition Strategy

Algorithms, 2020, Vol 13 (12), pp. 338. DOI: 10.3390/a13120338

Author(s): Ting Huang, Zhengping Weng, Gang Liu, Zhenwen He

Keyword(s): Hash Table, Index Structure, High Dimensional, Sequential Search, Storage Space, Hash Tables, Time Performance, Indexing Method, Point Data, Better Than

To manage multidimensional point data more efficiently, this paper presents HD-tree, an improvement of a previous indexing method called D-tree. Both structures combine quadtree-like partitioning (using integer shift operations and storing only leaves, not internal nodes) with hash tables (for locating the stored nodes). However, HD-tree follows a new decomposition strategy, called the half decomposition strategy. This improvement avoids generating nodes that contain only a small amount of data and avoids sequential search of the hash table, so it saves storage space while offering faster I/O and better time performance when building the tree and querying data. The results demonstrate convincingly that the time and space performance of HD-tree is better than that of D-tree for both uniform and non-uniform data, and that HD-tree is less affected by the data distribution.
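
The leaf-only storage shared by D-tree and HD-tree can be sketched in Python as a quadtree over a 2^bits × 2^bits integer grid whose leaves live in a dict keyed by (level, x >> shift, y >> shift). This is an assumption-laden illustration of the quadtree-plus-hash-table idea only; it does not implement the half decomposition strategy, which the abstract does not detail, and the class name, leaf capacity, and key layout are hypothetical. Locating a leaf here probes the dict one level at a time from the root, so it should not be read as reproducing HD-tree's avoidance of sequential hash-table search.

```python
class LeafHashQuadtree:
    """Hypothetical quadtree over a 2**bits x 2**bits grid that stores only leaves, in a dict."""

    def __init__(self, bits: int, leaf_capacity: int) -> None:
        self.bits = bits
        self.capacity = leaf_capacity
        self.leaves = {(0, 0, 0): []}  # (level, cx, cy) -> points in that cell

    def _key(self, level: int, x: int, y: int) -> tuple:
        shift = self.bits - level  # a cell's coordinates come from integer shifts
        return (level, x >> shift, y >> shift)

    def _in_grid(self, x: int, y: int) -> bool:
        side = 1 << self.bits
        return 0 <= x < side and 0 <= y < side

    def _find_leaf(self, x: int, y: int) -> tuple:
        # Internal nodes are not stored, so probe each level until a leaf is hit.
        for level in range(self.bits + 1):
            key = self._key(level, x, y)
            if key in self.leaves:
                return key
        raise KeyError((x, y))

    def insert(self, x: int, y: int) -> None:
        if not self._in_grid(x, y):
            raise ValueError("point outside the grid")
        key = self._find_leaf(x, y)
        self.leaves[key].append((x, y))
        self._split_if_needed(key)

    def _split_if_needed(self, key: tuple) -> None:
        level, cx, cy = key
        if len(self.leaves[key]) <= self.capacity or level == self.bits:
            return
        points = self.leaves.pop(key)
        # Child coordinates are the parent's doubled, plus 0 or 1 per axis.
        children = [(level + 1, 2 * cx + dx, 2 * cy + dy) for dx in (0, 1) for dy in (0, 1)]
        for child in children:
            self.leaves[child] = []
        for px, py in points:
            self.leaves[self._key(level + 1, px, py)].append((px, py))
        for child in children:
            self._split_if_needed(child)

    def contains(self, x: int, y: int) -> bool:
        if not self._in_grid(x, y):
            return False
        return (x, y) in self.leaves[self._find_leaf(x, y)]
```

Because the leaves always partition the grid, exactly one level yields a stored key for any in-grid point, which is what lets the structure drop internal nodes entirely.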
