HashGraph—Scalable Hash Tables Using a Sparse Graph Data Structure

In this article, we introduce HashGraph, a new scalable approach for building hash tables that uses concepts taken from sparse graph representations, hence the name HashGraph. HashGraph introduces a new way to deal with hash collisions that uses neither “open addressing” nor “separate chaining,” yet it has the benefits of both approaches. HashGraph currently works for static inputs, although recent progress with dynamic graph data structures suggests that it might be extendable to dynamic inputs as well. We show that HashGraph can deal with a large number of hash values per entry without loss of performance. Last, we present a new querying algorithm for value lookups. We experimentally compare HashGraph to several state-of-the-art implementations and find that it outperforms them by 2× on average when the inputs are unique and by as much as 40× when the input contains duplicates. The implementation of HashGraph in this article is for NVIDIA GPUs. HashGraph can build a hash table at a rate of 2.5 billion keys per second on an NVIDIA GV100 GPU and can query at nearly the same rate.
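
To make the sparse-graph analogy concrete, here is a minimal sequential Python sketch, assuming that the table is laid out like a compressed sparse row (CSR) graph: hash values act as vertices, keys act as edges, and construction is a count, a prefix sum, and a scatter. The hash function and the names `bucket_of`, `build_hash_graph`, and `count_occurrences` are hypothetical; the article's implementation runs on NVIDIA GPUs and this is not its code.

```python
from typing import List, Sequence, Tuple


def bucket_of(key: int, num_buckets: int) -> int:
    # Hypothetical hash: Knuth-style multiplicative hashing of a 32-bit key.
    return ((key * 2654435761) & 0xFFFFFFFF) % num_buckets


def build_hash_graph(keys: Sequence[int], num_buckets: int) -> Tuple[List[int], List[int]]:
    """Build a CSR-style table: buckets play the role of vertices, keys of edges."""
    # 1) Histogram: how many keys hash to each bucket (the bucket's "degree").
    counts = [0] * num_buckets
    for key in keys:
        counts[bucket_of(key, num_buckets)] += 1
    # 2) Exclusive prefix sum: offsets[b] is where bucket b's keys start.
    offsets = [0] * (num_buckets + 1)
    for b in range(num_buckets):
        offsets[b + 1] = offsets[b] + counts[b]
    # 3) Scatter each key into its bucket's contiguous range of one flat array.
    cursor = offsets[:-1]  # slicing copies, so offsets itself is left untouched
    edges = [0] * len(keys)
    for key in keys:
        b = bucket_of(key, num_buckets)
        edges[cursor[b]] = key
        cursor[b] += 1
    return offsets, edges


def count_occurrences(query: int, offsets: List[int], edges: List[int]) -> int:
    """Scan only the query's bucket; all copies of a key sit in that one range."""
    b = bucket_of(query, len(offsets) - 1)
    return sum(1 for key in edges[offsets[b]:offsets[b + 1]] if key == query)


keys = [5, 17, 5, 42, 99, 17, 5]
offsets, edges = build_hash_graph(keys, num_buckets=4)
print(count_occurrences(5, offsets, edges))  # 3
print(count_occurrences(7, offsets, edges))  # 0
```

No slot is ever probed or chained: duplicate keys simply land next to each other in their bucket's range, so counting them is a scan of one contiguous slice.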

SAHA: A String Adaptive Hash Table for Analytical Databases

Applied Sciences, Vol 10 (6), pp. 1915, 2020. DOI: 10.3390/app10061915

Author(s): Tianqi Zheng, Zhibin Zhang, Xueqi Cheng

Keyword(s): Data Structure; State Of The Art; Long Strings; Hash Table; Use Cases; Hash Tables; Modern Analytical

Hash tables are a fundamental data structure for analytical database workloads, such as aggregation, joining, set filtering and record deduplication. The performance characteristics of hash tables differ drastically depending on what kind of data is being processed and how many inserts, lookups and deletes are performed. In this paper, we address some common use cases of hash tables: aggregating and joining over arbitrary string data. We designed a new hash table, SAHA, which is tightly integrated with modern analytical databases and optimized for string data with the following advantages: (1) it inlines short strings and saves hash values for long strings only; (2) it uses special memory loading techniques to do quick dispatching and hashing computations; and (3) it utilizes vectorized processing to batch hashing operations. Our evaluation results reveal that SAHA outperforms state-of-the-art hash tables, including Google’s SwissTable and Facebook’s F14Table, by one to five times in analytical workloads. It has been merged into the ClickHouse database and shows promising results in production.
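
The first of SAHA's listed advantages, length-based dispatch that inlines short strings and saves hashes only for long ones, can be illustrated with a small Python sketch. The 8-byte inline limit, the use of `zlib.crc32` as the string hash, the linear-probing layout, and all class names are assumptions for illustration; this is not SAHA's or ClickHouse's code, and it leaves out the memory-loading and vectorization techniques.

```python
import zlib
from typing import Dict, List, Optional

INLINE_LIMIT = 8  # assumption: strings of up to 8 bytes are inlined as an integer key


class LongStringTable:
    """Linear-probing table whose slots keep [saved_hash, key, count]."""

    def __init__(self, capacity: int = 16) -> None:
        self.slots: List[Optional[list]] = [None] * capacity
        self.size = 0

    def _find(self, key: bytes, h: int) -> int:
        # Return the slot holding `key`, or the first empty slot on its probe path.
        i = h % len(self.slots)
        while True:
            slot = self.slots[i]
            # Compare the saved hash first; compare bytes only when the hashes match.
            if slot is None or (slot[0] == h and slot[1] == key):
                return i
            i = (i + 1) % len(self.slots)

    def add(self, key: bytes, h: int) -> None:
        if (self.size + 1) * 2 > len(self.slots):  # keep load factor <= 0.5
            self._grow()
        i = self._find(key, h)
        if self.slots[i] is None:
            self.slots[i] = [h, key, 1]
            self.size += 1
        else:
            self.slots[i][2] += 1

    def count(self, key: bytes, h: int) -> int:
        slot = self.slots[self._find(key, h)]
        return 0 if slot is None else slot[2]

    def _grow(self) -> None:
        # Rehash using the saved hashes; the long strings themselves are not rehashed.
        old = [s for s in self.slots if s is not None]
        self.slots = [None] * (2 * len(self.slots))
        for slot in old:
            self.slots[self._find(slot[1], slot[0])] = slot


class StringAdaptiveCounter:
    """Dispatches each string by length to an inline table or a saved-hash table."""

    def __init__(self) -> None:
        self.short_counts: Dict[int, int] = {}
        self.long_table = LongStringTable()

    @staticmethod
    def _pack(s: bytes) -> int:
        # Inline the bytes into an integer; the length in the high bits keeps
        # b"a" and b"a\x00" distinct after zero padding.
        return int.from_bytes(s.ljust(INLINE_LIMIT, b"\0"), "little") | (len(s) << (8 * INLINE_LIMIT))

    def add(self, s: bytes) -> None:
        if len(s) <= INLINE_LIMIT:
            key = self._pack(s)
            self.short_counts[key] = self.short_counts.get(key, 0) + 1
        else:
            self.long_table.add(s, zlib.crc32(s))  # hash computed once, then saved

    def count(self, s: bytes) -> int:
        if len(s) <= INLINE_LIMIT:
            return self.short_counts.get(self._pack(s), 0)
        return self.long_table.count(s, zlib.crc32(s))


counter = StringAdaptiveCounter()
for s in [b"id", b"id", b"a-much-longer-string-key", b"a", b"a-much-longer-string-key"]:
    counter.add(s)
print(counter.count(b"id"), counter.count(b"a-much-longer-string-key"))  # 2 2
```

Short keys need no stored hash because recomputing one from an integer is cheap, while the saved hash for long keys both filters most byte comparisons and makes resizing independent of string length.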

Performance Effects of Dynamic Graph Data Structures in Community Detection Algorithms

2018 IEEE High Performance extreme Computing Conference (HPEC), 2018. DOI: 10.1109/hpec.2018.8547528

Author(s): Rohit Varkey Thankachan, Brian P. Swenson, James P. Fairbanks

Keyword(s): Community Detection; Data Structures; Dynamic Graph; Graph Data; Detection Algorithms; Performance Effects

MP-RW-LSH

Proceedings of the VLDB Endowment, Vol 14 (13), pp. 3267-3280, 2021. DOI: 10.14778/3484224.3484226

Author(s): Huayi Wang, Jingfan Meng, Long Gong, Jun Xu, Mitsunori Ogihara

Keyword(s): Nearest Neighbor; Edit Distance; State Of The Art; Hash Table; Nearest Neighbor Search; Locality Sensitive Hashing; Algorithmic Problem; Use Case; Hash Tables; Neighbor Search

Approximate Nearest Neighbor Search (ANNS) is a fundamental algorithmic problem, with numerous applications in many areas of computer science. Locality-Sensitive Hashing (LSH) is one of the most popular solution approaches for ANNS. A common shortcoming of many LSH schemes is that, since they probe only a single bucket in a hash table, they need to use a large number of hash tables to achieve high query accuracy. For ANNS-L2, a multi-probe scheme was proposed to overcome this drawback by strategically probing multiple buckets in a hash table. In this work, we propose MP-RW-LSH, the first and so far only multi-probe LSH solution to ANNS in L1 distance, and show that it achieves a better tradeoff between scalability and query efficiency than all existing LSH-based solutions. We also explain why a state-of-the-art ANNS-L1 solution called Cauchy projection LSH (CP-LSH) is fundamentally not suitable for multi-probe extension. Finally, as a use case, we construct, using MP-RW-LSH as the underlying "ANNS-L1 engine", a new ANNS-E (E for edit distance) solution that beats the state of the art.
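
The difference between single-probe and multi-probe LSH can be sketched generically in Python. This is not MP-RW-LSH: as a stand-in, the sketch assumes a simple L1 hash family in which each of `k` hash functions samples one coordinate and applies a randomly shifted grid of width `w`, and it probes the home bucket first, then the buckets whose hash vector differs by ±1 in a single position. The class name, its methods, and its parameters are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Optional, Sequence, Tuple


class MultiProbeGridLSH:
    """One hash table keyed by a vector of k shifted-grid cells (an L1 LSH family)."""

    def __init__(self, dim: int, k: int, w: float, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.coords = [rng.randrange(dim) for _ in range(k)]  # sampled coordinates
        self.shifts = [rng.uniform(0, w) for _ in range(k)]   # random grid offsets
        self.w = w
        self.points: List[List[float]] = []
        self.table: Dict[Tuple[int, ...], List[int]] = defaultdict(list)

    def _key(self, x: Sequence[float]) -> Tuple[int, ...]:
        return tuple(math.floor((x[j] + b) / self.w) for j, b in zip(self.coords, self.shifts))

    def insert(self, x: Sequence[float]) -> None:
        self.points.append(list(x))
        self.table[self._key(x)].append(len(self.points) - 1)

    def probe_keys(self, x: Sequence[float]) -> Iterator[Tuple[int, ...]]:
        home = self._key(x)
        yield home                  # single-probe LSH would stop here
        for i in range(len(home)):
            for delta in (-1, 1):   # neighboring grid cell in one hash position
                neighbor = list(home)
                neighbor[i] += delta
                yield tuple(neighbor)

    def query(self, x: Sequence[float], num_probes: int) -> Tuple[Optional[int], float]:
        """Return (index, L1 distance) of the closest point in the probed buckets."""
        best, best_dist = None, math.inf
        for n, key in enumerate(self.probe_keys(x)):
            if n >= num_probes:
                break
            for idx in self.table.get(key, ()):  # .get does not create empty buckets
                dist = sum(abs(a - b) for a, b in zip(self.points[idx], x))
                if dist < best_dist:
                    best, best_dist = idx, dist
        return best, best_dist


lsh = MultiProbeGridLSH(dim=4, k=3, w=4.0, seed=0)
for p in ([0, 0, 0, 0], [10, 10, 10, 10], [20, 0, 20, 0]):
    lsh.insert(p)
print(lsh.query([10, 10, 10, 10], num_probes=7))  # (1, 0)
```

Published multi-probe schemes order the probes by how likely each bucket is to hold near neighbors; the fixed order here only shows how extra probes into one table can stand in for additional tables.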

Comparison on Search Failure between Hash Tables and a Functional Bloom Filter

Applied Sciences, Vol 10 (15), pp. 5218, 2020. DOI: 10.3390/app10155218. Cited by ~1

Author(s): Hayoung Byun, Hyesook Lim

Keyword(s): Failure Rate; Data Structures; Hash Table; Bloom Filter; Load Factor; Failure Rates; Hash Tables; Collision Problem; Simulation Results; Return Value

Hash-based data structures have been widely used in many applications. An intrinsic problem of hashing is collision, in which two or more elements are hashed to the same value. The more heavily a hash table is loaded, the more collisions occur. Elements that cannot be stored in a hash table because of collisions cause search failures. Many variant structures have been studied to reduce the number of collisions, but none of them completely solves the collision problem. In this paper, we claim that a functional Bloom filter (FBF) provides a lower search failure rate than hash tables when a hash table is heavily loaded. In other words, a hash table can be replaced with an FBF, because the FBF achieves a lower search failure rate when a large amount of data must be stored in a limited amount of memory. While hash tables must store each input key in addition to its return value, a functional Bloom filter stores return values without input keys, because the combination of indexes derived from each input key can be used to identify that key. We theoretically compare the search failure rates of the FBF and of hash-based data structures such as the multi-hash table, the cuckoo hash table, and the d-left hash table. We also provide simulation results that validate our theoretical results. The simulation results show that the search failure rates of hash tables are higher than that of the functional Bloom filter when the load factor is larger than 0.6.
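
A functional Bloom filter can be sketched in a few lines of Python, assuming a common cell encoding in which each cell is empty, holds one return value, or is marked as a conflict; a query then answers with a value, with "not stored", or with "indeterminable". The SHA-256-based index derivation, the `EMPTY`/`CONFLICT` constants, and the class name are illustrative assumptions, not the paper's exact design.

```python
import hashlib
from typing import List, Optional

EMPTY = 0      # assumption: return values are positive integers
CONFLICT = -1  # marks a cell written by keys with different return values


class FunctionalBloomFilter:
    """Stores only return values; the keys themselves are never stored."""

    def __init__(self, num_cells: int, num_hashes: int) -> None:
        assert 1 <= num_hashes <= 8, "one SHA-256 digest yields at most eight 32-bit indexes"
        self.cells: List[int] = [EMPTY] * num_cells
        self.k = num_hashes

    def _indexes(self, key: str) -> List[int]:
        # Hypothetical index derivation: split one SHA-256 digest into k 32-bit words.
        digest = hashlib.sha256(key.encode()).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "little") % len(self.cells)
                for i in range(self.k)]

    def insert(self, key: str, value: int) -> None:
        assert value > 0
        for i in self._indexes(key):
            if self.cells[i] == EMPTY:
                self.cells[i] = value
            elif self.cells[i] != value:
                self.cells[i] = CONFLICT

    def query(self, key: str) -> Optional[int]:
        """Return the value, None for 'not stored', or CONFLICT for 'indeterminable'."""
        values = [self.cells[i] for i in self._indexes(key)]
        if EMPTY in values:
            return None  # an inserted key never leaves one of its cells empty
        known = {v for v in values if v != CONFLICT}
        if len(known) > 1:
            return None  # an inserted key's non-conflict cells always agree
        return known.pop() if known else CONFLICT


fbf = FunctionalBloomFilter(num_cells=1024, num_hashes=3)
items = {"alpha": 1, "beta": 2, "gamma": 3}
for key, value in items.items():
    fbf.insert(key, value)
print(fbf.query("alpha"))
```

Because keys are never stored, a key that was never inserted can still return a value if all of its cells happen to agree; that trade-off is what lets the structure spend all of its memory on return values.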

Recent Progress and State of the Art in Seismo-Electromagnetic Study

IEEJ Transactions on Fundamentals and Materials, Vol 130 (1), pp. 2-5, 2010. DOI: 10.1541/ieejfms.130.2

Author(s): Katsumi Hattori

Keyword(s): Recent Progress; State Of The Art

Fast lightweight accurate xenograft sorting

Algorithms for Molecular Biology, Vol 16 (1), 2021. DOI: 10.1186/s13015-021-00181-w

Author(s): Jens Zentgraf, Sven Rahmann

Keyword(s): State Of The Art; Hash Table; Human Tumor; Surrounding Tissue; CPU Time; Alignment Free; Time Usage; The One; Similar Accuracy; Software Prefetching

Motivation: With an increasing number of patient-derived xenograft (PDX) models being created and subsequently sequenced to study tumor heterogeneity and to guide therapy decisions, there is a similarly increasing need for methods to separate reads originating from the graft (human) tumor from reads originating from the host species’ (mouse) surrounding tissue. Two kinds of methods are in use. On the one hand, alignment-based tools require that reads first be mapped and aligned (by an external mapper/aligner) to the host and graft genomes separately; the tool itself then processes the resulting alignments and quality metrics (typically BAM files) to assign each read or read pair. On the other hand, alignment-free tools work directly on the raw read data (typically FASTQ files). Recent studies compare different approaches and tools, with varying results.

Results: We show that alignment-free methods for xenograft sorting are superior in CPU time usage and equivalent in accuracy. We improve upon state-of-the-art sorting by presenting a fast lightweight approach based on three-way bucketed quotiented Cuckoo hashing. Our hash table requires memory comparable to an FM index typically used for read alignment, and less than other alignment-free approaches. It allows extremely fast lookups and uses less CPU time than other alignment-free methods and alignment-based methods at similar accuracy. Several engineering steps (e.g., shortcuts for unsuccessful lookups, software prefetching) improve the performance even further.

Availability: Our software xengsort is available under the MIT license at http://gitlab.com/genomeinformatics/xengsort. It is written in numba-compiled Python and comes with sample Snakemake workflows for hash table construction and dataset processing.
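
Three-way bucketed quotiented Cuckoo hashing can be sketched in Python as follows. Assumptions, all hypothetical: keys are 32-bit integers (for example, encoded k-mers); each of the three choices uses an invertible multiply-by-odd-constant hash; the high bits of the hashed key select a bucket and only the remaining low bits (the quotient) are stored, together with the choice number so the key can be reconstructed; and full buckets are handled with a random-walk eviction. This is not xengsort's code.

```python
import random
from typing import List, Optional, Tuple

KEY_BITS = 32
MASK = (1 << KEY_BITS) - 1
# Hypothetical odd multipliers: multiplying by an odd number is a bijection mod 2**32.
MULTIPLIERS = (0x9E3779B1, 0x85EBCA6B, 0xC2B2AE35)
INVERSES = tuple(pow(a, -1, 1 << KEY_BITS) for a in MULTIPLIERS)  # needs Python 3.8+


class QuotientedCuckooTable:
    """Three hash choices; buckets of `bucket_size` slots holding (quotient, choice, value)."""

    def __init__(self, num_buckets: int, bucket_size: int = 4, max_walk: int = 500, seed: int = 1) -> None:
        assert num_buckets > 0 and num_buckets & (num_buckets - 1) == 0, "power of two"
        self.shift = KEY_BITS - (num_buckets.bit_length() - 1)  # bits kept as the quotient
        self.bucket_size = bucket_size
        self.max_walk = max_walk
        self.rng = random.Random(seed)
        self.buckets: List[List[Tuple[int, int, int]]] = [[] for _ in range(num_buckets)]

    def _locate(self, key: int, choice: int) -> Tuple[int, int]:
        h = (key * MULTIPLIERS[choice]) & MASK
        return h >> self.shift, h & ((1 << self.shift) - 1)  # (bucket, quotient)

    def _restore(self, bucket: int, quotient: int, choice: int) -> int:
        h = (bucket << self.shift) | quotient
        return (h * INVERSES[choice]) & MASK  # undo the bijection to recover the key

    def lookup(self, key: int) -> Optional[int]:
        for choice in range(3):
            b, q = self._locate(key, choice)
            for sq, sc, value in self.buckets[b]:
                if sq == q and sc == choice:
                    return value
        return None

    def insert(self, key: int, value: int) -> bool:
        """Return False if the random walk gives up (the entry in hand is then dropped)."""
        assert 0 <= key <= MASK
        for choice in range(3):  # update an existing entry in place
            b, q = self._locate(key, choice)
            for j, (sq, sc, _) in enumerate(self.buckets[b]):
                if sq == q and sc == choice:
                    self.buckets[b][j] = (q, choice, value)
                    return True
        for _ in range(self.max_walk):
            for choice in range(3):  # use free space in any candidate bucket
                b, q = self._locate(key, choice)
                if len(self.buckets[b]) < self.bucket_size:
                    self.buckets[b].append((q, choice, value))
                    return True
            # All three buckets are full: evict a random entry and re-home it next.
            choice = self.rng.randrange(3)
            b, q = self._locate(key, choice)
            j = self.rng.randrange(self.bucket_size)
            old_q, old_choice, old_value = self.buckets[b][j]
            self.buckets[b][j] = (q, choice, value)
            key, value = self._restore(b, old_q, old_choice), old_value
        return False


keys = random.Random(7).sample(range(1 << KEY_BITS), 100)
table = QuotientedCuckooTable(num_buckets=64)
ok = all(table.insert(k, 1 + (i % 2)) for i, k in enumerate(keys))  # hypothetical class labels
print(ok)
```

Storing `(quotient, choice)` instead of the full key saves log2(num_buckets) bits per entry while staying exact: for a fixed choice, the bucket index and the quotient together determine the hashed value, and the invertible hash then determines the key.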

Solid-state nanopore systems: from materials to applications

NPG Asia Materials, Vol 13 (1), 2021. DOI: 10.1038/s41427-021-00313-z

Author(s): Yuhui He, Makusu Tsutsui, Yue Zhou, Xiang-Shui Miao

Keyword(s): Solid State; Recent Progress; State Of The Art; Ion Selectivity; DNA Origami; Fundamental Interest; Reverse Electrodialysis; Promising Application; Flow Through; Wall Materials

Ion transport and hydrodynamic flow through nanometer-sized channels (nanopores) have been increasingly studied, owing not only to fundamental interest in the many novel phenomena that have been observed but also to their promising applications in innovative nanodevices, including next-generation sequencers, nanopower generators, and memristive synapses. We first review various kinds of materials and the associated state-of-the-art processes developed for fabricating nanoscale pores, including the emerging structures of DNA origami and two-dimensional nanopores. Then, we examine the unique transport phenomena in which the surface properties of wall materials play predominant roles in inducing intriguing characteristics, such as ion selectivity and reverse electrodialysis. Finally, we highlight recent progress in the potential applications of nanopores, ranging from their use in biosensors to nanopore-based artificial synapses.

Recent progress and challenges in microbial polyhydroxybutyrate (PHB) production from CO2 as a sustainable feedstock: A state-of-the-art review

Bioresource Technology, pp. 125616, 2021. DOI: 10.1016/j.biortech.2021.125616

Author(s): Jiye Lee, Hyun June Park, Myounghoon Moon, Jin-Suk Lee, Kyoungseon Min

Keyword(s): Recent Progress; State Of The Art; PHB Production

Persistent memory hash indexes

Proceedings of the VLDB Endowment, Vol 14 (5), pp. 785-798, 2021. DOI: 10.14778/3446095.3446101

Author(s): Daokun Hu, Zhiwen Chen, Jianbing Wu, Jianhua Sun, Hao Chen

Keyword(s): Future Development; High Performance; Performance Metrics; Comprehensive Evaluation; State Of The Art; Hash Tables; Trade Offs; Depth Analysis; Persistent Memory; Memory Modules

Persistent memory (PM) is increasingly being leveraged to build hash-based indexing structures featuring cheap persistence, high performance, and instant recovery, especially with the recent release of Intel Optane DC Persistent Memory Modules. However, most of these structures are evaluated on DRAM-based emulators under unrealistic assumptions, or their evaluations focus on specific metrics and sidestep important properties. Thus, it is essential to understand how well the proposed hash indexes perform on real PM and how they differ from one another when a wider range of performance metrics is considered. To this end, this paper provides a comprehensive evaluation of persistent hash tables. In particular, we evaluate six state-of-the-art hash tables (Level hashing, CCEH, Dash, PCLHT, Clevel, and SOFT) on real PM hardware. Our evaluation was conducted using a unified benchmarking framework and representative workloads. Besides characterizing common performance properties, we also explore how hardware configurations (such as PM bandwidth, CPU instructions, and NUMA) affect the performance of PM-based hash tables. With our in-depth analysis, we identify design trade-offs and good paradigms in prior work, and suggest desirable optimizations and directions for the future development of PM-based hash tables.

Mobile Blended Learning with eSquirrel

International Conference on Open & Distance Education, Vol 8 (4B), 2015. DOI: 10.12681/icodl2015.62

Author(s): Michael Maurer

Keyword(s): Blended Learning; Mobile Learning; Recent Progress; State Of The Art; The State; Educational Institutions; Classroom Teaching; Learning Platform; Start Up; Innovative Solution

This article outlines the state of the art of mobile blended learning apps. It describes recent progress in this area and explains the potential of mobile blended learning for schools and educational institutions. Furthermore, it presents an innovative solution, eSquirrel, developed by an Austrian interdisciplinary start-up. eSquirrel is a blended learning platform that combines mobile learning with gamification. It blends the concepts of classroom teaching, eLearning and learning from books into a native Android and iOS course app, and enables teachers to track their students’ progress.