hash tables
Recently Published Documents


TOTAL DOCUMENTS: 351 (five years: 60)
H-INDEX: 22 (five years: 2)

2021 · Vol 2021 · pp. 1-15
Author(s): Shaobo Wang, Yujia Liu

This study aims to obtain the spatial position of microseismic focal points quickly and to increase the accuracy of rapid microseismic positioning, so that timely countermeasures can be taken. A microseismic focal-point location system that differs completely from traditional microseismic location methods is proposed. Search engine technology is introduced into the system, allowing it to locate microseismic focal points quickly and accurately. First, the propagation characteristics of microseismic signals in coal and rock layers are analyzed and the focal position information is obtained. Because the collected coal-mine microseismic signals contain noise, they are denoised first, and a waveform database is then built from the denoised waveform data and the focal positions. The structure and mathematical model of locality-sensitive hashing (LSH) based on p-stable distributions are introduced and improved, yielding the optimized multi-probe LSH algorithm. A microseismic location model is established according to the characteristics of the microseismic data, and the values of three parameters are determined: the number of hash tables, the dimension of the hash function family, and the interval size. Analysis of the experimental data shows that when the number of hash tables is 6, the hash-function-family dimension k is 14, and the interval size W is 8000, the retrieval time reaches a relatively small value while the recall rate and the proportion of retrieved candidates remain large. For the measured coal-mine microseismic data, the corresponding analysis yields 4 hash tables, k = 6, and W = 500. These results are of great significance for the evaluation of destructive mine earthquakes and impact (rock-burst) risk.
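The p-stable LSH machinery the abstract relies on is compact enough to sketch. The following minimal Python illustration (our sketch, not the authors' system) assumes Euclidean feature vectors and Gaussian (2-stable) projections, and exposes exactly the three tuned parameters named above: the number of tables, the family dimension k, and the interval size W, with h(v) = floor((a·v + b) / W).

```python
import numpy as np

class PStableLSH:
    """Minimal p-stable LSH index: h(v) = floor((a . v + b) / W)."""

    def __init__(self, dim, num_tables=6, k=14, W=8000.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W
        # One (k x dim) Gaussian projection matrix and one offset vector per table.
        self.A = rng.standard_normal((num_tables, k, dim))
        self.b = rng.uniform(0.0, W, size=(num_tables, k))
        self.tables = [dict() for _ in range(num_tables)]

    def _keys(self, v):
        # k hash values per table, concatenated into a single bucket key.
        return [tuple(np.floor((A_t @ v + b_t) / self.W).astype(int))
                for A_t, b_t in zip(self.A, self.b)]

    def insert(self, v, label):
        for table, key in zip(self.tables, self._keys(v)):
            table.setdefault(key, []).append(label)

    def query(self, v):
        candidates = set()
        for table, key in zip(self.tables, self._keys(v)):
            candidates.update(table.get(key, []))
        return candidates

# Toy usage: identical vectors land in the same buckets.
idx = PStableLSH(dim=128)
v = np.random.default_rng(1).standard_normal(128)
idx.insert(v, "event-42")
print(idx.query(v))  # {'event-42'}
```

In the paper's setting, each denoised waveform would be reduced to a feature vector and inserted with its known focal position as the label; a query waveform then retrieves candidate positions from the buckets it hashes to.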


Author(s): Andrii Dashkevych

The paper presents an approach to spatial processing problems on sets of points in the plane. The method consists of constructing regions of arbitrary geometric shape around the given points of the set on a regular grid and determining the intersections of those regions, using spatial hash tables to improve the efficiency of the operations. The proposed approach is implemented as software that determines the spatial relationships between points as a sequence of operations on discretized sets and allows visualization of the research results. Figs.: 2. Refs.: 13. Keywords: spatial processing task; point set; plane; regular grid; spatial hash table.
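As a rough sketch of this idea (circular regions chosen for concreteness; the paper allows arbitrary shapes), points can be bucketed by grid cell in a hash table so that intersection queries only examine nearby cells rather than the whole set:

```python
from collections import defaultdict

def build_spatial_hash(points, cell_size):
    """Bucket 2-D points by the regular-grid cell that contains them."""
    grid = defaultdict(list)
    for p in points:
        cell = (int(p[0] // cell_size), int(p[1] // cell_size))
        grid[cell].append(p)
    return grid

def points_within(grid, cell_size, center, radius):
    """Collect points within `radius` of `center`, scanning only nearby cells."""
    cx, cy = int(center[0] // cell_size), int(center[1] // cell_size)
    reach = int(radius // cell_size) + 1
    found = []
    for ix in range(cx - reach, cx + reach + 1):
        for iy in range(cy - reach, cy + reach + 1):
            for (x, y) in grid.get((ix, iy), []):
                if (x - center[0]) ** 2 + (y - center[1]) ** 2 <= radius ** 2:
                    found.append((x, y))
    return found

grid = build_spatial_hash([(0.5, 0.5), (3.2, 1.1), (9.9, 9.9)], cell_size=1.0)
print(points_within(grid, 1.0, center=(0.0, 0.0), radius=2.0))  # [(0.5, 0.5)]
```

The expected query cost then depends on the number of points in the visited cells rather than on the size of the whole set.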


Cryptographic hash functions transform a message of arbitrary length into fixed-length data and are used to verify the integrity of that data. Digital forensic tools extract information from storage devices such as hard drives and memory. The SHA-1 and SHA-2 methods are both widely used on forensic image archives: hashing is typically applied during evidence processing, when checking forensic images (duplicate evidence), and again at the completion of the analysis to ensure data integrity and support the forensic evaluation of the evidence. Hashing algorithms are, however, vulnerable to collisions, in which two independent messages produce the same hash value. While SHA-3 is more secure than its predecessors, it is slow on general-purpose processors and not yet widely adopted. This work proposes a simple yet effective framework that meets the needs of cyber forensics by combining hash functions with other cryptographic concepts, for instance a salt: the modified secured hash algorithm (MSHA). A salt is applied to the hashing mechanism to make each hash unique, to expand its complexity, and to mitigate attacks such as precomputed hash-table (rainbow-table) attacks, without increasing the requirements on users.
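The abstract does not spell out MSHA's construction, so the following is only a generic illustration of salting, built from Python's standard library with SHA-256 standing in for the hash function:

```python
import hashlib
import hmac
import os

def salted_hash(message, salt=None):
    """Hash a message with a random per-message salt (SHA-256 here)."""
    if salt is None:
        salt = os.urandom(16)  # fresh 128-bit salt for each message
    digest = hashlib.sha256(salt + message).hexdigest()
    return salt, digest

def verify(message, salt, expected):
    """Recompute with the stored salt; compare in constant time."""
    return hmac.compare_digest(hashlib.sha256(salt + message).hexdigest(),
                               expected)

salt, digest = salted_hash(b"evidence-image-001")
assert verify(b"evidence-image-001", salt, digest)
```

Because each message gets a fresh random salt, identical inputs produce different digests, which is what defeats precomputed hash-table attacks.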


2021 · Vol 15 (1) · pp. 112-126
Author(s): Subarna Chatterjee, Meena Jagadeesan, Wilson Qin, Stratos Idreos

We present a self-designing key-value storage engine, Cosine, which can always take the shape of a close-to-"perfect" engine architecture given an input workload, a cloud budget, a target performance, and required cloud SLAs. By identifying and formalizing the first principles of storage engine layouts and core key-value algorithms, Cosine constructs a massive design space comprising sextillion (10^36) possible storage engine designs over a diverse space of hardware and cloud pricing policies for three cloud providers: AWS, GCP, and Azure. Cosine spans diverse designs such as Log-Structured Merge-trees, B-trees, Log-Structured Hash-tables, and in-memory accelerators for filters and indexes, as well as trillions of hybrid designs that do not appear in the literature or industry but emerge as valid combinations of the above. Cosine includes a unified distribution-aware I/O model and a learned concurrency-aware CPU model that can calculate, with high accuracy, the performance and cloud cost of any possible design on any workload and virtual machine. Cosine can then search through that space in a matter of seconds to find the best design, and it materializes the actual code of the resulting storage engine design using a templated Rust implementation. We demonstrate that on average Cosine outperforms state-of-the-art storage engines such as write-optimized RocksDB, read-optimized WiredTiger, and very write-optimized FASTER by 53x, 25x, and 20x, respectively, for diverse workloads, data sizes, and cloud budgets across all YCSB core workloads and many variants.
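Cosine's actual search is far more sophisticated, but the basic shape of the loop (enumerate candidate designs, price each with the cost model, keep the cheapest one that meets the SLA) can be sketched as follows; every name here is an illustrative assumption, not Cosine's API:

```python
def best_design(candidates, workload, vm_options, budget, sla_latency_ms,
                cost_model):
    """Pick the cheapest (design, VM) pair that meets the SLA within budget.

    `cost_model(design, workload, vm)` is assumed to return a
    (latency_ms, dollars_per_hour) pair, standing in conceptually for
    Cosine's distribution-aware I/O and concurrency-aware CPU models.
    """
    best = None
    for design in candidates:
        for vm in vm_options:
            latency_ms, dollars = cost_model(design, workload, vm)
            if latency_ms <= sla_latency_ms and dollars <= budget:
                if best is None or dollars < best[2]:
                    best = (design, vm, dollars)
    return best

# Toy usage with made-up designs, VMs, and cost model.
designs = ["lsm_bloom", "btree", "lsh_table"]
vms = [("i3.xlarge", 0.31), ("i3.2xlarge", 0.62)]
toy_model = lambda d, w, vm: (len(d) / vm[1], vm[1])
print(best_design(designs, None, vms, budget=1.0, sla_latency_ms=50.0,
                  cost_model=toy_model))
```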


2021 · Vol 14 (13) · pp. 3267-3280
Author(s): Huayi Wang, Jingfan Meng, Long Gong, Jun Xu, Mitsunori Ogihara

Approximate Nearest Neighbor Search (ANNS) is a fundamental algorithmic problem, with numerous applications in many areas of computer science. Locality-Sensitive Hashing (LSH) is one of the most popular solution approaches for ANNS. A common shortcoming of many LSH schemes is that, since they probe only a single bucket in a hash table, they need to use a large number of hash tables to achieve high query accuracy. For ANNS-L2, a multi-probe scheme was proposed to overcome this drawback by strategically probing multiple buckets in a hash table. In this work, we propose MP-RW-LSH, the first and so far only multi-probe LSH solution to ANNS in L1 distance, and show that it achieves a better tradeoff between scalability and query efficiency than all existing LSH-based solutions. We also explain why a state-of-the-art ANNS-L1 solution called Cauchy projection LSH (CP-LSH) is fundamentally not suitable for multi-probe extension. Finally, as a use case, we construct, using MP-RW-LSH as the underlying "ANNS-L1 engine", a new ANNS-E (E for edit distance) solution that beats the state of the art.
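The multi-probe idea can be sketched generically: rather than probing only the query's home bucket in each table, the index also probes buckets whose k-dimensional keys differ by small perturbations. The sketch below is illustrative only, not MP-RW-LSH itself, and simply enumerates single-coordinate +/-1 perturbations instead of ordering probes by estimated success probability as real schemes do:

```python
def probe_sequence(key, extra_probes):
    """Yield the home bucket key, then keys perturbed by +/-1 in one coordinate.

    A production multi-probe scheme orders perturbations by how likely each
    perturbed bucket is to hold the true neighbor; this sketch just walks
    the nearest perturbations in a fixed order.
    """
    yield key
    emitted = 0
    for i in range(len(key)):
        for delta in (-1, 1):
            if emitted >= extra_probes:
                return
            perturbed = list(key)
            perturbed[i] += delta
            yield tuple(perturbed)
            emitted += 1

# Probing 5 buckets per table instead of 1 lets an index reach the same
# recall with far fewer hash tables.
for bucket in probe_sequence((3, -1, 7), extra_probes=4):
    print(bucket)
```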


2021 · Vol 28 (2) · pp. 25-38
Author(s): Fábio Carlos Moreno, Cinthyan Sachs C. de Barbosa, Edio Roberto Manfio

This paper deals with the construction of digital lexicons within the scope of Natural Language Processing. Data structures called hash tables have been shown to produce good results for natural language interfaces to databases, offering data dispersion, response speed, and programming simplicity as their main features. The desired information is stored by associating each entry with a key through hash functions, which are responsible for distributing the information across the table. The objective of this paper is to present a tool called Visual TaHs that applies a sparse table to a real lexicon (Lexicon of Herbs), improving the performance results of several implemented hash functions. The structure has achieved satisfactory results in terms of speed and storage when compared to conventional databases and can work on various platforms, such as desktop, Web, and mobile.
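As a minimal illustration of the mechanism described above (our sketch; the paper's actual hash functions and table layout may differ), a lexicon entry can be dispersed into a chained hash table by a polynomial string hash:

```python
def polynomial_hash(word, table_size, base=131):
    """Classic polynomial rolling hash over the word's characters."""
    h = 0
    for ch in word:
        h = (h * base + ord(ch)) % table_size
    return h

class ChainedLexicon:
    """Hash table with separate chaining, mapping headwords to definitions."""

    def __init__(self, size=1024):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def insert(self, word, entry):
        self.buckets[polynomial_hash(word, self.size)].append((word, entry))

    def lookup(self, word):
        for w, entry in self.buckets[polynomial_hash(word, self.size)]:
            if w == word:
                return entry
        return None

lex = ChainedLexicon()
lex.insert("chamomile", "herb used in infusions")
print(lex.lookup("chamomile"))
```

How evenly the chosen hash function disperses headwords across buckets is precisely what determines the lookup speed such a lexicon achieves.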


2021 · Vol 50 (1) · pp. 59-59
Author(s): Marcin Zukowski

Hash tables are possibly the single most researched element of the database query processing layers. There are many good reasons for that. They are critical for some key operations like joins and aggregation, and as such are one of the largest contributors to the overall query performance. Their efficiency is heavily impacted by variations of workloads, hardware and implementation, leading to many research opportunities. At the same time, they are sufficiently small and local in scope, allowing a starting researcher, or even a student, to understand them and contribute novel ideas. And benchmark them... Oh, the benchmarks... :)


2021 · Vol 50 (1) · pp. 87-94
Author(s): Baotong Lu, Xiangpeng Hao, Tianzheng Wang, Eric Lo

Byte-addressable persistent memory (PM) brings hash tables the potential of low latency, cheap persistence, and instant recovery. The recent advent of Intel Optane DC Persistent Memory Modules (DCPMM) further accelerates this trend. Many new hash table designs have been proposed, but most of them were based on emulation and perform sub-optimally on real PM. They were also piecewise, partial solutions that side-stepped many important properties, in particular good scalability, high load factor, and instant recovery.


2021 · Vol 50 (1) · pp. 60-67
Author(s): Tim Gubner, Viktor Leis, Peter Boncz

Modern query engines rely heavily on hash tables for query processing. Overall query performance and memory footprint are often determined by how hash tables, and the tuples within them, are represented. In this work, we propose three complementary techniques to improve this representation: Domain-Guided Prefix Suppression bit-packs keys and values tightly to reduce hash table record width. Optimistic Splitting decomposes values (and operations on them) into (operations on) frequently- and infrequently-accessed value slices; by removing the infrequently-accessed value slices from the hash table record, it improves cache locality. The Unique Strings Self-aligned Region (USSR) accelerates the handling of frequently occurring strings, which are widespread in real-world data sets, by creating an on-the-fly dictionary of the most frequent strings; this allows executing many string operations with integer logic and reduces memory pressure. We integrated these techniques into Vectorwise. On the TPC-H benchmark, our approach reduces peak memory consumption by 2-4× and improves performance by up to 1.5×. On a real-world BI workload, we measured a 2× improvement in performance, and in micro-benchmarks we observed speedups of up to 25×.
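To make the first technique concrete, domain-guided prefix suppression amounts to storing each column relative to its domain minimum in just enough bits. A simplified sketch of the idea follows (our illustration, not the Vectorwise implementation):

```python
def pack_record(values, domains):
    """Bit-pack values given per-column (lo, hi) domains.

    Once the common prefix implied by `lo` is suppressed, each column
    needs only enough bits to represent (hi - lo).
    """
    packed, shift = 0, 0
    for value, (lo, hi) in zip(values, domains):
        width = max(1, (hi - lo).bit_length())
        packed |= (value - lo) << shift
        shift += width
    return packed, shift  # payload and total bits used

domains = [(1990, 2030), (0, 99), (100, 955)]  # e.g., year, quantity, price
payload, bits = pack_record([2021, 42, 500], domains)
print(bits)  # 23 bits instead of three 64-bit slots
```

Narrower records mean more hash table entries per cache line, which is where the memory and performance savings come from.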

