Locality-Sensitive Hashing for Information Retrieval System on Multiple GPGPU Devices

2020 ◽  
Vol 10 (7) ◽  
pp. 2539 ◽  
Author(s):  
Toan Nguyen Mau ◽  
Yasushi Inoguchi

Building a real-time information retrieval system is challenging, especially for high-dimensional big data. To structure big data, many hashing algorithms have been proposed that map similar data items to the same bucket in order to speed up search. Locality-Sensitive Hashing (LSH) is a common approach for reducing the number of dimensions of a data set by using a family of hash functions and a hash table. The LSH hash table is an additional component that indexes the hash values (keys) of the corresponding data items. We previously proposed the Dynamic Locality-Sensitive Hashing (DLSH) algorithm, which uses a dynamically structured hash table optimized for storage in both main memory and General-Purpose computation on Graphics Processing Units (GPGPU) memory. This supports the handling of constantly updated data sets, such as song, image, or text databases. The DLSH algorithm works effectively with data sets that are updated frequently and is compatible with parallel processing. However, a single GPGPU device is inadequate for processing big data because of the small memory capacity of GPGPU devices. When multiple GPGPU devices are used for searching, an effective search algorithm is needed to balance the jobs among them. In this paper, we propose an extension of DLSH for big data sets using multiple GPGPUs, in order to increase the capacity and performance of the information retrieval system. We also propose different search strategies on multiple DLSH clusters to suit our parallelized system. With significant results in terms of performance and accuracy, we show that DLSH can be applied to real-life dynamic database systems.
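The idea of a hash family plus a hash table, as described in the abstract above, can be illustrated with a minimal sketch. It assumes a generic random-hyperplane LSH scheme, which is a standard textbook variant and not the authors' DLSH algorithm; the class name `HyperplaneLSH` and its parameters (`dim`, `num_bits`, `seed`) are hypothetical and chosen only for illustration. It runs on the CPU in plain Python and does not model GPGPU memory or multiple devices.

```python
import random
from collections import defaultdict


class HyperplaneLSH:
    """Toy random-hyperplane LSH index (generic LSH, NOT the paper's DLSH)."""

    def __init__(self, dim, num_bits=8, seed=0):
        rng = random.Random(seed)
        # The hash family: each hash function is the sign of a dot product
        # with one random Gaussian vector (one bit per function).
        self.planes = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)
        ]
        # The hash table: key (bucket id) -> list of item ids in that bucket.
        self.table = defaultdict(list)

    def key(self, vec):
        """Concatenate the sign bits into a single integer bucket key."""
        bits = 0
        for plane in self.planes:
            dot = sum(p * v for p, v in zip(plane, vec))
            bits = (bits << 1) | (1 if dot >= 0 else 0)
        return bits

    def insert(self, item_id, vec):
        """Index an item under the bucket its vector hashes to."""
        self.table[self.key(vec)].append(item_id)

    def query(self, vec):
        """Return the candidate ids stored in the query's bucket."""
        return list(self.table.get(self.key(vec), []))


# Usage: vectors pointing in the same direction share a bucket, so the
# query only has to inspect that bucket instead of the whole data set.
index = HyperplaneLSH(dim=4)
index.insert("a", [1.0, 2.0, 3.0, 4.0])
candidates = index.query([2.0, 4.0, 6.0, 8.0])
```

In a sketch like this, inserting a new item only appends to one bucket, which hints at why hash-table indexes suit frequently updated data sets; how DLSH structures its table dynamically and distributes it across devices is described in the paper itself and is not reproduced here.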

2019 ◽  
Vol 151 ◽  
pp. 1108-1113 ◽  
Author(s):  
Youssef CHOUNI ◽  
Mohamed ERRITALI ◽  
Youness MADANI ◽  
Hanane EZZIKOURI

Author(s):  
Lise Kim ◽  
Esma Yahia ◽  
Frédéric Segonds ◽  
Philippe Veron ◽  
Victor Fau

Manufacturing industry data are distributed, heterogeneous and numerous, which raises several challenges, including the fast, exhaustive and relevant querying of data. To address this challenge, the authors consider an information retrieval system based on a graph database. In this paper, the authors focus on determining the essential functions to consider in this context. They define a three-step methodology based on root cause analysis and resolution. This methodology is then applied to a data set and queries representative of an industrial use case. As a result, the authors list four major issues to consider and discuss their potential resolutions.


IEEE Access ◽  
2017 ◽  
Vol 5 ◽  
pp. 11269-11277 ◽  
Author(s):  
Seungjin Choi ◽  
Jiwan Seo ◽  
Mucheol Kim ◽  
Sunghyun Kang ◽  
Sangyong Han
