scholarly journals Performance Analysis of Duplicate Record Detection Techniques

2019 ◽  
Vol 9 (5) ◽  
pp. 4755-4758
Author(s):  
S. H. Adil ◽  
M. Ebrahim ◽  
S. S. A. Ali ◽  
K. Raza

In this paper, a comprehensive performance analysis of duplicate data detection techniques for relational databases has been performed. The research focuses on traditional SQL based and modern bloom filter techniques to find and eliminate records which already exist in the database while performing bulk insertion operation (i.e. bulk insertion involved in the loading phase of the Extract, Transform, and Load (ETL) process and data synchronization in multisite database synchronization). The comprehensive performance analysis was performed on several data sizes using SQL, bloom filter, and parallel bloom filter. The results show that the parallel bloom filter is highly suitable for duplicate detection in the database.

Electronics ◽  
2021 ◽  
Vol 10 (15) ◽  
pp. 1778
Author(s):  
Binhao He ◽  
Meiting Xue ◽  
Shubiao Liu ◽  
Wei Luo

As one of the most important operations in relational databases, the join is data-intensive and time-consuming. Thus, offloading this operation using field-programmable gate arrays (FPGAs) has attracted much interest and has been broadly researched in recent years. However, the available SRAM-based join architectures are often resource-intensive, power-consuming, or low-throughput. Besides, a lower match rate does not lead to a shorter operation time. To address these issues, a Bloom filter (BF)-based parallel join architecture is presented in this paper. This architecture first leverages the BF to discard the tuples that are not in the join result and classifies the remaining tuples into different channels. Second, a binary search tree is used to reduce the number of comparisons. The proposed method was implemented on a Xilinx FPGA, and the experimental results show that under a match rate of 50%, our architecture achieved a high join throughput of 145.8 million tuples per second and a maximum acceleration factor of 2.3 compared to the existing SRAM-based join architectures.


Author(s):  
Julian Bueno ◽  
Joshua Robertson ◽  
Matej Hejda ◽  
Antonio Hurtado

Sign in / Sign up

Export Citation Format

Share Document