A Bloom Filter Application for Processing Big Datasets through MapReduce Framework

Author(s):  
Milko Todorov Marinov
Author(s):  
LAKSHMI PRANEETHA

Nowadays, data streams (information streams) are enormous and change rapidly. Their uses range from basic logical and scientific applications to critical business and financial ones. In the online phase, useful information is abstracted from the stream and represented as micro-clusters; in the offline phase, the micro-clusters are merged to form macro-clusters. The DBSTREAM technique captures the density between micro-clusters by means of a shared density graph in the online phase. The density data in this graph is then used during reclustering to improve cluster formation, but DBSTREAM takes more time to handle corrupted data points. In this paper, an early pruning algorithm is applied before the information is preprocessed, and a Bloom filter is used to recognize corrupted information. Our experiments on real-time datasets show that this approach improves the efficiency of macro-clusters by 90% and generates more micro-clusters within a short time.
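The abstract does not say how the Bloom filter recognizes corrupted points. The sketch below assumes one plausible arrangement. Keys of points already flagged as corrupted (by some upstream check, not shown) are inserted into a Bloom filter, and each incoming record is tested against it so that flagged points are pruned before clustering. The class `BloomFilter`, the helper `prune_stream`, and the `(key, value)` record layout are hypothetical illustrations, not the authors' code. Sizing uses the standard formulas m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hash functions.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: a bit array of m bits probed by k hash functions."""

    def __init__(self, expected_items: int, fp_rate: float = 0.01) -> None:
        # Standard sizing for n expected items and target false-positive rate p.
        self.m = max(1, math.ceil(-expected_items * math.log(fp_rate) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / expected_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit indices from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # False means "definitely never added"; True means "probably added".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def prune_stream(records, corrupted_filter):
    """Hypothetical early-pruning step: drop records whose key the filter flags."""
    return [r for r in records if r[0] not in corrupted_filter]


# Usage: flag two keys as corrupted, then prune a small batch of stream records.
flagged = BloomFilter(expected_items=100, fp_rate=0.01)
for key in ("p3", "p7"):
    flagged.add(key)
batch = [("p1", 0.5), ("p3", 9.9), ("p7", -1.0)]
clean = prune_stream(batch, flagged)
```

Because a Bloom filter has no false negatives, a flagged key is always pruned. The price is that a clean point can occasionally be dropped as a false positive, at roughly the configured `fp_rate`.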


2011 ◽  
Vol 22 (4) ◽  
pp. 773-781
Author(s):  
Gui-Ming ZHU ◽  
De-Ke GUO ◽  
Shi-Yao JIN

2012 ◽  
Vol 35 (5) ◽  
pp. 910-917
Author(s):  
Gui-Ming ZHU ◽  
De-Ke GUO ◽  
Shi-Yao JIN

2010 ◽  
Vol 30 (9) ◽  
pp. 2335-2338
Author(s):  
Hua-yun YAN ◽  
Ji-hong GUAN

Electronics ◽  
2021 ◽  
Vol 10 (15) ◽  
pp. 1778
Author(s):  
Binhao He ◽  
Meiting Xue ◽  
Shubiao Liu ◽  
Wei Luo

As one of the most important operations in relational databases, the join is data-intensive and time-consuming. Thus, offloading this operation to field-programmable gate arrays (FPGAs) has attracted much interest and been widely researched in recent years. However, the available SRAM-based join architectures are often resource-intensive, power-consuming, or low-throughput. Moreover, in these architectures a lower match rate does not lead to a shorter operation time. To address these issues, this paper presents a Bloom filter (BF)-based parallel join architecture. The architecture first uses the BF to discard tuples that are not in the join result and classifies the remaining tuples into different channels; a binary search tree is then used to reduce the number of comparisons. The proposed method was implemented on a Xilinx FPGA, and the experimental results show that at a match rate of 50%, the architecture achieved a join throughput of 145.8 million tuples per second and a maximum acceleration factor of 2.3 compared with existing SRAM-based join architectures.
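The filter, partition, and search pipeline described in this abstract can be imitated in software. The following Python sketch is a hypothetical single-threaded analogue, not the FPGA design. The build relation's keys populate a Bloom filter, probe tuples the filter rejects are discarded, survivors are routed to a channel by `key % num_channels` (an assumed routing rule), and each channel uses binary search over a sorted key list (via `bisect`) in place of the paper's binary search tree. The names `TinyBloom` and `bloom_join` and the fixed parameters m = 1024, k = 3 are illustrative assumptions.

```python
import bisect
import hashlib


class TinyBloom:
    """Fixed-size Bloom filter (m bits, k hashes) using double hashing."""

    def __init__(self, m: int = 1024, k: int = 3) -> None:
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, key: int):
        d = hashlib.sha256(str(key).encode()).digest()
        h1, h2 = int.from_bytes(d[:8], "big"), int.from_bytes(d[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: int) -> None:
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, key: int) -> bool:
        return all((self.bits[p >> 3] >> (p & 7)) & 1 for p in self._positions(key))


def bloom_join(build, probe, num_channels=4):
    """Equi-join two lists of (key, payload) tuples on an integer key."""
    # 1. Insert build keys into the filter and partition build tuples by channel.
    bf = TinyBloom()
    channels = [[] for _ in range(num_channels)]
    for key, payload in build:
        bf.add(key)
        channels[key % num_channels].append((key, payload))
    # Sort each channel by key so it can be binary-searched (stable for equal keys).
    for ch in channels:
        ch.sort(key=lambda t: t[0])
    channel_keys = [[k for k, _ in ch] for ch in channels]

    result = []
    for key, payload in probe:
        # 2. A filter miss means the tuple cannot be in the join result.
        if not bf.might_contain(key):
            continue
        # 3. Route the surviving tuple to its channel.
        c = key % num_channels
        keys, rows = channel_keys[c], channels[c]
        # 4. Binary search, then emit every build tuple with an equal key.
        i = bisect.bisect_left(keys, key)
        while i < len(keys) and keys[i] == key:
            result.append((key, rows[i][1], payload))
            i += 1
    return result


# Usage: keys 2 and 5 match; key 5 appears twice on the build side.
joined = bloom_join([(1, "a"), (2, "b"), (5, "e"), (5, "e2")],
                    [(2, "x"), (3, "y"), (5, "z"), (9, "w")])
```

The Bloom filter only saves work here. Correctness comes from the exact key comparison in step 4, so a false positive costs one wasted binary search but never produces a wrong join row.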


2021 ◽  
Author(s):  
Zengjie Wang ◽  
Wen Luo ◽  
Linwang Yuan ◽  
Hong Gao ◽  
Fan Wu ◽  
...  