bloom filter
Recently Published Documents


TOTAL DOCUMENTS

781
(FIVE YEARS 194)

H-INDEX

30
(FIVE YEARS 5)

2022 ◽  
Vol 22 (1) ◽  
Author(s):  
Sean Randall ◽  
Helen Wichmann ◽  
Adrian Brown ◽  
James Boyd ◽  
Tom Eitelhuber ◽  
...  

Abstract Background: Privacy-preserving record linkage (PPRL) methods using Bloom filters have shown promise for use in operational linkage settings. However, real-world evaluations are required to confirm their suitability in practice. Methods: An extract of records from the Western Australian (WA) Hospital Morbidity Data Collection 2011–2015 and WA Death Registrations 2011–2015 was encoded to Bloom filters and then linked using privacy-preserving methods. Results were compared to a traditional, un-encoded linkage of the same datasets using the same blocking criteria to enable direct investigation of the comparison step. The encoded linkage was carried out in a blinded setting, where there was no access to un-encoded data or a ‘truth set’. Results: The PPRL method using Bloom filters provided similar linkage quality to the traditional un-encoded linkage, with 99.3% of ‘groupings’ identical between privacy-preserving and clear-text linkage. Conclusion: The Bloom filter method appears suitable for use in situations where clear-text identifiers cannot be provided for linkage.
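
The abstract above does not spell out how records are encoded. As a minimal sketch, assuming the common approach of hashing character bigrams of an identifier into a Bloom filter and comparing filters with the Dice coefficient, the idea could look like the following. The function names, the 1024-bit filter size, the four hash functions and the shared secret are all illustrative assumptions and are not taken from the study.

```python
import hashlib


def bigrams(value: str) -> set[str]:
    """Split a padded, lower-cased string into overlapping 2-character tokens."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def encode(value: str, num_bits: int = 1024, num_hashes: int = 4,
           secret: bytes = b"shared-secret") -> set[int]:
    """Return the bit positions set for `value` (a sparse Bloom filter).

    The secret, filter size and hash count are hypothetical parameters
    that both data custodians would need to agree on in advance.
    """
    positions = set()
    for gram in bigrams(value):
        for k in range(num_hashes):
            digest = hashlib.sha256(secret + bytes([k]) + gram.encode("utf-8")).digest()
            positions.add(int.from_bytes(digest[:8], "big") % num_bits)
    return positions


def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient of two bit-position sets: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    smith = encode("smith")
    print(dice(smith, encode("smyth")))  # similar spellings share most bigrams
    print(dice(smith, encode("jones")))  # unrelated names share few or none
```

Because similar strings share most of their bigrams, their encodings share most of their set bits, which is what allows approximate matching without access to the clear-text values.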


2022 ◽  
Vol 4 (2) ◽  
Author(s):  
Hiroyuki Kano ◽  
Keisuke Hakuta

Abstract A private set intersection protocol is a secure multi-party computation protocol that allows participants to compute the intersection of their sets without revealing the sets to each other. Ion et al. proposed the private intersection-sum protocol (PI-Sum), which is a two-party private set intersection protocol. In the PI-Sum, two parties (say Alice and Bob) have the private sets A and B. Bob additionally has a rational integer associated with each element of B. The PI-Sum allows Bob to obtain the sum of the rational integers associated with the elements of $A \cap B$. This paper proposes efficiency improvement techniques for the PI-Sum. The proposed techniques are based on Bloom filters, which are probabilistic data structures. More precisely, this paper proposes three protocols that are modifications of the PI-Sum. The proposed protocols are more efficient than the PI-Sum.


2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

In this study we implemented four different versions of Apriori, namely, basic and basic multi-threaded, Bloom filter, trie, and count-min sketch, and proposed a new algorithm – NCLAT (Near Candidate-Less Apriori with Tidlists). We compared the runtimes and maximum memory usage of our implementations with each other and, in some cases, with the runtime of Borgelt’s Apriori implementation. The NCLAT implementation is more efficient than the other Apriori implementations that we know of in terms of the number of times the database is scanned and the number of candidates generated. Unlike the original Apriori algorithm, which scans the database for every level and creates all of the candidates in advance for each level, NCLAT scans the database only once and creates candidate itemsets only for level one but not afterwards. Thus, the number of candidates created is equal to the number of unique items in the database.
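
The abstract above mentions tidlists and a single database scan without giving details. The following is a minimal sketch of the general tidlist idea only, and is not the NCLAT algorithm itself: one pass builds, for each item, the set of transaction ids that contain it, and the support of any larger itemset is then the size of the intersection of its items' tidlists. The function names and the toy transactions are illustrative assumptions.

```python
from collections import defaultdict


def build_tidlists(transactions):
    """Single pass over the database: map each item to the ids of transactions containing it."""
    tidlists = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists[item].add(tid)
    return dict(tidlists)


def support(itemset, tidlists):
    """Support of an itemset = size of the intersection of its items' tidlists."""
    lists = [tidlists.get(item, set()) for item in itemset]
    if not lists:
        return 0
    return len(set.intersection(*lists))


# Toy database (hypothetical data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
tidlists = build_tidlists(transactions)

if __name__ == "__main__":
    print(support({"bread", "milk"}, tidlists))
    print(support({"bread", "butter", "milk"}, tidlists))
```

With this representation, level-one candidates are exactly the unique items, and higher levels need no further database scans because supports come from set intersections.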


2021 ◽  
Vol 17 (4) ◽  
pp. 1-23
Author(s):  
Datong Zhang ◽  
Yuhui Deng ◽  
Yi Zhou ◽  
Yifeng Zhu ◽  
Xiao Qin

Data deduplication techniques construct an index consisting of fingerprint entries to identify and eliminate duplicated copies of repeating data. The bottleneck of disk-based index lookup and data fragmentation caused by eliminating duplicated chunks are two challenging issues in data deduplication. Deduplication-based backup systems generally employ containers storing contiguous chunks together with their fingerprints to preserve data locality for alleviating the two issues, which is still inadequate. To address these two issues, we propose a container utilization based hot fingerprint entry distilling strategy to improve the performance of deduplication-based backup systems. We divide the index into three parts: hot fingerprint entries, fragmented fingerprint entries, and useless fingerprint entries. A container with utilization smaller than a given threshold is called a sparse container. Fingerprint entries that point to non-sparse containers are hot fingerprint entries. For the remaining fingerprint entries, if a fingerprint entry matches any fingerprint of forthcoming backup chunks, it is classified as a fragmented fingerprint entry. Otherwise, it is classified as a useless fingerprint entry. We observe that hot fingerprint entries account for a small part of the index, whereas the remaining fingerprint entries account for the majority of the index. This intriguing observation inspires us to develop a hot fingerprint entry distilling approach named HID. HID segregates useless fingerprint entries from the index to improve memory utilization and bypass disk accesses. In addition, HID separates fragmented fingerprint entries to make a deduplication-based backup system directly rewrite fragmented chunks, thereby alleviating adverse fragmentation. Moreover, HID introduces a feature to treat fragmented chunks as unique chunks. This feature compensates for the shortcoming that a Bloom filter cannot directly identify certain duplicated chunks (i.e., the fragmented chunks). To take full advantage of the preceding feature, we propose an evolved HID strategy called EHID. EHID incorporates a Bloom filter, to which only hot fingerprints are mapped. In doing so, EHID exhibits two salient features: (i) EHID avoids disk accesses to identify unique chunks and the fragmented chunks; (ii) EHID slashes the false positive rate of the integrated Bloom filter. These salient features push EHID into the high-efficiency mode. Our experimental results show our approach reduces the average memory overhead of the index by 34.11% and 25.13% when using the Linux dataset and the FSL dataset, respectively. Furthermore, compared with the state-of-the-art method HAR, EHID boosts the average backup throughput by up to a factor of 2.25 with the Linux dataset, and EHID reduces the average disk I/O traffic by up to 66.21% when it comes to the FSL dataset. EHID also marginally improves the system's restore performance.
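
The three-way split of the index described above can be sketched as follows. This is only an illustration of the classification rule as stated in the abstract, not the authors' implementation. The data layout (a dict from fingerprint to container id, and a dict from container id to utilization) and all names are assumptions.

```python
def classify_entries(index, utilization, threshold, upcoming):
    """Split fingerprint entries into hot, fragmented and useless sets.

    index:       dict mapping fingerprint -> container id (hypothetical layout)
    utilization: dict mapping container id -> utilization in [0, 1]
    threshold:   containers with utilization below this value are sparse
    upcoming:    set of fingerprints of forthcoming backup chunks
    """
    hot, fragmented, useless = set(), set(), set()
    for fingerprint, container in index.items():
        if utilization[container] >= threshold:
            # Entry points to a non-sparse container.
            hot.add(fingerprint)
        elif fingerprint in upcoming:
            # Sparse container, but the chunk is about to be backed up again.
            fragmented.add(fingerprint)
        else:
            # Sparse container and no forthcoming match.
            useless.add(fingerprint)
    return hot, fragmented, useless


if __name__ == "__main__":
    index = {"fp1": "c1", "fp2": "c2", "fp3": "c2"}
    utilization = {"c1": 0.9, "c2": 0.2}
    print(classify_entries(index, utilization, threshold=0.5, upcoming={"fp2"}))
```

Under this reading, only the fingerprints in the hot set would be mapped into the Bloom filter, which keeps the filter small and lowers its false positive rate.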


2021 ◽  
Vol 2021 ◽  
pp. 1-21
Author(s):  
Hongmin Gao ◽  
Shoushan Luo ◽  
Zhaofeng Ma ◽  
Xiaodan Yan ◽  
Yanping Xu

Due to capacity limitations, large amounts of data generated by IoT devices are often stored on cloud servers. These data are usually encrypted to prevent disclosure, which significantly affects their availability. Searchable encryption (SE) allows a party to store data created by their IoT or mobile devices in encrypted form on a cloud server, protecting their privacy while retaining the ability to search the data. However, general SE techniques are all pay-then-use. The searchable encryption service providers (SESP) are assumed to be curious but honest, which makes such schemes unfair and unreliable. To address these problems, we combined ciphertext-policy attribute-based encryption, Bloom filter, and blockchain to propose a blockchain-based fair and reliable searchable encryption scheme (BFR-SE) in this paper. In BFR-SE, we constructed an attribute-based searchable encryption model that can provide fine-grained access control. The data owner stores the indices on SESP and stores some additional auxiliary information on the blockchain. After a data user initiates a request, SESP must return the correct and integral search results before the deadline. Otherwise, the data user can send an arbitration request, and the blockchain will make a ruling. The blockchain will only perform arbitrations based on auxiliary information when disputes arise, saving the computing resources on-chain. We analyzed the security and privacy of BFR-SE and simulated our scheme on the EOS blockchain, which proves that BFR-SE is feasible. Meanwhile, we provided a thorough analysis of storage and computing overhead, proving that BFR-SE is practical and has good performance.


2021 ◽  
Vol 2022 (1) ◽  
pp. 373-395
Author(s):  
Badih Ghazi ◽  
Ben Kreuter ◽  
Ravi Kumar ◽  
Pasin Manurangsi ◽  
Jiayu Peng ◽  
...  

Abstract Consider the setting where multiple parties each hold a multiset of users and the task is to estimate the reach (i.e., the number of distinct users appearing across all parties) and the frequency histogram (i.e., fraction of users appearing a given number of times across all parties). In this work we introduce a new sketch for this task, based on an exponentially distributed counting Bloom filter. We combine this sketch with a communication-efficient multi-party protocol to solve the task in the multi-worker setting. Our protocol exhibits both differential privacy and security guarantees in the honest-but-curious model and in the presence of large subsets of colluding workers; furthermore, its reach and frequency histogram estimates have a provably small error. Finally, we show the practicality of the protocol by evaluating it on internet-scale audiences.


Sensors ◽  
2021 ◽  
Vol 21 (22) ◽  
pp. 7607
Author(s):  
Ngoc-Thanh Dinh ◽  
Younghan Kim

One of the main advantages of information-centric networking (ICN) is that a requested piece of content can be retrieved from a content store (CS) at any intermediate node, instead of its original content producer. In existing ICN designs, nodes forward Interest packets mainly based on the forwarding information base (FIB). The FIB is constructed from name prefixes registered by content producers with a list of next hops to the name prefixes. The ICN forwarding engine uses this information to forward Interest packets towards corresponding content producers. CS information of a node is currently used only for checking the availability of cached content objects at the node and is not considered in the data plane of existing ICN forwarding mechanisms. This paper highlights the importance of CS information in an ICN forwarding mechanism and enables neighbor CS information in the data plane to improve the cache hit ratio and forwarding efficiency, especially for the resource-constrained Internet of Things (IoT). We propose an efficient CS-based forwarding scheme for IoT. The proposed forwarding scheme exploits CS information of neighbors to find efficient routes to forward Interest packets toward nearby nodes with corresponding cached content. For that, we carefully design an efficient way for CS information sharing using a counting Bloom filter. We implement the proposed scheme and compare it with state-of-the-art ICN forwarding schemes in IoT. Experimental results indicate that the proposed forwarding scheme achieves a significant improvement in terms of cache hit ratio, energy efficiency, content retrieval latency, and response rate.
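
A counting Bloom filter suits a content store summary because cached objects are evicted as well as inserted, and a plain Bloom filter cannot delete. The following is a minimal generic sketch of that data structure, not the paper's sharing scheme. The class name, the counter array size, the hash count and the example content name are all illustrative assumptions.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter with per-slot counters so that items can also be removed."""

    def __init__(self, num_counters: int = 1024, num_hashes: int = 3):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _positions(self, name: str):
        # Derive num_hashes slot indices by salting SHA-256 with the hash number.
        for k in range(self.num_hashes):
            digest = hashlib.sha256(f"{k}:{name}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.counters)

    def add(self, name: str) -> None:
        for pos in self._positions(name):
            self.counters[pos] += 1

    def remove(self, name: str) -> None:
        # Decrement only if the item appears present, to avoid negative counters.
        if name in self:
            for pos in self._positions(name):
                self.counters[pos] -= 1

    def __contains__(self, name: str) -> bool:
        # May return a false positive, never a false negative for added items.
        return all(self.counters[pos] > 0 for pos in self._positions(name))


if __name__ == "__main__":
    cs_summary = CountingBloomFilter()
    cs_summary.add("/sensors/room1/temp")          # object cached
    print("/sensors/room1/temp" in cs_summary)
    cs_summary.remove("/sensors/room1/temp")       # object evicted
    print("/sensors/room1/temp" in cs_summary)
```

A node could advertise such a summary to neighbors, who then test content names against it before choosing a next hop; a positive answer is only probable, so a fallback to FIB-based forwarding would still be needed.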


2021 ◽  
Author(s):  
Shihao Liu ◽  
Wanming Luo ◽  
Xu Zhou ◽  
Bin Yang ◽  
Yihao Jia ◽  
...  