Efficient Update Control of Bloom Filter Replicas in Large Scale Distributed Systems

Evolutionary Approaches and Their Applications to Distributed Systems

2010; pp. 114-149; Cited By ~ 1

Author(s): Thomas Weise, Raymond Chiong

Keyword(s): Distributed Systems, Evolutionary Algorithms, Network Topology, Large Scale, Computational Time, Distributed Environment, Rapid Changes, NP-Complete, Nature-Inspired Algorithms, Ubiquitous Presence

The ubiquitous presence of distributed systems has drastically changed the way the world interacts and has affected not only economics and governance but also society at large. It is therefore important for the architecture and infrastructure of distributed environments to be continuously renewed in order to cope with the rapid changes driven by innovative technologies. However, many problems in distributed computing are dynamic, large-scale, NP-complete, or a combination of these, and in most cases exact solutions are hard to find. As a result, a number of intelligent nature-inspired algorithms have recently been applied, as these algorithms can achieve good-quality solutions in reasonable computational time. Among all nature-inspired algorithms, evolutionary algorithms are arguably the most extensively applied. This chapter presents a systematic review of evolutionary algorithms employed to solve various problems related to distributed systems. The review aims to provide insight into evolutionary approaches, in particular genetic algorithms and genetic programming, for solving problems in five areas of network optimization: network topology, routing, protocol synthesis, network security, and parameter settings and configuration. Selected applications from these areas are discussed in detail with illustrative examples.


Secure Privacy Preserving Record Linkage of Large Databases by Modified Bloom Filter Encodings

International Journal for Population Data Science; 10.23889/ijpds.v1i1.29; 2017; Vol 1 (1); Cited By ~ 2

Author(s): Rainer Schnell, Christian Borgs

Keyword(s): Record Linkage, Large Scale, Bloom Filter, Privacy Preserving, Error Rates, Bloom Filters, Data Sets, Research Subjects, Practical Applications, Large Databases

ABSTRACT

Objective: In most European settings, record linkage across different institutions has to be based on personal identifiers such as names, date of birth, or place of birth. To protect the privacy of research subjects, these identifiers have to be encrypted. In practice, the identifiers show error rates of up to 20% per identifier, so linking on encrypted identifiers usually implies the loss of large subsets of the databases. In many applications, this loss of cases is related to variables of interest for the subject matter of the study, so this kind of record linkage will generate biased estimates. These problems gave rise to techniques of Privacy Preserving Record Linkage (PPRL). Many different PPRL techniques have been suggested within the last 10 years, but very few of them are suitable for practical applications with large databases containing millions of records, as is typical for administrative or medical databases. One proven technique for large-scale PPRL is based on Bloom filters.

Method: Using appropriate parameter settings, Bloom filter approaches show linkage results comparable to linkage based on unencrypted identifiers. Furthermore, this approach has been used in real-world settings with data sets containing up to 100 million records. With suitable blocking strategies, linkage can be done in reasonable time.

Result: However, Bloom filters have been the subject of cryptographic attacks. Previous research has shown that the straightforward application of Bloom filters carries a nonzero re-identification risk. We will present new results on recently developed techniques designed to resist all known attacks on PPRL Bloom filters. These computationally simple algorithms modify the identifiers by different cryptographic diffusion techniques. The presentation will demonstrate these new algorithms and show their performance in terms of precision, recall, and re-identification risk on large databases.
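For readers unfamiliar with the basic encoding, the following is a minimal sketch of the widely described bigram-based Bloom filter encoding for PPRL: each identifier is split into padded bigrams, each bigram is hashed k times into a bit array of length m by double hashing with two keyed HMACs, and two encodings are compared with the Dice coefficient. It is not the modified, diffusion-hardened encoding presented above. The parameter values (m = 1000, k = 20), the keys, and the function names are illustrative assumptions, and the filter is represented as the set of its one-bit positions for brevity.

```python
import hashlib
import hmac
from typing import FrozenSet, Set


def bigrams(value: str) -> Set[str]:
    """Split a normalized identifier into bigrams, padded with '_' at both ends."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def encode(value: str, m: int = 1000, k: int = 20,
           key1: bytes = b"hypothetical-key-1",
           key2: bytes = b"hypothetical-key-2") -> FrozenSet[int]:
    """Return the set of bit positions set to 1 in a Bloom filter of length m.

    Each bigram is mapped to k positions by double hashing,
    g_i = (h1 + i * h2) mod m, with h1 and h2 taken from two keyed HMACs.
    The keys are placeholders; in practice they are secrets shared by the data holders.
    """
    bits = set()
    for gram in bigrams(value):
        msg = gram.encode("utf-8")
        h1 = int.from_bytes(hmac.new(key1, msg, hashlib.sha1).digest(), "big")
        h2 = int.from_bytes(hmac.new(key2, msg, hashlib.md5).digest(), "big")
        for i in range(k):
            bits.add((h1 + i * h2) % m)
    return frozenset(bits)


def dice(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) of two encoded filters."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    smith, smyth, jones = encode("Smith"), encode("Smyth"), encode("Jones")
    print(f"Smith/Smyth: {dice(smith, smyth):.2f}")
    print(f"Smith/Jones: {dice(smith, jones):.2f}")
```

Because "Smith" and "Smyth" share four of their six padded bigrams, their encodings overlap heavily, whereas "Smith" and "Jones" share none and overlap only through chance bit collisions. This tolerance of typing errors is what lets Bloom filter encodings approach the linkage quality of unencrypted identifiers.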


Energy-Efficient Data Transfers in Large-Scale Distributed Systems

Handbook of Energy-Aware and Green Computing, Volume 2; 10.1201/b11640-23; 2013; pp. 363-380

Keyword(s): Distributed Systems, Energy Efficient, Large Scale, Efficient Data, Data Transfers


An adaptive control mechanism for access control in large-scale distributed systems

Transactions of the Institute of Measurement and Control; 10.1177/0142331213488903; 2013; Vol 36 (1); pp. 26-37; Cited By ~ 1

Author(s): Xiaofeng Jiang, Jun Li, Hongsheng Xi

Keyword(s): Adaptive Control, Distributed Systems, Access Control, Large Scale, Control Mechanism


A Failure Detection System for Large Scale Distributed Systems

International Journal of Distributed Systems and Technologies; 10.4018/jdst.2011070105; 2011; Vol 2 (3); pp. 64-87; Cited By ~ 7

Author(s): Andrei Lavinia, Ciprian Dobre, Florin Pop, Valentin Cristea

Keyword(s): Distributed Systems, Large Scale, Detection System, Failure Detection, Difficult Problem, Distributed Environment, Dynamic Configuration, Fundamental Building Block, Heavy Loads, Traffic Optimization

Failure detection is a fundamental building block for ensuring fault tolerance in large scale distributed systems, and it is also a difficult problem. Resources under heavy load can be mistaken for failed ones, and a missing response may indicate either a failed network link or a failed computational resource. Although progress has been made, no existing approach provides a system that covers all essential aspects of a distributed environment. This paper presents a failure detection system based on adaptive, decentralized failure detectors. The system is developed as an independent substrate that works asynchronously and independently of the application flow. It uses a hierarchical protocol, creating a clustering mechanism that ensures dynamic configuration and traffic optimization. It also uses a gossip strategy for failure detection at local levels to minimize detection time and remove false suspicions. Results show that the system scales with the number of monitored resources while still considering the QoS requirements of both applications and resources.
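The abstract does not spell out how its adaptive detectors compute timeouts. As a hedged illustration of what "adaptive" commonly means for heartbeat-based detectors, the sketch below keeps a sliding window of heartbeat inter-arrival times per monitored node and suspects a node when its silence exceeds the window mean plus a safety margin proportional to the observed jitter. The class name, the window size, and the margin parameters are assumptions; the hierarchical clustering and gossip layers described in the paper are not modeled.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Dict, Optional


class AdaptiveFailureDetector:
    """Heartbeat-based detector with a per-node adaptive timeout (illustrative)."""

    def __init__(self, window: int = 10, safety: float = 4.0, min_margin: float = 0.05):
        self.window = window          # number of recent inter-arrival gaps kept per node
        self.safety = safety          # margin in multiples of the gaps' standard deviation
        self.min_margin = min_margin  # lower bound on the margin, as a fraction of the mean gap
        self.last_seen: Dict[str, float] = {}
        self.gaps: Dict[str, Deque[float]] = {}

    def heartbeat(self, node: str, now: float) -> None:
        """Record a heartbeat from `node` received at time `now` (seconds)."""
        if node in self.last_seen:
            gaps = self.gaps.setdefault(node, deque(maxlen=self.window))
            gaps.append(now - self.last_seen[node])
        self.last_seen[node] = now

    def timeout(self, node: str) -> Optional[float]:
        """Current adaptive timeout for `node`, or None before two heartbeats arrive."""
        gaps = self.gaps.get(node)
        if not gaps:
            return None
        avg = mean(gaps)
        return avg + max(self.safety * pstdev(gaps), self.min_margin * avg)

    def suspected(self, node: str, now: float) -> bool:
        """Suspect `node` if its silence exceeds the adaptive timeout."""
        limit = self.timeout(node)
        if limit is None:
            return False  # not enough history to judge yet
        return now - self.last_seen[node] > limit
```

A node whose heartbeats become irregular, for example because it is under heavy load, automatically widens its own timeout, which is one way to reduce the false suspicions mentioned above.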


A Failure Detection System for Large Scale Distributed Systems

Development of Distributed Systems from Design to Application and Maintenance; 10.4018/978-1-4666-2647-8.ch008; 2012; pp. 127-151

Author(s): Andrei Lavinia, Ciprian Dobre, Florin Pop, Valentin Cristea

Keyword(s): Distributed Systems, Large Scale, Detection System, Failure Detection, Difficult Problem, Distributed Environment, Dynamic Configuration, Fundamental Building Block, Heavy Loads, Traffic Optimization



findere: fast and precise approximate membership query

10.1101/2021.05.31.446182; 2021

Author(s): Lucas Robidou, Pierre Peterlongo

Keyword(s): Data Structure, False Positive, False Positive Rate, False Negative, Bloom Filters, Membership Query, Simple Strategy, Large Sets, Positive Rate, Speed Up

Approximate membership query (AMQ) structures such as Cuckoo filters or Bloom filters are widely used for representing large sets of elements. Their lightweight space usage explains their success, notably because they are the only way to scale to hundreds of billions or trillions of elements. However, they inherently suffer from unavoidable false-positive calls that bias downstream analyses of methods using these data structures. In this work we propose a simple strategy, and its implementation, for reducing the false-positive rate of any AMQ data structure indexing k-mers (words of length k). The proposed method, called findere, speeds up queries by a factor of two and decreases the false-positive rate by two orders of magnitude. This is achieved on the fly at query time, without modifying the original indexing data structure, without generating false-negative calls, and with no memory overhead. At no cost, this method, as simple as it is effective, reduces either the false-positive rate or, for a user-defined false-positive rate, the space required to represent a set.
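One way to cut false positives purely at query time, consistent with the description above, is to exploit the overlap between consecutive words of a query sequence. The sketch below assumes that the AMQ indexes words of a shorter length s = k - z and that a queried k-mer is reported present only when all z + 1 of its constituent s-mers are positive. A false positive then requires z + 1 consecutive false-positive s-mer answers, while a k-mer that truly occurs in the indexed data can never be missed, because an AMQ has no false negatives. The function name `findere_query`, the parameters, and the set standing in for the AMQ are hypothetical, for illustration only; this is not the findere code.

```python
from typing import Callable, List


def findere_query(contains: Callable[[str], bool], query: str, k: int, z: int) -> List[bool]:
    """Return one answer per k-mer of `query` (hypothetical helper).

    `contains` is the membership test of an AMQ that indexes s-mers, s = k - z.
    A k-mer is reported present only if all of its z + 1 s-mers are positive.
    """
    if not 0 <= z < k:
        raise ValueError("require 0 <= z < k")
    s = k - z
    if len(query) < k:
        return []
    # Each s-mer of the query is looked up exactly once.
    smer_hits = [contains(query[i:i + s]) for i in range(len(query) - s + 1)]
    answers: List[bool] = []
    run = 0  # length of the current run of consecutive positive s-mers
    for i, hit in enumerate(smer_hits):
        run = run + 1 if hit else 0
        if i >= z:  # s-mer i is the last s-mer of the k-mer starting at i - z
            answers.append(run >= z + 1)
    return answers


if __name__ == "__main__":
    indexed = "ACGTACGGT"
    amq = {indexed[i:i + 3] for i in range(len(indexed) - 2)}  # stand-in AMQ over 3-mers
    amq.add("TTT")  # simulate one false-positive answer of the AMQ
    print(findere_query(amq.__contains__, "ACGTACG", k=5, z=2))  # [True, True, True]
    print(findere_query(amq.__contains__, "AATTTAA", k=5, z=2))  # [False, False, False]
```

In the second query, a plain lookup of the 3-mer "TTT" would report a hit, but no 5-mer is covered by three consecutive positive 3-mers, so the isolated false positive is filtered out.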


Summary Instance: Scalable Event Priority Determination Engine for Large-Scale Distributed Event-Based System

International Journal of Distributed Sensor Networks; 10.1155/2015/390329; 2015; Vol 2015; pp. 1-14

Author(s): Ruisheng Shi, Yang Zhang, Lina Lan, Fei Li, Junliang Chen

Keyword(s): Large Scale, Vehicular Ad Hoc Networks, Ad Hoc, Bloom Filter, Bloom Filters, Priority Rules, Network Nodes, Significant Performance, Hoc Networks, Event Based

Data prioritization is paramount for the timely delivery of real-time events in distributed publish/subscribe infrastructures, since a large number of low-priority events may clog the channel and thereby delay high-priority events. The challenge for event-based middleware in large-scale distributed systems such as vehicular ad hoc networks is that the event priority determination engine must be efficient and scalable in terms of priority rule size and event throughput. This paper proposes an approach based on Bloom filters and event discretization. A Bloom filter data structure stores the rule instances and their priorities, so complex rule evaluation is reduced to set-membership queries on the Bloom filter. The time complexity of data prioritization is constant and independent of the number of priority rules. Because event discretization signatures can be cached, the approach is cache-friendly by nature: previous computation results can be cached in overlay network nodes and reused to improve system throughput and determination time. We have evaluated the proposed approach, and the results show a significant performance improvement.
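The central move, replacing rule evaluation with Bloom filter membership tests over discretized events, can be sketched as follows. The discretization scheme (fixed-width numeric buckets), the encoding of each rule instance as a `signature|priority` string, and the highest-first probe over a fixed set of priority levels are assumptions made for illustration; the paper's own encoding may differ. Because the number of priority levels and hash functions is fixed, each lookup costs a constant number of hash computations regardless of how many rule instances are stored. Like any Bloom filter, the structure can return a false-positive priority but never misses a stored rule instance.

```python
import hashlib
from typing import Dict, List, Tuple, Union

Value = Union[str, int, float]


class BloomFilter:
    """Plain Bloom filter with k positions derived by double hashing of SHA-256."""

    def __init__(self, m_bits: int = 4096, k: int = 4):
        self.m, self.k = m_bits, k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str) -> List[int]:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step: distinct positions when m is a power of two
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(item))


def signature(event: Dict[str, Value], widths: Dict[str, int]) -> str:
    """Discretize an event into a canonical string (hypothetical scheme)."""
    parts = []
    for key in sorted(event):
        value = event[key]
        if key in widths:  # numeric attribute: map to a fixed-width bucket index
            value = int(value) // widths[key]
        parts.append(f"{key}={value}")
    return ";".join(parts)


class PriorityEngine:
    """Hypothetical priority engine: one Bloom filter holds all (rule instance, priority) pairs."""

    def __init__(self, widths: Dict[str, int], levels: Tuple[int, ...] = (3, 2, 1), default: int = 0):
        self.widths = widths
        self.levels = levels      # probed in order, highest priority first
        self.default = default    # returned when no rule instance matches
        self.filter = BloomFilter()

    def add_rule_instance(self, instance: Dict[str, Value], priority: int) -> None:
        self.filter.add(f"{signature(instance, self.widths)}|{priority}")

    def priority(self, event: Dict[str, Value]) -> int:
        sig = signature(event, self.widths)  # this signature is what could be cached
        for level in self.levels:
            if f"{sig}|{level}" in self.filter:
                return level
        return self.default
```

For example, with `widths={"speed": 10}`, a rule instance `{"type": "collision", "speed": 50}` stored at priority 3 also matches an event `{"type": "collision", "speed": 57}`, since both discretize to the signature `speed=5;type=collision`.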


Mantis: A Fast, Small, and Exact Large-Scale Sequence-Search Index

10.1101/217372; 2017; Cited By ~ 7

Author(s): Prashant Pandey, Fatemeh Almodaresi, Michael A. Bender, Michael Ferdman, Rob Johnson, ...

Keyword(s): Large Scale, Bloom Filter, False Positives, Graph Representation, Bloom Filters, De Bruijn Graph, RNA-Seq, Performance Evaluation Index, Colored De Bruijn Graph, Scale Sequence

Abstract

Motivation: Sequence-level searches on large collections of RNA-seq experiments, such as the NIH Sequence Read Archive (SRA), would enable one to ask many questions about the expression or variation of a given transcript in a population. Bloom filter-based indexes and variants, such as the Sequence Bloom Tree, have been proposed in the past to solve this problem. However, these approaches suffer from fundamental limitations of the Bloom filter, resulting in slow build and query times, less-than-optimal space usage, and large numbers of false positives.

Results: This paper introduces Mantis, a space-efficient data structure that can be used to index thousands of raw-read experiments and facilitate large-scale sequence searches on those experiments. Mantis uses counting quotient filters instead of Bloom filters, enabling rapid index builds and queries, small indexes, and exact results, i.e., no false positives or negatives. Furthermore, Mantis is also a colored de Bruijn graph representation, so it supports fast graph traversal and other topological analyses in addition to large-scale sequence-level searches.

In our performance evaluation, index construction with Mantis is 4.4× faster and yields a 20% smaller index than the state-of-the-art split sequence Bloom tree (SSBT). For queries, Mantis is 6×–108× faster than SSBT and has no false positives or false negatives. For example, Mantis was able to search for all 200,400 known human transcripts in an index of 2652 human blood, breast, and brain RNA-seq experiments in one hour and 22 minutes; SBT took close to 4 days and AllSomeSBT took about eight hours.

Mantis is written in C++11 and is available at https://github.com/splatlab/mantis.
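Mantis answers queries exactly by mapping each k-mer to a color class, i.e., the set of experiments that contain it, and by storing each distinct color class only once. The sketch below reproduces that mapping with an ordinary Python dictionary standing in for the counting quotient filter, so it illustrates the query semantics rather than the space efficiency. The class and method names, the threshold parameter `theta` (report experiments that contain at least a fraction theta of the query's k-mers), and the omission of reverse-complement canonicalization are assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, FrozenSet, List, Set


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class ColorClassIndex:
    """Exact k-mer -> color-class index; a dict stands in for the counting quotient filter."""

    def __init__(self, k: int):
        self.k = k
        self.kmer_to_class: Dict[str, int] = {}
        self.classes: List[FrozenSet[str]] = []  # each distinct color class stored once
        self._class_ids: Dict[FrozenSet[str], int] = {}

    def build(self, experiments: Dict[str, str]) -> None:
        """Index experiments given as {experiment name: sequence}."""
        occurrences: DefaultDict[str, Set[str]] = defaultdict(set)
        for name, seq in experiments.items():
            for km in kmers(seq, self.k):
                occurrences[km].add(name)
        for km, names in occurrences.items():
            color = frozenset(names)
            if color not in self._class_ids:
                self._class_ids[color] = len(self.classes)
                self.classes.append(color)
            self.kmer_to_class[km] = self._class_ids[color]

    def query(self, seq: str, theta: float) -> List[str]:
        """Experiments containing at least a fraction `theta` of the query's k-mers."""
        query_kmers = kmers(seq, self.k)
        counts: DefaultDict[str, int] = defaultdict(int)
        for km in query_kmers:
            class_id = self.kmer_to_class.get(km)
            if class_id is not None:  # exact lookup: no false positives
                for name in self.classes[class_id]:
                    counts[name] += 1
        needed = theta * len(query_kmers)
        return sorted(name for name, c in counts.items() if c >= needed)
```

Because many k-mers are shared by exactly the same set of experiments, the number of distinct color classes is much smaller than the number of k-mers, which is what makes storing each class only once worthwhile.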


MetaProFi: A protein-based Bloom filter for storing and querying sequence data for accurate identification of functionally relevant genetic variants

10.1101/2021.08.12.456081; 2021

Author(s): Sanjay Kumar Srikakulam, Sebastian Keller, Fawaz Dabbaghie, Robert Bals, Olga V. Kalinina

Keyword(s): Sequence Data, False Negative, False Negative Rate, Bloom Filter, Nucleotide Sequences, Amino Acid Sequences, Bloom Filters, Accurate Identification, Biologically Relevant, Technological Advances

Technological advances in next-generation sequencing pose new computational challenges: methods are needed to store and query these data in time- and memory-efficient ways. We present MetaProFi (https://github.com/kalininalab/metaprofi), a Bloom filter-based tool that, in addition to supporting nucleotide sequences, can for the first time directly store and query amino acid sequences and translated nucleotide sequences, thus bringing sequence comparison to the more biologically relevant protein level. Owing to the properties of Bloom filters, it has a zero false-negative rate, allows for exact and inexact searches, and leverages disk storage and Zstandard compression to achieve high time and space efficiency. We demonstrate the utility of MetaProFi by indexing UniProtKB datasets at the organism and sequence levels, as well as the Tara Oceans dataset and 2585 human RNA-seq experiments, showing that MetaProFi consumes far less disk space than state-of-the-art tools while also improving performance.
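To make protein-level comparison concrete, the sketch below translates nucleotide reads with the standard genetic code and indexes amino acid k-mers in a Bloom filter; a read is scored by the best fraction of its protein k-mers found across the three forward reading frames. This is a hedged illustration of the general idea only: the class and function names, the parameter values, the frame-scoring rule, and the omission of reverse-complement frames are all assumptions, and MetaProFi's actual indexing (with disk storage and Zstandard compression) is not reproduced.

```python
import hashlib
from typing import List

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order:
# TTT, TTC, TTA, TTG, TCT, ..., GGG. '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(nt: str) -> str:
    """Translate a nucleotide string in frame 0; codons with non-ACGT symbols become 'X'."""
    nt = nt.upper()
    protein = []
    for i in range(0, len(nt) - 2, 3):
        codon = nt[i:i + 3]
        if any(b not in BASES for b in codon):
            protein.append("X")
        else:
            idx = 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])
            protein.append(AMINO_ACIDS[idx])
    return "".join(protein)


class ProteinKmerBloom:
    """Bloom filter over amino acid k-mers (illustrative, in-memory only)."""

    def __init__(self, k: int = 3, m_bits: int = 8192, n_hashes: int = 3):
        self.k, self.m, self.h = k, m_bits, n_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, kmer: str) -> List[int]:
        digest = hashlib.sha256(kmer.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.h)]

    def _kmers(self, protein: str) -> List[str]:
        return [protein[i:i + self.k] for i in range(len(protein) - self.k + 1)]

    def add_protein(self, protein: str) -> None:
        for kmer in self._kmers(protein):
            for p in self._positions(kmer):
                self.bits[p // 8] |= 1 << (p % 8)

    def contains(self, kmer: str) -> bool:
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(kmer))

    def best_frame_hit_fraction(self, read: str) -> float:
        """Max over the three forward frames of the fraction of protein k-mers found."""
        best = 0.0
        for frame in range(3):
            kmers = self._kmers(translate(read[frame:]))
            if kmers:
                hits = sum(self.contains(km) for km in kmers)
                best = max(best, hits / len(kmers))
        return best
```

A read that differs from an indexed coding sequence only by synonymous codon changes still scores 1.0 here, since it translates to the same amino acid k-mers; this is the practical benefit of comparing at the protein level.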


Efficient Update Control of Bloom Filter Replicas in Large Scale Distributed Systems

Evolutionary Approaches and Their Applications to Distributed Systems

Secure Privacy Preserving Record Linkage of Large Databases by Modified Bloom Filter Encodings

Energy-Efficient Data Transfers in Large-Scale Distributed Systems

An adaptive control mechanism for access control in large-scale distributed systems

A Failure Detection System for Large Scale Distributed Systems

A Failure Detection System for Large Scale Distributed Systems

findere: fast and precise approximate membership query

Summary Instance: Scalable Event Priority Determination Engine for Large-Scale Distributed Event-Based System

Mantis: A Fast, Small, and Exact Large-Scale Sequence-Search Index

MetaProFi: A protein-based Bloom filter for storing and querying sequence data for accurate identification of functionally relevant genetic variants
