MetaProFi: A protein-based Bloom filter for storing and querying sequence data for accurate identification of functionally relevant genetic variants

Technological advances in next-generation sequencing pose new computational challenges: methods are needed to store and query the resulting data in time- and memory-efficient ways. We present MetaProFi (https://github.com/kalininalab/metaprofi), a Bloom filter-based tool that, in addition to supporting nucleotide sequences, can for the first time directly store and query amino acid sequences and translated nucleotide sequences, thus bringing sequence comparison to a more biologically relevant protein level. Owing to the properties of Bloom filters, it has a zero false-negative rate and allows for exact and inexact searches; it also leverages disk storage and Zstandard compression to achieve high time and space efficiency. We demonstrate the utility of MetaProFi by indexing UniProtKB datasets at the organism and sequence levels, as well as the Tara Oceans dataset and 2585 human RNA-seq experiments, showing that MetaProFi consumes far less disk space than state-of-the-art tools while also improving performance.
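
The idea of indexing protein k-mers in a Bloom filter can be illustrated with a minimal sketch. This is not MetaProFi's implementation: the class `BloomFilter`, the helpers `kmers`, `index_sequence`, and `match_fraction`, the salted BLAKE2b hashing, and all parameter values are assumptions chosen for illustration. A query whose k-mers were all indexed always scores 1.0 (no false negatives), and a threshold below 1.0 on the score gives an inexact search.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative only)."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(sequence: str, k: int) -> list:
    """All overlapping substrings of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def index_sequence(sequence: str, k: int,
                   num_bits: int = 1 << 16, num_hashes: int = 3) -> BloomFilter:
    """Insert every k-mer of an amino acid sequence into a fresh filter."""
    bloom = BloomFilter(num_bits, num_hashes)
    for kmer in kmers(sequence, k):
        bloom.add(kmer)
    return bloom


def match_fraction(bloom: BloomFilter, query: str, k: int) -> float:
    """Fraction of query k-mers reported present (1.0 means exact match)."""
    query_kmers = kmers(query, k)
    if not query_kmers:
        return 0.0
    return sum(kmer in bloom for kmer in query_kmers) / len(query_kmers)


# Hypothetical amino acid sequence used only as example input.
protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
bloom = index_sequence(protein, k=5)
exact_score = match_fraction(bloom, protein[3:20], k=5)
```

Because inserted k-mers always set their own bits, `exact_score` is 1.0 for any substring of the indexed sequence; unrelated queries may still score above zero through false positives.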

A Method of Segmenting Apples Based on Gray-Centered RGB Color Space

Remote Sensing ◽

10.3390/rs13061211 ◽

2021 ◽

Vol 13 (6) ◽

pp. 1211

Author(s):

Pan Fan ◽

Guodong Lang ◽

Bin Yan ◽

Xiaoyan Lei ◽

Pengju Guo ◽

...

Keyword(s):

Vision System ◽

False Positive Rate ◽

Clustering Algorithms ◽

False Negative ◽

Color Space ◽

False Negative Rate ◽

Feature Selection Method ◽

Image Features ◽

Accurate Identification ◽

Rgb Color Space

In recent years, many agriculture-related problems have been evaluated with the integration of artificial intelligence techniques and remote sensing systems. The rapid and accurate identification of apple targets in an illuminated and unstructured natural orchard is still a key challenge for the picking robot’s vision system. In this paper, by combining local image features and color information, we propose a pixel patch segmentation method based on gray-centered red–green–blue (RGB) color space to address this issue. Unlike existing methods, it introduces a novel color feature selection method that accounts for the influence of illumination and shadow in apple images. By exploring both color features and local variation in apple images, the proposed method can effectively distinguish apple fruit pixels from other pixels. Compared with classical segmentation methods, conventional clustering algorithms, and popular deep-learning segmentation algorithms, the proposed method segments apple images more accurately and effectively. The proposed method was tested on 180 apple images. It offered an average accuracy rate of 99.26%, recall rate of 98.69%, false positive rate of 0.06%, and false negative rate of 1.44%. Experimental results demonstrate the outstanding performance of the proposed method.

Techniques for the verification of minimal phylogenetic trees illustrated with ten mammalian haemoglobin sequences

Biochemical Journal ◽

10.1042/bj1870065 ◽

1980 ◽

Vol 187 (1) ◽

pp. 65-74 ◽

Cited By ~ 12

Author(s):

D Penny ◽

M D Hendy ◽

L R Foulds

Keyword(s):

Amino Acid ◽

Phylogenetic Tree ◽

Protein Sequence ◽

Phylogenetic Trees ◽

Sequence Data ◽

Protein Sequences ◽

Nucleotide Sequences ◽

Amino Acid Sequences ◽

Minimal Tree ◽

Protein Sequence Data

We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127–150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151–166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. The conversion of the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods have involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented in the present paper progressively builds up a tree, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally, we define a measure of the complexity of the data and illustrate a method to derive a directed phylogenetic tree from the minimal tree.

Efficient Update Control of Bloom Filter Replicas in Large Scale Distributed Systems

Handbook of Research on Scalable Computing Technologies ◽

10.4018/978-1-60566-661-7.ch034 ◽

2010 ◽

pp. 785-807 ◽

Cited By ~ 2

Author(s):

Yifeng Zhu ◽

Hong Jiang

Keyword(s):

Distributed Systems ◽

Large Scale ◽

Control Mechanism ◽

False Negative ◽

Bloom Filter ◽

Analytical Models ◽

Bloom Filters ◽

Distributed Environment ◽

Membership Query ◽

Efficient Data

This chapter discusses the false rates of Bloom filters in a distributed environment. A Bloom filter (BF) is a space-efficient data structure that supports probabilistic membership queries. In distributed systems, a Bloom filter is often used to summarize local services or objects, and this Bloom filter is replicated to remote hosts. This allows remote hosts to perform fast membership queries without contacting the original host. However, when the services or objects change, the remote Bloom filter replica may become stale. This chapter analyzes the impact of staleness on the false positive and false negative rates of membership queries on a Bloom filter replica. An efficient update control mechanism is then proposed, based on the analytical results, to minimize the updating overhead. This chapter validates the analytical models and the update control mechanism through simulation experiments.
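
The effect of staleness described above can be sketched in a few lines. This is a toy illustration, not the chapter's analytical model: the service names, the filter size, and the helpers `positions`, `build_filter`, and `query` are all hypothetical. A replica is snapshotted, the local set then changes, and the replica's answers diverge from the current state.

```python
import hashlib


def positions(item: str, num_bits: int, num_hashes: int):
    """Yield num_hashes bit positions for item using seeded SHA-256."""
    for seed in range(num_hashes):
        digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % num_bits


def build_filter(items, num_bits: int = 1024, num_hashes: int = 3) -> list:
    """Build a Bloom filter (as a list of booleans) summarizing items."""
    bits = [False] * num_bits
    for item in items:
        for pos in positions(item, num_bits, num_hashes):
            bits[pos] = True
    return bits


def query(bits: list, item: str, num_hashes: int = 3) -> bool:
    """Return True if the filter reports item as possibly present."""
    return all(bits[pos] for pos in positions(item, len(bits), num_hashes))


# Hypothetical local services on the original host.
local_services = {"svc-a", "svc-b", "svc-c"}

# Snapshot replicated to a remote host.
replica = build_filter(local_services)

# The local set changes after replication; the replica is now stale.
local_services.add("svc-d")
local_services.discard("svc-a")

# Removed item: the stale replica still reports it (false positive).
stale_false_positive = "svc-a" not in local_services and query(replica, "svc-a")

# Added item: the stale replica most likely misses it (false negative
# relative to the current state), unless the filter itself collides.
stale_false_negative = "svc-d" in local_services and not query(replica, "svc-d")
```

A fresh, non-stale Bloom filter never yields false negatives; here the false negative arises only because the replica lags behind the original host.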

Improving Bloom Filter Performance on Sequence Data Using k-mer Bloom Filters

Lecture Notes in Computer Science - Research in Computational Molecular Biology ◽

10.1007/978-3-319-31957-5_10 ◽

2016 ◽

pp. 137-151 ◽

Cited By ~ 2

Author(s):

David Pellow ◽

Darya Filippova ◽

Carl Kingsford

Keyword(s):

Sequence Data ◽

Bloom Filter ◽

Bloom Filters ◽

Filter Performance

A Temporal and Spatial Data Redundancy Processing Algorithm for RFID Surveillance Data

Wireless Communications and Mobile Computing ◽

10.1155/2020/6937912 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Siye Wang ◽

Ziwen Cao ◽

Yanfang Zhang ◽

Weiqing Huang ◽

Jianguo Jiang

Keyword(s):

Radio Frequency Identification ◽

Spatial Data ◽

False Positive Rate ◽

False Negative ◽

False Negative Rate ◽

Bloom Filter ◽

Data Redundancy ◽

Filter Performance ◽

Redundant Data ◽

Rfid Data

The Radio Frequency Identification (RFID) data acquisition rate used for monitoring is so high that the RFID data stream contains a large amount of redundant data, which increases the system overhead. To balance the accuracy and real-time performance of monitoring, it is necessary to filter out redundant RFID data. We propose an algorithm called Time-Distance Bloom Filter (TDBF) that takes into account the read time and read distance of RFID tags, which greatly reduces data redundancy. In addition, we propose a metric for evaluating filter performance. In experiments, we found that the performance score of the TDBF algorithm was 5.2, while the Time Bloom Filter (TBF) score was only 0.03, which indicates that the TDBF algorithm can achieve a lower false negative rate, lower false positive rate, and higher data compression rate. Furthermore, in a dynamic scenario, the TDBF algorithm can filter out valid data according to the actual scenario requirements.

Improving Bloom Filter Performance on Sequence Data Using k-mer Bloom Filters

Journal of Computational Biology ◽

10.1089/cmb.2016.0155 ◽

2017 ◽

Vol 24 (6) ◽

pp. 547-557 ◽

Cited By ~ 7

Author(s):

David Pellow ◽

Darya Filippova ◽

Carl Kingsford

Keyword(s):

Sequence Data ◽

Bloom Filter ◽

Bloom Filters ◽

Filter Performance

Prednis-OH NO! – A Case of Anaphylaxis Induced by Prednisone

10.21203/rs.3.rs-120496/v1 ◽

2020 ◽

Author(s):

Nicholas Chapman ◽

Taylor Caddell ◽

Rage Geringer ◽

Greg Hicks

Keyword(s):

Case Report ◽

Skin Testing ◽

False Negative ◽

False Negative Rate ◽

Lymphocyte Transformation ◽

Systemic Circulation ◽

Primary Objective ◽

Accurate Identification ◽

High False Negative Rate ◽

Threatening Condition

Abstract Background: Anaphylaxis is a potentially life-threatening condition caused by the sudden release of inflammatory mediators into the systemic circulation. Among this condition’s etiologies, corticosteroid-induced anaphylaxis, despite being uncommon, should receive due consideration given the frequency of steroid use in various settings. Any patient that presents with shortness of breath, wheezing, hypotension, urticaria, or other characteristic signs of anaphylaxis following the administration of steroids should be promptly evaluated. Because of the potentially fatal nature of anaphylaxis, clinicians must be familiar with the presentation, diagnosis, and management of the reaction. Case Report: The primary objective of this case report is to discuss an example of such a reaction in a 21-year-old female with a past medical history of anxiety, depression, and alcoholism who presented with anaphylaxis following prednisone use, as well as the proposed pathophysiology and management thereafter. She was managed with intravenous epinephrine and diphenhydramine with complete resolution of her symptoms. She was subsequently discharged with an EpiPen, cetirizine, and advised to establish care with an allergist for follow up and additional allergy testing. To complete this case report, we performed a review of current primary literature on the subject. Conclusions: Though uncertain, many potential mechanisms of sensitization to corticosteroids were identified, including haptenization, preservatives, excipients, and conjugated esters. Various means exist to aid in diagnosis, such as skin testing, immunoCAP assays, lymphocyte transformation tests, basophil activation tests, and graded drug challenges, though these tests are associated with a high false negative rate. Accurate identification of the causative agent is crucial in facilitating avoidance or rapid desensitization prior to future corticosteroid use.

MetaProFi: A Protein-Based Bloom Filter for Storing and Querying Sequence Data for Accurate Identification of Functionally Relevant Genetic Variants

SSRN Electronic Journal ◽

10.2139/ssrn.3936041 ◽

2021 ◽

Author(s):

Sanjay K. Srikakulam ◽

Sebastian Keller ◽

Fawaz Dabbaghie ◽

Robert Bals ◽

Olga V. Kalinina

Keyword(s):

Genetic Variants ◽

Sequence Data ◽

Bloom Filter ◽

Accurate Identification

Molecular Analysis of CAP59 Gene Sequences from Five Serotypes of Cryptococcus neoformans

Journal of Clinical Microbiology ◽

10.1128/jcm.38.3.992-995.2000 ◽

2000 ◽

Vol 38 (3) ◽

pp. 992-995 ◽

Cited By ~ 15

Author(s):

Yuka Nakamura ◽

Rui Kano ◽

Shinichi Watanabe ◽

Atsuhiko Hasegawa

Keyword(s):

Cryptococcus Neoformans ◽

Molecular Analysis ◽

Phylogenetic Relationships ◽

Mixed Type ◽

Sequence Data ◽

Nucleotide Sequences ◽

Amino Acid Sequences ◽

Nucleotide Sequence Data ◽

Serotype A ◽

Serotype B

The nucleotide sequences of CAP59 genes from five serotypes of Cryptococcus neoformans were analyzed for their phylogenetic relationships. Approximately 600-bp genomic DNA fragments of the CAP59 gene were amplified from each isolate by PCR and sequenced. The CAP59 nucleotide sequences of C. neoformans showed more than 90% similarity among the five serotypes. By phylogenetic analysis, their sequences were divided into three clusters: serotypes A and AD, serotypes B and C, and serotype D. In addition, the results for the deduced amino acid sequences were similar to the nucleotide sequence data. These data revealed that serotype AD was genetically closer to serotype A than to serotype D, although it had been considered to be a mixed type of serotypes A and D by serological analysis. Furthermore, the nucleotide sequences of the serotype B and C isolates of C. neoformans were very similar to each other. These results indicated that serotype B and C isolates belonging to C. neoformans var. gattii were genetically homogeneous and closely related. The molecular analysis of the CAP59 gene will provide useful information for the differentiation of serotypes of C. neoformans and for an understanding of their phylogenetic relationships.

Locality Sensitive Hardware Signature Variants for Hardware Transactional Memory

10.21203/rs.3.rs-439549/v1 ◽

2021 ◽

Author(s):

Sangeetha R ◽

Satyanarayana Vollala ◽

Ramasubramanian N

Keyword(s):

False Positive ◽

Transactional Memory ◽

False Positive Rate ◽

False Negative ◽

Bloom Filter ◽

Conflict Detection ◽

Bloom Filters ◽

Positive Rate ◽

Memory Conflicts ◽

Probabilistic Data Structure

Abstract Lock-based techniques have their own limitations, such as priority inversion, convoying, and deadlock. Lock-free techniques overcome these limitations. Transactional memory (TM) is a leading lock-free technique used in recent multi-core processors such as Intel Haswell and IBM BlueGene/Q. TM must perform data versioning and conflict detection. For conflict detection, a probabilistic data structure called the Bloom filter (BF) is used, in the form of Bloom filter-based hardware signatures. In TM, shared-memory conflicts such as RAW, WAR, and WAW hazards are handled by the Bloom filter. Hardware signatures store memory addresses in hashed form in Bloom filters. Bloom filters are easy-to-use, performance-efficient data structures that can produce false positives but never false negatives. Locality-sensitive hardware signatures reduce filter occupancy by sharing bits among contiguous memory addresses, which in turn reduces the false positive rate. This paper implements the existing H3 – HS and LS – HS signatures proposed by Ricardo Quislant et al. [13], and additionally proposes RS – HS, CS – HS, and RO – HS. RO – HS spreads addresses equally among Bloom filters, thereby reducing filter occupancy, which in turn leads to a better false positive rate.
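
The link between filter occupancy and false positive rate can be made explicit with the standard Bloom filter approximation. The symbols are not taken from the abstract: m (number of bits), n (number of inserted addresses), k (number of hash functions), and ρ (fraction of bits set) are introduced here for illustration, and the hashes are assumed to be independent and uniform.

```latex
\rho \approx 1 - e^{-kn/m}, \qquad P_{\mathrm{fp}} \approx \rho^{k}
```

Under these assumptions, any scheme that lowers the occupancy ρ for the same number of stored addresses, for instance by letting contiguous addresses share bits, lowers the false positive rate.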
