MetaProFi: A protein-based Bloom filter for storing and querying sequence data for accurate identification of functionally relevant genetic variants

2021 ◽  
Author(s):  
Sanjay Kumar Srikakulam ◽  
Sebastian Keller ◽  
Fawaz Dabbaghie ◽  
Robert Bals ◽  
Olga V. Kalinina

Technological advances in next-generation sequencing present new computational challenges: methods are needed to store and query these data in time- and memory-efficient ways. We present MetaProFi (https://github.com/kalininalab/metaprofi), a Bloom filter-based tool that, in addition to supporting nucleotide sequences, can for the first time directly store and query amino acid sequences and translated nucleotide sequences, thus bringing sequence comparison to a more biologically relevant protein level. Owing to the properties of Bloom filters, it has a zero false-negative rate, allows for exact and inexact searches, and leverages disk storage and Zstandard compression to achieve high time and space efficiency. We demonstrate the utility of MetaProFi by indexing UniProtKB datasets at the organism and sequence levels, as well as the Tara Oceans dataset and 2585 human RNA-seq experiments, showing that MetaProFi consumes far less disk space than state-of-the-art tools while also improving performance.
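To make the k-mer indexing idea concrete, here is a minimal sketch of a Bloom filter over amino acid k-mers. It is not MetaProFi's implementation: the class and function names, the filter size, the number of hash functions, and the use of salted BLAKE2b hashing are all assumptions chosen for illustration, and the sketch leaves out disk storage and Zstandard compression.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter backed by a bytearray bit vector (illustrative)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive one bit position per hash function by salting BLAKE2b
        # with the hash index (an assumed scheme, not taken from the paper).
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(sequence: str, k: int):
    """Yield every overlapping k-mer of a sequence."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def index_sequence(sequence: str, k: int, num_bits: int = 4096,
                   num_hashes: int = 3) -> BloomFilter:
    """Insert all k-mers of one amino acid sequence into a new filter."""
    bf = BloomFilter(num_bits, num_hashes)
    for kmer in kmers(sequence, k):
        bf.add(kmer)
    return bf


def match_fraction(bf: BloomFilter, query: str, k: int) -> float:
    """Fraction of the query's k-mers that the filter reports as present."""
    query_kmers = list(kmers(query, k))
    if not query_kmers:
        return 0.0
    return sum(q in bf for q in query_kmers) / len(query_kmers)
```

A query that is a true substring of the indexed sequence always scores 1.0, which is the zero false-negative property; an unrelated query may still score above zero because of false positives. Thresholding `match_fraction` below 1.0 is one simple way to approximate an inexact search.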

2021 ◽  
Vol 13 (6) ◽  
pp. 1211
Author(s):  
Pan Fan ◽  
Guodong Lang ◽  
Bin Yan ◽  
Xiaoyan Lei ◽  
Pengju Guo ◽  
...  

In recent years, many agriculture-related problems have been evaluated with the integration of artificial intelligence techniques and remote sensing systems. The rapid and accurate identification of apple targets in an illuminated and unstructured natural orchard is still a key challenge for the picking robot’s vision system. In this paper, by combining local image features and color information, we propose a pixel patch segmentation method based on gray-centered red–green–blue (RGB) color space to address this issue. Different from the existing methods, this method presents a novel color feature selection method that accounts for the influence of illumination and shadow in apple images. By exploring both color features and local variation in apple images, the proposed method could effectively distinguish the apple fruit pixels from other pixels. Compared with the classical segmentation methods and conventional clustering algorithms as well as the popular deep-learning segmentation algorithms, the proposed method can segment apple images more accurately and effectively. The proposed method was tested on 180 apple images. It offered an average accuracy rate of 99.26%, recall rate of 98.69%, false positive rate of 0.06%, and false negative rate of 1.44%. Experimental results demonstrate the outstanding performance of the proposed method.


1980 ◽  
Vol 187 (1) ◽  
pp. 65-74 ◽  
Author(s):  
D Penny ◽  
M D Hendy ◽  
L R Foulds

We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127–150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151–166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. The conversion of the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods have involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented in the present paper progressively builds up a tree, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally, we define a measure of the complexity of the data and illustrate a method to derive a directed phylogenetic tree from the minimal tree.


Author(s):  
Yifeng Zhu ◽  
Hong Jiang

This chapter discusses the false positive and false negative rates of Bloom filters in a distributed environment. A Bloom filter (BF) is a space-efficient data structure that supports probabilistic membership queries. In distributed systems, a Bloom filter is often used to summarize local services or objects, and this Bloom filter is replicated to remote hosts. This allows remote hosts to perform fast membership queries without contacting the original host. However, when the services or objects change, the remote Bloom filter replica may become stale. This chapter analyzes the impact of staleness on the false positive and false negative rates of membership queries on a Bloom filter replica. An efficient update control mechanism is then proposed, based on the analytical results, to minimize the updating overhead. This chapter validates the analytical models and the update control mechanism through simulation experiments.
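The effect of staleness described above can be sketched in a few lines. This is a hypothetical illustration rather than the chapter's model or update mechanism: the filter parameters, the SHA-256-based hashing, and the service names are assumptions.

```python
import hashlib


class BloomFilter:
    """Bloom filter stored in a single Python integer (illustrative)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3,
                 bits: int = 0) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bits

    def _positions(self, item: str):
        # One position per hash function, derived by prefixing the index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def snapshot(self) -> "BloomFilter":
        """Return a copy, standing in for the replica sent to a remote host."""
        return BloomFilter(self.num_bits, self.num_hashes, self.bits)


def build_filter(services) -> BloomFilter:
    bf = BloomFilter()
    for service in services:
        bf.add(service)
    return bf


# The original host summarizes its services and ships a replica.
local_services = {"printer", "scanner"}
local = build_filter(local_services)
replica = local.snapshot()

# The host then changes: one service is removed and one is added.
# A plain Bloom filter cannot delete, so the local filter is rebuilt.
local_services = (local_services - {"scanner"}) | {"camera"}
local = build_filter(local_services)

# Always True: the replica still holds the bits of the removed service.
stale_false_positive = replica.might_contain("scanner")

# True unless "camera" happens to hash onto bits already set in the replica.
stale_false_negative = not replica.might_contain("camera")
```

The replica answers queries about the host's state at snapshot time, so removals surface as extra false positives and additions as false negatives, an error type that an up-to-date Bloom filter never produces.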


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Siye Wang ◽  
Ziwen Cao ◽  
Yanfang Zhang ◽  
Weiqing Huang ◽  
Jianguo Jiang

The Radio Frequency Identification (RFID) data acquisition rate used for monitoring is so high that the RFID data stream contains a large amount of redundant data, which increases the system overhead. To balance the accuracy and real-time performance of monitoring, it is necessary to filter out redundant RFID data. We propose an algorithm called Time-Distance Bloom Filter (TDBF) that takes into account the read time and read distance of RFID tags, which greatly reduces data redundancy. In addition, we have proposed a measurement of the filter performance evaluation indicators. In experiments, we found that the performance score of the TDBF algorithm was 5.2, while the Time Bloom Filter (TBF) score was only 0.03, which indicates that the TDBF algorithm can achieve a lower false negative rate, lower false positive rate, and higher data compression rate. Furthermore, in a dynamic scenario, the TDBF algorithm can filter out valid data according to the actual scenario requirements.


2017 ◽  
Vol 24 (6) ◽  
pp. 547-557 ◽  
Author(s):  
David Pellow ◽  
Darya Filippova ◽  
Carl Kingsford

2020 ◽  
Author(s):  
Nicholas Chapman ◽  
Taylor Caddell ◽  
Rage Geringer ◽  
Greg Hicks

Abstract Background: Anaphylaxis is a potentially life-threatening condition caused by the sudden release of inflammatory mediators into the systemic circulation. Among this condition’s etiologies, corticosteroid-induced anaphylaxis, despite being uncommon, should receive due consideration given the frequency of steroid use in various settings. Any patient that presents with shortness of breath, wheezing, hypotension, urticaria, or other characteristic signs of anaphylaxis following the administration of steroids should be promptly evaluated. Because of the potentially fatal nature of anaphylaxis, clinicians must be familiar with the presentation, diagnosis, and management of the reaction. Case Report: The primary objective of this case report is to discuss an example of such a reaction in a 21-year-old female with a past medical history of anxiety, depression, and alcoholism who presented with anaphylaxis following prednisone use, as well as the proposed pathophysiology and management thereafter. She was managed with intravenous epinephrine and diphenhydramine with complete resolution of her symptoms. She was subsequently discharged with an EpiPen, cetirizine, and advised to establish care with an allergist for follow up and additional allergy testing. To complete this case report, we performed a review of current primary literature on the subject. Conclusions: Though uncertain, many potential mechanisms of sensitization to corticosteroids were identified, including haptenization, preservatives, excipients, and conjugated esters. Various means exist to aid in diagnosis, such as skin testing, immunoCAP assays, lymphocyte transformation tests, basophil activation tests, and graded drug challenges, though these tests are associated with a high false negative rate. Accurate identification of the causative agent is crucial in facilitating avoidance or rapid desensitization prior to future corticosteroid use.


2000 ◽  
Vol 38 (3) ◽  
pp. 992-995 ◽  
Author(s):  
Yuka Nakamura ◽  
Rui Kano ◽  
Shinichi Watanabe ◽  
Atsuhiko Hasegawa

The nucleotide sequences of CAP59 genes from five serotypes of Cryptococcus neoformans were analyzed for their phylogenetic relationships. Approximately 600-bp genomic DNA fragments of the CAP59 gene were amplified from each isolate by PCR and sequenced. The CAP59 nucleotide sequences of C. neoformans showed more than 90% similarity among the five serotypes. By phylogenetic analysis, their sequences were divided into three clusters: serotypes A and AD, serotypes B and C, and serotype D. In addition, the results of reduced amino acid sequences were similar to the nucleotide sequence data. These data revealed that serotype AD was genetically close to serotype A rather than serotype D, although it had been considered to be a mixed type of serotype A and D by serological analysis. Furthermore, the nucleotide sequences of the serotype B and C isolates of C. neoformans were very similar to each other. These results indicated that serotype B and C isolates belonging to C. neoformans var. gattii were genetically homogeneous and closely related. The molecular analysis of the CAP59 gene will provide useful information for the differentiation of serotypes of C. neoformans and for an understanding of their phylogenetic relationships.


2021 ◽  
Author(s):  
sangeetha r ◽  
Satyanarayana Vollala ◽  
Ramasubramanian N

Abstract Lock-based techniques have their own limitations, such as priority inversion, convoying, and deadlock. Lock-free techniques overcome these limitations. Transactional memory (TM) is a leading lock-free technique used in recent multi-core processors such as Intel Haswell and IBM BlueGene/Q. TM has to perform data versioning and conflict detection. For conflict detection, probabilistic data structures called Bloom filters are used, in the form of Bloom filter-based hardware signatures. In TM, shared-memory conflicts such as RAW, WAR, and WAW hazards are handled by the Bloom filter (BF). Hardware signatures store memory addresses in hashed form in Bloom filters. Bloom filters are easy-to-use, performance-efficient data structures that can produce false positives but never false negatives. Locality-sensitive hardware signatures reduce filter occupancy by sharing bits among contiguous memory addresses, which in turn reduces the false positive rate. This paper implements the existing H3 – HS and LS – HS proposed by Ricardo Quislant et al. [13]. It also proposes RS – HS, CS – HS, and RO – HS. RO – HS spreads addresses equally among Bloom filters, thereby reducing filter occupancy. In turn, reduced filter occupancy leads to a better false positive rate.
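As a rough software analogy for signature-based conflict detection, the sketch below keeps one read signature and one write signature per transaction and checks incoming accesses against them. It does not reproduce H3 – HS, LS – HS, or any of the proposed designs: the signature width, the number of hash functions, the SHA-256-based hashing, and all names are assumptions made for illustration.

```python
import hashlib

SIGNATURE_BITS = 256  # assumed signature width
NUM_HASHES = 2        # assumed number of hash functions


def address_mask(address: int) -> int:
    """Return a bit mask with up to NUM_HASHES bits set for one address."""
    mask = 0
    for i in range(NUM_HASHES):
        digest = hashlib.sha256(f"{i}:{address}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:4], "big") % SIGNATURE_BITS)
    return mask


class Signature:
    """Bloom filter over memory addresses, stored as one integer."""

    def __init__(self) -> None:
        self.bits = 0

    def insert(self, address: int) -> None:
        self.bits |= address_mask(address)

    def may_contain(self, address: int) -> bool:
        mask = address_mask(address)
        return (self.bits & mask) == mask


class Transaction:
    """Tracks the read and write sets of one running transaction."""

    def __init__(self) -> None:
        self.read_sig = Signature()
        self.write_sig = Signature()

    def read(self, address: int) -> None:
        self.read_sig.insert(address)

    def write(self, address: int) -> None:
        self.write_sig.insert(address)


def conflicts(tx: Transaction, address: int, is_write: bool) -> bool:
    """Check another thread's access against the signatures of tx."""
    # Any incoming access to an address tx has written is a conflict
    # (RAW for an incoming read, WAW for an incoming write).
    if tx.write_sig.may_contain(address):
        return True
    # An incoming write to an address tx has read is a WAR conflict.
    return is_write and tx.read_sig.may_contain(address)
```

Because the signatures are Bloom filters, a real conflict is never missed, but unrelated addresses can alias onto the same bits and trigger a spurious abort. Lower filter occupancy makes such false conflicts less likely.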

