BionetBF: A Novel Bloom Filter for Faster Membership Identification of Paired Biological Network Data

2021 ◽  
Author(s):  
Sabuzima Nayak ◽  
Ripon Patgiri

A biological network represents the interactions or relationships between biological entities, such as proteins and genes, involved in a biological process. A biological network with thousands of millions of vertices is complex and challenging to process. In this article, we propose a novel Bloom Filter for biological networks, called BionetBF, to provide fast membership identification of biological network edges or paired biological data. BionetBF can execute millions of operations per second on datasets containing millions of paired biological data items while occupying a tiny amount of main memory. We have conducted rigorous experiments on large datasets to evaluate the performance of BionetBF, using 12 generated datasets and three biological network datasets. BionetBF demonstrates high performance while maintaining a false positive probability of 0.001. BionetBF is also compared with two other filters, Cuckoo Filter and Libbloom. It outperforms both while using less memory than the larger Cuckoo Filter and Libbloom filters.
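The abstract describes a Bloom Filter keyed on pairs of biological entities, that is, network edges. The sketch below shows the general idea with a standard Bloom filter over (source, target) pairs, sized for the 0.001 false positive probability quoted above. It is not BionetBF's design. The class name, the SHA-256 double-hashing scheme and the pair encoding are all illustrative assumptions.

```python
import hashlib
import math


class PairBloomFilter:
    """Hypothetical standard Bloom filter over (source, target) pairs; not BionetBF itself."""

    def __init__(self, expected_items: int, fpp: float = 0.001) -> None:
        # Classic sizing: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions.
        self.m = math.ceil(-expected_items * math.log(fpp) / (math.log(2) ** 2))
        self.k = max(1, round(self.m / expected_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, u: str, v: str):
        # Encode the ordered pair unambiguously, then derive k bit positions by double hashing.
        digest = hashlib.sha256(f"{u}\x00{v}".encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # odd step size
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, u: str, v: str) -> None:
        for pos in self._positions(u, v):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, pair) -> bool:
        u, v = pair
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(u, v))


bf = PairBloomFilter(expected_items=1000, fpp=0.001)
bf.add("TP53", "MDM2")
print(("TP53", "MDM2") in bf)  # True: Bloom filters have no false negatives
```

Pairs are treated as ordered, which suits directed edges. For an undirected network, one option is to sort the two identifiers before hashing so that (u, v) and (v, u) map to the same bits.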

2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Reagon Karki ◽  
Alpha Tom Kodamullil ◽  
Charles Tapley Hoyt ◽  
Martin Hofmann-Apitius

Abstract Background Literature-derived knowledge assemblies have been used as an effective way of representing biological phenomena and understanding disease etiology in systems biology. These include canonical pathway databases such as KEGG, Reactome and WikiPathways and disease-specific network inventories such as the causal biological networks database, PD map and NeuroMMSig. The knowledge represented in these resources is qualitative and focuses mainly on the causal relationships between biological entities. Genes, the major constituents of knowledge representations, tend to be expressed differentially in different conditions such as cell types, brain regions and disease stages. A classical approach to interpreting a knowledge assembly is to explore the expression patterns of individual genes. However, an approach that quantifies the overall impact of differentially expressed genes on the corresponding network is still lacking. Results Using the concept of heat diffusion, we have devised an algorithm that calculates the magnitude of regulation of a biological network from expression datasets. We demonstrate that molecular mechanisms specific to Alzheimer's disease (AD) and Parkinson's disease (PD) are regulated with different intensities across spatial and temporal resolutions. Our approach shows that mitochondrial dysfunction in PD is severe in the cortex and at advanced stages of PD. Similarly, we show that the intensity of aggregation of neurofibrillary tangles (NFTs) in AD increases as the disease progresses. This finding is consistent with previous studies that describe the burden of NFTs across the stages of AD. Conclusions This study is one of the first attempts to quantify mechanisms represented as biological networks. We have been able to quantify the magnitude of regulation of a biological network and to show that the magnitudes differ across spatial and temporal resolutions.
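Heat diffusion on a network is commonly modelled as dh/dt = -L h, where L = D - A is the graph Laplacian. Its solution is h(t) = exp(-tL) h(0). The sketch below approximates that solution with explicit Euler steps. The initial heat is assumed, hypothetically, to be each gene's absolute log2 fold change. This illustrates only the general heat-diffusion technique. The authors' scoring of "magnitude of regulation" is not reproduced here.

```python
def heat_diffusion(adj, initial_heat, t=1.0, steps=100):
    """Approximate h(t) = exp(-t L) h(0), with L = D - A, using explicit Euler steps.

    adj: symmetric adjacency matrix as a list of lists.
    initial_heat: one non-negative value per node (hypothetically |log2 fold change|).
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    dt = t / steps  # stable while dt * (largest Laplacian eigenvalue) < 2
    h = [float(x) for x in initial_heat]
    for _ in range(steps):
        # (L h)_i = deg_i * h_i - sum_j A_ij * h_j
        lap_h = [degree[i] * h[i] - sum(adj[i][j] * h[j] for j in range(n)) for i in range(n)]
        h = [h[i] - dt * lap_h[i] for i in range(n)]
    return h


# Path graph A - B - C; only gene A is (hypothetically) differentially expressed.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
heat = heat_diffusion(adj, [2.0, 0.0, 0.0], t=1.0)
print([round(x, 3) for x in heat])
```

Total heat is conserved because every column of L sums to zero, so heat only redistributes from A towards B and C. The explicit loop keeps the example dependency-free. `scipy.linalg.expm` would compute the exact matrix exponential instead.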


Author(s):  
Lun Hu ◽  
Jun Zhang ◽  
Xiangyu Pan ◽  
Hong Yan ◽  
Zhu-Hong You

Abstract Motivation Clustering analysis in a biological network aims to group biological entities into functional modules, thus providing valuable insight into complex biological systems. Existing clustering techniques use lower-order connectivity patterns at the level of individual biological entities and their connections, but few of them take into account higher-order connectivity patterns at the level of small network motifs. Results Here, we present a novel clustering framework, namely HiSCF, to identify functional modules based on the higher-order structure information available in a biological network. Taking advantage of a higher-order Markov stochastic process, HiSCF performs clustering analysis by exploiting a variety of network motifs. When compared with several state-of-the-art clustering models, HiSCF yields the best accuracy for two practical clustering applications, i.e. protein complex identification and gene co-expression module detection. The promising performance of HiSCF demonstrates that considering higher-order network motifs provides new insight into the analysis of biological networks, such as the identification of overlapping protein complexes and the inference of new signaling pathways. It also reveals the rich higher-order organizational structures present in biological networks. Availability and implementation HiSCF is available at https://github.com/allenv5/HiSCF. Supplementary information Supplementary data are available at Bioinformatics online.
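Many motif-based methods start from a common step: re-weighting each edge by how many instances of a motif it participates in. The simplest motif is the triangle. The sketch below computes these triangle weights for an undirected edge list. It does not implement HiSCF's higher-order Markov process, and the node names are hypothetical.

```python
def triangle_motif_weights(edges):
    """Weight each undirected edge by the number of triangles (3-cliques) containing it."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    weights = {}
    for u, v in edges:
        # Each common neighbour w closes one triangle u-v-w.
        weights[tuple(sorted((u, v)))] = len(neighbors[u] & neighbors[v])
    return weights


edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
print(triangle_motif_weights(edges))
```

Here A, B and C form a triangle, so each of their edges gets weight 1. The pendant edge C-D gets weight 0. A motif-aware clustering would therefore keep A, B and C together and treat D as loosely attached, even though the plain edge list connects all four nodes.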


2010 ◽  
Vol 7 (2) ◽  
Author(s):  
Benjamin Kormeier ◽  
Klaus Hippe ◽  
Patrizio Arrigo ◽  
Thoralf Töpel ◽  
Sebastian Janowski ◽  
...  

Summary For the implementation of the virtual cell, the fundamental question is how to model and simulate complex biological networks. Biological data integration, based on relevant molecular databases and information systems, is therefore an essential step in constructing biological networks. In this paper, we present three applications: BioDWH, an integration toolkit for building life science data warehouses; CardioVINEdb, an information system for biological data in cardiovascular disease; and VANESA, a network editor for modeling and simulation of biological networks. Based on this integration process, the system supports the generation of biological network models. A case study of a gene-regulatory biological network related to cardiovascular disease is also presented.


2019 ◽  
Author(s):  
Ripon Patgiri ◽  
Sabuzima Nayak ◽  
Samir Kumar Borgohain

Bloom Filter is a data structure for membership queries that is deployed in diverse research domains to boost system performance and to lower on-chip memory consumption. However, a high-accuracy Bloom Filter that does not compromise performance or memory space is still lacking. Moreover, scaling increases both memory consumption and time complexity. Therefore, in this paper, we present a novel Bloom Filter, called accurate Bloom Filter (acBF), which features: a) an impressive guaranteed accuracy of 99.98%, b) a maximum false positive probability of 0.00015, c) lower collision probability, d) freedom from false negatives, e) optimal insertion and membership query cost, and g) ≤ 8 bits of memory consumption per item. acBF deploys eight multidimensional Bloom Filters, which eliminate false positives in eight stages without sacrificing system performance. We have conducted rigorous experiments to validate the accuracy of acBF, which is unprecedentedly high. acBF is also compared with Scalable Bloom Filter (SBF) and Cuckoo Filter (CF). Experiments show that acBF outperforms SBF and CF in terms of accuracy and scalability. Moreover, acBF outperforms CF in lookup operations, whereas CF outperforms acBF in insertion. However, neither SBF nor CF matches the accuracy of acBF.
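For context on the "≤ 8 bits per item" figure, consider a single standard Bloom filter. Its false positive probability is approximately (1 - e^(-kn/m))^k, which is minimised at k ≈ (m/n) ln 2 hash functions. The sketch below evaluates that textbook baseline. It does not model acBF's eight-stage multidimensional design, and the function names are illustrative.

```python
import math


def bloom_fpp(bits_per_item: float, k: int) -> float:
    """Approximate false positive probability (1 - e^(-k n / m))^k of a standard Bloom filter."""
    return (1.0 - math.exp(-k / bits_per_item)) ** k


def optimal_k(bits_per_item: float) -> int:
    # k = (m / n) ln 2 minimises the approximation above (rounded to an integer).
    return max(1, round(bits_per_item * math.log(2)))


k = optimal_k(8)
fpp = bloom_fpp(8, k)
print(k, round(fpp, 4))
```

With 8 bits per item, this single-filter baseline uses 6 hash functions and reaches a false positive probability of roughly 2%. That figure is the reference point against which acBF's reported maximum of 0.00015 at a similar memory budget can be read.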


2021 ◽  
Author(s):  
Lucas Miguel Carvalho

Due to the large-scale generation of omics data in the last few years, extracting information from biological data, and integrating or comparing it, has become more complex. One way to represent interactions in biological data is through networks, which summarize the interactions between their nodes as edges. Comparing two biological networks using network metrics, biological enrichment, and visualization produces data that help us understand differences between the interactomes of contrasting conditions. We describe BioNetComp, a Python package that compares two interactomes through different metrics and data visualization from the command line, without requiring a web platform or other software. As a result, we present a comparison between the interactomes generated from the differentially expressed genes at two different points during a typical bioethanol fermentation. BioNetComp is available at github.com/lmigueel/BioNetComp.
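Two interactomes can be compared with a few simple network metrics: shared versus condition-specific edges, edge-set Jaccard similarity, and per-gene degree change. The sketch below computes these in plain Python. It is not BioNetComp's API or metric set. The function name and gene identifiers are hypothetical.

```python
def compare_networks(edges_a, edges_b):
    """Compare two undirected interactomes by edge overlap and per-node degree change."""

    def normalise(edges):
        # Undirected edges as frozensets; self-loops are dropped.
        return {frozenset(e) for e in edges if len(set(e)) == 2}

    def degrees(edge_set):
        deg = {}
        for edge in edge_set:
            for node in edge:
                deg[node] = deg.get(node, 0) + 1
        return deg

    a, b = normalise(edges_a), normalise(edges_b)
    union = a | b
    deg_a, deg_b = degrees(a), degrees(b)
    nodes = set(deg_a) | set(deg_b)
    return {
        "shared_edges": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "edge_jaccard": len(a & b) / len(union) if union else 1.0,
        # Positive values: the gene gained interactions in condition B.
        "degree_change": {n: deg_b.get(n, 0) - deg_a.get(n, 0) for n in nodes},
    }


condition_a = [("G1", "G2"), ("G2", "G3")]
condition_b = [("G2", "G1"), ("G3", "G4")]
print(compare_networks(condition_a, condition_b))
```

Treating edges as unordered sets means ("G1", "G2") and ("G2", "G1") count as the same interaction, which is usually appropriate for protein-protein interaction data.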


2014 ◽  
Vol 2014 ◽  
pp. 1-16 ◽  
Author(s):  
Huajun Chen ◽  
Xi Chen ◽  
Peiqin Gu ◽  
Zhaohui Wu ◽  
Tong Yu

Recently, huge amounts of data have been generated in the domain of biology. Because they embed domain knowledge from different disciplines, isolated biological resources are implicitly connected and together form a large network of versatile biological knowledge. Faced with such massive, disparate, and interlinked biological data, providing an efficient way to model, integrate, and analyze this big biological network is a challenge. In this paper, we present a general OWL (Web Ontology Language) reasoning framework to study the implicit relationships among biological entities. A comprehensive biological ontology spanning traditional Chinese medicine (TCM) and western medicine (WM) is used to create a conceptual model for the biological network. The corresponding biological data are then integrated into a biological knowledge network as the data model. Based on the conceptual model and data model, a scalable OWL reasoning method infers potential associations between biological entities from the biological network. In our experiment, we focus on association discovery between TCM and WM. The derived associations can help biologists promote the development of novel drugs and TCM modernization. The experimental results show that the system achieves high efficiency, accuracy, scalability, and effectiveness.
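One kind of implicit association such a framework can surface is a property chain. If a herb contains a compound and that compound targets a gene, the herb can be linked to the gene. OWL 2 expresses this with `owl:propertyChainAxiom`, and a reasoner derives the new triples. The sketch below applies that single rule once in plain Python. It is not the paper's reasoning method, and all property and entity names are hypothetical.

```python
def infer_chain(triples, first, second, inferred):
    """Apply one property chain: (x first y) and (y second z) => (x inferred z)."""
    objects_by_subject = {}
    for s, p, o in triples:
        if p == second:
            objects_by_subject.setdefault(s, set()).add(o)
    derived = set()
    for s, p, o in triples:
        if p == first:
            for z in objects_by_subject.get(o, ()):
                derived.add((s, inferred, z))
    # Return only triples that were not already asserted.
    return derived - set(triples)


triples = [
    ("Herb_X", "containsCompound", "Compound_Y"),   # hypothetical TCM fact
    ("Compound_Y", "targetsGene", "Gene_Z"),
    ("Drug_W", "targetsGene", "Gene_Z"),            # hypothetical WM fact
]
new = infer_chain(triples, "containsCompound", "targetsGene", "associatedWithGene")
print(new)
```

The derived triple links Herb_X to Gene_Z, a gene already targeted by Drug_W. This is the kind of TCM-WM association the abstract describes. A full OWL reasoner would apply such rules repeatedly until no new triples appear, together with other axioms such as subclass and inverse properties.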


2020 ◽  
Vol 15 ◽  
Author(s):  
Omer Irshad ◽  
Muhammad Usman Ghani Khan

Aim: To help researchers and practitioners unveil the mysterious functional aspects of the human cellular system by performing exploratory searches on semantically integrated, heterogeneous and geographically dispersed omics annotations. Background: Improving health standards of life is one of the motives that continuously drives researchers and practitioners to uncover the mysterious aspects of the human cellular system. Inferring new knowledge from known facts always requires a reasonably large amount of data in well-structured, integrated and unified form. Due especially to the advent of high-throughput and sensor technologies, biological data are growing heterogeneously and geographically at an astronomical rate. Several data integration systems have been deployed to cope with the issues of data heterogeneity and global dispersion. Systems based on semantic data integration models are more flexible and expandable than syntax-based ones but still lack aspect-based data integration, persistence and querying. Furthermore, these systems do not fully support warehousing biological entities in the form of semantic associations as naturally possessed by the human cell. Objective: To develop an aspect-oriented formal data integration model for semantically integrating heterogeneous and geographically dispersed omics annotations and for providing exploratory querying on the integrated data. Method: We propose an aspect-oriented formal data integration model that uses web semantics standards to formally specify each of its constructs. The proposed model supports aspect-oriented representation of biological entities while addressing the issues of data heterogeneity and global dispersion. It associates and warehouses biological entities in the way they relate with Result: To show the significance of the proposed model, we developed a data warehouse and information retrieval system based on a multi-layered and multi-modular software architecture compliant with the proposed model.
Results show that our model supports gathering, associating, integrating, persisting and querying each entity with respect to all its possible aspects within or across the various associated omics layers. Conclusion: Formal specifications make data integration issues easier to address by providing formal means for understanding omics data based on meaning instead of syntax.


2014 ◽  
Vol 11 (2) ◽  
pp. 68-79
Author(s):  
Matthias Klapperstück ◽  
Falk Schreiber

Summary The visualization of biological data has gained increasing importance in recent years. A large number of methods and software tools are available to visualize biological data, including combinations of measured experimental data and biological networks. As networks grow in size, handling and exploring them becomes a challenging task for the user. In addition, scientists are interested not only in investigating a single kind of network but also in combining different types of networks, such as metabolic, gene regulatory and protein interaction networks. Therefore, fast access, abstract and dynamic views, and intuitive exploratory methods should be provided to search and extract information from the networks. This paper introduces a conceptual framework for handling and combining multiple network sources that enables abstract viewing and exploration of large data sets, including additional experimental data. It introduces a three-tier structure that links network data to multiple network views, discusses a proof-of-concept implementation, and shows a specific visualization method for combining metabolic and gene regulatory networks in an example.
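A minimal way to picture network data linked to multiple views is sketched below. Edges from several sources are merged while keeping their provenance, and each view is a filter over the merged set. Nodes shared between network types show where the metabolic and gene regulatory views connect. This mapping of tiers to functions is a hypothetical illustration, not the paper's framework, and the node names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    network: str  # provenance, e.g. "metabolic" or "gene_regulatory"


def combine(*sources):
    """Merge (name, edge_list) sources into one edge set, keeping each edge's provenance."""
    combined = set()
    for name, edges in sources:
        for u, v in edges:
            combined.add(Edge(u, v, name))
    return combined


def view(combined, networks):
    """An abstract view: only the edges of the selected network types."""
    return {e for e in combined if e.network in networks}


def bridge_nodes(combined):
    """Nodes that occur in more than one network type link the views together."""
    seen = {}
    for e in combined:
        for node in (e.source, e.target):
            seen.setdefault(node, set()).add(e.network)
    return {n for n, nets in seen.items() if len(nets) > 1}


# Hypothetical data: enzyme HK1 appears in a metabolic edge and is regulated by TF1.
combined = combine(("metabolic", [("HK1", "G6P")]),
                   ("gene_regulatory", [("TF1", "HK1")]))
print(len(view(combined, {"metabolic"})), bridge_nodes(combined))
```

Keeping provenance on every edge is the design choice that lets one merged data set feed several independent views without copying the data.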


Biotechnology ◽  
2019 ◽  
pp. 120-139
Author(s):  
Seetharaman Balaji

The World Wide Web, the largest digital repository of information, keeps growing exponentially and calls for data mining services that provide tailored web experiences. This chapter gives an overview of information retrieval, knowledge discovery and data mining. It reviews the different stages of data mining and introduces the widespread biological databanks, their explosion, integration, data warehousing, information retrieval, text mining, text repositories for biological research publications, domain-specific search engines, web mining, biological networks and visualization, ontology and systems biology. The chapter also explains some technical jargon with pictorial analogies so that a novice learner can understand the concepts clearly.