BioNERO: an all-in-one R/Bioconductor package for comprehensive and easy biological network reconstruction

2021 ◽  
Author(s):  
Fabricio Almeida-Silva ◽  
Thiago M. Venancio

Currently, standard network analysis workflows rely on many different packages, often requiring users to have a solid statistics and programming background. Here, we present BioNERO, an R package that aims to integrate all aspects of network analysis workflows, including expression data preprocessing, gene coexpression and regulatory network inference, functional analyses, and intra- and interspecies network comparisons. The state-of-the-art methods implemented in BioNERO ensure that users can perform all analyses with a single package in a simple pipeline, without needing to learn a myriad of package-specific syntaxes. BioNERO offers a user-friendly framework that can be easily incorporated in systems biology pipelines. Availability and implementation: The package is available at Bioconductor (http://bioconductor.org/packages/BioNERO).

2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Nisar Wani ◽  
Debmalya Barh ◽  
Khalid Raza

Abstract Connecting transcriptional and post-transcriptional regulatory networks solves an important puzzle in the elucidation of gene regulatory mechanisms. To decipher the complexity of these connections, we build co-expression network modules for mRNA as well as miRNA expression profiles of breast cancer data. We construct gene and miRNA co-expression modules using the weighted gene co-expression network analysis (WGCNA) method and establish the significance of these modules (genes/miRNAs) for the cancer phenotype. This work also infers an interaction network between the genes of the turquoise module from mRNA expression data and hubs of the turquoise module from miRNA expression data. A pathway enrichment analysis using the miRsystem web tool for miRNA hubs and some of their targets reveals their enrichment in several important pathways associated with the progression of cancer.
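The soft-thresholding step at the heart of WGCNA can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes an unsigned network where the adjacency between two genes is the absolute Pearson correlation raised to a power beta, and the function names, the toy profiles, and the default beta = 6 are all illustrative assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def soft_threshold_adjacency(profiles, beta=6):
    """Unsigned weighted adjacency: a_ij = |cor(x_i, x_j)| ** beta.

    profiles maps a gene name to its expression values across samples.
    Returns a dict keyed by (gene_i, gene_j) with gene_i < gene_j.
    """
    names = sorted(profiles)
    adjacency = {}
    for i, g in enumerate(names):
        for h in names[i + 1:]:
            adjacency[(g, h)] = abs(pearson(profiles[g], profiles[h])) ** beta
    return adjacency


# Hypothetical toy expression profiles (four samples per gene).
toy_profiles = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [1, -1, 1, -1],
}
toy_adjacency = soft_threshold_adjacency(toy_profiles)
```

Raising the correlation to a power pushes weak correlations toward zero while keeping strong ones, which is why modules built from this adjacency are dominated by tightly co-expressed genes.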


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 344
Author(s):  
Mahmoud Ahmed ◽  
Deok Ryong Kim

Researchers use ChIP binding data to identify potential transcription factor binding sites. Similarly, they use gene expression data from sequencing or microarrays to quantify the effect of factor overexpression or knockdown on its targets. Therefore, the integration of the binding and expression data can be used to improve the understanding of a transcription factor's function. Here, we implemented the binding and expression target analysis (BETA) in an R/Bioconductor package. This algorithm ranks the targets based on the distances of their assigned peaks from the factor ChIP experiment and the signed statistics from gene expression profiling with factor perturbation. We further extend BETA to integrate two sets of data from two factors to predict their targets and their combined functions. In this article, we briefly describe the workings of the algorithm and provide a workflow with a real dataset for using it. The gene targets and the aggregate functions of transcription factors YY1 and YY2 in HeLa cells were identified. Using the same datasets, we identified the shared targets of the two factors, which were found to be, on average, more cooperatively regulated.


2022 ◽  
Vol 27 (2) ◽  
pp. 1-25
Author(s):  
Somesh Singh ◽  
Tejas Shah ◽  
Rupesh Nasre

Betweenness centrality (BC) is a popular centrality measure, based on shortest paths, used to quantify the importance of vertices in networks. It is used in a wide array of applications including social network analysis, community detection, clustering, biological network analysis, and several others. The state-of-the-art Brandes’ algorithm for computing BC has time complexities of O(|V||E|) and O(|V||E| + |V|^2 log |V|) for unweighted and weighted graphs, respectively. Brandes’ algorithm has been successfully parallelized on multicore and manycore platforms. However, the computation of vertex BC continues to be time-consuming for large real-world graphs. Often, in practical applications, it suffices to identify the most important vertices in a network; that is, those having the highest BC values. Such applications demand only the top vertices in the network as per their BC values but do not demand their actual BC values. In such scenarios, not only is computing the BC of all the vertices unnecessary but also exact BC values need not be computed. In this work, we attempt to marry controlled approximations with parallelization to estimate the k-highest BC vertices faster, without having to compute the exact BC scores of the vertices. We present a host of techniques to determine the top-k vertices faster, with a small inaccuracy, by computing approximate BC scores of the vertices. Aiding our techniques is a novel vertex-renumbering scheme to make the graph layout more structured, which results in faster execution of parallel Brandes’ algorithm on GPU. Our experimental results, on a suite of real-world and synthetic graphs, show that our best performing technique computes the top-k vertices with an average speedup of 2.5× compared to the exact parallel Brandes’ algorithm on GPU, with an error of less than 6%. Our techniques also exhibit high precision and recall, both in excess of 94%.
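For orientation, the exact baseline that the abstract compares against can be sketched as follows. This is a plain sequential version of Brandes' algorithm for unweighted, undirected graphs followed by a top-k selection. It is not the authors' approximate or GPU technique, and the function names and the toy path graph are illustrative assumptions.

```python
from collections import deque


def brandes_betweenness(adj):
    """Exact betweenness centrality for an unweighted, undirected graph.

    adj maps each vertex to an iterable of its neighbours.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Forward phase: BFS from s, counting shortest paths (sigma)
        # and recording shortest-path predecessors.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s] = 1
        dist[s] = 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward phase: accumulate dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: score / 2.0 for v, score in bc.items()}


def top_k(bc, k):
    """Return the k vertices with the highest BC (ties broken by name)."""
    return sorted(bc, key=lambda v: (-bc[v], str(v)))[:k]


# Hypothetical toy graph: the path a - b - c - d - e.
path_graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
path_bc = brandes_betweenness(path_graph)
path_top = top_k(path_bc, 1)
```

Every source vertex triggers a full traversal, which is where the O(|V||E|) cost for unweighted graphs comes from, and why skipping or approximating part of that work pays off when only the top-k ranking is needed.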


Author(s):  
Matteo Manica ◽  
Charlotte Bunne ◽  
Roland Mathis ◽  
Joris Cadow ◽  
Mehmet Eren Ahsen ◽  
...  

Abstract Summary The advent of high-throughput technologies has provided researchers with measurements of thousands of molecular entities and enabled the investigation of the internal regulatory apparatus of the cell. However, network inference from high-throughput data is far from being a solved problem. While a plethora of different inference methods have been proposed, they often lead to non-overlapping predictions, and many of them lack user-friendly implementations to enable their broad utilization. Here, we present Consensus Interaction Network Inference Service (COSIFER), a package and a companion web-based platform to infer molecular networks from expression data using state-of-the-art consensus approaches. COSIFER includes a selection of state-of-the-art methodologies for network inference and different consensus strategies to integrate the predictions of individual methods and generate robust networks. Availability and implementation COSIFER Python source code is available at https://github.com/PhosphorylatedRabbits/cosifer. The web service is accessible at https://ibm.biz/cosifer-aas. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Gourab Ghosh Roy ◽  
Nicholas Geard ◽  
Karin Verspoor ◽  
Shan He

Abstract Motivation Inferring gene regulatory networks (GRNs) from expression data is a significant systems biology problem. A useful inference algorithm should not only unveil the global structure of the regulatory mechanisms but also the details of regulatory interactions such as edge direction (from regulator to target) and sign (activation/inhibition). Many popular GRN inference algorithms cannot infer edge signs, and those that can infer signed GRNs cannot simultaneously infer edge directions or network cycles. Results To address these limitations of existing algorithms, we propose Polynomial Lasso Bagging (PoLoBag) for signed GRN inference with both edge directions and network cycles. PoLoBag is an ensemble regression algorithm in a bagging framework where Lasso weights estimated on bootstrap samples are averaged. These bootstrap samples incorporate polynomial features to capture higher-order interactions. Results demonstrate that PoLoBag is consistently more accurate for signed inference than state-of-the-art algorithms on simulated and real-world expression datasets. Availability and implementation Algorithm and data are freely available at https://github.com/gourabghoshroy/PoLoBag. Supplementary information Supplementary data are available at Bioinformatics online.


2013 ◽  
Author(s):  
Jeffrey D. Allen ◽  
Yang Xie ◽  
Guanghua Xiao

Reverse engineering approaches to construct context-specific gene regulatory networks (GRNs) based on genome-wide mRNA expression data have led to significant biological findings. However, the reliability and reproducibility of the reconstructed GRNs need to be improved. Here, we propose an ensemble-based network aggregation approach to improve the accuracy of the network topology constructed from mRNA expression data. To evaluate the performance of different approaches, we created dozens of simulated networks and also tested our methods on three Escherichia coli datasets. We demonstrate three novel applications from this development. First, bootstrapping can be done on the available samples, turning any network reconstruction approach into an ensemble method. Second, this aggregation approach can be used to combine GRNs from different network inference methods, creating a novel network reconstruction approach that consistently outperforms any constituent method. Third, the approach can be used to effectively integrate GRNs constructed from different studies – producing more accurate networks. We are releasing an implementation of these techniques as an R package “ENA”, which is able to run network inference in parallel across multiple servers. We made all of the code and data used in our simulations and analysis available online at https://github.com/QBRC/ENA-Research to ensure the reproducibility of our results.
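The idea of combining several inferred networks into one can be illustrated with a small rank-based sketch. It assumes mean-rank aggregation of edge scores, which is a stand-in chosen for simplicity and may differ from the aggregation rule used in the ENA package. The function names, the penalty for missing edges, and the toy networks are all hypothetical.

```python
def rank_edges(scores):
    """Map each edge to its rank within one network (1 = strongest score)."""
    ordered = sorted(scores, key=lambda edge: (-abs(scores[edge]), edge))
    return {edge: position + 1 for position, edge in enumerate(ordered)}


def aggregate_networks(networks):
    """Combine edge-score dicts by mean rank.

    networks is a list of dicts mapping an edge (a tuple of gene names)
    to a confidence score. An edge missing from a network is assigned
    that network's worst rank plus one (an assumption of this sketch).
    Returns (edge, mean_rank) pairs sorted from most to least supported.
    """
    edges = set().union(*networks)
    totals = {edge: 0.0 for edge in edges}
    for scores in networks:
        ranks = rank_edges(scores)
        missing_rank = len(ranks) + 1
        for edge in edges:
            totals[edge] += ranks.get(edge, missing_rank)
    count = len(networks)
    return sorted(
        ((edge, totals[edge] / count) for edge in edges),
        key=lambda pair: (pair[1], pair[0]),
    )


# Hypothetical edge scores from two inference runs (for example, two
# bootstrap resamples or two different inference methods).
run_one = {("A", "B"): 0.9, ("A", "C"): 0.2, ("B", "C"): 0.5}
run_two = {("A", "B"): 0.8, ("A", "C"): 0.3, ("B", "C"): 0.1}
consensus = aggregate_networks([run_one, run_two])
```

Working on ranks rather than raw scores makes networks from different methods or studies comparable even when their scores live on different scales.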


2016 ◽  
Author(s):  
Jigar S. Desai ◽  
Ryan C. Sartor ◽  
Lovely Mae Lawas ◽  
SV Krishna Jagadish ◽  
Colleen J. Doherty

Abstract Organisms respond to changes in their environment through transcriptional regulatory networks (TRNs). The regulatory hierarchy of these networks can be inferred from expression data. Computational approaches to identify TRNs can be applied in any species where quality RNA can be acquired. However, ChIP-Seq and similar validation methods are challenging to employ in non-model species. Improving the accuracy of computational inference methods can significantly reduce the cost and time of subsequent validation experiments. We have developed ExRANGES, an approach that improves the ability to computationally infer TRNs from time series expression data. ExRANGES utilizes both the rate of change in expression and the absolute expression level to identify TRN connections. We evaluated ExRANGES in five data sets from different model systems. ExRANGES improved the identification of experimentally validated transcription factor targets for all species tested, even in unevenly spaced and sparse data sets. This improved ability to predict known regulator-target relationships enhances the utility of network inference approaches in non-model species where experimental validation is challenging. We integrated ExRANGES with two different network construction approaches and it has been implemented as an R package available here: http://github.com/DohertyLab/ExRANGES. To install the package, type: devtools::install_github("DohertyLab/ExRANGES")


2021 ◽  
Vol 3 (4) ◽  
Author(s):  
Zhaolian Lu ◽  
Keenan Berry ◽  
Zhenbin Hu ◽  
Yu Zhan ◽  
Tae-Hyuk Ahn ◽  
...  

Abstract Transcription initiation is regulated in a highly organized fashion to ensure proper cellular functions. Accurate identification of transcription start sites (TSSs) and quantitative characterization of transcription initiation activities are fundamental steps for studies of regulated transcription and core promoter structures. Several high-throughput techniques have been developed to sequence the very 5′ end of RNA transcripts (TSS sequencing) on the genome scale. Bioinformatics tools are essential for processing, analysis, and visualization of TSS sequencing data. Here, we present TSSr, an R package that provides rich functions for mapping TSSs and characterizing the structures and activities of core promoters based on all types of TSS sequencing data. Specifically, TSSr implements several newly developed algorithms for accurately identifying TSSs from mapped sequencing reads and inferring core promoters, which are prerequisites for subsequent functional analyses of TSS data. Furthermore, TSSr also enables users to export various types of TSS data that can be visualized in a genome browser for inspection of promoter activities in association with other genomic features, and to generate publication-ready TSS graphs. These user-friendly features could greatly facilitate studies of transcription initiation based on TSS sequencing data. The source code and detailed documentation of TSSr can be freely accessed at https://github.com/Linlab-slu/TSSr.


Biostatistics ◽  
2015 ◽  
Vol 17 (1) ◽  
pp. 16-28 ◽  
Author(s):  
Laurent Jacob ◽  
Johann A. Gagnon-Bartsch ◽  
Terence P. Speed

Abstract When dealing with large-scale gene expression studies, observations are commonly contaminated by sources of unwanted variation such as platforms or batches. Not taking this unwanted variation into account when analyzing the data can lead to spurious associations and to missing important signals. When the analysis is unsupervised, e.g. when the goal is to cluster the samples or to build a corrected version of the dataset—as opposed to the study of an observed factor of interest—taking unwanted variation into account can become a difficult task. The factors driving unwanted variation may be correlated with the unobserved factor of interest, so that correcting for the former can remove the latter if not done carefully. We show how negative control genes and replicate samples can be used to estimate unwanted variation in gene expression, and discuss how this information can be used to correct the expression data. The proposed methods are then evaluated on synthetic data and three gene expression datasets. They generally manage to remove unwanted variation without losing the signal of interest and compare favorably to state-of-the-art corrections. All proposed methods are implemented in the Bioconductor package RUVnormalize.

