gpuZoo: Cost-effective estimation of gene regulatory networks using the Graphics Processing Unit

2021 ◽  
Author(s):  
Marouen Ben Guebila ◽  
Daniel C Morgan ◽  
Kimberly Glass ◽  
Marieke Lydia Kuijjer ◽  
Dawn L DeMeo ◽  
...  

Gene regulatory network inference allows for the study of transcriptional control to identify the alteration of cellular processes in human diseases. Our group has developed several tools to model a variety of regulatory processes, including transcriptional (PANDA, SPIDER) and post-transcriptional (PUMA) gene regulation, and gene regulation in individual samples (LIONESS). These methods work by performing repeated operations on data matrices in order to integrate information across multiple lines of biological evidence. The associated high computational burden limits their use for large-scale genomic studies. To address this limitation, we developed gpuZoo, which includes GPU-accelerated implementations of these algorithms. The gpuZoo implementations in MATLAB and Python run up to 61 times faster and are up to 28 times less expensive than the multi-core CPU implementations of the same methods. gpuZoo also takes advantage of the modern multi-GPU device architecture to build a population of sample-specific gene regulatory networks with similar runtime and cost improvements by combining GPU acceleration with an efficient on-line derivation. Taken together, gpuZoo allows parallel and on-line gene regulatory network inference in large-scale genomic studies with cost-effective performance. gpuZoo is available in MATLAB through the netZooM package (https://github.com/netZoo/netZooM) and in Python through the netZooPy package (https://github.com/netZoo/netZooPy).
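The sample-specific step mentioned in the abstract can be illustrated with the published LIONESS interpolation, e(q) = N (e(all) - e(all minus q)) + e(all minus q), where N is the number of samples. The sketch below is a CPU-only illustration under stated assumptions: it uses Pearson correlation as the aggregate network in place of PANDA, recomputes each leave-one-out network from scratch instead of using gpuZoo's on-line derivation, and the function name `lioness_correlation` is hypothetical rather than part of netZooPy.

```python
import numpy as np

def lioness_correlation(expr):
    """Sample-specific networks via the LIONESS equation.

    expr: array of shape (genes, samples).
    Returns an array of shape (samples, genes, genes).
    Assumption: Pearson correlation stands in for the aggregate
    network method (gpuZoo uses PANDA and related tools instead).
    """
    n_genes, n_samples = expr.shape
    agg_all = np.corrcoef(expr)  # network built from all samples
    nets = np.empty((n_samples, n_genes, n_genes))
    for q in range(n_samples):
        # Network built from every sample except q.
        agg_without_q = np.corrcoef(np.delete(expr, q, axis=1))
        # LIONESS: linear interpolation to isolate sample q's contribution.
        nets[q] = n_samples * (agg_all - agg_without_q) + agg_without_q
    return nets

rng = np.random.default_rng(0)
expression = rng.normal(size=(5, 30))  # toy data: 5 genes, 30 samples
networks = lioness_correlation(expression)
print(networks.shape)
```

Each iteration of the loop is independent of the others, which is the property that makes this kind of computation a natural fit for parallel devices; how gpuZoo distributes the work across GPUs is not shown here.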

Computation ◽  
2021 ◽  
Vol 9 (12) ◽  
pp. 146
Author(s):  
Michael Banf ◽  
Thomas Hartwig

Gene regulation is orchestrated by a vast number of molecules, including transcription factors and co-factors, chromatin regulators, as well as epigenetic mechanisms, and it has been shown that transcriptional misregulation, e.g., caused by mutations in regulatory sequences, is responsible for a plethora of diseases, including cancer and developmental or neurological disorders. As a consequence, decoding the architecture of gene regulatory networks has become one of the most important tasks in modern (computational) biology. However, to advance our understanding of the mechanisms involved in the transcriptional apparatus, we need scalable approaches that can deal with the increasing number of large-scale, high-resolution, biological datasets. In particular, such approaches need to be capable of efficiently integrating and exploiting the biological and technological heterogeneity of such datasets in order to best infer the underlying, highly dynamic regulatory networks, often in the absence of sufficient ground truth data for model training or testing. With respect to scalability, randomized approaches have proven to be a promising alternative to deterministic methods in computational biology. As an example, one of the top performing algorithms in a community challenge on gene regulatory network inference from transcriptomic data is based on a random forest regression model. In this concise survey, we aim to highlight how randomized methods may serve as a highly valuable tool, in particular, with increasing amounts of large-scale, biological experiments and datasets being collected. Given the complexity and interdisciplinary nature of the gene regulatory network inference problem, we hope our survey may be helpful to both computational and biological scientists. It is our aim to provide a starting point for a dialogue about the concepts, benefits, and caveats of the toolbox of randomized methods, since unravelling the intricate web of highly dynamic, regulatory events will be one fundamental step in understanding the mechanisms of life and eventually developing efficient therapies to treat and cure diseases.


2020 ◽  
Vol 21 (11) ◽  
pp. 1054-1059
Author(s):  
Bin Yang ◽  
Yuehui Chen

Reconstruction of gene regulatory networks (GRN) plays an important role in understanding the complexity, functionality and pathways of biological systems, which could support the design of new drugs for diseases. Because differential equation models are flexible and robust, these models have been utilized to identify biochemical reactions and gene regulatory networks. This paper investigates the differential equation models for reverse engineering gene regulatory networks. We introduce three kinds of differential equation models, including ordinary differential equation (ODE), time-delayed differential equation (TDDE) and stochastic differential equation (SDE). ODE models include linear ODE, nonlinear ODE and S-system model. We also discuss the evolutionary algorithms, which are utilized to search the optimal structures and parameters of differential equation models. This investigation could provide a comprehensive understanding of differential equation models, and lead to the discovery of novel differential equation models.
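As a minimal sketch of the simplest model family named above, a linear ODE model writes the expression dynamics as dx/dt = W x, where entry W[i, j] is the influence of gene j on gene i. The example simulates such a system with forward Euler steps and recovers W by least squares on finite differences. The weight matrix, step size, and function names are illustrative assumptions, and least squares is swapped in for the evolutionary search discussed in the paper.

```python
import numpy as np

def simulate_linear_grn(W, x0, dt=0.01, steps=200):
    """Forward-Euler simulation of dx/dt = W @ x."""
    xs = np.empty((steps + 1, len(x0)))
    xs[0] = x0
    for t in range(steps):
        xs[t + 1] = xs[t] + dt * (W @ xs[t])
    return xs

def fit_linear_grn(xs, dt):
    """Estimate W from a time series by least squares on finite differences."""
    dxdt = (xs[1:] - xs[:-1]) / dt
    # Solve xs[:-1] @ W.T = dxdt for W.T in the least-squares sense.
    W_transposed, *_ = np.linalg.lstsq(xs[:-1], dxdt, rcond=None)
    return W_transposed.T

# Hypothetical 3-gene network: self-degradation on the diagonal,
# two activating edges and one repressing edge off the diagonal.
W_true = np.array([
    [-1.0,  0.5,  0.0],
    [ 0.0, -1.0,  0.8],
    [-0.6,  0.0, -1.0],
])
trajectory = simulate_linear_grn(W_true, x0=np.array([1.0, 0.2, 0.5]))
W_est = fit_linear_grn(trajectory, dt=0.01)
print(np.round(W_est, 3))
```

The recovery is close to exact here only because the data are noise-free and generated by the same discretization used for fitting; real time-series expression data are noisy and sparsely sampled, which is one reason the nonlinear models and search strategies reviewed in the paper exist.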


2016 ◽  
Vol 12 (2) ◽  
pp. 588-597 ◽  
Author(s):  
Jun Wu ◽  
Xiaodong Zhao ◽  
Zongli Lin ◽  
Zhifeng Shao

Transcriptional regulation is a basis of many crucial molecular processes and an accurate inference of the gene regulatory network is a helpful and essential task to understand cell functions and gain insights into biological processes of interest in systems biology.


2019 ◽  
Author(s):  
Daniel Morgan ◽  
Matthew Studham ◽  
Andreas Tjärnberg ◽  
Holger Weishaupt ◽  
Fredrik J. Swartling ◽  
...  

The gene regulatory network (GRN) of human cells encodes mechanisms to ensure proper functioning. However, if this GRN is dysregulated, the cell may enter into a disease state such as cancer. Understanding the GRN as a system can therefore help identify novel mechanisms underlying disease, which can lead to new therapies. Reliable inference of GRNs is however still a major challenge in systems biology. To deduce regulatory interactions relevant to cancer, we applied a recent computational inference framework to data from perturbation experiments in squamous carcinoma cell line A431. GRNs were inferred using several methods, and the false discovery rate was controlled by the NestBoot framework. We developed a novel approach to assess the predictiveness of inferred GRNs against validation data, despite the lack of a gold standard. The best GRN was significantly more predictive than the null model, both in crossvalidated benchmarks and for an independent dataset of the same genes under a different perturbation design. It agrees with many known links, in addition to predicting a large number of novel interactions from which a subset was experimentally validated. The inferred GRN captures regulatory interactions central to cancer-relevant processes and thus provides mechanistic insights that are useful for future cancer research. Data available at GSE125958. Inferred GRNs and inference statistics available at https://dcolin.shinyapps.io/CancerGRN/ Software available at https://bitbucket.org/sonnhammergrni/genespider/src/BFECV/
Author Summary: Cancer is the second most common cause of death globally, and although cancer treatments have improved in recent years, we need to understand how regulatory mechanisms are altered in cancer to combat the disease efficiently. By applying gene perturbations and inference of gene regulatory networks to 40 genes known or suspected to have a role in cancer due to interactions with the oncogene MYC, we deduce their underlying regulatory interactions. Using a recent computational framework for inference together with a novel method for cross validation, we infer a reliable regulatory model of this system in a completely data driven manner, not reliant on literature or priors. The novel interactions add to the understanding of the progressive oncogenic regulatory process and may provide new targets for therapy.


2020 ◽  
Vol 19 ◽  
pp. 153303382090911
Author(s):  
Qi-en He ◽  
Yi-fan Tong ◽  
Zhou Ye ◽  
Li-xia Gao ◽  
Yi-zhi Zhang ◽  
...  

Radiotherapy is one of the most important cancer treatments, but its response varies greatly among individual patients. Therefore, the prediction of radiosensitivity, identification of potential signature genes, and inference of their regulatory networks are important for clinical and oncological reasons. Here, we proposed a novel multiple genomic fused partial least squares deep regression method to simultaneously analyze multi-genomic data. Using 60 National Cancer Institute cell lines as examples, we aimed to identify signature genes by optimizing the radiosensitivity prediction model and uncovering regulatory relationships. A total of 113 signature genes were selected from more than 20,000 genes. The root mean square error of the model was only 0.0025, which was much lower than previously published results, suggesting that our method can predict radiosensitivity with the highest accuracy. Additionally, our regulatory network analysis identified 24 highly important ‘hub’ genes. The data analysis workflow we propose provides a unified and computational framework to harness the full potential of large-scale integrated cancer genomic data for integrative signature discovery. Furthermore, the regression model, signature genes, and their regulatory network should provide a reliable quantitative reference for optimizing personalized treatment options, and may aid our understanding of cancer progression mechanisms.


Author(s):  
Bing Liu ◽  
Ina Hoeschele ◽  
Alberto de la Fuente

In this chapter, we review the current state of Gene Regulatory Network inference based on ‘Genetical Genomics’ experiments (Brem & Kruglyak, 2005; Brem, Yvert, Clinton & Kruglyak, 2002; Jansen, 2003; Jansen & Nap, 2001; Schadt et al., 2003) as a special case of causal network inference in ‘Systems Genetics’ (Threadgill, 2006). In a Genetical Genomics experiment, a segregating or genetically randomized population is DNA marker genotyped and gene-expression profiled on a genomewide scale. The genotypes are regarded as natural, multifactorial perturbations resulting in different gene-expression ‘phenotypes’, and causal relationships can therefore be established between the measured genotypes and the gene-expression phenotypes. In this chapter, we review different computational approaches to Gene Regulatory Network inference based on the joint analysis of DNA marker and expression data and additionally of DNA sequence information if available. This includes different methods for expression QTL mapping, selection of regulator-target pairs, construction of an encompassing network, which strongly constrains the network search space, and pairwise and multivariate methods for Gene Regulatory Network inference, such as Bayesian Networks and Structural Equation Modeling.


Author(s):  
Gourab Ghosh Roy ◽  
Nicholas Geard ◽  
Karin Verspoor ◽  
Shan He

Motivation: Inferring gene regulatory networks (GRNs) from expression data is a significant systems biology problem. A useful inference algorithm should not only unveil the global structure of the regulatory mechanisms but also the details of regulatory interactions such as edge direction (from regulator to target) and sign (activation/inhibition). Many popular GRN inference algorithms cannot infer edge signs, and those that can infer signed GRNs cannot simultaneously infer edge directions or network cycles. Results: To address these limitations of existing algorithms, we propose Polynomial Lasso Bagging (PoLoBag) for signed GRN inference with both edge directions and network cycles. PoLoBag is an ensemble regression algorithm in a bagging framework where Lasso weights estimated on bootstrap samples are averaged. These bootstrap samples incorporate polynomial features to capture higher-order interactions. Results demonstrate that PoLoBag is consistently more accurate for signed inference than state-of-the-art algorithms on simulated and real-world expression datasets. Availability and implementation: Algorithm and data are freely available at https://github.com/gourabghoshroy/PoLoBag. Supplementary information: Supplementary data are available at Bioinformatics online.
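The bagging idea described in this abstract (Lasso weights estimated on bootstrap samples, then averaged, with polynomial features in the design matrix) can be sketched as follows. This is not the PoLoBag implementation: the degree-2 expansion with pairwise products, the choice to read edge signs from the averaged linear-term weights only, the penalty value, and all function names are assumptions made for illustration. The Lasso solver is a plain coordinate descent written with NumPy, and the feature expansion assumes at least two regulators.

```python
import itertools
import numpy as np

def lasso_cd(X, y, alpha, n_iter=100):
    """Lasso by cyclic coordinate descent:
    minimize (1 / 2n) * ||y - X w||^2 + alpha * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Residual with feature j's current contribution removed.
            residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ residual / n
            # Soft-thresholding update.
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
    return w

def poly_features(X):
    """Linear terms followed by all pairwise products (illustrative choice)."""
    pairs = itertools.combinations(range(X.shape[1]), 2)
    products = np.column_stack([X[:, j] * X[:, k] for j, k in pairs])
    return np.hstack([X, products])

def bagged_signed_weights(X, y, alpha=0.05, n_boot=20, seed=0):
    """Average the linear-term Lasso weights over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    total = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        F = poly_features(X[idx])
        F = F - F.mean(axis=0)
        y_boot = y[idx] - y[idx].mean()
        total += lasso_cd(F, y_boot, alpha)[:p]
    return total / n_boot

# Toy data: regulator 0 activates the target, regulator 1 represses it,
# regulator 2 has no effect.
rng = np.random.default_rng(1)
regulators = rng.normal(size=(200, 3))
target = 0.9 * regulators[:, 0] - 0.7 * regulators[:, 1] + 0.1 * rng.normal(size=200)
weights = bagged_signed_weights(regulators, target)
print(np.round(weights, 2))
```

The sign of each averaged weight is read as activation (positive) or inhibition (negative), and repeating the regression with each gene as the target in turn would give directed edges.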


2021 ◽  
Author(s):  
Aryan Kamal ◽  
Christian Arnold ◽  
Annique Claringbould ◽  
Rim Moussa ◽  
Neha Daga ◽  
...  

Among the biggest challenges in the post-GWAS (genome-wide association studies) era is the interpretation of disease-associated genetic variants in non-coding genomic regions. Enhancers have emerged as key players in mediating the effect of genetic variants on complex traits and diseases. Their activity is regulated by a combination of transcription factors (TFs), epigenetic changes and genetic variants. Several approaches exist to link enhancers to their target genes, and others that infer TF-gene connections. However, we currently lack a framework that systematically integrates enhancers into TF-gene regulatory networks. Furthermore, we lack an unbiased way of assessing whether inferred regulatory interactions are biologically meaningful. Here we present two methods, implemented as user-friendly R-packages, for building and evaluating enhancer-mediated gene regulatory networks (eGRNs) called GRaNIE (Gene Regulatory Network Inference including Enhancers - https://git.embl.de/grp-zaugg/GRaNIE) and GRaNPA (Gene Regulatory Network Performance Analysis - https://git.embl.de/grp-zaugg/GRaNPA), respectively. GRaNIE jointly infers TF-enhancer, enhancer-gene and TF-gene interactions by integrating open chromatin data such as ATAC-Seq or H3K27ac with RNA-seq across a set of samples (e.g. individuals), and optionally also Hi-C data. GRaNPA is a general framework for evaluating the biological relevance of TF-gene GRNs by assessing their performance for predicting cell-type specific differential expression. We demonstrate the power of our tool-suite by investigating gene regulatory mechanisms in macrophages that underlie their response to infection, and their involvement in common genetic diseases including autoimmune diseases.


2021 ◽  
Author(s):  
Ayoub Lasri ◽  
Vahid Shahrezaei ◽  
Marc Sturrock

Single cell RNA-sequencing (scRNA-seq) has very rapidly become the new workhorse of modern biology providing an unprecedented global view on cellular diversity and heterogeneity. In particular, the structure of gene-gene expression correlation contains information on the underlying gene regulatory networks. However, interpretation of scRNA-seq data is challenging due to specific experimental error and biases that are unique to this kind of data including drop-out (or technical zeros). To deal with this problem several methods for imputation of zeros for scRNA-seq have been developed. However, it is not clear how these processing steps affect inference of genetic networks from single cell data. Here, we introduce Biomodelling.jl, a tool for generation of synthetic scRNA-seq data using multiscale modelling of stochastic gene regulatory networks in growing and dividing cells. Our tool produces realistic transcription data with a known ground truth network topology that can be used to benchmark different approaches for gene regulatory network inference. Using this tool we investigate the impact of different imputation methods on the performance of several network inference algorithms. Biomodelling.jl provides a versatile and useful tool for future development and benchmarking of network inference approaches using scRNA-seq data.
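To make concrete why drop-out matters for network inference, the following toy simulation shows how technical zeros weaken the gene-gene correlation that correlation-based inference relies on. It is a sketch under simple assumptions (Poisson counts driven by a shared gamma-distributed rate, and drop-out applied independently per measurement with a fixed probability) and does not reproduce the multiscale stochastic model in Biomodelling.jl; all names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells = 5000

# A shared latent expression rate per cell makes the two genes co-expressed.
rate = rng.gamma(shape=2.0, scale=5.0, size=n_cells)
gene_a = rng.poisson(rate)
gene_b = rng.poisson(rate)

def apply_dropout(counts, p_drop, rng):
    """Zero out each count independently with probability p_drop."""
    keep = rng.random(counts.shape) >= p_drop
    return counts * keep

dropped_a = apply_dropout(gene_a, 0.5, rng)
dropped_b = apply_dropout(gene_b, 0.5, rng)

corr_before = np.corrcoef(gene_a, gene_b)[0, 1]
corr_after = np.corrcoef(dropped_a, dropped_b)[0, 1]
print(f"correlation without drop-out: {corr_before:.2f}")
print(f"correlation with drop-out:    {corr_after:.2f}")
```

Because the true relationship between the two genes is known by construction, the same setup could be used to check whether an imputation method restores the correlation or introduces spurious structure, which is the kind of benchmark the abstract describes.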

