Joint eQTL mapping and Inference of Gene Regulatory Network Improves Power of Detecting both cis- and trans-eQTLs

Author(s):  
Xin Zhou ◽  
Xiaodong Cai

Abstract Motivation Genetic variations of expression quantitative trait loci (eQTLs) play a critical role in influencing complex traits and disease development. Two main factors that affect the statistical power of detecting eQTLs are: 1) the relatively small size of available samples, and 2) the heavy burden of multiple testing due to the very large number of variants to be tested. The latter issue is particularly severe when one tries to identify trans-eQTLs that are far away from the genes they influence. If one can exploit co-expressed genes jointly in eQTL mapping, the effective sample size can be increased. Furthermore, using the structure of the gene regulatory network (GRN) may help to identify trans-eQTLs without increasing the multiple testing burden. Results In this paper, we employ the structural equation model (SEM) to model both the GRN and the effect of eQTLs on gene expression, and then develop a novel algorithm, named sparse SEM for eQTL mapping (SSEMQ), to conduct joint eQTL mapping and GRN inference. The SEM can exploit co-expressed genes jointly in eQTL mapping and also use the GRN to determine trans-eQTLs. Computer simulations demonstrate that SSEMQ significantly outperforms nine existing eQTL mapping methods. SSEMQ is further employed to analyze two real datasets of human breast and whole blood tissues, yielding a number of cis- and trans-eQTLs. Availability R package ssemQr is available at https://github.com/Ivis4ml/ssemQr.git. Supplementary information Supplementary data are available at Bioinformatics online.
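To make the modelling idea concrete, here is a minimal sketch of a linear structural equation model of the general form Y = BY + FX + E, where B holds regulatory effects among genes and F holds direct eQTL effects. This form, the toy dimensions, and every name used (`simulate_sem`, `B`, `F`, `total_effect`) are illustrative assumptions, not the SSEMQ algorithm or the ssemQr API. The sketch only shows how a variant acting directly on one gene can surface as a trans-eQTL of a downstream gene through a network edge.

```python
import numpy as np

def simulate_sem(B, F, X, noise_sd=0.1, seed=0):
    """Simulate expression Y from the linear SEM  Y = B Y + F X + E.

    B : (genes, genes) regulatory effects with zero diagonal (assumed form)
    F : (genes, snps)  direct eQTL effects
    X : (snps, samples) genotypes coded 0/1/2
    """
    rng = np.random.default_rng(seed)
    n_genes = B.shape[0]
    E = rng.normal(0.0, noise_sd, size=(n_genes, X.shape[1]))
    # Rearranged model: (I - B) Y = F X + E, solved for Y.
    return np.linalg.solve(np.eye(n_genes) - B, F @ X + E)

rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(2, 500)).astype(float)  # 2 SNPs, 500 samples

B = np.array([[0.0, 0.0],
              [0.8, 0.0]])   # hypothetical edge: gene 0 regulates gene 1
F = np.array([[1.0, 0.0],    # SNP 0 acts directly on gene 0
              [0.0, 0.0]])   # gene 1 has no direct eQTL

Y = simulate_sem(B, F, X)

# Total (direct plus network-mediated) SNP effects are (I - B)^{-1} F.
total_effect = np.linalg.solve(np.eye(2) - B, F)
print(total_effect)
```

In this toy setting SNP 0 has no direct effect on gene 1, yet its total effect on gene 1 is nonzero because it propagates through the gene 0 to gene 1 edge. This is the sense in which network structure can point to trans-eQTLs without testing every variant against every gene.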

2020 ◽  
Author(s):  
Xin Zhou ◽  
Xiaodong Cai

Abstract Motivation Genetic variations of expression quantitative trait loci (eQTLs) play a critical role in influencing complex traits and disease development. Two main factors that affect the statistical power of detecting eQTLs are: 1) the relatively small size of available samples, and 2) the heavy burden of multiple testing due to the very large number of variants to be tested. The latter issue is particularly severe when one tries to identify trans-eQTLs that are far away from the genes they influence. If one can exploit co-expressed genes jointly in eQTL mapping, the effective sample size can be increased. Furthermore, using the structure of the gene regulatory network (GRN) may help to identify trans-eQTLs without increasing the multiple testing burden. Results In this paper, we employ the structural equation model (SEM) to model both the GRN and the effect of eQTLs on gene expression, and then develop a novel algorithm, named sparse SEM for eQTL mapping (SSEMQ), to conduct joint eQTL mapping and GRN inference. The SEM can exploit co-expressed genes jointly in eQTL mapping and also use the GRN to determine trans-eQTLs. Computer simulations demonstrate that SSEMQ significantly outperforms eight existing eQTL mapping methods. SSEMQ is further employed to analyze a real dataset of human breast tissues, yielding a number of cis- and trans-eQTLs. Availability R package ssemQr is available at https://github.com/Ivis4ml/ssemQr.git.


2021 ◽  
Author(s):  
Sreemol Gokuladhas ◽  
William Schierding ◽  
Roan Eltigani Zaied ◽  
Tayaza Fadason ◽  
Murim Choi ◽  
...  

Background & Aims: Non-alcoholic fatty liver disease (NAFLD) is a multi-system metabolic disease that co-occurs with various hepatic and extra-hepatic diseases. The phenotypic manifestation of NAFLD is primarily observed in the liver. Therefore, identifying liver-specific gene regulatory interactions between variants associated with NAFLD and multimorbid conditions may help to improve our understanding of the underlying shared aetiology. Methods: Here, we constructed a liver-specific gene regulatory network (LGRN) consisting of genome-wide spatially constrained expression quantitative trait loci (eQTLs) and their target genes. The LGRN was used to identify regulatory interactions involving NAFLD-associated genetic modifiers and their inter-relationships to other complex traits. Results and Conclusions: We demonstrate that MBOAT7 and IL32, which are associated with NAFLD progression, are regulated by spatially constrained eQTLs that are enriched for an association with liver enzyme levels. MBOAT7 transcript levels are also linked to eQTLs associated with cirrhosis and other traits that commonly co-occur with NAFLD. In addition, genes that encode interacting partners of NAFLD-candidate genes within the liver-specific protein-protein interaction network were affected by eQTLs enriched for phenotypes relevant to NAFLD (e.g. IgG glycosylation patterns, OSA). Furthermore, we identified distinct gene regulatory networks formed by the NAFLD-associated eQTLs in normal versus diseased liver, consistent with the context-specificity of eQTL effects. Interestingly, genes targeted by NAFLD-associated eQTLs within the LGRN were also affected by eQTLs associated with NAFLD-related traits (e.g. obesity and body fat percentage). Overall, the genetic links identified between these traits expand our understanding of shared regulatory mechanisms underlying NAFLD multimorbidities.


PLoS Genetics ◽  
2015 ◽  
Vol 11 (4) ◽  
pp. e1005136 ◽  
Author(s):  
Eric M. Camino ◽  
John C. Butts ◽  
Alison Ordway ◽  
Jordan E. Vellky ◽  
Mark Rebeiz ◽  
...  

2021 ◽  
Author(s):  
Xiangyu Pan ◽  
Zhaoxia Ma ◽  
Xinqi Sun ◽  
Hui Li ◽  
Tingting Zhang ◽  
...  

Biologists have long recognized that the genetic information encoded in DNA leads to trait innovation via gene regulatory networks (GRNs) in development. Here, we generated paired expression and chromatin accessibility data during rumen and esophagus development in sheep and revealed 1,601 active ruminant-specific conserved non-coding elements (active-RSCNEs). To interpret the function of these active-RSCNEs, we developed a Conserved Non-coding Element interpretation method by gene Regulatory network (CNEReg) to define toolkit transcription factors (TTFs) and model their regulation of rumen-specific genes via batteries of active-RSCNEs during development. Our developmental GRN reveals 18 TTFs and 313 active-RSCNEs regulating the functional modules of the rumen and identifies OTX1, SOX21, HOXC8, SOX2, TP63, PPARG and 16 active-RSCNEs that functionally distinguish the rumen from the esophagus. We argue that CNEReg is an attractive systematic approach to integrate evo-devo concepts with omics data to understand how gene regulation evolves and shapes complex traits.


Author(s):  
Gianvito Pio ◽  
Paolo Mignone ◽  
Giuseppe Magazzù ◽  
Guido Zampieri ◽  
Michelangelo Ceci ◽  
...  

Abstract Motivation Gene regulation is responsible for controlling numerous physiological functions and dynamically responding to environmental fluctuations. Reconstructing the human network of gene regulatory interactions is thus paramount to understanding the cell functional organisation across cell types, as well as to elucidating pathogenic processes and identifying molecular drug targets. Although significant effort has been devoted towards this direction, existing computational methods mainly rely on gene expression levels, possibly ignoring the information conveyed by mechanistic biochemical knowledge. Moreover, except for a few recent attempts, most of the existing approaches only consider the information of the organism under analysis, without exploiting the information of related model organisms. Results We propose a novel method for the reconstruction of the human gene regulatory network, based on a transfer learning strategy that synergically exploits information from human and mouse, conveyed by gene-related metabolic features generated in-silico from gene expression data. Specifically, we learn a predictive model from metabolic activity inferred via tissue-specific metabolic modelling of artificial gene knockouts. Our experiments show that the combination of our transfer learning approach with the constructed metabolic features provides a significant advantage in terms of reconstruction accuracy, as well as additional clues on the contribution of each constructed metabolic feature. Availability The system, the datasets and all the results obtained in this study are available at: https://doi.org/10.6084/m9.figshare.c.5237687 Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (12) ◽  
pp. 3833-3840
Author(s):  
Ming-Ju Tsai ◽  
Jyun-Rong Wang ◽  
Shinn-Jang Ho ◽  
Li-Sun Shu ◽  
Wen-Lin Huang ◽  
...  

Abstract Motivation Non-linear ordinary differential equation (ODE) models that contain numerous parameters are suitable for inferring an emulated gene regulatory network (eGRN). However, the number of experimental measurements is usually far smaller than the number of parameters of the eGRN model, which leads to an underdetermined problem. There is no unique solution to the inference problem for an eGRN using insufficient measurements. Results This work proposes an evolutionary modelling algorithm (EMA) that is based on evolutionary intelligence to cope with the underdetermined problem. EMA uses an intelligent genetic algorithm to solve the large-scale parameter optimization problem. An EMA-based method, GREMA, infers a novel type of gene regulatory network with confidence levels for every inferred regulation. The higher the confidence level is, the more accurate the inferred regulation is. GREMA gradually determines the regulations of an eGRN with confidence levels in descending order using either an S-system or a Hill function-based ODE model. The experimental results showed that the regulations with high confidence levels are more accurate and robust than regulations with low confidence levels. Evolutionary intelligence enhanced the mean accuracy of GREMA by 19.2% when using the S-system model with benchmark datasets. An increase in the number of experimental measurements may increase the mean confidence level of the inferred regulations. GREMA performed well compared with existing methods that have been previously applied to the same S-system, DREAM4 challenge and SOS DNA repair benchmark datasets. Availability and implementation All of the datasets that were used and the GREMA-based tool are freely available at https://nctuiclab.github.io/GREMA. Supplementary information Supplementary data are available at Bioinformatics online.
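As a rough illustration of why such models are parameter-heavy, the sketch below writes out the standard S-system form, dx_i/dt = alpha_i * prod_j x_j^G[i,j] - beta_i * prod_j x_j^H[i,j], and integrates a two-gene toy network with forward Euler. The toy parameter values, the function names (`s_system_rhs`, `simulate`, `n_parameters`) and the choice of Euler integration are assumptions made for illustration. None of this is GREMA's code or its optimisation procedure.

```python
import numpy as np

def s_system_rhs(x, alpha, beta, G, H):
    """dx/dt for an S-system: production term minus degradation term."""
    production = alpha * np.prod(x ** G, axis=1)   # prod_j x_j ** G[i, j]
    degradation = beta * np.prod(x ** H, axis=1)   # prod_j x_j ** H[i, j]
    return production - degradation

def simulate(x0, alpha, beta, G, H, dt=0.01, steps=2000):
    """Forward-Euler integration; returns the final state."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x + dt * s_system_rhs(x, alpha, beta, G, H)
        x = np.maximum(x, 1e-9)  # keep concentrations positive
    return x

def n_parameters(n_genes):
    """alpha and beta (2n) plus the kinetic orders G and H (2n^2)."""
    return 2 * n_genes * (n_genes + 1)

# Hypothetical two-gene network: gene 0 activates production of gene 1.
alpha = np.array([1.0, 2.0])
beta = np.array([1.0, 1.0])
G = np.array([[0.0, 0.0],
              [0.5, 0.0]])
H = np.array([[1.0, 0.0],
              [0.0, 1.0]])

x_final = simulate([0.5, 0.5], alpha, beta, G, H)
print(x_final)
print(n_parameters(100))
```

Under this parametrisation a 100-gene network has 2 * 100 * 101 = 20,200 free parameters, which gives a sense of how quickly the parameter count can outgrow the number of available measurements.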


2015 ◽  
Vol 32 (6) ◽  
pp. 875-883 ◽  
Author(s):  
S. M. Minhaz Ud-Dean ◽  
Rudiyanto Gunawan

Abstract Motivation: We addressed the problem of inferring gene regulatory networks (GRNs) from gene expression data of knockout (KO) experiments. This inference is known to be underdetermined and the GRN is not identifiable from data. Past studies have shown that suboptimal design of experiments (DOE) contributes significantly to the identifiability issue of biological networks, including GRNs. However, optimizing DOE has received much less attention than developing methods for GRN inference. Results: We developed the REDuction of UnCertain Edges (REDUCE) algorithm for finding the optimal gene KO experiment for inferring directed graphs (digraphs) of GRNs. REDUCE employed ensemble inference to define uncertain gene interactions that could not be verified by prior data. The optimal experiment corresponds to the maximum number of uncertain interactions that could be verified by the resulting data. For this purpose, we introduced the concept of the edge separatoid, which gives a list of nodes (genes) whose removal would allow the verification of a particular gene interaction. Finally, we proposed a procedure that iterates over performing KO experiments, ensemble update and optimal DOE. Case studies, including the inference of the Escherichia coli GRN and DREAM 4 100-gene GRNs, demonstrated the efficacy of the iterative GRN inference. In comparison to systematic KOs, REDUCE could provide much higher information return per gene KO experiment and consequently more accurate GRN estimates. Conclusions: REDUCE represents an enabling tool for tackling the underdetermined GRN inference. Along with advances in gene deletion and automation technology, the iterative procedure brings an efficient and fully automated GRN inference closer to reality. Availability and implementation: MATLAB and Python scripts of REDUCE are available on www.cabsel.ethz.ch/tools/REDUCE. Supplementary information: Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 12 (10) ◽  
Author(s):  
Qiong Zhang ◽  
Lei Zhang ◽  
Ying Huang ◽  
Pengcheng Ma ◽  
Bingyu Mao ◽  
...  

Abstract Dopaminergic (DA) neurons in the arcuate nucleus (ARC) of the hypothalamus play essential roles in the secretion of prolactin and the regulation of energy homeostasis. However, the gene regulatory network responsible for the development of the DA neurons remains poorly understood. Here we report that the transcription factor special AT-rich binding protein 2 (Satb2) is required for the development of ARC DA neurons. Satb2 is expressed in a large proportion of DA neurons without colocalization with proopiomelanocortin (POMC), orexigenic agouti-related peptide (AgRP), neuropeptide-Y (NPY), somatostatin (Sst), growth hormone-releasing hormone (GHRH), or galanin in the ARC. Nestin-Cre;Satb2flox/flox (Satb2 CKO) mice show a reduced number of ARC DA neurons with unchanged numbers of the other types of ARC neurons, and exhibit an increase of serum prolactin level and an elevated metabolic rate. The reduction of ARC DA neurons in the CKO mice is observed at an embryonic stage and Dlx1 is identified as a potential downstream gene of Satb2 in regulating the development of ARC DA neurons. Together, our study demonstrates that Satb2 plays a critical role in the gene regulatory network directing the development of DA neurons in ARC.


2019 ◽  
Vol 35 (24) ◽  
pp. 5199-5206 ◽  
Author(s):  
Fredrik Wrede ◽  
Andreas Hellander

Abstract Motivation Discrete stochastic models of gene regulatory networks are indispensable tools for biological inquiry since they allow the modeler to predict how molecular interactions give rise to nonlinear system output. Model exploration with the objective of generating qualitative hypotheses about the workings of a pathway is usually the first step in the modeling process. It involves simulating the gene network model under a very large range of conditions, due to the large uncertainty in interactions and kinetic parameters. This makes model exploration highly computationally demanding. Furthermore, with no prior information about the model behavior, labor-intensive manual inspection of very large amounts of simulation results becomes necessary. This limits systematic computational exploration to simplistic models. Results We have developed an interactive, smart workflow for model exploration based on semi-supervised learning and human-in-the-loop labeling of data. The workflow lets a modeler rapidly discover ranges of interesting behaviors predicted by the model. Because similar simulation outputs lie close to each other in a feature space, the modeler can focus on informing the system, by labeling, about which behaviors are more interesting than others, rather than analyzing simulation results with custom scripts and workflows. This results in a large reduction in time-consuming manual work by the modeler early in a modeling project, which can substantially reduce the time needed to go from an initial model to testable predictions and downstream analysis. Availability and implementation A Python package is available at https://github.com/Wrede/mio.git. Supplementary information Supplementary data are available at Bioinformatics online.
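The idea that a few human-provided labels can spread to nearby points in feature space can be sketched with a basic graph-based label propagation scheme. This is a generic stand-in for semi-supervised learning written in NumPy, not the workflow or API of the package above. The two synthetic clusters, the class meanings, and the names `propagate_labels`, `oscillating` and `steady` are hypothetical.

```python
import numpy as np

def propagate_labels(features, labels, sigma=1.0, iterations=100):
    """Spread a few known labels over a Gaussian similarity graph.

    labels uses -1 for unlabeled points and 0..K-1 for labeled ones.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    n_classes = int(labels.max()) + 1

    # Pairwise Gaussian affinities, row-normalised into transition weights.
    diff = features[:, None, :] - features[None, :, :]
    affinity = np.exp(-(diff ** 2).sum(axis=2) / (2.0 * sigma ** 2))
    transition = affinity / affinity.sum(axis=1, keepdims=True)

    labeled_idx = np.flatnonzero(labels >= 0)
    clamped = np.zeros((len(labels), n_classes))
    clamped[labeled_idx, labels[labeled_idx]] = 1.0

    scores = clamped.copy()
    for _ in range(iterations):
        scores = transition @ scores
        scores[labeled_idx] = clamped[labeled_idx]  # keep human labels fixed
    return scores.argmax(axis=1)

# Synthetic feature vectors standing in for summarised simulation output.
rng = np.random.default_rng(0)
oscillating = rng.normal(loc=0.0, scale=0.3, size=(20, 2))
steady = rng.normal(loc=5.0, scale=0.3, size=(20, 2))
features = np.vstack([oscillating, steady])

labels = np.full(40, -1)
labels[0] = 0    # the modeler labels one simulation from each group
labels[20] = 1

predicted = propagate_labels(features, labels)
print(predicted)
```

With only two labels supplied, every remaining point inherits the class of the cluster it sits in, which is the kind of saving in manual inspection that the abstract describes.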


2018 ◽  
Vol 98 (2) ◽  
pp. 209-217 ◽  
Author(s):  
D.G. Michael ◽  
T.J.F. Pranzatelli ◽  
B.M. Warner ◽  
H. Yin ◽  
J.A. Chiorini

Significant effort has been applied to identify the genome-wide gene expression profiles associated with salivary gland development and pathophysiology. However, relatively little is known about the regulators that control salivary gland gene expression. We integrated data from DNase1 digital genomic footprinting, RNA-seq, and gene expression microarrays to comprehensively characterize the cis- and trans-regulatory components controlling gene expression of the healthy submandibular salivary gland. Analysis of 32 human tissues and 87 mouse tissues was performed to identify the highly expressed and tissue-enriched transcription factors driving salivary gland gene expression. Following RNA analysis, protein expression levels and subcellular localization of 39 salivary transcription factors were confirmed by immunohistochemistry. These expression analyses revealed that the salivary gland highly expresses transcription factors associated with endoplasmic reticulum stress, human T-cell lymphotropic virus 1 expression, and Epstein-Barr virus reactivation. DNase1 digital genomic footprinting to a depth of 333,426,353 reads was performed and utilized to generate a salivary gland gene regulatory network describing the genome-wide chromatin accessibility and transcription factor binding of the salivary gland at single-nucleotide resolution. Analysis of the DNase1 gene regulatory network identified dense interconnectivity among PLAG1, MYB, and 13 other transcription factors associated with balanced chromosomal translocations and salivary gland tumors. Collectively, these analyses provide a comprehensive atlas of the cis- and trans-regulators of the salivary gland and highlight known aberrantly regulated pathways of diseases affecting the salivary glands.

