Comparison between instrumental variable and mediation-based approaches for reconstructing causal gene networks in yeast

2021 ◽  
Author(s):  
Adriaan-Alexander Ludl ◽  
Tom Michoel

Causal gene networks model the flow of information within a cell. Reconstructing causal networks from omics data is challenging because correlation does not imply causation. When genomics and transcriptomics data...

2020 ◽  
Author(s):  
Adriaan-Alexander Ludl ◽  
Tom Michoel

Causal gene networks model the flow of information within a cell. Reconstructing causal networks from omics data is challenging because correlation does not imply causation. When genomics and transcriptomics data from a segregating population are combined, genomic variants can be used to orient the direction of causality between gene expression traits. Instrumental variable methods use a local expression quantitative trait locus (eQTL) as a randomized instrument for a gene’s expression level, and assign target genes based on distal eQTL associations. Mediation-based methods additionally require that distal eQTL associations are mediated by the source gene. A detailed comparison between these methods has not yet been conducted, due to the lack of a standardized implementation of different methods, the limited sample size of most multi-omics datasets, and the absence of ground-truth networks for most organisms. Here we used Findr, a software package providing uniform implementations of instrumental variable, mediation, and coexpression-based methods, a recent dataset of 1,012 segregants from a cross between two budding yeast strains, and the YEASTRACT database of known transcriptional interactions to compare causal gene network inference methods. We found that causal inference methods result in a significant overlap with the ground truth, whereas coexpression did not perform better than random. A subsampling analysis revealed that the performance of mediation saturates at large sample sizes, due to a loss of sensitivity when residual correlations become significant. Instrumental variable methods, on the other hand, produce false positive predictions, due to genomic linkage between eQTL instruments. Instrumental variable and mediation-based methods also have complementary roles for identifying causal genes underlying transcriptional hotspots.
Instrumental variable methods correctly predicted STB5 targets for a hotspot centred on the transcription factor STB5, whereas mediation failed due to Stb5p auto-regulating its own expression. Mediation suggests a new candidate gene, DNM1, for a hotspot on Chr XII, whereas instrumental variable methods could not distinguish between multiple genes located within the hotspot. In conclusion, causal inference from genomics and transcriptomics data is a powerful approach for reconstructing causal gene networks, which could be further improved by the development of methods to control for residual correlations in mediation analyses and genomic linkage and pleiotropic effects from transcriptional hotspots in instrumental variable analyses.
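To make the distinction between the two families of tests concrete, here is a minimal sketch on simulated data. It assumes a haploid segregant genotype `g` (coded 0/1) acting as the local eQTL of a source gene A, and a target gene B that depends on A only. The instrumental variable test is approximated by the correlation between `g` and B, and the mediation test by the partial correlation between `g` and B given A. The correlation statistics, effect sizes and helper names are illustrative assumptions; this is not Findr's implementation, whose tests are not reproduced here.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation between x and y after linearly removing the effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate(n, seed=0):
    """Hypothetical segregants: genotype g -> source gene a -> target gene b."""
    rng = random.Random(seed)
    g = [rng.randint(0, 1) for _ in range(n)]  # local eQTL genotype of A (0/1 in haploids)
    a = [gi + rng.gauss(0, 1) for gi in g]      # A's expression depends on its local eQTL
    b = [ai + rng.gauss(0, 1) for ai in a]      # B depends on A only: true edge A -> B
    return g, a, b


g, a, b = simulate(1012)  # sample size chosen to match the segregant count above
iv_stat = pearson(g, b)           # IV test: is B associated with A's instrument?
med_stat = partial_corr(g, b, a)  # mediation test: does that association vanish given A?
print(f"IV statistic    corr(g, b)      = {iv_stat:.3f}")
print(f"Mediation test  pcorr(g, b | a) = {med_stat:.3f}")
```

With a true A → B edge, the IV statistic is clearly non-zero while the partial correlation stays near zero, so both tests support the edge. They diverge when B is tied to the locus through a path that bypasses A (for example a linked gene in the same region): the IV test still flags A → B, whereas the mediation condition fails, consistent with the linkage-driven false positives described above.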


2021 ◽  
Author(s):  
Mai Adachi Nakazawa ◽  
Yoshinori Tamada ◽  
Yoshihisa Tanaka ◽  
Marie Ikeguchi ◽  
Kako Higashihara ◽  
...  

The identification of cancer subtypes is important for understanding tumor heterogeneity. In recent years, numerous computational methods have been proposed for this problem based on the multi-omics data of patients. It is widely accepted that different cancer subtypes are induced by different molecular regulatory networks. However, only a few of these methods incorporate the differences between the patients' molecular systems into the classification process. In this study, we present a novel method to classify cancer subtypes based on patient-specific molecular systems. Our method quantifies patient-specific gene networks, which are estimated from each patient's transcriptome data. By clustering the quantified networks, our method allows for cancer subtyping that takes into account the differences in the molecular systems of patients. Comprehensive analyses of The Cancer Genome Atlas (TCGA) datasets confirmed that our method identified more clinically meaningful cancer subtypes than the existing subtypes, and that the identified subtypes comprised different molecular features. Our findings show that the proposed method, based on a simple classification using the patient-specific molecular systems, can identify cancer subtypes even with single omics data, which cannot otherwise be captured by existing methods using multi-omics data.


2017 ◽  
Vol 14 (2) ◽  
Author(s):  
Noël Malod-Dognin ◽  
Nataša Pržulj

Mapping the complete functional layout of a cell and understanding the cross-talk between different processes are fundamental challenges. They elude us because of the incompleteness and noisiness of molecular data and because of the computational intractability of finding the exact answer. We perform a simple integration of three types of baker’s yeast omics data to elucidate the functional organization and lines of cross-functional communication. We examine protein–protein interaction (PPI), co-expression (COEX) and genetic interaction (GI) data, and explore their relationship with the gold standard of functional organization, the Gene Ontology (GO). We utilize a simple framework that identifies functional cross-communication lines in each of the three data types, in GO, and collectively in the integrated model of the three omics data types; we present each of them in our new Functional Organization Map (FOM) model. We compare the FOMs of the three omics datasets with the FOM of GO and find that GI is in best agreement with GO, followed by COEX and PPI. We integrate the three FOMs into a unified FOM and find that it is in better agreement with the FOM of GO than those of any omics dataset alone, demonstrating the functional complementarity of different omics data.


Quantum ◽  
2021 ◽  
Vol 5 ◽  
pp. 515
Author(s):  
Paolo Perinotti

We study the relation of causal influence between the input systems of a reversible evolution and its output systems, in the context of operational probabilistic theories. We analyse two different definitions that are borrowed from the literature on quantum theory, where they are equivalent. One is the notion based on signalling, and the other is the notion used to define the neighbourhood of a cell in a quantum cellular automaton. The latter definition, which we adopt in the general scenario, turns out to be strictly weaker than the former: it is possible for a system to have causal influence on another one without signalling to it. Remarkably, the counterexample comes from classical theory, where the proposed notion of causal influence determines a redefinition of the neighbourhood of a cell in cellular automata. We stress that, according to our definition, it is nevertheless impossible to have causal influence in the absence of an interaction, e.g. in a Bell-like scenario. We study various conditions for causal influence, and introduce the feature that we call no interaction without disturbance, under which we prove that signalling and causal influence coincide. The proposed definition has interesting consequences for the analysis of causal networks, and leads to a revision of the notion of neighbourhood for classical cellular automata, clarifying a puzzle regarding their quantisation that apparently makes the neighbourhood larger than the original one.


2021 ◽  
Author(s):  
Anjun Ma ◽  
Xiaoying Wang ◽  
Cankun Wang ◽  
Jingxian Li ◽  
Tong Xiao ◽  
...  

We present DeepMAPS, a deep learning platform for inferring cell-type-specific biological gene networks from single-cell multi-omics (scMulti-omics) data. DeepMAPS includes both cells and genes in a heterogeneous graph to infer cell-cell, cell-gene, and gene-gene relations simultaneously. The graph attention neural network considers each cell and gene with both local and global information, making DeepMAPS more robust to noise in the data. We benchmarked DeepMAPS on 18 datasets for cell clustering and network inference, and the results showed that our method outperforms various existing tools. We further applied DeepMAPS to a case study of lung tumor leukocyte CITE-seq data, where it showed superior performance in cell clustering and predicted biologically meaningful cell-cell communication pathways based on the inferred gene networks. To improve the feasibility and ensure the reproducibility of analyzing scMulti-omics data, we deployed a multi-functional webserver with various visualizations. Overall, we consider DeepMAPS a novel platform that brings a state-of-the-art deep learning model to single-cell studies and can promote the use of scMulti-omics data in the community.
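The attention mechanism mentioned above can be illustrated with a single-head graph attention layer in the generic GAT style: each node scores its neighbours with a LeakyReLU of a learned vector applied to concatenated projected features, normalises the scores with a softmax, and aggregates the neighbours' projected features. This is a hedged, generic sketch on a toy heterogeneous graph of two cells and two genes; DeepMAPS's actual architecture, features and training are not described in this abstract, so the node names, feature vectors, projection matrix `W` and attention vector `a` are all hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def gat_layer(features, neighbors, weight, attn):
    """One single-head graph-attention layer (generic sketch, not DeepMAPS itself).

    features:  node -> input feature vector
    neighbors: node -> neighbour list (include the node itself as a self-loop)
    weight:    shared projection matrix W (out_dim x in_dim)
    attn:      attention vector a of length 2 * out_dim
    """
    z = {node: matvec(weight, h) for node, h in features.items()}
    out, alphas = {}, {}
    for i, nbrs in neighbors.items():
        # Unnormalised score e_ij = LeakyReLU(a . [W h_i || W h_j]); list + list concatenates.
        scores = [leaky_relu(sum(p * q for p, q in zip(attn, z[i] + z[j]))) for j in nbrs]
        top = max(scores)  # subtract the max for a numerically stable softmax
        expd = [math.exp(s - top) for s in scores]
        total = sum(expd)
        alpha = [e / total for e in expd]
        alphas[i] = dict(zip(nbrs, alpha))
        dim = len(z[i])
        # Attention-weighted sum of the neighbours' projected features.
        agg = [sum(al * z[j][d] for al, j in zip(alpha, nbrs)) for d in range(dim)]
        out[i] = [max(0.0, v) for v in agg]  # ReLU non-linearity, chosen here for simplicity
    return out, alphas


# Toy heterogeneous graph: cells link to the genes they express (all values hypothetical).
features = {"cell1": [1.0, 0.0], "cell2": [0.0, 1.0],
            "geneA": [0.5, 0.5], "geneB": [1.0, 1.0]}
neighbors = {"cell1": ["cell1", "geneA", "geneB"], "cell2": ["cell2", "geneB"],
             "geneA": ["geneA", "cell1"], "geneB": ["geneB", "cell1", "cell2"]}
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.5, -0.5, 1.0, 1.0]
out, alphas = gat_layer(features, neighbors, W, a)
print(alphas["cell1"])  # how much attention cell1 pays to itself and to the two genes
```

Because cells and genes sit in one graph, the same attention weights can be read as cell-gene relation strengths, which is the intuition behind inferring several relation types simultaneously.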


2021 ◽  
Author(s):  
Lukas Aufinger ◽  
Johann Brenner ◽  
Friedrich C Simmel

Complex non-linear dynamics such as period doubling and chaos have previously been found in computational models of the oscillatory gene networks of biological circadian clocks, but their experimental study is difficult. Here, we present experimental evidence of period doubling in a forced synthetic genetic oscillator operated in a cell-free gene expression system. To this end, an oscillatory negative feedback gene circuit is established in a microfluidic reactor, which allows continuous operation of the system over extended periods of time. We first thoroughly characterize the unperturbed oscillator and find good agreement with a four-species ODE model of the system. Guided by simulations, microfluidics is then used to periodically perturb the system by modulating the concentration of one of the oscillator components with a given amplitude and frequency. When the ratio of the external 'zeitgeber' period to the intrinsic period is close to 1, we experimentally find period doubling and quadrupling in the oscillator dynamics, whereas for longer zeitgeber periods we find stable entrainment. Our theoretical model suggests favorable conditions under which the oscillator can be utilized as an externally synchronized clock, but also demonstrates that related systems could, in principle, display chaotic dynamics.


2021 ◽  
Author(s):  
Tao Peng ◽  
Gregory M. Chen ◽  
Kai Tan

Single-cell omics assays have become essential tools for identifying and characterizing cell types and states of complex tissues. While each single-modality assay reveals distinctive features of the sequenced cells, true multi-omics assays are still at an early stage of development. This underscores the importance of computationally integrating single-cell omics data generated from various samples across various modalities. In addition, the advent of multiplexed molecular imaging assays has created a need for computational methods for the integrative analysis of single-cell imaging and omics data. Here, we present GLUER (inteGrative anaLysis of mUlti-omics at single-cEll Resolution), a flexible tool for integrating single-cell multi-omics data and imaging data. Using multiple true multi-omics data sets as the ground truth, we demonstrate that GLUER achieved a significant improvement over existing methods in the accuracy of matching cells across different data modalities, resulting in improved downstream analyses such as clustering and trajectory inference. We further demonstrate the broad utility of GLUER for integrating single-cell transcriptomics data with imaging-based spatial proteomics and transcriptomics data. Finally, we extend GLUER to leverage true cell-pair labels when available in true multi-omics data, and show that this approach improves co-embedding and clustering results. With the rapid accumulation of single-cell multi-omics and imaging data, integrated data hold the promise of furthering our understanding of the role of heterogeneity in development and disease.
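Matching cells across modalities is commonly done by finding mutual nearest neighbours (MNN) in a shared low-dimensional embedding: a cell pair is kept only if each cell is among the other's closest neighbours. The sketch below shows only that matching step and assumes both modalities have already been projected into a common space; the abstract does not say whether or how GLUER uses this step, and the coordinates are invented for illustration.

```python
import math


def euclid(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn(query, reference, k):
    """Indices of the k reference points closest to `query`."""
    return sorted(range(len(reference)), key=lambda j: euclid(query, reference[j]))[:k]


def mutual_nearest_neighbours(x, y, k=1):
    """Pairs (i, j) such that y[j] is among the k nearest neighbours of x[i] in y
    and x[i] is among the k nearest neighbours of y[j] in x."""
    nn_xy = [set(knn(xi, y, k)) for xi in x]
    nn_yx = [set(knn(yj, x, k)) for yj in y]
    return sorted((i, j) for i in range(len(x)) for j in nn_xy[i] if i in nn_yx[j])


# Hypothetical cells already embedded in a shared 2-D space.
rna = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]  # modality 1 (e.g. scRNA-seq)
img = [[0.2, 0.1], [9.8, 0.3], [5.1, 4.9]]   # modality 2 (e.g. imaging)
print(mutual_nearest_neighbours(rna, img))   # [(0, 0), (1, 2), (2, 1)]
```

Requiring the relation to hold in both directions discards one-sided matches, which tends to make the retained anchor pairs more reliable than plain nearest-neighbour assignment.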


2018 ◽  
Vol 35 (13) ◽  
pp. 2226-2234 ◽  
Author(s):  
Ameen Eetemadi ◽  
Ilias Tagkopoulos

Motivation: Gene expression prediction is one of the grand challenges in computational biology. The availability of transcriptomics data combined with recent advances in artificial neural networks provides an unprecedented opportunity to create predictive models of gene expression with far-reaching applications.

Results: We present the Genetic Neural Network (GNN), an artificial neural network for predicting genome-wide gene expression given gene knockouts and master regulator perturbations. At its core, the GNN maps existing gene regulatory information into its architecture and uses cell nodes that have been specifically designed to capture the dependencies and non-linear dynamics that exist in gene networks. These two key features make the GNN architecture capable of capturing complex relationships without the need for large training datasets. As a result, GNNs were 40% more accurate on average than competing architectures (MLP, RNN, BiRNN) when compared on hundreds of curated and inferred transcription modules. Our results argue that GNNs can become the architecture of choice when building predictors of gene expression from the exponentially growing corpus of genome-wide transcriptomics data.

Availability and implementation: https://github.com/IBPA/GNN

Supplementary information: Supplementary data are available at Bioinformatics online.
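The first of the two key features, building known regulatory interactions into the network's wiring, can be sketched as a layer whose connections exist only along the edges of a prior regulatory network, so a knockout can only affect the targets of the knocked-out regulator. The example is a hypothetical toy: the regulator and target names, the weights and the sigmoid activation are assumptions, and it does not model the GNN's specially designed cell nodes or its training.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(inputs, regulators, weights, bias=0.0):
    """Predict each target gene's expression from its known regulators only.

    inputs:     regulator -> input level (1.0 = wild type, 0.0 = knockout)
    regulators: target -> list of regulators from a prior network (the architecture mask)
    weights:    (regulator, target) -> weight; only prior-network edges have one
    """
    return {
        target: sigmoid(bias + sum(weights[(r, target)] * inputs[r] for r in regs))
        for target, regs in regulators.items()
    }


# Hypothetical prior network and weights, for illustration only.
regulators = {"g1": ["TF_A"], "g2": ["TF_A", "TF_B"], "g3": ["TF_B"]}
weights = {("TF_A", "g1"): 2.0, ("TF_A", "g2"): 1.0,
           ("TF_B", "g2"): -1.5, ("TF_B", "g3"): 2.5}

wild_type = predict({"TF_A": 1.0, "TF_B": 1.0}, regulators, weights)
tf_b_knockout = predict({"TF_A": 1.0, "TF_B": 0.0}, regulators, weights)
for gene in regulators:
    print(gene, round(wild_type[gene], 3), round(tf_b_knockout[gene], 3))
```

Since the mask removes every connection absent from the prior network, there are far fewer parameters to learn than in a fully connected layer, which is one plausible reading of why such architectures need less training data.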


2019 ◽  
Vol 35 (24) ◽  
pp. 5182-5190 ◽  
Author(s):  
Luis G Leal ◽  
Alessia David ◽  
Marjo-Riita Jarvelin ◽  
Sylvain Sebert ◽  
Minna Männikkö ◽  
...  

Motivation: Integration of different omics data could markedly help to identify biological signatures, understand the missing heritability of complex diseases and ultimately achieve personalized medicine. Standard regression models used in Genome-Wide Association Studies (GWAS) identify loci with strong effect sizes, whereas GWAS meta-analyses are often needed to capture weak loci contributing to the missing heritability. Novel machine learning algorithms for merging genotype data with other omics data are much needed, as they could enhance the prioritization of weak loci.

Results: We developed cNMTF (corrected non-negative matrix tri-factorization), an integrative algorithm based on clustering techniques for biological data. This method assesses the inter-relatedness between genotypes, phenotypes, the damaging effect of the variants and gene networks in order to identify loci-trait associations. cNMTF was used to prioritize genes associated with lipid traits in two population cohorts. We replicated 129 genes reported in GWAS worldwide and provided evidence that supports 85% of our findings (226 out of 265 genes), including recent associations in the literature (NLGN1), regulators of lipid metabolism (DAB1) and pleiotropic genes for lipid traits (CARM1). Moreover, cNMTF remained effective in the presence of strong population structure by accounting for the individuals’ ancestry. As the method is flexible in incorporating diverse omics data sources, it can be easily adapted to the user’s research needs.

Availability and implementation: An R package (cnmtf) is available at https://lgl15.github.io/cnmtf_web/index.html.

Supplementary information: Supplementary data are available at Bioinformatics online.
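cNMTF builds on non-negative matrix tri-factorization, which approximates a non-negative data matrix X (n x m) as F S G^T with non-negative factors F (n x k1), S (k1 x k2) and G (m x k2); rows and columns can then be clustered by the largest entry in their row of F or G. The sketch below implements plain tri-factorization with the standard multiplicative updates for the squared Frobenius error. The ancestry correction and the integration of variant damage and gene networks that define cNMTF are not included, and the toy matrix is invented.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def mult_update(m, num, den, eps=1e-12):
    """Element-wise multiplicative update m * num / den (keeps entries non-negative)."""
    return [[mv * nv / (dv + eps) for mv, nv, dv in zip(mr, nr, dr)]
            for mr, nr, dr in zip(m, num, den)]


def frob_error(x, f, s, g):
    """Squared Frobenius norm of X - F S G^T."""
    approx = matmul(matmul(f, s), transpose(g))
    return sum((a - b) ** 2 for ra, rb in zip(x, approx) for a, b in zip(ra, rb))


def nmtf(x, k1, k2, iters=200, seed=0):
    """Plain NMTF: X (n x m) ~ F (n x k1) S (k1 x k2) G^T (k2 x m)."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    f = [[rng.random() for _ in range(k1)] for _ in range(n)]
    s = [[rng.random() for _ in range(k2)] for _ in range(k1)]
    g = [[rng.random() for _ in range(k2)] for _ in range(m)]
    xt = transpose(x)
    for _ in range(iters):
        st, gtg = transpose(s), matmul(transpose(g), g)
        # F <- F * (X G S^T) / (F S G^T G S^T)
        f = mult_update(f, matmul(matmul(x, g), st), matmul(matmul(matmul(f, s), gtg), st))
        ft, fs = transpose(f), matmul(f, s)
        # G <- G * (X^T F S) / (G S^T F^T F S)
        g = mult_update(g, matmul(xt, fs), matmul(matmul(g, st), matmul(ft, fs)))
        gtg = matmul(transpose(g), g)
        # S <- S * (F^T X G) / (F^T F S G^T G)
        s = mult_update(s, matmul(matmul(ft, x), g), matmul(matmul(matmul(ft, f), s), gtg))
    return f, s, g


# Invented 6 x 5 matrix with two row blocks and two column blocks.
x = [[5, 5, 0, 0, 0], [4, 5, 0, 0, 1], [5, 4, 1, 0, 0],
     [0, 0, 3, 3, 3], [0, 1, 3, 4, 3], [1, 0, 3, 3, 4]]
f0, s0, g0 = nmtf(x, 2, 2, iters=0)  # the random starting point (same seed)
f, s, g = nmtf(x, 2, 2, iters=200)
print(f"error: {frob_error(x, f0, s0, g0):.2f} -> {frob_error(x, f, s, g):.2f}")
row_clusters = [max(range(2), key=lambda k: row[k]) for row in f]  # argmax of F per row
print(row_clusters)
```

The middle factor S is what distinguishes tri-factorization from ordinary NMF: it lets the number of row clusters (k1) differ from the number of column clusters (k2) and describes how the two sets of clusters relate.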


2010 ◽  
Vol 08 (04) ◽  
pp. 679-701 ◽  
Author(s):  
André Fujita ◽  
João Ricardo Sato ◽  
Kaname Kojima ◽  
Luciana Rodrigues Gomes ◽  
Masao Nagasaki ◽  
...  

Wiener and Granger introduced an intuitive concept of causality (Granger causality) between two variables, based on the idea that an effect never occurs before its cause. Later, Geweke generalized this concept to multivariate Granger causality, i.e. n variables Granger-causing another variable. Although Granger causality is not "effective causality" in the Aristotelian sense, the concept is useful for inferring directionality and information flow in observational data. Granger causality is usually identified using VAR (vector autoregressive) models because of their simplicity, and in the last few years several VAR-based models have been proposed to model gene regulatory networks. Here, we generalize the multivariate Granger causality concept in order to identify Granger causalities between sets of gene expressions, i.e. whether a set of n genes Granger-causes another set of m genes, with the aim of identifying the flow of information between gene networks (or pathways). The concept of Granger causality for sets of variables is presented. Moreover, a method for its identification with a bootstrap test is proposed. This method is applied to both simulated and real biological gene expression data in order to model regulatory networks. The concept may be useful for understanding the complete information flow from one network or pathway to another, particularly in regulatory networks. Linking this concept to graph theory, sink and source can be generalized to node sets. Moreover, hub and centrality for sets of genes can be defined based on total information flow. Another application is in annotation: when the functionality of a set of genes is unknown but the set is Granger-caused by another, well-studied set of genes, this information may be useful for inferring or constructing hypotheses about the unknown set.
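A lag-1 version of the set-level idea can be sketched with ordinary least squares: for every gene in the target set Y, compare the residual sum of squares of a VAR model that uses only Y's past (restricted) with one that also includes the past of the source set X (full). The statistic below sums the per-gene log ratios, which coincides with Geweke's determinant-based measure only if the residuals of different genes are uncorrelated. The bootstrap test proposed above is not implemented, and the simulated series, lag order and coefficients are assumptions.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def rss(design, y):
    """Residual sum of squares of an OLS fit of y on the design columns."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * v for bj, v in zip(beta, row))) ** 2 for row, yi in zip(design, y))


def set_granger_stat(x_series, y_series):
    """Sum over genes in Y of log(RSS_restricted / RSS_full) for a lag-1 VAR.
    Each series is a list of time points; each time point lists the set's expression values."""
    stat = 0.0
    steps = len(y_series)
    for gene in range(len(y_series[0])):
        target = [y_series[t][gene] for t in range(1, steps)]
        restricted = [[1.0] + y_series[t - 1] for t in range(1, steps)]
        full = [[1.0] + y_series[t - 1] + x_series[t - 1] for t in range(1, steps)]
        stat += math.log(rss(restricted, target) / rss(full, target))
    return stat


def simulate(steps=200, seed=1):
    """Hypothetical sets of two genes each: X drives Y with a one-step delay, Z is unrelated."""
    rng = random.Random(seed)
    x, y, z = [[0.0, 0.0]], [[0.0, 0.0]], [[0.0, 0.0]]
    for t in range(1, steps):
        x.append([0.5 * v + rng.gauss(0, 1) for v in x[t - 1]])
        z.append([0.5 * v + rng.gauss(0, 1) for v in z[t - 1]])
        y.append([0.8 * x[t - 1][gene] + rng.gauss(0, 1) for gene in range(2)])
    return x, y, z


x, y, z = simulate()
print(f"X -> Y: {set_granger_stat(x, y):.3f}")
print(f"Z -> Y: {set_granger_stat(z, y):.3f}")
```

Here X -> Y gives a clearly positive statistic while the unrelated set Z gives a value close to zero; deciding what counts as "close to zero" in practice requires a resampling test such as the bootstrap described above.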

