Application of Modular Response Analysis to Medium- to Large-Size Biological Systems

2021 ◽  
Author(s):  
Meriem Mekedem ◽  
Patrice Ravel ◽  
Jacques Colinge

The development of high-throughput genomic technologies associated with recent genetic perturbation techniques such as short hairpin RNA (shRNA), gene trapping, or gene editing (CRISPR/Cas9) has made it possible to obtain large perturbation data sets. These data sets are invaluable sources of information regarding the function of genes, and they offer unique opportunities to reverse engineer gene regulatory networks in specific cell types. Modular response analysis (MRA) is a well-accepted mathematical modeling method that is precisely aimed at such network inference tasks, but its use has been limited to rather small biological systems so far. In this study, we show that MRA can be employed on large systems with almost 1,000 network components. In particular, we show that MRA performance surpasses general-purpose mutual information-based algorithms. Part of these competitive results was obtained by the application of a novel heuristic that pruned MRA-inferred interactions a posteriori. We also exploited a block structure in MRA linear algebra to parallelize large system resolutions.
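As a rough illustration of the kind of linear algebra MRA relies on, the sketch below recovers local response coefficients from a matrix of global responses to single-module perturbations. It assumes the classical MRA identity r = -[dg(R^-1)]^-1 R^-1, which the abstract itself does not state, and it is not the authors' implementation: the pruning heuristic and the block-parallel resolution are not shown. The function names (`invert`, `matmul`, `local_responses`) and the 3-module toy network are hypothetical.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for k in range(n):
            if k != col:
                factor = aug[k][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def local_responses(global_response):
    """Assumed MRA identity: r[i][j] = -inv(R)[i][j] / inv(R)[i][i].

    global_response[i][j] is the response of module i to a perturbation
    that directly targets module j only. The diagonal of r is -1.
    """
    inv = invert(global_response)
    n = len(inv)
    return [[-inv[i][j] / inv[i][i] for j in range(n)] for i in range(n)]


# Hypothetical 3-module network: true local responses with -1 on the diagonal.
true_local = [[-1.0, 0.0, 0.5],
              [0.8, -1.0, 0.0],
              [0.0, -0.6, -1.0]]
# Hypothetical direct perturbation strengths (unknown in a real experiment).
perturbation = [[1.0, 0.0, 0.0],
                [0.0, 2.0, 0.0],
                [0.0, 0.0, 0.5]]
# Simulate the global responses from r * R = -P, i.e. R = -inv(r) * P.
global_response = matmul([[-v for v in row] for row in invert(true_local)],
                         perturbation)
recovered = local_responses(global_response)
for row in recovered:
    print([round(v, 3) for v in row])
```

In this toy setting the recovered matrix matches `true_local` without knowledge of the perturbation strengths, which is the property that makes the approach attractive for perturbation screens. With noisy data and close to 1,000 components, a dense inversion like this one would need the additional machinery described in the abstract.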

2018 ◽  
Author(s):  
Dayanne M. Castro ◽  
Nicholas R. de Veaux ◽  
Emily R. Miraldi ◽  
Richard Bonneau

Gene regulatory networks are composed of sub-networks that are often shared across biological processes, cell types, and organisms. Leveraging multiple sources of information, such as publicly available gene expression datasets, could therefore be helpful when learning a network of interest. Integrating data across different studies, however, raises numerous technical concerns. Hence, a common approach in network inference, and more broadly in genomics research, is to separately learn models from each dataset and combine the results. Individual models, however, often suffer from under-sampling, poor generalization, and limited network recovery. In this study, we explore previous integration strategies, such as batch correction and model ensembles, and introduce a new multitask learning approach for joint network inference across several datasets. Our method first estimates the activities of transcription factors and then infers the relevant network topology. As regulatory interactions are context-dependent, we estimate model coefficients as a combination of both dataset-specific and conserved components. In addition, adaptive penalties may be used to favor models that include interactions derived from multiple sources of prior knowledge, including orthogonal genomics experiments. We evaluate generalization and network recovery using examples from Bacillus subtilis and Saccharomyces cerevisiae, and show that sharing information across models improves network reconstruction. Finally, we demonstrate robustness to both false positives in the prior information and heterogeneity among datasets.


2021 ◽  
Author(s):  
Lisa Maria Steinheuer ◽  
Sebastian Canzler ◽  
Jörg Hackermüller

Gene correlation network inference from single-cell transcriptomics data could provide unprecedented insights into cell type-specific regulatory programs. ScRNA-seq data is severely affected by dropout, which significantly hampers and restrains current downstream analysis. Although newly developed tools are capable of dealing with sparse data, no appropriate single-cell network inference workflow has been established. A potential way to end this deadlock is the application of data imputation methods, which have already proved useful in specific contexts of single-cell data analysis, e.g., recovering cell clusters. In order to infer cell type-specific networks, two prerequisites must be met: the identification of cluster-specific cell types and the network inference itself. Here, we propose a benchmarking framework to investigate both objectives. By using suitable reference data with inherent correlation structure, six representative imputation tools, and appropriate evaluation measures, we were able to systematically assess the impact of data imputation on network inference. Major network structures were found to be preserved in low-dropout data sets. For moderately sparse data sets, DCA was able to recover gene correlation structures, although it systematically introduced higher correlation values. No imputation tool was able to recover true signals from high-dropout data. However, by using an additional biological data set, we could show that cell-cell correlation by means of specific marker gene expression was not compromised by data imputation. Our analysis showed that network inference is feasible for low and moderately sparse data sets by using the unimputed and DCA-prepared data, respectively. High-sparsity data, on the other hand, still pose a major problem, since current imputation techniques are not able to facilitate network inference. The annotation of cluster-specific cell types as a prerequisite is not hampered by data imputation, but the power of imputation tools to restore the deeply hidden correlation structures is still not sufficient.
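To make the effect of dropout on gene-gene correlation concrete, the following toy simulation draws a strongly correlated gene pair and then zeroes out values at random. It is only a sketch under simplifying assumptions that do not come from the abstract: expression is Gaussian around a mean of 5, dropout hits each gene independently at a fixed rate, and no imputation tool (DCA or otherwise) is applied. The function names and all parameter values are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_pair(n_cells, rho, rng):
    """Draw two genes whose expression has population correlation rho."""
    xs, ys = [], []
    for _ in range(n_cells):
        shared = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        xs.append(5.0 + shared)
        ys.append(5.0 + rho * shared + math.sqrt(1.0 - rho ** 2) * noise)
    return xs, ys


def apply_dropout(values, rate, rng):
    """Replace each value by zero with probability `rate` (assumed independent)."""
    return [0.0 if rng.random() < rate else v for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
gene_a, gene_b = simulate_pair(5000, 0.9, rng)
clean = pearson(gene_a, gene_b)
sparse = pearson(apply_dropout(gene_a, 0.5, rng), apply_dropout(gene_b, 0.5, rng))
print(f"correlation without dropout: {clean:.2f}")
print(f"correlation with 50% dropout: {sparse:.2f}")
```

Under these assumptions the observed correlation collapses once half of the values are zeroed, because the variance introduced by the zeros swamps the shared signal. Real dropout depends on expression level and sequencing depth, so the numbers here should not be read as estimates for any actual data set.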


2009 ◽  
Vol 14 (9) ◽  
pp. 1054-1066 ◽  
Author(s):  
Keith A. Houck ◽  
David J. Dix ◽  
Richard S. Judson ◽  
Robert J. Kavlock ◽  
Jian Yang ◽  
...  

The complexity of human biology has made prediction of health effects as a consequence of exposure to environmental chemicals especially challenging. Complex cell systems, such as the Biologically Multiplexed Activity Profiling (BioMAP) primary, human, cell-based disease models, leverage cellular regulatory networks to detect and distinguish chemicals with a broad range of target mechanisms and biological processes relevant to human toxicity. Here the authors use the BioMAP human cell systems to characterize effects relevant to human tissue and inflammatory disease biology following exposure to the 320 environmental chemicals in the Environmental Protection Agency’s (EPA’s) ToxCast phase I library. The ToxCast chemicals were assayed at 4 concentrations in 8 BioMAP cell systems, with a total of 87 assay endpoints resulting in more than 100,000 data points. Within the context of the BioMAP database, ToxCast compounds could be classified based on their ability to cause overt cytotoxicity in primary human cell types or according to toxicity mechanism class derived from comparisons to activity profiles of BioMAP reference compounds. ToxCast chemicals with similarity to inducers of mitochondrial dysfunction, cAMP elevators, inhibitors of tubulin function, inducers of endoplasmic reticulum stress, or NFκB pathway inhibitors were identified based on this BioMAP analysis. This data set is being combined with additional ToxCast data sets for development of predictive toxicity models at the EPA. (Journal of Biomolecular Screening 2009:1054-1066)


2021 ◽  
Author(s):  
Gulden Olgun ◽  
Vishaka Gopalan ◽  
Sridhar Hannenhalli

Micro-RNAs (miRNA) are critical in development, homeostasis, and diseases, including cancer. However, our understanding of miRNA function at cellular resolution is thwarted by the inability of the standard single cell RNA-seq protocols to capture miRNAs. Here we introduce a machine learning tool -- miRSCAPE -- to infer miRNA expression in a sample from its RNA-seq profile. We establish miRSCAPE's accuracy separately in 10 tissues comprising ~10,000 tumor and normal bulk samples and demonstrate that miRSCAPE accurately infers cell type-specific miRNA activities (predicted vs observed fold-difference correlation ~ 0.81) in two independent datasets where miRNA profiles of specific cell types are available (HEK-GBM, Kidney-Breast-Skin). When trained on human hematopoietic cancers, miRSCAPE can identify active miRNAs in 8 hematopoietic cell lines in mouse with a reasonable accuracy (auROC = 0.67). Finally, we apply miRSCAPE to infer miRNA activities in scRNA clusters in Pancreatic and Lung cancers, as well as in 56 cell types in the Human Cell Landscape (HCL). Across the board, miRSCAPE recapitulates and provides a refined view of known miRNA biology. miRSCAPE is freely available and promises to substantially expand our understanding of gene regulatory networks at cellular resolution.


2002 ◽  
Vol 218 (4) ◽  
pp. 507-520 ◽  
Author(s):  
FRANK J. BRUGGEMAN ◽  
HANS V. WESTERHOFF ◽  
JAN B. HOEK ◽  
BORIS N. KHOLODENKO

2017 ◽  
Vol 2017 ◽  
pp. 1-16 ◽  
Author(s):  
Wenqing Jean Lee ◽  
Sumantra Chatterjee ◽  
Sook Peng Yap ◽  
Siew Lan Lim ◽  
Xing Xing ◽  
...  

Embryogenesis is an intricate process involving multiple genes and pathways. Some of the key transcription factors controlling specific cell types are the Sox trio, namely, Sox5, Sox6, and Sox9, which play crucial roles in organogenesis, working in a concerted manner. Much, however, still needs to be learned about their combinatorial roles during this process. A developmental genomics and systems biology approach offers to complement the reductionist methodology of current developmental biology and provide a more comprehensive and integrated view of the interrelationships of complex regulatory networks that occur during organogenesis. By combining cell type-specific transcriptome analysis and in vivo ChIP-Seq of the Sox trio using mouse embryos, we provide evidence for the direct control of Sox5 and Sox6 by the transcriptional trio in the murine model and by Morpholino knockdown in zebrafish and demonstrate the novel role of Tgfb2, Fbxl18, and Tle3 in formation of Sox5-, Sox6-, and Sox9-dependent tissues. Concurrently, a complete embryonic gene regulatory network has been generated, identifying a wide repertoire of genes involved and controlled by the Sox trio in the intricate process of normal embryogenesis.


2017 ◽  
Author(s):  
Andreas Sagner ◽  
Zachary B. Gaber ◽  
Julien Delile ◽  
Jennifer H. Kong ◽  
David L. Rousso ◽  
...  

During tissue development, multipotent progenitors differentiate into specific cell types in characteristic spatial and temporal patterns. We address the mechanism linking progenitor identity and differentiation rate in the neural tube, where motor neuron (MN) progenitors differentiate more rapidly than other progenitors. Using single cell transcriptomics, we define the transcriptional changes associated with the transition of neural progenitors into MNs. Reconstruction of gene expression dynamics from these data indicates a pivotal role for the MN determinant Olig2 just prior to MN differentiation. Olig2 represses expression of the Notch signaling pathway effectors Hes1 and Hes5. Olig2 repression of Hes5 appears to be direct, via a conserved regulatory element within the Hes5 locus that restricts expression from MN progenitors. These findings reveal a tight coupling between the regulatory networks that control patterning and neuronal differentiation, and demonstrate how Olig2 acts as the developmental pacemaker coordinating the spatial and temporal pattern of MN generation.


2021 ◽  
Author(s):  
Christopher Innocenti ◽  
Zhenning Zhang ◽  
Balaji Selvaraj ◽  
Isabelle Gaffney ◽  
Michalis Frangos ◽  
...  

Understanding the complex biology of the tumor microenvironment (TME) is necessary to understand the mechanisms of action of immuno-oncology therapies and to match the right therapies to the right patients. Multiplex immunofluorescence (mIF) is a useful technology that has tremendous potential to further our understanding of cancer patho-biology; however, tools that fully leverage the high dimensionality of this data are still in their infancy. We describe here a novel deep learning pipeline, GraphITE, designed to allow Graph-based Inspection of Tissues via Embeddings. GraphITE transforms mIF data into a graph representation, where unsupervised learning algorithms can be utilised to generate embeddings representing cellular 'neighbourhoods'. The embeddings can be downprojected and explored for clustering analysis, and patterns can be mapped back to the image as well as interrogated for phenotypical, morphological, or structural distinctiveness. GraphITE supports the extraction of information not only on the phenotypes of individual cells or the relationships between specific cell types, but also on cell neighborhoods, which can be characterized to look for more complex interactions. This allows pathologists and data scientists to explore mIF data sets and uncover patterns that are otherwise obscured by the high dimensionality of the data. In this work, we showcase the current setup of the system, going from raw input data all the way to a user-friendly exploration tool. Using this tool, we show how the data can be navigated in a way not previously possible.


2016 ◽  
Author(s):  
David J. Arenillas ◽  
Alistair R.R. Forrest ◽  
Hideya Kawaji ◽  
Timo Lassman ◽  
Wyeth W. Wasserman ◽  
...  

Summary: With the emergence of large-scale Cap Analysis of Gene Expression (CAGE) data sets from individual labs and the FANTOM consortium, one can now analyze the cis-regulatory regions associated with gene transcription at an unprecedented level of refinement. By coupling transcription factor binding site (TFBS) enrichment analysis with CAGE-derived genomic regions, CAGEd-oPOSSUM can identify TFs that act as key regulators of genes involved in specific mammalian cell and tissue types. The webtool allows for the analysis of CAGE-derived transcription start sites (TSSs) either provided by the user or selected from ~1,300 mammalian samples from the FANTOM5 project with pre-computed TFBS predicted with JASPAR TF binding profiles. The tool helps power insights into the regulation of genes through the study of the specific usage of TSSs within specific cell types and/or under specific conditions.

Availability and implementation: The CAGEd-oPOSSUM web tool is implemented in Perl, MySQL, and Apache and is available at http://cagedop.cmmt.ubc.ca/CAGEd_oPOSSUM.

Supporting Information: Supplementary Text, Figures, and Data are available online at bioRxiv.

