Random Forest Factorization Reveals Latent Structure in Single Cell RNA Sequencing Data

2021 ◽  
Author(s):  
Boris M. Brenerman ◽  
Benjamin D. Shapiro ◽  
Michael C. Schatz ◽  
Alexis Battle

Single-cell RNA sequencing data contain patterns of correlation that are poorly captured by techniques that rely on linear estimation or assumptions of Gaussian behavior. We apply random forest regression to scRNAseq data from mouse brains to identify the co-regulation of genes within specific cellular contexts. By analyzing the estimators of the random forest, we identify several novel candidate gene regulatory networks and compare these networks in aged and young mice. We demonstrate that cell populations have cell-type-specific phenotypes of aging that are not detected by other methods, including the collapse of differentiating oligodendrocytes but not of precursors or mature oligodendrocytes.

Author(s):  
Rui-Qi Wang ◽  
Wei Zhao ◽  
Hai-Kui Yang ◽  
Jia-Mei Dong ◽  
Wei-Jie Lin ◽  
...  

Colorectal cancer (CRC) manifests as gastrointestinal tumors with high intratumoral heterogeneity. Recent studies have demonstrated that CRC may consist of tumor cells with different consensus molecular subtypes (CMS). Advances in single-cell RNA sequencing have facilitated the construction of gene regulatory networks to decode key regulators of specific cell types. Herein, we comprehensively analyzed the CMS of CRC patients using single-cell RNA-sequencing data. CMS labels for all malignant cells were assigned using CMScaller. Gene set variation analysis showed pathway activity differences consistent with those reported in previous studies. Cell–cell communication analysis confirmed that CMS1 was more closely related to immune cells, and that monocytes and macrophages play dominant roles in the CRC tumor microenvironment. On the basis of the gene regulatory networks (GRNs) constructed for each subtype, we found that the critical transcription factor ERG is universally activated and upregulated in all CMS compared with normal cells, and that it performs diverse roles by regulating the expression of different downstream genes. In summary, molecular subtyping of single-cell RNA-sequencing data for colorectal cancer could elucidate the heterogeneity in gene regulatory networks and identify critical regulators of CRC.


eLife ◽  
2020 ◽  
Vol 9 ◽  
Author(s):  
Christopher A Jackson ◽  
Dayanne M Castro ◽  
Giuseppe-Antonio Saldi ◽  
Richard Bonneau ◽  
David Gresham

Understanding how gene expression programs are controlled requires identifying regulatory relationships between transcription factors and target genes. Gene regulatory networks are typically constructed from gene expression data acquired following genetic perturbation or environmental stimulus. Single-cell RNA sequencing (scRNAseq) captures the gene expression state of thousands of individual cells in a single experiment, offering advantages in combinatorial experimental design, large numbers of independent measurements, and accessing the interaction between the cell cycle and environmental responses that is hidden by population-level analysis of gene expression. To leverage these advantages, we developed a method for scRNAseq in budding yeast (Saccharomyces cerevisiae). We pooled diverse transcriptionally barcoded gene deletion mutants in 11 different environmental conditions and determined their expression state by sequencing 38,285 individual cells. We benchmarked a framework for learning gene regulatory networks from scRNAseq data that incorporates multitask learning and constructed a global gene regulatory network comprising 12,228 interactions.


2019 ◽  
Author(s):  
Christopher A Jackson ◽  
Dayanne M Castro ◽  
Giuseppe-Antonio Saldi ◽  
Richard Bonneau ◽  
David Gresham

Understanding how gene expression programs are controlled requires identifying regulatory relationships between transcription factors and target genes. Gene regulatory networks are typically constructed from gene expression data acquired following genetic perturbation or environmental stimulus. Single-cell RNA sequencing (scRNAseq) captures the gene expression state of thousands of individual cells in a single experiment, offering advantages in combinatorial experimental design, large numbers of independent measurements, and accessing the interaction between the cell cycle and environmental responses that is hidden by population-level analysis of gene expression. To leverage these advantages, we developed a method for transcriptionally barcoding gene deletion mutants and performing scRNAseq in budding yeast (Saccharomyces cerevisiae). We pooled diverse genotypes in 11 different environmental conditions and determined their expression state by sequencing 38,285 individual cells. We developed and benchmarked a framework for learning gene regulatory networks from scRNAseq data that incorporates multitask learning and constructed a global gene regulatory network comprising 12,018 interactions. Our study establishes a general approach to gene regulatory network reconstruction from scRNAseq data that can be employed in any organism.


2020 ◽  
Author(s):  
Harsh Shrivastava ◽  
Xiuwei Zhang ◽  
Srinivas Aluru ◽  
Le Song

Motivation: Gene regulatory networks (GRNs) are graphs that specify the interactions between transcription factors (TFs) and their target genes. Understanding these interactions is crucial for studying the mechanisms of cell differentiation, growth and development. Computational methods are needed to infer these networks from measured data. Although single-cell RNA-Sequencing (scRNA-Seq) data provide gene-expression measurements at unprecedented scale and resolution, the inference of GRNs remains a challenge, mainly due to the complexity of the regulatory relationships and the noise in the data.
Results: We propose GRNUlar, a novel deep learning architecture based on the idea of unrolled algorithms for GRN inference from scRNA-Seq data. Like some existing methods that use prior information about which genes are TFs, GRNUlar incorporates this TF information, using a sparse multi-task deep learning architecture. We also demonstrate the application of GLAD, a recently developed unrolled architecture, to recover undirected GRNs in the absence of TF information. These unrolled architectures require supervision to train, for which we leverage existing synthetic data simulators that generate scRNA-Seq data guided by a GRN. We show that unrolled algorithms outperform state-of-the-art methods on synthetic data as well as on real datasets, both with and without TF information.
Availability: GitHub link to GRNUlar: https://github.com/Harshs27/
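To make the "unrolled algorithm" idea concrete, here is a minimal Python sketch. It is not GRNUlar or GLAD: it unrolls a fixed number of ISTA iterations for lasso regression of one target gene on TF expression, giving each iteration ("layer") its own step size and threshold. In an unrolled network those per-layer parameters would be learned with supervision (for example from simulated scRNA-Seq data with a known GRN); here they are simply passed in. The lasso objective, the function names and the toy data are illustrative assumptions.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def unrolled_ista(X, y, step_sizes, thresholds):
    """Run len(step_sizes) ISTA 'layers' for 0.5*||y - X b||^2 + lambda*||b||_1.

    X: cells x TFs expression matrix (list of rows); only TFs act as predictors.
    y: expression of one target gene in the same cells.
    step_sizes, thresholds: per-layer (eta, lambda) values. In a true unrolled
    model these would be learned; here they are hypothetical fixed inputs.
    Returns the estimated TF -> target weights b.
    """
    n_cells, n_tfs = len(X), len(X[0])
    b = [0.0] * n_tfs
    for eta, lam in zip(step_sizes, thresholds):
        # Residual r = y - X b
        r = [y[i] - sum(X[i][j] * b[j] for j in range(n_tfs)) for i in range(n_cells)]
        # Gradient step: z = b + eta * X^T r
        z = [b[j] + eta * sum(X[i][j] * r[i] for i in range(n_cells)) for j in range(n_tfs)]
        # Sparsity (proximal) step with threshold eta * lambda
        b = [soft_threshold(zj, eta * lam) for zj in z]
    return b
```

With `X = [[1, 0], [0, 1], [0, 0]]`, `y = [3.0, 0.1, 0.0]`, five layers of step size 1.0 and threshold 0.5, the sketch returns `[2.5, 0.0]`: the weakly associated second TF is pruned to zero, which is how the sparsity penalty yields a sparse set of TF-target edges.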


2020 ◽  
Author(s):  
Andreas Fønss Møller ◽  
Kedar Nath Natarajan

Recent single-cell RNA-sequencing atlases have surveyed and identified major cell types across different mouse tissues. Here, we computationally reconstruct gene regulatory networks from three major mouse cell atlases to capture functional regulators critical for cell identity, while accounting for a variety of technical differences, including sampled tissues, sequencing depth and author-assigned cell-type labels. Extracting the regulatory crosstalk from the mouse atlases, we identify and distinguish global regulons active in multiple cell types from specialised cell-type-specific regulons. We demonstrate that regulon activities accurately distinguish individual cell types despite differences between the individual atlases. We generate an integrated network that further uncovers regulon modules with coordinated activities critical for cell types, and we validate these modules using available experimental data. Inferring regulatory networks during myeloid differentiation from wild-type and Irf8 KO cells, we uncover the functional contribution of Irf8 regulon activity and composition to the monocyte lineage. Our analysis provides an avenue to further extract and integrate the regulatory crosstalk from single-cell expression data.
Summary: An integrated single-cell gene regulatory network built from three mouse cell atlases captures global and cell-type-specific regulatory modules and crosstalk important for cellular identity.
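Regulon activity scoring, which the abstract uses to distinguish cell types, can be sketched in a few lines of Python. The abstract does not say how activity is computed; this sketch assumes a simplified AUCell-style score (the fraction of a regulon's targets found among a cell's top-k expressed genes, rather than the area under the recovery curve). The names `regulon_activity`, `activity_matrix`, `assign_by_activity` and the `top_k` parameter are hypothetical.

```python
def regulon_activity(cell_expr, regulon, top_k):
    """Fraction of a regulon's target genes among one cell's top_k expressed genes.

    cell_expr: dict mapping gene name -> expression value in one cell.
    regulon: set of target gene names for one transcription factor.
    top_k: how many of the highest-expressed genes to inspect (hypothetical parameter).
    """
    if not regulon:
        return 0.0
    # Rank genes from highest to lowest expression; break ties by name for determinism.
    ranked = sorted(cell_expr, key=lambda g: (-cell_expr[g], g))
    top_genes = set(ranked[:top_k])
    return len(top_genes & regulon) / len(regulon)


def activity_matrix(cells, regulons, top_k):
    """Score every regulon in every cell: {cell_id: {regulon_name: score}}."""
    return {
        cell_id: {name: regulon_activity(expr, targets, top_k) for name, targets in regulons.items()}
        for cell_id, expr in cells.items()
    }


def assign_by_activity(scores):
    """Label each cell with its most active regulon (a crude cell-type call)."""
    return {cell_id: max(row, key=row.get) for cell_id, row in scores.items()}
```

With toy regulons `regA = {"a1", "a2"}` and `regB = {"b1"}` and `top_k=3`, a cell whose three highest genes are tfA, a1 and a2 scores 1.0 for regA and 0.0 for regB, and is assigned to regA.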


2018 ◽  
Vol 34 (13) ◽  
pp. i79-i88 ◽  
Author(s):  
Maziyar Baran Pouyan ◽  
Dennis Kostka

2019 ◽  
Vol 101 (3) ◽  
pp. 716-730 ◽  
Author(s):  
Ryan J. Spurney ◽  
Lisa Van den Broeck ◽  
Natalie M. Clark ◽  
Adam P. Fisher ◽  
Maria A. de Luis Balaguer ◽  
...  

2021 ◽  
Author(s):  
Meichen Dong ◽  
Yiping He ◽  
Yuchao Jiang ◽  
Fei Zou

In contrast to differential gene expression analysis at the single-gene level, gene regulatory network (GRN) analysis depicts complex transcriptomic interactions among genes, giving a better understanding of the genetic architecture underlying human diseases and traits. Recently, single-cell RNA sequencing (scRNA-seq) data have started to be used to construct GRNs at a much finer resolution than bulk RNA-seq and microarray data allow. However, scRNA-seq data are inherently sparse, which hinders the direct application of the popular Gaussian graphical models (GGMs). Furthermore, most existing approaches for constructing GRNs from scRNA-seq data consider gene networks under only one condition. To better understand GRNs under different but related conditions at single-cell resolution, we propose to construct Joint Gene Networks with scRNA-seq data (JGNsc) within the GGM framework. To facilitate the use of GGMs, JGNsc first applies a hybrid imputation procedure that combines a Bayesian zero-inflated Poisson (ZIP) model with an iterative low-rank matrix completion step to efficiently impute zero-inflated counts resulting from technical artifacts. JGNsc then transforms the imputed data via a nonparanormal transformation, on which joint GGMs are constructed. We demonstrate JGNsc and assess its performance using synthetic data. The application of JGNsc to two cancer clinical studies of medulloblastoma and glioblastoma identifies novel findings in addition to confirming well-known biological results.
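Two ingredients named in the JGNsc abstract can be illustrated in a few lines of Python. This is not JGNsc's implementation: it computes, for fixed (not fitted) ZIP parameters, the posterior probability that an observed zero is an excess zero, and it applies a simple rank-based nonparanormal transform, Φ^{-1}(rank/(n+1)), to one gene's values. The published nonparanormal estimator uses a truncated (Winsorized) empirical CDF, and JGNsc fits its ZIP model in a Bayesian way together with low-rank matrix completion; both refinements are omitted here, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def zip_dropout_probability(pi, lam):
    """P(an observed zero is an excess zero) under a ZIP model with fixed parameters.

    pi: probability of an excess (e.g. technical) zero.
    lam: Poisson mean of the expression count.
    A zero arises either as an excess zero (prob pi) or as a Poisson zero
    (prob (1 - pi) * exp(-lam)); Bayes' rule gives the share due to the former.
    """
    poisson_zero = (1.0 - pi) * math.exp(-lam)
    return pi / (pi + poisson_zero)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def nonparanormal(values):
    """Map one gene's values to normal scores Phi^{-1}(rank / (n + 1))."""
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf(r / (n + 1)) for r in average_ranks(values)]
```

The transform keeps each gene's ordering but makes its marginal distribution Gaussian, which is what lets a GGM be fitted to skewed, count-derived data.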


2019 ◽  
Author(s):  
Qiao Wen Tan ◽  
Marek Mutwil

Prediction of gene function and gene regulatory networks is one of the most active topics in bioinformatics. The accumulation of publicly available gene expression data for hundreds of plant species, together with advances in bioinformatical methods and affordable computing, makes ingenuity the major bottleneck in understanding gene function and regulation. Here, we show how a credit card-sized computer retailing for less than 50 USD can be used to rapidly predict gene function and infer regulatory networks from RNA sequencing data. To achieve this, we constructed a bioinformatical pipeline that downloads RNA sequencing data, allows its quality control, and generates a gene co-expression network that can reveal enzymes and transcription factors participating in and controlling a given biosynthetic pathway. We exemplify this by first identifying genes and transcription factors involved in the biosynthesis of the secondary cell wall in the plant Artemisia annua, the main natural source of the anti-malarial drug artemisinin. The networks were then used to dissect the artemisinin biosynthesis pathway, suggesting potential transcription factors regulating artemisinin biosynthesis. We provide the source code of our pipeline and envision that the ubiquity of affordable computing, the availability of biological data and increased bioinformatical training of biologists will transform the field of bioinformatics.
Highlights:
- Processing of large-scale transcriptomic data with affordable single-board computers
- Transcription factors can be found in the same network as their targets
- Co-expression of transcription factors and genes in secondary cell wall biosynthesis
- Co-expression of transcription factors and genes involved in artemisinin biosynthesis
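The co-expression step of such a pipeline can be sketched in plain Python. The abstract does not state which similarity measure or cutoff the pipeline uses, so Pearson correlation with a hard |r| threshold is an assumption here, as are the function names and toy gene names. Genes linked to a known pathway gene (for example a biosynthetic enzyme) become candidate participants or regulators, the "guilt by association" reasoning behind such networks.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: correlation undefined, treat as unlinked
    return sxy / math.sqrt(sxx * syy)


def coexpression_network(expr, threshold):
    """Edges (gene_a, gene_b, r) for every gene pair with |r| >= threshold.

    expr: dict mapping gene name -> expression values across the same samples.
    threshold: hypothetical hard cutoff on |r|.
    """
    edges = []
    for a, b in combinations(sorted(expr), 2):
        r = pearson(expr[a], expr[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges


def neighbours(edges, gene):
    """Genes directly connected to `gene`: candidates for the same pathway."""
    return sorted({b if a == gene else a for a, b, _ in edges if gene in (a, b)})
```

For example, with `expr = {"enzyme1": [1, 2, 3, 4, 5], "tf1": [2, 4, 6, 8, 10], "other": [5, 1, 4, 2, 3]}` and a threshold of 0.8, only the enzyme1 to tf1 pair is linked (r = 1.0), while `other` correlates at r = -0.3 with both and is left out.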


Cell Reports ◽  
2020 ◽  
Vol 33 (10) ◽  
pp. 108472 ◽
Author(s):  
Zhaoning Wang ◽  
Miao Cui ◽  
Akansha M. Shah ◽  
Wei Tan ◽  
Ning Liu ◽  
...  
