MAGIC: A diffusion-based imputation method reveals gene-gene interactions in single-cell RNA-sequencing data

2017 ◽  
Author(s):  
David van Dijk ◽  
Juozas Nainys ◽  
Roshan Sharma ◽  
Pooja Kaithail ◽  
Ambrose J. Carr ◽  
...  

Abstract: Single-cell RNA-sequencing is fast becoming a major technology that is revolutionizing biological discovery in fields such as development, immunology and cancer. The ability to simultaneously measure thousands of genes at single cell resolution allows, among other prospects, for the possibility of learning gene regulatory networks at large scales. However, scRNA-seq technologies suffer from many sources of significant technical noise, the most prominent of which is ‘dropout’ due to inefficient mRNA capture. This results in data that has a high degree of sparsity, with typically only ~10% non-zero values. To address this, we developed MAGIC (Markov Affinity-based Graph Imputation of Cells), a method for imputing missing values, and restoring the structure of the data. After MAGIC, we find that two- and three-dimensional gene interactions are restored and that MAGIC is able to impute complex and non-linear shapes of interactions. MAGIC also retains cluster structure, enhances cluster-specific gene interactions and restores trajectories, as demonstrated in mouse retinal bipolar cells, hematopoiesis, and our newly generated epithelial-to-mesenchymal transition dataset.
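The abstract describes MAGIC only at a high level. As a rough illustration of the general idea behind Markov affinity-based diffusion, and not the authors' implementation, the sketch below builds a Gaussian affinity between cells, row-normalizes it into a Markov transition matrix, raises that matrix to a power `t`, and uses it to average each cell's expression over similar cells. The function name, the fixed-bandwidth kernel, and the `sigma` and `t` defaults are assumptions made for illustration.

```python
import numpy as np

def diffusion_impute(X, sigma=1.0, t=3):
    """Smooth a cells x genes matrix by diffusing values over a cell-cell graph.

    Illustrative sketch only: a fixed-bandwidth Gaussian kernel is assumed here.
    """
    X = np.asarray(X, dtype=float)
    # Squared Euclidean distances between all pairs of cells.
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives caused by rounding
    # Gaussian affinity, then row-normalize so each row sums to 1 (Markov matrix).
    affinity = np.exp(-d2 / (2.0 * sigma ** 2))
    markov = affinity / affinity.sum(axis=1, keepdims=True)
    # t diffusion steps: each cell becomes a weighted average of its neighbors.
    diffused = np.linalg.matrix_power(markov, t)
    return diffused @ X
```

Because every row of the powered Markov matrix sums to 1, each imputed value is a weighted average of observed values for that gene, so a zero in one cell is pulled toward the values seen in similar cells.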

2021 ◽  
Vol 118 (19) ◽  
pp. e2102050118
Author(s):  
Abhijeet P. Deshmukh ◽  
Suhas V. Vasaikar ◽  
Katarzyna Tomczak ◽  
Shubham Tripathi ◽  
Petra den Hollander ◽  
...  

The epithelial-to-mesenchymal transition (EMT) plays a critical role during normal development and in cancer progression. EMT is induced by various signaling pathways, including TGF-β, BMP, Wnt–β-catenin, NOTCH, Shh, and receptor tyrosine kinases. In this study, we performed single-cell RNA sequencing on MCF10A cells undergoing EMT by TGF-β1 stimulation. Our comprehensive analysis revealed that cells progress through EMT at different paces. Using pseudotime clustering reconstruction of gene-expression profiles during EMT, we found sequential and parallel activation of EMT signaling pathways. We also observed various transitional cellular states during EMT. We identified regulatory signaling nodes that drive EMT with the expression of important microRNAs and transcription factors. Using a random circuit perturbation methodology, we demonstrate that the NOTCH signaling pathway acts as a key driver of TGF-β–induced EMT. Furthermore, we demonstrate that the gene signatures of pseudotime clusters corresponding to the intermediate hybrid EMT state are associated with poor patient outcome. Overall, this study provides insight into context-specific drivers of cancer progression and highlights the complexities of the EMT process.


2019 ◽  
Vol 28 (21) ◽  
pp. 3569-3583 ◽  
Author(s):  
Patricia M Schnepp ◽  
Mengjie Chen ◽  
Evan T Keller ◽  
Xiang Zhou

Abstract: Integrating single-cell RNA sequencing (scRNA-seq) data with genotypes obtained from DNA sequencing studies facilitates the detection of functional genetic variants underlying cell type-specific gene expression variation. Unfortunately, most existing scRNA-seq studies do not come with DNA sequencing data; thus, being able to call single nucleotide variants (SNVs) from scRNA-seq data alone can provide crucial and complementary information for the detection of functional SNVs, maximizing the potential of existing scRNA-seq studies. Here, we perform extensive analyses to evaluate the utility of two SNV calling pipelines (GATK and Monovar), originally designed for SNV calling in either bulk or single-cell DNA sequencing data. In both pipelines, we examined various parameter settings to determine the accuracy of the final SNV call set and provide practical recommendations for applied analysts. We found that combining all reads from the single cells and following GATK Best Practices resulted in the highest number of SNVs identified with a high concordance. In individual single cells, Monovar resulted in better quality SNVs even though none of the pipelines analyzed is capable of calling a reasonable number of SNVs with high accuracy. In addition, we found that SNV calling quality varies across different functional genomic regions. Our results open doors for novel ways to leverage the use of scRNA-seq for the future investigation of SNV function.


2018 ◽  
Author(s):  
Wenhao Tang ◽  
François Bertaux ◽  
Philipp Thomas ◽  
Claire Stefanelli ◽  
Malika Saint ◽  
...  

Normalisation of single cell RNA sequencing (scRNA-seq) data is a prerequisite to their interpretation. The marked technical variability and high amounts of missing observations typical of scRNA-seq datasets make this task particularly challenging. Here, we introduce bayNorm, a novel Bayesian approach for scaling and inference of scRNA-seq counts. The method’s likelihood function follows a binomial model of mRNA capture, while priors are estimated from expression values across cells using an empirical Bayes approach. We demonstrate using publicly-available scRNA-seq datasets and simulated expression data that bayNorm allows robust imputation of missing values generating realistic transcript distributions that match single molecule FISH measurements. Moreover, by using priors informed by dataset structures, bayNorm improves accuracy and sensitivity of differential expression analysis and reduces batch effect compared to other existing methods. Altogether, bayNorm provides an efficient, integrated solution for global scaling normalisation, imputation and true count recovery of gene expression measurements from scRNA-seq data.
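To make the binomial capture model concrete, here is a minimal sketch, not bayNorm's actual code, of how a posterior over the true transcript count could be computed from one observed count. It assumes each original molecule is captured independently with probability `capture_eff`, and it uses a Poisson prior purely as a hypothetical stand-in; the abstract states only that priors are estimated across cells by empirical Bayes. All function and parameter names are illustrative.

```python
from math import comb, exp, lgamma, log

def poisson_pmf(n, mean):
    """Poisson probability of n, computed in log space for stability."""
    return exp(-mean + n * log(mean) - lgamma(n + 1))

def true_count_posterior(observed, capture_eff, prior_pmf, max_count=100):
    """Posterior P(true count = n | observed count) for n = 0..max_count.

    Likelihood: observed ~ Binomial(n, capture_eff).
    prior_pmf: callable giving the prior probability of a true count n
    (a hypothetical placeholder for an empirically estimated prior).
    """
    weights = []
    for n in range(max_count + 1):
        if n < observed:
            # Cannot observe more molecules than were present.
            weights.append(0.0)
            continue
        likelihood = (comb(n, observed)
                      * capture_eff ** observed
                      * (1.0 - capture_eff) ** (n - observed))
        weights.append(likelihood * prior_pmf(n))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("posterior has no mass; increase max_count")
    return [w / total for w in weights]
```

For example, `true_count_posterior(2, 0.1, lambda n: poisson_pmf(n, 10.0))` returns a distribution whose mean can serve as an imputed true count; `max_count` truncates the support and should be set well above any plausible count.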


2019 ◽  
Author(s):  
Ana Carolina Leote ◽  
Xiaohui Wu ◽  
Andreas Beyer

Abstract: Single-cell RNA sequencing (scRNA-seq) methods are typically unable to quantify the expression levels of all genes in a cell, creating a need for the computational prediction of missing values (‘dropout imputation’). Most existing dropout imputation methods are limited in the sense that they exclusively use the scRNA-seq dataset at hand and do not exploit external gene-gene relationship information. Here, we show that a transcriptional regulatory network learned from external, independent gene expression data improves dropout imputation. Using a variety of human scRNA-seq datasets we demonstrate that our network-based approach outperforms published state-of-the-art methods. The network-based approach performs particularly well for lowly expressed genes, including cell-type-specific transcriptional regulators. Additionally, we tested a baseline approach, where we imputed missing values using the sample-wide average expression of a gene. Unexpectedly, up to 48% of the genes were better predicted using this baseline approach, suggesting negligible cell-to-cell variation of expression levels for many genes. Our work shows that there is no single best imputation method; rather, the best method depends on gene-specific features, such as expression level and expression variation across cells. We thus implemented an R-package called ADImpute (available from https://github.com/anacarolinaleote/ADImpute) that automatically determines the best imputation method for each gene in a dataset.
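The baseline mentioned in the abstract, imputing with a gene's sample-wide average, can be sketched in a few lines. This is an illustration and not ADImpute's code: it assumes that zeros mark missing values and that the average is taken over all cells, zeros included; the abstract specifies neither choice.

```python
import numpy as np

def impute_gene_mean(X):
    """Replace zeros in a cells x genes matrix with each gene's mean over all cells.

    Assumption for illustration: zeros are treated as missing, and the mean
    includes the zero entries.
    """
    X = np.asarray(X, dtype=float)
    gene_mean = X.mean(axis=0)              # one average per gene (column)
    return np.where(X == 0, gene_mean, X)   # fill only the zero entries
```

Since every cell receives the same fill value for a given gene, this baseline can only do well for genes whose expression varies little from cell to cell, which matches the interpretation offered in the abstract.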


Author(s):  
Rui-Qi Wang ◽  
Wei Zhao ◽  
Hai-Kui Yang ◽  
Jia-Mei Dong ◽  
Wei-Jie Lin ◽  
...  

Colorectal cancer (CRC) manifests as gastrointestinal tumors with high intratumoral heterogeneity. Recent studies have demonstrated that CRC may consist of tumor cells with different consensus molecular subtypes (CMS). The advancements in single-cell RNA sequencing have facilitated the development of gene regulatory networks to decode key regulators for specific cell types. Herein, we comprehensively analyzed the CMS of CRC patients by using single-cell RNA-sequencing data. CMS for all malignant cells were assigned using CMScaller. Gene set variation analysis showed pathway activity differences consistent with those reported in previous studies. Cell–cell communication analysis confirmed that CMS1 was more closely related to immune cells, and that monocytes and macrophages play dominant roles in the CRC tumor microenvironment. On the basis of the constructed gene regulation networks (GRNs) for each subtype, we identified that the critical transcription factor ERG is universally activated and upregulated in all CMS in comparison with normal cells, and that it performed diverse roles by regulating the expression of different downstream genes. In summary, molecular subtyping of single-cell RNA-sequencing data for colorectal cancer could elucidate the heterogeneity in gene regulatory networks and identify critical regulators of CRC.


2021 ◽  
Author(s):  
Lukas J Vlahos ◽  
Aleksandar Obradovic ◽  
Pasquale Laise ◽  
Jeremy Worley ◽  
Xiangtian Tan ◽  
...  

While single-cell RNA sequencing provides a new window on physiologic and pathologic tissue biology and heterogeneity, it suffers from low signal-to-noise ratio and a high dropout rate at the individual gene level, thus challenging quantitative analyses. To address this problem, we introduce PISCES (Protein-activity Inference for Single Cell Studies), an integrated analytical framework for the protein activity-based analysis of single cell subpopulations. PISCES leverages the assembly of lineage-specific gene regulatory networks to accurately measure the activity of each protein based on the expression of its transcriptional targets (regulon), using the ARACNe and metaVIPER algorithms, respectively. It implements novel analytical and visualization functions, including activity-based cluster analysis, identification of cell state repertoires, and elucidation of master regulators of cell state and cell state transitions, with full interoperability with Seurat's single-cell data format. Accuracy and reproducibility assessment, via technical and biological validation assays and by assessing concordance with antibody and CITE-Seq-based measurements, shows dramatic improvement in the ability to identify rare subpopulations and to assess activity of key lineage markers, compared to gene expression analysis.


2021 ◽  
Author(s):  
Meichen Dong ◽  
Yiping He ◽  
Yuchao Jiang ◽  
Fei Zou

In contrast to differential gene expression analysis at the single-gene level, gene regulatory network (GRN) analysis depicts complex transcriptomic interactions among genes for a better understanding of the underlying genetic architectures of human diseases and traits. Recently, single-cell RNA sequencing (scRNA-seq) data have started to be used for constructing GRNs at a much finer resolution than bulk RNA-seq data and microarray data. However, scRNA-seq data are inherently sparse, which hinders the direct application of the popular Gaussian graphical models (GGMs). Furthermore, most existing approaches for constructing GRNs with scRNA-seq data only consider gene networks under one condition. To better understand GRNs under different but related conditions with single-cell resolution, we propose to construct Joint Gene Networks with scRNA-seq data (JGNsc) using the GGMs framework. To facilitate the use of GGMs, JGNsc first proposes a hybrid imputation procedure that combines a Bayesian zero-inflated Poisson (ZIP) model with an iterative low-rank matrix completion step to efficiently impute zero-inflated counts resulting from technical artifacts. JGNsc then transforms the imputed data via a nonparanormal transformation, based on which joint GGMs are constructed. We demonstrate JGNsc and assess its performance using synthetic data. The application of JGNsc on two cancer clinical studies of medulloblastoma and glioblastoma identifies novel findings in addition to confirming well-known biological results.
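The nonparanormal step can be pictured as mapping each gene's values through its empirical distribution and then through the inverse standard normal CDF, so that the marginals look Gaussian before a GGM is fitted. The sketch below is a simplified rank-based variant for a single gene, without the truncation used in some formulations; it is not taken from JGNsc, and the function name is hypothetical.

```python
from statistics import NormalDist

def nonparanormal_scores(values):
    """Map one gene's values to normal scores via ranks.

    Each value x is replaced by Phi^{-1}(rank(x) / (n + 1)); tied values
    share their average rank. Simplified illustration only.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf(r / (n + 1)) for r in ranks]
```

Dividing by `n + 1` rather than `n` keeps the largest rank strictly below 1, so the inverse CDF never returns infinity. The transform is monotone, so it preserves the ordering of cells for each gene.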


2021 ◽  
Author(s):  
Boris M. Brenerman ◽  
Benjamin D. Shapiro ◽  
Michael C. Schatz ◽  
Alexis Battle

Abstract: Single-cell RNA sequencing data contain patterns of correlation that are poorly captured by techniques that rely on linear estimation or assumptions of Gaussian behavior. We apply random forest regression to scRNAseq data from mouse brains, which identifies the co-regulation of genes within specific cellular contexts. By analyzing the estimators of the random forest, we identify several novel candidate gene regulatory networks and compare these networks in aged and young mice. We demonstrate that cell populations have cell-type specific phenotypes of aging that are not detected by other methods, including the collapse of differentiating oligodendrocytes but not precursors or mature oligodendrocytes.


2020 ◽  
Author(s):  
Harsh Shrivastava ◽  
Xiuwei Zhang ◽  
Srinivas Aluru ◽  
Le Song

Abstract
Motivation: Gene regulatory networks (GRNs) are graphs that specify the interactions between transcription factors (TFs) and their target genes. Understanding these interactions is crucial for studying the mechanisms in cell differentiation, growth and development. Computational methods are needed to infer these networks from measured data. Although the availability of single cell RNA-Sequencing (scRNA-Seq) data provides unprecedented scale and resolution of gene-expression data, the inference of GRNs remains a challenge, mainly due to the complexity of the regulatory relationships and the noise in the data.
Results: We propose GRNUlar, a novel deep learning architecture based on the unrolled algorithms idea for GRN inference from scRNA-Seq data. Like some existing methods which use prior information of which genes are TFs, GRNUlar also incorporates this TF information using a sparse multi-task deep learning architecture. We also demonstrate the application of a recently developed unrolled architecture GLAD to recover undirected GRNs in the absence of TF information. These unrolled architectures require supervision to train, for which we leverage the existing synthetic data simulators which generate scRNA-Seq data guided by a GRN. We show that unrolled algorithms outperform the state-of-the-art methods on synthetic data as well as real datasets in both the settings of TF information being absent or available.
Availability: GitHub link to GRNUlar - https://github.com/Harshs27/[email protected]

