Bulk Tissue Cell Type Deconvolution with Multi-Subject Single-Cell Expression Reference

Abstract: We present MuSiC, a method that uses cell-type-specific gene expression from single-cell RNA sequencing (RNA-seq) data to characterize cell type compositions from bulk RNA-seq data in complex tissues. When applied to pancreatic islet and whole kidney expression data from humans, mice, and rats, MuSiC outperformed existing methods, especially for tissues with closely related cell types. MuSiC enables characterization of the cellular heterogeneity of complex tissues for identification of disease mechanisms.
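
The deconvolution task described in this abstract can be illustrated with a minimal sketch. It assumes the simplest possible formulation: bulk expression is a proportion-weighted mix of per-cell-type signature profiles, and the proportions are recovered by plain non-negative least squares solved with projected gradient descent. This is a generic baseline, not MuSiC's estimator; the function name `deconvolve` and the toy signature matrix are hypothetical.

```python
def deconvolve(signature, bulk, iters=2000):
    """Estimate non-negative cell-type proportions p with signature * p ~ bulk.

    signature: list of gene rows, each a list of mean expression per cell type.
    bulk: list of bulk expression values, one per gene.
    """
    n_types = len(signature[0])
    # Safe step size: the squared Frobenius norm bounds the gradient's
    # Lipschitz constant, so 1 / that value cannot overshoot.
    step = 1.0 / sum(v * v for row in signature for v in row)
    p = [1.0 / n_types] * n_types
    for _ in range(iters):
        # Residual of the current fit, one value per gene.
        resid = [sum(s * q for s, q in zip(row, p)) - b
                 for row, b in zip(signature, bulk)]
        # Gradient of 0.5 * ||signature p - bulk||^2 with respect to p.
        grad = [sum(row[k] * r for row, r in zip(signature, resid))
                for k in range(n_types)]
        # Gradient step, then projection onto the non-negative orthant.
        p = [max(0.0, q - step * g) for q, g in zip(p, grad)]
    total = sum(p)
    return [q / total for q in p]  # rescale so the proportions sum to 1


# Hypothetical signature matrix: 4 genes (rows) x 3 cell types (columns).
signature = [[10.0, 1.0, 0.0],
             [1.0, 8.0, 1.0],
             [0.0, 2.0, 9.0],
             [5.0, 5.0, 5.0]]
true_props = [0.5, 0.3, 0.2]
# Noise-free bulk profile built as the proportion-weighted mix of signatures.
bulk = [sum(s * q for s, q in zip(row, true_props)) for row in signature]
estimated = deconvolve(signature, bulk)
```

With noise-free input the estimate matches the mixing proportions used to build `bulk`. Real data would add noise and cross-subject variability, which this sketch does not model.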

Single-cell network biology for resolving cellular heterogeneity in human diseases

Experimental & Molecular Medicine ◽

10.1038/s12276-020-00528-0 ◽

2020 ◽

Vol 52 (11) ◽

pp. 1798-1808

Author(s):

Junha Cha ◽

Insuk Lee

Keyword(s):

Single Cell ◽

Gene Networks ◽

Cell Types ◽

Network Biology ◽

Cellular Heterogeneity ◽

Specific Gene ◽

Transcriptome Data ◽

Cell Type ◽

Cell Network ◽

Cell Type Specific

Abstract: Understanding cellular heterogeneity is the holy grail of biology and medicine. Cells harboring identical genomes show a wide variety of behaviors in multicellular organisms. Genetic circuits underlying cell-type identities will facilitate the understanding of the regulatory programs for differentiation and maintenance of distinct cellular states. Such a cell-type-specific gene network can be inferred from coregulatory patterns across individual cells. Conventional methods of transcriptome profiling using tissue samples provide only average signals of diverse cell types. Therefore, reconstructing gene regulatory networks for a particular cell type is not feasible with tissue-based transcriptome data. Recently, single-cell omics technology has emerged and enabled the capture of the transcriptomic landscape of every individual cell. Although single-cell gene expression studies have already opened up new avenues, network biology using single-cell transcriptome data will further accelerate our understanding of cellular heterogeneity. In this review, we provide an overview of single-cell network biology and summarize recent progress in method development for network inference from single-cell RNA sequencing (scRNA-seq) data. Then, we describe how cell-type-specific gene networks can be utilized to study regulatory programs specific to disease-associated cell types and cellular states. Moreover, with scRNA-seq data, modeling personal or patient-specific gene networks is feasible. Therefore, we also introduce potential applications of single-cell network biology for precision medicine. We envision a rapid paradigm shift toward single-cell network analysis for systems biology in the near future.

A Unified Statistical Framework for Single Cell and Bulk Sequencing Data

10.1101/206532 ◽

2017 ◽

Cited By ~ 1

Author(s):

Lingxue Zhu ◽

Jing Lei ◽

Bernie Devlin ◽

Kathryn Roeder

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cell Types ◽

Accurate Estimation ◽

Specific Gene ◽

Rna Seq ◽

Cell Type ◽

Cell Type Specific ◽

Different Cell Types ◽

Cell Data

Recent advances in technology have enabled the measurement of RNA levels for individual cells. Compared to traditional tissue-level bulk RNA-seq data, single cell sequencing yields valuable insights about gene expression profiles for different cell types, which is potentially critical for understanding many complex human diseases. However, developing quantitative tools for such data remains challenging because of high levels of technical noise, especially the “dropout” events. A “dropout” happens when the RNA for a gene fails to be amplified prior to sequencing, producing a “false” zero in the observed data. In this paper, we propose a Unified RNA-Sequencing Model (URSM) for both single cell and bulk RNA-seq data, formulated as a hierarchical model. URSM borrows strength from both data sources and carefully models the dropouts in single cell data, leading to a more accurate estimation of cell type specific gene expression profiles. In addition, URSM naturally provides inference on the dropout entries in single cell data that need to be imputed for downstream analyses, as well as the mixing proportions of different cell types in bulk samples. We adopt an empirical Bayes approach, where parameters are estimated using the EM algorithm and approximate inference is obtained by Gibbs sampling. Simulation results illustrate that URSM outperforms existing approaches both in correcting for dropouts in single cell data and in deconvolving bulk samples. We also demonstrate an application to gene expression data on fetal brains, where our model successfully imputes the dropout genes and reveals cell type specific expression patterns.
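
The notion of a "false" zero can be made concrete with a small sketch. It assumes a generic zero-inflated Poisson model in which a dropout occurs with a fixed probability independent of the true count. This is a stand-in chosen for illustration, not URSM's hierarchical model; both function names are hypothetical.

```python
import math


def dropout_posterior(rate, dropout_prob):
    """Probability that an observed zero is a dropout rather than a true zero.

    rate: assumed Poisson mean of the true count for this gene.
    dropout_prob: assumed prior probability that a measurement drops out.
    """
    # A zero arises either from a dropout or from a genuine Poisson zero.
    true_zero = (1.0 - dropout_prob) * math.exp(-rate)
    return dropout_prob / (dropout_prob + true_zero)


def impute_zero(rate, dropout_prob):
    """Expected true count behind an observed zero under the same model."""
    # If the zero is a dropout the hidden count has mean `rate`;
    # if it is a genuine zero the count is 0.
    return dropout_posterior(rate, dropout_prob) * rate


# A highly expressed gene: an observed zero is almost surely a dropout.
high = dropout_posterior(50.0, 0.3)
# A silent gene: the zero carries no extra evidence, so the prior is returned.
silent = dropout_posterior(0.0, 0.3)
```

The higher the expected expression, the more an observed zero points to a dropout, which is why borrowing rate information from another data source can help imputation.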

Comprehensive analysis of single cell ATAC-seq data with SnapATAC

Nature Communications ◽

10.1038/s41467-021-21583-9 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Rongxin Fang ◽

Sebastian Preissl ◽

Yang Li ◽

Xiaomeng Hou ◽

Jacinta Lucero ◽

...

Keyword(s):

Single Cell ◽

Single Cell Analysis ◽

Expression Patterns ◽

Regulatory Elements ◽

Cellular Heterogeneity ◽

Specific Gene ◽

Open Chromatin ◽

Cell Type ◽

Process Data ◽

Cell Type Specific

Abstract: Identification of the cis-regulatory elements controlling cell-type specific gene expression patterns is essential for understanding the origin of cellular diversity. Conventional assays to map regulatory elements via open chromatin analysis of primary tissues are hindered by sample heterogeneity. Single cell analysis of accessible chromatin (scATAC-seq) can overcome this limitation. However, the high-level noise of each single cell profile and the large volume of data pose unique computational challenges. Here, we introduce SnapATAC, a software package for analyzing scATAC-seq datasets. SnapATAC dissects cellular heterogeneity in an unbiased manner and maps the trajectories of cellular states. Using the Nyström method, SnapATAC can process data from up to a million cells. Furthermore, SnapATAC incorporates existing tools into a comprehensive package for analyzing single cell ATAC-seq datasets. As a demonstration of its utility, SnapATAC is applied to 55,592 single-nucleus ATAC-seq profiles from the mouse secondary motor cortex. The analysis reveals ~370,000 candidate regulatory elements in 31 distinct cell populations in this brain region and infers candidate cell-type specific transcriptional regulators.

JIND: Joint Integration and Discrimination for Automated Single-Cell Annotation

10.1101/2020.10.06.327601 ◽

2020 ◽

Author(s):

Mohit Goyal ◽

Guillermo Serrano ◽

Ilan Shomorony ◽

Mikel Hernaez ◽

Idoia Ochoa

Keyword(s):

Single Cell ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Rna Seq ◽

Batch Effects ◽

Cell Type ◽

Latent Space ◽

Cell Type Specific ◽

Low Dimensional

Abstract: Single-cell RNA-seq is a powerful tool in the study of the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell-types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation have been proposed based on supervised learning. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell-types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell-types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profile of unseen cells is mapped onto the previously learned latent space, hence avoiding the need to retrain the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable, being more than five or six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.
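
The idea of rejecting cells with cell-type-specific confidence thresholds can be sketched as follows. The sketch assumes a classifier has already produced per-cell class probabilities and that the thresholds are given rather than learned. The function name, the threshold values, and the "Unassigned" label are hypothetical and do not reflect the JIND code base.

```python
def classify_with_rejection(probabilities, thresholds, reject_label="Unassigned"):
    """Return the most probable cell type, or reject_label if it is not confident.

    probabilities: dict mapping cell type -> predicted probability for one cell.
    thresholds: dict mapping cell type -> minimum probability needed to accept.
    """
    best = max(probabilities, key=probabilities.get)
    # Each cell type has its own bar: harder-to-separate types can demand more.
    if probabilities[best] >= thresholds[best]:
        return best
    return reject_label


# Hypothetical thresholds: T cells require higher confidence than B cells.
thresholds = {"T cell": 0.9, "B cell": 0.6}
rejected = classify_with_rejection({"T cell": 0.8, "B cell": 0.2}, thresholds)
accepted = classify_with_rejection({"T cell": 0.35, "B cell": 0.65}, thresholds)
```

Here the first cell is rejected even though its top probability (0.8) is higher than that of the second cell (0.65), because the bar depends on the predicted type.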

CellO: Comprehensive and hierarchical cell type classification of human cells with the Cell Ontology

10.1101/634097 ◽

2019 ◽

Cited By ~ 1

Author(s):

Matthew N. Bernstein ◽

Zhongjie Ma ◽

Michael Gleicher ◽

Colin N. Dewey

Keyword(s):

Single Cell ◽

Web Application ◽

Cell Types ◽

Rna Seq ◽

Cell Type ◽

Training Set ◽

Sequence Read Archive ◽

Cell Ontology ◽

Cell Type Specific ◽

Type Classification

Summary: Cell type annotation is a fundamental task in the analysis of single-cell RNA-sequencing data. In this work, we present CellO, a machine learning-based tool for annotating human RNA-seq data with the Cell Ontology. CellO enables accurate and standardized cell type classification by considering the rich hierarchical structure of known cell types, a source of prior knowledge that is not utilized by existing methods. Furthermore, CellO comes pre-trained on a novel, comprehensive dataset of human, healthy, untreated primary samples in the Sequence Read Archive, which, to the best of our knowledge, is the most diverse curated collection of primary cell data to date. CellO’s comprehensive training set enables it to run out-of-the-box on diverse cell types and achieve superior or competitive performance when compared to existing state-of-the-art methods. Lastly, CellO’s linear models are easily interpreted, thereby enabling exploration of cell type-specific expression signatures across the ontology. To this end, we also present the CellO Viewer: a web application for exploring CellO’s models across the ontology.

Highlights: We present CellO, a tool for hierarchically classifying cell type from single-cell RNA-seq data against the graph-structured Cell Ontology. CellO is pre-trained on a comprehensive dataset comprising nearly all bulk RNA-seq primary cell samples in the Sequence Read Archive. CellO achieves superior or comparable performance with existing methods while featuring a more comprehensive pre-packaged training set. CellO is built with easily interpretable models which we expose through a novel web application, the CellO Viewer, for exploring cell type-specific signatures across the Cell Ontology.
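
Classifying against a graph-structured ontology implies that a specific label also entails all of its more general ancestors. The sketch below shows only that propagation step on a toy ontology fragment. The `parents` mapping is a hand-written illustration, not the real Cell Ontology, and the function is a hypothetical helper rather than part of CellO.

```python
def ancestors(term, parents):
    """Return the term together with all of its ancestors in a DAG.

    parents: dict mapping each term to a list of its direct parent terms.
    """
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            # Follow every parent; a term may have more than one.
            stack.extend(parents.get(node, []))
    return seen


# Toy ontology fragment, written by hand for illustration only.
parents = {
    "CD4-positive T cell": ["T cell"],
    "T cell": ["lymphocyte"],
    "B cell": ["lymphocyte"],
    "lymphocyte": ["leukocyte"],
    "leukocyte": [],
}
labels = ancestors("CD4-positive T cell", parents)
```

A hierarchical classifier can use such ancestor sets to keep its output consistent, so that a cell labeled as a specific subtype is never denied its broader types.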

DeepDRIM: a deep neural network to reconstruct cell-type-specific gene regulatory network using single-cell RNA-Seq Data

10.1101/2021.02.03.429484 ◽

2021 ◽

Author(s):

Jiaxing Chen ◽

Chinwang Cheong ◽

Liang Lan ◽

Xin Zhou ◽

Jiming Liu ◽

...

Keyword(s):

Neural Network ◽

Single Cell ◽

Regulatory Networks ◽

Deep Neural Network ◽

Neighborhood Context ◽

Cellular Heterogeneity ◽

Specific Gene ◽

Rna Seq ◽

Cell Type Specific ◽

Gene Regulatory

Abstract: Single-cell RNA sequencing is used to capture cell-specific gene expression, thus allowing reconstruction of gene regulatory networks. The existing algorithms struggle to deal with dropouts and cellular heterogeneity, and commonly require pseudotime-ordered cells. Here, we describe DeepDRIM, a supervised deep neural network that represents gene pair joint expression as images and considers the neighborhood context to eliminate the transitive interactions. DeepDRIM yields significantly better performance than the other nine algorithms used on the eight cell lines tested, and can be used to successfully discriminate key functional modules between patients with mild and severe symptoms of coronavirus disease 2019 (COVID-19).
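
Representing the joint expression of a gene pair as an image can be sketched as a two-dimensional histogram over cells. The binning scheme below (equal-width bins between each gene's minimum and maximum) is an assumption made for illustration. The abstract does not specify the normalization or image size used, and the function name is hypothetical.

```python
def joint_expression_image(gene_a, gene_b, bins=4):
    """Build a bins x bins count matrix from paired expression values.

    gene_a, gene_b: expression of two genes across the same cells.
    Cell i contributes one count to the pixel indexed by its two values.
    """
    def bin_index(value, low, high):
        if high == low:
            return 0  # constant gene: everything falls in the first bin
        idx = int((value - low) / (high - low) * bins)
        return min(idx, bins - 1)  # the maximum value goes in the last bin

    lo_a, hi_a = min(gene_a), max(gene_a)
    lo_b, hi_b = min(gene_b), max(gene_b)
    image = [[0] * bins for _ in range(bins)]
    for a, b in zip(gene_a, gene_b):
        image[bin_index(a, lo_a, hi_a)][bin_index(b, lo_b, hi_b)] += 1
    return image


# Two perfectly co-expressed genes across four cells fill the diagonal.
image = joint_expression_image([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0])
```

A convolutional network can then treat such matrices as input images, with diagonal structure hinting at co-expression.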

A cis-regulatory atlas in maize at single-cell resolution

10.1101/2020.09.27.315499 ◽

2020 ◽

Author(s):

Alexandre P. Marand ◽

Zongliang Chen ◽

Andrea Gallavotti ◽

Robert J. Schmitz

Keyword(s):

Single Cell ◽

Cell Types ◽

Regulatory Elements ◽

Cellular Heterogeneity ◽

Cell Type ◽

Crop Species ◽

Cell Functions ◽

Cell Type Specific ◽

Type Specification ◽

Accessible Chromatin

Abstract: Cis-regulatory elements (CREs) encode the genomic blueprints for coordinating spatiotemporal gene expression programs underlying highly specialized cell functions. To identify CREs underlying cell-type specification and developmental transitions, we implemented single-cell sequencing of Assay for Transposase Accessible Chromatin in an atlas of Zea mays organs. We describe 92 distinct states of chromatin accessibility across more than 165,913 putative CREs, 56,575 cells, and 52 known cell-types in maize using a novel implementation of regularized quasibinomial logistic regression. Cell states were largely determined by combinatorial accessibility of transcription factors (TFs) and their binding sites. A neural network revealed that cell identity could be accurately predicted (>0.94) solely based on TF binding site accessibility. Co-accessible chromatin recapitulated higher-order chromatin interactions, with distinct sets of TFs coordinating cell type-specific regulatory dynamics. Pseudotime reconstruction and alignment with Arabidopsis thaliana trajectories identified conserved TFs, associated motifs, and cis-regulatory regions specifying sequential developmental progressions. Cell-type specific accessible chromatin regions were enriched with phenotype-associated genetic variants and signatures of selection, revealing the major cell-types and putative CREs targeted by modern maize breeding. Collectively, our analysis affords a comprehensive framework for understanding cellular heterogeneity, evolution, and cis-regulatory grammar of cell-type specification in a major crop species.

Unsupervised cell functional annotation for single-cell RNA-Seq

10.1101/2021.11.20.469410 ◽

2021 ◽

Author(s):

Dongshunyi Li ◽

Jun Ding ◽

Ziv Bar-Joseph

Keyword(s):

Single Cell ◽

Dimensional Space ◽

Cell Types ◽

Specific Gene ◽

Rna Seq ◽

Cell Type ◽

Sequencing Data ◽

Gene Sets ◽

Supervised Methods ◽

Low Dimensional

One of the first steps in the analysis of single cell RNA-Sequencing data (scRNA-Seq) is the assignment of cell types. While a number of supervised methods have been developed for this, in most cases such assignment is performed by first clustering cells in low-dimensional space and then assigning cell types to different clusters. To overcome noise and to improve cell type assignments, we developed UNIFAN, a neural network method that simultaneously clusters and annotates cells using known gene sets. UNIFAN combines a low-dimensional representation of all genes with cell-specific gene set activity scores to determine the clustering. We applied UNIFAN to human and mouse scRNA-Seq datasets from several different organs. As we show, by using knowledge of gene sets, UNIFAN greatly outperforms prior methods developed for clustering scRNA-Seq data. The gene sets assigned by UNIFAN to different clusters provide strong evidence for the cell type represented by each cluster, making annotations easier.
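
A cell-specific gene set activity score can be illustrated with a deliberately simple stand-in: the mean expression of a set's member genes within one cell. This averaging is an assumption made for the sketch and is not how UNIFAN derives its scores. The function name and the two small marker sets are illustrative.

```python
def gene_set_scores(expression, gene_sets):
    """Score each gene set in one cell as the mean expression of its genes.

    expression: dict mapping gene symbol -> expression value in this cell.
    gene_sets: dict mapping set name -> list of member gene symbols.
    Genes missing from the cell's profile are ignored.
    """
    scores = {}
    for name, genes in gene_sets.items():
        values = [expression[g] for g in genes if g in expression]
        scores[name] = sum(values) / len(values) if values else 0.0
    return scores


# Illustrative marker sets and one hypothetical cell profile.
gene_sets = {
    "T cell markers": ["CD3E", "CD3D"],
    "B cell markers": ["MS4A1", "CD79A"],
}
cell = {"CD3E": 4.0, "CD3D": 2.0, "MS4A1": 0.0, "CD79A": 1.0}
scores = gene_set_scores(cell, gene_sets)
```

Appending such scores to a low-dimensional embedding gives a clustering step access to prior biological knowledge, and the top-scoring sets per cluster suggest an annotation.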

Single-cell RNA-seq reveals the transcriptional landscape in ischemic stroke

Journal of Cerebral Blood Flow & Metabolism ◽

10.1177/0271678x211026770 ◽

2021 ◽

pp. 0271678X2110267

Author(s):

Kai Zheng ◽

Lingmin Lin ◽

Wei Jiang ◽

Lin Chen ◽

Xiyue Zhang ◽

...

Keyword(s):

Ischemic Stroke ◽

Single Cell ◽

Expression Patterns ◽

Cellular Heterogeneity ◽

Ependymal Cells ◽

Specific Gene ◽

Specific Cell ◽

Cell Type ◽

Cns Inflammation ◽

Cell Type Specific

Ischemic stroke (IS) is a detrimental neurological disease with limited treatment options. It has been challenging to define the roles of brain cell subsets in IS onset and progression due to cellular heterogeneity in the CNS. Here, we employed single-cell RNA sequencing (scRNA-seq) to comprehensively map the cell populations in the mouse model of MCAO (middle cerebral artery occlusion). We identified 17 principal brain clusters with cell-type specific gene expression patterns as well as specific cell subpopulations and their functions in various pathways. The CNS inflammation triggered upregulation of key cell type-specific genes not previously reported. Notably, microglia displayed diverse differentiation states after stroke among their five distinct subtypes. Importantly, we found the potential trajectory branches of the monocyte/macrophage subsets. Finally, we also identified distinct subclusters among brain vasculature cells, ependymal cells and other glial cells. Overall, scRNA-seq revealed the precise transcriptional changes during neuroinflammation at the single-cell level, opening up a new field for exploration of the disease mechanisms and drug discovery in stroke based on the cell-subtype specific molecules.

CDSeqR: fast complete deconvolution for gene expression data from bulk tissues

BMC Bioinformatics ◽

10.1186/s12859-021-04186-5 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Kai Kang ◽

Caizhi Huang ◽

Yuanyuan Li ◽

David M. Umbach ◽

Leping Li

Keyword(s):

Gene Expression ◽

Cell Types ◽

Biological Tissues ◽

Specific Gene ◽

Specific Cell ◽

Specific Information ◽

Expression Data ◽

Rna Seq ◽

Cell Type ◽

Cell Type Specific

Abstract

Background: Biological tissues consist of heterogeneous populations of cells. Because gene expression patterns from bulk tissue samples reflect the contributions from all cells in the tissue, understanding the contribution of individual cell types to the overall gene expression in the tissue is fundamentally important. We recently developed a computational method, CDSeq, that can simultaneously estimate both sample-specific cell-type proportions and cell-type-specific gene expression profiles using only bulk RNA-Seq counts from multiple samples. Here we present an R implementation of CDSeq (CDSeqR) with significant performance improvement over the original implementation in MATLAB and an added new function to aid cell type annotation. The R package would be of interest for the broader R community.

Results: We developed a novel strategy to substantially improve computational efficiency in both speed and memory usage. In addition, we designed and implemented a new function for annotating the CDSeq-estimated cell types using single-cell RNA sequencing (scRNA-seq) data. This function allows users to readily interpret and visualize the CDSeq-estimated cell types. In addition, this new function further allows the users to annotate CDSeq-estimated cell types using marker genes. We carried out additional validations of the CDSeqR software using synthetic data, real cell mixtures, and real bulk RNA-seq data from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) project.

Conclusions: The existing bulk RNA-seq repositories, such as TCGA and GTEx, provide enormous resources for better understanding changes in transcriptomics and human diseases. They are also potentially useful for studying cell–cell interactions in the tissue microenvironment. Bulk level analyses neglect tissue heterogeneity, however, and hinder investigation of cell-type-specific expression. The CDSeqR package may aid in silico dissection of bulk expression data, enabling researchers to recover cell-type-specific information.
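
Annotating estimated cell types against a single-cell reference can be sketched as a best-match search. The sketch assumes each estimated profile is assigned to the reference cell type with the highest Pearson correlation. The abstract does not describe the procedure used by the CDSeqR annotation function, so this matching rule, the function names, and all values are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def annotate(estimated, reference):
    """Map each estimated profile to its best-correlated reference cell type.

    estimated: dict mapping component name -> expression profile over genes.
    reference: dict mapping cell type -> mean scRNA-seq profile, same gene order.
    """
    return {
        name: max(reference, key=lambda ref: pearson(profile, reference[ref]))
        for name, profile in estimated.items()
    }


# Hypothetical reference profiles and deconvolution output over four genes.
reference = {"hepatocyte": [9.0, 1.0, 0.5, 0.2], "T cell": [0.3, 0.4, 8.0, 6.0]}
estimated = {"component_1": [0.5, 0.2, 7.0, 5.5], "component_2": [8.0, 1.5, 0.4, 0.1]}
annotation = annotate(estimated, reference)
```

In practice one would also inspect the correlation values themselves, since a best match can still be a poor match.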
