Identifying Cell Type-Specific Chemokine Correlates with Hierarchical Signal Extraction from Single-Cell Transcriptomes

Abstract. Motivation: While generative models have shown great success in generating high-dimensional samples conditional on low-dimensional descriptors (stroke thickness in MNIST, hair color in CelebA, speaker identity in WaveNet), generating out-of-distribution samples poses fundamental problems because it is difficult to learn a compact joint distribution across conditions. The canonical example, the conditional variational autoencoder (CVAE), does not explicitly relate conditions during training and hence has no explicit incentive to learn such a compact representation. Results: We overcome the limitation of the CVAE by matching distributions across conditions using maximum mean discrepancy in the decoder layer that follows the bottleneck. This introduces a strong regularization both for reconstructing samples within the same condition and for transforming samples across conditions, resulting in much improved generalization. As this amounts to solving a style-transfer problem, we refer to the model as transfer VAE (trVAE). Benchmarking trVAE on high-dimensional image and single-cell RNA-seq data, we demonstrate higher robustness and higher accuracy than existing approaches. We also show qualitatively improved predictions by tackling previously problematic minority classes and multiple conditions in the context of cellular perturbation response to treatment and disease based on high-dimensional single-cell gene expression data. For generic tasks, we improve Pearson correlations of high-dimensional estimated means and variances with their ground truths from 0.89 to 0.97 and from 0.75 to 0.87, respectively. We further demonstrate that trVAE learns cell-type-specific responses after perturbation and improves the prediction of most cell-type-specific genes by 65%. Availability and implementation: The trVAE implementation is available via github.com/theislab/trvae. The results of this article can be reproduced via github.com/theislab/trvae_reproducibility.
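
The abstract above names maximum mean discrepancy (MMD) as the quantity matched across conditions. As a rough illustration only, the sketch below computes a biased estimate of squared MMD between two sets of vectors with a Gaussian kernel. It is not the trVAE implementation: the function names, the kernel choice, the bandwidth `gamma`, and the toy data are all assumptions made for this example.

```python
import math
import random


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd_squared(xs, ys, gamma=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples xs and ys."""
    def mean_kernel(a, b):
        total = sum(rbf_kernel(u, v, gamma) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Toy data (hypothetical): two draws from the same distribution and one shifted draw.
rng = random.Random(0)
cond_a = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(50)]
cond_a2 = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(50)]
cond_b = [[rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)] for _ in range(50)]

mmd_same = mmd_squared(cond_a, cond_a2)    # small: same underlying distribution
mmd_shifted = mmd_squared(cond_a, cond_b)  # larger: the distributions differ
```

Used as a penalty term, a value like `mmd_shifted` would push the representations of two conditions toward each other; which layer it is applied to and how it is weighted are design decisions the abstract only summarizes.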

Comprehensive analysis of single cell ATAC-seq data with SnapATAC

Nature Communications ◽

10.1038/s41467-021-21583-9 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Rongxin Fang ◽

Sebastian Preissl ◽

Yang Li ◽

Xiaomeng Hou ◽

Jacinta Lucero ◽

...

Keyword(s):

Single Cell ◽

Single Cell Analysis ◽

Expression Patterns ◽

Regulatory Elements ◽

Cellular Heterogeneity ◽

Specific Gene ◽

Open Chromatin ◽

Cell Type ◽

Process Data ◽

Cell Type Specific

Abstract. Identification of the cis-regulatory elements controlling cell-type specific gene expression patterns is essential for understanding the origin of cellular diversity. Conventional assays that map regulatory elements via open chromatin analysis of primary tissues are hindered by sample heterogeneity. Single cell analysis of accessible chromatin (scATAC-seq) can overcome this limitation. However, the high level of noise in each single cell profile and the large volume of data pose unique computational challenges. Here, we introduce SnapATAC, a software package for analyzing scATAC-seq datasets. SnapATAC dissects cellular heterogeneity in an unbiased manner and maps the trajectories of cellular states. Using the Nyström method, SnapATAC can process data from up to a million cells. Furthermore, SnapATAC incorporates existing tools into a comprehensive package for analyzing single cell ATAC-seq datasets. As a demonstration of its utility, SnapATAC is applied to 55,592 single-nucleus ATAC-seq profiles from the mouse secondary motor cortex. The analysis reveals ~370,000 candidate regulatory elements in 31 distinct cell populations in this brain region and infers candidate cell-type specific transcriptional regulators.

Single-cell RNA sequencing of the mammalian pineal gland identifies two pinealocyte subtypes and cell type-specific daily patterns of gene expression

PLoS ONE ◽

10.1371/journal.pone.0205883 ◽

2018 ◽

Vol 13 (10) ◽

pp. e0205883 ◽

Cited By ~ 9

Author(s):

Joseph C. Mays ◽

Michael C. Kelly ◽

Steven L. Coon ◽

Lynne Holtzclaw ◽

Martin F. Rath ◽

...

Keyword(s):

Gene Expression ◽

Pineal Gland ◽

Single Cell ◽

Rna Sequencing ◽

Cell Type ◽

Single Cell Rna Sequencing ◽

Cell Type Specific ◽

Mammalian Pineal Gland ◽

Daily Patterns

In vivo single-cell profiling of lncRNAs during Ebola virus infection

10.1101/2022.01.12.476002 ◽

2022 ◽

Author(s):

Luisa Santus ◽

Raquel García-Pérez ◽

Maria Sopena-Rios ◽

Aaron E Lin ◽

Gordon C Adams ◽

...

Keyword(s):

Viral Infection ◽

Single Cell ◽

Ebola Virus ◽

Cell Type ◽

Protein Coding ◽

Expression Variation ◽

Lncrna Expression ◽

Ebov Infection ◽

Cell Type Specific

Long non-coding RNAs (lncRNAs) are pivotal mediators of systemic immune response to viral infection, yet most studies concerning their expression and functions upon immune stimulation are limited to in vitro bulk cell populations. This strongly constrains our understanding of how lncRNA expression varies at single-cell resolution, and how their cell-type specific immune regulatory roles may differ compared to protein-coding genes. Here, we perform the first in-depth characterization of lncRNA expression variation at single-cell resolution during Ebola virus (EBOV) infection in vivo. Using bulk RNA-sequencing from 119 samples and 12 tissue types, we significantly expand the current macaque lncRNA annotation. We then profile lncRNA expression variation in circulating immune single cells during EBOV infection and find that lncRNAs' expression in fewer cells is a major differentiating factor from their protein-coding gene counterparts. Upon EBOV infection, lncRNAs present dynamic and mostly cell-type specific changes in their expression profiles, especially in monocytes, the main cell type targeted by EBOV. Such changes are associated with gene regulatory modules related to important innate immune responses such as interferon response and purine metabolism. Within infected cells, several lncRNAs have expression that is positively or negatively correlated with viral load, suggesting that expression of some of these lncRNAs might be directly hijacked by EBOV to attack host cells. This study provides novel insights into the roles that lncRNAs play in the host response to acute viral infection and paves the way for future lncRNA studies at single-cell resolution.

JIND: Joint Integration and Discrimination for Automated Single-Cell Annotation

10.1101/2020.10.06.327601 ◽

2020 ◽

Author(s):

Mohit Goyal ◽

Guillermo Serrano ◽

Ilan Shomorony ◽

Mikel Hernaez ◽

Idoia Ochoa

Keyword(s):

Single Cell ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Rna Seq ◽

Batch Effects ◽

Cell Type ◽

Latent Space ◽

Cell Type Specific ◽

Low Dimensional

Abstract. Single-cell RNA-seq is a powerful tool in the study of the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation have been proposed based on supervised learning. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profile of unseen cells is mapped onto the previously learned latent space, hence avoiding the need to retrain the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable, running more than five to six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.
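
The abstract above mentions cell-type-specific confidence thresholds used to reject cells that cannot be reliably classified, without saying how they are chosen. The sketch below shows one plausible way such a rule could work, under an assumed quantile-based criterion on held-out predictions. The function names, the `keep_fraction` parameter, the "Unassigned" label, and the toy probabilities are hypothetical and are not taken from JIND.

```python
import math


def fit_thresholds(probs, labels, keep_fraction=0.9):
    """Per-class threshold: the confidence that `keep_fraction` of correctly
    classified validation cells reach or exceed (an assumed rule, not JIND's)."""
    by_class = {}
    for p, y in zip(probs, labels):
        pred = max(range(len(p)), key=p.__getitem__)
        if pred == y:  # only correct predictions inform the threshold
            by_class.setdefault(y, []).append(p[pred])
    thresholds = {}
    for cls, confs in by_class.items():
        confs.sort(reverse=True)
        idx = max(0, math.ceil(keep_fraction * len(confs)) - 1)
        thresholds[cls] = confs[idx]
    return thresholds


def predict_with_rejection(p, thresholds, class_names, default_threshold=0.5):
    """Return the predicted class name, or "Unassigned" if confidence is too low."""
    pred = max(range(len(p)), key=p.__getitem__)
    if p[pred] < thresholds.get(pred, default_threshold):
        return "Unassigned"
    return class_names[pred]


# Toy validation set (hypothetical class probabilities for two cell types).
class_names = ["T cell", "B cell"]
val_probs = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.45, 0.55], [0.2, 0.8]]
val_labels = [0, 0, 0, 1, 1, 1]

thresholds = fit_thresholds(val_probs, val_labels, keep_fraction=0.6)
confident_call = predict_with_rejection([0.85, 0.15], thresholds, class_names)
rejected_call = predict_with_rejection([0.65, 0.35], thresholds, class_names)
```

Fitting one threshold per class, rather than a single global cutoff, lets classes that are intrinsically harder to separate keep a looser or stricter bar of their own.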

CellO: Comprehensive and hierarchical cell type classification of human cells with the Cell Ontology

10.1101/634097 ◽

2019 ◽

Cited By ~ 1

Author(s):

Matthew N. Bernstein ◽

Zhongjie Ma ◽

Michael Gleicher ◽

Colin N. Dewey

Keyword(s):

Single Cell ◽

Web Application ◽

Cell Types ◽

Rna Seq ◽

Cell Type ◽

Training Set ◽

Sequence Read Archive ◽

Cell Ontology ◽

Cell Type Specific ◽

Type Classification

Summary. Cell type annotation is a fundamental task in the analysis of single-cell RNA-sequencing data. In this work, we present CellO, a machine learning-based tool for annotating human RNA-seq data with the Cell Ontology. CellO enables accurate and standardized cell type classification by considering the rich hierarchical structure of known cell types, a source of prior knowledge that is not utilized by existing methods. Furthermore, CellO comes pre-trained on a novel, comprehensive dataset of human, healthy, untreated primary samples in the Sequence Read Archive, which, to the best of our knowledge, is the most diverse curated collection of primary cell data to date. CellO’s comprehensive training set enables it to run out-of-the-box on diverse cell types and achieve superior or competitive performance when compared to existing state-of-the-art methods. Lastly, CellO’s linear models are easily interpreted, thereby enabling exploration of cell type-specific expression signatures across the ontology. To this end, we also present the CellO Viewer: a web application for exploring CellO’s models across the ontology.

Highlights

We present CellO, a tool for hierarchically classifying cell types from single-cell RNA-seq data against the graph-structured Cell Ontology.

CellO is pre-trained on a comprehensive dataset comprising nearly all bulk RNA-seq primary cell samples in the Sequence Read Archive.

CellO achieves superior or comparable performance with existing methods while featuring a more comprehensive pre-packaged training set.

CellO is built with easily interpretable models, which we expose through a novel web application, the CellO Viewer, for exploring cell type-specific signatures across the Cell Ontology.
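
Hierarchical classification against an ontology, as described in the CellO summary, implies that a cell labeled with a specific type should also carry all of that type's ancestors. The sketch below illustrates one simple way to enforce that consistency on per-term scores. It is an assumed rule for illustration, not CellO's algorithm; the `PARENTS` graph is a hypothetical toy fragment rather than the real Cell Ontology, and the scores and threshold are made up.

```python
# Toy cell-type hierarchy (child -> parents); hypothetical, not the real Cell Ontology.
PARENTS = {
    "CD4+ T cell": ["T cell"],
    "CD8+ T cell": ["T cell"],
    "T cell": ["lymphocyte"],
    "B cell": ["lymphocyte"],
    "lymphocyte": ["leukocyte"],
    "monocyte": ["leukocyte"],
    "leukocyte": [],
}


def ancestors(term, parents=PARENTS):
    """All terms reachable by following parent links from `term`."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def consistent_labels(scores, threshold=0.5, parents=PARENTS):
    """Keep a term only if it and every one of its ancestors score >= threshold."""
    kept = set()
    for term, score in scores.items():
        if score < threshold:
            continue
        if all(scores.get(a, 0.0) >= threshold for a in ancestors(term, parents)):
            kept.add(term)
    return kept


# Hypothetical per-term classifier scores for a single cell.
scores = {
    "leukocyte": 0.95, "lymphocyte": 0.9, "T cell": 0.8,
    "CD4+ T cell": 0.3, "CD8+ T cell": 0.6, "B cell": 0.1, "monocyte": 0.05,
}
labels = consistent_labels(scores)
```

With these toy scores the cell is labeled along a single path from the root down to the most specific term that still passes the threshold, so no specific label appears without its broader parents.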

ICTD: A semi-supervised cell type identification and deconvolution method for multi-omics data

10.1101/426593 ◽

2018 ◽

Cited By ~ 2

Author(s):

Wennan Chang ◽

Changlin Wan ◽

Xiaoyu Lu ◽

Szu-wei Tu ◽

Yifan Sun ◽

...

Keyword(s):

Single Cell ◽

Cell Types ◽

Training Data ◽

Marker Genes ◽

Cell Detection ◽

Omics Data ◽

Deconvolution Method ◽

Cell Type ◽

Data Set ◽

Cell Type Specific

Abstract. We developed a novel deconvolution method, namely Inference of Cell Types and Deconvolution (ICTD), that addresses the fundamental issues of identifiability and robustness in current tissue data deconvolution problems. ICTD provides substantially new capabilities for omics data-based characterization of a tissue microenvironment, including (1) maximizing the resolution in identifying resident cell types and subtypes that truly exist in a tissue, (2) identifying the most reliable marker genes for each cell type, which are tissue and data set specific, (3) handling the stability problem with co-linear cell types, (4) co-deconvoluting with available matched multi-omics data, and (5) inferring functional variations specific to one or several cell types. ICTD is empowered by (i) rigorously derived mathematical conditions of identifiable cell types and cell type-specific functions in tissue transcriptomics data, (ii) a semi-supervised approach to maximize the knowledge transfer of cell type and functional marker genes identified in single cell or bulk cell data in the analysis of tissue data, and (iii) a novel unsupervised approach to minimize the bias brought by training data. Application of ICTD to real and single cell-simulated tissue data validated that the method has consistently good performance for tissue data coming from different species, tissue microenvironments, and experimental platforms. Beyond these new capabilities, ICTD outperformed other state-of-the-art deconvolution methods on prediction accuracy, the resolution of identifiable cell types, detection of unknown cell subtypes, and assessment of cell type-specific functions. The premise of ICTD also lies in characterizing cell-cell interactions and discovering cell types and prognostic markers that are predictive of clinical outcomes.

Capturing cell type-specific chromatin structural patterns by applying topic modeling to single-cell Hi-C data

10.1101/534800 ◽

2019 ◽

Cited By ~ 2

Author(s):

Hyeon-Jin Kim ◽

Galip Gürkan Yardımcı ◽

Giancarlo Bonora ◽

Vijay Ramani ◽

Jie Liu ◽

...

Keyword(s):

Single Cell ◽

Topic Modeling ◽

Biological Information ◽

Chromatin Interaction ◽

Cell Type ◽

3D Genome ◽

Genome Wide ◽

Significant Barrier ◽

Chromatin Structural ◽

Cell Type Specific

Abstract. Single-cell Hi-C (scHi-C) interrogates genome-wide chromatin interaction in individual cells, allowing us to gain insights into 3D genome organization. However, the extremely sparse nature of scHi-C data poses a significant barrier to analysis, limiting our ability to tease out hidden biological information. In this work, we approach this problem by applying topic modeling to scHi-C data. Topic modeling is well-suited for discovering latent topics in a collection of discrete data. For our analysis, we generate twelve different single-cell combinatorial indexed Hi-C (sciHi-C) libraries from five human cell lines (GM12878, H1Esc, HFF, IMR90, and HAP1), comprising over 25,000 cells. We demonstrate that topic modeling is able to successfully capture cell type differences from sciHi-C data in the form of “chromatin topics.” We further show enrichment of particular compartment structures associated with locus pairs in these topics.

Single-cell RNA sequencing reveals cell type- and artery type-specific vascular remodelling in male spontaneously hypertensive rats

Cardiovascular Research ◽

10.1093/cvr/cvaa164 ◽

2020 ◽

Cited By ~ 1

Author(s):

Jun Cheng ◽

Wenduo Gu ◽

Ting Lan ◽

Jiacheng Deng ◽

Zhichao Ni ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Spontaneously Hypertensive Rats ◽

Cell Types ◽

Vascular Remodelling ◽

Cell Type ◽

Hypertensive Rats ◽

Spontaneously Hypertensive ◽

Single Cell Rna Sequencing ◽

Cell Type Specific

Abstract. Aims: Hypertension is a major risk factor for cardiovascular diseases. However, vascular remodelling, a hallmark of hypertension, has not yet been systematically characterized. We described systematic vascular remodelling, especially the artery type- and cell type-specific changes, in hypertension using spontaneously hypertensive rats (SHRs). Methods and results: Single-cell RNA sequencing was used to depict the cell atlas of mesenteric artery (MA) and aortic artery (AA) from SHRs. More than 20 000 cells were included in the analysis. The number of immune cells more than doubled in the AA of SHRs compared to Wistar Kyoto controls, whereas an expansion of MA mesenchymal stromal cells (MSCs) was observed in SHRs. Comparison of corresponding artery types and cell types identified in integrated datasets unravelled dysregulated genes specific for artery types and cell types. Intersection of dysregulated genes with curated gene sets including cytokines, growth factors, extracellular matrix (ECM), receptors, etc. revealed vascular remodelling events involving cell–cell interaction and ECM re-organization. In particular, AA remodelling encompasses upregulated cytokine genes in smooth muscle cells, endothelial cells, and especially MSCs, whereas in MA, changes in genes involving the contractile machinery and downregulation of ECM-related genes were more prominent. Macrophages and T cells within the aorta demonstrated significant dysregulation of cellular interaction with vascular cells. Conclusion: Our findings provide the first cell landscape of resistant and conductive arteries in hypertensive animal models. Moreover, they offer a systematic characterization of the dysregulated gene profiles in an unbiased, artery type-specific, and cell type-specific manner during hypertensive vascular remodelling.
