A deep generative model for multi-view profiling of single-cell RNA-seq and ATAC-seq data

Gaoyang Li; Shaliu Fu; Shuguang Wang; Chenyu Zhu; Bin Duan; Chen Tang; Xiaohan Chen; Guohui Chuai; Ping Wang; Qi Liu

doi:10.1186/s13059-021-02595-6

Genome Biology ◽

10.1186/s13059-021-02595-6 ◽

2022 ◽

Vol 23 (1) ◽

Author(s):

Gaoyang Li ◽

Shaliu Fu ◽

Shuguang Wang ◽

Chenyu Zhu ◽

Bin Duan ◽

...

Keyword(s):

Single Cell ◽

Regulatory Element ◽

Chromatin Accessibility ◽

Generative Model ◽

Developmental Trajectory ◽

Differential Analysis ◽

Sequencing Data ◽

Cell Clustering ◽

Cell Groups ◽

Latent Representations

Abstract: Here, we present a multi-modal deep generative model, the single-cell Multi-View Profiler (scMVP), which is designed for handling sequencing data that simultaneously measure gene expression and chromatin accessibility in the same cell, including SNARE-seq, sci-CAR, Paired-seq, SHARE-seq, and Multiome from 10X Genomics. scMVP generates common latent representations for dimensionality reduction, cell clustering, and developmental trajectory inference and generates separate imputations for differential analysis and cis-regulatory element identification. scMVP can help mitigate data sparsity issues with imputation and accurately identify cell groups for different joint profiling techniques with common latent embedding, and we demonstrate its advantages on several realistic datasets.

APEC: an accesson-based method for single-cell chromatin accessibility analysis

10.1101/646331 ◽

2019 ◽

Author(s):

Bin Li ◽

Young Li ◽

Kun Li ◽

Lianbang Zhu ◽

Qiaoni Yu ◽

...

Keyword(s):

Single Cell ◽

Chromatin Accessibility ◽

Developmental Trajectory ◽

Data Sets ◽

Gene Expressions ◽

Sequencing Technologies ◽

Cell Clustering ◽

Public Data ◽

Accessibility Pattern ◽

Analytical Tools

Abstract: The development of sequencing technologies has promoted the survey of genome-wide chromatin accessibility at single-cell resolution; however, comprehensive analysis of single-cell epigenomic profiles remains a challenge. Here, we introduce an accessibility pattern-based epigenomic clustering (APEC) method, which classifies each individual cell by groups of accessible regions with synergistic signal patterns termed “accessons”. By integrating with other analytical tools, this Python-based APEC package greatly improves the accuracy of unsupervised single-cell clustering for many different public data sets. APEC also predicts gene expression, identifies significantly differentially enriched motifs, discovers super enhancers, and projects pseudotime trajectories. Furthermore, we adopted a fluorescent tagmentation-based single-cell ATAC-seq technique (ftATAC-seq) to investigate the per-cell regulome dynamics of mouse thymocytes. Combined with ftATAC-seq, APEC revealed detailed epigenomic heterogeneity of thymocytes, characterized the developmental trajectory, and predicted the regulators that control the stages of the maturation process. Overall, this work illustrates a powerful approach to study single-cell epigenomic heterogeneity and regulome dynamics.

Developmental trajectory of prehematopoietic stem cell formation from endothelium

Blood ◽

10.1182/blood.2020004801 ◽

2020 ◽

Vol 136 (7) ◽

pp. 845-856 ◽

Cited By ~ 11

Author(s):

Qin Zhu ◽

Peng Gao ◽

Joanna Tober ◽

Laura Bennett ◽

Changya Chen ◽

...

Keyword(s):

Single Cell ◽

Fetal Liver ◽

Developmental Trajectories ◽

Cell Formation ◽

Intermediate Stage ◽

Chromatin Accessibility ◽

Small Population ◽

Developmental Trajectory ◽

Mammalian Embryo ◽

Hematopoietic Stem

Abstract: Hematopoietic stem and progenitor cells (HSPCs) in the bone marrow are derived from a small population of hemogenic endothelial (HE) cells located in the major arteries of the mammalian embryo. HE cells undergo an endothelial to hematopoietic cell transition, giving rise to HSPCs that accumulate in intra-arterial clusters (IAC) before colonizing the fetal liver. To examine the cell and molecular transitions between endothelial (E), HE, and IAC cells, and the heterogeneity of HSPCs within IACs, we profiled ∼40 000 cells from the caudal arteries (dorsal aorta, umbilical, vitelline) of 9.5 days post coitus (dpc) to 11.5 dpc mouse embryos by single-cell RNA sequencing and single-cell assay for transposase-accessible chromatin sequencing. We identified a continuous developmental trajectory from E to HE to IAC cells, with identifiable intermediate stages. The intermediate stage most proximal to HE, which we term pre-HE, is characterized by increased accessibility of chromatin enriched for SOX, FOX, GATA, and SMAD motifs. A developmental bottleneck separates pre-HE from HE, with RUNX1 dosage regulating the efficiency of the pre-HE to HE transition. A distal candidate Runx1 enhancer exhibits high chromatin accessibility specifically in pre-HE cells at the bottleneck, but loses accessibility thereafter. Distinct developmental trajectories within IAC cells result in 2 populations of CD45+ HSPCs: an initial wave of lymphomyeloid-biased progenitors, followed by precursors of hematopoietic stem cells (pre-HSCs). This multiomics single-cell atlas significantly expands our understanding of pre-HSC ontogeny.

A Multi-center Cross-platform Single-cell RNA Sequencing Reference Dataset

10.1101/2020.09.20.305474 ◽

2020 ◽

Cited By ~ 1

Author(s):

Xin Chen ◽

Zhaowei Yang ◽

Wanqiu Chen ◽

Yongmei Zhao ◽

Andrew Farmer ◽

...

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Whole Genome Sequencing Data ◽

Differential Analysis ◽

Sequencing Data ◽

Reference Dataset ◽

Batch Correction ◽

Distinct Cell ◽

Single Cell RNA Sequencing ◽

Benchmark Datasets

Abstract: Single-cell RNA sequencing (scRNA-seq) is developing rapidly, and investigators seeking to use this technology are left with a variety of options for both experimental platform and bioinformatics methods. There is an urgent need for scRNA-seq reference datasets for benchmarking of different scRNA-seq platforms and bioinformatics methods. To be broadly applicable, these should be generated from renewable, well characterized reference samples and processed in multiple centers across different platforms. Here we present a benchmarking scRNA-seq dataset that includes 20 scRNA-seq datasets acquired either as mixtures or as individual samples from two biologically distinct cell lines for which a large amount of multi-platform whole genome sequencing data are also available. These scRNA-seq datasets were generated from multiple popular platforms across four sequencing centers. Our benchmark datasets provide a resource that we believe will have great value for the single-cell community by serving as a reference dataset for evaluating various bioinformatics methods for scRNA-seq analyses, including but not limited to data preprocessing, imputation, normalization, clustering, batch correction, and differential analysis.

IKAP—Identifying K mAjor cell Population groups in single-cell RNA-sequencing analysis

GigaScience ◽

10.1093/gigascience/giz121 ◽

2019 ◽

Vol 8 (10) ◽

Cited By ~ 2

Author(s):

Yun-Ching Chen ◽

Abhilash Suresh ◽

Chingiz Underbayev ◽

Clare Sun ◽

Komudi Singh ◽

...

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Cell Types ◽

Sequencing Analysis ◽

Sequencing Data ◽

Peripheral Blood Mononuclear ◽

Biologically Relevant ◽

Single Cell RNA Sequencing ◽

Cell Groups ◽

Cell Ontology

Abstract: Background: In single-cell RNA-sequencing analysis, clustering cells into groups and differentiating cell groups by differentially expressed (DE) genes are 2 separate steps for investigating cell identity. However, the ability to differentiate between cell groups could be affected by clustering. This interdependency often creates a bottleneck in the analysis pipeline, requiring researchers to repeat these 2 steps multiple times by setting different clustering parameters to identify a set of cell groups that are more differentiated and biologically relevant. Findings: To accelerate this process, we have developed IKAP—an algorithm to identify major cell groups and improve differentiating cell groups by systematically tuning parameters for clustering. We demonstrate that, with default parameters, IKAP successfully identifies major cell types such as T cells, B cells, natural killer cells, and monocytes in 2 peripheral blood mononuclear cell datasets and recovers major cell types in a previously published mouse cortex dataset. These major cell groups identified by IKAP present more distinguishing DE genes compared with cell groups generated by different combinations of clustering parameters. We further show that cell subtypes can be identified by recursively applying IKAP within identified major cell types, thereby delineating cell identities in a multi-layered ontology. Conclusions: By tuning the clustering parameters to identify major cell groups, IKAP greatly improves the automation of single-cell RNA-sequencing analysis to produce distinguishing DE genes and refine cell ontology using single-cell RNA-sequencing data.

scATAC-pro: a comprehensive workbench for single-cell chromatin accessibility sequencing data

Genome Biology ◽

10.1186/s13059-020-02008-0 ◽

2020 ◽

Vol 21 (1) ◽

Cited By ~ 2

Author(s):

Wenbao Yu ◽

Yasin Uzun ◽

Qin Zhu ◽

Changya Chen ◽

Kai Tan

Keyword(s):

Single Cell ◽

Chromatin Accessibility ◽

Sequencing Data

A multi-center cross-platform single-cell RNA sequencing reference dataset

Scientific Data ◽

10.1038/s41597-021-00809-x ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Xin Chen ◽

Zhaowei Yang ◽

Wanqiu Chen ◽

Yongmei Zhao ◽

Andrew Farmer ◽

...

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Whole Genome Sequencing Data ◽

Differential Analysis ◽

Sequencing Data ◽

Batch Correction ◽

Distinct Cell ◽

Single Cell RNA Sequencing ◽

Cross Platform ◽

Reference Samples

Abstract: Single-cell RNA sequencing (scRNA-seq) is developing rapidly, and investigators seeking to use this technology are left with a variety of options for both experimental platform and bioinformatics methods. There is an urgent need for scRNA-seq reference datasets for benchmarking of different scRNA-seq platforms and bioinformatics methods. To be broadly applicable, these should be generated from renewable, well characterized reference samples and processed in multiple centers across different platforms. Here we present a benchmark scRNA-seq dataset that includes 20 scRNA-seq datasets acquired either as mixtures or as individual samples from two biologically distinct cell lines for which a large amount of multi-platform whole genome sequencing data are also available. These scRNA-seq datasets were generated from multiple popular platforms across four sequencing centers. We believe the datasets we describe here will provide a resource that meets this need by allowing evaluation of various bioinformatics methods for scRNA-seq analyses, including but not limited to data preprocessing, imputation, normalization, clustering, batch correction, and differential analysis.

A hierarchical Bayesian model for single-cell clustering using RNA-sequencing data

The Annals of Applied Statistics ◽

10.1214/19-aoas1250 ◽

2019 ◽

Vol 13 (3) ◽

pp. 1733-1752

Author(s):

Yiyi Liu ◽

Joshua L. Warren ◽

Hongyu Zhao

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Bayesian Model ◽

Hierarchical Bayesian ◽

Hierarchical Bayesian Model ◽

Sequencing Data ◽

Cell Clustering

Single-cell epigenomics maps the continuous regulatory landscape of human hematopoietic differentiation

10.1101/109843 ◽

2017 ◽

Cited By ~ 8

Author(s):

Jason D Buenrostro ◽

M Ryan Corces ◽

Beijing Wu ◽

Alicia N Schep ◽

Caleb A Lareau ◽

...

Keyword(s):

Single Cell ◽

Cell Types ◽

Regulatory Elements ◽

Chromatin Accessibility ◽

Developmental Trajectory ◽

Hematopoietic Differentiation ◽

Ensemble Averaging ◽

Regulatory Variation ◽

Multipotent Cells ◽

Normal Human

Abstract: Normal human hematopoiesis involves cellular differentiation of multipotent cells into progressively more lineage-restricted states. While epigenomic landscapes of this process have been explored in immunophenotypically-defined populations, the single-cell regulatory variation that defines hematopoietic differentiation has been hidden by ensemble averaging. We generated single-cell chromatin accessibility landscapes across 8 populations of immunophenotypically-defined human hematopoietic cell types. Using bulk chromatin accessibility profiles to scaffold our single-cell data analysis, we constructed an epigenomic landscape of human hematopoiesis and characterized epigenomic heterogeneity within phenotypically sorted populations to find epigenomic lineage-bias toward different developmental branches in multipotent stem cell states. We identify and isolate sub-populations within classically-defined granulocyte-macrophage progenitors (GMPs) and use ATAC-seq and RNA-seq to confirm that GMPs are epigenomically and transcriptomically heterogeneous. Furthermore, we identified transcription factors and cis-regulatory elements linked to changes in chromatin accessibility within cellular populations and across a continuous myeloid developmental trajectory, and observe relatively simple TF motif dynamics give rise to a broad diversity of accessibility dynamics at cis-regulatory elements. Overall, this work provides a template for exploration of complex regulatory dynamics in primary human tissues at the ultimate level of granular specificity – the single cell. One Sentence Summary: Single cell chromatin accessibility reveals a high-resolution, continuous landscape of regulatory variation in human hematopoiesis.

ArchR: An integrative and scalable software package for single-cell chromatin accessibility analysis

10.1101/2020.04.28.066498 ◽

2020 ◽

Cited By ~ 18

Author(s):

Jeffrey M. Granja ◽

M. Ryan Corces ◽

Sarah E. Pierce ◽

S. Tansu Bagdatli ◽

Hani Choudhry ◽

...

Keyword(s):

Single Cell ◽

Large Scale ◽

Single Cells ◽

Chromatin Accessibility ◽

Cell Type ◽

Biological Meaning ◽

Cell Clustering ◽

Gene Regulatory ◽

Regulatory Landscapes ◽

Scalable Software

Abstract: The advent of large-scale single-cell chromatin accessibility profiling has accelerated our ability to map gene regulatory landscapes, but has outpaced the development of robust, scalable software to rapidly extract biological meaning from these data. Here we present a software suite for single-cell analysis of regulatory chromatin in R (ArchR; www.ArchRProject.com) that enables fast and comprehensive analysis of single-cell chromatin accessibility data. ArchR provides an intuitive, user-focused interface for complex single-cell analyses including doublet removal, single-cell clustering and cell type identification, robust peak set generation, cellular trajectory identification, DNA element to gene linkage, transcription factor footprinting, mRNA expression level prediction from chromatin accessibility, and multi-omic integration with scRNA-seq. Enabling the analysis of over 1.2 million single cells within 8 hours on a standard Unix laptop, ArchR is a comprehensive analytical suite for end-to-end analysis of single-cell chromatin accessibility data that will accelerate the understanding of gene regulation at the resolution of individual cells.

Cobolt: integrative analysis of multimodal single-cell sequencing data

Genome Biology ◽

10.1186/s13059-021-02556-z ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Boying Gong ◽

Yun Zhou ◽

Elizabeth Purdom

Keyword(s):

Gene Expression ◽

Single Cell ◽

Chromatin Accessibility ◽

Integrative Analysis ◽

RNA Seq ◽

Sequencing Data ◽

Single Cell Sequencing ◽

Multiple Datasets ◽

Novel Method ◽

Sequencing Platforms

Abstract: A growing number of single-cell sequencing platforms enable joint profiling of multiple omics from the same cells. We present Cobolt, a novel method that not only allows for analyzing the data from joint-modality platforms, but also provides a coherent framework for the integration of multiple datasets measured on different modalities. We demonstrate its performance on multi-modality data of gene expression and chromatin accessibility and illustrate the integration abilities of Cobolt by jointly analyzing this multi-modality data with single-cell RNA-seq and ATAC-seq datasets.
