Enhancement and Imputation of Peak Signal Enables Accurate Cell-Type Classification in scATAC-seq

2021 ◽  
Vol 12 ◽  
Author(s):  
Zhe Cui ◽  
Ya Cui ◽  
Yan Gao ◽  
Tao Jiang ◽  
Tianyi Zang ◽  
...  

Single-cell Assay for Transposase-Accessible Chromatin sequencing (scATAC-seq) has been widely used to profile genome-wide chromatin accessibility in thousands of individual cells. However, compared with single-cell RNA-seq, the peaks of scATAC-seq are much sparser due to the lower copy numbers (diploid in humans) and the inherent missing signals, which makes it more challenging to classify cell types based on specifically expressed genes or other canonical markers. Here, we present svmATAC, a support vector machine (SVM)-based method for accurately identifying cell types in scATAC-seq datasets by enhancing peak signal strength and imputing signals through patterns of co-accessibility. We applied svmATAC to several scATAC-seq datasets from human immune cells, human hematopoietic system cells, and peripheral blood mononuclear cells. The benchmark results showed that svmATAC does not rely on literature-based markers and is robust across datasets from different libraries and platforms. The source code of svmATAC is available at https://github.com/mrcuizhe/svmATAC under the MIT license.

Author(s):  
Ting Luo ◽  
Fengping Zheng ◽  
Kang Wang ◽  
Yong Xu ◽  
Huixuan Xu ◽  
...  

Abstract Background: Immune aberrations in end-stage renal disease (ESRD) are characterized by systemic inflammation and immune deficiency. The mechanistic understanding of this phenomenon remains limited. Methods: We generated 12 981 and 9578 single-cell transcriptomes of peripheral blood mononuclear cells (PBMCs) that were pooled from 10 healthy volunteers and 10 patients with ESRD by single-cell RNA sequencing. Unsupervised clustering and annotation analyses were performed to cluster and identify cell types. The analysis of hallmark pathway and regulon activity was performed in the main cell types. Results: We identified 14 leukocytic clusters that corresponded to six known PBMC types. The comparison of cells from ESRD patients and healthy individuals revealed multiple changes in biological processes. We noticed an ESRD-related increase in inflammation response, complement cascade and cellular metabolism, as well as a strong decrease in activity related to cell cycle progression in relevant cell types in ESRD. Furthermore, a list of cell type-specific candidate transcription factors (TFs) driving the ESRD-associated transcriptome changes was identified. Conclusions: We generated a distinctive, high-resolution map of ESRD-derived PBMCs. These results revealed cell type-specific ESRD-associated pathways and TFs. Notably, the pooled sample analysis limits the generalization of our results. The generation of larger single-cell datasets will complement the current map and drive advances in therapies that manipulate immune cell function in ESRD.


2019 ◽  
Author(s):  
Florian Wagner

Abstract Clustering of cells by cell type is arguably the most common and repetitive task encountered during the analysis of single-cell RNA-Seq data. However, as popular clustering methods operate largely independently of visualization techniques, the fine-tuning of clustering parameters can be unintuitive and time-consuming. Here, I propose Galapagos, a simple and effective clustering workflow based on t-SNE and DBSCAN that does not require a gene selection step. In practice, Galapagos only involves the fine-tuning of two parameters, which is straightforward, as clustering is performed directly on the t-SNE visualization results. Using peripheral blood mononuclear cells as a model tissue, I validate the effectiveness of Galapagos in different ways. First, I show that Galapagos generates clusters corresponding to all main cell types present. Then, I demonstrate that the t-SNE results are robust to parameter choices and initialization points. Next, I employ a simulation approach to show that clustering with Galapagos is accurate and robust to the high levels of technical noise present. Finally, to demonstrate Galapagos’ accuracy on real data, I compare clustering results to true cell type identities established using CITE-Seq data. In this context, I also provide an example of the primary limitation of Galapagos, namely the difficulty of resolving related cell types in cases where t-SNE fails to clearly separate the cells. Galapagos helps to make clustering scRNA-Seq data more intuitive and reproducible, and can be implemented in most programming languages with only a few lines of code.
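The idea of clustering directly on a two-dimensional embedding can be illustrated with a minimal, pure-Python DBSCAN sketch. This is not the Galapagos implementation: the function name `dbscan`, the parameter names `eps` and `min_samples`, and the toy coordinates are all assumptions made for illustration, and the t-SNE step is assumed to have been run already. DBSCAN is governed by a neighbourhood radius and a minimum neighbour count; whether these are the two parameters the abstract refers to is likewise an assumption.

```python
from math import dist


def dbscan(points, eps, min_samples):
    """Label 2-D points with DBSCAN; -1 marks noise, clusters are 0, 1, ..."""
    n = len(points)
    # Brute-force neighbourhoods; each point counts as its own neighbour.
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # only core points expand the cluster
    return labels


# Hypothetical embedding coordinates standing in for t-SNE output.
embedding = [
    (0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.4, 0.4),          # first group
    (10.0, 10.0), (10.5, 10.0), (10.0, 10.5), (10.4, 10.4),  # second group
    (50.0, 50.0),                                            # isolated cell
]
labels = dbscan(embedding, eps=1.0, min_samples=3)
print(labels)
```

Because the clustering runs on the same coordinates that are plotted, the effect of changing `eps` or `min_samples` can be judged by eye on the scatter plot, which is the intuition the abstract describes.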


2021 ◽  
Author(s):  
Yuriko Harigaya ◽  
Zhaojun Zhang ◽  
Hongpan Zhang ◽  
Chongzhi Zang ◽  
Nancy Zhang ◽  
...  

Abstract Epigenetic control of gene expression is highly cell-type- and context-specific. Yet, despite its complexity, gene regulatory logic can be broken down into modular components consisting of a transcription factor (TF) activating or repressing the expression of a target gene through its binding to a cis-regulatory region. Recent advances in joint profiling of transcription and chromatin accessibility with single-cell resolution offer unprecedented opportunities to interrogate such regulatory logic. Here, we propose a nonparametric approach, TRIPOD, to detect and characterize three-way relationships between a TF, its target gene, and the accessibility of the TF’s binding site, using single-cell RNA and ATAC multiomic data. We apply TRIPOD to interrogate cell-type-specific regulatory logic in peripheral blood mononuclear cells and contrast our results to detections from enhancer databases, cis-eQTL studies, ChIP-seq experiments, and TF knockdown/knockout studies. We then apply TRIPOD to mouse embryonic brain data during neurogenesis and gliogenesis and identify known and novel putative regulatory relationships, validated by ChIP-seq and PLAC-seq. Finally, we demonstrate TRIPOD on SHARE-seq data of differentiating mouse hair follicle cells and identify lineage-specific regulation supported by histone marks for gene activation and super-enhancer annotations.


2018 ◽  
Author(s):  
Xi Chen ◽  
Ricardo J Miragaia ◽  
Kedar Nath Natarajan ◽  
Sarah A Teichmann

Abstract The assay for transposase-accessible chromatin using sequencing (ATAC-seq) is widely used to identify regulatory regions throughout the genome. However, very few studies have been performed at the single cell level (scATAC-seq) due to technical challenges. Here we developed a simple and robust plate-based scATAC-seq method, combining upfront bulk Tn5 tagging with single-nuclei sorting. We demonstrated that our method worked robustly across various systems, including fresh and cryopreserved cells from primary tissues. By profiling over 3,000 splenocytes, we identified distinct immune cell types and revealed cell type-specific regulatory regions and related transcription factors.


2020 ◽  
Vol 6 (51) ◽  
pp. eaba9031
Author(s):  
Laiyi Fu ◽  
Lihua Zhang ◽  
Emmanuel Dollinger ◽  
Qinke Peng ◽  
Qing Nie ◽  
...  

Characterizing genome-wide binding profiles of transcription factors (TFs) is essential for understanding biological processes. Although techniques have been developed to assess binding profiles within a population of cells, determining them at a single-cell level remains elusive. Here, we report scFAN (single-cell factor analysis network), a deep learning model that predicts genome-wide TF binding profiles in individual cells. scFAN is pretrained on genome-wide bulk assay for transposase-accessible chromatin sequencing (ATAC-seq), DNA sequence, and chromatin immunoprecipitation sequencing (ChIP-seq) data and uses single-cell ATAC-seq to predict TF binding in individual cells. We demonstrate the efficacy of scFAN by both studying sequence motifs enriched within predicted binding peaks and using predicted TFs for discovering cell types. We develop a new metric “TF activity score” to characterize each cell and show that activity scores can reliably capture cell identities. scFAN allows us to discover and study cellular identities and heterogeneity based on chromatin accessibility profiles.


2021 ◽  
Author(s):  
Cantong Zhang ◽  
Xiaoping Hong ◽  
Haiyan Yu ◽  
Hongwei Wu ◽  
Huixuan Xu ◽  
...  

Abstract Rheumatoid arthritis (RA) is a chronic autoinflammatory disease with an elusive etiology. Assays for transposase-accessible chromatin with single-cell sequencing (scATAC-seq) contribute to the progress in epigenetic studies. However, the impact of epigenetic technology on autoimmune diseases has not been objectively analyzed. Therefore, scATAC-seq was performed to generate a high-resolution map of accessible loci in peripheral blood mononuclear cells (PBMCs) of RA patients at the single-cell level. The purpose of our project was to discover the transcription factors (TFs) that were involved in the pathogenesis of RA at single-cell resolution. In our research, we obtained 22 accessible chromatin patterns. We then found 10 key TFs involved in RA pathogenesis through regulation of MAP kinase activity. Two genes (PTPRC, SPAG9) regulated by these 10 key TFs were found that may be associated with RA pathogenesis, and these TFs were significantly enriched in RA patients (p<0.05, FC>1.2). With further qPCR validation of PTPRC and SPAG9 in monocytes, we found differential expression of these two genes, which were regulated by eight TFs (ZNF384, HNF1B, DMRTA2, MEF2A, NFE2L1, CREB3L4 (var. 2), FOSL2::JUNB (var. 2), MEF2B). Moreover, the eight TFs showed highly accessible binding sites in RA patients. These findings demonstrate the value of using scATAC-seq to reveal transcriptional regulatory variation in RA-derived PBMCs, providing insights on therapy from an epigenetic perspective.


2021 ◽  
Author(s):  
Zhibin Li ◽  
Chengcheng Sun ◽  
Fei Wang ◽  
Xiran Wang ◽  
Jiacheng Zhu ◽  
...  

Background: Immune cells play important roles in mediating immune response and host defense against invading pathogens. However, insights into the molecular mechanisms governing circulating immune cell diversity among multiple species are limited. Methods: In this study, we compared the single-cell transcriptomes of 77 957 immune cells from 12 species using single-cell RNA-sequencing (scRNA-seq). Distinct molecular profiles were characterized for different immune cell types, including T cells, B cells, natural killer cells, monocytes, and dendritic cells. Results: The results revealed the heterogeneity and compositions of circulating immune cells among 12 different species. Additionally, we explored the conserved and divergent cellular cross-talks and genetic regulatory networks among vertebrate immune cells. Notably, the ligand and receptor pair VIM-CD44 was highly conserved among the immune cells. Conclusions: This study is the first to provide a comprehensive analysis of the cross-species single-cell atlas for peripheral blood mononuclear cells (PBMCs). This research should advance our understanding of the cellular taxonomy and fundamental functions of PBMCs, with important implications in evolutionary biology, developmental biology, and immune system disorders.


2020 ◽  
Vol 4 (Supplement_1) ◽  
Author(s):  
Frederique Murielle Ruf-Zamojski ◽  
Michel A Zamojski ◽  
German Nudelman ◽  
Yongchao Ge ◽  
Natalia Mendelev ◽  
...  

Abstract The pituitary gland is a critical regulator of the neuroendocrine system. To further our understanding of the classification, cellular heterogeneity, and regulatory landscape of pituitary cell types, we performed and computationally integrated single cell (SC)/single nucleus (SN) resolution experiments capturing RNA expression, chromatin accessibility, and DNA methylation state from mouse dissociated whole pituitaries. Both SC and SN transcriptome analysis and promoter accessibility identified the five classical hormone-producing cell types (somatotropes, gonadotropes (GT), lactotropes, thyrotropes, and corticotropes). GT cells distinctively expressed transcripts for Cga, Fshb, Lhb, Nr5a1, and Gnrhr in SC RNA-seq and SN RNA-seq. This was matched in SN ATAC-seq with GTs specifically showing open chromatin at the promoter regions for the same genes. Similarly, the other classically defined anterior pituitary cells displayed transcript expression and chromatin accessibility patterns characteristic of their own cell type. This integrated analysis identified additional cell-types, such as a stem cell cluster expressing transcripts for Sox2, Sox9, Mia, and Rbpms, and a broadly accessible chromatin state. In addition, we performed bulk ATAC-seq in the LβT2b gonadotrope-like cell line. While the FSHB promoter region was closed in the cell line, we identified a region upstream of Fshb that became accessible by the synergistic actions of GnRH and activin A, and that corresponded to a conserved region identified by a polycystic ovary syndrome (PCOS) single nucleotide polymorphism (SNP). Although this locus appears closed in deep sequencing bulk ATAC-seq of dissociated mouse pituitary cells, SN ATAC-seq of the same preparation showed that this site was specifically open in mouse GT, but closed in 14 other pituitary cell type clusters. 
This discrepancy highlighted the detection limit of a bulk ATAC-seq experiment in a subpopulation, as GT represented ~5% of this dissociated anterior pituitary sample. These results identified this locus as a candidate for explaining the dual dependence of Fshb expression on GnRH and activin/TGFβ signaling, and potential new evidence for upstream regulation of Fshb. The pituitary epigenetic landscape provides a resource for improved cell type identification and for the investigation of the regulatory mechanisms driving cell-to-cell heterogeneity. Additional authors not listed due to abstract submission restrictions: N. Seenarine, M. Amper, N. Jain (ISMMS).


2021 ◽  
Author(s):  
Risa Karakida Kawaguchi ◽  
Ziqi Tang ◽  
Stephan Fischer ◽  
Rohit Tripathy ◽  
Peter K. Koo ◽  
...  

Background: Single-cell Assay for Transposase Accessible Chromatin using sequencing (scATAC-seq) measures genome-wide chromatin accessibility for the discovery of cell-type specific regulatory networks. ScATAC-seq combined with single-cell RNA sequencing (scRNA-seq) offers important avenues for ongoing research, such as novel cell-type specific activation of enhancer and transcription factor binding sites as well as chromatin changes specific to cell states. On the other hand, scATAC-seq data is known to be challenging to interpret due to its high number of zeros as well as the heterogeneity derived from different protocols. Because of the stochastic lack of marker gene activities, cell type identification by scATAC-seq remains difficult even at a cluster level. Results: In this study, we exploit reference knowledge obtained from external scATAC-seq or scRNA-seq datasets to define existing cell types and uncover the genomic regions which drive cell-type specific gene regulation. To investigate the robustness of existing cell-typing methods, we collected 7 scATAC-seq datasets targeting mouse brain for a meta-analytic comparison of neuronal cell-type annotation, including a reference atlas generated by the BRAIN Initiative Cell Census Network (BICCN). By comparing the area under the receiver operating characteristics curves (AUROCs) for the three major cell types (inhibitory, excitatory, and non-neuronal cells), cell-typing performance by single markers is found to be highly variable even for known marker genes due to study-specific biases. However, the signal aggregation of a large and redundant marker gene set, optimized via multiple scRNA-seq data, achieves the highest cell-typing performances among 5 existing marker gene sets, from the individual cell to cluster level. That gene set also shows a high consistency with the cluster-specific genes from inhibitory subtypes in two well-annotated datasets, suggesting applicability to rare cell types. 
Next, we demonstrate a comprehensive assessment of scATAC-seq cell typing using exhaustive combinations of the marker gene sets with supervised learning methods including machine learning classifiers and joint clustering methods. Our results show that the combinations using robust marker gene sets systematically ranked at the top, not only with model based prediction using a large reference data but also with a simple summation of expression strengths across markers. To demonstrate the utility of this robust cell typing approach, we trained a deep neural network to predict chromatin accessibility in each subtype using only DNA sequence. Through model interpretation methods, we identify key motifs enriched about robust gene sets for each neuronal subtype. Conclusions: Through the meta-analytic evaluation of scATAC-seq cell-typing methods, we develop a novel method set to exploit the BICCN reference atlas. Our study strongly supports the value of robust marker gene selection as a feature selection tool and cross-dataset comparison between scATAC-seq datasets to improve alignment of scATAC-seq to known biology. With this novel, high quality epigenetic data, genomic analysis of regulatory regions can reveal sequence motifs that drive cell type-specific regulatory programs.
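The evaluation described above, scoring each cell by a simple summation of signal across a marker gene set and then measuring cell-typing performance by AUROC, can be sketched as follows. This is an illustrative sketch only, not the authors' pipeline: the helper names `marker_score` and `auroc`, the gene names `geneA` and `geneB`, and the toy values are all hypothetical. The AUROC is computed from the Mann-Whitney rank-sum statistic, with tied scores given average ranks.

```python
def marker_score(cell, markers):
    """Sum a cell's signal over a marker gene set; absent genes count as 0."""
    return sum(cell.get(gene, 0.0) for gene in markers)


def auroc(scores, labels):
    """AUROC from the Mann-Whitney U statistic; ties receive average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of cells that share the same score.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative cell")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical per-cell gene activity values and a hypothetical marker set.
markers = {"geneA", "geneB"}
cells = [
    {"geneA": 2.0, "geneB": 1.0},  # target cell type
    {"geneA": 0.0, "geneB": 1.0},  # target cell type, one marker dropped out
    {"geneA": 1.0},                # other cell type
    {"geneB": 0.0},                # other cell type
    {},                            # other cell type, no marker signal
]
is_target = [1, 1, 0, 0, 0]

scores = [marker_score(cell, markers) for cell in cells]
auc = auroc(scores, is_target)
print(round(auc, 3))
```

Summing over several markers means that a dropout in one gene, as in the second cell, lowers the score without erasing it, which is the intuition behind aggregating a large and redundant marker set.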

