AtacWorks: A deep convolutional neural network toolkit for epigenomics

2019 ◽  
Author(s):  
Avantika Lal ◽  
Zachary D. Chiang ◽  
Nikolai Yakovenko ◽  
Fabiana M. Duarte ◽  
Johnny Israeli ◽  
...  

Abstract We introduce AtacWorks (https://github.com/clara-genomics/AtacWorks), a method to denoise and identify accessible chromatin regions from low-coverage or low-quality ATAC-seq data. AtacWorks uses a deep neural network to learn a mapping between noisy ATAC-seq data and corresponding higher-coverage or higher-quality data. To demonstrate the utility of AtacWorks, we train a model on data from four human blood cell types and show that this model accurately denoises chromatin accessibility at base-pair resolution and identifies peaks from low-coverage bulk sequencing of unseen cell types and experimental conditions. We use the same framework to obtain high-quality results from as few as 50 aggregate single-cell ATAC-seq profiles, and also from data with a low signal-to-noise ratio. We further show that AtacWorks can be adapted for cross-modality prediction of transcription factor footprints and ChIP-seq peaks from low-input ATAC-seq. Finally, we demonstrate the applications of our approach to single-cell genomics by using AtacWorks to identify regulatory regions that are differentially accessible between rare lineage-primed subpopulations of hematopoietic stem cells.

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Avantika Lal ◽  
Zachary D. Chiang ◽  
Nikolai Yakovenko ◽  
Fabiana M. Duarte ◽  
Johnny Israeli ◽  
...  

Abstract ATAC-seq is a widely applied assay used to measure genome-wide chromatin accessibility; however, its ability to detect active regulatory regions can depend on the depth of sequencing coverage and the signal-to-noise ratio. Here we introduce AtacWorks, a deep learning toolkit to denoise sequencing coverage and identify regulatory peaks at base-pair resolution from low cell count, low-coverage, or low-quality ATAC-seq data. Models trained by AtacWorks can detect peaks from cell types not seen in the training data, and are generalizable across diverse sample preparations and experimental platforms. We demonstrate that AtacWorks enhances the sensitivity of single-cell experiments by producing results on par with those of conventional methods using ~10 times as many cells, and further show that this framework can be adapted to enable cross-modality inference of protein-DNA interactions. Finally, we establish that AtacWorks can enable new biological discoveries by identifying active regulatory regions associated with lineage priming in rare subpopulations of hematopoietic stem cells.


Author(s):  
Elliott Swanson ◽  
Cara Lord ◽  
Julian Reading ◽  
Alexander T. Heubeck ◽  
Adam K. Savage ◽  
...  

Abstract Single-cell measurements of cellular characteristics have been instrumental in understanding the heterogeneous pathways that drive differentiation, cellular responses to extracellular signals, and human disease states. scATAC-seq has been particularly challenging due to the large size of the human genome and processing artefacts resulting from DNA damage that are an inherent source of background signal. Downstream analysis and integration of scATAC-seq with other single-cell assays is complicated by the lack of clear phenotypic information linking chromatin state and cell type. Using the heterogeneous mixture of cells in human peripheral blood as a test case, we developed a novel scATAC-seq workflow that increases the signal-to-noise ratio and allows simultaneous measurement of cell surface markers: Integrated Cellular Indexing of Chromatin Landscape and Epitopes (ICICLE-seq). We extended this approach using a droplet-based multiomics platform to develop a trimodal assay to simultaneously measure Transcriptomic state (scRNA-seq), cell surface Epitopes, and chromatin Accessibility (scATAC-seq) from thousands of single cells, which we term TEA-seq. Together, these multimodal single-cell assays provide a novel toolkit to identify type-specific gene regulation and expression grounded in phenotypically defined cell types.


2021 ◽  
Author(s):  
Ziqi Zhang ◽  
Chengkai Yang ◽  
Xiuwei Zhang

Single cell multi-omics studies allow researchers to understand cell differentiation and development mechanisms in a more comprehensive manner. Single cell ATAC-sequencing (scATAC-seq) measures the chromatin accessibility of cells, and computational methods have been proposed to integrate scATAC-seq with scRNA-seq data of cells from the same cell types. This computational task is particularly challenging when the two modalities are not profiled from the same cells. Some existing methods first transform the scATAC-seq data into scRNA-seq data and integrate two scRNA-seq datasets, but how to perform the transformation is still a difficult problem. In addition, most of the existing methods to integrate scRNA-seq and scATAC-seq data focus on preserving distinct cell clusters before and after the integration, and it is not clear whether these methods can preserve the continuous trajectories for cells from continuous development or differentiation processes. We propose scDART, a scalable deep learning framework that embeds the two data modalities of single cells, scRNA-seq and scATAC-seq data, into a shared low-dimensional latent space while preserving cell trajectory structures. Furthermore, scDART learns a nonlinear function represented by a neural network encoding the cross-modality relationship simultaneously when learning the latent space representations of the integrated dataset. We test scDART on both real and simulated datasets, and compare it with the state-of-the-art methods. We show that scDART is able to integrate scRNA-seq and scATAC-seq data well while preserving the continuous cell trajectories. scDART also predicts scRNA-seq data accurately from the scATAC-seq data using the neural network module that represents cross-modality relationships.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Kip D. Zimmerman ◽  
Mark A. Espeland ◽  
Carl D. Langefeld

Abstract Cells from the same individual share common genetic and environmental backgrounds and are not statistically independent; therefore, they are subsamples or pseudoreplicates. Thus, single-cell data have a hierarchical structure that many current single-cell methods do not address, leading to biased inference, highly inflated type 1 error rates, and reduced robustness and reproducibility. This includes methods that use a batch effect correction for individual as a means of accounting for within-sample correlation. Here, we document this dependence across a range of cell types and show that pseudo-bulk aggregation methods are conservative and underpowered relative to mixed models. To compute differential expression within a specific cell type across treatment groups, we propose applying generalized linear mixed models with a random effect for individual, to properly account for both zero inflation and the correlation structure among measures from cells within an individual. Finally, we provide power estimates across a range of experimental conditions to assist researchers in designing appropriately powered studies.
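The pseudoreplication point in this abstract can be illustrated with a minimal sketch of pseudo-bulk aggregation, the baseline the authors compare against (it is not their proposed mixed model, and it is not their code). The function name `pseudobulk`, the tuple layout, and the toy values are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def pseudobulk(cells):
    """Collapse per-cell values to one mean per individual.

    `cells` is an iterable of (individual_id, group, value) tuples
    (a hypothetical layout chosen for this sketch).
    Returns {group: [one mean per individual in that group]}.
    """
    per_individual = defaultdict(list)
    group_of = {}
    for individual, group, value in cells:
        per_individual[individual].append(value)
        group_of[individual] = group

    by_group = defaultdict(list)
    for individual in sorted(per_individual):
        by_group[group_of[individual]].append(mean(per_individual[individual]))
    return dict(by_group)


# Hypothetical toy data: 8 cells drawn from only 4 individuals.
cells = [
    ("ind1", "control", 1.0), ("ind1", "control", 3.0),
    ("ind2", "control", 2.0), ("ind2", "control", 2.0), ("ind2", "control", 2.0),
    ("ind3", "treated", 4.0), ("ind3", "treated", 6.0),
    ("ind4", "treated", 5.0),
]

summary = pseudobulk(cells)
n_cells = len(cells)                              # naive sample size
n_units = sum(len(v) for v in summary.values())   # independent units
```

After aggregation, a downstream test sees four independent observations rather than eight correlated ones. That removes the pseudoreplication, but it also discards within-individual information, which is consistent with the abstract's remark that pseudo-bulk methods are conservative relative to mixed models.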


2020 ◽  
Author(s):  
Róbert Pálovics ◽  
Andreas Keller ◽  
Nicholas Schaum ◽  
Weilun Tan ◽  
Tobias Fehlmann ◽  
...  

Slowing or reversing biological ageing would have major implications for mitigating disease risk and maintaining vitality. While an increasing number of interventions show promise for rejuvenation, the effectiveness on disparate cell types across the body and the molecular pathways susceptible to rejuvenation remain largely unexplored. We performed single-cell RNA-sequencing on 13 organs to reveal cell type specific responses to young or aged blood in heterochronic parabiosis. Adipose mesenchymal stromal cells, hematopoietic stem cells, hepatocytes, and endothelial cells from multiple tissues appear especially responsive. On the pathway level, young blood invokes novel gene sets in addition to reversing established ageing patterns, with the global rescue of genes encoding electron transport chain subunits pinpointing a prominent role of mitochondrial function in parabiosis-mediated rejuvenation. Intriguingly, we observed an almost universal loss of gene expression with age that is largely mimicked by parabiosis: aged blood reduces global gene expression, and young blood restores it. Altogether, these data lay the groundwork for a systemic understanding of the interplay between blood-borne factors and cellular integrity.


eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Elliott Swanson ◽  
Cara Lord ◽  
Julian Reading ◽  
Alexander T Heubeck ◽  
Palak C Genge ◽  
...  

Single-cell measurements of cellular characteristics have been instrumental in understanding the heterogeneous pathways that drive differentiation, cellular responses to signals, and human disease. Recent advances have allowed paired capture of protein abundance and transcriptomic state, but a lack of epigenetic information in these assays has left a missing link to gene regulation. Using the heterogeneous mixture of cells in human peripheral blood as a test case, we developed a novel scATAC-seq workflow that increases signal-to-noise and allows paired measurement of cell surface markers and chromatin accessibility: integrated cellular indexing of chromatin landscape and epitopes, called ICICLE-seq. We extended this approach using a droplet-based multiomics platform to develop a trimodal assay that simultaneously measures transcriptomics (scRNA-seq), epitopes, and chromatin accessibility (scATAC-seq) from thousands of single cells, which we term TEA-seq. Together, these multimodal single-cell assays provide a novel toolkit to identify type-specific gene regulation and expression grounded in phenotypically defined cell types.


2020 ◽  
Vol 4 (Supplement_1) ◽  
Author(s):  
Frederique Murielle Ruf-Zamojski ◽  
Michel A Zamojski ◽  
German Nudelman ◽  
Yongchao Ge ◽  
Natalia Mendelev ◽  
...  

Abstract The pituitary gland is a critical regulator of the neuroendocrine system. To further our understanding of the classification, cellular heterogeneity, and regulatory landscape of pituitary cell types, we performed and computationally integrated single cell (SC)/single nucleus (SN) resolution experiments capturing RNA expression, chromatin accessibility, and DNA methylation state from mouse dissociated whole pituitaries. Both SC and SN transcriptome analysis and promoter accessibility identified the five classical hormone-producing cell types (somatotropes, gonadotropes (GT), lactotropes, thyrotropes, and corticotropes). GT cells distinctively expressed transcripts for Cga, Fshb, Lhb, Nr5a1, and Gnrhr in SC RNA-seq and SN RNA-seq. This was matched in SN ATAC-seq with GTs specifically showing open chromatin at the promoter regions for the same genes. Similarly, the other classically defined anterior pituitary cells displayed transcript expression and chromatin accessibility patterns characteristic of their own cell type. This integrated analysis identified additional cell-types, such as a stem cell cluster expressing transcripts for Sox2, Sox9, Mia, and Rbpms, and a broadly accessible chromatin state. In addition, we performed bulk ATAC-seq in the LβT2b gonadotrope-like cell line. While the FSHB promoter region was closed in the cell line, we identified a region upstream of Fshb that became accessible by the synergistic actions of GnRH and activin A, and that corresponded to a conserved region identified by a polycystic ovary syndrome (PCOS) single nucleotide polymorphism (SNP). Although this locus appears closed in deep sequencing bulk ATAC-seq of dissociated mouse pituitary cells, SN ATAC-seq of the same preparation showed that this site was specifically open in mouse GT, but closed in 14 other pituitary cell type clusters. This discrepancy highlighted the detection limit of a bulk ATAC-seq experiment in a subpopulation, as GT represented ~5% of this dissociated anterior pituitary sample. These results identified this locus as a candidate for explaining the dual dependence of Fshb expression on GnRH and activin/TGFβ signaling, and potential new evidence for upstream regulation of Fshb. The pituitary epigenetic landscape provides a resource for improved cell type identification and for the investigation of the regulatory mechanisms driving cell-to-cell heterogeneity. Additional authors not listed due to abstract submission restrictions: N. Seenarine, M. Amper, N. Jain (ISMMS).


Blood ◽  
2020 ◽  
Vol 136 (7) ◽  
pp. 845-856 ◽  
Author(s):  
Qin Zhu ◽  
Peng Gao ◽  
Joanna Tober ◽  
Laura Bennett ◽  
Changya Chen ◽  
...  

Abstract Hematopoietic stem and progenitor cells (HSPCs) in the bone marrow are derived from a small population of hemogenic endothelial (HE) cells located in the major arteries of the mammalian embryo. HE cells undergo an endothelial to hematopoietic cell transition, giving rise to HSPCs that accumulate in intra-arterial clusters (IAC) before colonizing the fetal liver. To examine the cell and molecular transitions between endothelial (E), HE, and IAC cells, and the heterogeneity of HSPCs within IACs, we profiled ∼40 000 cells from the caudal arteries (dorsal aorta, umbilical, vitelline) of 9.5 days post coitus (dpc) to 11.5 dpc mouse embryos by single-cell RNA sequencing and single-cell assay for transposase-accessible chromatin sequencing. We identified a continuous developmental trajectory from E to HE to IAC cells, with identifiable intermediate stages. The intermediate stage most proximal to HE, which we term pre-HE, is characterized by increased accessibility of chromatin enriched for SOX, FOX, GATA, and SMAD motifs. A developmental bottleneck separates pre-HE from HE, with RUNX1 dosage regulating the efficiency of the pre-HE to HE transition. A distal candidate Runx1 enhancer exhibits high chromatin accessibility specifically in pre-HE cells at the bottleneck, but loses accessibility thereafter. Distinct developmental trajectories within IAC cells result in 2 populations of CD45+ HSPCs; an initial wave of lymphomyeloid-biased progenitors, followed by precursors of hematopoietic stem cells (pre-HSCs). This multiomics single-cell atlas significantly expands our understanding of pre-HSC ontogeny.


2021 ◽  
Vol 12 ◽  
Author(s):  
Zhe Cui ◽  
Ya Cui ◽  
Yan Gao ◽  
Tao Jiang ◽  
Tianyi Zang ◽  
...  

Single-cell Assay for Transposase-Accessible Chromatin sequencing (scATAC-seq) has been widely used to profile genome-wide chromatin accessibility in thousands of individual cells. However, compared with single-cell RNA-seq, the peaks of scATAC-seq are much sparser due to the lower copy numbers (diploid in humans) and the inherent missing signals, which makes it more challenging to classify cell types based on specifically expressed genes or other canonical markers. Here, we present svmATAC, a support vector machine (SVM)-based method for accurately identifying cell types in scATAC-seq datasets by enhancing peak signal strength and imputing signals through patterns of co-accessibility. We applied svmATAC to several scATAC-seq datasets from human immune cells, human hematopoietic system cells, and peripheral blood mononuclear cells. The benchmark results showed that svmATAC is free of literature-based markers and robust across datasets in different libraries and platforms. The source code of svmATAC is available at https://github.com/mrcuizhe/svmATAC under the MIT license.


2021 ◽  
Author(s):  
Risa Karakida Kawaguchi ◽  
Ziqi Tang ◽  
Stephan Fischer ◽  
Rohit Tripathy ◽  
Peter K. Koo ◽  
...  

Background: Single-cell Assay for Transposase Accessible Chromatin using sequencing (scATAC-seq) measures genome-wide chromatin accessibility for the discovery of cell-type specific regulatory networks. ScATAC-seq combined with single-cell RNA sequencing (scRNA-seq) offers important avenues for ongoing research, such as novel cell-type specific activation of enhancer and transcription factor binding sites as well as chromatin changes specific to cell states. On the other hand, scATAC-seq data is known to be challenging to interpret due to its high number of zeros as well as the heterogeneity derived from different protocols. Because of the stochastic lack of marker gene activities, cell type identification by scATAC-seq remains difficult even at a cluster level. Results: In this study, we exploit reference knowledge obtained from external scATAC-seq or scRNA-seq datasets to define existing cell types and uncover the genomic regions which drive cell-type specific gene regulation. To investigate the robustness of existing cell-typing methods, we collected 7 scATAC-seq datasets targeting mouse brain for a meta-analytic comparison of neuronal cell-type annotation, including a reference atlas generated by the BRAIN Initiative Cell Census Network (BICCN). By comparing the area under the receiver operating characteristics curves (AUROCs) for the three major cell types (inhibitory, excitatory, and non-neuronal cells), cell-typing performance by single markers is found to be highly variable even for known marker genes due to study-specific biases. However, the signal aggregation of a large and redundant marker gene set, optimized via multiple scRNA-seq data, achieves the highest cell-typing performances among 5 existing marker gene sets, from the individual cell to cluster level. That gene set also shows a high consistency with the cluster-specific genes from inhibitory subtypes in two well-annotated datasets, suggesting applicability to rare cell types. Next, we demonstrate a comprehensive assessment of scATAC-seq cell typing using exhaustive combinations of the marker gene sets with supervised learning methods including machine learning classifiers and joint clustering methods. Our results show that the combinations using robust marker gene sets systematically ranked at the top, not only with model based prediction using a large reference data but also with a simple summation of expression strengths across markers. To demonstrate the utility of this robust cell typing approach, we trained a deep neural network to predict chromatin accessibility in each subtype using only DNA sequence. Through model interpretation methods, we identify key motifs enriched about robust gene sets for each neuronal subtype. Conclusions: Through the meta-analytic evaluation of scATAC-seq cell-typing methods, we develop a novel method set to exploit the BICCN reference atlas. Our study strongly supports the value of robust marker gene selection as a feature selection tool and cross-dataset comparison between scATAC-seq datasets to improve alignment of scATAC-seq to known biology. With this novel, high quality epigenetic data, genomic analysis of regulatory regions can reveal sequence motifs that drive cell type-specific regulatory programs.
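The comparison described in this abstract, scoring cells by a simple summation over a marker gene set and evaluating with AUROC, can be sketched as follows. This is an illustrative toy under stated assumptions, not the authors' pipeline: the function names `marker_sum_scores` and `auroc`, the dict-per-cell layout, and all activity values are hypothetical. Gad1 and Gad2 are used only as familiar inhibitory-neuron marker genes.

```python
def marker_sum_scores(activity, markers):
    """Score each cell by summing gene activity over a marker gene set.

    `activity` is a list of {gene: value} dicts, one per cell
    (hypothetical layout); genes missing from a cell count as 0.
    """
    return [sum(cell.get(gene, 0.0) for gene in markers) for cell in activity]


def auroc(scores, labels):
    """AUROC as the probability that a positive cell outranks a negative one.

    Ties contribute 0.5, matching the Mann-Whitney U formulation.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative cell")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical gene-activity values for four cells.
activity = [
    {"Gad1": 2.0, "Gad2": 1.0},  # inhibitory
    {"Gad1": 0.0, "Gad2": 1.0},  # inhibitory, Gad1 dropped out
    {"Gad1": 0.0, "Gad2": 0.0},  # other
    {"Gad1": 1.0},               # other, spurious Gad1 signal
]
is_inhibitory = [True, True, False, False]

single = auroc(marker_sum_scores(activity, ["Gad1"]), is_inhibitory)
combined = auroc(marker_sum_scores(activity, ["Gad1", "Gad2"]), is_inhibitory)
```

In this toy case the single-marker score is hurt by one dropout and one spurious signal, while the summed score separates the classes better. It only illustrates the mechanism of aggregating redundant markers; it says nothing about the effect sizes reported in the study.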

