BROCKMAN: Deciphering variance in epigenomic regulators by k-mer factorization

2017 ◽  
Author(s):  
Carl G. de Boer ◽  
Aviv Regev

Background: Variation in chromatin organization across single cells can help shed important light on the mechanisms controlling gene expression, but scale, noise, and sparsity pose significant challenges for interpretation of single cell chromatin data. Here, we develop BROCKMAN (Brockman Representation Of Chromatin by K-mers in Mark-Associated Nucleotides), an approach to infer variation in transcription factor (TF) activity across samples through unsupervised analysis of the variation in DNA sequences associated with an epigenomic mark. Results: BROCKMAN represents each sample as a vector of epigenomic-mark-associated DNA word frequencies, and decomposes the resulting matrix to find hidden structure in the data, followed by unsupervised grouping of samples and identification of the TFs that distinguish groups. Applied to single cell ATAC-seq, BROCKMAN readily distinguished cell types, treatments, batch effects, experimental artifacts, and cycling cells. We show that each variable component in the k-mer landscape reflects a set of co-varying TFs, which are often known to physically interact. For example, in K562 cells, AP-1 TFs were central determinants of variability in chromatin accessibility through their variable expression levels and diverse interactions with other TFs. We provide a theoretical basis for why cooperative TF binding, and any associated epigenomic mark, is inherently more variable than non-cooperative binding. Conclusions: BROCKMAN and related approaches will help gain a mechanistic understanding of the trans determinants of chromatin variability between cells, treatments, and individuals.
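The representation described in this abstract (one vector of DNA word frequencies per sample, followed by a matrix decomposition) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say which decomposition is used, so principal component analysis computed by power iteration stands in here as an assumption. The function names, the choice of k = 2, and the toy sequences are all hypothetical.

```python
from itertools import product


def kmer_frequencies(seqs, k):
    """Return a dict mapping every DNA k-mer to its relative frequency in seqs."""
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if word in counts:  # skip words containing N or other symbols
                counts[word] += 1
                total += 1
    return {w: (c / total if total else 0.0) for w, c in counts.items()}


def sample_matrix(samples, k):
    """Build a samples x k-mers frequency matrix (columns in sorted k-mer order)."""
    words = sorted("".join(p) for p in product("ACGT", repeat=k))
    rows = []
    for seqs in samples:
        freqs = kmer_frequencies(seqs, k)
        rows.append([freqs[w] for w in words])
    return words, rows


def first_component(rows, iterations=200):
    """Leading principal component of the column-centered matrix (power iteration).

    PCA is an assumed stand-in for the decomposition, which the abstract leaves open.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    v = [1.0 / (j + 1) for j in range(m)]  # arbitrary deterministic non-zero start
    for _ in range(iterations):
        proj = [sum(c[j] * v[j] for j in range(m)) for c in centered]  # X v
        v_new = [sum(centered[i][j] * proj[i] for i in range(n)) for j in range(m)]  # X^T X v
        norm = sum(x * x for x in v_new) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in v_new]
    scores = [sum(c[j] * v[j] for j in range(m)) for c in centered]
    return v, scores


# Toy input (invented): each sample is the list of sequences under its mark-associated regions.
samples = [
    ["ATATATATAT", "TATATATA"],  # AT-rich sample
    ["ATATTATA"],                # AT-rich sample
    ["GCGCGCGC", "CGCGCGCG"],    # GC-rich sample
    ["GCGCCGCG"],                # GC-rich sample
]
words, rows = sample_matrix(samples, k=2)
loadings, scores = first_component(rows)
```

In this toy case the two AT-rich samples receive component scores of one sign and the two GC-rich samples scores of the other, so the leading component separates the groups. The k-mers with the largest absolute loadings are the words driving that separation. In a real analysis those words would then be compared against known TF motifs.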

eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Elliott Swanson ◽  
Cara Lord ◽  
Julian Reading ◽  
Alexander T Heubeck ◽  
Palak C Genge ◽  
...  

Single-cell measurements of cellular characteristics have been instrumental in understanding the heterogeneous pathways that drive differentiation, cellular responses to signals, and human disease. Recent advances have allowed paired capture of protein abundance and transcriptomic state, but a lack of epigenetic information in these assays has left a missing link to gene regulation. Using the heterogeneous mixture of cells in human peripheral blood as a test case, we developed a novel scATAC-seq workflow that increases signal-to-noise and allows paired measurement of cell surface markers and chromatin accessibility: integrated cellular indexing of chromatin landscape and epitopes, called ICICLE-seq. We extended this approach using a droplet-based multiomics platform to develop a trimodal assay that simultaneously measures transcriptomics (scRNA-seq), epitopes, and chromatin accessibility (scATAC-seq) from thousands of single cells, which we term TEA-seq. Together, these multimodal single-cell assays provide a novel toolkit to identify type-specific gene regulation and expression grounded in phenotypically defined cell types.


Author(s):  
Elliott Swanson ◽  
Cara Lord ◽  
Julian Reading ◽  
Alexander T. Heubeck ◽  
Adam K. Savage ◽  
...  

Single-cell measurements of cellular characteristics have been instrumental in understanding the heterogeneous pathways that drive differentiation, cellular responses to extracellular signals, and human disease states. scATAC-seq has been particularly challenging due to the large size of the human genome and processing artefacts resulting from DNA damage that are an inherent source of background signal. Downstream analysis and integration of scATAC-seq with other single-cell assays is complicated by the lack of clear phenotypic information linking chromatin state and cell type. Using the heterogeneous mixture of cells in human peripheral blood as a test case, we developed a novel scATAC-seq workflow that increases the signal-to-noise ratio and allows simultaneous measurement of cell surface markers: Integrated Cellular Indexing of Chromatin Landscape and Epitopes (ICICLE-seq). We extended this approach using a droplet-based multiomics platform to develop a trimodal assay to simultaneously measure Transcriptomic state (scRNA-seq), cell surface Epitopes, and chromatin Accessibility (scATAC-seq) from thousands of single cells, which we term TEA-seq. Together, these multimodal single-cell assays provide a novel toolkit to identify type-specific gene regulation and expression grounded in phenotypically defined cell types.


Author(s):  
Chongyuan Luo ◽  
Hanqing Liu ◽  
Fangming Xie ◽  
Ethan J. Armand ◽  
Kimberly Siletti ◽  
...  

Single-cell technologies enable measurement of unique cellular signatures, but are typically limited to a single modality. Computational approaches allow integration of diverse single-cell datasets, but their efficacy is difficult to validate in the absence of authentic multi-omic measurements. To comprehensively assess the molecular phenotypes of single cells in tissues, we devised single-nucleus methylCytosine, Chromatin accessibility and Transcriptome sequencing (snmC2T-seq) and applied it to post-mortem human frontal cortex tissue. We developed a computational framework to validate fine-grained cell types using multi-modal information and assessed the effectiveness of computational integration methods. Correlation analysis in individual cells revealed distinct relations between methylation and gene expression. Our integrative approach enabled joint analyses of the methylome, transcriptome, chromatin accessibility and conformation for 63 human cortical cell types. We reconstructed regulatory lineages for cortical cell populations and found specific enrichment of genetic risk for neuropsychiatric traits, enabling prediction of cell types with causal roles in disease.


2020 ◽  
Author(s):  
Ying Lei ◽  
Mengnan Cheng ◽  
Zihao Li ◽  
Zhenkun Zhuang ◽  
Liang Wu ◽  
...  

Non-human primates (NHP) provide a unique opportunity to study human neurological diseases, yet detailed characterization of the cell types and transcriptional regulatory features in the NHP brain is lacking. We applied a combinatorial indexing assay, sci-ATAC-seq, as well as single-nuclei RNA-seq, to profile chromatin accessibility in 43,793 single cells and transcriptomics in 11,477 cells, respectively, from the prefrontal cortex, primary motor cortex and primary visual cortex of the adult cynomolgus monkey Macaca fascicularis. Integrative analysis of these two datasets resolved regulatory elements and transcription factors that specify cell type distinctions, and discovered area-specific diversity in chromatin accessibility and gene expression within excitatory neurons. We also constructed the dynamic landscape of chromatin accessibility and gene expression of oligodendrocyte maturation to characterize adult remyelination. Furthermore, we identified cell type-specific enrichment of differentially spliced gene isoforms and disease-associated single nucleotide polymorphisms. Our datasets permit integrative exploration of complex regulatory dynamics in macaque brain tissue at single-cell resolution.


2019 ◽  
Author(s):  
Song Chen ◽  
Blue B Lake ◽  
Kun Zhang

Linked profiling of transcriptome and chromatin accessibility from single cells can provide unprecedented insights into cellular status. Here we developed a droplet-based Single-Nucleus chromatin Accessibility and mRNA Expression sequencing (SNARE-seq) assay, which we used to profile neonatal and adult mouse cerebral cortices. To demonstrate the strength of single-cell dual-omics profiling, we reconstructed transcriptome and epigenetic landscapes of cell types, uncovered lineage-specific accessible sites, and connected dynamics of promoter accessibility with transcription during neurogenesis.


2021 ◽  
Author(s):  
Elisabeth Rebboah ◽  
Fairlie Reese ◽  
Katherine Williams ◽  
Gabriela Balderrama-Gutierrez ◽  
Cassandra McGill ◽  
...  

Alternative RNA isoforms are defined by promoter choice, alternative splicing, and polyA site selection. Although differential isoform expression is known to play a large regulatory role in eukaryotes, it has proved challenging to study with standard short-read RNA-seq because of the uncertainties it leaves about the full-length structure and precise termini of transcripts. The rise in throughput and quality of long-read sequencing now makes it possible, in principle, to unambiguously identify most transcript isoforms from beginning to end. However, its application to single-cell RNA-seq has been limited by throughput and expense. Here, we develop and characterize long-read Split-seq (LR-Split-seq), which uses a combinatorial barcoding-based method for sequencing single cells and nuclei with long reads. We show that LR-Split-seq can associate isoforms with cell types with relative economy and design flexibility. We characterize LR-Split-seq for whole cells and nuclei by using the well-studied mouse C2C12 system in which mononucleated myoblast cells differentiate and fuse into multinucleated myotubes. We show that the overall results are reproducible when comparing long- and short-read data from the same cell or nucleus. We find substantial evidence of differential isoform expression during differentiation including alternative transcription start site (TSS) usage. We integrate the resulting isoform expression dynamics with snATAC-seq chromatin accessibility to validate TSS-driven isoform choices. LR-Split-seq provides an affordable method for identifying cluster-specific isoforms in single cells that can be further quantified with companion deep short-read scRNA-seq from the same cell populations.


2020 ◽  
Author(s):  
Steven J. Wu ◽  
Scott N. Furlan ◽  
Anca B. Mihalas ◽  
Hatice S. Kaya-Okur ◽  
Abdullah H. Feroze ◽  
...  

Single-cell analysis has become a powerful approach for the molecular characterization of complex tissues. Methods for quantifying gene expression [1] and chromatin accessibility [2] of single cells are now well-established, but analysis of chromatin regions with specific histone modifications has been technically challenging. Here, we adapt the recently published CUT&Tag method [3] to scalable single-cell platforms to profile chromatin landscapes in single cells (scCUT&Tag) from complex tissues. We focus on profiling Polycomb Group (PcG) silenced regions marked by H3K27 trimethylation (H3K27me3) in single cells as an orthogonal approach to chromatin accessibility for identifying cell states. We show that scCUT&Tag profiling of H3K27me3 distinguishes cell types in human blood and allows the generation of cell-type-specific PcG landscapes from heterogeneous tissues. Furthermore, we use scCUT&Tag to profile H3K27me3 in a brain tumor patient before and after treatment, identifying cell types in the tumor microenvironment and heterogeneity in PcG activity in the primary sample and after treatment.


2020 ◽  
Vol 6 (51) ◽  
pp. eaba9031
Author(s):  
Laiyi Fu ◽  
Lihua Zhang ◽  
Emmanuel Dollinger ◽  
Qinke Peng ◽  
Qing Nie ◽  
...  

Characterizing genome-wide binding profiles of transcription factors (TFs) is essential for understanding biological processes. Although techniques have been developed to assess binding profiles within a population of cells, determining them at a single-cell level remains elusive. Here, we report scFAN (single-cell factor analysis network), a deep learning model that predicts genome-wide TF binding profiles in individual cells. scFAN is pretrained on genome-wide bulk assay for transposase-accessible chromatin sequencing (ATAC-seq), DNA sequence, and chromatin immunoprecipitation sequencing (ChIP-seq) data and uses single-cell ATAC-seq to predict TF binding in individual cells. We demonstrate the efficacy of scFAN by both studying sequence motifs enriched within predicted binding peaks and using predicted TFs for discovering cell types. We develop a new metric “TF activity score” to characterize each cell and show that activity scores can reliably capture cell identities. scFAN allows us to discover and study cellular identities and heterogeneity based on chromatin accessibility profiles.


2020 ◽  
Author(s):  
Laiyi Fu ◽  
Lihua Zhang ◽  
Emmanuel Dollinger ◽  
Qinke Peng ◽  
Qing Nie ◽  
...  

Characterizing genome-wide binding profiles of transcription factors (TFs) is essential for understanding many biological processes. Although techniques have been developed to assess binding profiles within a population of cells, determining binding profiles at a single cell level remains elusive. Here we report scFAN (Single Cell Factor Analysis Network), a deep learning model that predicts genome-wide TF binding profiles in individual cells. scFAN is pre-trained on genome-wide bulk ATAC-seq, DNA sequence and ChIP-seq data, and utilizes single-cell ATAC-seq to predict TF binding in individual cells. We demonstrate the efficacy of scFAN by studying sequence motifs enriched within predicted binding peaks and investigating the effectiveness of predicted TF peaks for discovering cell types. We develop a new metric “TF activity score” to characterize each cell, and show that the activity scores can reliably capture cell identities. The method allows us to discover and study cellular identities and heterogeneity based on chromatin accessibility profiles.


2021 ◽  
Author(s):  
Ziqi Zhang ◽  
Chengkai Yang ◽  
Xiuwei Zhang

Single cell multi-omics studies allow researchers to understand cell differentiation and development mechanisms in a more comprehensive manner. Single cell ATAC-sequencing (scATAC-seq) measures the chromatin accessibility of cells, and computational methods have been proposed to integrate scATAC-seq with scRNA-seq data of cells from the same cell types. This computational task is particularly challenging when the two modalities are not profiled from the same cells. Some existing methods first transform the scATAC-seq data into scRNA-seq data and integrate two scRNA-seq datasets, but how to perform the transformation is still a difficult problem. In addition, most of the existing methods to integrate scRNA-seq and scATAC-seq data focus on preserving distinct cell clusters before and after the integration, and it is not clear whether these methods can preserve the continuous trajectories for cells from continuous development or differentiation processes. We propose scDART, a scalable deep learning framework that embeds the two data modalities of single cells, scRNA-seq and scATAC-seq data, into a shared low-dimensional latent space while preserving cell trajectory structures. Furthermore, scDART learns a nonlinear function represented by a neural network encoding the cross-modality relationship simultaneously when learning the latent space representations of the integrated dataset. We test scDART on both real and simulated datasets, and compare it with the state-of-the-art methods. We show that scDART is able to integrate scRNA-seq and scATAC-seq data well while preserving the continuous cell trajectories. scDART also predicts scRNA-seq data accurately from the scATAC-seq data using the neural network module that represents cross-modality relationships.
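The transformation of scATAC-seq data into scRNA-seq-like data mentioned in this abstract is often done with a fixed "gene activity" matrix: each gene's pseudo-expression in a cell is the sum of accessibility counts over the regions linked to that gene. The sketch below illustrates that generic idea only. It is not scDART's method, which according to the abstract learns the cross-modality relationship with a neural network. The function name, the binary region-to-gene links, and the toy counts are hypothetical.

```python
def gene_activity(accessibility, region_to_gene):
    """Multiply accessibility (cells x regions) by region_to_gene (regions x genes).

    region_to_gene[r][g] is 1 when region r is assumed to regulate gene g, else 0.
    """
    n_genes = len(region_to_gene[0])
    result = []
    for cell in accessibility:
        row = [0] * n_genes
        for r, count in enumerate(cell):
            if count:  # scATAC-seq counts are sparse, so skip zero entries
                for g, linked in enumerate(region_to_gene[r]):
                    row[g] += count * linked
        result.append(row)
    return result


# Toy data (invented): 2 cells x 3 regions; regions 0 and 1 link to gene 0, region 2 to gene 1.
accessibility = [[1, 0, 2], [0, 1, 0]]
region_to_gene = [[1, 0], [1, 0], [0, 1]]
pseudo_expression = gene_activity(accessibility, region_to_gene)
```

A fixed linear mapping like this depends entirely on how the region-to-gene links are chosen, which is one reason the abstract describes the transformation as a difficult problem.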

