chromVAR: Inferring transcription factor variation from single-cell epigenomic data

2017
Author(s): Alicia N. Schep, Beijing Wu, Jason D. Buenrostro, William J. Greenleaf

Abstract: Single cell ATAC-seq (scATAC) yields sparse data that makes application of conventional computational approaches for data analysis challenging or impossible. We developed chromVAR, an R package for analyzing sparse chromatin accessibility data by estimating the gain or loss of accessibility within sets of peaks sharing the same motif or annotation while controlling for known technical biases. chromVAR enables accurate clustering of scATAC-seq profiles and enables characterization of known, or the de novo identification of novel, sequence motifs associated with variation in chromatin accessibility across single cells or other sparse epigenomic data sets.
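The idea of estimating "gain or loss of accessibility within sets of peaks" can be illustrated with a small Python sketch. This is an assumption-laden simplification, not the chromVAR API (chromVAR is an R package): the function name `raw_deviations` and the toy count matrix are hypothetical, and the correction for technical biases mentioned in the abstract is deliberately left out. For each cell, the sketch compares the reads observed in a peak set with the reads expected if the cell followed the pooled accessibility profile at its own sequencing depth.

```python
def raw_deviations(counts, peak_set):
    """Relative gain/loss of accessibility in a peak set, per cell.

    counts   -- list of cells, each a list of per-peak read counts (hypothetical toy input)
    peak_set -- indices of peaks sharing a motif or annotation
    """
    n_peaks = len(counts[0])
    total_reads = sum(sum(cell) for cell in counts)
    # Fraction of all reads that fall in each peak, pooled across cells.
    peak_frac = [sum(cell[j] for cell in counts) / total_reads
                 for j in range(n_peaks)]

    deviations = []
    for cell in counts:
        depth = sum(cell)
        observed = sum(cell[j] for j in peak_set)
        # Expected reads in the set if this cell matched the pooled profile.
        expected = depth * sum(peak_frac[j] for j in peak_set)
        # Assumes expected > 0, i.e. the cell and the peak set both have reads.
        deviations.append((observed - expected) / expected)
    return deviations


# Two cells, four peaks; peak 0 is the only member of the set.
toy_counts = [[2, 0, 1, 1],
              [0, 2, 1, 1]]
print(raw_deviations(toy_counts, peak_set=[0]))  # [1.0, -1.0]
```

A positive value means the cell has more reads in the set than its depth predicts, a negative value fewer. A bias-aware version would additionally compare each value against deviations computed on matched background peak sets.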

2020
Author(s): Laiyi Fu, Lihua Zhang, Emmanuel Dollinger, Qinke Peng, Qing Nie, ...

Abstract: Characterizing genome-wide binding profiles of transcription factors (TFs) is essential for understanding many biological processes. Although techniques have been developed to assess binding profiles within a population of cells, determining binding profiles at a single cell level remains elusive. Here we report scFAN (Single Cell Factor Analysis Network), a deep learning model that predicts genome-wide TF binding profiles in individual cells. scFAN is pre-trained on genome-wide bulk ATAC-seq, DNA sequence and ChIP-seq data, and utilizes single-cell ATAC-seq to predict TF binding in individual cells. We demonstrate the efficacy of scFAN by studying sequence motifs enriched within predicted binding peaks and investigating the effectiveness of predicted TF peaks for discovering cell types. We develop a new metric, the “TF activity score”, to characterize each cell, and show that the activity scores can reliably capture cell identities. The method allows us to discover and study cellular identities and heterogeneity based on chromatin accessibility profiles.


2021
Vol 12 (1)
Author(s): Shengquan Chen, Guanao Yan, Wenyu Zhang, Jinzhao Li, Rui Jiang, ...

Abstract: The recent advancements in single-cell technologies, including single-cell chromatin accessibility sequencing (scCAS), have enabled profiling the epigenetic landscapes for thousands of individual cells. However, the characteristics of scCAS data, including high dimensionality, high degree of sparsity and high technical variation, make the computational analysis challenging. Reference-guided approaches, which utilize the information in existing datasets, may facilitate the analysis of scCAS data. Here, we present RA3 (Reference-guided Approach for the Analysis of single-cell chromatin Accessibility data), which utilizes the information in massive existing bulk chromatin accessibility and annotated scCAS data. RA3 simultaneously models (1) the shared biological variation among scCAS data and the reference data, and (2) the unique biological variation in scCAS data that identifies distinct subpopulations. We show that RA3 achieves superior performance when used on several scCAS datasets, and on references constructed using various approaches. Altogether, these analyses demonstrate the wide applicability of RA3 in analyzing scCAS data.


2018
Author(s): Martin Pirkl, Niko Beerenwinkel

Abstract

Motivation: New technologies allow for the elaborate measurement of different traits of single cells. These data promise to elucidate intra-cellular networks in unprecedented detail and further help to improve treatment of diseases like cancer. However, cell populations can be very heterogeneous.

Results: We developed a mixture of Nested Effects Models (M&NEM) for single-cell data to simultaneously identify different cellular sub-populations and their corresponding causal networks to explain the heterogeneity in a cell population. For inference, we assign each cell to a network with a certain probability and iteratively update the optimal networks and cell probabilities in an Expectation Maximization scheme. We validate our method in the controlled setting of a simulation study and apply it to three data sets of pooled CRISPR screens generated previously by two novel experimental techniques, namely Crop-Seq and Perturb-Seq.

Availability: The mixture Nested Effects Model (M&NEM) is available as the R-package mnem at https://github.com/cbgethz/mnem/

Contact: [email protected], [email protected]

Supplementary information: Supplementary data are available online.
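The soft assignment of cells to networks described under Results follows the general pattern of Expectation Maximization for mixture models. The Python sketch below shows only that generic pattern and is not the mnem package's implementation: the functions `e_step` and `m_step_weights` are hypothetical names, and the per-cell log-likelihoods under each component are assumed to be given (in M&NEM they would come from scoring each cell against a fitted network, which is not shown here).

```python
import math


def e_step(log_lik, weights):
    """Responsibilities: probability that each cell belongs to each component.

    log_lik[i][k] -- log-likelihood of cell i under component k (assumed given)
    weights[k]    -- current mixture weight of component k
    """
    responsibilities = []
    for row in log_lik:
        scores = [math.log(w) + ll for w, ll in zip(weights, row)]
        top = max(scores)  # subtract the maximum for numerical stability
        unnormalized = [math.exp(s - top) for s in scores]
        norm = sum(unnormalized)
        responsibilities.append([u / norm for u in unnormalized])
    return responsibilities


def m_step_weights(responsibilities):
    """Updated mixture weights: the average responsibility per component."""
    n_cells = len(responsibilities)
    n_components = len(responsibilities[0])
    return [sum(r[k] for r in responsibilities) / n_cells
            for k in range(n_components)]


# Two cells, two components; the second cell is 3x more likely under component 0.
toy_log_lik = [[0.0, 0.0],
               [math.log(3.0), 0.0]]
resp = e_step(toy_log_lik, weights=[0.5, 0.5])
new_weights = m_step_weights(resp)
print(resp, new_weights)
```

In a full EM loop, the component models themselves would also be re-estimated in the M-step using the responsibilities as cell weights, and both steps would be repeated until the likelihood stops improving.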


2020
Author(s): Shengquan Chen, Guanao Yan, Wenyu Zhang, Jinzhao Li, Rui Jiang, ...

Abstract: The recent advancements in single-cell technologies, including single-cell chromatin accessibility sequencing (scCAS), have enabled profiling the epigenetic landscapes for thousands of individual cells. However, the characteristics of scCAS data, including high dimensionality, high degree of sparsity and high technical variation, make the computational analysis challenging. Reference-guided approaches, which utilize the information in existing datasets, may facilitate the analysis of scCAS data. We present RA3 (Reference-guided Approach for the Analysis of single-cell chromatin Accessibility data), which utilizes the information in massive existing bulk chromatin accessibility and annotated scCAS data. RA3 simultaneously models (1) the shared biological variation among scCAS data and the reference data, and (2) the unique biological variation in scCAS data that identifies distinct subpopulations. We show that RA3 achieves superior performance in many scCAS datasets. We also present several approaches to construct the reference data to demonstrate the wide applicability of RA3.


2018
Author(s): Min Jung, Daniel Wells, Jannette Rusch, Suhaira Ahmed, Jonathan Marchini, ...

Abstract: By removing the confounding factor of cellular heterogeneity, single cell genomics can revolutionize the study of development and disease, but methods are needed to simplify comparison among individuals. To develop such a framework, we assayed the transcriptome in 62,600 single cells from the testes of wildtype mice, and mice with gonadal defects due to disruption of the genes Mlh3, Hormad1, Cul4a or Cnp. The resulting expression atlas of distinct cell clusters revealed novel markers and new insights into testis gene regulation. By jointly analysing mutant and wildtype cells using a model-based factor analysis method, SDA, we decomposed our data into 46 components that identify novel meiotic gene regulatory programmes, mutant-specific pathological processes, and technical effects. Moreover, we identify, de novo, DNA sequence motifs associated with each component, and show that SDA can be used to impute expression values from single cell data. Analysis of SDA components also led us to identify a rare population of macrophages within the seminiferous tubules of Mlh3-/- and Hormad1-/- testes, an area typically associated with immune privilege. We provide a web application to enable interactive exploration of testis gene expression and components at http://www.stats.ox.ac.uk/~wells/testisAtlas.html


2021
Author(s): Wolfgang Kopp, Altuna Akalin, Uwe Ohler

Advances in single-cell technologies enable the routine interrogation of chromatin accessibility for tens of thousands of single cells, shedding light on gene regulatory processes at an unprecedented resolution. Meanwhile, size, sparsity and high dimensionality of the resulting data continue to pose challenges for its computational analysis, and specifically the integration of data from different sources. We have developed a dedicated computational approach, a variational auto-encoder using a noise model specifically designed for single-cell ATAC-seq data, which facilitates simultaneous dimensionality reduction and batch correction via an adversarial learning strategy. We showcase both its individual advantages on carefully chosen real and simulated data sets, as well as the benefits for detailed cell type characterization via integrating multiple complex datasets.


2019
Author(s): Qiangyuan Zhu, Yichi Niu, Michael Gundry, Kuanwei Sheng, Muchun Niu, ...

Abstract: In single-cell genomics, most effort has focused on detecting permanent changes in the genome. Spontaneous DNA damage, however, occurs frequently and results in transient single-stranded changes to the genome until they are repaired. So far, successful profiling of these dynamic changes has not been demonstrated by single-cell whole-genome amplification (WGA) methods. Here we report a novel single-cell WGA method, Linearly Produced Semiamplicon based Split Amplification Reaction (LPSSAR), which allows, for the first time, the genome-wide detection of DNA damage associated single nucleotide variants (dSNVs) in single human cells. The sequence-based detection of dSNVs allows the direct characterization of the major damage signature that occurred in human cells. In analyzing the abundance of dSNVs along the genome, we observed two modules of dSNV abundance rather than a homogeneous distribution. Interestingly, we found that the two modules are associated with the A/B topological compartments of the genome. This result suggests that the genome topology directly influences genome stability. Furthermore, having detected a large number of dSNVs in single cells, we showed that only under a stringent filtering condition can we distinguish de novo mutations from dSNVs and achieve a reliable estimate of the total level of de novo mutations in a single cell.


2018
Author(s): Changlin Wan, Wennan Chang, Yu Zhang, Fenil Shah, Xiaoyu Lu, ...

Abstract: A key challenge in modeling single-cell RNA-seq (scRNA-seq) data is to capture the diverse gene expression states regulated by different transcriptional regulatory inputs across single cells, which is further complicated by a large number of observed zero and low expressions. We developed a left truncated mixture Gaussian (LTMG) model that stems from the kinetic relationships between the transcriptional regulatory inputs and metabolism of mRNA and gene expression abundance in a cell. LTMG infers the expression multi-modalities across single cell entities, representing a gene’s diverse expression states; meanwhile, the dropouts and low expressions are treated as left truncated, specifically representing an expression state that is under suppression. We demonstrated that LTMG has significantly better goodness of fit on an extensive number of single-cell data sets, compared with three other state-of-the-art models. In addition, our systems kinetic approach to handling the low and zero expressions and the correctness of the identified multimodality are validated on several independent experimental data sets. Application to data from complex tissues demonstrated the capability of LTMG in extracting varied expression states specific to cell types or cell functions. Based on LTMG, a differential gene expression test and a co-regulation module identification method, namely LTMG-DGE and LTMG-GCR, are further developed. We experimentally validated that LTMG-DGE has higher sensitivity and specificity in detecting differentially expressed genes than five other popular methods, and that LTMG-GCR is capable of retrieving the gene co-regulation modules corresponding to perturbed transcriptional regulations. A user-friendly R package with all the analysis power is available at https://github.com/zy26/LTMGSCA.
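One way to picture treating dropouts and low expressions as "left truncated" is a Gaussian mixture likelihood in which values below a cutoff are not trusted individually and contribute only the probability mass lying below that cutoff. The Python sketch below is written under that assumption and is not taken from the LTMGSCA package: the function names, the cutoff handling, and the toy parameters are all hypothetical, and the abstract does not specify the exact likelihood used by LTMG.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    """Cumulative probability of a Gaussian up to x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def log_likelihood(values, cutoff, weights, mus, sigmas):
    """Log-likelihood of (log-)expression values under a Gaussian mixture.

    Assumption of this sketch: a value below `cutoff` contributes the mixture's
    total mass below the cutoff; a value at or above it contributes the density.
    """
    total = 0.0
    for x in values:
        if x < cutoff:
            term = sum(w * normal_cdf(cutoff, m, s)
                       for w, m, s in zip(weights, mus, sigmas))
        else:
            term = sum(w * normal_pdf(x, m, s)
                       for w, m, s in zip(weights, mus, sigmas))
        total += math.log(term)
    return total


# One suppressed-looking value and one clearly expressed value, two components.
print(log_likelihood([-3.0, 2.0], cutoff=0.0,
                     weights=[0.4, 0.6], mus=[-1.0, 2.0], sigmas=[1.0, 1.0]))
```

Fitting such a model would wrap this likelihood in an optimizer or an EM loop; a component whose mass lies mostly below the cutoff would then play the role of the suppressed expression state.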


2017
Author(s): Joshua D. Welch, Alexander J. Hartemink, Jan F. Prins

Abstract: Single cell genomic techniques promise to yield key insights into the dynamic interplay between gene expression and epigenetic modification. However, the experimental difficulty of performing multiple measurements on the same cell currently limits efforts to combine multiple genomic data sets into a united picture of single cell variation. We show that it is possible to construct cell trajectories, reflecting the changes that occur in a sequential biological process, from single cell ATAC-seq, bisulfite sequencing, and ChIP-seq data. In addition, we present an approach called MATCHER that computationally circumvents the experimental difficulties inherent in performing multiple genomic measurements on a single cell by inferring correspondence between single cell transcriptomic and epigenetic measurements performed on different cells of the same type. MATCHER works by first learning a separate manifold for the trajectory of each kind of genomic data, then aligning the manifolds to infer a shared trajectory in which cells measured using different techniques are directly comparable. Using scM&T-seq data, we confirm that MATCHER accurately predicts true single cell correlations between DNA methylation and gene expression without using known cell correspondence information. We also used MATCHER to infer correlations among gene expression, chromatin accessibility, and histone modifications in single mouse embryonic stem cells. These results reveal the dynamic interplay between epigenetic changes and gene expression underlying the transition from pluripotency to differentiation priming. Our work is a first step toward a united picture of heterogeneous transcriptomic and epigenetic states in single cells.


2018
Author(s): Anja Mezger, Sandy Klemm, Ishminder Mann, Kara Brower, Alain Mir, ...

We have developed a high-throughput single-cell ATAC-seq (assay for transposition of accessible chromatin) method to measure physical access to DNA in whole cells. Our approach integrates fluorescence imaging and addressable reagent deposition across a massively parallel (5184) nano-well array, yielding a nearly 20-fold improvement in throughput (up to ~1800 cells/chip, 4-5 hours of on-chip processing time) and cost (~98¢ per cell) compared to prior microfluidic implementations. We applied this method to measure regulatory variation in peripheral blood mononuclear cells (PBMCs) and show robust, de novo clustering of single cells by hematopoietic cell type.

