SACSANN: identifying sequence-based determinants of chromosomal compartments

Genomic organization is critical for proper gene regulation and is based on a hierarchical model in which chromosomes are segmented into megabase-sized, cell-type-specific, transcriptionally active (A) and inactive (B) compartments. Here, we describe SACSANN, a machine learning pipeline consisting of stacked artificial neural networks that predicts compartment annotation solely from genomic sequence-based features such as predicted transcription factor binding sites and transposable elements. SACSANN provides accurate, cell-type-specific compartment predictions while identifying key genomic sequence determinants that associate with A/B compartments. Models are shown to be largely transferable across analogous human and mouse cell types. By enabling the study of chromosome compartmentalization in species for which no Hi-C data is available, SACSANN paves the way toward the study of 3D genome evolution. SACSANN is publicly available on GitHub: https://github.com/BlanchetteLab/SACSANN

Predicting gene regulatory networks from cell atlases

DOI: 10.1101/2020.08.21.261735

Year: 2020

Author(s): Andreas Fønss Møller, Kedar Nath Natarajan

Keyword(s): Single Cell, Gene Regulatory Networks, Regulatory Networks, Cell Types, Mouse Cell, Cell Type, Integrated Network, Mouse Tissues, Cell Type Specific, Gene Regulatory

Abstract: Recent single-cell RNA-sequencing atlases have surveyed and identified major cell-types across different mouse tissues. Here, we computationally reconstruct gene regulatory networks from 3 major mouse cell atlases to capture functional regulators critical for cell identity, while accounting for a variety of technical differences including sampled tissues, sequencing depth and author assigned cell-type labels. Extracting the regulatory crosstalk from mouse atlases, we identify and distinguish global regulons active in multiple cell-types from specialised cell-type specific regulons. We demonstrate that regulon activities accurately distinguish individual cell types, despite differences between individual atlases. We generate an integrated network that further uncovers regulon modules with coordinated activities critical for cell-types, and validate modules using available experimental data. Inferring regulatory networks during myeloid differentiation from wildtype and Irf8 KO cells, we uncover functional contribution of Irf8 regulon activity and composition towards monocyte lineage. Our analysis provides an avenue to further extract and integrate the regulatory crosstalk from single-cell expression data.

Summary: Integrated single-cell gene regulatory network from three mouse cell atlases captures global and cell-type specific regulatory modules and crosstalk, important for cellular identity.

Probing transcription factor combinatorics in different promoter classes and in enhancers

DOI: 10.1101/197418

Year: 2017

Author(s): Jimmy Vandel, Océane Cassan, Sophie Lèbre, Charles-Henri Lecellier, Laurent Bréhélin

Keyword(s): Binding Sites, Target Genes, Cell Types, Model Parameters, Binding Motif, Cell Type, Regulatory Regions, New Approach, Cell Type Specific, Common Target

In eukaryotic cells, transcription factors (TFs) are thought to act in a combinatorial way, by competing and collaborating to regulate common target genes. However, several questions remain regarding the conservation of these combinations among different gene classes, regulatory regions and cell types. We propose a new approach named TFcoop to infer the TF combinations involved in the binding of a target TF in a particular cell type. TFcoop aims to predict the binding sites of the target TF from the binding affinity of all identified cooperating TFs. The set of cooperating TFs and model parameters are learned from ChIP-seq data of the target TF. We used TFcoop to investigate the TF combinations involved in the binding of 106 TFs on 41 cell types and in four regulatory regions: promoters of mRNAs, lncRNAs and pri-miRNAs, and enhancers. We first show that TFcoop is accurate and outperforms simple PWM methods for predicting TF binding sites. Next, analysis of the learned models sheds light on important properties of TF combinations in different promoter classes and in enhancers. First, we show that combinations governing TF binding on enhancers are more cell-type specific than those governing binding in promoters. Second, for a given TF and cell type, we observe that TF combinations are different between promoters and enhancers, but similar for promoters of mRNAs, lncRNAs and pri-miRNAs. Analysis of the TFs cooperating with the different targets shows over-representation of pioneer TFs and a clear preference for TFs with binding motif composition similar to that of the target. Lastly, our models accurately distinguish promoters associated with specific biological processes.
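
For orientation, the "simple PWM methods" that serve as the baseline in this abstract score each sequence window against a position weight matrix (PWM). The following is a minimal sketch of that general idea, not TFcoop's own code: the three-position matrix, its log-odds values, and the function names `score_window` and `best_hit` are all hypothetical and chosen only for illustration.

```python
# Toy log-odds PWM: one dict per motif position.
# The values are made up for illustration; a real PWM is estimated from data.
PWM = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]


def score_window(window, pwm=PWM):
    """Sum the per-position log-odds scores for one window."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(sequence, pwm=PWM):
    """Return (score, offset) of the best-scoring window.

    Assumes the sequence is at least as long as the motif and
    contains only A, C, G and T.
    """
    width = len(pwm)
    scored = [
        (score_window(sequence[i:i + width], pwm), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(scored)
```

A sequence is typically summarized by its best window score, so calling `best_hit("TTACGTT")` locates the `ACG` match at offset 2. According to the abstract, TFcoop goes beyond this by combining the binding affinities of many cooperating TFs rather than relying on the target TF's motif alone.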

Dirichlet variational autoencoders for de novo motif discovery from accessible chromatin

DOI: 10.1101/2021.09.23.461564

Year: 2021

Author(s): Meghana Kshirsagar, Han Yuan, Juan Lavista Ferres, Christina Leslie

Keyword(s): Binding Sites, Motif Discovery, De Novo, Cell Types, Open Chromatin, Cell Type, Unsupervised Deep Learning, Cell Type Specific, Accessible Chromatin

Abstract: Determining the cell type-specific and genome-wide binding locations of transcription factors (TFs) is an important step towards decoding gene regulatory programs. Profiling by the assay for transposase-accessible chromatin using sequencing (ATAC-seq) reveals open chromatin sites that are potential binding sites for TFs but does not identify which TFs occupy a given site. We present a novel unsupervised deep learning approach called BindVAE, based on Dirichlet variational autoencoders, for jointly decoding multiple TF binding signals from open chromatin regions. Our approach automatically learns distinct groups of k-mer patterns that correspond to cell type-specific in vivo binding signals. Latent factors found by BindVAE generally map to TFs that are expressed in the input cell type. BindVAE finds different TF binding sites in different cell types and can learn composite patterns for TFs involved in co-operative binding. BindVAE therefore provides a novel unsupervised approach to deconvolve the complex TF binding signals in chromatin accessible sites.
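
To make the "groups of k-mer patterns" concrete, a sequence can be reduced to counts of its overlapping k-mers before any model sees it. The sketch below illustrates only that generic step. The abstract does not specify BindVAE's actual input encoding, so the function name `kmer_counts`, the default `k=8`, and the handling of ambiguous `N` bases are assumptions.

```python
from collections import Counter


def kmer_counts(sequence, k=8):
    """Count overlapping k-mers in one sequence.

    Windows containing the ambiguous base N are skipped (an assumption
    made for this sketch, not a documented BindVAE behaviour).
    """
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if "N" not in kmer:
            counts[kmer] += 1
    return counts
```

Each open chromatin region then becomes a sparse count vector over k-mers, which is the kind of bag-of-words input that topic-model-like methods can decompose into latent factors.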

Predicting unrecognized enhancer-mediated genome topology by an ensemble machine learning model

DOI: 10.1101/2020.04.10.036145

Year: 2020

Author(s): Li Tang, Matthew C. Hill, Jun Wang, Jianxin Wang, James F. Martin, ...

Keyword(s): Machine Learning, Long Range, Cell Types, Learning Model, Cell Type, Structural Genomic, Risk Variants, Ensemble Machine Learning, Machine Learning Model, Cell Type Specific

Abstract: Transcriptional enhancers commonly work over long genomic distances to precisely regulate spatiotemporal gene expression patterns. Dissecting the promoters physically contacted by these distal regulatory elements is essential for understanding developmental processes as well as the role of disease-associated risk variants. Modern proximity-ligation assays, like HiChIP and ChIA-PET, facilitate the accurate identification of long-range contacts between enhancers and promoters. However, these assays are technically challenging, expensive, and time-consuming, making it difficult to investigate enhancer topologies, especially in uncharacterized cell types. To overcome these shortcomings, we designed LoopPredictor, an ensemble machine learning model, to predict genome topology for cell types that lack long-range contact maps. To enrich for functional enhancer-promoter loops over common structural genomic contacts, we trained LoopPredictor with both H3K27ac and YY1 HiChIP data. In addition, the integration of several related multi-omics features facilitated identifying and annotating the predicted loops. LoopPredictor is able to efficiently identify cell type-specific enhancer-mediated loops and promoter-promoter interactions with a modest feature input requirement. Comparable to experimentally generated H3K27ac HiChIP data, we found that LoopPredictor was able to identify functional enhancer loops. Furthermore, to explore the cross-species prediction capability of LoopPredictor, we fed mouse multi-omics features into a model trained on human data and found that the predicted enhancer loop outputs were highly conserved. LoopPredictor enables the dissection of cell type-specific long-range gene regulation, and can accelerate the identification of distal disease-associated risk variants.

Annotations capturing cell type-specific TF binding explain a large fraction of disease heritability

Journal: Human Molecular Genetics, Vol 29 (7), pp. 1057-1067

DOI: 10.1093/hmg/ddz226

Year: 2019

Cited By ~ 3

Author(s): Bryce van de Geijn, Hilary Finucane, Steven Gazal, Farhad Hormozdiari, Tiffany Amariuta, ...

Keyword(s): Binding Sites, Complex Traits, Complex Disease, Specific Binding, Large Fraction, Cell Types, Genome Wide Association, Cell Type, Genome Wide, Cell Type Specific

Abstract: Regulatory variation plays a major role in complex disease, and cell type-specific binding of transcription factors (TFs) is critical to gene regulation. However, assessing the contribution of genetic variation in TF-binding sites to disease heritability is challenging, as binding is often cell type-specific and annotations from directly measured TF binding are not currently available for most cell type-TF pairs. We investigate approaches to annotate TF binding, including directly measured chromatin data and sequence-based predictions. We find that TF-binding annotations constructed by intersecting sequence-based TF-binding predictions with cell type-specific chromatin data explain a large fraction of heritability across a broad set of diseases and corresponding cell types; this strategy of constructing annotations addresses both the limitation that identical sequences may be bound or unbound depending on surrounding chromatin context and the limitation that sequence-based predictions are generally not cell type-specific. We partitioned the heritability of 49 diseases and complex traits using stratified linkage disequilibrium (LD) score regression with the baseline-LD model (which is not cell type-specific) plus the new annotations. We determined that 100 bp windows around MotifMap sequence-based TF-binding predictions intersected with a union of six cell type-specific chromatin marks (imputed using ChromImpute) performed best, with a 58% increase in heritability enrichment compared to the chromatin marks alone (11.6× vs. 7.3×, P = 9 × 10^−14 for difference) and a 20% increase in cell type-specific signal conditional on annotations from the baseline-LD model (P = 8 × 10^−11 for difference). Our results show that TF-binding annotations explain substantial disease heritability and can help refine genome-wide association signals.

Single-cell multi-omic profiling of chromatin conformation and DNA methylome

DOI: 10.1101/503235

Year: 2018

Cited By ~ 1

Author(s): Dong-Sung Lee, Chongyuan Luo, Jingtian Zhou, Sahaana Chandran, Angeline Rivkin, ...

Keyword(s): DNA Methylation, Single Cell, Cell Types, Mouse Cell, Regulatory Sequences, Chromatin Conformation, Cell Type, Dynamic Regulation, Single Nucleus, Cell Type Specific

Abstract: Recent advances in the development of single-cell epigenomic assays have facilitated the analysis of gene regulatory landscapes in complex biological systems. Methods for detection of single-cell epigenomic variation such as DNA methylation sequencing and ATAC-seq hold tremendous promise for delineating distinct cell types and identifying their critical cis-regulatory sequences. Emerging evidence has shown that in addition to cis-regulatory sequences, dynamic regulation of 3D chromatin conformation is a critical mechanism for the modulation of gene expression during development and disease. It remains unclear whether single-cell Chromatin Conformation Capture (3C) or Hi-C profiles are suitable for cell type identification and allow the reconstruction of cell-type specific chromatin conformation maps. To address these challenges, we have developed a multi-omic method, single-nucleus methyl-3C sequencing (sn-m3C-seq), to profile chromatin conformation and DNA methylation from the same cell. We have shown that bulk m3C-seq and sn-m3C-seq accurately capture chromatin organization information and robustly separate mouse cell types. We have developed a fluorescence-activated nuclei sorting strategy based on DNA content that eliminates nuclei multiplets caused by crosslinking. The sn-m3C-seq method allows high-resolution cell-type classification using two orthogonal types of epigenomic information and the reconstruction of cell-type specific chromatin conformation maps.

NetTIME: improving multitask transcription factor binding site prediction with base-pair resolution

DOI: 10.1101/2021.05.29.446316

Year: 2021

Author(s): Ren Yi, Kyunghyun Cho, Richard Bonneau

Keyword(s): Transcription Factor, Base Pair, Binding Sites, Learning Strategy, Cell Types, Multitask Learning, Cell Type, Binding Prediction, Single Task, Cell Type Specific

Machine learning models for predicting cell-type-specific transcription factor (TF) binding sites have become increasingly accurate thanks to the increased availability of next-generation sequencing data and more standardized model evaluation criteria. However, knowledge transfer from data-rich to data-limited TFs and cell types remains crucial for improving TF binding prediction models because available binding labels are highly skewed towards a small collection of TFs and cell types. Transfer prediction of TF binding sites can potentially benefit from a multitask learning approach; however, existing methods typically use shallow single-task models to generate low-resolution predictions. Here we propose NetTIME, a multitask learning framework for predicting cell-type-specific TF binding sites with base-pair resolution. We show that the multitask learning strategy for TF binding prediction is more efficient than the single-task approach due to the increased data availability. NetTIME trains high-dimensional embedding vectors to distinguish TF and cell-type identities. We show that this approach is critical for the success of the multitask learning strategy and allows our model to make accurate transfer predictions within and beyond the training panels of TFs and cell types. We additionally train a linear-chain conditional random field (CRF) to classify binding predictions and show that this CRF eliminates the need for setting a probability threshold and reduces classification noise. We compare our method's predictive performance with several state-of-the-art methods, including DeepBind, BindSpace, and Catchitt, and show that our method outperforms previous methods under both supervised and transfer learning settings.

Profiling of accessible chromatin regions across multiple plant species and cell types reveals common gene regulatory principles and new control modules

DOI: 10.1101/167932

Year: 2017

Cited By ~ 1

Author(s): Kelsey A. Maher, Marko Bajic, Kaisa Kajala, Mauricio Reynoso, Germain Pauluzzi, ...

Keyword(s): Hair Cell, Plant Species, Stress Responses, Binding Sites, Cell Types, Regulatory Elements, Open Chromatin, Cell Type, Cell Type Specific, Accessible Chromatin

Abstract: The transcriptional regulatory structure of plant genomes remains poorly defined relative to animals. It is unclear how many cis-regulatory elements exist, where these elements lie relative to promoters, and how these features are conserved across plant species. We employed the Assay for Transposase-Accessible Chromatin (ATAC-seq) in four plant species (Arabidopsis thaliana, Medicago truncatula, Solanum lycopersicum, and Oryza sativa) to delineate open chromatin regions and transcription factor (TF) binding sites across each genome. Despite 10-fold variation in intergenic space among species, the majority of open chromatin regions lie within 3 kb upstream of a transcription start site in all species. We find a common set of four TFs that appear to regulate conserved gene sets in the root tips of all four species, suggesting that TF-gene networks are generally conserved. Comparative ATAC-seq profiling of Arabidopsis root hair and non-hair cell types revealed extensive similarity as well as many cell type-specific differences. Analyzing TF binding sites in differentially accessible regions identified a MYB-driven regulatory module unique to the hair cell, which appears to control both cell fate regulators and abiotic stress responses. Our analyses revealed common regulatory principles among species and shed light on the mechanisms producing cell type-specific transcriptomes during development.

Connectivity characterization of the mouse basolateral amygdalar complex

Journal: Nature Communications, Vol 12 (1)

DOI: 10.1038/s41467-021-22915-5

Year: 2021

Author(s): Houri Hintiryan, Ian Bowman, David L. Johnson, Laura Korobkova, Muye Zhu, ...

Keyword(s): Granular Cell, Cell Types, Projection Neurons, Cell Type, Connectivity Map, Analysis Techniques, Domain Specific, Cell Type Specific, Unique Domain

Abstract: The basolateral amygdalar complex (BLA) is implicated in behaviors ranging from fear acquisition to addiction. Optogenetic methods have enabled the association of circuit-specific functions with uniquely connected BLA cell types. Thus, a systematic and detailed connectivity profile of BLA projection neurons to inform granular, cell type-specific interrogations is warranted. Here, we apply machine learning-based computational and informatics analysis techniques to the results of circuit-tracing experiments to create a foundational, comprehensive BLA connectivity map. The analyses identify three distinct domains within the anterior BLA (BLAa) that house target-specific projection neurons with distinguishable morphological features. We identify brain-wide targets of projection neurons in the three BLAa domains, as well as in the posterior BLA, ventral BLA, posterior basomedial, and lateral amygdalar nuclei. Inputs to each nucleus are also identified via retrograde tracing. The data suggest that connectionally unique, domain-specific BLAa neurons are associated with distinct behavior networks.

Shisa6 mediates cell-type specific regulation of depression in the nucleus accumbens

Journal: Molecular Psychiatry

DOI: 10.1038/s41380-021-01217-8

Year: 2021

Author(s): Hee-Dae Kim, Jing Wei, Tanessa Call, Nicole Teru Quintus, Alexander J. Summers, ...

Keyword(s): Nucleus Accumbens, Molecular Mechanisms, Cell Types, Current Treatment, Specific Cell, Excitatory Synapses, Cell Type, Cell Type Specific, Specific Regulation, Circuit Function

Abstract: Depression is the leading cause of disability and produces enormous health and economic burdens. Current treatment approaches for depression are largely ineffective and leave more than 50% of patients symptomatic, mainly because of non-selective and broad action of antidepressants. Thus, there is an urgent need to design and develop novel therapeutics to treat depression. Given the heterogeneity and complexity of the brain, identification of molecular mechanisms within specific cell-types responsible for producing depression-like behaviors will advance development of therapies. In the reward circuitry, the nucleus accumbens (NAc) is a key brain region of depression pathophysiology, possibly based on differential activity of D1- or D2-medium spiny neurons (MSNs). Here we report a circuit- and cell-type specific molecular target for depression, Shisa6, recently defined as an AMPAR component, which is increased only in D1-MSNs in the NAc of susceptible mice. Using the Ribotag approach, we dissected the transcriptional profile of D1- and D2-MSNs by RNA sequencing following a mouse model of depression, chronic social defeat stress (CSDS). Bioinformatic analyses identified cell-type specific genes that may contribute to the pathogenesis of depression, including Shisa6. We found selective optogenetic activation of the ventral tegmental area (VTA) to NAc circuit increases Shisa6 expression in D1-MSNs. Shisa6 is specifically located in excitatory synapses of D1-MSNs and increases excitability of neurons, which promotes anxiety- and depression-like behaviors in mice. Cell-type and circuit-specific action of Shisa6, which directly modulates excitatory synapses that convey aversive information, identifies the protein as a potential rapid-antidepressant target for aberrant circuit function in depression.
