RoboCOP: Jointly computing chromatin occupancy profiles for numerous factors from chromatin accessibility data

2020 ◽  
Author(s):  
Sneha Mitra ◽  
Jianling Zhong ◽  
David M. MacAlpine ◽  
Alexander J. Hartemink

Abstract: Chromatin is the tightly packaged structure of DNA and protein within the nucleus of a cell. The arrangement of different protein complexes along the DNA modulates and is modulated by gene expression. Measuring the binding locations and level of occupancy of different transcription factors (TFs) and nucleosomes is therefore crucial to understanding gene regulation. Antibody-based methods for assaying chromatin occupancy are capable of identifying the binding sites of specific DNA binding factors, but only one factor at a time. On the other hand, epigenomic accessibility data like ATAC-seq, DNase-seq, and MNase-seq provide insight into the chromatin landscape of all factors bound along the genome, but with minimal insight into the identities of those factors. Here, we present RoboCOP, a multivariate state space model that integrates chromatin information from epigenomic accessibility data with nucleotide sequence to compute genome-wide probabilistic scores of nucleosome and TF occupancy, for hundreds of different factors at once. We apply RoboCOP to MNase-seq data to elucidate the protein-binding landscape of nucleosomes and 150 TFs across the yeast genome. Using available protein-binding datasets from the literature, we show that our model predicts the binding of these factors genome-wide more accurately than existing methods.

2021 ◽  
pp. gr.276080.121
Author(s):  
Christopher T Coey ◽  
David J. Clark

Sequence-specific DNA-binding transcription factors are central to gene regulation. They are often associated with consensus binding sites that predict far more genomic sites than are bound in vivo. One explanation is that most sites are blocked by nucleosomes, such that only sites in nucleosome-depleted regulatory regions are bound. We compared the binding of the yeast transcription factor Gcn4 in vivo using published ChIP-seq data (546 sites) and in vitro, using a modified SELEX method ("G-SELEX"), which utilizes short genomic DNA fragments to quantify binding at all sites. We confirm that Gcn4 binds strongly to an AP-1-like sequence (TGACTCA) and weakly to half-sites. However, Gcn4 binds only some of the 1078 exact matches to this sequence, even in vitro. We show that there are only 166 copies of the high-affinity RTGACTCAY site (exact match) in the yeast genome, all occupied in vivo, largely independently of whether they are located in nucleosome-depleted or nucleosomal regions. Generally, RTGACTCAR/YTGACTCAY sites are bound much more weakly and YTGACTCAR sites are unbound, with biological implications for determining induction levels. We conclude that, to a first approximation, Gcn4 binding can be predicted using the high-affinity site, without reference to chromatin structure. We propose that transcription factor binding sites should be defined more precisely using quantitative data, allowing more accurate genome-wide prediction of binding sites and greater insight into gene regulation.
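The site classes named in this abstract (RTGACTCAY, RTGACTCAR/YTGACTCAY, YTGACTCAR) use IUPAC degenerate codes, where R is a purine (A or G) and Y is a pyrimidine (C or T). As a minimal sketch of how exact matches to such a motif could be located on both strands, the following Python snippet converts a motif to a regular expression and scans a sequence. The function names and the toy sequence are hypothetical and chosen for illustration only; this is not the authors' pipeline, and it does not model binding affinity.

```python
import re

# IUPAC codes needed for the motifs in the abstract:
# R = purine (A/G), Y = pyrimidine (C/T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}

# Complement table; the complement of a purine code is a pyrimidine code.
COMPLEMENT = str.maketrans("ACGTRY", "TGCAYR")


def reverse_complement(motif):
    """Reverse-complement a motif written with A, C, G, T, R, Y."""
    return motif.translate(COMPLEMENT)[::-1]


def motif_to_regex(motif):
    """Translate an IUPAC motif into a regular-expression string."""
    return "".join(IUPAC[base] for base in motif)


def find_sites(sequence, motif):
    """Return sorted (start, strand) pairs for exact matches on either strand.

    Minus-strand hits are reported by their start coordinate on the plus
    strand. A motif equal to its own reverse complement would be reported
    once per strand.
    """
    sequence = sequence.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        # A lookahead lets overlapping matches be reported as well.
        regex = re.compile("(?=" + motif_to_regex(pattern) + ")")
        hits.extend((m.start(), strand) for m in regex.finditer(sequence))
    return sorted(hits)


# Hypothetical toy sequence: one plus-strand and one minus-strand site.
toy = "GGATGACTCATCCGTGAGTCACTT"
print(find_sites(toy, "RTGACTCAY"))
```

Scanning a real genome this way would only enumerate candidate sites; whether a given site is bound is the empirical question the abstract addresses.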


2020 ◽  
Vol 64 (4) ◽  
pp. R45-R56 ◽  
Author(s):  
Andrea Hanel ◽  
Henna-Riikka Malmberg ◽  
Carsten Carlberg

Molecular endocrinology of vitamin D is based on the activation of the transcription factor vitamin D receptor (VDR) by the vitamin D metabolite 1α,25-dihydroxyvitamin D3. This nuclear vitamin D-sensing process causes epigenome-wide effects, such as changes in chromatin accessibility as well as in the contact of VDR and its supporting pioneer factors with thousands of genomic binding sites, referred to as vitamin D response elements. VDR-binding enhancer regions loop to transcription start sites of hundreds of vitamin D target genes, resulting in changes in their expression. Thus, vitamin D signaling is based on epigenome- and transcriptome-wide shifts in VDR-expressing tissues. Monocytes are the most responsive cell type of the immune system and serve as a paradigm for uncovering the chromatin model of vitamin D signaling. In this review, an alternative approach is presented for selecting the vitamin D target genes that are most relevant for understanding the impact of vitamin D endocrinology on innate immunity. Different scenarios of the regulation of primary upregulated vitamin D target genes are presented, in which vitamin D-driven super-enhancers comprise a cluster of persistent (constant) and/or inducible (transient) VDR-binding sites. In conclusion, the spatio-temporal VDR binding in the context of chromatin is most critical for the regulation of vitamin D target genes.


2019 ◽  
Author(s):  
Igor Mačinković ◽  
Ina Theofel ◽  
Tim Hundertmark ◽  
Kristina Kovač ◽  
Stephan Awe ◽  
...  

Abstract: CoREST has been identified as a subunit of several protein complexes that generate transcriptionally repressive chromatin structures during development. However, a comprehensive analysis of the CoREST interactome has not been carried out. We use proteomic approaches to define the interactomes of two dCoREST isoforms, dCoREST-L and dCoREST-M, in Drosophila. We identify three distinct histone deacetylase complexes built around a common dCoREST/dRPD3 core: a dLSD1/dCoREST complex, the LINT complex, and a dG9a/dCoREST complex. The latter two complexes can incorporate both dCoREST isoforms. By contrast, the dLSD1/dCoREST complex exclusively assembles with the dCoREST-L isoform. Genome-wide studies show that the three dCoREST complexes associate with chromatin predominantly at promoters. Transcriptome analyses in S2 cells and testes reveal that different cell lineages utilize distinct dCoREST complexes to maintain cell-type-specific gene expression programmes: in macrophage-like S2 cells, LINT represses germ line-related genes whereas other dCoREST complexes are largely dispensable. By contrast, in testes, the dLSD1/dCoREST complex prevents transcription of germ line-inappropriate genes and is essential for spermatogenesis and fertility, whereas depletion of other dCoREST complexes has no effect. Our study uncovers three distinct dCoREST complexes that function in a lineage-restricted fashion to repress specific sets of genes, thereby maintaining cell-type-specific gene expression programmes.


2019 ◽  
Author(s):  
Jacob Schreiber ◽  
Jeffrey Bilmes ◽  
William Stafford Noble

Abstract

Motivation: Recent efforts to describe the human epigenome have yielded thousands of uniformly processed epigenomic and transcriptomic data sets. These data sets characterize a rich variety of biological activity in hundreds of human cell lines and tissues (“biosamples”). Understanding these data sets, and specifically how they differ across biosamples, can help explain many cellular mechanisms, particularly those driving development and disease. However, due primarily to cost, the total number of assays that can be performed is limited. Previously described imputation approaches, such as Avocado, have sought to overcome this limitation by predicting genome-wide epigenomics experiments using learned associations among available epigenomic data sets. However, these previous imputations have focused primarily on measurements of histone modification and chromatin accessibility, despite other biological activity being crucially important.

Results: We applied Avocado to a data set of 3,814 tracks of data derived from the ENCODE compendium, spanning 400 human biosamples and 84 assays. The resulting imputations cover measurements of chromatin accessibility, histone modification, transcription, and protein binding. We demonstrate the quality of these imputations by comprehensively evaluating the model’s predictions and by showing significant improvements in protein binding performance compared to the top models in an ENCODE-DREAM challenge. Additionally, we show that the Avocado model allows for efficient addition of new assays and biosamples to a pre-trained model, achieving high accuracy at predicting protein binding, even with only a single track of training data.

Availability: Tutorials and source code are available under an Apache 2.0 license at https://github.com/jmschrei/[email protected] or [email protected]


Author(s):  
Merve Dede ◽  
Megan McLaughlin ◽  
Eiru Kim ◽  
Traver Hart

Abstract: Major efforts on pooled library CRISPR knockout screening across hundreds of cell lines have identified genes whose disruption leads to fitness defects, a critical step in identifying candidate cancer targets. However, the number of essential genes detected from these monogenic knockout screens is very low compared to the number of constitutively expressed genes in a cell, raising the question of why there are so few essential genes. Through a systematic analysis of screen data in cancer cell lines generated by the Cancer Dependency Map, we observed that half of all constitutively expressed genes are never hits in any CRISPR screen, and that these never-essentials are highly enriched for paralogs. We investigated paralog buffering through systematic dual-gene CRISPR knockout screening by testing ~400 algorithmically defined candidate paralog pairs with the enCas12a multiplex knockout system in three cell lines. We observed 24 synthetic lethal paralog pairs that have escaped detection by monogenic knockout screens at stringent thresholds. Nineteen of 24 (79%) synthetic lethal interactions were present in at least two out of three cell lines and 14 of 24 (58%) were present in all three cell lines tested, including alternate subunits of stable protein complexes as well as functionally redundant enzymes. Together these observations strongly suggest that paralogs represent a targetable set of genetic dependencies that are systematically under-represented among cell-essential genes due to genetic buffering in monogenic CRISPR-based mammalian functional genomics approaches.


2019 ◽  
Vol 17 (5) ◽  
pp. 1081-1089 ◽  
Author(s):  
Rohit Kumar ◽  
Kristoffer Peterson ◽  
Majda Misini Ignjatović ◽  
Hakon Leffler ◽  
Ulf Ryde ◽  
...  

Analysis of a ligand-induced aglycone-binding pocket in galectin-3 provides detailed insight into interactions of fluorinated phenyl moieties with arginine-containing protein binding sites and the complex interplay of different energetic components in defining the binding affinity.

