Genome-wide association between transcription factor expression and chromatin accessibility reveals chromatin state regulators

Mapping Intimacies ◽

10.1101/043414 ◽

2016 ◽

Author(s):

David Felix Lamparter ◽

Daniel Marbach ◽

Rico Rueedi ◽

Sven Bergmann ◽

Zoltan Kutalik

Keyword(s):

Transcription Factor ◽

Transcription Factors ◽

Chromatin Accessibility ◽

Binding Motif ◽

Open Chromatin ◽

Data Sets ◽

Transcription Factor Binding Motif ◽

Data Set ◽

Data Driven Approach ◽

Transcription Factor Expression

To better understand genome regulation, it is important to uncover the role of transcription factors in the establishment and maintenance of chromatin structure. Here we present a data-driven approach to systematically characterize transcription factors that are relevant for this process. Our method uses a linear mixed model to combine data sets of transcription factor binding motif enrichments in open chromatin with gene expression measured across the same set of cell lines. Applying this approach to the ENCODE data set, we confirm already known transcription factors and implicate numerous novel ones in the establishment or maintenance of open chromatin.


Detecting differential transcription factor activity from ATAC-seq data

10.1101/315622 ◽

2018 ◽

Cited By ~ 2

Author(s):

Ignacio J. Tripodi ◽

Mary A. Allen ◽

Robin D. Dowell

Keyword(s):

Transcription Factor ◽

Transcription Factors ◽

Chromatin Accessibility ◽

Open Chromatin ◽

Nucleotide Polymorphisms ◽

Transcription Factor Activity ◽

Factor Activity ◽

Genome Wide ◽

Recognition Motifs ◽

Differential Transcription

Transcription factors are managers of the cellular factory, and key components of many diseases. Many non-coding single nucleotide polymorphisms affect transcription factors, either by directly altering the protein or its functional activity at individual binding sites. Here we first briefly summarize high-throughput approaches to studying transcription factor activity. We then demonstrate, using published chromatin accessibility data (specifically ATAC-seq), that the genome-wide profile of TF recognition motifs relative to regions of open chromatin can determine the key transcription factor altered by a perturbation. Our method of determining which TFs are altered by a perturbation is simple, quick to implement, and can be used when biological samples are limited. In the future, we envision that this method could be applied to determine which TFs show altered activity in response to a wide variety of drugs and diseases.
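
As a rough illustration of scoring motif positions relative to open chromatin, the sketch below computes the fraction of motif hits that lie close to a peak center, among all hits found within a wider window. It is not the authors' implementation: the function name, the window sizes (`inner`, `outer`), and the single-chromosome coordinates are all assumptions made for this example.

```python
from bisect import bisect_left


def colocalization_score(peak_centers, motif_positions, inner=150, outer=1500):
    """Fraction of motif hits within `inner` bp of the nearest peak center,
    among hits within `outer` bp. Returns None if no hit is within `outer`.

    All coordinates are assumed to lie on one chromosome (hypothetical input).
    """
    centers = sorted(peak_centers)
    near = total = 0
    for pos in motif_positions:
        i = bisect_left(centers, pos)
        # The nearest center is one of the two neighbours of the insertion point.
        candidates = centers[max(i - 1, 0):i + 1]
        if not candidates:
            continue  # no peaks at all
        dist = min(abs(pos - c) for c in candidates)
        if dist <= outer:
            total += 1
            if dist <= inner:
                near += 1
    return near / total if total else None


# Toy data: two peak centers and five motif hits.
score = colocalization_score([1000, 5000], [1010, 1100, 2000, 4900, 9000])
```

Comparing such a score between a control and a perturbed sample would be one simple way to ask whether a motif has moved toward or away from accessible sites; any statistical test for that difference is left out of this sketch.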


Chromatin Accessibility Profiling Reveals Cis-Regulatory Heterogeneity and Novel Transcription Factor Dependencies in Multiple Myeloma

Blood ◽

10.1182/blood-2018-99-119941 ◽

2018 ◽

Vol 132 (Supplement 1) ◽

pp. 1313-1313

Author(s):

Christopher J. Ott ◽

Raphael Szalat ◽

Matthew Lawlor ◽

Mehmet Kemal Samur ◽

Yan Xu ◽

...

Keyword(s):

Multiple Myeloma ◽

Transcription Factor ◽

Transcription Factors ◽

Board Of Directors ◽

Plasma Cell ◽

Therapeutic Target ◽

Chromatin Accessibility ◽

Open Chromatin ◽

Advisory Committees ◽

Equity Ownership

Multiple myeloma (MM) is a plasma cell malignancy characterized by clinical and genomic heterogeneity. Recurrent IgH translocations, copy number abnormalities and somatic mutations have been reported to participate in myelomagenesis; however, no universal driver of the disease has been identified. Here, we hypothesize that transcriptional deregulation is critical for MM pathogenesis and the maintenance of the MM cell state. In order to capture signatures of transcription factor engagement with the myeloma epigenome, we performed the assay for transposase-accessible chromatin sequencing (ATAC sequencing) and deep RNA sequencing in 23 primary myeloma samples and 5 normal plasma cell samples (NPC) from healthy donors, along with whole genome sequencing and H3K27ac ChIP-seq in a cohort of these primary MM samples. We identified 22,603 variable accessible loci between MM and NPC and correlated their impact on the expression of associated genes using RNA-seq data. Together with robust differential analysis of open chromatin regions and nuclease-accessibility footprints to identify discrete transcription factor binding events, we have discerned the myeloma-specific open chromatin landscape, identified transcription factor dependencies and potential new myeloma drivers. In our dataset we observe a vast number of loci with heterogeneous chromatin states across the sample cohort, and the majority of the open chromatin sites identified are unique to a single sample. However, distinct variable chromatin accessibility signatures indicative of the MM chromatin state when compared to normal plasma cells were observed. Remarkably, we observed more frequent recurrent loss of variable accessible loci compared to gains. In addition, specific open chromatin profiles evident in hyperdiploid and non-hyperdiploid MM were also identified.
Accessibility footprinting revealed MM-specific enrichment for transcription factors known to be essential for MM cell survival, including Interferon Regulatory Factors (IRFs), Nuclear Factor Kappa B (NFkB), Ikaros, and Sp1. Interestingly, we also identify the myocyte enhancer factor 2 (MEF2) family of transcription factors as being specifically enriched in open chromatin regions in MM cells. Using a CRISPR-Cas9 knockout system, we identify the MEF2 family member MEF2C as essential for MM cell proliferation and survival. MEF2C is significantly overexpressed at the RNA level in our study as well as in several independent cohorts and is a central enhancer-localized transcription factor in MM core regulatory circuitry as determined by H3K27ac ChIP-sequencing profiles of primary MM samples. In order to evaluate MEF2C as a therapeutic target, we used small molecule inhibitors targeting MEF2C activity via inhibition of MEF2C phosphorylation using inhibitors of salt-induced kinases (SIK) and microtubule-associated protein/microtubule affinity regulating kinases (MARK). SIK/MARK have been described to specifically activate MEF2C. SIK and MARK inhibition resulted in both dose- and time-dependent inhibition of MM cell growth and survival in a panel of 12 MM cell lines with various genotypic and phenotypic characteristics, revealing a potential approach to targeting the dysregulated gene regulatory state of myeloma. To conclude, we identify here an altered chromatin accessibility landscape in multiple myeloma that likely contributes to oncogenic transcription states through the activity of transcription factors such as MEF2C, representing a new MM dependency and potential therapeutic target.
Disclosures: Anderson: Millennium Takeda: Consultancy; C4 Therapeutics: Equity Ownership, Other: Scientific founder; Bristol Myers Squibb: Consultancy; Gilead: Membership on an entity's Board of Directors or advisory committees; Celgene: Consultancy; OncoPep: Equity Ownership, Other: Scientific founder. Young: Camp4 Therapeutics: Consultancy, Equity Ownership, Membership on an entity's Board of Directors or advisory committees; Syros Pharmaceuticals: Consultancy, Equity Ownership, Membership on an entity's Board of Directors or advisory committees; Omega Therapeutics: Consultancy, Equity Ownership, Membership on an entity's Board of Directors or advisory committees. Munshi: OncoPep: Other: Board of director.


RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections

10.1101/065565 ◽

2016 ◽

Cited By ~ 1

Author(s):

Jaime Abraham Castro-Mondragon ◽

Sébastien Jaeger ◽

Denis Thieffry ◽

Morgane Thomas-Chollier ◽

Jacques van Helden

Keyword(s):

Transcription Factor ◽

Motif Discovery ◽

Binding Motif ◽

Data Sets ◽

Transcription Factor Binding Motif ◽

Biologically Relevant ◽

Manual Curation ◽

Versatile Tool ◽

Multiple Motif ◽

Multiple Trees

Transcription Factor (TF) databases contain multitudes of motifs from various sources, from which non-redundant collections are derived by manual curation. The advent of high-throughput methods stimulated the production of novel collections with increasing numbers of motifs. Meta-databases, built by merging these collections, contain redundant versions, because available tools are not suited to automatically identify and explore biologically relevant clusters among thousands of motifs. Motif discovery from genome-scale data sets (e.g. ChIP-seq peaks) also produces redundant motifs, hampering the interpretation of results. We present matrix-clustering, a versatile tool that clusters similar TFBMs into multiple trees, and automatically creates non-redundant collections of motifs. A feature unique to matrix-clustering is its dynamic visualisation of aligned TFBMs, and its capability to simultaneously treat multiple collections from various sources. We demonstrate that matrix-clustering considerably simplifies the interpretation of combined results from multiple motif discovery tools and highlights biologically relevant variations of similar motifs. By clustering 24 entire databases (>7,500 motifs), we show that matrix-clustering correctly groups motifs belonging to the same TF families, and can drastically reduce motif redundancy. matrix-clustering is integrated within the RSAT suite (http://rsat.eu/), accessible through a user-friendly web interface or command-line for its integration in pipelines.
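
To make the notion of motif redundancy concrete, here is a deliberately simplified sketch. It is not the matrix-clustering algorithm, which aligns motifs and builds trees. Instead it assumes equal-width, pre-aligned frequency matrices, compares them with cosine similarity, and greedily keeps a motif only if it is dissimilar to every motif already kept. The function names, the threshold of 0.9, and the toy matrices are hypothetical.

```python
from math import sqrt


def cosine(matrix_a, matrix_b):
    """Cosine similarity between two equal-shaped motif matrices
    (lists of columns, each column a list of A/C/G/T frequencies)."""
    flat_a = [v for col in matrix_a for v in col]
    flat_b = [v for col in matrix_b for v in col]
    if len(flat_a) != len(flat_b):
        raise ValueError("matrices must have the same shape")
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm_a = sqrt(sum(x * x for x in flat_a))
    norm_b = sqrt(sum(y * y for y in flat_b))
    return dot / (norm_a * norm_b)


def reduce_redundancy(motifs, threshold=0.9):
    """Greedy filter: keep a motif only if its similarity to every
    previously kept motif is below `threshold`. `motifs` is a list of
    (name, matrix) pairs; returns the names of the kept motifs."""
    kept = []
    for name, matrix in motifs:
        if all(cosine(matrix, other) < threshold for _, other in kept):
            kept.append((name, matrix))
    return [name for name, _ in kept]


# Toy collection: m2 is a noisy copy of m1, m3 is unrelated.
toy_motifs = [
    ("m1", [[1, 0, 0, 0], [0, 1, 0, 0]]),
    ("m2", [[0.9, 0.1, 0, 0], [0, 0.9, 0.1, 0]]),
    ("m3", [[0, 0, 1, 0], [0, 0, 0, 1]]),
]
non_redundant = reduce_redundancy(toy_motifs)
```

A greedy filter like this depends on input order and ignores shifted or reverse-complemented matches, which is part of why a tree-based tool with alignment is needed for real collections.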


The developing medulla is subdivided into the concentric zones by the expression of conserved transcription factors labeled in green, magenta and blue. The transcription factor expression and neuronal types are specified according to the neuronal birth or

Development Growth & Differentiation ◽

10.1111/dgd.12085 ◽

2014 ◽

Vol 56 (7) ◽

pp. i-i

Keyword(s):

Transcription Factor ◽

Transcription Factors ◽

Factor Expression ◽

Neuronal Types ◽

Transcription Factor Expression


HOT or not: examining the basis of high-occupancy target regions

10.1101/107680 ◽

2017 ◽

Cited By ~ 3

Author(s):

Katarzyna Wreczycka ◽

Vedran Franke ◽

Bora Uyar ◽

Ricardo Wurmus ◽

Altuna Akalin

Keyword(s):

Transcription Factor ◽

Transcription Factors ◽

Binding Sites ◽

Cell Types ◽

Data Sets ◽

Gene Promoters ◽

Quadruplex Dna ◽

Golden Standard ◽

Multiple Cell ◽

Multiple Species

High-occupancy target (HOT) regions are segments of the genome with an unusually high number of transcription factor binding sites. These regions are observed in multiple species and are thought to have biological importance due to high transcription factor occupancy. Furthermore, they coincide with house-keeping gene promoters, and the associated genes are stably expressed across multiple cell types. Despite these features, HOT regions are defined solely using ChIP-seq experiments and have been shown to lack canonical motifs for the transcription factors that are thought to be bound there. Although ChIP-seq experiments are the gold standard for finding genome-wide binding sites of a protein, they are not noise free. Here, we show that HOT regions are likely to be ChIP-seq artifacts and that they are similar to previously proposed “hyper-ChIPable” regions. Using ChIP-seq data sets for knocked-out transcription factors, we demonstrate the presence of false positive signals on HOT regions. We observe sequence characteristics and genomic features that are discriminatory of HOT regions, such as GC/CpG-rich k-mers and enrichment of RNA-DNA hybrids (R-loops) and DNA tertiary structures (G-quadruplex DNA). The artificial ChIP-seq enrichment on HOT regions could be associated with these discriminatory features. Furthermore, we propose strategies to deal with such artifacts in future ChIP-seq studies.
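
The definition of a HOT region as a segment bound by unusually many transcription factors can be sketched as a simple occupancy count. The example below is an illustration under stated assumptions, not the procedure used in the paper: it bins the genome into fixed windows, counts distinct factors with a ChIP-seq peak overlapping each window, and flags windows at or above a cutoff. The bin size, the cutoff `min_tfs`, the function names, and the toy peaks are hypothetical.

```python
from collections import defaultdict


def tf_occupancy(peaks, bin_size=1000):
    """Count distinct TFs per genomic bin.

    `peaks` is an iterable of (tf_name, chrom, start, end) tuples with
    half-open coordinates [start, end). Returns {(chrom, bin_index): count}.
    """
    bins = defaultdict(set)
    for tf, chrom, start, end in peaks:
        # A peak contributes to every bin it overlaps.
        for b in range(start // bin_size, (end - 1) // bin_size + 1):
            bins[(chrom, b)].add(tf)
    return {key: len(tfs) for key, tfs in bins.items()}


def hot_bins(peaks, bin_size=1000, min_tfs=3):
    """Bins occupied by at least `min_tfs` distinct TFs, sorted."""
    occupancy = tf_occupancy(peaks, bin_size)
    return sorted(key for key, n in occupancy.items() if n >= min_tfs)


# Toy peaks: three factors pile up in the first kilobase of chr1.
toy_peaks = [
    ("A", "chr1", 100, 300),
    ("B", "chr1", 250, 400),
    ("C", "chr1", 900, 1100),
    ("A", "chr1", 5000, 5200),
]
candidates = hot_bins(toy_peaks)
```

A count like this cannot distinguish true co-binding from the artifactual enrichment the abstract describes, so flagged bins would still need to be checked against knockout or input controls.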


Prediction of condition-specific regulatory genes using machine learning

Nucleic Acids Research ◽

10.1093/nar/gkaa264 ◽

2020 ◽

Vol 48 (11) ◽

pp. e62-e62 ◽

Cited By ~ 2

Author(s):

Qi Song ◽

Jiyoung Lee ◽

Shamima Akter ◽

Matthew Rogers ◽

Ruth Grene ◽

...

Keyword(s):

Machine Learning ◽

Transcription Factors ◽

Single Cell ◽

Control Cell ◽

Genomic Data ◽

Regulatory Genes ◽

Genomic Research ◽

Open Chromatin ◽

Data Set ◽

Better Than

Recent advances in genomic technologies have generated data on large-scale protein–DNA interactions and open chromatin regions for many eukaryotic species. How to identify condition-specific functions of transcription factors using these data has become a major challenge in genomic research. To solve this problem, we have developed a method called ConSReg, which provides a novel approach to integrate regulatory genomic data into predictive machine learning models of key regulatory genes. Using Arabidopsis as a model system, we tested our approach to identify regulatory genes in data sets from single cell gene expression and from abiotic stress treatments. Our results showed that ConSReg accurately predicted transcription factors that regulate differentially expressed genes with an average auROC of 0.84, which is 23.5–25% better than enrichment-based approaches. To further validate the performance of ConSReg, we analyzed an independent data set related to plant nitrogen responses. ConSReg provided better rankings of the correct transcription factors in 61.7% of cases, which is three times better than other plant tools. We applied ConSReg to Arabidopsis single cell RNA-seq data, successfully identifying candidate regulatory genes that control cell wall formation. Our methods provide a new approach to define candidate regulatory genes using integrated genomic data in plants.
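
For readers unfamiliar with the auROC metric quoted above, the following minimal sketch computes it from its pairwise definition: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name and the toy scores are illustrative and are unrelated to the ConSReg code or data.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    `labels` holds 1 for positives and 0 for negatives. Runs in
    O(n_pos * n_neg), which is fine for small illustrative inputs.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy ranking: one positive is scored below one negative.
example = auroc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

An auROC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which gives some context for the reported average of 0.84.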


Transcription Factor Binding Motif (TFBM)

Dictionary of Bioinformatics and Computational Biology ◽

10.1002/9780471650126.dob1114 ◽

2004 ◽

Author(s):

Jacques van Helden

Keyword(s):

Transcription Factor ◽

Transcription Factor Binding ◽

Binding Motif ◽

Transcription Factor Binding Motif ◽

Factor Binding


A Quantitative Analysis of the Impact on Chromatin Accessibility by Histone Modifications and Binding of Transcription Factors in DNase I Hypersensitive Sites

BioMed Research International ◽

10.1155/2013/914971 ◽

2013 ◽

Vol 2013 ◽

pp. 1-7 ◽

Cited By ~ 7

Author(s):

Peng Cui ◽

Jing Li ◽

Bo Sun ◽

Menghuan Zhang ◽

Baofeng Lian ◽

...

Keyword(s):

Transcription Factors ◽

Quantitative Analysis ◽

Histone Modifications ◽

Chromatin Accessibility ◽

Dnase I ◽

Open Chromatin ◽

Dnase I Hypersensitive Sites ◽

Biological Phenomena ◽

Hypersensitive Sites ◽

The Impact

It is known that chromatin features such as histone modifications and the binding of transcription factors exert a significant impact on the “openness” of chromatin. In this study, we present a quantitative analysis of the genome-wide relationship between chromatin features and chromatin accessibility in DNase I hypersensitive sites. We found that these features show a distinct preference for localizing in open chromatin. In order to elucidate the exact impact, we derived quantitative models to directly predict the “openness” of chromatin using histone modification features and transcription factor binding features, respectively. We show that these two types of features are highly predictive of chromatin accessibility from a statistical viewpoint. Moreover, our results indicate that these features are highly redundant and that only a small number of features are needed to achieve very high predictive power. Our study provides new insights into the true biological phenomena and the combinatorial effects of chromatin features on differential DNase I hypersensitivity.


Completing the ENCODE3 compendium yields accurate imputations across a variety of assays and human biosamples

10.1101/533273 ◽

2019 ◽

Cited By ~ 7

Author(s):

Jacob Schreiber ◽

Jeffrey Bilmes ◽

William Stafford Noble

Keyword(s):

Biological Activity ◽

Protein Binding ◽

Histone Modification ◽

Chromatin Accessibility ◽

Training Data ◽

Data Sets ◽

Cellular Mechanisms ◽

Data Set ◽

Genome Wide

Motivation: Recent efforts to describe the human epigenome have yielded thousands of uniformly processed epigenomic and transcriptomic data sets. These data sets characterize a rich variety of biological activity in hundreds of human cell lines and tissues (“biosamples”). Understanding these data sets, and specifically how they differ across biosamples, can help explain many cellular mechanisms, particularly those driving development and disease. However, due primarily to cost, the total number of assays that can be performed is limited. Previously described imputation approaches, such as Avocado, have sought to overcome this limitation by predicting genome-wide epigenomics experiments using learned associations among available epigenomic data sets. However, these previous imputations have focused primarily on measurements of histone modification and chromatin accessibility, despite other biological activity being crucially important. Results: We applied Avocado to a data set of 3,814 tracks of data derived from the ENCODE compendium, spanning 400 human biosamples and 84 assays. The resulting imputations cover measurements of chromatin accessibility, histone modification, transcription, and protein binding. We demonstrate the quality of these imputations by comprehensively evaluating the model’s predictions and by showing significant improvements in protein binding performance compared to the top models in an ENCODE-DREAM challenge. Additionally, we show that the Avocado model allows for efficient addition of new assays and biosamples to a pre-trained model, achieving high accuracy at predicting protein binding, even with only a single track of training data. Availability: Tutorials and source code are available under an Apache 2.0 license at https://github.com/jmschrei/[email protected] or [email protected]


Expresso: A database and web server for exploring the interaction of transcription factors and their target genes in Arabidopsis thaliana using ChIP-Seq peak data

F1000Research ◽

10.12688/f1000research.10041.1 ◽

2017 ◽

Vol 6 ◽

pp. 372 ◽

Cited By ~ 7

Author(s):

Delasa Aghamirzaie ◽

Karthik Raja Velmurugan ◽

Shuchi Wu ◽

Doaa Altarawy ◽

Lenwood S. Heath ◽

...

Keyword(s):

Gene Expression ◽

Transcription Factor ◽

Transcription Factors ◽

Chromatin Immunoprecipitation ◽

Target Genes ◽

Regulation Of Gene Expression ◽

Data Sets ◽

Motif Analysis ◽

Chromatin Immunoprecipitation Sequencing

Motivation: The increasing availability of chromatin immunoprecipitation sequencing (ChIP-Seq) data enables us to learn more about the action of transcription factors in the regulation of gene expression. Even though in vivo transcriptional regulation often involves the concerted action of more than one transcription factor, the format of each individual ChIP-Seq dataset usually represents the action of a single transcription factor. Therefore, a relational database in which available ChIP-Seq datasets are curated is essential. Results: We present Expresso (database and webserver) as a tool for the collection and integration of available Arabidopsis ChIP-Seq peak data, which in turn can be linked to a user’s gene expression data. Known target genes of transcription factors were identified by motif analysis of publicly available GEO ChIP-Seq data sets. Expresso currently provides three services: 1) Identification of target genes of a given transcription factor; 2) Identification of transcription factors that regulate a gene of interest; 3) Computation of correlation between the gene expression of transcription factors and their target genes. Availability: Expresso is freely available at http://bioinformatics.cs.vt.edu/expresso/
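
The third service listed above, correlating the expression of a transcription factor with that of its target genes, can be illustrated with a plain Pearson correlation. The abstract does not say which correlation measure Expresso uses, so Pearson is an assumption here, as are the function name and the made-up expression values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length
    expression profiles (e.g. one value per sample or condition)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


# Hypothetical expression of one TF and two of its candidate targets
# across four samples.
tf_expr = [1.0, 2.0, 3.0, 4.0]
activated_target = [2.0, 4.0, 6.0, 8.0]
repressed_target = [8.0, 6.0, 4.0, 2.0]
r_activated = pearson(tf_expr, activated_target)
r_repressed = pearson(tf_expr, repressed_target)
```

A strongly positive value is consistent with activation and a strongly negative one with repression, although correlation alone does not establish direct regulation.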
