Learning the histone codes of gene regulation with large genomic windows and three-dimensional chromatin interactions using transformer

2021 ◽  
Author(s):  
Dohoon Lee ◽  
Jeewon Yang ◽  
Sun Kim

The quantitative characterization of transcriptional control by histone modifications (HMs) has been attempted by many computational studies, but most of them exploit only partial aspects of the intricate mechanisms involved in gene regulation, leaving room for improvement. We present Chromoformer, a new transformer-based deep learning architecture that achieves state-of-the-art performance in the quantitative deciphering of the histone codes of gene regulation. The core of the Chromoformer architecture lies in three variants of the attention operation, each specialized to model an individual level in the hierarchy of three-dimensional (3D) transcriptional regulation: (1) histone codes at core promoters, (2) pairwise interactions between a core promoter and a distal cis-regulatory element mediated by 3D chromatin interactions, and (3) the collective effect of the pairwise cis-regulations. In-depth interpretation of the trained model's behavior based on attention scores suggests that Chromoformer adaptively exploits the distant dependencies between HMs associated with transcription initiation and elongation. We also demonstrate that the quantitative kinetics of transcription factories and polycomb group bodies, in which coordinated gene regulation occurs through spatial sequestration of genes with regulatory elements, can be captured by Chromoformer. Together, our study shows the power of attention-based deep learning as a versatile modeling approach for the complex epigenetic landscape of gene regulation and highlights its potential as an effective toolkit that facilitates scientific discoveries in computational epigenetics.

2017 ◽  
Author(s):  
Sarah Rennie ◽  
Maria Dalby ◽  
Marta Lloret-Llinares ◽  
Stylianos Bakoulis ◽  
Christian Dalager Vaagensø ◽  
...  

Mammalian gene promoters and enhancers share many properties. They are composed of a unified promoter architecture of divergent transcription initiation, and gene promoters may exhibit enhancer function. However, it is currently unclear how expression strength of a regulatory element relates to its enhancer strength and if the unifying architecture is conserved across Metazoa. Here we investigate the transcription initiation landscape and its associated RNA decay in D. melanogaster. Surprisingly, we find that the majority of active gene-distal enhancers and a considerable fraction of gene promoters are divergently transcribed. We observe quantitative relationships between enhancer potential, expression level and core promoter strength, providing an explanation for indirectly related histone modifications that are reflecting expression levels. Lowly abundant unstable RNAs initiated from weak core promoters are key characteristics of gene-distal developmental enhancers, while the housekeeping enhancer strengths of gene promoters reflect their expression strengths. The different layers of regulation mediated by gene-distal enhancers and gene promoters are also reflected in chromatin interaction data. Our results suggest a unified promoter architecture of many D. melanogaster regulatory elements, that is universal across Metazoa, whose regulatory functions seem to be related to their core promoter elements.


2017 ◽  
Author(s):  
Mahmoud M. Ibrahim ◽  
Aslihan Karabacak ◽  
Alexander Glahs ◽  
Ena Kolundzic ◽  
Antje Hirsekorn ◽  
...  

Divergent transcription from promoters and enhancers is pervasive in many species, but it remains unclear if it is a general and passive feature of all eukaryotic cis regulatory elements. To address this, we define promoters and enhancers in C. elegans, D. melanogaster and H. sapiens using ATAC-Seq and investigate the determinants of their transcription initiation directionalities by analyzing genome-wide nascent, cap-selected, polymerase run-on assays. All three species initiate divergent transcription from separate core promoter sequences. Sequence asymmetry downstream of forward and reverse initiation sites, known to be important for termination and stability in H. sapiens, is unique in each species. Chromatin states of divergent promoters are not entirely conserved, but in all three species, the levels of histone modifications on the +1 nucleosome are independent from those on the -1 nucleosome, arguing for independent initiation events. This is supported by an integrative model of H3K4me3 levels and core promoter sequence that is highly predictive of promoter directionality and of two types of promoters: those with balanced initiation directionality and those with skewed directionality. Lastly, D. melanogaster enhancers display variation in chromatin architecture depending on enhancer location, and D. melanogaster promoter regions with dual enhancer/promoter potential are enriched for divergent transcription. Our results point to a high degree of variation in regulatory element transcription initiation directionality within and between metazoans, and to non-passive regulatory mechanisms of transcription initiation directionality in those species.


2021 ◽  
Author(s):  
Jesus Victorino ◽  
Isabel Rollan ◽  
Raquel Rouco ◽  
Javier Adan ◽  
Miguel Manzanares

Cis-regulatory elements control gene expression in time and space and their disruption can lead to pathologies. Reporter assays allow the functional validation of enhancers and other regulatory elements, and such assays by means of the generation of transgenic mice provide a powerful tool to study gene regulation in development and disease. However, these experiments are time-consuming and, thus, their throughput is very limited. Here, we increase the throughput of in vivo mouse reporter assays by using a piggyBac transposon-based system, and use it to decode the regulatory landscape of atrial fibrillation, a prevalent cardiac arrhythmia. We systematically interrogated ten human loci associated with atrial fibrillation in the search for regulatory elements. We found five new cardiac-specific enhancers and implicated novel genes in arrhythmia through genome editing and three-dimensional chromatin analysis by 4C-seq. Of note, functional dissection of the 7q31 locus identified a bivalent regulatory element in the second intron of the CAV1 gene differentially acting upon four genes. Our system also detected negative regulatory elements, thanks to which we identified a ubiquitous silencer in the 16q22 locus that regulates ZFHX3 and can outcompete heart enhancers. Our study characterizes the function of new genetic elements that might be of relevance for the better understanding of gene regulation in cardiac arrhythmias. Thus, we have established a new framework for the efficient dissection of the genetic contribution to common human diseases.


Genetics ◽  
2002 ◽  
Vol 161 (2) ◽  
pp. 733-746
Author(s):  
Jeffrey W Southworth ◽  
James A Kennison

The Sex combs reduced (Scr) gene specifies the identities of the labial and first thoracic segments in Drosophila melanogaster. In imaginal cells, some Scr mutations allow cis-regulatory elements on one chromosome to stimulate expression of the promoter on the homolog, a phenomenon that was named transvection by Ed Lewis in 1954. Transvection at the Scr gene is blocked by rearrangements that disrupt pairing, but is zeste independent. Silencing of the Scr gene in the second and third thoracic segments, which requires the Polycomb group proteins, is disrupted by most chromosomal aberrations within the Scr gene. Some chromosomal aberrations completely derepress Scr even in the presence of normal levels of all Polycomb group proteins. On the basis of the pattern of chromosomal aberrations that disrupt Scr gene silencing, we propose a model in which two cis-regulatory elements interact to stabilize silencing of any promoter or cis-regulatory element physically between them. This model also explains the anomalous behavior of the Scx allele of the flanking homeotic gene, Antennapedia. This allele, which is associated with an insertion near the Antennapedia P1 promoter, inactivates the Antennapedia P1 and P2 promoters in cis and derepresses the Scr promoters both in cis and on the homologous chromosome.


1991 ◽  
Vol 11 (2) ◽  
pp. 641-654
Author(s):  
C Hinkley ◽  
M Perry

Xenopus oocytes, arrested in G2 before the first meiotic division, accumulate histone mRNA and protein in the absence of chromosomal DNA replication and therefore represent an attractive biological system in which to examine histone gene expression uncoupled from the cell cycle. Previous studies have shown that sequences necessary for maximal levels of transcription in oocytes are present within 200 bp at the 5' end of the transcription initiation site for genes encoding each of the five major Xenopus histone classes. We have defined by site-directed mutagenesis individual regulatory sequences and characterized DNA-binding proteins required for histone H2B gene transcription in injected oocytes. The Xenopus H2B gene has a relatively simple promoter containing several transcriptional regulatory elements, including TFIID, CBP, and ATF/CREB binding sites, required for maximal transcription. A sequence (CTTTACAT) in the H2B promoter resembling the conserved octamer motif (ATTTGCAT), the target for cell-cycle regulation of a human H2B gene, is not required for transcription in oocytes. Nonetheless, substitution of a consensus octamer motif for the variant octamer element activates H2B transcription. Oocyte factors, presumably including the ubiquitous Oct-1 factor, specifically bind to the consensus octamer motif but not to the variant sequence. Our results demonstrate that a transcriptional regulatory element involved in lymphoid-specific expression of immunoglobulin genes and in S-phase-specific activation of mammalian H2B histone genes can activate transcription in nondividing amphibian oocytes.


1988 ◽  
Vol 8 (7) ◽  
pp. 2896-2909 ◽  
Author(s):  
E A Sternberg ◽  
G Spizz ◽  
W M Perry ◽  
D Vizard ◽  
T Weil ◽  
...  

Terminal differentiation of skeletal myoblasts is accompanied by induction of a series of tissue-specific gene products, which includes the muscle isoenzyme of creatine kinase (MCK). To begin to define the sequences and signals involved in MCK regulation in developing muscle cells, the mouse MCK gene has been isolated. Sequence analysis of 4,147 bases of DNA surrounding the transcription initiation site revealed several interesting structural features, some of which are common to other muscle-specific genes and to cellular and viral enhancers. To test for sequences required for regulated expression, a region upstream of the MCK gene from -4800 to +1 base pairs, relative to the transcription initiation site, was linked to the coding sequences of the bacterial chloramphenicol acetyltransferase (CAT) gene. Introduction of this MCK-CAT fusion gene into C2 muscle cells resulted in high-level expression of CAT activity in differentiated myotubes and no detectable expression in proliferating undifferentiated myoblasts or in nonmyogenic cell lines. Deletion mutagenesis of sequences between -4800 and the transcription start site showed that the region between -1351 and -1050 was sufficient to confer cell type-specific and developmentally regulated expression on the MCK promoter. This upstream regulatory element functioned independently of position, orientation, or distance from the promoter and therefore exhibited the properties of a classical enhancer. This upstream enhancer also was able to confer muscle-specific regulation on the simian virus 40 promoter, although it exhibited a 3- to 5-fold preference for its own promoter. In contrast to the cell type- and differentiation-specific expression of the upstream enhancer, the MCK promoter was able to function in myoblasts and myotubes and in nonmyogenic cell lines when combined with the simian virus 40 enhancer. An additional positive regulatory element was identified within the first intron of the MCK gene. 
Like the upstream enhancer, this intragenic element functioned independently of position, orientation, and distance with respect to the MCK promoter and was active in differentiated myotubes but not in myoblasts. These results demonstrate that expression of the MCK gene in developing muscle cells is controlled by complex interactions among multiple upstream and intragenic regulatory elements that are functional only in the appropriate cellular context.


2003 ◽  
Vol 370 (3) ◽  
pp. 771-784 ◽  
Author(s):  
Cristina PÉREZ-GÓMEZ ◽  
José M. MATÉS ◽  
Pedro M. GÓMEZ-FABRE ◽  
Antonio del CASTILLO-OLIVARES ◽  
Francisco J. ALONSO ◽  
...  

In mammals, glutaminase (GA) is expressed in most tissues, but the regulation of organ-specific expression is largely unknown. Therefore, as an essential step towards studying the regulation of GA expression, the human liver-type GA (hLGA) gene has been characterized. LGA genomic sequences were isolated using the genome walking technique. Analysis and comparison of these sequences with two LGA cDNA clones and the Human Genome Project database allowed the determination of the genomic organization of the LGA gene. The gene has 18 exons and is approx. 18 kb long. All exon/intron junction sequences conform to the GT/AG rule. Progressive deletion analysis of LGA promoter-luciferase constructs indicated that the core promoter is located between nt −141 and +410, with several potential regulatory elements: CAAT, GC, TATA-like, Ras-responsive element binding protein and specificity protein 1 (Sp1) sites. The minimal promoter was mapped within +107 and +410, where only an Sp1 binding site is present. Mutation experiments suggested that two CAAT recognition elements near the transcription-initiation site (−138 and −87) play a crucial role for optimal promoter activity. Electrophoretic mobility-shift assays confirmed the importance of CAAT- and TATA-like boxes to enhance basal transcription, and demonstrated that the HNF-1 motif is a significant distal element for transcriptional regulation of the hLGA gene.


2020 ◽  
Author(s):  
Nadezda A. Fursova ◽  
Anne H. Turberfield ◽  
Neil P. Blackledge ◽  
Emma L. Findlater ◽  
Anna Lastuvkova ◽  
...  

Histone-modifying systems play fundamental roles in gene regulation and the development of multicellular organisms. Histone modifications that are enriched at gene regulatory elements have been heavily studied, but the function of modifications that are found more broadly throughout the genome remains poorly understood. This is exemplified by histone H2A mono-ubiquitylation (H2AK119ub1) which is enriched at Polycomb-repressed gene promoters, but also covers the genome at lower levels. Here, using inducible genetic perturbations and quantitative genomics, we discover that the BAP1 deubiquitylase plays an essential role in constraining H2AK119ub1 throughout the genome. Removal of BAP1 leads to pervasive accumulation of H2AK119ub1, which causes widespread reductions in gene expression. We show that elevated H2AK119ub1 represses gene expression by counteracting transcription initiation from gene regulatory elements, causing reductions in transcription-associated histone modifications. Furthermore, failure to constrain pervasive H2AK119ub1 compromises Polycomb complex occupancy at a subset of Polycomb target genes leading to their derepression, therefore explaining the original genetic characterisation of BAP1 as a Polycomb group gene. Together, these observations reveal that the transcriptional potential of the genome can be modulated by regulating the levels of a pervasive histone modification, without the need for elaborate gene-specific targeting mechanisms.


1990 ◽  
Vol 10 (3) ◽  
pp. 930-938
Author(s):  
G L Semenza ◽  
R C Dureza ◽  
M D Traystman ◽  
J D Gearhart ◽  
S E Antonarakis

Erythropoietin (EPO) is the primary humoral regulator of mammalian erythropoiesis. The single-copy EPO gene is normally expressed in liver and kidney, and increased transcription is induced by anemia or cobalt chloride administration. To identify cis-acting DNA sequences responsible for regulated expression, transgenic mice were generated by microinjection of a 4-kilobase-pair (kb) (tgEPO4) or 10-kb (tgEPO10) cloned DNA fragment containing the human EPO gene, 0.7 kb of 3'-flanking sequence, and either 0.4 or 6 kb of 5'-flanking sequence, respectively. tgEPO4 mice expressed the transgene in liver, where expression was inducible by anemia or cobalt chloride, kidney, where expression was not inducible, and other tissues that do not normally express EPO. Human EPO RNA in tgEPO10 mice was detected only in liver of anemic or cobalt-treated mice. Both tgEPO4 and tgEPO10 mice were polycythemic, demonstrating that the human EPO RNA transcribed in liver is functional. These results suggest that (i) a liver inducibility element maps within 4 kb encompassing the gene, 0.4 kb of 5'-flanking sequence, and 0.7 kb of 3'-flanking sequence; (ii) a negative regulatory element is located between 0.4 and 6 kb 5' to the gene; and (iii) sequences required for inducible kidney expression are located greater than 6 kb 5' or 0.7 kb 3' to the gene. RNase protection analysis revealed that human EPO RNA in anemic transgenic mouse liver and hypoxic human hepatoma cells is initiated from several sites, only a subset of which is utilized in nonanemic transgenic liver and human fetal liver.



