ReFeaFi: Genome-wide prediction of regulatory elements driving transcription initiation

2021 ◽  
Author(s):  
Ramzan Umarov ◽  
Yu Li ◽  
Takahiro Arakawa ◽  
Satoshi Takizawa ◽  
Xin Gao ◽  
...  

Regulatory elements control gene expression through transcription initiation (promoters) and by enhancing transcription at distant regions (enhancers). Accurate identification of regulatory elements is fundamental for annotating genomes and understanding gene expression patterns. While there have been many attempts to develop computational promoter and enhancer identification methods, reliable tools for analyzing long genomic sequences are still lacking. Prediction methods often perform poorly on the genome-wide scale because the number of negatives is much higher than in the training sets. To address this issue, we propose a dynamic negative set updating scheme with a two-model approach, using one model to scan the genome and the other to test candidate positions. The developed method achieves good genome-level performance and maintains robust performance when applied to other species, without re-training. Moreover, the unannotated regulatory regions predicted on the human genome are enriched for disease-associated variants, suggesting that they are potentially true regulatory elements rather than false positives. We tested high-scoring "false positive" predictions using a reporter assay, and all tested candidates were successfully validated, demonstrating the ability of our method to discover novel human regulatory regions.

2021 ◽  
Vol 17 (9) ◽  
pp. e1009376
Author(s):  
Ramzan Umarov ◽  
Yu Li ◽  
Takahiro Arakawa ◽  
Satoshi Takizawa ◽  
Xin Gao ◽  
...  

Regulatory elements control gene expression through transcription initiation (promoters) and by enhancing transcription at distant regions (enhancers). Accurate identification of regulatory elements is fundamental for annotating genomes and understanding gene expression patterns. While there have been many attempts to develop computational promoter and enhancer identification methods, reliable tools for analyzing long genomic sequences are still lacking. Prediction methods often perform poorly on the genome-wide scale because the number of negatives is much higher than in the training sets. To address this issue, we propose a dynamic negative set updating scheme with a two-model approach, using one model to scan the genome and the other to test candidate positions. The developed method achieves good genome-level performance and maintains robust performance when applied to other vertebrate species, without re-training. Moreover, the unannotated regulatory regions predicted on the human genome are enriched for disease-associated variants, suggesting that they are potentially true regulatory elements rather than false positives. We tested high-scoring “false positive” predictions using a reporter assay, and all tested candidates were successfully validated, demonstrating the ability of our method to discover novel human regulatory regions.
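The abstract describes the dynamic negative set and the two-model scan only at a high level. The Python sketch below illustrates the general idea under stated assumptions: a "model" is any function that scores a genome position, high-scoring positions far from every annotated element are treated as hard negatives for the next training round, and a second model re-tests the candidates proposed by a coarse scan. The function names, thresholds, `exclusion` window and toy scores are hypothetical; this is not ReFeaFi's implementation, and the retraining step itself is left out.

```python
from typing import Callable, Iterable, List, Set

# Hypothetical model interface: maps a genome position to a score.
# In practice this would be a trained classifier applied to the window at `pos`.
Score = Callable[[int], float]


def scan(score: Score, genome_length: int, threshold: float, step: int = 1) -> List[int]:
    """Scanning pass: propose every position (every `step` bp) whose score reaches `threshold`."""
    return [pos for pos in range(0, genome_length, step) if score(pos) >= threshold]


def far_from_annotated(pos: int, positives: Iterable[int], exclusion: int) -> bool:
    """True if `pos` is more than `exclusion` bp away from every annotated positive."""
    return all(abs(pos - p) > exclusion for p in positives)


def update_negatives(negatives: Set[int], score: Score, genome_length: int,
                     positives: Set[int], threshold: float, exclusion: int) -> Set[int]:
    """One round of negative-set updating: confident predictions far from any
    annotation are added as hard negatives (the model would then be retrained)."""
    hard = {pos for pos in scan(score, genome_length, threshold)
            if far_from_annotated(pos, positives, exclusion)}
    return negatives | hard


def two_model_predict(scan_score: Score, test_score: Score, genome_length: int,
                      scan_threshold: float, test_threshold: float, step: int) -> List[int]:
    """A coarse scanning model proposes candidates; a second model re-tests each one."""
    candidates = scan(scan_score, genome_length, scan_threshold, step)
    return [pos for pos in candidates if test_score(pos) >= test_threshold]


if __name__ == "__main__":
    positives = {100, 500}  # toy annotated positions

    def toy_scan(pos: int) -> float:
        return 1.0 if pos in (100, 300, 500, 700) else 0.0

    def toy_test(pos: int) -> float:
        return 1.0 if pos in positives else 0.0

    print(sorted(update_negatives(set(), toy_scan, 1000, positives, 0.5, 50)))
    print(two_model_predict(toy_scan, toy_test, 1000, 0.5, 0.5, step=100))
```

Running it prints `[300, 700]` (positions 300 and 700 score highly but lie more than 50 bp from any annotated positive) and then `[100, 500]`, the candidates that survive the second model.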


2019 ◽  
Author(s):  
Struan C Murray ◽  
Philipp Lorenz ◽  
Françoise S Howe ◽  
Meredith Wouters ◽  
Thomas Brown ◽  
...  

Abstract
H3K4me3 is a near-universal histone modification found predominantly at the 5’ region of genes, with a well-documented association with gene activity. H3K4me3 has been ascribed roles as both an instructor of gene expression and a downstream consequence of expression, yet neither role has been convincingly proven on a genome-wide scale. Here we test these relationships using a combination of bioinformatics, modelling and experimental data from budding yeast in which the levels of H3K4me3 have been massively ablated. We find that loss of H3K4me3 has no effect on the levels of nascent transcription or transcript in the population. Moreover, we observe no change in the rates of transcription initiation, elongation, mRNA export or turnover, or in protein levels, or cell-to-cell variation of mRNA. Loss of H3K4me3 also has no effect on the large changes in gene expression patterns that follow galactose induction. Conversely, loss of RNA polymerase from the nucleus has no effect on the pattern of H3K4me3 deposition and little effect on its levels, despite much larger changes to other chromatin features. Furthermore, large genome-wide changes in transcription, both in response to environmental stress and during metabolic cycling, are not accompanied by corresponding changes in H3K4me3. Thus, despite the correlation between H3K4me3 and gene activity, neither appears to be necessary to maintain levels of the other, nor to influence their changes in response to environmental stimuli. When we compare gene classes with very different levels of H3K4me3 but highly similar transcription levels, we find that H3K4me3-marked genes are those whose expression is unresponsive to environmental changes, and that their histones are less acetylated and less dynamically turned over.
Constitutive genes are generally well expressed, which alone may explain the correlation between H3K4me3 and gene expression, while the biological role of H3K4me3 may have more to do with this distinction between gene classes.


2013 ◽  
Vol 368 (1632) ◽  
pp. 20130022 ◽  
Author(s):  
Noboru Jo Sakabe ◽  
Marcelo A. Nobrega

The complex expression patterns observed for many genes are often regulated by distal transcription enhancers. Changes in the nucleotide sequences of enhancers may therefore lead to changes in gene expression, representing a central mechanism by which organisms evolve. With the development of the experimental technique of chromatin immunoprecipitation (ChIP), in which discrete regions of the genome bound by specific proteins can be identified, it is now possible to identify transcription factor binding events (putative cis-regulatory elements) in entire genomes. Comparing protein–DNA binding maps allows us, for the first time, to attempt to identify regulatory differences and infer global patterns of change in gene expression across species. Here, we review studies that used genome-wide ChIP to study the evolution of enhancers. The trend is one of high divergence of cis-regulatory elements between species, possibly compensated by extensive creation and loss of regulatory elements and rewiring of their target genes. We speculate on the meaning of the differences observed and point out that, although ChIP experiments identify the biochemical event of protein–DNA interaction, they cannot determine whether the event results in a biological function; therefore, more studies are required to establish the effect of divergence of binding events on species-specific gene expression.


2021 ◽  
Author(s):  
Jakub Jankowski ◽  
Hye Kyung Lee ◽  
Julia Wilflingseder ◽  
Lothar Hennighausen

Summary
Recently, a short, interferon-inducible isoform of Angiotensin-Converting Enzyme 2 (ACE2), dACE2, was identified. ACE2 is a SARS-CoV-2 receptor, and changes in its renal expression have been linked to several human nephropathies. These changes have never been analyzed in the context of dACE2, as its expression had not been investigated in the kidney. We used Human Primary Proximal Tubule (HPPT) cells to characterize genome-wide gene expression patterns after cytokine stimulation, with emphasis on the ACE2/dACE2 locus. Putative regulatory elements controlling dACE2 expression were identified using ChIP-seq and RNA-seq. qRT-PCR differentiating between ACE2 and dACE2 revealed 300- and 600-fold upregulation of dACE2 by IFNα and IFNβ, respectively, while full-length ACE2 expression was almost unchanged. The JAK inhibitor ruxolitinib ablated STAT1 and dACE2 expression after interferon treatment. Finally, using RNA-seq, we identified a set of genes, largely immune-related, induced by cytokine treatment. These gene expression profiles provide new insights into the cytokine response of proximal tubule cells.
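Fold changes such as the 300- and 600-fold dACE2 induction are often computed from qRT-PCR cycle thresholds (Ct) with the 2^−ΔΔCt method. The abstract does not state which quantification the authors used, so the sketch below is an assumption, and its Ct values are invented for illustration rather than taken from the study.

```python
import math


def fold_change_ddct(ct_target_treated: float, ct_ref_treated: float,
                     ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression by the 2^-ΔΔCt method.

    Assumes both assays amplify with ~100% efficiency (one doubling per cycle).
    """
    delta_treated = ct_target_treated - ct_ref_treated  # normalize to a reference gene
    delta_control = ct_target_control - ct_ref_control
    ddct = delta_treated - delta_control
    return 2.0 ** (-ddct)


def ddct_for_fold(fold: float) -> float:
    """ΔΔCt (in cycles) that corresponds to a given fold change."""
    return -math.log2(fold)


if __name__ == "__main__":
    # Hypothetical Ct values, not data from the study.
    print(fold_change_ddct(22.0, 20.0, 30.0, 20.0))                     # 256.0
    print(round(ddct_for_fold(300), 2), round(ddct_for_fold(600), 2))  # -8.23 -9.23
```

Because each cycle corresponds to a doubling under this model, a 600-fold change sits exactly one cycle beyond a 300-fold change.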


Blood ◽  
2013 ◽  
Vol 122 (21) ◽  
pp. SCI-10-SCI-10
Author(s):  
John Stamatoyannopoulos

Abstract
Regulatory elements control the anatomic and cellular contexts, timing, and magnitude of gene expression patterns. Under the ENCODE and Roadmap Epigenomics Projects, human regulatory DNA has been mapped using a variety of approaches in over 300 cell and tissue types and developmental states. Collectively, the human genome encodes several million regulatory elements, most of which are located at some distance from promoters. The vast majority of these elements exhibit exquisite cell- and lineage-selective activation patterns, providing novel insights into the coordination of gene expression patterns. Genomic footprinting is a new and powerful technology that enables simultaneous profiling of the occupancy of hundreds of sequence-specific transcription factors within regulatory regions. These profiles in turn enable construction of transcription factor regulatory networks that are providing new insights into how cell- and lineage-specific gene expression programs arise. Hundreds of genetic variants associated with a wide range of hematological traits and disorders localize within regulatory regions. Many such variants disrupt specific transcription factor-DNA interactions, exposing pathophysiologically relevant transcriptional regulatory pathways. Disclosures: No relevant conflicts of interest to declare.
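Genomic footprinting rests on the observation that a bound transcription factor protects its site from nuclease cleavage, leaving a dip in the per-base cleavage profile. The sketch below scores that dip as the ratio of mean cleavage in a candidate core to mean cleavage in its flanks. It is a simplified, hypothetical statistic (published footprinting methods use related but more elaborate scores), and the cleavage counts are toy values.

```python
from statistics import mean
from typing import Sequence


def footprint_depletion(cuts: Sequence[int], core_start: int, core_end: int, flank: int) -> float:
    """Mean cleavage in the core [core_start, core_end) relative to its two flanks.

    A pseudocount of 1 avoids division by zero. Values well below 1 suggest a
    protected (possibly factor-bound) core; values near 1 suggest no footprint.
    """
    if flank <= 0 or core_end <= core_start:
        raise ValueError("need a non-empty core and a positive flank width")
    if core_start - flank < 0 or core_end + flank > len(cuts):
        raise ValueError("flanks extend beyond the cleavage profile")
    core = mean(cuts[core_start:core_end])
    left = mean(cuts[core_start - flank:core_start])
    right = mean(cuts[core_end:core_end + flank])
    return (core + 1) / ((left + right) / 2 + 1)


if __name__ == "__main__":
    protected = [10] * 5 + [0] * 4 + [10] * 5  # toy per-base cut counts with a central dip
    flat = [10] * 14                            # toy profile without a dip
    print(round(footprint_depletion(protected, 5, 9, 5), 3))  # 0.091
    print(footprint_depletion(flat, 5, 9, 5))                 # 1.0
```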


2021 ◽  
Vol 11 ◽  
Author(s):  
Emily Zboril ◽  
Hannah Yoo ◽  
Lizhen Chen ◽  
Zhijie Liu

While improved tumor treatment has significantly reduced overall mortality rates, invasive progression, including recurrence, therapy resistance and metastasis, contributes to the majority of cancer deaths. Enhancers are essential distal DNA regulatory elements that control temporally or spatially specific gene expression patterns during development and other biological processes. Genome-wide sequencing has revealed frequent alterations of enhancers in cancers, and reprogramming of distal enhancers has emerged as an important feature of tumors. In this review, we will discuss tumor progression-associated enhancer dynamics, their transcription factor (TF) drivers and how enhancer reprogramming modulates gene expression during invasive cancer progression. Additionally, we will explore recent advances in technologies, including single-cell sequencing, spatial transcriptomics and CUT&RUN, that have permitted integrated studies of enhancer reprogramming in vivo. Given the essential roles of enhancer dynamics and their drivers in controlling cancer progression and treatment outcome, understanding these changes will be paramount in mitigating invasive events and discovering novel therapeutic targets.


Author(s):  
Yanrong Ji ◽  
Zhihan Zhou ◽  
Han Liu ◽  
Ramana V Davuluri

Abstract
Motivation: Deciphering the language of non-coding DNA is one of the fundamental problems in genome research. The gene regulatory code is highly complex owing to polysemy and distant semantic relationships, which previous informatics methods often fail to capture, especially in data-scarce scenarios.
Results: To address this challenge, we developed a novel pre-trained bidirectional encoder representation, named DNABERT, to capture a global and transferable understanding of genomic DNA sequences based on upstream and downstream nucleotide contexts. We compared DNABERT with the most widely used programs for genome-wide regulatory element prediction and demonstrated its ease of use, accuracy and efficiency. We show that a single pre-trained transformer model can simultaneously achieve state-of-the-art performance on the prediction of promoters, splice sites and transcription factor binding sites, after easy fine-tuning on small task-specific labeled data. Further, DNABERT enables direct visualization of nucleotide-level importance and semantic relationships within input sequences, for better interpretability and accurate identification of conserved sequence motifs and functional genetic variant candidates. Finally, we demonstrate that DNABERT pre-trained on the human genome can be readily applied to other organisms with exceptional performance. We anticipate that the pre-trained DNABERT model can be fine-tuned for many other sequence analysis tasks.
Availability and implementation: The source code and the pre-trained and fine-tuned models for DNABERT are available at GitHub (https://github.com/jerryji1993/DNABERT).
Supplementary information: Supplementary data are available at Bioinformatics online.
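DNABERT does not read raw nucleotides one at a time: sequences are split into overlapping k-mers (the DNABERT models use k = 3 to 6), which serve as the model's tokens. The sketch below reproduces that preprocessing idea in plain Python. The function names are hypothetical and are not the helpers shipped in the DNABERT repository; consult the repository for the exact input format it expects.

```python
from itertools import product
from typing import List


def seq_to_kmers(seq: str, k: int = 6) -> str:
    """Split a DNA sequence into overlapping k-mers (stride 1), joined by spaces."""
    seq = seq.upper()
    if k < 1 or len(seq) < k:
        raise ValueError("k must be between 1 and len(seq)")
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_vocab(k: int) -> List[str]:
    """All 4**k nucleotide k-mers, i.e. the non-special part of a k-mer vocabulary."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


if __name__ == "__main__":
    print(seq_to_kmers("ATGGCT", k=3))  # ATG TGG GGC GCT
    print(len(kmer_vocab(6)))           # 4096
```

For k = 6 the nucleotide part of the vocabulary has 4^6 = 4096 entries; a BERT-style model adds its special tokens on top of these.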


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ruifeng Cui ◽  
Xiaoge Wang ◽  
Waqar Afzal Malik ◽  
Xuke Lu ◽  
Xiugui Chen ◽  
...  

Abstract
Background: The raffinose synthetase (RAFS) gene superfamily is critical for the synthesis of raffinose, which accumulates in plant leaves under abiotic stress. However, it remains unclear whether RAFS contributes to abiotic stress resistance in plants, specifically in Gossypium species.
Results: In this study, we identified 74 RAFS genes from G. hirsutum, G. barbadense, G. arboreum and G. raimondii using a series of bioinformatic methods. Phylogenetic analysis showed that the RAFS gene family in the four Gossypium species could be divided into four major clades; the genes were relatively uniformly distributed, with 12 to 25 genes per species depending on ploidy, most likely as a result of an ancient whole-genome polyploidization. Gene motif analysis showed that RAFS gene structure was relatively conserved. Analysis of promoter cis-regulatory elements indicated that some RAFS genes might be regulated by gibberellins and abscisic acid, which might influence their expression levels. Moreover, we further examined the functions of RAFS under cold, heat, salt and drought stress, based on the expression profiles and co-expression network of RAFS genes in Gossypium species. Transcriptome analysis suggested that RAFS genes in clade III are highly expressed in organs such as seed, root, cotyledon, ovule and fiber, particularly under abiotic stress, indicating the involvement of clade III genes in abiotic stress resistance. Gene co-expression network analysis showed that GhRFS2A-GhRFS6A, GhRFS6D, GhRFS7D and GhRFS8A-GhRFS11A were key genes, with high expression levels under salt, drought, cold and heat stress.
Conclusion: The findings may provide insights into the evolutionary relationships and expression patterns of RAFS genes in Gossypium species and a theoretical basis for identifying stress-resistant materials in cotton.
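Promoter cis-element analyses of this kind typically scan the upstream region of each gene for known motifs on both strands. The sketch below counts exact matches of short motifs, including overlapping and reverse-complement hits. The two motif strings are illustrative assumptions (an ABRE-like core associated with abscisic acid and a GARE-like box associated with gibberellin), not the motif set used in the study; real analyses take consensus sequences, often with degenerate positions, from curated databases.

```python
import re
from typing import Dict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_motif(promoter: str, motif: str) -> int:
    """Count exact, possibly overlapping matches of `motif` on both strands."""
    promoter = promoter.upper()
    patterns = {motif.upper(), revcomp(motif)}  # a set, so palindromic motifs count once
    # A zero-width lookahead lets re.findall report overlapping hits.
    return sum(len(re.findall(f"(?={p})", promoter)) for p in patterns)


# Illustrative motifs only (hypothetical selection, not the study's motif set).
MOTIFS: Dict[str, str] = {"ABRE core": "ACGTG", "GARE-like": "TAACAAA"}


def scan_promoter(promoter: str, motifs: Dict[str, str] = MOTIFS) -> Dict[str, int]:
    """Motif name -> number of hits on either strand of the promoter."""
    return {name: count_motif(promoter, m) for name, m in motifs.items()}


if __name__ == "__main__":
    print(scan_promoter("TTACGTGAACACGTAA"))  # {'ABRE core': 2, 'GARE-like': 0}
```

The toy promoter contains one forward ACGTG and one reverse-strand hit (CACGT), so the ABRE-like count is 2.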


2021 ◽  
pp. 002203452110120
Author(s):  
C. Gluck ◽  
S. Min ◽  
A. Oyelakin ◽  
M. Che ◽  
E. Horeth ◽  
...  

The parotid, submandibular, and sublingual glands represent a trio of oral secretory glands whose primary functions are to produce saliva, facilitate digestion of food, provide protection against microbes, and maintain oral health. While recent studies have begun to shed light on the global gene expression patterns and profiles of salivary glands, particularly those of mice, relatively little is known about the location and identity of their transcriptional control elements. Here we have established the epigenomic landscape of the mouse submandibular salivary gland (SMG) by performing chromatin immunoprecipitation sequencing experiments for 4 key histone marks. Our analysis of the comprehensive SMG data sets and comparisons with those from other adult organs have identified critical enhancers and super-enhancers of the mouse SMG. By further integrating these findings with complementary RNA-sequencing-based gene expression data, we have unearthed a number of molecular regulators, such as members of the Fox family of transcription factors, that are enriched and likely to be functionally relevant for SMG biology. Overall, our studies provide a powerful atlas of cis-regulatory elements that can be leveraged to better understand the transcriptional control mechanisms of the mouse SMG, discover novel genetic switches, and modulate tissue-specific gene expression in a targeted fashion.
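The abstract does not say how super-enhancers were called. A widely used approach, introduced with the ROSE tool, stitches constituent enhancers lying within 12.5 kb of each other, ranks the stitched regions by ChIP signal, and places the cutoff where a line of slope (max − min)/n is tangent to the ranked signal curve. The sketch below is a simplified, assumption-labeled reimplementation of that idea on a single chromosome; it omits several details of the published tool, and the regions and signals are toy values.

```python
from typing import List, Tuple

# (start, end, signal) on one chromosome; signal = summed ChIP-seq reads (toy values here)
Region = Tuple[int, int, float]


def stitch(regions: List[Region], gap: int = 12_500) -> List[Region]:
    """Merge enhancers separated by at most `gap` bp, summing their signal."""
    merged: List[Region] = []
    for start, end, signal in sorted(regions):
        if merged and start - merged[-1][1] <= gap:
            s, e, sig = merged[-1]
            merged[-1] = (s, max(e, end), sig + signal)
        else:
            merged.append((start, end, signal))
    return merged


def tangent_cutoff(signals: List[float]) -> float:
    """Signal at the point where a line of slope (max - min) / n touches the ranked curve."""
    if not signals:
        raise ValueError("no regions to rank")
    ys = sorted(signals)  # ascending, so rank i + 1 has signal ys[i]
    slope = (ys[-1] - ys[0]) / len(ys)
    # For a convex "hockey-stick" curve the tangent point minimizes y - slope * x.
    best = min(range(len(ys)), key=lambda i: ys[i] - slope * (i + 1))
    return ys[best]


def call_super_enhancers(regions: List[Region], gap: int = 12_500) -> List[Region]:
    """Stitched regions whose signal lies strictly above the tangent cutoff."""
    stitched = stitch(regions, gap)
    cutoff = tangent_cutoff([sig for _, _, sig in stitched])
    return [r for r in stitched if r[2] > cutoff]


if __name__ == "__main__":
    toy = [(i * 100_000, i * 100_000 + 1_000, 1.0) for i in range(8)]  # eight weak enhancers
    toy += [(1_000_000, 1_001_000, 5.0), (1_005_000, 1_006_000, 5.0)]  # 4 kb apart: stitched
    toy += [(2_000_000, 2_001_000, 50.0)]                                # one very strong element
    print(call_super_enhancers(toy))
    # [(1000000, 1006000, 10.0), (2000000, 2001000, 50.0)]
```

The two nearby elements are stitched into one region with signal 10.0; together with the strong 50.0 region it lies above the cutoff, while the eight weak enhancers fall at or below it.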

