Multi-landmark alignment of genomic signals reveals conserved expression patterns across transcription start sites

2021 ◽  
Author(s):  
Jose M. G. Vilar ◽  
Leonor Saiz

The prevalent one-dimensional alignment of genomic signals to a reference landmark is a cornerstone of current methods to study transcription and its DNA-dependent processes but it is prone to mask potential relations among multiple DNA elements. We developed a systematic approach to align genomic signals to multiple locations simultaneously by expanding the dimensionality of the genomic-coordinate space. We analyzed transcription in human and uncovered a complex dependence on the relative position of neighboring transcription start sites (TSSs) that is consistently conserved among cell types. The dependence ranges from enhancement to suppression of transcription depending on the relative distances to the TSSs, their intragenic position, and the transcriptional activity of the gene. Our results reveal a conserved hierarchy of alternative TSS usage within a previously unrecognized level of genomic organization and provide a general methodology to analyze complex functional relationships among multiple types of DNA elements.
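The idea of aligning a signal to several landmarks at once can be illustrated with a minimal sketch. It is not the authors' implementation; it assumes a hypothetical per-position `signal` array and paired landmark positions `tss_a` and `tss_b`, and it averages the signal on a two-dimensional grid indexed by the offsets to each landmark. The function name, the window size, and the pairing of landmarks are all assumptions made for illustration.

```python
import numpy as np

def two_landmark_average(signal, tss_a, tss_b, half_window=3):
    """Average a 1-D signal on a 2-D grid of offsets (x - tss_a, x - tss_b).

    signal       : 1-D array of per-position values (hypothetical input).
    tss_a, tss_b : equal-length sequences of paired landmark positions.
    Returns (mean, counts), each of shape (2*half_window+1, 2*half_window+1).
    Grid cells that receive no observation are NaN in `mean`.
    """
    size = 2 * half_window + 1
    total = np.zeros((size, size))
    counts = np.zeros((size, size), dtype=int)
    for a, b in zip(tss_a, tss_b):
        # Visit positions within the window around the first landmark.
        lo = max(0, a - half_window)
        hi = min(len(signal), a + half_window + 1)
        for x in range(lo, hi):
            da, db = x - a, x - b
            # Keep the position only if it is also near the second landmark.
            if abs(db) <= half_window:
                total[da + half_window, db + half_window] += signal[x]
                counts[da + half_window, db + half_window] += 1
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = np.where(counts > 0, total / counts, np.nan)
    return mean, counts
```

A conventional one-dimensional alignment corresponds to collapsing this grid along one axis, which is where relations between the two landmarks would be averaged away.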

2020 ◽  
Vol 295 (12) ◽  
pp. 3990-4000 ◽  
Author(s):  
Sandeep Singh ◽  
Karol Szlachta ◽  
Arkadi Manukyan ◽  
Heather M. Raimer ◽  
Manikarna Dinda ◽  
...  

DNA double-stranded breaks (DSBs) are strongly associated with active transcription, and promoter-proximal pausing of RNA polymerase II (Pol II) is a critical step in transcriptional regulation. Mapping the distribution of DSBs along actively expressed genes and identifying the location of DSBs relative to pausing sites can provide mechanistic insights into transcriptional regulation. Using genome-wide DNA break mapping/sequencing techniques at single-nucleotide resolution in human cells, we found that DSBs are preferentially located around transcription start sites of highly transcribed and paused genes and that Pol II promoter-proximal pausing sites are enriched in DSBs. We observed that DSB frequency at pausing sites increases as the strength of pausing increases, regardless of whether the pausing sites are near or far from annotated transcription start sites. Inhibition of topoisomerase I and II by camptothecin and etoposide treatment, respectively, increased DSBs at the pausing sites as the concentrations of drugs increased, demonstrating the involvement of topoisomerases in DSB generation at the pausing sites. DNA breaks generated by topoisomerases are short-lived because of the religation activity of these enzymes, which these drugs inhibit; therefore, the observation of increased DSBs with increasing drug doses at pausing sites indicated active recruitment of topoisomerases to these sites. Furthermore, the enrichment and locations of DSBs at pausing sites were shared among different cell types, suggesting that Pol II promoter-proximal pausing is a common regulatory mechanism. Our findings support a model in which topoisomerases participate in Pol II promoter-proximal pausing and indicate that DSBs at pausing sites contribute to transcriptional activation.


2021 ◽  
Author(s):  
Jill E Moore ◽  
Xiao-Ou Zhang ◽  
Shaimae I Elhajjajy ◽  
Kaili Fan ◽  
Fairlie Reese ◽  
...  

Accurate transcription start site (TSS) annotations are essential for understanding transcriptional regulation and its role in human disease. Gene collections such as GENCODE contain annotations for tens of thousands of TSSs, but not all of these annotations are experimentally validated nor do they contain information on cell type-specific usage. Therefore, we sought to generate a collection of experimentally validated TSSs by integrating RNA Annotation and Mapping of Promoters for the Analysis of Gene Expression (RAMPAGE) data from 115 cell and tissue types, which resulted in a collection of approximately 50 thousand representative RAMPAGE peaks. These peaks were primarily proximal to GENCODE-annotated TSSs and were concordant with other transcription assays. Because RAMPAGE uses paired-end reads, we were then able to connect peaks to transcripts by analyzing the genomic positions of the 3' ends of read mates. Using this paired-end information, we classified the vast majority (37 thousand) of our RAMPAGE peaks as verified TSSs, updating TSS annotations for 20% of GENCODE genes. We also found that these updated TSS annotations were supported by epigenomic and other transcriptomic datasets. To demonstrate the utility of this RAMPAGE rPeak collection, we intersected it with the NHGRI/EBI GWAS catalog and identified new candidate GWAS genes. Overall, our work demonstrates the importance of integrating experimental data to further refine TSS annotations and provides a valuable resource for the biological community.


2020 ◽  
Vol 319 (2) ◽  
pp. G189-G196
Author(s):  
Michael P. Verzi ◽  
Ramesh A. Shivdasani

To fulfill the lifelong need to supply diverse epithelial cells, intestinal stem cells (ISCs) rely on executing accurate transcriptional programs. This review addresses the mechanisms that control those programs. Genes that define cell behaviors and identities are regulated principally through thousands of dispersed enhancers, each individually <1 kb long and positioned from a few to hundreds of kilobases away from transcription start sites, upstream or downstream from coding genes or within introns. Wnt, Notch, and other epithelial control signals feed into these cis-regulatory DNA elements, which are also common loci of polymorphisms and mutations that confer disease risk. Cell-specific gene activity requires promoters to interact with the correct combination of signal-responsive enhancers. We review the current state of knowledge in ISCs regarding active enhancers, the nucleosome modifications that may enable appropriate and hinder inappropriate enhancer-promoter contacts, and the roles of lineage-restricted transcription factors.


2014 ◽  
Vol 934 ◽  
pp. 182-187
Author(s):  
Qiu Fu Shan ◽  
Ji Hua Feng ◽  
Ying Lu ◽  
Zen Hui Shan ◽  
Pan Feng Chen

While studying nucleosome positioning during Drosophila embryogenesis, we found significant differences in nucleosome positioning among genes with different expression patterns. In contrast to previous studies, genes with restricted expression patterns incorporate H2A.Z into the -1 nucleosome upstream of transcription start sites (TSSs). Interestingly, this nucleosome arrangement at genes with restricted expression patterns is similar to the arrangement found at yeast genes.


2021 ◽  
Author(s):  
Juexiao Zhou ◽  
Bin Zhang ◽  
Haoyang Li ◽  
Longxi Zhou ◽  
Zhongxiao Li ◽  
...  

Abstract The accurate annotation of transcription start sites (TSSs) and their usage is critical for the mechanistic understanding of gene regulation under different biological contexts. To this end, on the one hand, specific high-throughput experimental technologies have been developed to capture TSSs in a genome-wide manner. On the other hand, various computational tools have been developed for in silico prediction of TSSs based solely on genomic sequences. Most of these computational tools cast the problem as a binary classification task on a balanced dataset and thus produce a large number of false-positive predictions when applied at the genome scale. To address these issues, we present DeeReCT-TSS, a deep-learning-based method that is capable of identifying TSSs across the whole genome based on both DNA sequences and conventional RNA-seq data. We show that by effectively incorporating these two sources of information, DeeReCT-TSS significantly outperforms other solely sequence-based methods on the precise annotation of TSSs used in different cell types. Furthermore, we develop a meta-learning-based extension for simultaneous TSS annotation on 10 cell types, which enables the identification of cell-type-specific TSSs. Finally, we demonstrate the high precision of DeeReCT-TSS on two independent datasets from the ENCODE project by correlating our predicted TSSs with experimentally defined TSS chromatin states. Our application, pre-trained models and data are available at https://github.com/JoshuaChou2018/DeeReCT-TSS_release.
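The point about balanced training data and genome-scale false positives can be made concrete with a small calculation. The sketch below uses Bayes' rule to compute precision from sensitivity, specificity, and the fraction of true positives; the numbers (95% sensitivity and specificity, one true TSS per 10,000 candidate positions) are illustrative assumptions, not figures from the paper.

```python
def precision(sensitivity, specificity, prevalence):
    """Fraction of positive calls that are true positives."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Balanced benchmark: half of the examples are true TSSs (assumed).
balanced = precision(0.95, 0.95, 0.5)

# Genome-scale scan: roughly 1 true TSS per 10,000 positions (assumed).
genome = precision(0.95, 0.95, 1e-4)

print(f"balanced precision: {balanced:.3f}")
print(f"genome-scale precision: {genome:.4f}")
```

Under these assumed numbers the same classifier goes from 95% precision on the balanced set to well under 1% on the genome-scale scan, because false positives from the vast majority of non-TSS positions swamp the true calls.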


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Benpeng Miao ◽  
Shuhua Fu ◽  
Cheng Lyu ◽  
Paul Gontarz ◽  
Ting Wang ◽  
...  

Abstract Background: Transposable elements (TEs) are a significant component of eukaryotic genomes and play essential roles in genome evolution. Mounting evidence indicates that TEs are highly transcribed in early embryo development and contribute to distinct biological functions and tissue morphology. Results: We examine the epigenetic dynamics of mouse TEs during the development of five tissues: intestine, liver, lung, stomach, and kidney. We found that TEs are associated with over 20% of open chromatin regions during development. Close to half of these accessible TEs are only activated in a single tissue and a specific developmental stage. Most accessible TEs are rodent-specific. Across these five tissues, 453 accessible TEs are found to create the transcription start sites of downstream genes in mouse, including 117 protein-coding genes and 144 lincRNA genes, 93.7% of which are mouse-specific. Species-specific TE-derived transcription start sites are found to drive the expression of tissue-specific genes and change their tissue-specific expression patterns during evolution. Conclusion: Our results suggest that TE insertions increase the regulatory potential of the genome, and that some TEs have been domesticated to become crucial components of genes and to regulate tissue-specific expression during mouse tissue development.


Genetics ◽  
2021 ◽  
Author(s):  
Lily Li ◽  
Rachel Waymack ◽  
Mario Gad ◽  
Zeba Wunderlich

Abstract Proper development depends on precise spatiotemporal gene expression patterns. Most developmental genes are regulated by multiple enhancers and often by multiple core promoters that generate similar transcripts. We hypothesize that multiple promoters may be required either because enhancers prefer a specific promoter or because multiple promoters serve as a redundancy mechanism. To test these hypotheses, we studied the expression of the knirps locus in the early Drosophila melanogaster embryo, which is mediated by multiple enhancers and core promoters. We found that one of these promoters resembles a typical “sharp” developmental promoter, while the other resembles a “broad” promoter usually associated with housekeeping genes. Using synthetic reporter constructs, we found that some, but not all, enhancers in the locus show a preference for one promoter, indicating that promoters provide both redundancy and specificity. By analyzing the reporter dynamics, we identified specific burst properties during the transcription process, namely burst size and frequency, that are most strongly tuned by the combination of promoter and enhancer. Using locus-sized reporters, we discovered that enhancers with no promoter preference in a synthetic setting have a preference in the locus context. Our results suggest that the presence of multiple promoters in a locus is due both to enhancer preference and a need for redundancy and that “broad” promoters with dispersed transcription start sites are common among developmental genes. They also imply that it can be difficult to extrapolate expression measurements from synthetic reporters to the locus context, where other variables shape a gene’s overall expression pattern.


2020 ◽  
Author(s):  
Sivan Leviyang

Abstract Stimulation of cells by type I interferons (IFN) leads to the differential expression of hundreds of genes known as interferon-stimulated genes (ISGs). The collection of ISGs differentially expressed under IFN stimulation, referred to as the IFN signature, varies across cell types. Non-canonical IFN signaling has been clearly associated with variation in IFN signature across cell types, but the existence of variation in canonical signaling and its impact on IFN signatures is less clear. The canonical IFN signaling pathway involves binding of the transcription factor ISGF3 to IFN-stimulated response elements (ISREs). We examined ISRE binding patterns under IFN stimulation across six cell types using existing ChIPseq datasets available on the GEO and ENCODE databases. We find that ISRE binding is cell specific, particularly for ISREs distal to transcription start sites, potentially associated with enhancer elements, while ISRE binding in promoter regions is more conserved. Given the variation of ISRE binding across cell types, we investigated associations between the cell type, homeostatic state and ISRE binding patterns. Taking a machine learning approach and using existing ATACseq and ChIPseq datasets available on GEO and ENCODE, we show that the epigenetic state of an ISRE locus at homeostasis and the DNA sequence of the ISRE locus are predictive of the ISRE's binding under IFN stimulation in a cell-type-specific manner, particularly for ISREs distal to transcription start sites.


2017 ◽  
Author(s):  
Alejandro Reyes ◽  
Wolfgang Huber

Most human genes have multiple transcription start and polyadenylation sites, as well as alternatively spliced exons. Although such transcript isoform diversity contributes to the differentiation between cell types, the importance of contributions from the different isoform-generating processes is unclear. To address this question, we used 798 samples from the Genotype-Tissue Expression (GTEx) project to investigate cell-type-dependent differences in exon usage of over 18,000 protein-coding genes in 23 cell types. We found tissue-dependent isoform usage in about half of expressed genes. Overall, tissue-dependent splicing accounted only for a minority of tissue-dependent exon usage, most of which was consistent with alternative transcription start and termination sites. We verified this result on a second, independent dataset, Cap Analysis of Gene Expression (CAGE) data from the FANTOM consortium, which confirmed widespread tissue-dependent usage of alternative transcription start sites. Our analysis identifies transcription start and termination sites as the principal drivers of isoform diversity across tissues. Moreover, our results indicate that most tissue-dependent splicing involves untranslated exons and therefore may not have consequences at the proteome level.


2021 ◽  
Author(s):  
Lily Li ◽  
Rachel Waymack ◽  
Mario Elabd ◽  
Zeba Wunderlich

Proper development depends on precise spatiotemporal gene expression patterns. Most genes are regulated by multiple enhancers and often by multiple core promoters that generate similar transcripts. We hypothesize that these multiple promoters may be required either because enhancers prefer a specific promoter or because multiple promoters serve as a redundancy mechanism. To test these hypotheses, we studied the expression of the knirps locus in the early Drosophila melanogaster embryo, which is mediated by multiple enhancers and core promoters. We found that one of these promoters resembles a typical sharp developmental promoter, while the other resembles a broad promoter usually associated with housekeeping genes. Using synthetic reporter constructs, we found that some, but not all, enhancers in the locus show a preference for one promoter. By analyzing the dynamics of these reporters, we identified specific burst properties during the transcription process, namely burst size and frequency, that are most strongly tuned by the specific combination of promoter and enhancer. Using locus-sized reporters, we discovered that even enhancers that show no promoter preference in a synthetic setting have a preference in the locus context. Our results suggest that the presence of multiple promoters in a locus is due both to enhancer preference and to a need for redundancy and that broad promoters with dispersed transcription start sites are common among developmental genes. Our results also imply that it can be difficult to extrapolate expression measurements from synthetic reporters to the locus context, where many variables shape a gene's overall expression pattern.

