Decoding functional regulatory maps via genomic evolutionary footprints in 63 green plants

2018 ◽  
Author(s):  
Feng Tian ◽  
De-Chang Yang ◽  
Yu-Qi Meng ◽  
Jinpu Jin ◽  
Ge Gao

Abstract Systematic identification of functional transcriptional regulatory interactions is essential for understanding regulatory systems. Here, we first established genome-wide conservation landscapes for 63 green plants of seven lineages and then developed an algorithm, FunTFBS, to screen for functional regulatory elements and interactions by coupling the base-varied binding affinities of transcription factors (TFs) with the evolutionary footprints on their binding sites. Using FunTFBS and the conservation landscapes, we further identified over two million functional interactions for 21,346 TFs, charting functional regulatory maps of these 63 plants. Our work provides the plant community with valuable resources for decoding plant transcriptional regulatory systems and genome sequences.

Author(s):  
Feng Tian ◽  
De-Chang Yang ◽  
Yu-Qi Meng ◽  
Jinpu Jin ◽  
Ge Gao

Abstract With the goal of charting plant transcriptional regulatory maps (i.e. transcription factors (TFs), cis-elements and interactions between them), we have upgraded the TF-centred database PlantTFDB (http://planttfdb.cbi.pku.edu.cn/) to a plant regulatory data and analysis platform PlantRegMap (http://plantregmap.cbi.pku.edu.cn/) over the past three years. In this version, we updated the annotations for the previously collected TFs and set up a new section, ‘extended TF repertoires’ (TFext), to allow users prompt access to the TF repertoires of newly sequenced species. In addition to our regular TF updates, we are dedicated to updating the data on cis-elements and functional interactions between TFs and cis-elements. We established genome-wide conservation landscapes for 63 representative plants and then developed an algorithm, FunTFBS, to screen for functional regulatory elements and interactions by coupling the base-varied binding affinities of TFs with the evolutionary footprints on their binding sites. Using the FunTFBS algorithm and the conservation landscapes, we further identified over 20 million functional TF binding sites (TFBSs) and two million functional interactions for 21 346 TFs, charting the functional regulatory maps of these 63 plants. These resources are publicly available at PlantRegMap (http://plantregmap.cbi.pku.edu.cn/) and a cloud-based mirror (http://plantregmap.gao-lab.org/), providing the plant research community with valuable resources for decoding plant transcriptional regulatory systems.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Sebastian Carrasco Pro ◽  
Katia Bulekova ◽  
Brian Gregor ◽  
Adam Labadorf ◽  
Juan Ignacio Fuxman Bass

Abstract Single nucleotide variants (SNVs) located in transcriptional regulatory regions can result in gene expression changes that lead to adaptive or detrimental phenotypic outcomes. Here, we predict gain or loss of binding sites for 741 transcription factors (TFs) across the human genome. We calculated ‘gainability’ and ‘disruptability’ scores for each TF that represent the likelihood of binding sites being created or disrupted, respectively. We found that functional cis-eQTL SNVs are more likely to alter TF binding sites than rare SNVs in the human population. In addition, we show that cancer somatic mutations have different effects on TF binding sites from different TF families on a cancer-type basis. Finally, we discuss the relationship between these results and cancer mutational signatures. Altogether, we provide a blueprint to study the impact of SNVs derived from genetic variation or disease association on TF binding to gene regulatory regions.


2020 ◽  
Author(s):  
Alwyn Clark Go

Speciation occurs when reproductive barriers prevent the exchange of genetic information between individuals. A common form of reproductive barrier between species capable of interbreeding is hybrid sterility. Genomic incompatibilities between the divergent genomes of different species contribute to a reduction in hybrid fitness. These incompatibilities continue to accumulate after speciation; therefore, young divergent taxa with incomplete reproductive isolation are important for understanding the genetics leading to speciation. Here, I use two Drosophila subspecies pairs. The first is D. willistoni, consisting of D. w. willistoni and D. w. winge. The second subspecies pair is D. pseudoobscura, which is composed of D. p. pseudoobscura and D. p. bogotana. Both subspecies pairs are at the early stages of speciation and show incomplete reproductive isolation through unidirectional hybrid male sterility. In this thesis, I performed an exploratory genome-wide expression survey of D. willistoni using RNA-sequencing and determined the extent of regulatory divergence between the subspecies using allele-specific expression analysis. I found that misexpressed genes showed a degree of tissue specificity and that the sterile male hybrids had a higher proportion of misexpressed genes in the testes relative to the fertile hybrids. The analysis of regulatory divergence between this subspecies pair found a large proportion (66-70%) of genes with conserved regulatory elements. Of the genes showing evidence of regulatory divergence between subspecies, cis-regulatory divergence was more common than other types. In the D. pseudoobscura subspecies pair, I compared sequence and expression divergence and found no support for directional selection driving gene misexpression in their hybrids. Allele-specific expression analysis revealed that compensatory cis-trans mutations partly explained gene misexpression in the hybrids. 
The remaining hybrid misexpression occurs due to interacting gene networks or possible co-option of cis-regulatory elements by divergent trans-acting factors. Overall, the results of this thesis highlight the role of regulatory interactions in a hybrid genome and how these interactions could lead to hybrid breakdown by disrupting gene interaction networks.


Development ◽  
1993 ◽  
Vol 117 (2) ◽  
pp. 585-596 ◽  
Author(s):  
J.A. Langeland ◽  
S.B. Carroll

The hairy (h) gene is one of two pair-rule loci whose striped expression is directly regulated by combinations of gap proteins acting through discrete upstream regulatory fragments, which span several kilobases. We have undertaken a comparative study of the molecular biology of h pair-rule expression in order to identify conserved elements in this complex regulatory system, which should provide important clues concerning the mechanism of stripe formation. A molecular comparison of the h locus in Drosophila virilis and Drosophila melanogaster reveals a conserved overall arrangement of the upstream regulatory elements that control individual pair-rule stripes. We demonstrate that upstream fragments from D. virilis will direct the proper expression of stripes in D. melanogaster, indicating that these are true functional homologs of the stripe-producing D. melanogaster regulatory elements, and that the network of trans-acting proteins that act upon these regulatory elements is highly conserved. We also demonstrate that the spatial relationships between specific h stripes and selected gap proteins are highly conserved. We find several tracts of extensive nucleotide sequence conservation within homologous stripe-specific regulatory fragments, which have facilitated the identification of functional subelements within the D. melanogaster regulatory fragment for h stripe 5. Some of the conserved nucleotide tracts within this regulatory fragment contain consensus binding sites for potential trans-regulatory (gap and other) proteins, while many appear devoid of known binding sites. This comparative approach, coupled with the analysis of reporter gene expression in gap mutant embryos suggests that the Kr and gt proteins establish the anterior and posterior borders of h stripe 5, respectively, through spatial repression. Other, as yet unidentified, proteins are certain to play a role in stripe activation, presumably acting through other conserved sequence tracts.


Genes ◽  
2019 ◽  
Vol 10 (3) ◽  
pp. 209 ◽  
Author(s):  
Elizaveta Radion ◽  
Olesya Sokolova ◽  
Sergei Ryazansky ◽  
Pavel Komarov ◽  
Yuri Abramov ◽  
...  

Piwi-interacting RNAs (piRNAs) control transposable element (TE) activity in the germline. piRNAs are produced from single-stranded precursors transcribed from distinct genomic loci, enriched by TE fragments and termed piRNA clusters. The specific chromatin organization and transcriptional regulation of Drosophila germline-specific piRNA clusters ensure transcription and processing of piRNA precursors. TEs harbour various regulatory elements that could affect piRNA cluster integrity. One of such elements is the suppressor-of-hairy-wing (Su(Hw))-mediated insulator, which is harboured in the retrotransposon gypsy. To understand how insulators contribute to piRNA cluster activity, we studied the effects of transgenes containing gypsy insulators on local organization of endogenous piRNA clusters. We show that transgene insertions interfere with piRNA precursor transcription, small RNA production and the formation of piRNA cluster-specific chromatin, a hallmark of which is Rhino, the germline homolog of the heterochromatin protein 1 (HP1). The mutations of Su(Hw) restored the integrity of piRNA clusters in transgenic strains. Surprisingly, Su(Hw) depletion enhanced the production of piRNAs by the domesticated telomeric retrotransposon TART, indicating that Su(Hw)-dependent elements protect TART transcripts from piRNA processing machinery in telomeres. A genome-wide analysis revealed that Su(Hw)-binding sites are depleted in endogenous germline piRNA clusters, suggesting that their functional integrity is under strict evolutionary constraints.


PLoS Genetics ◽  
2021 ◽  
Vol 17 (8) ◽  
pp. e1009689
Author(s):  
Savannah D. Savadel ◽  
Thomas Hartwig ◽  
Zachary M. Turpin ◽  
Daniel L. Vera ◽  
Pei-Yau Lung ◽  
...  

Elucidating the transcriptional regulatory networks that underlie growth and development requires robust ways to define the complete set of transcription factor (TF) binding sites. Although TF-binding sites are known to be generally located within accessible chromatin regions (ACRs), pinpointing these DNA regulatory elements globally remains challenging. Current approaches primarily identify binding sites for a single TF (e.g. ChIP-seq), or globally detect ACRs but lack the resolution to consistently define TF-binding sites (e.g. DNase-seq, ATAC-seq). To address this challenge, we developed MNase-defined cistrome-Occupancy Analysis (MOA-seq), a high-resolution (< 30 bp), high-throughput, and genome-wide strategy to globally identify putative TF-binding sites within ACRs. As a proof of concept, we applied MOA-seq to developing maize ears and defined a cistrome of 145,000 MOA footprints (MFs). While a substantial majority (76%) of the known ATAC-seq ACRs intersected with the MFs, only a minority of MFs overlapped with the ATAC peaks, indicating that the majority of MFs were novel and not detected by ATAC-seq. MFs were associated with promoters and significantly enriched for TF-binding and long-range chromatin interaction sites, including for the well-characterized FASCIATED EAR4, KNOTTED1, and TEOSINTE BRANCHED1. Importantly, the MOA-seq strategy improved the spatial resolution of TF-binding prediction and allowed us to identify 215 motif families collectively distributed over more than 100,000 non-overlapping, putatively occupied binding sites across the genome. Our study presents a simple, efficient, and high-resolution approach to identify putative TF footprints and binding motifs genome-wide, to ultimately define a native cistrome atlas.


2018 ◽  
Author(s):  
Xinchen Wang ◽  
David B. Goldstein

Abstract Non-coding transcriptional regulatory elements are critical for controlling the spatiotemporal expression of genes. Here, we demonstrate that the number of bases in enhancers linked to a gene reflects its disease pathogenicity. Moreover, genes with redundant enhancer domains are depleted of cis-acting genetic variants that disrupt gene expression, and are buffered against the effects of disruptive non-coding mutations. Our results demonstrate that dosage-sensitive genes have evolved robustness to the disruptive effects of genetic variation by expanding their regulatory domains. This resolves a puzzle in the genetic literature about why disease genes are depleted of cis-eQTLs, suggesting that eQTL information may implicate the wrong genes at genome-wide association study loci, and establishes a framework for identifying non-coding regulatory variation with phenotypic consequences.


2013 ◽  
Vol 42 (5) ◽  
pp. 2833-2847 ◽  
Author(s):  
Peng Jiang ◽  
Mona Singh

Abstract Combinatorial interplay among transcription factors (TFs) is an important mechanism by which transcriptional regulatory specificity is achieved. However, despite the increasing number of TFs for which either binding specificities or genome-wide occupancy data are known, knowledge about cooperativity between TFs remains limited. To address this, we developed a computational framework for predicting genome-wide co-binding between TFs (CCAT, Combinatorial Code Analysis Tool), and applied it to Drosophila melanogaster to uncover cooperativity among TFs during embryo development. Using publicly available TF binding specificity data and DNaseI chromatin accessibility data, we first predicted genome-wide binding sites for 324 TFs across five stages of D. melanogaster embryo development. We then applied CCAT in each of these developmental stages, and identified from 19 to 58 pairs of TFs in each stage whose predicted binding sites are significantly co-localized. We found that nearby binding sites for pairs of TFs predicted to cooperate were enriched in regions bound in relevant ChIP experiments, and were more evolutionarily conserved than other pairs. Further, we found that TFs tend to be co-localized with other TFs in a dynamic manner across developmental stages. All generated data as well as source code for our front-to-end pipeline are available at http://cat.princeton.edu.


Genetics ◽  
2002 ◽  
Vol 161 (2) ◽  
pp. 793-801
Author(s):  
Wilailak Pooma ◽  
Christos Gersos ◽  
Erich Grotewold

Abstract The understanding of control of gene regulation in higher eukaryotes relies heavily on results derived from non-in vivo studies, but rarely can the significance of these approximations be established in vivo. Here, we investigated the effect of Mutator and Spm insertions on the expression of the flavonoid biosynthetic gene a1, independently regulated by the transcription factors C1 and P. The a1-mum2 and a1-m2 alleles carry Mu1 and Spm insertions, respectively, in a cis-element (ARE) of unknown function located between the P- and C1-binding sites. We show that the insertions of Mu1 and Spm similarly influence the expression of a1 controlled by C1 or P. The P-controlled a1 expression in a1-m2 is Spm dependent, and the mutant phenotype of a1-mum2 is suppressed in the pericarp in the absence of the autonomous MuDR element. Footprints within the ARE affect the regulation of a1 by C1 and P differently, providing evidence that these factors control a1 expression using distinct cis-acting regulatory elements. Together, our findings contribute significantly to one of the best-described plant regulatory systems, while stressing the need to complement with in vivo experiments current approaches used for the study of control of gene expression.


2020 ◽  
Vol 118 (2) ◽  
pp. e2021171118
Author(s):  
Gi Bae Kim ◽  
Ye Gao ◽  
Bernhard O. Palsson ◽  
Sang Yup Lee

A transcription factor (TF) is a sequence-specific DNA-binding protein that modulates the transcription of a set of particular genes, and thus regulates gene expression in the cell. TFs have commonly been predicted by analyzing sequence homology with the DNA-binding domains of TFs already characterized. Thus, TFs that do not show homologies with the reported ones are difficult to predict. Here we report the development of a deep learning-based tool, DeepTFactor, that predicts whether a protein in question is a TF. DeepTFactor uses a convolutional neural network to extract features of a protein. It showed high performance in predicting TFs of both eukaryotic and prokaryotic origins, resulting in F1 scores of 0.8154 and 0.8000, respectively. Analysis of the gradients of prediction score with respect to input suggested that DeepTFactor detects DNA-binding domains and other latent features for TF prediction. DeepTFactor predicted 332 candidate TFs in Escherichia coli K-12 MG1655. Among them, 84 candidate TFs belong to the y-ome, which is a collection of genes that lack experimental evidence of function. We experimentally validated the results of DeepTFactor prediction by further characterizing genome-wide binding sites of three predicted TFs, YqhC, YiaU, and YahB. Furthermore, we made available the list of 4,674,808 TFs predicted from 73,873,012 protein sequences in 48,346 genomes. DeepTFactor will serve as a useful tool for predicting TFs, which is necessary for understanding the regulatory systems of organisms of interest. We provide DeepTFactor as a stand-alone program, available at https://bitbucket.org/kaistsystemsbiology/deeptfactor.

