Kozak sequence acts as a negative regulator for de novo transcription initiation of newborn coding sequences in the plant genome

Author(s):  
Takayuki Hata ◽  
Soichirou Satoh ◽  
Naoto Takada ◽  
Mitsuhiro Matsuo ◽  
Junichi Obokata

Abstract The manner in which newborn coding sequences and their transcriptional competency emerge during the process of gene evolution remains unclear. Here, we experimentally simulated eukaryotic gene origination processes by mimicking horizontal gene transfer events in the plant genome. We mapped the precise position of the transcription start sites (TSSs) of hundreds of newly introduced promoterless firefly luciferase (LUC) coding sequences in the genome of Arabidopsis thaliana cultured cells. The systematic characterization of the LUC-TSSs revealed that 80% of them occurred under the influence of endogenous promoters, while the remainder underwent de novo activation in the intergenic regions, starting from pyrimidine-purine dinucleotides. These de novo TSSs obeyed unexpected rules; they predominantly occurred ∼100 bp upstream of the LUC inserts and did not overlap with Kozak-containing putative open reading frames (ORFs). These features were the output of the immediate responses to the sequence insertions, rather than a bias in the screening of the LUC gene function. Regarding the wild-type genic TSSs, they appeared to have evolved to lack any ORFs in their vicinities. Therefore, the repulsion by the de novo TSSs of Kozak-containing ORFs described above might be the first selection gate for the occurrence and evolution of TSSs in the plant genome. Based on these results, we characterized the de novo type of TSS identified in the plant genome and discuss its significance in genome evolution.
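The two sequence features named in the abstract (a TSS starting at a pyrimidine-purine dinucleotide, and a TSS not overlapping a Kozak-containing putative ORF) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names, the 0-based coordinate convention (pyrimidine at the base just before the TSS, purine at the TSS itself), the minimum ORF length, and the simplified Kozak-like criterion (a purine at position -3 and G at position +4 relative to the A of ATG) are all assumptions made for illustration; the abstract does not state which Kozak definition was used.

```python
PYRIMIDINES = set("CT")
PURINES = set("AG")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_yr_start(seq: str, tss: int) -> bool:
    """True if the base before `tss` is a pyrimidine and the base at `tss`
    is a purine (0-based index; this position convention is an assumption)."""
    if tss < 1 or tss >= len(seq):
        return False
    s = seq.upper()
    return s[tss - 1] in PYRIMIDINES and s[tss] in PURINES


def has_kozak_like_context(seq: str, atg: int) -> bool:
    """Simplified, hypothetical Kozak-like test: purine at -3 and G at +4,
    counting the A of ATG as +1."""
    s = seq.upper()
    if atg < 3 or atg + 3 >= len(s):
        return False
    return s[atg - 3] in PURINES and s[atg + 3] == "G"


def kozak_orfs(seq: str, min_codons: int = 10) -> list:
    """Return (start, end) pairs, end exclusive and including the stop codon,
    for forward-strand ORFs whose ATG has a Kozak-like context."""
    s = seq.upper()
    found = []
    for atg in range(len(s) - 2):
        if s[atg:atg + 3] != "ATG" or not has_kozak_like_context(s, atg):
            continue
        # Walk in-frame codons after the ATG until the first stop codon.
        for pos in range(atg + 3, len(s) - 2, 3):
            if s[pos:pos + 3] in STOP_CODONS:
                if (pos - atg) // 3 >= min_codons:
                    found.append((atg, pos + 3))
                break
    return found


def tss_overlaps_orf(tss: int, orfs: list) -> bool:
    """True if the TSS position falls inside any of the given ORF intervals."""
    return any(start <= tss < end for start, end in orfs)
```

In this sketch a candidate TSS would count as consistent with the described rules when `is_yr_start` is true and `tss_overlaps_orf` is false for the ORFs returned by `kozak_orfs`. Only the forward strand is scanned, which is a further simplification.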


PLoS ONE ◽  
2021 ◽  
Vol 16 (6) ◽  
pp. e0252674
Author(s):  
Takayuki Hata ◽  
Naoto Takada ◽  
Chihiro Hayakawa ◽  
Mei Kazama ◽  
Tomohiro Uchikoba ◽  
...  

The manner in which inserted foreign coding sequences become transcriptionally activated and fixed in the plant genome is poorly understood. To examine such processes of gene evolution, we performed an artificial evolutionary experiment in Arabidopsis thaliana. As a model of gene-birth events, we introduced a promoterless coding sequence of the firefly luciferase (LUC) gene and established 386 T2-generation transgenic lines. Among them, we determined the individual LUC insertion loci in 76 lines and found that one-third of them were transcribed de novo even in the intergenic or inherently unexpressed regions. In the transcribed lines, transcription-related chromatin marks were detected across the newly activated transcribed regions. These results agreed with our previous findings in A. thaliana cultured cells under a similar experimental scheme. A comparison of the results of the T2-plant and cultured cell experiments revealed that the de novo-activated transcription concomitant with local chromatin remodelling was inheritable. During one-generation inheritance, it seems likely that the transcription activities of the LUC inserts trapped by the endogenous genes/transcripts became stronger, while those of de novo transcription in the intergenic/untranscribed regions became weaker. These findings may offer a clue for the elucidation of the mechanism by which inserted foreign coding sequences become transcriptionally activated and fixed in the plant genome.


Author(s):  
Takayuki Hata ◽  
Naoto Takada ◽  
Chihiro Hayakawa ◽  
Mei Kazama ◽  
Tomohiro Uchikoba ◽  
...  

Abstract The manner in which newborn genes become transcriptionally activated and fixed in the plant genome is poorly understood. To examine such processes of gene evolution, we performed an artificial evolutionary experiment in Arabidopsis thaliana. As a model of gene-birth events, we introduced a promoterless coding sequence of the firefly luciferase (LUC) gene and established 386 T2-generation transgenic lines. Among them, we determined the individual LUC insertion loci in 76 lines and found that one-third of them were transcribed de novo even in the intergenic or inherently unexpressed regions. In the transcribed lines, transcription-related epigenetic marks were detected across the newly activated transcribed regions. These results agreed with our previous findings in A. thaliana cultured cells under a similar experimental scheme. The comparison of the results of the T2-plant and cultured cell experiments revealed that the de novo-activated transcription caused by local chromatin remodelling was inheritable. During one-generation inheritance, it seems likely that the transcription activities of the LUC inserts trapped by the endogenous genes/transcripts became stronger, while those of de novo transcription in the intergenic/untranscribed regions became weaker. These findings may offer a clue for the elucidation of the mechanism via which newborn genes become transcriptionally activated and fixed in the plant genome.


2016 ◽  
Author(s):  
Jorge Ruiz-Orera ◽  
Pol Verdaguer-Grau ◽  
José Luis Villanueva-Cañas ◽  
Xavier Messeguer ◽  
M Mar Albà

Abstract There is accumulating evidence that some genes have originated de novo from previously non-coding genomic sequences. However, the processes underlying de novo gene birth are still enigmatic. In particular, the appearance of a new functional protein seems highly improbable unless there is already a pool of neutrally evolving peptides that can at some point acquire new functions. Here we show for the first time that such peptides not only exist but are prevalent among the translation products of mouse genes that lack homologues in rat and human. The data suggest that the translation of these peptides is due to the chance occurrence of open reading frames with a favorable codon composition. Our approach combines ribosome profiling experiments, proteomics data and non-synonymous and synonymous nucleotide polymorphism analysis. We propose that effectively neutral processes involving the expression of thousands of transcripts all the way down to proteins provide a basis for de novo gene evolution.


Author(s):  
Soichirou Satoh ◽  
Takayuki Hata ◽  
Naoto Takada ◽  
Makoto Tachikawa ◽  
Mitsuhiro Matsuo ◽  
...  

Abstract Horizontal gene transfer can occur between phylogenetically distant organisms, such as prokaryotes and eukaryotes. In these cases, how do the translocated genes acquire transcriptional competency in the alien genome environment? According to the conventional view, specific loci of the eukaryotic genome are thought to provide transcriptional competency to the incoming coding sequences. To examine this possibility, we randomly introduced promoterless luciferase (LUC)-coding sequences into the genome of Arabidopsis thaliana cultured cells and performed a genome-wide “transgene location vs. expression” scan. We found that one-third of the 4,504 mapped LUC genes were transcribed. However, only 10% of them were explained by conventional transcriptional fusions with the annotated genes. The remainder exhibited novel transcription that occurred independently of the chromatin configuration or transcriptional activity inherent to the given chromosomal locus; rather, their transcriptional activation occurred stochastically in about 30% of insertion events, independently of the integration sites. We termed this phenomenon integration-dependent stochastic transcriptional activation, a new type of response of the plant genome to incoming coding sequences. We discuss the possible roles of this phenomenon in the evolution of eukaryotic genomes.


2014 ◽  
Author(s):  
Dongying Gao ◽  
Yupeng Li ◽  
Brian Abernathy ◽  
Scott Jackson

Terminal-repeat retrotransposons in miniature (TRIMs) are structurally similar to long terminal repeat (LTR) retrotransposons except that they are extremely small and difficult to identify. Thus far, only a few TRIMs have been characterized in the euphyllophytes and the evolutionary and biological impacts and transposition mechanism of TRIMs are poorly understood. In this study, we combined de novo and homology-based methods to annotate TRIMs in 48 plant genome sequences, spanning land plants to algae. We found 156 TRIM families, 146 previously undescribed. Notably, we identified the first TRIMs in a lycophyte and non-vascular plants. The majority of the TRIM families were highly conserved and shared within and between plant families. Even though TRIMs contribute only a small fraction of any plant genome, they are enriched in or near genes and may play important roles in gene evolution. TRIMs were frequently organized into tandem arrays we called TA-TRIMs, another unique feature distinguishing them from LTR retrotransposons. Importantly, we identified putative autonomous retrotransposons that may mobilize specific TRIM elements and detected very recent transpositions of a TRIM in O. sativa. Overall, this comprehensive analysis of TRIMs across the entire plant kingdom provides insight into the evolution and conservation of TRIMs and the functional roles they may play in gene evolution.


Author(s):  
Hisayuki Kudo ◽  
Mitsuhiro Matsuo ◽  
Soichirou Satoh ◽  
Rei Hachisu ◽  
Masayuki Nakamura ◽  
...  

Abstract In gene-trap screening of plant genomes, promoterless reporter constructs are often expressed without trapping of annotated gene promoters. The molecular basis of this phenomenon, which has been interpreted as the trapping of cryptic promoters, is poorly understood. In this study, using Arabidopsis gene-trap lines in which a firefly luciferase (LUC) open reading frame (ORF) was expressed from intergenic regions, we found that cryptic promoter activation occurs by at least two different mechanisms: one is the capturing of pre-existing promoter-like chromatin marked by H3K4me3 and H2A.Z, and the other is the entirely new formation of promoter chromatin near the 5’ end of the inserted LUC ORF. To discriminate between these, we denoted the former mechanism as “cryptic promoter capturing”, and the latter one as “promoter de novo origination”. The latter finding raises the question of how the inserted LUC ORF sequence is involved in this phenomenon. To examine this, we performed a model experiment with chimeric LUC genes in transgenic plants. Using Arabidopsis psaH1 promoter–LUC constructs, we found that the functional core promoter region, where transcription start sites (TSSs) occur, is not determined simply by the upstream or core promoter sequences; rather, its positioning proximal to the inserted LUC ORF sequence was more critical. This result suggests that the insertion of the LUC ORF sequence alters the local distribution of TSSs in the plant genome. The possible impact of the two types of cryptic promoter activation mechanisms on plant genome evolution and endosymbiotic gene transfer is discussed.


2017 ◽  
Author(s):  
Jorge Ruiz-Orera ◽  
José Luis Villanueva-Cañas ◽  
William Blevins ◽  
M.Mar Albà

Recent years have witnessed the discovery of protein-coding genes which appear to have evolved de novo from previously non-coding sequences. This has changed the long-standing view that coding sequences can only evolve from other coding sequences. However, there are still many open questions regarding how new protein-coding sequences can arise from non-genic DNA. Two prerequisites for the birth of a new functional protein-coding gene are that the corresponding DNA fragment is transcribed and that it is also translated. Transcription is known to be pervasive in the genome, producing a large number of transcripts that do not correspond to conserved protein-coding genes, and which are usually annotated as long non-coding RNAs (lncRNAs). Recently, sequencing of ribosome-protected fragments (Ribo-Seq) has provided evidence that many of these transcripts are actually translated into small proteins. We have used mouse non-synonymous and synonymous variation data to estimate the strength of purifying selection acting on the translated open reading frames (ORFs). Whereas a subset of the lncRNAs are likely to be true protein-coding genes (and thus previously misclassified), the bulk of lncRNAs code for proteins which show variation patterns consistent with neutral evolution. We also show that the ORFs that have a more favorable, coding-like sequence composition are more likely to be translated than other ORFs in lncRNAs. This study provides strong evidence that there is a large and ever-changing reservoir of low-abundance proteins; some of these peptides may become useful and act as seeds for de novo gene evolution.


2019 ◽  
Vol 37 (4) ◽  
pp. 1165-1178 ◽  
Author(s):  
Paco Majic ◽  
Joshua L Payne

Abstract Regulatory networks control the spatiotemporal gene expression patterns that give rise to and define the individual cell types of multicellular organisms. In eumetazoa, distal regulatory elements called enhancers play a key role in determining the structure of such networks, particularly the wiring diagram of “who regulates whom.” Mutations that affect enhancer activity can therefore rewire regulatory networks, potentially causing adaptive changes in gene expression. Here, we use whole-tissue and single-cell transcriptomic and chromatin accessibility data from mouse to show that enhancers play an additional role in the evolution of regulatory networks: They facilitate network growth by creating transcriptionally active regions of open chromatin that are conducive to de novo gene evolution. Specifically, our comparative transcriptomic analysis with three other mammalian species shows that young, mouse-specific intergenic open reading frames are preferentially located near enhancers, whereas older open reading frames are not. Mouse-specific intergenic open reading frames that are proximal to enhancers are more highly and stably transcribed than those that are not proximal to enhancers or promoters, and they are transcribed in a limited diversity of cellular contexts. Furthermore, we report several instances of mouse-specific intergenic open reading frames proximal to promoters showing evidence of being repurposed enhancers. We also show that open reading frames gradually acquire interactions with enhancers over macroevolutionary timescales, helping integrate genes—those that have arisen de novo or by other means—into existing regulatory networks. Taken together, our results highlight a dual role of enhancers in expanding and rewiring gene regulatory networks.



