De novo emergence of adaptive membrane proteins from thymine-rich genomic sequences

2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Nikolaos Vakirlis ◽  
Omer Acar ◽  
Brian Hsu ◽  
Nelson Castilho Coelho ◽  
S. Branden Van Oss ◽  
...  

Abstract: Recent evidence demonstrates that novel protein-coding genes can arise de novo from non-genic loci. This evolutionary innovation is thought to be facilitated by the pervasive translation of non-genic transcripts, which exposes a reservoir of variable polypeptides to natural selection. Here, we systematically characterize how these de novo emerging coding sequences impact fitness in budding yeast. Disruption of emerging sequences is generally inconsequential for fitness in the laboratory and in natural populations. Overexpression of emerging sequences, however, is enriched in adaptive fitness effects compared to overexpression of established genes. We find that adaptive emerging sequences tend to encode putative transmembrane domains, and that thymine-rich intergenic regions harbor a widespread potential to produce transmembrane domains. These findings, together with in-depth examination of the de novo emerging YBR196C-A locus, suggest a novel evolutionary model whereby adaptive transmembrane polypeptides emerge de novo from thymine-rich non-genic regions and subsequently accumulate changes molded by natural selection.
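
One piece of intuition for the link between thymine-rich DNA and transmembrane potential comes from the standard genetic code: every codon with thymine in the second position encodes a hydrophobic residue (F, L, I, M or V). The sketch below is only an illustration of that property, not the authors' analysis. The function names, the `HYDROPHOBIC` residue set, and the single-frame translation are all assumptions made for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption: a simple, illustrative choice of "hydrophobic" residues.
HYDROPHOBIC = set("AVILMFWC")


def residues_with_second_base(base):
    """Return the set of residues encoded by codons whose middle base is `base`."""
    return {aa for codon, aa in CODON_TABLE.items() if codon[1] == base}


def translate(dna):
    """Translate `dna` in frame 0; stop codons appear as '*'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def hydrophobic_fraction(dna):
    """Fraction of translated residues (stops excluded) that are hydrophobic."""
    peptide = translate(dna).replace("*", "")
    if not peptide:
        return 0.0
    return sum(aa in HYDROPHOBIC for aa in peptide) / len(peptide)


if __name__ == "__main__":
    print(sorted(residues_with_second_base("T")))
    print(translate("TTTCTTATTGTT"), hydrophobic_fraction("TTTCTTATTGTT"))
```

A real assessment of transmembrane potential would use a dedicated predictor and consider all reading frames; this sketch only shows why thymine content and hydrophobicity are connected at the level of the codon table.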

2019 ◽  
Author(s):  
Nikolaos Vakirlis ◽  
Omer Acar ◽  
Brian Hsu ◽  
Nelson Castilho Coelho ◽  
S. Branden Van Oss ◽  
...  

Summary: Recent evidence demonstrates that novel protein-coding genes can arise de novo from intergenic loci. This evolutionary innovation is thought to be facilitated by the pervasive translation of intergenic transcripts, which exposes a reservoir of variable polypeptides to natural selection. Do intergenic translation events yield polypeptides with useful biochemical capacities? The answer to this question remains controversial. Here, we systematically characterized how de novo emerging coding sequences impact fitness. In budding yeast, overexpression of these sequences was enriched in beneficial effects, while their disruption was generally inconsequential. We found that beneficial emerging sequences have a strong tendency to encode putative transmembrane proteins, which appears to stem from a cryptic propensity for transmembrane signals throughout thymine-rich intergenic regions of the genome. These findings suggest that novel genes with useful biochemical capacities, such as transmembrane domains, tend to evolve de novo within intergenic loci that already harbored a blueprint for these capacities.


2015 ◽  
Vol 370 (1678) ◽  
pp. 20140332 ◽  
Author(s):  
Aoife McLysaght ◽  
Daniele Guerzoni

The origin of novel protein-coding genes de novo was once considered so improbable as to be impossible. In less than a decade, and especially in the last five years, this view has been overturned by extensive evidence from diverse eukaryotic lineages. There is now evidence that this mechanism has contributed a significant number of genes to genomes of organisms as diverse as Saccharomyces, Drosophila, Plasmodium, Arabidopsis and human. From simple beginnings, these genes have in some instances acquired complex structure, regulated expression and important functional roles. New genes are often thought of as dispensable late additions; however, some recent de novo genes in human can play a role in disease. Rather than an extremely rare occurrence, it is now evident that there is a relatively constant trickle of proto-genes released into the testing ground of natural selection. It is currently unknown whether de novo genes arise primarily through an ‘RNA-first’ or ‘ORF-first’ pathway. Either way, evolutionary tinkering with this pool of genetic potential may have been a significant player in the origins of lineage-specific traits and adaptations.


2017 ◽  
Author(s):  
Matthieu Legendre ◽  
Elisabeth Fabre ◽  
Olivier Poirot ◽  
Sandra Jeudy ◽  
Audrey Lartigue ◽  
...  

Abstract: With DNA genomes up to 2.5 Mb packed in particles of bacterium-like shape and dimension, the first two Acanthamoeba-infecting Pandoraviruses remained the most spectacular viruses since their description in 2013. Our isolation of three new strains from distant locations and environments allowed us to perform the first comparative genomics analysis of the emerging worldwide-distributed Pandoraviridae family. Thorough annotation of the genomes, combining transcriptomic, proteomic, and bioinformatic analyses, led to the discovery of many non-coding transcripts while significantly reducing the former set of predicted protein-coding genes. We found that the Pandoraviridae exhibit an open pan-genome, the enormous size of which is not adequately explained by gene duplications or horizontal transfers. As most of the strain-specific genes have no extant homolog and exhibit statistical features comparable to intergenic regions, we suggest that de novo gene creation is a strong component in the evolution of the giant Pandoravirus genomes.


2018 ◽  
Author(s):  
Matthieu Legendre ◽  
Jean-Marie Alempic ◽  
Nadège Philippe ◽  
Audrey Lartigue ◽  
Sandra Jeudy ◽  
...  

Abstract: With genomes of up to 2.7 Mb propagated in µm-long oblong particles and initially predicted to encode more than 2000 proteins, members of the Pandoraviridae family display the most extreme features of the known viral world. The mere existence of such giant viruses raises fundamental questions about their origin and the processes governing their evolution. A previous analysis of six newly available isolates, independently confirmed by a study including 3 others, established that the Pandoraviridae pan-genome is open, meaning that each new strain exhibits protein-coding genes not previously identified in other family members. With an average increment of about 60 proteins, the gene repertoire shows no sign of reaching a limit and remains largely coding for proteins without recognizable homologs in other viruses or cells (ORFans). To explain these results, we proposed that most new protein-coding genes were created de novo, from pre-existing non-coding regions of the G+C rich pandoravirus genomes. The comparison of the gene content of a new isolate, P. celtis, closely related (96% identical genome) to the previously described P. quercus is now used to test this hypothesis by studying genomic changes in a microevolution range. Our results confirm that the differences between these two similar gene contents mostly consist of protein-coding genes without known homologs (ORFans), with statistical signatures close to that of intergenic regions. These newborn proteins are under slight negative selection, perhaps to maintain stable folds and prevent protein aggregation pending the eventual emergence of fitness-increasing functions. Our study also unraveled several insertion events mediated by a transposase of the hAT family, 3 copies of which are found in P. celtis and are presumably active. Members of the Pandoraviridae are presently the first viruses known to encode this type of transposase.


2016 ◽  
Author(s):  
Benjamin D Kaehler ◽  
Von Bing Yap ◽  
Gavin A Huttley

Estimation of natural selection on protein-coding sequences is a key comparative genomics approach for de novo prediction of lineage specific adaptations. Selective pressure is measured on a per-gene basis by comparing the rate of non-synonymous substitutions to the rate of neutral evolution, typically assumed to be the rate of synonymous substitutions. All published codon substitution models have been time-reversible and thus assume that sequence composition does not change over time. We previously demonstrated that if time-reversible DNA substitution models are applied blindly in the presence of changing sequence composition, the number of substitutions is systematically biased towards overestimation. We extend these findings to the case of codon substitution models and further demonstrate that the ratio of non-synonymous to synonymous rates of substitution tends to be underestimated over three data sets of insects, mammals, and vertebrates. Our basis for comparison is a non-stationary codon substitution model that allows sequence composition to change. Model selection and model fit results demonstrate that our new model tends to fit the data better. Direct measurement of non-stationarity shows that bias in estimates of natural selection and genetic distance increases with the degree of violation of the stationarity assumption. Additionally, inferences drawn under time-reversible models are systematically affected by compositional divergence. As genomic sequences accumulate at an accelerating rate, the importance of accurate de novo estimation of natural selection increases. Our results establish that our new model provides a more robust perspective on this fundamental quantity.


2015 ◽  
Vol 28 (11) ◽  
pp. 1198-1215 ◽  
Author(s):  
Lida Derevnina ◽  
Sebastian Chin-Wo-Reyes ◽  
Frank Martin ◽  
Kelsey Wood ◽  
Lutz Froenicke ◽  
...  

Peronospora tabacina is an obligate biotrophic oomycete that causes blue mold or downy mildew on tobacco (Nicotiana tabacum). It is an economically important disease occurring frequently in tobacco-growing regions worldwide. We sequenced and characterized the genomes of two P. tabacina isolates and mined them for pathogenicity-related proteins and effector-encoding genes. De novo assembly of the genomes using Illumina reads resulted in 4,016 (63.1 Mb, N50 = 79 kb) and 3,245 (55.3 Mb, N50 = 61 kb) scaffolds for isolates 968-J2 and 968-S26, respectively, with an estimated genome size of 68 Mb. The mitochondrial genome has a similar size (approximately 43 kb) and structure to those of other oomycetes, plus several minor unique features. Repetitive elements, primarily retrotransposons, make up approximately 24% of the nuclear genome. Approximately 18,000 protein-coding gene models were predicted. Mining the secretome revealed approximately 120 candidate RxLR, six CRN (candidate effectors that elicit crinkling and necrosis), and 61 WY domain–containing proteins. Candidate RxLR effectors were shown to be predominantly undergoing diversifying selection, with approximately 57% located in variable gene-sparse regions of the genome. Aligning the P. tabacina genome to Hyaloperonospora arabidopsidis and Phytophthora spp. revealed a high level of synteny. Blocks of synteny show gene inversions and instances of expansion in intergenic regions. Extensive rearrangements of the gene-rich genomic regions do not appear to have occurred during the evolution of these highly variable pathogens. These assemblies provide the basis for studies of virulence in this and other downy mildew pathogens.
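
The assembly statistics above are reported as N50 values (79 kb and 61 kb). As a reminder of what that figure measures, here is a minimal sketch of the usual N50 calculation: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The function name and the toy scaffold lengths are hypothetical, and tools differ slightly in tie-handling conventions, so this is not necessarily how the authors' software computed it.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Scaffolds are visited from longest to shortest; the N50 is the length
    of the scaffold at which the cumulative sum first reaches at least
    half of the total assembly length.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("n50() requires at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # integer-safe form of running >= total / 2
            return length


if __name__ == "__main__":
    # Toy lengths for illustration only; not data from the paper.
    toy_scaffolds = [8, 8, 4, 3, 3, 2, 2, 2]
    print(n50(toy_scaffolds))
```

Because N50 depends only on the length distribution, two assemblies with different scaffold counts (here 4,016 and 3,245) can be compared on contiguity without reference to their content.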


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e9552
Author(s):  
Furrukh Mehmood ◽  
Abdullah ◽  
Zartasha Ubaid ◽  
Iram Shahzadi ◽  
Ibrar Ahmed ◽  
...  

Species of the genus Nicotiana (Solanaceae), commonly referred to as tobacco plants, are often cultivated as non-food crops and garden ornamentals. In addition to the worldwide production of tobacco leaves, they are also used as evolutionary model systems due to their complex development history tangled by polyploidy and hybridization. Here, we assembled the plastid genomes of five tobacco species: N. knightiana, N. rustica, N. paniculata, N. obtusifolia and N. glauca. De novo assembled tobacco plastid genomes had the typical quadripartite structure, consisting of a pair of inverted repeat (IR) regions (25,323–25,369 bp each) separated by a large single-copy (LSC) region (86,510–86,716 bp) and a small single-copy (SSC) region (18,441–18,555 bp). Comparative analyses of Nicotiana plastid genomes with currently available Solanaceae genome sequences showed similar GC and gene content, codon usage, simple sequence and oligonucleotide repeats, RNA editing sites, and substitutions. We identified 20 highly polymorphic regions, mostly belonging to intergenic spacer regions (IGS), which could be suitable for the development of robust and cost-effective markers for inferring the phylogeny of the genus Nicotiana and family Solanaceae. Our comparative plastid genome analysis revealed that the maternal parent of the tetraploid N. rustica was the common ancestor of N. paniculata and N. knightiana, and the latter species is more closely related to N. rustica. Relaxed molecular clock analyses estimated that the speciation event between N. rustica and N. knightiana occurred 0.56 Ma (HPD 0.65–0.46). Biogeographical analysis supported a south-to-north range expansion and diversification for N. rustica and related species, where N. undulata and N. paniculata evolved in North/Central Peru, while N. rustica developed in Southern Peru and separated from N. knightiana, which adapted to the Southern coastal climatic regimes. 
We further inspected selective pressure on protein-coding genes among tobacco species to determine if this adaptation process affected the evolution of plastid genes. These analyses indicate that four genes involved in different plastid functions, including DNA replication (rpoA) and photosynthesis (atpB, ndhD and ndhF), came under positive selective pressure as a result of specific environmental conditions. Genetic mutations in these genes might have contributed to better survival and superior adaptations during the evolutionary history of tobacco species.


2016 ◽  
Author(s):  
Zoe June Assaf ◽  
Susanne Tilk ◽  
Jane Park ◽  
Mark L. Siegal ◽  
Dmitri A. Petrov

Abstract: Mutations provide the raw material of evolution, and thus our ability to study evolution depends fundamentally on whether we have precise measurements of mutational rates and patterns. Here we explore the rates and patterns of mutations using i) de novo mutations from Drosophila melanogaster mutation accumulation lines and ii) polymorphisms segregating at extremely low frequencies. The first, mutation accumulation (MA) lines, are the product of maintaining flies in tiny populations for many generations, therefore rendering natural selection ineffective and allowing new mutations to accrue in the genome. In addition to generating a novel dataset of sequenced MA lines, we perform a meta-analysis of all published MA studies in D. melanogaster, which allows more precise estimates of mutational patterns across the genome. In the second half of this work, we identify polymorphisms segregating at extremely low frequencies using several publicly available population genomic data sets from natural populations of D. melanogaster. Extremely rare polymorphisms are difficult to detect with high confidence because they are hard to distinguish from sequencing error; however, a dataset of true rare polymorphisms would allow the quantification of mutational patterns. This is because rare polymorphisms, much like de novo mutations, are on average younger and also relatively unaffected by the filter of natural selection. We identify a high-quality set of ~70,000 rare polymorphisms, fully validated with resequencing, and use this dataset to measure mutational patterns in the genome. This includes identifying a high rate of multi-nucleotide mutation events at both short (~5 bp) and long (~1 kb) genomic distances, showing that mutation drives GC content lower in already GC-poor regions, and finding that the context-dependency of the mutation spectrum predicts long-term evolutionary patterns at four-fold synonymous sites. 
We also show that de novo mutations from independent mutation accumulation experiments display similar patterns of single nucleotide mutation, and match well the patterns of mutation found in natural populations.

