Genome Sequence and Architecture of the Tobacco Downy Mildew Pathogen Peronospora tabacina

2015 ◽  
Vol 28 (11) ◽  
pp. 1198-1215 ◽  
Author(s):  
Lida Derevnina ◽  
Sebastian Chin-Wo-Reyes ◽  
Frank Martin ◽  
Kelsey Wood ◽  
Lutz Froenicke ◽  
...  

Peronospora tabacina is an obligate biotrophic oomycete that causes blue mold or downy mildew on tobacco (Nicotiana tabacum). It is an economically important disease occurring frequently in tobacco-growing regions worldwide. We sequenced and characterized the genomes of two P. tabacina isolates and mined them for pathogenicity-related proteins and effector-encoding genes. De novo assembly of the genomes using Illumina reads resulted in 4,016 (63.1 Mb, N50 = 79 kb) and 3,245 (55.3 Mb, N50 = 61 kb) scaffolds for isolates 968-J2 and 968-S26, respectively, with an estimated genome size of 68 Mb. The mitochondrial genome has a similar size (approximately 43 kb) and structure to those of other oomycetes, plus several minor unique features. Repetitive elements, primarily retrotransposons, make up approximately 24% of the nuclear genome. Approximately 18,000 protein-coding gene models were predicted. Mining the secretome revealed approximately 120 candidate RxLR, six CRN (candidate effectors that elicit crinkling and necrosis), and 61 WY domain–containing proteins. Candidate RxLR effectors were shown to be predominantly undergoing diversifying selection, with approximately 57% located in variable gene-sparse regions of the genome. Aligning the P. tabacina genome to Hyaloperonospora arabidopsidis and Phytophthora spp. revealed a high level of synteny. Blocks of synteny show gene inversions and instances of expansion in intergenic regions. Extensive rearrangements of the gene-rich genomic regions do not appear to have occurred during the evolution of these highly variable pathogens. These assemblies provide the basis for studies of virulence in this and other downy mildew pathogens.
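The assembly statistics above are summarized with N50 values (79 kb and 61 kb). As a minimal sketch of how that metric is conventionally defined, and not the authors' actual pipeline, the function below returns the length of the scaffold at which the cumulative sum of scaffold lengths, sorted from longest to shortest, first reaches half of the total assembly size. The function name `n50` and the toy lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    N50 is the length L such that scaffolds of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold downward, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the midpoint.
        if 2 * running >= total:
            return length


# Toy example with made-up scaffold lengths (not data from the paper).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_lengths)
```

With real data, the lengths would typically be read from the assembly FASTA or its index before being passed to the function.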

2017 ◽  
Author(s):  
Matthieu Legendre ◽  
Elisabeth Fabre ◽  
Olivier Poirot ◽  
Sandra Jeudy ◽  
Audrey Lartigue ◽  
...  

Abstract: With DNA genomes up to 2.5 Mb packed in particles of bacterium-like shape and dimension, the first two Acanthamoeba-infecting Pandoraviruses remained the most spectacular viruses since their description in 2013. Our isolation of three new strains from distant locations and environments allowed us to perform the first comparative genomics analysis of the emerging worldwide-distributed Pandoraviridae family. Thorough annotation of the genomes combining transcriptomic, proteomic, and bioinformatic analyses led to the discovery of many non-coding transcripts while significantly reducing the former set of predicted protein-coding genes. We found that the Pandoraviridae exhibit an open pan genome, the enormous size of which is not adequately explained by gene duplications or horizontal transfers. As most of the strain specific genes have no extant homolog and exhibit statistical features comparable to intergenic regions, we suggest that de novo gene creation is a strong component in the evolution of the giant Pandoravirus genomes.


2017 ◽  
Author(s):  
John P. Lloyd ◽  
Zing Tsung-Yeh Tsai ◽  
Rosalie P. Sowers ◽  
Nicholas L. Panchy ◽  
Shin-Han Shiu

Abstract: With advances in transcript profiling, the presence of transcriptional activities in intergenic regions has been well established. However, whether intergenic expression reflects transcriptional noise or activity of novel genes remains unclear. We identified intergenic transcribed regions (ITRs) in 15 diverse flowering plant species and found that the amount of intergenic expression correlates with genome size, a pattern that could be expected if intergenic expression is largely nonfunctional. To further assess the functionality of ITRs, we first built machine learning classifiers using Arabidopsis thaliana as a model that accurately distinguish functional sequences (phenotype genes) and likely nonfunctional ones (pseudogenes and unexpressed intergenic regions) by integrating 93 biochemical, evolutionary, and sequence-structure features. Next, by applying the models genome-wide, we found that 4,427 ITRs (38%) and 796 annotated ncRNAs (44%) had features significantly similar to benchmark protein-coding or RNA genes and thus were likely parts of functional genes. Approximately 60% of ITRs and ncRNAs were more similar to nonfunctional sequences and were likely transcriptional noise. The predictive framework established here provides not only a comprehensive look at how functional, genic sequences are distinct from likely nonfunctional ones, but also a new way to differentiate novel genes from genomic regions with noisy transcriptional activities.


2018 ◽  
Author(s):  
Matthieu Legendre ◽  
Jean-Marie Alempic ◽  
Nadège Philippe ◽  
Audrey Lartigue ◽  
Sandra Jeudy ◽  
...  

Abstract: With genomes of up to 2.7 Mb propagated in µm-long oblong particles and initially predicted to encode more than 2000 proteins, members of the Pandoraviridae family display the most extreme features of the known viral world. The mere existence of such giant viruses raises fundamental questions about their origin and the processes governing their evolution. A previous analysis of six newly available isolates, independently confirmed by a study including 3 others, established that the Pandoraviridae pan-genome is open, meaning that each new strain exhibits protein-coding genes not previously identified in other family members. With an average increment of about 60 proteins, the gene repertoire shows no sign of reaching a limit and remains largely coding for proteins without recognizable homologs in other viruses or cells (ORFans). To explain these results, we proposed that most new protein-coding genes were created de novo, from pre-existing non-coding regions of the G+C rich pandoravirus genomes. The comparison of the gene content of a new isolate, P. celtis, closely related (96% identical genome) to the previously described P. quercus is now used to test this hypothesis by studying genomic changes in a microevolution range. Our results confirm that the differences between these two similar gene contents mostly consist of protein-coding genes without known homologs (ORFans), with statistical signatures close to that of intergenic regions. These newborn proteins are under slight negative selection, perhaps to maintain stable folds and prevent protein aggregation pending the eventual emergence of fitness-increasing functions. Our study also unraveled several insertion events mediated by a transposase of the hAT family, 3 copies of which are found in P. celtis and are presumably active. Members of the Pandoraviridae are presently the first viruses known to encode this type of transposase.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Nikolaos Vakirlis ◽  
Omer Acar ◽  
Brian Hsu ◽  
Nelson Castilho Coelho ◽  
S. Branden Van Oss ◽  
...  

Abstract: Recent evidence demonstrates that novel protein-coding genes can arise de novo from non-genic loci. This evolutionary innovation is thought to be facilitated by the pervasive translation of non-genic transcripts, which exposes a reservoir of variable polypeptides to natural selection. Here, we systematically characterize how these de novo emerging coding sequences impact fitness in budding yeast. Disruption of emerging sequences is generally inconsequential for fitness in the laboratory and in natural populations. Overexpression of emerging sequences, however, is enriched in adaptive fitness effects compared to overexpression of established genes. We find that adaptive emerging sequences tend to encode putative transmembrane domains, and that thymine-rich intergenic regions harbor a widespread potential to produce transmembrane domains. These findings, together with in-depth examination of the de novo emerging YBR196C-A locus, suggest a novel evolutionary model whereby adaptive transmembrane polypeptides emerge de novo from thymine-rich non-genic regions and subsequently accumulate changes molded by natural selection.


Cancers ◽  
2021 ◽  
Vol 13 (8) ◽  
pp. 1976
Author(s):  
Arantxa Carrasco-León ◽  
Ane Amundarain ◽  
Nahia Gómez-Echarte ◽  
Felipe Prósper ◽  
Xabier Agirre

Multiple myeloma (MM) is a hematological neoplasm that is still considered an incurable disease. Besides established genetic alterations, recent studies have shown that MM pathogenesis is also characterized by epigenetic aberrations, such as the gain of de novo active chromatin marks in promoter and enhancer regions and extensive DNA hypomethylation of intergenic regions, highlighting the relevance of these non-coding genomic regions. A recent study described how long non-coding RNAs (lncRNAs) correspond to 82% of the MM transcriptome and an increasing number of studies have demonstrated the importance of deregulation of lncRNAs in MM. In this review we focus on the deregulated lncRNAs in MM, including their biological or functional mechanisms, their role as biomarkers to improve the prognosis and monitoring of MM patients, and their participation in drug resistance. Furthermore, we also discuss the evidence supporting the role of lncRNAs as therapeutic targets through different novel RNA-based strategies.


2019 ◽  
Author(s):  
Nikolaos Vakirlis ◽  
Omer Acar ◽  
Brian Hsu ◽  
Nelson Castilho Coelho ◽  
S. Branden Van Oss ◽  
...  

Summary: Recent evidence demonstrates that novel protein-coding genes can arise de novo from intergenic loci. This evolutionary innovation is thought to be facilitated by the pervasive translation of intergenic transcripts, which exposes a reservoir of variable polypeptides to natural selection. Do intergenic translation events yield polypeptides with useful biochemical capacities? The answer to this question remains controversial. Here, we systematically characterized how de novo emerging coding sequences impact fitness. In budding yeast, overexpression of these sequences was enriched in beneficial effects, while their disruption was generally inconsequential. We found that beneficial emerging sequences have a strong tendency to encode putative transmembrane proteins, which appears to stem from a cryptic propensity for transmembrane signals throughout thymine-rich intergenic regions of the genome. These findings suggest that novel genes with useful biochemical capacities, such as transmembrane domains, tend to evolve de novo within intergenic loci that already harbored a blueprint for these capacities.


2016 ◽  
Author(s):  
Angelo Poliseno ◽  
Odalisca Breedy ◽  
Michael Eitel ◽  
Gert Wöerheide ◽  
Hector M. Guzman ◽  
...  

Abstract: We sequenced the complete mitogenomes of two eastern tropical Pacific gorgonians, Muricea crassa and Muricea purpurea, using NGS technologies. The assembled mitogenomes of M. crassa and M. purpurea were 19,586 bp and 19,358 bp in length, respectively, with a GC content ranging from 36.0% to 36.1%. The two mitogenomes had the same gene arrangement consisting of 14 protein-coding genes, two rRNAs and one tRNA. Mitogenome identity was 98.5%. The intergenic regions between COB and NAD6 and between NAD5 and NAD4 were polymorphic in length with a high level of nucleotide diversity. Based on a concatenated dataset of 14 mitochondrial protein-coding genes we inferred the phylogeny of 26 octocoral species.


Insects ◽  
2021 ◽  
Vol 12 (11) ◽  
pp. 963
Author(s):  
Lele Ding ◽  
Huiling Sang ◽  
Cheng Sun

In eukaryotes, DNA of mitochondria is transferred into the nucleus and forms nuclear mitochondrial DNAs (NUMTs). Taking advantage of the abundant genomic resources for bumblebees, in this study, we de novo generated mitochondrial genomes (mitogenomes) for 11 bumblebee species. Then, we identified and characterized NUMTs in genus-wide bumblebee species. The number of identified NUMTs varies across those species, with numbers ranging from 32 to 72, and nuclear genome size is not positively related to NUMT number. The insertion sites of NUMTs in the nuclear genome are not random, with AT-rich regions harboring more NUMTs. In addition, our results suggest that NUMTs derived from the mitochondrial COX1 gene are most abundant in the bumblebee nuclear genome. Although the majority of NUMTs are found within intergenic regions, some NUMTs do reside within genic regions. Transcripts that contain both the NUMT sequence and its flanking non-NUMT sequences could be found in the bumblebee transcriptome, suggesting a potential domestication of NUMTs in the bumblebee. Taken together, our results shed light on the molecular features of NUMTs in the bumblebee and uncover their contribution to genome innovation.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Suzana Stjelja ◽  
Johan Fogelqvist ◽  
Christian Tellgren-Roth ◽  
Christina Dixelius

Abstract: Plasmodiophora brassicae is a soil-borne pathogen that attacks roots of cruciferous plants causing clubroot disease. The pathogen belongs to the Plasmodiophorida order in Phytomyxea. Here we used long-read SMRT technology to clarify the P. brassicae e3 genomic constituents along with comparative and phylogenetic analyses. Twenty contigs representing the nuclear genome and one mitochondrial (mt) contig were generated, together comprising 25.1 Mbp. Thirteen of the 20 nuclear contigs represented chromosomes from telomere to telomere characterized by [TTTTAGGG] sequences. Seven active gene candidates encoding synaptonemal complex-associated and meiotic-related protein homologs were identified, a finding that argues for possible genetic recombination events. The circular mt genome is large (114,663 bp), gene dense and intron rich. It shares high synteny with the mt genome of Spongospora subterranea, except in a unique 12 kb region delimited by shifts in GC content and containing tandem minisatellite- and microsatellite repeats with partially palindromic sequences. De novo annotation identified 32 protein-coding genes, 28 structural RNA genes and 19 ORFs. ORFs predicted in the repeat-rich region showed similarities to diverse organisms suggesting possible evolutionary connections. The data generated here form a refined platform for the next step involving functional analysis, all to clarify the complex biology of P. brassicae.


Genes ◽  
2020 ◽  
Vol 11 (9) ◽  
pp. 1036 ◽  
Author(s):  
África Sanchiz ◽  
Esperanza Morato ◽  
Alberto Rastrojo ◽  
Esther Camacho ◽  
Sandra González-de la Fuente ◽  
...  

Leishmania infantum causes visceral leishmaniasis (kala-azar), the most severe form of leishmaniasis, which is lethal if untreated. A few years ago, the re-sequencing and de novo assembly of the L. infantum (JPCM5 strain) genome were accomplished, and now we aimed to describe and characterize the experimental proteome of this species. In this work, we performed a proteomic analysis of axenically cultured promastigotes and carried out a detailed comparison with other Leishmania experimental proteomes published to date. We identified 2352 proteins based on a search of mass spectrometry data against a database built from the six-frame translated genome sequence of L. infantum. We detected many proteins belonging to organelles such as glycosomes, mitochondria, or flagellum, as well as many metabolic enzymes and many putative RNA binding proteins and molecular chaperones. Moreover, we listed some proteins presenting post-translational modifications, such as phosphorylations, acetylations, and methylations. In addition, the identification of peptides mapping to genomic regions previously annotated as non-coding allowed for the correction of annotations, leading to the N-terminal extension of protein sequences and the uncovering of eight novel protein-coding genes. The alliance of proteomics, genomics, and transcriptomics has resulted in a powerful combination for improving the annotation of the L. infantum reference genome.

