Diversity and evolution of the emerging Pandoraviridae family

Mapping Intimacies ◽

10.1101/230904 ◽

2017 ◽

Cited By ~ 1

Author(s):

Matthieu Legendre ◽

Elisabeth Fabre ◽

Olivier Poirot ◽

Sandra Jeudy ◽

Audrey Lartigue ◽

...

Keyword(s):

Comparative Genomics ◽

De Novo ◽

Gene Duplications ◽

Statistical Features ◽

Strong Component ◽

Protein Coding ◽

Protein Coding Genes ◽

Intergenic Regions ◽

Comparative Genomics Analysis ◽

Horizontal Transfers

Abstract: With DNA genomes up to 2.5 Mb packed in particles of bacterium-like shape and dimension, the first two Acanthamoeba-infecting Pandoraviruses remained the most spectacular viruses since their description in 2013. Our isolation of three new strains from distant locations and environments allowed us to perform the first comparative genomics analysis of the emerging worldwide-distributed Pandoraviridae family. Thorough annotation of the genomes combining transcriptomic, proteomic, and bioinformatic analyses led to the discovery of many non-coding transcripts while significantly reducing the former set of predicted protein-coding genes. We found that the Pandoraviridae exhibit an open pan-genome, the enormous size of which is not adequately explained by gene duplications or horizontal transfers. As most of the strain-specific genes have no extant homolog and exhibit statistical features comparable to intergenic regions, we suggest that de novo gene creation is a strong component in the evolution of the giant Pandoravirus genomes.

Comparative Genomics Analysis of Two Different Virulent Bovine Pasteurella multocida Isolates

International Journal of Genomics ◽

10.1155/2016/4512493 ◽

2016 ◽

Vol 2016 ◽

pp. 1-14 ◽

Cited By ~ 9

Author(s):

Huihui Du ◽

Rendong Fang ◽

Tingting Pan ◽

Tian Li ◽

Nengzhang Li ◽

...

Keyword(s):

Comparative Genomics ◽

Genome Sequence ◽

Pasteurella Multocida ◽

Draft Genome ◽

Draft Genome Sequence ◽

Insertion Sequences ◽

Capsular Type ◽

Protein Coding ◽

Protein Coding Genes ◽

Comparative Genomics Analysis

Pasteurella multocida capsular type A isolates can cause pneumonia and bovine respiratory disease (BRD). In this study, comparative genomics analysis was carried out to identify the virulence genes in two P. multocida capsular type A isolates of different virulence (highly virulent PmCQ2 and low-virulence PmCQ6). The draft genome sequence of PmCQ2 is 2.32 Mbp and contains 2,002 protein-coding genes, 9 insertion sequence (IS) elements, and 1 prophage region. The draft genome sequence of PmCQ6 is 2.29 Mbp and contains 1,970 protein-coding genes, 2 IS elements, and 3 prophage regions. The genome alignment analysis revealed that the genome similarity between PmCQ2 and PmCQ6 is 99% with high colinearity. To identify the candidate genes responsible for virulence, the genomes of PmCQ2 and PmCQ6 were compared with the published genomes of highly virulent Pm36950 and PmHN06 and avirulent Pm3480 and Pm70 (capsular type F). Five genes and two insertion sequences were identified in highly virulent strains but not in low-virulence or avirulent strains. These results indicate that these genes or insertion sequences might be responsible for the virulence of P. multocida, providing prospective candidates for further studies on the pathogenesis and the host-pathogen interactions of P. multocida.

Pandoravirus celtis illustrates the microevolution processes at work in the giant Pandoraviridae genomes

10.1101/500207 ◽

2018 ◽

Cited By ~ 1

Author(s):

Matthieu Legendre ◽

Jean-Marie Alempic ◽

Nadège Philippe ◽

Audrey Lartigue ◽

Sandra Jeudy ◽

...

Keyword(s):

De Novo ◽

Gene Repertoire ◽

Protein Coding ◽

Genomic Changes ◽

Coding Regions ◽

Protein Coding Genes ◽

Intergenic Regions ◽

Mere Existence ◽

Increasing Functions ◽

Similar Gene

Abstract: With genomes of up to 2.7 Mb propagated in µm-long oblong particles and initially predicted to encode more than 2000 proteins, members of the Pandoraviridae family display the most extreme features of the known viral world. The mere existence of such giant viruses raises fundamental questions about their origin and the processes governing their evolution. A previous analysis of six newly available isolates, independently confirmed by a study including 3 others, established that the Pandoraviridae pan-genome is open, meaning that each new strain exhibits protein-coding genes not previously identified in other family members. With an average increment of about 60 proteins, the gene repertoire shows no sign of reaching a limit and remains largely coding for proteins without recognizable homologs in other viruses or cells (ORFans). To explain these results, we proposed that most new protein-coding genes were created de novo, from pre-existing non-coding regions of the G+C rich pandoravirus genomes. The comparison of the gene content of a new isolate, P. celtis, closely related (96% identical genome) to the previously described P. quercus, is now used to test this hypothesis by studying genomic changes in a microevolution range. Our results confirm that the differences between these two similar gene contents mostly consist of protein-coding genes without known homologs (ORFans), with statistical signatures close to those of intergenic regions. These newborn proteins are under slight negative selection, perhaps to maintain stable folds and prevent protein aggregation pending the eventual emergence of fitness-increasing functions. Our study also unraveled several insertion events mediated by a transposase of the hAT family, 3 copies of which are found in P. celtis and are presumably active. Members of the Pandoraviridae are presently the first viruses known to encode this type of transposase.

De novo emergence of adaptive membrane proteins from thymine-rich intergenic sequences

10.1101/621532 ◽

2019 ◽

Author(s):

Nikolaos Vakirlis ◽

Omer Acar ◽

Brian Hsu ◽

Nelson Castilho Coelho ◽

S. Branden Van Oss ◽

...

Keyword(s):

De Novo ◽

Transmembrane Proteins ◽

Protein Coding ◽

Coding Sequences ◽

Beneficial Effects ◽

Protein Coding Genes ◽

Evolutionary Innovation ◽

Intergenic Sequences ◽

Intergenic Regions ◽

Novel Protein

Summary: Recent evidence demonstrates that novel protein-coding genes can arise de novo from intergenic loci. This evolutionary innovation is thought to be facilitated by the pervasive translation of intergenic transcripts, which exposes a reservoir of variable polypeptides to natural selection. Do intergenic translation events yield polypeptides with useful biochemical capacities? The answer to this question remains controversial. Here, we systematically characterized how de novo emerging coding sequences impact fitness. In budding yeast, overexpression of these sequences was enriched in beneficial effects, while their disruption was generally inconsequential. We found that beneficial emerging sequences have a strong tendency to encode putative transmembrane proteins, which appears to stem from a cryptic propensity for transmembrane signals throughout thymine-rich intergenic regions of the genome. These findings suggest that novel genes with useful biochemical capacities, such as transmembrane domains, tend to evolve de novo within intergenic loci that already harbored a blueprint for these capacities.

Comparative genomic analyses highlight the contribution of pseudogenized protein-coding genes to human lincRNAs

10.1101/163626 ◽

2017 ◽

Author(s):

Wan-Hsin Liu ◽

Zing Tsung-Yeh Tsai ◽

Huai-Kuang Tsai

Keyword(s):

Human Genome ◽

Noncoding Rna ◽

De Novo ◽

Systematic Investigation ◽

Comparative Genomic ◽

Protein Coding ◽

Protein Coding Genes ◽

Competing Endogenous Rnas ◽

Intergenic Regions ◽

The Relationship

Background: The regulatory roles of long intergenic noncoding RNAs (lincRNAs) in humans have been revealed through the use of advanced sequencing technology. Recently, three possible scenarios of lincRNA origin have been proposed: de novo origination from intergenic regions, duplication from long noncoding RNA, and pseudogenization from protein-coding genes. The first two scenarios are largely studied and supported, yet few studies have focused on the evolution from pseudogenized protein-coding sequence to lincRNA. Because these three scenarios are not mutually exclusive and lincRNA origination still needs systematic investigation, we conducted a comparative genomics study to investigate the evolution of human lincRNAs.

Results: Combining syntenic analysis with a stringent Blastn e-value cutoff, we found that the majority of lincRNAs align to the intergenic regions of other species. Interestingly, 193 human lincRNAs could have protein-coding orthologs in at least two of nine vertebrates. Transposable elements in these conserved regions of the human genome are much rarer than expected. Moreover, 19% of these lincRNAs overlap with or are close to pseudogenes in the human genome.

Conclusions: We suggest that a notable portion of lincRNAs could be derived from pseudogenized protein-coding genes. Furthermore, based on our computational analysis, we hypothesize that a subset of these lincRNAs could have the potential to regulate their paralogs by functioning as competing endogenous RNAs. Our results provide evolutionary evidence of the relationship between human lincRNAs and protein-coding genes.

From de novo to ‘de nono’: The majority of novel protein coding genes identified with phylostratigraphy are old genes or recent duplicates

Genome Biology and Evolution ◽

10.1093/gbe/evy231 ◽

2018 ◽

Cited By ~ 2

Author(s):

Claudio Casola

Keyword(s):

De Novo ◽

Protein Coding ◽

Protein Coding Genes ◽

Novel Protein

Chromosome-level assembly of Drosophila bifasciata reveals important karyotypic transition of the X chromosome

10.1101/847558 ◽

2019 ◽

Author(s):

Ryan Bracewell ◽

Anita Tran ◽

Kamalakar Chatla ◽

Doris Bachtrog

Keyword(s):

X Chromosome ◽

Genome Assembly ◽

De Novo ◽

Pericentromeric Region ◽

Species Group ◽

Chromosome 15 ◽

Protein Coding ◽

Protein Coding Genes ◽

Long Read ◽

Chromosome Level

Abstract: The Drosophila obscura species group is one of the most studied clades of Drosophila and harbors multiple distinct karyotypes. Here we present a de novo genome assembly and annotation of D. bifasciata, a species which represents an important subgroup for which no high-quality chromosome-level genome assembly currently exists. We combined long-read sequencing (Nanopore) and Hi-C scaffolding to achieve a highly contiguous genome assembly approximately 193 Mb in size, with repetitive elements constituting 30.1% of the total length. Drosophila bifasciata harbors four large metacentric chromosomes and the small dot, and our assembly contains each chromosome in a single scaffold, including the highly repetitive pericentromeres, which were largely composed of Jockey and Gypsy transposable elements. We annotated a total of 12,821 protein-coding genes, and comparisons of synteny with D. athabasca orthologs show that the large metacentric pericentromeric regions of multiple chromosomes are conserved between these species. Importantly, Muller A (X chromosome) was found to be metacentric in D. bifasciata, and the pericentromeric region appears homologous to the pericentromeric region of the fused Muller A-AD (XL and XR) of pseudoobscura/affinis subgroup species. Our finding suggests that a metacentric ancestral X fused to a telocentric Muller D and created the large neo-X (Muller A-AD) chromosome ∼15 MYA. We also confirm the fusion of Muller C and D in D. bifasciata and show that it likely involved a centromere-centromere fusion.

Draft genome assembly data of Anoxybacillus sp. strain MB8 isolated from Tattapani hot springs, India

10.1101/2021.06.09.447659 ◽

2021 ◽

Author(s):

VISHNU PRASOODANAN P K ◽

Shruti S. Menon ◽

Rituja Saxena ◽

Prashant Waiker ◽

Vineet K Sharma

Keyword(s):

Hot Springs ◽

De Novo ◽

Draft Genome ◽

Gc Content ◽

Central India ◽

Glycoside Hydrolases ◽

Rrna Gene ◽

Aerobic Bacterium ◽

Protein Coding ◽

Protein Coding Genes

Discovery of novel thermophiles has shown promising applications in the field of biotechnology. Due to their thermal stability, they can survive harsh industrial processes, which makes them important to characterize and study. Members of Anoxybacillus are alkaline-tolerant thermophiles and have been extensively isolated from manure, dairy-processing plants, and geothermal hot springs. This article reports the assembled data of an aerobic bacterium, Anoxybacillus sp. strain MB8, isolated from the Tattapani hot springs in Central India, whose 16S rRNA gene shares an identity of 97% (99% coverage) with Anoxybacillus kamchatkensis strain G10. The de novo assembly of the genome of Anoxybacillus sp. strain MB8 comprises 2,898,780 bp (in 190 contigs) with a GC content of 41.8%, and its annotation includes 2,976 protein-coding genes, 1 rRNA operon, 73 tRNAs, 1 tmRNA and 10 CRISPR arrays. The predicted protein-coding genes have been classified into 21 eggNOG categories. The KEGG Automated Annotation Server (KAAS) analysis indicated the presence of an assimilatory sulfate reduction pathway, a nitrate-reducing pathway, and genes for glycoside hydrolases (GHs) and glycosyltransferases (GTs). GHs and GTs have widespread applications in the baking and food industry for bread manufacturing, and in the paper, detergent and cosmetic industries. Hence, Anoxybacillus sp. strain MB8 holds the potential to be screened and characterized for such commercially relevant enzymes.

Integrating healthcare and research genetic data empowers the discovery of 28 novel developmental disorders

10.1101/797787 ◽

2019 ◽

Cited By ~ 14

Author(s):

Joanna Kaplanis ◽

Kaitlin E. Samocha ◽

Laurens Wiel ◽

Zhancheng Zhang ◽

Kevin J. Arvai ◽

...

Keyword(s):

Developmental Disorders ◽

De Novo ◽

Genetic Data ◽

Statistical Test ◽

Integrated Healthcare ◽

Protein Coding ◽

Protein Coding Genes ◽

Clinical Diagnostic ◽

Simulation Based

Summary: De novo mutations (DNMs) in protein-coding genes are a well-established cause of developmental disorders (DD). However, known DD-associated genes only account for a minority of the observed excess of such DNMs. To identify novel DD-associated genes, we integrated healthcare and research exome sequences on 31,058 DD parent-offspring trios, and developed a simulation-based statistical test to identify gene-specific enrichments of DNMs. We identified 285 significantly DD-associated genes, including 28 not previously robustly associated with DDs. Despite detecting more DD-associated genes than in any previous study, much of the excess of DNMs of protein-coding genes remains unaccounted for. Modelling suggests that over 1,000 novel DD-associated genes await discovery, many of which are likely to be less penetrant than the currently known genes. Research access to clinical diagnostic datasets will be critical for completing the map of dominant DDs.

Phylogenetic relationships and taxonomic position of genus Hyperacrius (Rodentia: Arvicolinae) from Kashmir based on evidences from analysis of mitochondrial genome and study of skull morphology

PeerJ ◽

10.7717/peerj.10364 ◽

2020 ◽

Vol 8 ◽

pp. e10364

Author(s):

Natalia I. Abramson ◽

Fedor N. Golenishchev ◽

Semen Yu. Bodrov ◽

Olga V. Bondareva ◽

Evgeny A. Genelt-Yanovskiy ◽

...

Keyword(s):

Mitochondrial Genome ◽

De Novo ◽

Phylogenetic Analyses ◽

Complete Mitochondrial Genome ◽

Morphological Characters ◽

Molecular Data ◽

Phylogenetic Position ◽

Skull Morphology ◽

Protein Coding ◽

Protein Coding Genes

In this article, we present the nearly complete mitochondrial genome of the Subalpine Kashmir vole Hyperacrius fertilis (Arvicolinae, Cricetidae, Rodentia), assembled using data from Illumina next-generation sequencing (NGS) of the DNA from a century-old museum specimen. De novo assembly consisted of 16,341 bp and included all mitogenome protein-coding genes as well as 12S and 16S RNAs, tRNAs and D-loop. Using the alignment of protein-coding genes of 14 previously published Arvicolini tribe mitogenomes, seven Clethrionomyini mitogenomes, and also Ondatra and Dicrostonyx outgroups, we conducted phylogenetic reconstructions based on a dataset of 13 protein-coding genes (PCGs) under maximum likelihood and Bayesian inference. Phylogenetic analyses robustly supported the phylogenetic position of this species within the tribe Arvicolini. Among the Arvicolini, Hyperacrius represents one of the early-diverged lineages. This result of phylogenetic analysis altered the conventional view on phylogenetic relatedness between Hyperacrius and Alticola and prompted the revision of morphological characters underlying the former assumption. Morphological analysis performed here confirmed molecular data and provided additional evidence for taxonomic replacement of the genus Hyperacrius from the tribe Clethrionomyini to the tribe Arvicolini.

The complete chloroplast genome of Saxifraga sinomontana (Saxifragaceae) and comparative analysis with other Saxifragaceae species

Revista Brasileira de Botânica ◽

10.1007/s40415-019-00561-y ◽

2019 ◽

Vol 42 (4) ◽

pp. 601-611 ◽

Cited By ~ 1

Author(s):

Yan Li ◽

Liukun Jia ◽

Zhihua Wang ◽

Rui Xing ◽

Xiaofeng Chi ◽

...

Keyword(s):

Comparative Analysis ◽

Chloroplast Genome ◽

Phylogenetic Relationships ◽

De Novo ◽

Single Copy ◽

Bootstrap Support ◽

Protein Coding ◽

Complete Chloroplast Genome ◽

Protein Coding Genes ◽

Chloroplast Genomes

Abstract: Saxifraga sinomontana J.-T. Pan & Gornall belongs to Saxifraga sect. Ciliatae subsect. Hirculoideae, a lineage containing ca. 110 species whose phylogenetic relationships are largely unresolved due to recent rapid radiations. Analyses of complete chloroplast genomes have the potential to significantly improve the resolution of phylogenetic relationships in this young plant lineage. The complete chloroplast genome of S. sinomontana was de novo sequenced, assembled and then compared with those of six other Saxifragaceae species. The S. sinomontana chloroplast genome is 147,240 bp in length with a typical quadripartite structure, including a large single-copy region of 79,310 bp and a small single-copy region of 16,874 bp separated by a pair of inverted repeats (IRs) of 25,528 bp each. The chloroplast genome contains 113 unique genes, including 79 protein-coding genes, four rRNAs and 30 tRNAs, with 18 duplicates in the IRs. The gene content and organization are similar to other Saxifragaceae chloroplast genomes. Sixty-one simple sequence repeats were identified in the S. sinomontana chloroplast genome, mostly represented by mononucleotide repeats of polyadenine or polythymine. Comparative analysis revealed 12 highly divergent regions in the intergenic spacers, as well as coding genes of matK, ndhK, accD, cemA, rpoA, rps19, ndhF, ccsA, ndhD and ycf1. Phylogenetic reconstruction of seven Saxifragaceae species based on 66 protein-coding genes received high bootstrap support values for nearly all identified nodes, suggesting a promising opportunity to resolve infrasectional relationships of the most species-rich section Ciliatae of Saxifraga.
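
The quadripartite structure reported for the S. sinomontana chloroplast genome can be checked arithmetically: the total length should equal the large single-copy region plus the small single-copy region plus two copies of the inverted repeat. The sketch below is only a consistency check on the figures quoted in the abstract; the function and variable names are illustrative assumptions, not taken from the paper.

```python
# Figures quoted in the abstract (lengths in bp).
LSC = 79_310             # large single-copy region
SSC = 16_874             # small single-copy region
IR = 25_528              # each of the two inverted repeats
REPORTED_TOTAL = 147_240

# Unique gene counts by class, as reported in the abstract.
UNIQUE_GENES = {"protein_coding": 79, "rRNA": 4, "tRNA": 30}


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total plastome length: one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


total = quadripartite_length(LSC, SSC, IR)
print(total, total == REPORTED_TOTAL)   # 147240 True
print(sum(UNIQUE_GENES.values()))       # 113
```

Both checks agree with the abstract: the four regions sum to 147,240 bp, and the three gene classes sum to the 113 unique genes reported.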