Discovery of a class of giant virus relatives displaying unusual functional traits and prevalent within plankton: the Mirusviricetes

Mapping Intimacies ◽

10.1101/2021.12.27.474232 ◽

2021 ◽

Author(s):

Morgan Gaia ◽

Lingjie Meng ◽

Eric Pelletier ◽

Patrick Forterre ◽

Chiara Vanni ◽

...

Keyword(s):

Comparative Genomics ◽

Functional Traits ◽

Viral Particle ◽

Metagenomic Data ◽

Data Sets ◽

Dna Viruses ◽

Tata Binding Proteins ◽

Profound Influence ◽

A Subunit ◽

Environmental Surveys

Large and giant DNA viruses of the phylum Nucleocytoviricota have a profound influence on the ecology and evolution of planktonic eukaryotes. Recently, various Nucleocytoviricota genomes have been characterized from environmental metagenomes based on the occurrence of hallmark genes identified from cultures. However, lineages diverging from the culture genomics functional principles have been overlooked thus far. Here, we developed a phylogeny-guided genome-resolved metagenomic framework using a single hallmark gene as a compass, a subunit of DNA-dependent RNA polymerase encoded by most Nucleocytoviricota. We applied this method to large metagenomic data sets from the surface of five oceans and two seas and characterized 697 non-redundant Nucleocytoviricota genomes up to 1.45 Mbp in length. This database expands the known diversity of the class Megaviricetes and reveals two additional putative classes we named Proculviricetes and Mirusviricetes. Critically, the diverse and prevalent Mirusviricetes population genomes seemingly lack several hallmark genes, in particular those related to viral particle morphogenesis. Instead, they share various genes of known (e.g., TATA-binding proteins, histones, proteases and viral rhodopsins) and unknown functions rarely detected if not entirely missing in all other characterized Nucleocytoviricota lineages. Phylogenomics, comparative genomics, functional trends and the signal among planktonic cellular size fractions point to Mirusviricetes being a major, functionally divergent class of large DNA viruses that actively infect eukaryotes in the sunlit ocean using an enigmatic functional lifestyle. Finally, we built a comprehensive marine genomic database for Nucleocytoviricota by combining multiple environmental surveys that might contribute to future endeavors exploring the ecology and evolution of plankton.

Novel Cell-Virus-Virophage Tripartite Infection Systems Discovered in the Freshwater Lake Dishui Lake in Shanghai, China

Journal of Virology ◽

10.1128/jvi.00149-20 ◽

2020 ◽

Vol 94 (11) ◽

Cited By ~ 1

Author(s):

Shengzhong Xu ◽

Liang Zhou ◽

Xiaosha Liang ◽

Yifan Zhou ◽

Hao Chen ◽

...

Keyword(s):

Green Algae ◽

Green Alga ◽

Phylogenetic Analyses ◽

Freshwater Lake ◽

Metagenomic Data ◽

Data Sets ◽

Data Set ◽

Dna Viruses ◽

Oligonucleotide Frequency ◽

Dsdna Viruses

ABSTRACT Virophages are small parasitic double-stranded DNA (dsDNA) viruses of giant dsDNA viruses infecting unicellular eukaryotes. Except for a few isolated virophages characterized by parasitization mechanisms, features of virophages discovered in metagenomic data sets remain largely unknown. Here, the complete genomes of seven virophages (26.6 to 31.5 kbp) and four large DNA viruses (190.4 to 392.5 kbp) that coexist in the freshwater lake Dishui Lake, Shanghai, China, have been identified based on environmental metagenomic investigation. Both genomic and phylogenetic analyses indicate that Dishui Lake virophages (DSLVs) are closely related to each other and to other lake virophages, and Dishui Lake large DNA viruses are affiliated with the micro-green alga-infecting Prasinovirus of the Phycodnaviridae (named Dishui Lake phycodnaviruses [DSLPVs]) and protist (protozoan and alga)-infecting Mimiviridae (named Dishui Lake large alga virus [DSLLAV]). The DSLVs possess more genes with closer homology to that of large alga viruses than to that of giant protozoan viruses. Furthermore, the DSLVs are strongly associated with large green alga viruses, including DSLPV4 and DSLLAV1, based on codon usage as well as oligonucleotide frequency and correlation analyses. Surprisingly, a nonhomologous CRISPR-Cas like system is found in DSLLAV1, which appears to protect DSLLAV1 from the parasitization of DSLV5 and DSLV8. These results suggest that novel cell-virus-virophage (CVv) tripartite infection systems of green algae, large green alga virus (Phycodnaviridae- and Mimiviridae-related), and virophage exist in Dishui Lake, which will contribute to further deep investigations of the evolutionary interaction of virophages and large alga viruses as well as of the essential roles that the CVv plays in the ecology of algae. IMPORTANCE Virophages are small parasitizing viruses of large/giant viruses. To our knowledge, the few isolated virophages all parasitize giant protozoan viruses (Mimiviridae) for propagation and form a tripartite infection system with hosts, here named the cell-virus-virophage (CVv) system. However, the CVv system remains largely unknown in environmental metagenomic data sets. In this study, we systematically investigated the metagenomic data set from the freshwater lake Dishui Lake, Shanghai, China. Consequently, four novel large alga viruses and seven virophages were discovered to coexist in Dishui Lake. Surprisingly, a novel CVv tripartite infection system comprising green algae, large green alga viruses (Phycodnaviridae- and Mimiviridae-related), and virophages was identified based on genetic link, genomic signature, and CRISPR system analyses. Meanwhile, a nonhomologous CRISPR-like system was found in Dishui Lake large alga viruses, which appears to protect the virus host from the infection of Dishui Lake virophages (DSLVs). These findings are critical to give insight into the potential significance of CVv in global evolution and ecology.
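
The abstract above links virophages to candidate virus hosts partly through oligonucleotide frequency and correlation analyses. As a rough illustration of that general idea (not the authors' pipeline), the sketch below computes k-mer frequency vectors for two sequences and their Pearson correlation. The function names, the default k = 4, and the choice of Pearson correlation are assumptions made for this example only.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_frequencies(seq, k=4):
    """Relative frequencies of all 4**k DNA k-mers, in a fixed alphabetical order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Windows containing N or other symbols are not in `kmers` and are ignored.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    return [counts[m] / total for m in kmers]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def signature_correlation(seq_a, seq_b, k=4):
    """Correlation between the k-mer frequency profiles of two genomes."""
    return pearson(kmer_frequencies(seq_a, k), kmer_frequencies(seq_b, k))
```

In such a comparison, a higher correlation between a virophage genome and a large virus genome would be read as a more similar genomic signature; any threshold for calling an association would have to come from the study itself.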

Screening Metagenomic Data for Viruses Using the E-Probe Diagnostic Nucleic Acid Assay

Phytopathology ◽

10.1094/phyto-11-13-0310-r ◽

2014 ◽

Vol 104 (10) ◽

pp. 1125-1129 ◽

Cited By ~ 11

Author(s):

A. H. Stobbe ◽

W. L. Schneider ◽

P. R. Hoyt ◽

U. Melcher

Keyword(s):

Nucleic Acid ◽

Mosaic Virus ◽

Yellow Mosaic Virus ◽

Metagenomic Data ◽

Data Sets ◽

Virus Species ◽

Data Set ◽

Golden Yellow ◽

Nucleic Acid Assay ◽

Ngs Data

Next-generation sequencing (NGS) is not commonly used in diagnostics, in part because of the large amount of time and computational power needed to identify the taxonomic origin of each sequence in an NGS data set. By using the unassembled NGS data sets as the target for searches, pathogen-specific sequences, termed e-probes, could be used as queries to enable detection of specific viruses or organisms in plant sample metagenomes. This method, designated e-probe diagnostic nucleic acid assay, was first tested with mock sequence databases and then with NGS data sets generated from plants infected with a DNA virus (Bean golden yellow mosaic virus, BGYMV) or an RNA virus (Plum pox virus, PPV). In addition, the ability to detect and differentiate among strains of a single virus species, PPV, was examined by using probe sets that were specific to strains. The use of probe sets for multiple viruses determined that one sample was dually infected with BGYMV and Bean golden mosaic virus.
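
The core idea described above, using short pathogen-specific sequences as queries against unassembled reads, can be sketched in a few lines. The example below is only a toy version: it assumes exact substring matching on either strand, whereas the abstract does not say which search method the assay uses. The function names and the probe dictionary layout are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string; unknown symbols become N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def probe_hits(reads, probes):
    """Count, for each named probe, the reads that contain it on either strand.

    reads  -- iterable of read sequences (unassembled)
    probes -- dict mapping a probe name to its sequence
    """
    # Precompute both orientations of every probe once.
    oriented = {
        name: (seq.upper(), reverse_complement(seq))
        for name, seq in probes.items()
    }
    hits = {name: 0 for name in probes}
    for read in reads:
        read = read.upper()
        for name, (forward, reverse) in oriented.items():
            if forward in read or reverse in read:
                hits[name] += 1
    return hits
```

Probe sets specific to individual strains could be passed in the same way, with one dictionary entry per strain-specific probe, to compare hit counts between strains.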

Microbiome Search Engine 2: a Platform for Taxonomic and Functional Search of Global Microbiomes on the Whole-Microbiome Level

mSystems ◽

10.1128/msystems.00943-20 ◽

2021 ◽

Vol 6 (1) ◽

Author(s):

Gongchao Jing ◽

Lu Liu ◽

Zengbin Wang ◽

Yufeng Zhang ◽

Li Qian ◽

...

Keyword(s):

Big Data ◽

User Interface ◽

Search Engine ◽

Functional Similarity ◽

Metagenomic Data ◽

Data Sets ◽

Data Space ◽

Link Type ◽

Database Platform ◽

Microbiome Data

ABSTRACT Metagenomic data sets from diverse environments have been growing rapidly. To ensure accessibility and reusability, tools that quickly and informatively correlate new microbiomes with existing ones are in demand. Here, we introduce Microbiome Search Engine 2 (MSE 2), a microbiome database platform for searching query microbiomes in the global metagenome data space based on the taxonomic or functional similarity of a whole microbiome to those in the database. MSE 2 consists of (i) a well-organized and regularly updated microbiome database that currently contains over 250,000 metagenomic shotgun and 16S rRNA gene amplicon samples associated with unified metadata collected from 798 studies, (ii) an enhanced search engine that enables real-time and fast (<0.5 s per query) searches against the entire database for best-matched microbiomes using overall taxonomic or functional profiles, and (iii) a Web-based graphical user interface for user-friendly searching, data browsing, and tutoring. MSE 2 is freely accessible via http://mse.ac.cn. For standalone searches of customized microbiome databases, the kernel of the MSE 2 search engine is provided at GitHub (https://github.com/qibebt-bioinfo/meta-storms). IMPORTANCE A search-based strategy is useful for large-scale mining of microbiome data sets, such as a bird’s-eye view of the microbiome data space and disease diagnosis via microbiome big data. Here, we introduce Microbiome Search Engine 2 (MSE 2), a microbiome database platform for searching query microbiomes against the existing microbiome data sets on the basis of their similarity in taxonomic structure or functional profile. Key improvements include database extension, data compatibility, a search engine kernel, and a user interface. The new ability to search the microbiome space via functional similarity greatly expands the scope of search-based mining of the microbiome big data.

Bayesian Classification of Microbial Communities Based on 16S rRNA Metagenomic Data

10.1101/340653 ◽

2018 ◽

Cited By ~ 1

Author(s):

Arghavan Bahadorinejad ◽

Ivan Ivanov ◽

Johanna W Lampe ◽

Meredith AJ Hullar ◽

Robert S Chapkin ◽

...

Keyword(s):

16S Rrna ◽

Sample Size ◽

Microbial Communities ◽

State Of The Art ◽

Metagenomic Data ◽

Data Sets ◽

Sequencing Data ◽

Sample Data

Abstract We propose a Bayesian method for the classification of 16S rRNA metagenomic profiles of bacterial abundance by introducing a Poisson-Dirichlet-Multinomial hierarchical model for the sequencing data, constructing a prior distribution from sample data, calculating the posterior distribution in closed form, and deriving an Optimal Bayesian Classifier (OBC). The proposed algorithm is compared to state-of-the-art classification methods for 16S rRNA metagenomic data, including Random Forests and the phylogeny-based Metaphyl algorithm, for varying sample size, classification difficulty, and dimensionality (number of OTUs), using both synthetic and real metagenomic data sets. The results demonstrate that the proposed OBC method, with either noninformative or constructed priors, is competitive with or superior to the other methods. In particular, in the case where the ratio of sample size to dimensionality is small, it was observed that the proposed method can vastly outperform the others. Author summary Recent studies have highlighted the interplay between host genetics, gut microbes, and colorectal tumor initiation/progression. The characterization of microbial communities using metagenomic profiling has therefore received renewed interest. In this paper, we propose a method for classification, i.e., prediction of different outcomes, based on 16S rRNA metagenomic data. The proposed method employs a Bayesian approach, which is suitable for data sets with a small ratio of the number of available instances to the dimensionality. Results using both synthetic and real metagenomic data show that the proposed method can outperform other state-of-the-art metagenomic classification algorithms.
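
To make the "posterior in closed form" idea concrete, here is a deliberately simplified sketch: a plain Dirichlet-multinomial classifier with equal class priors. It is not the paper's Poisson-Dirichlet-Multinomial model or its Optimal Bayesian Classifier. The function names, the symmetric prior value, and the pooling of training counts per class are assumptions chosen for illustration.

```python
from math import lgamma


def fit(training, prior=1.0):
    """Posterior Dirichlet parameters per class.

    training -- dict mapping a class label to a list of OTU count vectors
    prior    -- symmetric Dirichlet pseudo-count (assumed, noninformative)
    """
    params = {}
    for label, samples in training.items():
        n_otus = len(samples[0])
        # Conjugacy: posterior parameter = prior + pooled counts for that OTU.
        params[label] = [
            prior + sum(sample[j] for sample in samples) for j in range(n_otus)
        ]
    return params


def log_predictive(counts, alpha):
    """Log Dirichlet-multinomial probability of `counts` given `alpha`.

    The multinomial coefficient is omitted because it is the same for
    every class and so does not affect the comparison.
    """
    alpha_total = sum(alpha)
    n_total = sum(counts)
    value = lgamma(alpha_total) - lgamma(alpha_total + n_total)
    for x, a in zip(counts, alpha):
        value += lgamma(a + x) - lgamma(a)
    return value


def classify(counts, params):
    """Return the class with the highest posterior predictive probability."""
    return max(params, key=lambda label: log_predictive(counts, params[label]))
```

Because the prior pseudo-counts keep every parameter positive, the classifier remains defined when an OTU has zero counts in the training data, which is the small-sample, high-dimensional situation the abstract highlights.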

METAnnotatorX2: a Comprehensive Tool for Deep and Shallow Metagenomic Data Set Analyses

mSystems ◽

10.1128/msystems.00583-21 ◽

2021 ◽

Vol 6 (3) ◽

Author(s):

Christian Milani ◽

Gabriele Andrea Lugli ◽

Federico Fontana ◽

Leonardo Mancabelli ◽

Giulia Alessandri ◽

...

Keyword(s):

Sensitivity And Specificity ◽

Metagenomic Data ◽

Data Sets ◽

Data Set

We developed a novel tool, i.e., METAnnotatorX2, that includes a number of new advanced features for analysis of deep and shallow metagenomic data sets and is accompanied by (regularly updated) customized databases for archaea, bacteria, fungi, protists, and viruses. Both software and databases were developed so as to maximize sensitivity and specificity while including support for shallow metagenomic data sets.

Mumame: a software tool for quantifying gene-specific point-mutations in shotgun metagenomic data

Metabarcoding and Metagenomics ◽

10.3897/mbmg.3.36236 ◽

2019 ◽

Vol 3 ◽

Cited By ~ 1

Author(s):

Shruthi Magesh ◽

Viktor Jonsson ◽

Johan Bengtsson-Palme

Keyword(s):

Microbial Communities ◽

Point Mutations ◽

Software Tool ◽

Metagenomic Data ◽

Data Sets ◽

Resistance Mutations ◽

Shotgun Metagenomics ◽

Key Factor ◽

Detection Of Mutations ◽

And Function

Metagenomics has emerged as a central technique for studying the structure and function of microbial communities. Often the functional analysis is restricted to classification into broad functional categories. However, important phenotypic differences, such as resistance to antibiotics, are often the result of just one or a few point mutations in otherwise identical sequences. Bioinformatic methods for metagenomic analysis have generally been poor at accounting for this fact, resulting in a somewhat limited picture of important aspects of microbial communities. Here, we address this problem by providing a software tool called Mumame, which can distinguish between wildtype and mutated sequences in shotgun metagenomic data and quantify their relative abundances. We demonstrate the utility of the tool by quantifying antibiotic resistance mutations in several publicly available metagenomic data sets. We also found that sequencing depth is a key factor in detecting rare mutations; reliable detection may therefore require much larger numbers of sequences than most other applications of shotgun metagenomics. Mumame is freely available online (http://microbiology.se/software/mumame).
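
The remark about sequencing depth can be illustrated with a back-of-the-envelope calculation. The sketch below assumes each read covering the mutated position is an independent draw that carries the mutation with a fixed probability (a simple binomial model, which is an assumption of this example and not taken from the paper). Function and parameter names are hypothetical.

```python
from math import ceil, log


def detection_probability(mutant_fraction, covering_reads):
    """Chance that at least one of the covering reads carries the mutation."""
    return 1.0 - (1.0 - mutant_fraction) ** covering_reads


def reads_needed(mutant_fraction, target_probability=0.95):
    """Smallest number of covering reads that reaches the target probability."""
    if not 0.0 < mutant_fraction < 1.0:
        raise ValueError("mutant_fraction must be strictly between 0 and 1")
    if not 0.0 < target_probability < 1.0:
        raise ValueError("target_probability must be strictly between 0 and 1")
    # Solve 1 - (1 - f)**n >= p for n.
    return ceil(log(1.0 - target_probability) / log(1.0 - mutant_fraction))
```

Under this model the required number of covering reads grows roughly in inverse proportion to the mutant fraction, and only a small share of all reads in a shotgun data set covers any one gene position, which is consistent with the abstract's point that rare mutations demand deeper sequencing.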

Diversity of Virophages in Metagenomic Data Sets

Journal of Virology ◽

10.1128/jvi.03398-12 ◽

2013 ◽

Vol 87 (8) ◽

pp. 4225-4236 ◽

Cited By ~ 58

Author(s):

J. Zhou ◽

W. Zhang ◽

S. Yan ◽

J. Xiao ◽

Y. Zhang ◽

...

Keyword(s):

Metagenomic Data ◽

Data Sets

A Need for Improved Cellulase Identification from Metagenomic Sequence Data

Applied and Environmental Microbiology ◽

10.1128/aem.01928-20 ◽

2020 ◽

Vol 87 (1) ◽

Author(s):

Rebecca Co ◽

Laura A. Hug

Keyword(s):

Sequence Data ◽

Industrial Applications ◽

Data Sets ◽

Metagenomic Sequence ◽

Environmental Sequence ◽

Sequencing Technologies ◽

Current Classification ◽

Applied Microbiology ◽

Environmental Surveys ◽

Metagenomic Sequence Data

ABSTRACT Improved sequencing technologies and the maturation of metagenomic approaches allow the identification of gene variants with potential industrial applications, including cellulases. Cellulase identification from metagenomic environmental surveys is complicated by inconsistent nomenclature and multiple categorization systems. Here, we summarize the current classification and nomenclature systems, with recommendations for improvements to these systems. Addressing the issues described will strengthen the annotation of cellulose-active enzymes from environmental sequence data sets—a rapidly growing resource in environmental and applied microbiology.

Diversity and evolution of B-family DNA polymerases

Nucleic Acids Research ◽

10.1093/nar/gkaa760 ◽

2020 ◽

Vol 48 (18) ◽

pp. 10142-10156 ◽

Cited By ~ 4

Author(s):

Darius Kazlauskas ◽

Mart Krupovic ◽

Julien Guglielmini ◽

Patrick Forterre ◽

Česlovas Venclovas

Keyword(s):

Structural Information ◽

Dna Polymerases ◽

Comprehensive Analysis ◽

Metagenomic Data ◽

Dna Viruses ◽

Taxonomic Distribution ◽

Binding Domains ◽

Catalytically Active ◽

Domains Of Life ◽

Massive Accumulation

Abstract B-family DNA polymerases (PolBs) represent the most common replicases. PolB enzymes that require RNA (or DNA) primed templates for DNA synthesis are found in all domains of life and many DNA viruses. Despite extensive research on PolBs, their origins and evolution remain enigmatic. Massive accumulation of new genomic and metagenomic data from diverse habitats as well as availability of new structural information prompted us to conduct a comprehensive analysis of the PolB sequences, structures, domain organizations, taxonomic distribution and co-occurrence in genomes. Based on phylogenetic analysis, we identified a new, widespread group of bacterial PolBs that are more closely related to the catalytically active N-terminal half of the eukaryotic PolEpsilon (PolEpsilonN) than to Escherichia coli Pol II. In Archaea, we characterized six new groups of PolBs. Two of them show close relationships with eukaryotic PolBs, the first one with PolEpsilonN, and the second one with PolAlpha, PolDelta and PolZeta. In addition, structure comparisons suggested common origin of the catalytically inactive C-terminal half of PolEpsilon (PolEpsilonC) and PolAlpha. Finally, in certain archaeal PolBs we discovered C-terminal Zn-binding domains closely related to those of PolAlpha and PolEpsilonC. Collectively, the obtained results allowed us to propose a scenario for the evolution of eukaryotic PolBs.

Unveiling Crucivirus Diversity by Mining Metagenomic Data

mBio ◽

10.1128/mbio.01410-20 ◽

2020 ◽

Vol 11 (5) ◽

Cited By ~ 1

Author(s):

Ignacio de la Higuera ◽

George W. Kasun ◽

Ellis L. Torrance ◽

Alyssa A. Pratt ◽

Amberlee Maluenda ◽

...

Keyword(s):

De Novo ◽

Rna Viruses ◽

Sequence Data ◽

Ecosystem Dynamics ◽

Capsid Proteins ◽

Metagenomic Data ◽

Dna Viruses ◽

Rep Protein ◽

Dna And Rna ◽

Core Proteins

ABSTRACT The discovery of cruciviruses revealed the most explicit example of a common protein homologue between DNA and RNA viruses to date. Cruciviruses are a novel group of circular Rep-encoding single-stranded DNA (ssDNA) (CRESS-DNA) viruses that encode capsid proteins that are most closely related to those encoded by RNA viruses in the family Tombusviridae. The apparent chimeric nature of the two core proteins encoded by crucivirus genomes suggests horizontal gene transfer of capsid genes between DNA and RNA viruses. Here, we identified and characterized 451 new crucivirus genomes and 10 capsid-encoding circular genetic elements through de novo assembly and mining of metagenomic data. These genomes are highly diverse, as demonstrated by sequence comparisons and phylogenetic analysis of subsets of the protein sequences they encode. Most of the variation is reflected in the replication-associated protein (Rep) sequences, and much of the sequence diversity appears to be due to recombination. Our results suggest that recombination tends to occur more frequently among groups of cruciviruses with relatively similar capsid proteins and that the exchange of Rep protein domains between cruciviruses is rarer than intergenic recombination. Additionally, we suggest members of the stramenopiles/alveolates/Rhizaria supergroup as possible crucivirus hosts. Altogether, we provide a comprehensive and descriptive characterization of cruciviruses. IMPORTANCE Viruses are the most abundant biological entities on Earth. In addition to their impact on animal and plant health, viruses have important roles in ecosystem dynamics as well as in the evolution of the biosphere. Circular Rep-encoding single-stranded (CRESS) DNA viruses are ubiquitous in nature, many are agriculturally important, and they appear to have multiple origins from prokaryotic plasmids. A subset of CRESS-DNA viruses, the cruciviruses, have homologues of capsid proteins encoded by RNA viruses. The genetic structure of cruciviruses attests to the transfer of capsid genes between disparate groups of viruses. However, the evolutionary history of cruciviruses is still unclear. By collecting and analyzing cruciviral sequence data, we provide a deeper insight into the evolutionary intricacies of cruciviruses. Our results reveal an unexpected diversity of this virus group, with frequent recombination as an important determinant of variability.
