Novel Cell-Virus-Virophage Tripartite Infection Systems Discovered in the Freshwater Lake Dishui Lake in Shanghai, China

2020 ◽  
Vol 94 (11) ◽  
Author(s):  
Shengzhong Xu ◽  
Liang Zhou ◽  
Xiaosha Liang ◽  
Yifan Zhou ◽  
Hao Chen ◽  
...  

ABSTRACT Virophages are small parasitic double-stranded DNA (dsDNA) viruses of giant dsDNA viruses infecting unicellular eukaryotes. Except for a few isolated virophages characterized by parasitization mechanisms, features of virophages discovered in metagenomic data sets remain largely unknown. Here, the complete genomes of seven virophages (26.6 to 31.5 kbp) and four large DNA viruses (190.4 to 392.5 kbp) that coexist in the freshwater lake Dishui Lake, Shanghai, China, have been identified based on environmental metagenomic investigation. Both genomic and phylogenetic analyses indicate that Dishui Lake virophages (DSLVs) are closely related to each other and to other lake virophages, and Dishui Lake large DNA viruses are affiliated with the micro-green alga-infecting Prasinovirus of the Phycodnaviridae (named Dishui Lake phycodnaviruses [DSLPVs]) and protist (protozoan and alga)-infecting Mimiviridae (named Dishui Lake large alga virus [DSLLAV]). The DSLVs possess more genes with closer homology to those of large alga viruses than to those of giant protozoan viruses. Furthermore, the DSLVs are strongly associated with large green alga viruses, including DSLPV4 and DSLLAV1, based on codon usage as well as oligonucleotide frequency and correlation analyses. Surprisingly, a nonhomologous CRISPR-Cas-like system is found in DSLLAV1, which appears to protect DSLLAV1 from the parasitization of DSLV5 and DSLV8. These results suggest that novel cell-virus-virophage (CVv) tripartite infection systems of green algae, large green alga virus (Phycodnaviridae- and Mimiviridae-related), and virophage exist in Dishui Lake, which will contribute to further deep investigations of the evolutionary interaction of virophages and large alga viruses as well as of the essential roles that the CVv plays in the ecology of algae.
IMPORTANCE Virophages are small parasitizing viruses of large/giant viruses. To our knowledge, the few isolated virophages all parasitize giant protozoan viruses (Mimiviridae) for propagation and form a tripartite infection system with hosts, here named the cell-virus-virophage (CVv) system. However, the CVv system remains largely unknown in environmental metagenomic data sets. In this study, we systematically investigated the metagenomic data set from the freshwater lake Dishui Lake, Shanghai, China. Consequently, four novel large alga viruses and seven virophages were discovered to coexist in Dishui Lake. Surprisingly, a novel CVv tripartite infection system comprising green algae, large green alga viruses (Phycodnaviridae- and Mimiviridae-related), and virophages was identified based on genetic link, genomic signature, and CRISPR system analyses. Meanwhile, a nonhomologous CRISPR-like system was found in Dishui Lake large alga viruses, which appears to protect the virus host from the infection of Dishui Lake virophages (DSLVs). These findings provide critical insight into the potential significance of CVv in global evolution and ecology.
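The abstract links virophages to candidate virus hosts partly through oligonucleotide frequency and correlation analyses, without giving the details. As a rough illustration only, the sketch below compares two genomes by the Pearson correlation of their k-mer frequency profiles. The choice of k = 4, the use of Pearson correlation, and the function names are assumptions made for this example, not the authors' documented procedure.

```python
from itertools import product
from math import sqrt


def kmer_frequencies(seq, k=4):
    """Return relative frequencies of all 4**k DNA k-mers, in a fixed order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # windows containing N or other symbols are skipped
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


# Hypothetical toy sequences; real genomes would be read from FASTA files.
virophage = "ACGTACGTACGGTTAACCGT"
virus = "ACGTACGTTCGGTTAACCGA"
similarity = pearson(kmer_frequencies(virophage, k=2),
                     kmer_frequencies(virus, k=2))
```

A higher correlation would be read as a more similar genomic signature. On its own it does not demonstrate a host-parasite relationship.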

2014 ◽  
Vol 104 (10) ◽  
pp. 1125-1129 ◽  
Author(s):  
A. H. Stobbe ◽  
W. L. Schneider ◽  
P. R. Hoyt ◽  
U. Melcher

Next generation sequencing (NGS) is not used commonly in diagnostics, in part due to the large amount of time and computational power needed to identify the taxonomic origin of each sequence in an NGS data set. By using the unassembled NGS data sets as the target for searches, pathogen-specific sequences, termed e-probes, could be used as queries to enable detection of specific viruses or organisms in plant sample metagenomes. This method, designated the e-probe diagnostic nucleic acid assay, was first tested with mock sequence databases and then with NGS data sets generated from plants infected with a DNA virus (Bean golden yellow mosaic virus, BGYMV) or an RNA virus (Plum pox virus, PPV). In addition, the ability to detect and differentiate among strains of a single virus species, PPV, was examined by using probe sets that were specific to strains. The use of probe sets for multiple viruses determined that one sample was dually infected with BGYMV and Bean golden mosaic virus.
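The core idea described above, querying unassembled reads with short pathogen-specific probes, can be sketched in a few lines. This example uses exact substring matching on both strands as a deliberately simplified stand-in for whatever similarity search the authors used. The probe names, sequences, and function names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (unknown bases become N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def eprobe_hits(probes, reads):
    """Count, per probe, the reads that contain it exactly on either strand."""
    hits = {name: 0 for name in probes}
    for read in reads:
        read = read.upper()
        for name, probe in probes.items():
            p = probe.upper()
            if p in read or reverse_complement(p) in read:
                hits[name] += 1
    return hits


# Hypothetical probes and reads, far shorter than real e-probes or NGS reads.
probes = {"virusA": "ACGTTG", "virusB": "GGGCCC"}
reads = ["TTACGTTGAA", "CAACGT", "TTTT"]
hits = eprobe_hits(probes, reads)
```

Because the reads are the search target and only a small probe set is queried, the work grows with the number of probes rather than with the size of a full reference database. That is the time saving the abstract points to.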


mSystems ◽  
2021 ◽  
Vol 6 (3) ◽  
Author(s):  
Christian Milani ◽  
Gabriele Andrea Lugli ◽  
Federico Fontana ◽  
Leonardo Mancabelli ◽  
Giulia Alessandri ◽  
...  

We developed a novel tool, i.e., METAnnotatorX2, that includes a number of new advanced features for analysis of deep and shallow metagenomic data sets and is accompanied by (regularly updated) customized databases for archaea, bacteria, fungi, protists, and viruses. Both software and databases were developed so as to maximize sensitivity and specificity while including support for shallow metagenomic data sets.


mSystems ◽  
2018 ◽  
Vol 3 (3) ◽  
Author(s):  
Luis M. Rodriguez-R ◽  
Santosh Gunturu ◽  
James M. Tiedje ◽  
James R. Cole ◽  
Konstantinos T. Konstantinidis

ABSTRACT Estimations of microbial community diversity based on metagenomic data sets are affected, often to an unknown degree, by biases derived from insufficient coverage and reference database-dependent estimations of diversity. For instance, the completeness of reference databases cannot be generally estimated since it depends on the extant diversity sampled to date, which, with the exception of a few habitats such as the human gut, remains severely undersampled. Further, estimation of the degree of coverage of a microbial community by a metagenomic data set is prohibitively time-consuming for large data sets, and coverage values may not be directly comparable between data sets obtained with different sequencing technologies. Here, we extend Nonpareil, a database-independent tool for the estimation of coverage in metagenomic data sets, to a high-performance computing implementation that scales up to hundreds of cores and includes, in addition, a k-mer-based estimation as sensitive as the original alignment-based version but about three hundred times as fast. Further, we propose a metric of sequence diversity (Nd) derived directly from Nonpareil curves that correlates well with alpha diversity assessed by traditional metrics. We use this metric in different experiments demonstrating the correlation with the Shannon index estimated on 16S rRNA gene profiles and show that Nd additionally reveals seasonal patterns in marine samples that are not captured by the Shannon index and more precise rankings of the magnitude of diversity of microbial communities in different habitats. Therefore, the new version of Nonpareil, called Nonpareil 3, advances the toolbox for metagenomic analyses of microbiomes.
IMPORTANCE Estimation of the coverage provided by a metagenomic data set, i.e., what fraction of the microbial community was sampled by DNA sequencing, represents an essential first step of every culture-independent genomic study that aims to robustly assess the sequence diversity present in a sample. However, estimation of coverage remains elusive because of several technical limitations associated with high computational requirements and limiting statistical approaches to quantify diversity. Here we describe Nonpareil 3, a new bioinformatics algorithm that circumvents several of these limitations and thus can facilitate culture-independent studies in clinical or environmental settings, independent of the sequencing platform employed. In addition, we present a new metric of sequence diversity based on rarefied coverage and demonstrate its use in communities from diverse ecosystems.
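The abstract benchmarks its new diversity metric against the Shannon index. For reference, a minimal sketch of the Shannon index, H = -sum(p_i * ln p_i), computed from taxon counts is shown below. It does not reproduce the Nonpareil metric itself, which the abstract does not define. The function name and the example counts are assumptions made for illustration.

```python
from math import log


def shannon_index(counts):
    """Shannon index H = -sum(p * ln(p)) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


# Hypothetical 16S rRNA gene profiles: read counts per taxon in two samples.
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]
h_even = shannon_index(even_sample)
h_skewed = shannon_index(skewed_sample)
```

With the same number of taxa, the evenly distributed sample scores higher than the skewed one. This reflects the fact that the index combines richness and evenness.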


mSphere ◽  
2020 ◽  
Vol 5 (3) ◽  
Author(s):  
Lamia Wahba ◽  
Nimit Jain ◽  
Andrew Z. Fire ◽  
Massa J. Shoura ◽  
Karen L. Artiles ◽  
...  

ABSTRACT In numerous instances, tracking the biological significance of a nucleic acid sequence can be augmented through the identification of environmental niches in which the sequence of interest is present. Many metagenomic data sets are now available, with deep sequencing of samples from diverse biological niches. While any individual metagenomic data set can be readily queried using web-based tools, meta-searches through all such data sets are less accessible. In this brief communication, we demonstrate such a meta-metagenomic approach, examining close matches to the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in all high-throughput sequencing data sets in the NCBI Sequence Read Archive accessible with the “virome” keyword. In addition to the homology to bat coronaviruses observed in descriptions of the SARS-CoV-2 sequence (F. Wu, S. Zhao, B. Yu, Y. M. Chen, et al., Nature 579:265–269, 2020, https://doi.org/10.1038/s41586-020-2008-3; P. Zhou, X. L. Yang, X. G. Wang, B. Hu, et al., Nature 579:270–273, 2020, https://doi.org/10.1038/s41586-020-2012-7), we note a strong homology to numerous sequence reads in metavirome data sets generated from the lungs of deceased pangolins reported by Liu et al. (P. Liu, W. Chen, and J. P. Chen, Viruses 11:979, 2019, https://doi.org/10.3390/v11110979). While analysis of these reads indicates the presence of a similar viral sequence in pangolin lung, the similarity is not sufficient to either confirm or rule out a role for pangolins as an intermediate host in the recent emergence of SARS-CoV-2. In addition to the implications for SARS-CoV-2 emergence, this study illustrates the utility and limitations of meta-metagenomic search tools in effective and rapid characterization of potentially significant nucleic acid sequences.
IMPORTANCE Meta-metagenomic searches allow for high-speed, low-cost identification of potentially significant biological niches for sequences of interest.


2021 ◽  
Vol 12 ◽  
Author(s):  
Julien Andreani ◽  
Frederik Schulz ◽  
Fabrizio Di Pinto ◽  
Anthony Levasseur ◽  
Tanja Woyke ◽  
...  

Since the discovery of Mimivirus, viruses with large genomes encoding components of the translation machinery and other cellular processes have been described as belonging to the nucleocytoplasmic large DNA viruses. Recently, genome-resolved metagenomics led to the discovery of more than 40 viruses that have been grouped together in a proposed viral subfamily named Klosneuvirinae. Members of this group had genomes of up to 2.4 Mb in size and featured an expanded array of translation system genes. Yet, despite the large diversity of the Klosneuvirinae in metagenomic data, there are currently only two isolates available. Here, we report the isolation of a novel giant virus known as Fadolivirus from an Algerian sewage site and provide morphological data throughout its replication cycle in amoeba and a detailed genomic characterization. The Fadolivirus genome, which is more than 1.5 Mb in size, encodes 1,452 predicted proteins, and phylogenetic analyses place this viral isolate as a near relative of the metagenome-assembled Klosneuvirus and Indivirus. The genome encodes 66 tRNAs, 23 aminoacyl-tRNA synthetases and a wide range of transcription factors, surpassing Klosneuvirus and other giant viruses. The Fadolivirus genome also encodes putative vacuolar-type proton pumps with the domains D and A, potentially constituting a virus-derived system for energy generation. The successful isolation of Fadolivirus will enable future hypothesis-driven experimental studies providing deeper insights into the biology of the Klosneuvirinae.


2021 ◽  
Author(s):  
Morgan Gaia ◽  
Lingjie Meng ◽  
Eric Pelletier ◽  
Patrick Forterre ◽  
Chiara Vanni ◽  
...  

Large and giant DNA viruses of the phylum Nucleocytoviricota have a profound influence on the ecology and evolution of planktonic eukaryotes. Recently, various Nucleocytoviricota genomes have been characterized from environmental metagenomes based on the occurrence of hallmark genes identified from cultures. However, lineages diverging from the culture genomics functional principles have been overlooked thus far. Here, we developed a phylogeny-guided genome-resolved metagenomic framework using a single hallmark gene as a compass, a subunit of DNA-dependent RNA polymerase encoded by most Nucleocytoviricota. We applied this method to large metagenomic data sets from the surface of five oceans and two seas and characterized 697 non-redundant Nucleocytoviricota genomes up to 1.45 Mbp in length. This database expands the known diversity of the class Megaviricetes and reveals two additional putative classes we named Proculviricetes and Mirusviricetes. Critically, the diverse and prevalent Mirusviricetes population genomes seemingly lack several hallmark genes, in particular those related to viral particle morphogenesis. Instead, they share various genes of known (e.g., TATA-binding proteins, histones, proteases and viral rhodopsins) and unknown functions rarely detected if not entirely missing in all other characterized Nucleocytoviricota lineages. Phylogenomics, comparative genomics, functional trends and the signal among planktonic cellular size fractions point to Mirusviricetes being a major, functionally divergent class of large DNA viruses that actively infect eukaryotes in the sunlit ocean using an enigmatic functional lifestyle. Finally, we built a comprehensive marine genomic database for Nucleocytoviricota by combining multiple environmental surveys that might contribute to future endeavors exploring the ecology and evolution of plankton.


1995 ◽  
Vol 73 (S1) ◽  
pp. 649-659 ◽  
Author(s):  
François Lutzoni ◽  
Rytas Vilgalys

To provide a clearer picture of fungal species relationships, increased efforts are being made to include both molecular and morphological data sets in phylogenetic studies. This general practice in systematics has raised many unresolved questions and controversies regarding how to best integrate the phylogenetic information revealed by morphological and molecular characters. This is because phylogenetic trees derived using different data sets are rarely identical. Such discrepancies can be due to sampling error, to the use of an inappropriate evolutionary model for a given data set, or to different phylogenetic histories between the organisms and the molecule. Methods have been developed recently to test for heterogeneity among data sets, although none of these methods have been subjected to simulation studies. In this paper we compare three tests: a protocol described by Rodrigo et al., an adapted version of Faith's T-PTP test, and Kishino and Hasegawa's likelihood test. These tests were empirically compared using seven lichenized and nonlichenized Omphalina species and the related species Arrhenia lobata (Basidiomycota, Agaricales) for which nrDNA large subunit sequences and morphological data were gathered. The results of these three tests were inconsistent, Rodrigo's test being the only one suggesting that the two data sets could be combined. One of the three most parsimonious trees obtained from the combined data set with eight species is totally congruent with the relationships among the same eight species in an analysis restricted to the same portion of the nrDNA large subunit but extended to 26 species of Omphalina and related genera. Therefore, the results from phylogenetic analyses of this large molecular data set converged on one of the three most parsimonious topologies generated by the combined data set analysis. This topology was not recovered from either data set when analysed separately. This suggests that Rodrigo's homogeneity test might be better suited than the two other tests for determining if trees obtained from different data sets are sampling statistics of the same phylogenetic history.
Key words: data sets heterogeneity, homogeneity test, lichen phylogeny, Omphalina, ribosomal DNA.


Genetics ◽  
2000 ◽  
Vol 155 (2) ◽  
pp. 765-775
Author(s):  
Rafael Zardoya ◽  
Axel Meyer

Abstract The complete nucleotide sequence (17,005 bp) of the mitochondrial genome of the caecilian Typhlonectes natans (Gymnophiona, Amphibia) was determined. This molecule is characterized by two distinctive genomic features: there are seven large 109-bp tandem repeats in the control region, and the sequence for the putative origin of replication of the L strand can potentially fold into two alternative secondary structures (one including part of the tRNA(Cys)). The new sequence data were used to assess the phylogenetic position of caecilians and to gain insights into the origin of living amphibians (frogs, salamanders, and caecilians). Phylogenetic analyses of two data sets, one combining protein-coding genes and the other combining tRNA genes, strongly supported a caecilian + frog clade and, hence, monophyly of modern amphibians. These two data sets could not further resolve relationships among the coelacanth, lungfishes, and tetrapods, but strongly supported diapsid affinities of turtles. Phylogenetic relationships among a larger set of species of frogs, salamanders, and caecilians were estimated with a mitochondrial rRNA data set. Maximum parsimony analysis of this latter data set also recovered monophyly of living amphibians and favored a frog + salamander (Batrachia) relationship. However, bootstrap support was only moderate at these nodes. This is likely due to extensive among-site rate heterogeneity in the rRNA data set and the narrow window of time in which the three main groups of living amphibians originated.


2012 ◽  
Vol 30 (2) ◽  
pp. 253-262 ◽  
Author(s):  
Martyna Molak ◽  
Eline D. Lorenzen ◽  
Beth Shapiro ◽  
Simon Y.W. Ho

Abstract In recent years, ancient DNA has increasingly been used for estimating molecular timescales, particularly in studies of substitution rates and demographic histories. Molecular clocks can be calibrated using temporal information from ancient DNA sequences. This information comes from the ages of the ancient samples, which can be estimated by radiocarbon dating the source material or by dating the layers in which the material was deposited. Both methods involve sources of uncertainty. The performance of Bayesian phylogenetic inference depends on the information content of the data set, which includes variation in the DNA sequences and the structure of the sample ages. Various sources of estimation error can reduce our ability to estimate rates and timescales accurately and precisely. We investigated the impact of sample-dating uncertainties on the estimation of evolutionary timescale parameters using the software BEAST. Our analyses involved 11 published data sets and focused on estimates of substitution rate and root age. We show that, provided that samples have been accurately dated and have a broad temporal span, it might be unnecessary to account for sample-dating uncertainty in Bayesian phylogenetic analyses of ancient DNA. We also investigated the sample size and temporal span of the ancient DNA sequences needed to estimate phylogenetic timescales reliably. Our results show that the range of sample ages plays a crucial role in determining the quality of the results but that accurate and precise phylogenetic estimates of timescales can be made even with only a few ancient sequences. These findings have important practical consequences for studies of molecular rates, timescales, and population dynamics.


2021 ◽  
Vol 87 (6) ◽  
Author(s):  
Alexandra Meziti ◽  
Luis M. Rodriguez-R ◽  
Janet K. Hatt ◽  
Angela Peña-Gonzalez ◽  
Karen Levy ◽  
...  

ABSTRACT The recovery of metagenome-assembled genomes (MAGs) from metagenomic data has recently become a common task for microbial studies. The strengths and limitations of the underlying bioinformatics algorithms are well appreciated by now based on performance tests with mock data sets of known composition. However, these mock data sets do not capture the complexity and diversity often observed within natural populations, since their construction typically relies on only a single genome of a given organism. Further, it remains unclear if MAGs can recover population-variable genes (those shared by >10% but <90% of the members of the population) as efficiently as core genes (those shared by >90% of the members). To address these issues, we compared the gene variabilities of pathogenic Escherichia coli isolates from eight diarrheal samples, for which the isolate was the causative agent, against their corresponding MAGs recovered from the companion metagenomic data set. Our analysis revealed that MAGs with completeness estimates near 95% captured only 77% of the population core genes and 50% of the variable genes, on average. Further, about 5% of the genes of these MAGs were conservatively identified as missing in the isolate and were of different (non-Enterobacteriaceae) taxonomic origin, suggesting errors at the genome-binning step, even though contamination estimates based on commonly used pipelines were only 1.5%. Therefore, the quality of MAGs may often be worse than estimated, and we offer examples of how to recognize and improve such MAGs to sufficient quality by (for instance) employing only contigs longer than 1,000 bp for binning.
IMPORTANCE Metagenome assembly and the recovery of metagenome-assembled genomes (MAGs) have recently become common tasks for microbiome studies across environmental and clinical settings. However, the extent to which MAGs can capture the genes of the population they represent remains speculative. Current approaches to evaluating MAG quality are limited to the recovery and copy number of universal housekeeping genes, which represent a small fraction of the total genome, leaving the majority of the genome essentially inaccessible. If MAG quality in reality is lower than these approaches would estimate, this could have dramatic consequences for all downstream analyses and interpretations. In this study, we evaluated this issue using an approach that employed comparisons of the gene contents of MAGs to the gene contents of isolate genomes derived from the same sample. Further, our samples originated from a diarrhea case-control study, and thus, our results are relevant for recovering the virulence factors of pathogens from metagenomic data sets.
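The core and variable gene definitions quoted in the abstract (core: shared by >90% of population members; variable: shared by >10% but <90%) translate directly into a small classification step, followed by a check of how many genes in each class a MAG recovered. The sketch below is illustrative only. The data layout, function names, and toy values are assumptions, and it does not represent the authors' pipeline.

```python
def classify_genes(presence):
    """Label each gene 'core', 'variable', or 'other' by its population frequency.

    presence maps a gene name to a list of booleans, one per population member.
    Thresholds follow the abstract: core > 90%, variable > 10% and < 90%.
    A gene at exactly 90% fits neither stated definition and is labeled 'other'.
    """
    labels = {}
    for gene, calls in presence.items():
        freq = sum(calls) / len(calls)
        if freq > 0.90:
            labels[gene] = "core"
        elif 0.10 < freq < 0.90:
            labels[gene] = "variable"
        else:
            labels[gene] = "other"
    return labels


def recovered_fraction(mag_genes, labels, category):
    """Fraction of genes in a category that are present in the MAG gene set."""
    genes = [g for g, label in labels.items() if label == category]
    if not genes:
        return 0.0
    return sum(g in mag_genes for g in genes) / len(genes)


# Hypothetical presence/absence calls across 10 isolates of one population.
presence = {
    "geneA": [True] * 10,               # 100% -> core
    "geneB": [True] * 10,               # 100% -> core
    "geneC": [True] * 5 + [False] * 5,  # 50%  -> variable
    "geneD": [True] * 1 + [False] * 9,  # 10%  -> other (not > 10%)
}
labels = classify_genes(presence)
core_recovery = recovered_fraction({"geneA", "geneC"}, labels, "core")
```

In this toy case the MAG contains one of the two core genes, so `core_recovery` is 0.5. The study reports the analogous averages of 77% for core genes and 50% for variable genes.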

