k-SLAM: accurate and ultra-fast taxonomic classification and gene identification for large metagenomic data sets

2016 ◽  
pp. gkw1248 ◽  
Author(s):  
David Ainsworth ◽  
Michael J.E. Sternberg ◽  
Come Raczy ◽  
Sarah A. Butcher
2018 ◽  
Author(s):  
Raphael Eisenhofer ◽  
Laura Susan Weyrich

The field of paleomicrobiology—the study of ancient microorganisms—is rapidly growing due to recent methodological and technological advancements. It is now possible to obtain vast quantities of DNA data from ancient specimens in a high-throughput manner and use this information to investigate the dynamics and evolution of past microbial communities. However, we still know very little about how the characteristics of ancient DNA influence our ability to accurately assign microbial taxonomies (i.e. identify species) within ancient metagenomic samples. Here, we use both simulated and published metagenomic data sets to investigate how ancient DNA characteristics affect alignment-based taxonomic classification. We find that nucleotide-to-nucleotide, rather than nucleotide-to-protein, alignments are preferable when assigning taxonomies to DNA fragment lengths routinely identified within ancient specimens (<60 bp). We determine that deamination (a form of ancient DNA damage) and random sequence substitutions corresponding to ~100,000 years of genomic divergence minimally impact alignment-based classification. We also test four different reference databases and find that database choice can significantly bias the results of alignment-based taxonomic classification in ancient metagenomic studies. Finally, we perform a reanalysis of previously published ancient dental calculus data, increasing the number of microbial DNA sequences assigned taxonomically by an average of 64.2-fold and identifying microbial species previously unidentified in the original study. Overall, this study enhances our understanding of how ancient DNA characteristics influence alignment-based taxonomic classification of ancient microorganisms and provides recommendations for future paleomicrobiological studies.


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6594 ◽  
Author(s):  
Raphael Eisenhofer ◽  
Laura Susan Weyrich

The field of palaeomicrobiology—the study of ancient microorganisms—is rapidly growing due to recent methodological and technological advancements. It is now possible to obtain vast quantities of DNA data from ancient specimens in a high-throughput manner and use this information to investigate the dynamics and evolution of past microbial communities. However, we still know very little about how the characteristics of ancient DNA influence our ability to accurately assign microbial taxonomies (i.e. identify species) within ancient metagenomic samples. Here, we use both simulated and published metagenomic data sets to investigate how ancient DNA characteristics affect alignment-based taxonomic classification. We find that nucleotide-to-nucleotide, rather than nucleotide-to-protein, alignments are preferable when assigning taxonomies to short DNA fragment lengths routinely identified within ancient specimens (<60 bp). We determine that deamination (a form of ancient DNA damage) and random sequence substitutions corresponding to ∼100,000 years of genomic divergence minimally impact alignment-based classification. We also test four different reference databases and find that database choice can significantly bias the results of alignment-based taxonomic classification in ancient metagenomic studies. Finally, we perform a reanalysis of previously published ancient dental calculus data, increasing the number of microbial DNA sequences assigned taxonomically by an average of 64.2-fold and identifying microbial species previously unidentified in the original study. Overall, this study enhances our understanding of how ancient DNA characteristics influence alignment-based taxonomic classification of ancient microorganisms and provides recommendations for future palaeomicrobiological studies.


2014 ◽  
Vol 104 (10) ◽  
pp. 1125-1129 ◽  
Author(s):  
A. H. Stobbe ◽  
W. L. Schneider ◽  
P. R. Hoyt ◽  
U. Melcher

Next generation sequencing (NGS) is not used commonly in diagnostics, in part due to the large amount of time and computational power needed to identify the taxonomic origin of each sequence in an NGS data set. By using the unassembled NGS data sets as the target for searches, pathogen-specific sequences, termed e-probes, could be used as queries to enable detection of specific viruses or organisms in plant sample metagenomes. This method, designated e-probe diagnostic nucleic acid assay, was first tested with mock sequence databases and then with NGS data sets generated from plants infected with a DNA virus (Bean golden yellow mosaic virus, BGYMV) or an RNA virus (Plum pox virus, PPV). In addition, the ability to detect and differentiate among strains of a single virus species, PPV, was examined by using probe sets that were specific to strains. The use of probe sets for multiple viruses determined that one sample was dually infected with BGYMV and Bean golden mosaic virus.
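
The query/target inversion described in this abstract (short pathogen-specific probes searched against unassembled reads) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it swaps in exact substring matching on both strands for the similarity search used in practice, and the function names, the `min_hits` threshold, and the toy sequences are all hypothetical.

```python
# Minimal sketch of the e-probe idea: probes are the queries, raw reads are the target.
# Assumption: exact substring matching stands in for a real similarity search.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def count_probe_hits(reads, probe_sets):
    """Count, per target, the reads that contain at least one of its probes.

    reads      -- iterable of read sequences (unassembled NGS reads)
    probe_sets -- dict mapping a target name to a list of probe sequences
    """
    hits = {target: 0 for target in probe_sets}
    for read in reads:
        read = read.upper()
        for target, probes in probe_sets.items():
            # A read counts once per target, whichever strand the probe matches.
            if any(
                probe.upper() in read or reverse_complement(probe) in read
                for probe in probes
            ):
                hits[target] += 1
    return hits


def call_detected(hits, min_hits=2):
    """Report targets whose hit count reaches a (hypothetical) threshold."""
    return sorted(target for target, n in hits.items() if n >= min_hits)


# Toy data only; these are not real viral sequences.
toy_reads = ["ACGTACGTTTGACCA", "GGGGCCCCAAAA", "TGGTCAAACG"]
toy_probes = {"virus_A": ["TTGACCA"], "virus_B": ["CATCATCAT"]}

toy_hits = count_probe_hits(toy_reads, toy_probes)
toy_detected = call_detected(toy_hits)
```

Because the probe set is small and the reads are never assembled or classified one by one, the cost scales with the number of probes rather than with a full reference database, which is the motivation the abstract gives for the approach.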


mSystems ◽  
2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Gongchao Jing ◽  
Lu Liu ◽  
Zengbin Wang ◽  
Yufeng Zhang ◽  
Li Qian ◽  
...  

ABSTRACT Metagenomic data sets from diverse environments have been growing rapidly. To ensure accessibility and reusability, tools that quickly and informatively correlate new microbiomes with existing ones are in demand. Here, we introduce Microbiome Search Engine 2 (MSE 2), a microbiome database platform for searching query microbiomes in the global metagenome data space based on the taxonomic or functional similarity of a whole microbiome to those in the database. MSE 2 consists of (i) a well-organized and regularly updated microbiome database that currently contains over 250,000 metagenomic shotgun and 16S rRNA gene amplicon samples associated with unified metadata collected from 798 studies, (ii) an enhanced search engine that enables real-time and fast (<0.5 s per query) searches against the entire database for best-matched microbiomes using overall taxonomic or functional profiles, and (iii) a Web-based graphical user interface for user-friendly searching, data browsing, and tutoring. MSE 2 is freely accessible via http://mse.ac.cn. For standalone searches of customized microbiome databases, the kernel of the MSE 2 search engine is provided at GitHub (https://github.com/qibebt-bioinfo/meta-storms). IMPORTANCE A search-based strategy is useful for large-scale mining of microbiome data sets, such as a bird’s-eye view of the microbiome data space and disease diagnosis via microbiome big data. Here, we introduce Microbiome Search Engine 2 (MSE 2), a microbiome database platform for searching query microbiomes against the existing microbiome data sets on the basis of their similarity in taxonomic structure or functional profile. Key improvements include database extension, data compatibility, a search engine kernel, and a user interface. The new ability to search the microbiome space via functional similarity greatly expands the scope of search-based mining of the microbiome big data.


2018 ◽  
Author(s):  
Arghavan Bahadorinejad ◽  
Ivan Ivanov ◽  
Johanna W Lampe ◽  
Meredith AJ Hullar ◽  
Robert S Chapkin ◽  
...  

Abstract We propose a Bayesian method for the classification of 16S rRNA metagenomic profiles of bacterial abundance by introducing a Poisson-Dirichlet-Multinomial hierarchical model for the sequencing data, constructing a prior distribution from sample data, calculating the posterior distribution in closed form, and deriving an Optimal Bayesian Classifier (OBC). The proposed algorithm is compared to state-of-the-art classification methods for 16S rRNA metagenomic data, including Random Forests and the phylogeny-based Metaphyl algorithm, for varying sample size, classification difficulty, and dimensionality (number of OTUs), using both synthetic and real metagenomic data sets. The results demonstrate that the proposed OBC method, with either noninformative or constructed priors, is competitive with or superior to the other methods. In particular, in the case where the ratio of sample size to dimensionality is small, it was observed that the proposed method can vastly outperform the others. Author summary Recent studies have highlighted the interplay between host genetics, gut microbes, and colorectal tumor initiation/progression. The characterization of microbial communities using metagenomic profiling has therefore received renewed interest. In this paper, we propose a method for classification, i.e., prediction of different outcomes, based on 16S rRNA metagenomic data. The proposed method employs a Bayesian approach, which is suitable for data sets with a small ratio of the number of available instances to the dimensionality. Results using both synthetic and real metagenomic data show that the proposed method can outperform other state-of-the-art metagenomic classification algorithms.


mSystems ◽  
2021 ◽  
Vol 6 (3) ◽  
Author(s):  
Christian Milani ◽  
Gabriele Andrea Lugli ◽  
Federico Fontana ◽  
Leonardo Mancabelli ◽  
Giulia Alessandri ◽  
...  

We developed a novel tool, i.e., METAnnotatorX2, that includes a number of new advanced features for analysis of deep and shallow metagenomic data sets and is accompanied by (regularly updated) customized databases for archaea, bacteria, fungi, protists, and viruses. Both software and databases were developed so as to maximize sensitivity and specificity while including support for shallow metagenomic data sets.


2019 ◽  
Vol 3 ◽  
Author(s):  
Shruthi Magesh ◽  
Viktor Jonsson ◽  
Johan Bengtsson-Palme

Metagenomics has emerged as a central technique for studying the structure and function of microbial communities. Often the functional analysis is restricted to classification into broad functional categories. However, important phenotypic differences, such as resistance to antibiotics, are often the result of just one or a few point mutations in otherwise identical sequences. Bioinformatic methods for metagenomic analysis have generally been poor at accounting for this fact, resulting in a somewhat limited picture of important aspects of microbial communities. Here, we address this problem by providing a software tool called Mumame, which can distinguish between wildtype and mutated sequences in shotgun metagenomic data and quantify their relative abundances. We demonstrate the utility of the tool by quantifying antibiotic resistance mutations in several publicly available metagenomic data sets. We also identified that sequencing depth is a key factor to detect rare mutations. Therefore, much larger numbers of sequences may be required for reliable detection of mutations than for most other applications of shotgun metagenomics. Mumame is freely available online (http://microbiology.se/software/mumame).


2013 ◽  
Vol 87 (8) ◽  
pp. 4225-4236 ◽  
Author(s):  
J. Zhou ◽  
W. Zhang ◽  
S. Yan ◽  
J. Xiao ◽  
Y. Zhang ◽  
...  

2020 ◽  
Vol 94 (11) ◽  
Author(s):  
Shengzhong Xu ◽  
Liang Zhou ◽  
Xiaosha Liang ◽  
Yifan Zhou ◽  
Hao Chen ◽  
...  

ABSTRACT Virophages are small parasitic double-stranded DNA (dsDNA) viruses of giant dsDNA viruses infecting unicellular eukaryotes. Except for a few isolated virophages characterized by parasitization mechanisms, features of virophages discovered in metagenomic data sets remain largely unknown. Here, the complete genomes of seven virophages (26.6 to 31.5 kbp) and four large DNA viruses (190.4 to 392.5 kbp) that coexist in the freshwater lake Dishui Lake, Shanghai, China, have been identified based on environmental metagenomic investigation. Both genomic and phylogenetic analyses indicate that Dishui Lake virophages (DSLVs) are closely related to each other and to other lake virophages, and Dishui Lake large DNA viruses are affiliated with the micro-green alga-infecting Prasinovirus of the Phycodnaviridae (named Dishui Lake phycodnaviruses [DSLPVs]) and protist (protozoan and alga)-infecting Mimiviridae (named Dishui Lake large alga virus [DSLLAV]). The DSLVs possess more genes with closer homology to that of large alga viruses than to that of giant protozoan viruses. Furthermore, the DSLVs are strongly associated with large green alga viruses, including DSLPV4 and DSLLAV1, based on codon usage as well as oligonucleotide frequency and correlation analyses. Surprisingly, a nonhomologous CRISPR-Cas like system is found in DSLLAV1, which appears to protect DSLLAV1 from the parasitization of DSLV5 and DSLV8. These results suggest that novel cell-virus-virophage (CVv) tripartite infection systems of green algae, large green alga virus (Phycodnaviridae- and Mimiviridae-related), and virophage exist in Dishui Lake, which will contribute to further deep investigations of the evolutionary interaction of virophages and large alga viruses as well as of the essential roles that the CVv plays in the ecology of algae. IMPORTANCE Virophages are small parasitizing viruses of large/giant viruses. 
To our knowledge, the few isolated virophages all parasitize giant protozoan viruses (Mimiviridae) for propagation and form a tripartite infection system with hosts, here named the cell-virus-virophage (CVv) system. However, the CVv system remains largely unknown in environmental metagenomic data sets. In this study, we systematically investigated the metagenomic data set from the freshwater lake Dishui Lake, Shanghai, China. Consequently, four novel large alga viruses and seven virophages were discovered to coexist in Dishui Lake. Surprisingly, a novel CVv tripartite infection system comprising green algae, large green alga viruses (Phycodnaviridae- and Mimiviridae-related), and virophages was identified based on genetic link, genomic signature, and CRISPR system analyses. Meanwhile, a nonhomologous CRISPR-like system was found in Dishui Lake large alga viruses, which appears to protect the virus host from the infection of Dishui Lake virophages (DSLVs). These findings are critical to give insight into the potential significance of CVv in global evolution and ecology.

