taxonomic binning
Recently Published Documents


TOTAL DOCUMENTS

13
(FIVE YEARS 4)

H-INDEX

5
(FIVE YEARS 0)

2021 ◽  
Author(s):  
Induja Chandrakumar ◽  
Nick PG Gauthier ◽  
Cassidy Nelson ◽  
Michael B Bonsall ◽  
Kerstin Locher ◽  
...  

A large gap remains between sequencing a microbial community and characterizing all of the organisms inside of it. Here we develop a novel method to taxonomically bin metagenomic assemblies through alignment of contigs against a reference database. We show that this workflow, BugSplit, bins metagenome-assembled contigs to species with a 33% absolute improvement in F1-score when compared to alternative tools. We perform nanopore mNGS on patients with COVID-19, and using a reference database predating COVID-19, demonstrate that BugSplit's taxonomic binning enables sensitive and specific detection of a novel coronavirus not possible with other approaches. When applied to nanopore mNGS data from cases of Klebsiella pneumoniae bacteremia and Neisseria gonorrhoeae infection, BugSplit's taxonomic binning accurately separates pathogen sequences from those of the host and microbiota, and unlocks the possibility of sequence typing, in silico serotyping, and antimicrobial resistance prediction of each organism within a sample. BugSplit is available at https://bugseq.com/academic.


Agriculture ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 572
Author(s):  
Massimo Ferrara ◽  
Maria Federica Sgarro ◽  
Aristide Maggiolino ◽  
Sara Damiano ◽  
Francesco Iannaccone ◽  
...  

Red orange and lemon extract (RLE) is an anthocyanin-rich dietary supplement that may influence the gastrointestinal bacterial community in ruminants. The aim of the present study was to investigate the effects of RLE on gut microbiota composition in lambs. Twenty-eight lambs were randomly divided into a control group (CON; n = 14) and an anthocyanin group (ANT; n = 14) and fed the same diet; in addition, only the ANT group received 90 mg/kg live weight of RLE per day. After lamb slaughter (40 ± 1 days), fecal samples were collected from the rectum and stored at −20 °C until analysis. The fecal microbiome was analyzed by 16S rRNA metabarcoding. After read denoising, sequences were aligned against the SILVA rRNA sequence database using MALT, and taxonomic binning was performed with MEGAN. A significant increase in Firmicutes and Bacteroidetes and a decrease in Proteobacteria and Actinobacteria were observed in ANT compared to CON. Moreover, a notable increase in the genera Lactobacillus and Bifidobacterium and a decrease in Escherichia coli and Salmonella species were detected in ANT compared to CON. The results suggest that anthocyanin supplementation in the lamb diet can positively modulate the gut microbiota and may inhibit the growth of some potentially pathogenic microorganisms.


2021 ◽  
Vol 16 ◽  
Author(s):  
Brahim Matougui ◽  
Abdelbasset Boukelia ◽  
Hacene Belhadef ◽  
Clovis Galiez ◽  
Mohamed Batouche

Background: Metagenomics is the study of the collective genomic content of an environment of interest, such as the human gut or soil. Taxonomy, the science of defining and naming groups of microbial organisms that share the same characteristics, is one of the most important fields of metagenomics. The taxonomic classification problem is the identification and quantification of microbial species or higher-level taxa sampled by high-throughput sequencing. Objective: Although many methods exist to deal with the taxonomic classification problem, assignment to low taxonomic ranks remains an important challenge for binning methods, as is scalability to Gb-sized datasets generated with deep sequencing techniques. Methods: In this paper, we introduce NLP-MeTaxa, a novel composition-based method for taxonomic binning, which relies on word embeddings and a deep learning architecture. The proposed approach is word-based: metagenomic DNA fragments are processed as a set of overlapping words, which are vectorized with the word2vec model and fed to the deep learning model. NLP-MeTaxa output is visualized as an NCBI taxonomy tree; this representation helps to show the connection between the predicted taxonomic identifiers. NLP-MeTaxa was trained on large-scale data from NCBI RefSeq, comprising more than 14,000 complete microbial genomes. The NLP-MeTaxa code is available at https://github.com/padriba/NLP_MeTaxa/. Results: We evaluated NLP-MeTaxa on real and simulated metagenomic data and compared our results with those of other tools. The experimental results show that our method outperforms the other methods, especially for classification at low taxonomic ranks such as species and genus. Conclusion: In summary, our new method might provide novel insight for understanding the microbial community through the identification of the organisms it might contain.
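The word-based preprocessing described in this abstract can be sketched as follows. This is a minimal illustration, not the NLP-MeTaxa code: the function name `kmer_words`, the word length `k`, and the `stride` parameter are assumptions made here, since the abstract does not state the word length the authors use. The resulting list of words plays the role of a "sentence" that a word2vec model could consume.

```python
def kmer_words(fragment: str, k: int = 6, stride: int = 1) -> list[str]:
    """Split a DNA fragment into overlapping words of length k.

    k and stride are illustrative defaults (assumptions), not values
    taken from the NLP-MeTaxa paper.
    """
    seq = fragment.upper()
    words = []
    for i in range(0, len(seq) - k + 1, stride):
        word = seq[i:i + k]
        # Skip words containing ambiguous bases such as N.
        if set(word) <= set("ACGT"):
            words.append(word)
    return words


if __name__ == "__main__":
    # A toy fragment; real fragments would be hundreds of bases or longer.
    print(kmer_words("ATGCGTAC", k=4))
```

With a stride of 1, consecutive words overlap by k - 1 bases, which is what makes the fragment read like a sequence of context-sharing tokens.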


2020 ◽  
Author(s):  
Maud Tournoud ◽  
Etienne Ruppé ◽  
Guillaume Perrin ◽  
Stéphane Schicklin ◽  
Ghislaine Guigon ◽  
...  

Background: Shortening the time-to-result for pathogen detection and identification and for antibiotic susceptibility testing in patients with Hospital-Acquired and Ventilator-Associated pneumonia (HAP-VAP) is of great interest. For this purpose, clinical metagenomics is a promising non-hypothesis-driven alternative to traditional culture-based solutions: when mature, it would allow direct sequencing of all microbial genomes present in a BronchoAlveolar Lavage (BAL) sample, with the purpose of simultaneously identifying pathogens and Antibiotic Resistance Genes (ARG). In this study, we describe a new bioinformatics method to detect pathogens and their ARG with good accuracy, in both mono- and polymicrobial samples. Methods: The standard approach (hereafter called TBo), which consists of taxonomic binning of metagenomic reads followed by an assembly step, suffers from a lack of sensitivity for ARG detection. Thus, we propose a new bioinformatics approach (called TBwDM), with both models and databases optimized for HAP-VAP, that performs read mapping against an ARG reference database in parallel with taxonomic binning, followed by joint read assembly. Results: In in-silico simulated monomicrobial samples, the recall for ARG detection increased from 51% with TBo to 97.3% with TBwDM; in simulated polymicrobial infections, it increased from 41.8% to 82%. In real sequenced BAL samples (mono- and polymicrobial), detected pathogens were also confirmed by traditional culture approaches. Moreover, both recall and precision for ARG detection were higher with TBwDM than with TBo (a 35-point difference for recall and a 7-point difference for precision). Conclusions: We present a new bioinformatics pipeline to identify pathogens and ARG in BAL samples from patients with HAP-VAP, with higher sensitivity for ARG recovery than standard approaches and the ability to link ARG to their host pathogens.
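The recall and precision figures quoted for ARG detection follow the usual set-based definitions, which can be sketched as below. The function name and the example gene sets are hypothetical and chosen only for illustration; they are not data from the study.

```python
def precision_recall(detected: set[str], truth: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for a set of detected items versus ground truth."""
    true_positives = len(detected & truth)
    # Precision: fraction of detected genes that are really present.
    precision = true_positives / len(detected) if detected else 0.0
    # Recall: fraction of genes really present that were detected.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical sample: four ARGs truly present, three reported by a pipeline.
    truth = {"blaTEM", "mecA", "vanA", "tetM"}
    detected = {"blaTEM", "mecA", "aac"}
    p, r = precision_recall(detected, truth)
    print(f"precision={p:.2f} recall={r:.2f}")
```

A "35-point difference for recall" in this sense means the recall percentage rose by 35 percentage points, not by 35% of its previous value.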


2017 ◽  
Author(s):  
Daniel H. Huson ◽  
Benjamin Albrecht ◽  
Caner Bagci ◽  
Irina Bessarab ◽  
Anna Gorska ◽  
...  

Background: There are numerous computational tools for taxonomic or functional analysis of microbiome samples, optimized to run on hundreds of millions of short, high-quality sequencing reads. Programs such as MEGAN allow the user to interactively navigate these large datasets. Long-read sequencing technologies continue to improve and produce increasing numbers of longer reads (of varying lengths in the range of 10k-1M bps, say), but of low quality. There is an increasing interest in using long reads in microbiome sequencing, and there is a need to adapt short-read tools to long-read datasets. Methods: We describe a new LCA-based algorithm for taxonomic binning, and an interval-tree-based algorithm for functional binning, that are explicitly designed for long reads and assembled contigs. We provide a new interactive tool for investigating the alignment of long reads against reference sequences. For taxonomic and functional binning, we propose to use LAST to compare long reads against the NCBI-nr protein reference database so as to obtain frame-shift-aware alignments, and then to process the results using our new methods. Results: All presented methods are implemented in the open-source edition of MEGAN, and we refer to this new extension as MEGAN-LR (MEGAN long read). We evaluate the LAST+MEGAN-LR approach in a simulation study, and on a number of mock community datasets consisting of Nanopore reads, PacBio reads and assembled PacBio reads. We also illustrate the practical application on a Nanopore dataset that we sequenced from an anammox bioreactor community.


2017 ◽  
Vol 5 (10) ◽  
Author(s):  
Hajime Kobayashi ◽  
Qian Fu ◽  
Haruo Maeda ◽  
Kozo Sato

ABSTRACT A draft genome of Coriobacteriaceae sp. strain EMTCatB1 was determined through taxonomic binning of a metagenome of a thermophilic biocathode actively catalyzing electromethanogenesis. This genome will provide information about the biocathode ecosystem, as well as the natural diversity of the Coriobacteriaceae family.


2016 ◽  
Vol 6 (1) ◽  
Author(s):  
J. A. Frank ◽  
Y. Pan ◽  
A. Tooming-Klunderud ◽  
V. G. H. Eijsink ◽  
A. C. McHardy ◽  
...  

PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e1603 ◽  
Author(s):  
Ivan Gregor ◽  
Johannes Dröge ◽  
Melanie Schirmer ◽  
Christopher Quince ◽  
Alice C. McHardy

Background. Metagenomics is an approach for characterizing environmental microbial communities in situ; it allows their functional and taxonomic characterization and the recovery of sequences from uncultured taxa. This is often achieved by a combination of sequence assembly and binning, where sequences are grouped into ‘bins’ representing taxa of the underlying microbial community. Assignment to low-ranking taxonomic bins is an important challenge for binning methods, as is scalability to Gb-sized datasets generated with deep sequencing techniques. One of the best available methods for species bins recovery from deep-branching phyla is the expert-trained PhyloPythiaS package, where a human expert decides on the taxa to incorporate in the model and identifies ‘training’ sequences based on marker genes directly from the sample. Due to the manual effort involved, this approach does not scale to multiple metagenome samples and requires substantial expertise, which researchers who are new to the area do not have. Results. We have developed PhyloPythiaS+, a successor to our PhyloPythia(S) software. The new (+) component performs the work previously done by the human expert. PhyloPythiaS+ also includes a new k-mer counting algorithm, which accelerated the simultaneous counting of 4–6-mers used for taxonomic binning 100-fold and reduced the overall execution time of the software by a factor of three. Our software allows the analysis of Gb-sized metagenomes with inexpensive hardware, and the recovery of species or genera-level bins with low error rates in a fully automated fashion. PhyloPythiaS+ was compared to MEGAN, taxator-tk, Kraken and the generic PhyloPythiaS model. The results showed that PhyloPythiaS+ performs especially well for samples originating from novel environments in comparison to the other methods. Availability. PhyloPythiaS+ in a virtual machine is available for installation under Windows, Unix systems or OS X on: https://github.com/algbioi/ppsp/wiki.
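To make "simultaneous counting of 4–6-mers" concrete, the sketch below counts all three word lengths in a single pass over a sequence. It is a naive reference version written for illustration, not the optimized algorithm in PhyloPythiaS+; the function name `count_kmers` and its parameters are assumptions.

```python
from collections import Counter

ACGT = set("ACGT")


def count_kmers(seq: str, k_min: int = 4, k_max: int = 6) -> dict[int, Counter]:
    """Count every k-mer for k in [k_min, k_max] in one pass over seq."""
    seq = seq.upper()
    counts = {k: Counter() for k in range(k_min, k_max + 1)}
    for i in range(len(seq)):
        for k in range(k_min, k_max + 1):
            word = seq[i:i + k]
            # Ignore truncated words at the end and words with ambiguous bases.
            if len(word) == k and set(word) <= ACGT:
                counts[k][word] += 1
    return counts


if __name__ == "__main__":
    counts = count_kmers("ACGTACGT")
    for k, counter in counts.items():
        print(k, dict(counter))
```

Composition-based binners typically turn such counts into normalized frequency vectors, one per sequence, before classification.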


2016 ◽  
Vol 32 (12) ◽  
pp. 1779-1787 ◽  
Author(s):  
Magali Jaillard ◽  
Maud Tournoud ◽  
Faustine Meynier ◽  
Jean-Baptiste Veyrieras

2015 ◽  
Author(s):  
Jeremy A. Frank ◽  
Yao Pan ◽  
Ave Tooming-Klunderud ◽  
Vincent G.H. Eijsink ◽  
Alice C. McHardy ◽  
...  

DNA assembly is a core methodological step in metagenomic pipelines used to study the structure and function of microbial communities. Here we investigate the utility of Pacific Biosciences long, high-accuracy circular consensus sequencing (CCS) reads for metagenomics projects. We compared the application and performance of both PacBio CCS and Illumina HiSeq data with assembly and taxonomic binning algorithms using metagenomic samples representing a complex microbial community. Eight SMRT cells produced approximately 94 Mb of CCS reads from a biogas reactor microbiome sample, which averaged 1319 nt in length and 99.7 % accuracy. CCS data assembly generated a comparable number of large contigs greater than 1 kb to those assembled from a ~190x larger HiSeq dataset (~18 Gb) produced from the same sample (i.e., approximately 62 % of total contigs). Hybrid assemblies using PacBio CCS and HiSeq contigs produced improvements in assembly statistics, including an increase in the average contig length and number of large contigs. The incorporation of CCS data produced significant enhancements in taxonomic binning and genome reconstruction of two dominant phylotypes, which assembled and binned poorly using HiSeq data alone. Collectively, these results illustrate the value of PacBio CCS reads in certain metagenomics applications.

