BinChecker: a new algorithm for quality assessment of microbial draft genomes

2021 ◽  
Author(s):  
Heiner Klingenberg ◽  
Peter Meinicke

Abstract In the reconstruction of microbial genomes from metagenomic sequence data, the estimation of the final completeness and possible contamination is crucial for quality control. In metagenomics, candidate genomes are usually obtained from a metagenome assembly and a subsequent binning of the assembled contigs. BinChecker provides a novel approach to quality assessment that is based on a fast protein domain search and a clustering approach for identification of marker domain (“feature”) sets. The feature sets that are used for estimation are not pre-computed for a given database of reference genomes, but are individually found for each bin by adaptive clustering and feature selection. In particular, the adaptivity facilitates the creation and extension of the underlying database, which only requires adding protein feature profiles of reference genomes. Tests with simulated bins indicate that the prediction accuracy of BinChecker meets the current state of the art while providing significant advantages in terms of speed and flexibility.
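As a rough illustration of what completeness and contamination mean for a bin, the sketch below scores a bin against a set of single-copy marker features. It is not BinChecker's algorithm: the marker set is fixed here rather than found by adaptive clustering, and the function name `estimate_quality`, the two scoring formulas, and the example feature identifiers are all assumptions made for illustration.

```python
from collections import Counter


def estimate_quality(bin_features, marker_set):
    """Return (completeness, contamination) as fractions of the marker set.

    bin_features: iterable of feature IDs found in the bin (repeats allowed).
    marker_set:   features assumed to occur exactly once per genome.
    """
    markers = set(marker_set)
    if not markers:
        raise ValueError("marker_set must not be empty")
    # Count only features that belong to the marker set.
    counts = Counter(f for f in bin_features if f in markers)
    # Completeness: share of expected markers seen at least once.
    completeness = len(counts) / len(markers)
    # Contamination: surplus copies beyond the first, per expected marker.
    contamination = sum(c - 1 for c in counts.values()) / len(markers)
    return completeness, contamination


# Hypothetical feature IDs, for illustration only.
markers = ["D1", "D2", "D3", "D4"]
found = ["D1", "D2", "D2", "D3", "X9"]
completeness, contamination = estimate_quality(found, markers)
```

In this toy input three of the four markers are present and one of them appears twice, so a missing marker lowers completeness while a duplicated marker raises contamination.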

2020 ◽  
Author(s):  
Vitalii Stebliankin ◽  
Musfiqur Rahman Sazal ◽  
Camilo Valdes ◽  
Kalai Mathee ◽  
Giri Narasimhan

Motivation: Metagenomic sequencing data can be used to compute not just the relative abundance profile, but also the replication rates of every taxon in the microbiome sample. We investigate how the dynamics implied by the replication rates can be used to understand the antibiotic response in microbiomes, given the significant variation in the types of antibiotics and the types of response by different taxa. The analysis is further expanded by factoring in the resistome of the microbiomes, which can be readily profiled from the metagenomic sequence data. The fact that some antibiotics such as β-lactams target replicating cells makes it even more critical to use replication rates to analyze the antibiotic response. Results: We introduce a novel approach for metagenomic analysis that integrates microbial community profiling, replication rate calculation, and causal structural learning to analyze the antibiotic response. First, we developed PeTRi, which involves efficient cluster computation of bacterial replication rates from metagenomic sequence data. Second, we integrate the abundance profile, replication profile, resistome profile, and environmental variables to perform causality analysis. Finally, we applied the integrated analysis to the data from an infant gut microbiome study. Conclusions from our analysis are as follows: (i) Microbes tend to lower their replication rates in response to β-lactams; (ii) The presence of antibiotic resistance genes combined with the causality analysis strongly suggests that the genes fosA5, oqxA, kpnF, arnA, and acrA provide resistance for the taxon K. pneumoniae, allowing it to replicate and dominate the microbiome after the drug ticarcillin-clavulanate was administered; and (iii) Human and donor milk strongly influence the resistome of the infant gut microbiome.


2008 ◽  
Vol 74 (10) ◽  
pp. 2933-2939 ◽  
Author(s):  
Erin J. Biers ◽  
Kui Wang ◽  
Catherine Pennington ◽  
Robert Belas ◽  
Feng Chen ◽  
...  

ABSTRACT Genes with homology to the transduction-like gene transfer agent (GTA) were observed in genome sequences of three cultured members of the marine Roseobacter clade. A broader search for homologs for this host-controlled virus-like gene transfer system identified likely GTA systems in cultured Alphaproteobacteria, and particularly in marine bacterioplankton representatives. Expression of GTA genes and extracellular release of GTA particles (∼50 to 70 nm) were demonstrated experimentally for the Roseobacter clade member Silicibacter pomeroyi DSS-3, and intraspecific gene transfer was documented. GTA homologs are surprisingly infrequent in marine metagenomic sequence data, however, and the role of this lateral gene transfer mechanism in ocean bacterioplankton communities remains unclear.


2009 ◽  
Vol 25 (19) ◽  
pp. 2607-2608 ◽  
Author(s):  
M. Morgan ◽  
S. Anders ◽  
M. Lawrence ◽  
P. Aboyoun ◽  
H. Pages ◽  
...  

2020 ◽  
Vol 36 (10) ◽  
pp. 3011-3017 ◽  
Author(s):  
Olga Mineeva ◽  
Mateo Rojas-Carulla ◽  
Ruth E Ley ◽  
Bernhard Schölkopf ◽  
Nicholas D Youngblut

Abstract Motivation: Methodological advances in metagenome assembly are rapidly increasing the number of published metagenome assemblies. However, identifying misassemblies is challenging due to a lack of closely related reference genomes that can act as pseudo ground truth. Existing reference-free methods are no longer maintained, can make strong assumptions that may not hold across a diversity of research projects, and have not been validated on large-scale metagenome assemblies. Results: We present DeepMAsED, a deep learning approach for identifying misassembled contigs without the need for reference genomes. Moreover, we provide an in silico pipeline for generating large-scale, realistic metagenome assemblies for comprehensive model training and testing. DeepMAsED accuracy substantially exceeds the state-of-the-art when applied to large and complex metagenome assemblies. Our model estimates a 1% contig misassembly rate in two recent large-scale metagenome assembly publications. Conclusions: DeepMAsED accurately identifies misassemblies in metagenome-assembled contigs from a broad diversity of bacteria and archaea without the need for reference genomes or strong modeling assumptions. Running DeepMAsED is straightforward, as is model re-training with our dataset generation pipeline. Therefore, DeepMAsED is a flexible misassembly classifier that can be applied to a wide range of metagenome assembly projects. Availability and implementation: DeepMAsED is available from GitHub at https://github.com/leylabmpi/DeepMAsED. Supplementary information: Supplementary data are available at Bioinformatics online.


F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 726
Author(s):  
Mike W.C. Thang ◽  
Xin-Yi Chua ◽  
Gareth Price ◽  
Dominique Gorse ◽  
Matt A. Field

Metagenomic sequencing is an increasingly common tool in environmental and biomedical sciences. While software for detailing the composition of microbial communities using 16S rRNA marker genes is relatively mature, researchers are increasingly interested in identifying changes exhibited within microbial communities under differing environmental conditions. In order to gain maximum value from metagenomic sequence data we must improve the existing analysis environment by providing accessible and scalable computational workflows able to generate reproducible results. Here we describe a complete end-to-end open-source metagenomics workflow running within Galaxy for 16S differential abundance analysis. The workflow accepts 454 or Illumina sequence data (either overlapping or non-overlapping paired end reads) and outputs lists of the operational taxonomic units (OTUs) exhibiting the greatest change under differing conditions. A range of analysis steps and graphing options are available, giving users a high level of control over their data and analyses. Additionally, users are able to input complex sample-specific metadata information which can be incorporated into differential analysis and used for grouping / colouring within graphs. Detailed tutorials containing sample data and existing workflows are available for three different input types: overlapping and non-overlapping read pairs as well as for pre-generated Biological Observation Matrix (BIOM) files. Using the Galaxy platform we developed MetaDEGalaxy, a complete metagenomics differential abundance analysis workflow. MetaDEGalaxy is designed for bench scientists working with 16S data who are interested in comparative metagenomics. MetaDEGalaxy builds on momentum within the wider Galaxy metagenomics community with the hope that more tools will be added as existing methods mature.
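To make "OTUs exhibiting the greatest change under differing conditions" concrete, the sketch below ranks OTUs by the absolute log2 fold change of their relative abundance between two samples. It is only a toy illustration and not the statistical procedure used by MetaDEGalaxy: the function name `rank_otus_by_change`, the pseudocount, and the example counts are assumptions, and a real differential abundance analysis would use replicates and a proper statistical test.

```python
import math


def rank_otus_by_change(counts_a, counts_b, pseudocount=1.0):
    """Rank OTUs by absolute log2 fold change of relative abundance (B vs A).

    counts_a, counts_b: dicts mapping OTU ID to read count in each condition.
    pseudocount:        added to every count so absent OTUs do not divide by zero.
    """
    otus = sorted(set(counts_a) | set(counts_b))
    total_a = sum(counts_a.get(o, 0) + pseudocount for o in otus)
    total_b = sum(counts_b.get(o, 0) + pseudocount for o in otus)
    changes = {}
    for o in otus:
        rel_a = (counts_a.get(o, 0) + pseudocount) / total_a
        rel_b = (counts_b.get(o, 0) + pseudocount) / total_b
        changes[o] = math.log2(rel_b / rel_a)
    # Largest absolute change first.
    return sorted(changes.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical read counts, for illustration only.
control = {"OTU_1": 99, "OTU_2": 49, "OTU_3": 49}
treated = {"OTU_1": 99, "OTU_2": 9, "OTU_3": 89}
ranking = rank_otus_by_change(control, treated)
```

Working on relative abundances rather than raw counts keeps the comparison independent of sequencing depth, which usually differs between samples.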


2018 ◽  
Author(s):  
Simone Marini ◽  
Francesca Vitali ◽  
Sara Rampazzi ◽  
Andrea Demartini ◽  
Tatsuya Akutsu

Abstract Motivation: Protein cleavage is an important cellular event, involved in a myriad of processes, from apoptosis to immune response. Bioinformatics provides in silico tools, such as machine learning-based models, to guide target discovery. State-of-the-art models have a scope limited to specific protease families (such as Caspases), and do not explicitly include biological or medical knowledge (such as the hierarchical protein domain similarity, or gene-gene interactions). To fill this gap, we present a novel approach for protease target prediction based on data integration. Results: By representing protease-protein target information in the form of relational matrices, we design a model that: (a) is general, i.e., not limited to a single protease family; and (b) leverages the available knowledge, managing extremely sparse data from heterogeneous data sources, including primary sequence, pathways, domains, and interactions from nine databases. When compared to other algorithms on test data, our approach provides a better performance even for models specifically focusing on a single protease family. Availability: https://gitlab.com/smarini/MaDDA/ (Matlab code and utilized data.) Contact: [email protected], or [email protected]


Author(s):  
Frederik Schulz ◽  
Julien Andreani ◽  
Rania Francis ◽  
Jacques Yaacoub Bou Khalil ◽  
Janey Lee ◽  
...  

Abstract Giant viruses have large genomes, often within the size range of cellular organisms. This distinguishes them from most other viruses and demands additional effort for the successful recovery of their genomes from environmental sequence data. Here we tested the performance of genome-resolved metagenomics on a recently isolated giant virus, Fadolivirus, by spiking it into an environmental sample from which two other giant viruses were isolated. At high spike-in levels, metagenome assembly and binning led to the successful genomic recovery of Fadolivirus from the sample. A complementary survey of viral hallmark genes indicated the presence of other giant viruses in the sample matrix, but did not detect the two isolated from this sample. Our results indicate that genome-resolved metagenomics is a valid approach for the recovery of near-complete giant virus genomes given that sufficient clonal particles are present. Our data also underline that a vast majority of giant viruses remain currently undetected, even in an era of terabase-scale metagenomics.


Microbiome ◽  
2019 ◽  
Vol 7 (1) ◽  
Author(s):  
Josef Wagner ◽  
Ewan M. Harrison ◽  
Marcos Martinez Del Pero ◽  
Beth Blane ◽  
Gert Mayer ◽  
...  

Abstract Background: Ear, nose and throat involvement in granulomatosis with polyangiitis (GPA) is frequently the initial disease manifestation. Previous investigations have observed a higher prevalence of Staphylococcus aureus in patients with GPA, and chronic nasal carriage has been linked with an increased risk of disease relapse. In this cross-sectional study, we investigated changes in the nasal microbiota including a detailed analysis of Staphylococcus spp. by shotgun metagenomics in patients with active and inactive GPA. Shotgun metagenomic sequence data were also used to identify protein-encoding genes within the SEED database, and the abundance of proteins was then correlated with the presence of bacterial species on an annotated heatmap. Results: The presence of S. aureus in the nose as assessed by culture was more frequently detected in patients with active GPA (66.7%) compared with inactive GPA (34.1%). Beta diversity analysis of nasal microbiota by bacterial 16S rRNA profiling revealed a different composition between GPA patients and healthy controls (P = 0.039). Beta diversity analysis of shotgun metagenomic sequence data for Staphylococcus spp. revealed a different composition between active GPA patients and healthy controls and disease controls (P = 0.0007 and P = 0.0023, respectively), and between healthy controls and inactive GPA patients and household controls (P = 0.0168 and P = 0.0168, respectively). Patients with active GPA had a higher abundance of S. aureus, mirroring the culture data, while healthy controls had a higher abundance of S. epidermidis. Staphylococcus pseudintermedius, generally assumed to be a pathogen of cats and dogs, showed an abundance of 13% among the Staphylococcus spp. in our cohort. During long-term follow-up of patients with inactive GPA at baseline, a higher S. aureus abundance was not associated with an increased relapse risk. Functional analyses identified ten SEED protein subsystems that differed between the groups. The most significant associations were related to chorismate synthesis and the vitamin B12 pathway. Conclusion: Our data revealed a distinct dysbiosis of the nasal microbiota in GPA patients compared with disease and healthy controls. Metagenomic sequencing demonstrated that this dysbiosis in active GPA patients is manifested by increased abundance of S. aureus and a depletion of S. epidermidis, further demonstrating the antagonistic relationship between these species. SEED functional protein subsystem analysis identified an association between the unique bacterial nasal microbiota clusters seen mainly in GPA patients and an elevated abundance of genes associated with chorismate synthesis and vitamin B12 pathways. Further studies are required to elucidate the relationship between the biosynthesis genes and the associated bacterial species.

