Capturing variation in metagenomic assembly graphs with MetaCortex

2021 ◽  
Author(s):  
Samuel Martin ◽  
Martin Ayling ◽  
Livia Patrono ◽  
Mario Caccamo ◽  
Pablo Murcia ◽  
...  

The assembly of contiguous sequence from metagenomic samples presents a particular challenge, due to the presence of multiple species, often closely related, at varying levels of abundance. Capturing diversity within species, for example viral haplotypes, or bacterial strain-level diversity, is even more challenging. We present MetaCortex, a metagenome assembler based on data structures from the Cortex de novo assembler. MetaCortex captures intra-species diversity by searching for signatures of local variation along assembled sequences in the underlying assembly graph and outputting these sequences in sequence graph format. MetaCortex also implements a novel assembly algorithm for representing intra-species diversity in standard linear format. We show that MetaCortex produces accurate assemblies with higher genome coverage and contiguity than other popular metagenomic assemblers on mock viral communities with high levels of strain level diversity, and on simulated communities containing simulated strains. We also show that accuracy can be increased further by using the sequence graph produced by MetaCortex to create highly accurate single contig sequences.
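As a rough illustration of what a "signature of local variation" in an assembly graph can look like, the sketch below builds a toy de Bruijn graph and reports nodes with more than one successor. It is not MetaCortex's algorithm or data structure. The function names, the choice of k, and the two example reads are all assumptions made for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in any read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branch_nodes(graph):
    """Return nodes with more than one successor, sorted for stable output."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


# Two hypothetical haplotypes that differ at a single base (A vs T).
reads = ["ACGTACCA", "ACGTTCCA"]
graph = build_de_bruijn(reads, k=4)
print(branch_nodes(graph))  # the node where the two haplotypes diverge
```

In this toy case the two paths diverge at one node and rejoin a few k-mers later, which is the "bubble" shape a single-base variant typically produces. A real assembler also has to deal with sequencing errors, coverage and reverse complements, all of which are ignored here.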

PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6902 ◽  
Author(s):  
Simon Roux ◽  
Gareth Trubl ◽  
Danielle Goudeau ◽  
Nandita Nath ◽  
Estelle Couradeau ◽  
...  

Background. Metagenomics has transformed our understanding of microbial diversity across ecosystems, with recent advances enabling de novo assembly of genomes from metagenomes. These metagenome-assembled genomes are critical to provide ecological, evolutionary, and metabolic context for all the microbes and viruses yet to be cultivated. Metagenomes can now be generated from nanogram to subnanogram amounts of DNA. However, these libraries require several rounds of PCR amplification before sequencing, and recent data suggest these typically yield smaller and more fragmented assemblies than regular metagenomes. Methods. Here we evaluate de novo assembly methods of 169 PCR-amplified metagenomes, including 25 for which an unamplified counterpart is available, to optimize specific assembly approaches for PCR-amplified libraries. We first evaluated coverage bias by mapping reads from PCR-amplified metagenomes onto reference contigs obtained from unamplified metagenomes of the same samples. Then, we compared different assembly pipelines in terms of assembly size (number of bp in contigs ≥ 10 kb) and error rates to evaluate which are the best suited for PCR-amplified metagenomes. Results. Read mapping analyses revealed that the depth of coverage within individual genomes is significantly more uneven in PCR-amplified datasets versus unamplified metagenomes, with regions of high depth of coverage enriched in short inserts. This enrichment scales with the number of PCR cycles performed, and is presumably due to preferential amplification of short inserts. Standard assembly pipelines are confounded by this type of coverage unevenness, so we evaluated other assembly options to mitigate these issues. We found that a pipeline combining read deduplication and an assembly algorithm originally designed to recover genomes from libraries generated after whole genome amplification (single-cell SPAdes) frequently improved assembly of contigs ≥ 10 kb by 10- to 100-fold for low input metagenomes. Conclusions. PCR-amplified metagenomes have enabled scientists to explore communities traditionally challenging to describe, including some with extremely low biomass or from which DNA is particularly difficult to extract. Here we show that a modified assembly pipeline can lead to an improved de novo genome assembly from PCR-amplified datasets, and enables a better genome recovery from low input metagenomes.
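The two quantities this abstract relies on, read deduplication and assembly size (total bp in contigs ≥ 10 kb), can be sketched in a few lines. This is a minimal illustration only. Exact-match deduplication and the helper names `deduplicate` and `assembly_size` are assumptions, and the abstract does not say which deduplication tool or criteria the authors used.

```python
def deduplicate(reads):
    """Drop exact duplicate read sequences, keeping first occurrences in order."""
    seen = set()
    unique = []
    for read in reads:
        if read not in seen:
            seen.add(read)
            unique.append(read)
    return unique


def assembly_size(contig_lengths, min_len=10_000):
    """Sum the lengths (in bp) of contigs at least min_len bp long."""
    return sum(length for length in contig_lengths if length >= min_len)


# Hypothetical inputs, for illustration only.
reads = ["ACGT", "ACGT", "TTGA"]
contig_lengths = [5_000, 10_000, 25_000]

print(deduplicate(reads))             # duplicates removed, order preserved
print(assembly_size(contig_lengths))  # only the 10 kb and 25 kb contigs count
```

With a metric defined this way, a 10- to 100-fold improvement means the deduplicated single-cell assembly yields 10 to 100 times more bp in long contigs than the standard pipeline does on the same library.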


2017 ◽  
Author(s):  
Victoria Cepeda ◽  
Bo Liu ◽  
Mathieu Almeida ◽  
Christopher M. Hill ◽  
Sergey Koren ◽  
...  

Abstract. Metagenomic studies have primarily relied on de novo approaches for reconstructing genes and genomes from microbial mixtures. While database-driven approaches have been employed in certain analyses, they have not been used in the assembly of metagenomes. Here we describe the first effective approach for reference-guided metagenomic assembly of low-abundance bacterial genomes that can complement and improve upon de novo metagenomic assembly methods. When combined with de novo assembly approaches, we show that MetaCompass can generate more complete assemblies than can be obtained by de novo assembly alone, and improve on assemblies from the Human Microbiome Project (over 2,000 samples).


2018 ◽  
Author(s):  
Thomas D.S. Sutton ◽  
Adam G. Clooney ◽  
Feargal J. Ryan ◽  
R. Paul Ross ◽  
Colin Hill

Background. The viral component of microbial communities plays a vital role in driving bacterial diversity, facilitating nutrient turnover and shaping community composition. Despite their importance, the vast majority of viral sequences are poorly annotated and share little or no homology to reference databases. As a result, investigation of the viral metagenome (virome) relies heavily on de novo assembly of short sequencing reads to recover compositional and functional information. Metagenomic assembly is particularly challenging for virome data, often resulting in fragmented assemblies and poor recovery of viral community members. Despite the essential role of assembly in virome analysis and difficulties posed by these data, current assembly comparisons have been limited to subsections of virome studies or bacterial datasets. Design. This study presents the most comprehensive virome assembly comparison to date, comprising 16 metagenomic assembly approaches that have featured in human virome studies. Assemblers were assessed using four independent virome datasets, namely simulated reads, two mock communities, viromes spiked with a known phage, and human gut viromes. Results. Assembly performance varied significantly across all test datasets, with SPAdes (meta) performing consistently well. Performance of MIRA and VICUNA varied, highlighting the importance of using a range of datasets when comparing assembly programs. It was also found that while some assemblers addressed the challenges of virome data better than others, all assemblers had limitations. Low read coverage and genomic repeats resulted in assemblies with poor genome recovery, high degrees of fragmentation and low-accuracy contigs across all assemblers. These limitations must be considered when setting thresholds for downstream analysis and when drawing conclusions from virome data.


2018 ◽  
Author(s):  
Simon Roux ◽  
Gareth Trubl ◽  
Danielle Goudeau ◽  
Nandita Nath ◽  
Estelle Couradeau ◽  
...  

Background. Metagenomics has transformed our understanding of microbial diversity across ecosystems, with recent advances enabling de novo assembly of genomes from metagenomes. These metagenome-assembled genomes are critical to provide ecological, evolutionary, and metabolic context for all the microbes and viruses yet to be cultivated. Metagenomes can now be generated from nanogram to subnanogram amounts of DNA. However, these libraries require several rounds of PCR amplification before sequencing, and recent data suggest these typically yield smaller and more fragmented assemblies than regular metagenomes. Methods. Here we evaluate de novo assembly methods of 169 PCR-amplified metagenomes, including 25 for which an unamplified counterpart is available, to optimize specific assembly approaches for PCR-amplified libraries. We first evaluated coverage bias by mapping reads from PCR-amplified metagenomes onto reference contigs obtained from unamplified metagenomes of the same samples. Then, we compared different assembly pipelines in terms of assembly size (number of bp in contigs ≥ 10kb) and error rates to evaluate which are the best suited for PCR-amplified metagenomes. Results. Read mapping analyses revealed that the depth of coverage within individual genomes is significantly more uneven in PCR-amplified datasets versus unamplified metagenomes, with regions of high depth of coverage enriched in short inserts. This enrichment scales with the number of PCR cycles performed, and is presumably due to preferential amplification of short inserts. Standard assembly pipelines are confounded by this type of coverage unevenness, so we evaluated other assembly options to mitigate these issues. 
We found that a pipeline combining read deduplication and an assembly algorithm originally designed to recover genomes from libraries generated after whole genome amplification (single-cell SPAdes) frequently improved assembly of contigs ≥ 10kb by 10 to 100-fold for low input metagenomes. Conclusions. PCR-amplified metagenomes have enabled scientists to explore communities traditionally challenging to describe, including some with extremely low biomass or from which DNA is particularly difficult to extract. Here we show that a modified assembly pipeline can lead to an improved de novo genome assembly from PCR-amplified datasets, and enables a better genome recovery from low input metagenomes.


Viruses ◽  
2019 ◽  
Vol 11 (11) ◽  
pp. 979 ◽  
Author(s):  
Ping Liu ◽  
Wu Chen ◽  
Jin-Ping Chen

Pangolins are endangered animals in urgent need of protection. Identifying and cataloguing the viruses carried by pangolins is a logical approach to evaluate the range of potential pathogens and help with conservation. This study provides insight into viral communities of Malayan Pangolins (Manis javanica) as well as the molecular epidemiology of dominant pathogenic viruses between Malayan Pangolin and other hosts. A total of 62,508 de novo assembled contigs were constructed, and a BLAST search revealed that 3,600 of them (≥300 nt) were related to viral sequences, of which 68 contigs had a high level of sequence similarity to known viruses. The dominant viruses were Sendai virus and coronavirus. This is the first report on the viral diversity of pangolins, expanding our understanding of the virome in endangered species, and providing insight into the overall diversity of viruses that may be capable of directly or indirectly crossing over into other mammals.


2012 ◽  
Vol 28 (11) ◽  
pp. 1455-1462 ◽  
Author(s):  
Binbin Lai ◽  
Ruogu Ding ◽  
Yang Li ◽  
Liping Duan ◽  
Huaiqiu Zhu
Keyword(s):  
De Novo ◽  

2021 ◽  
Author(s):  
Satoshi Hiraoka ◽  
Tomomi Sumida ◽  
Miho Hirai ◽  
Atsushi Toyoda ◽  
Shinsuke Kawagucci ◽  
...  

Chemical modifications of DNA, including methylation, play an important role in prokaryotes and viruses. However, our knowledge of the modification systems in environmental microbial communities, typically dominated by members not yet cultured, is limited. Here, we conducted 'metaepigenomic' analyses by single-molecule real-time sequencing of marine microbial communities. In total, 233 and 163 metagenome-assembled genomes (MAGs) were constructed from diverse prokaryotes and viruses, respectively, and 220 modified motifs and 276 DNA methyltransferases (MTases) were identified. Most of the MTases were not associated with the defense mechanism. The MTase-motif correspondence found in the MAGs revealed 10 novel pairs, and we experimentally confirmed the catalytic specificities of the MTases. We revealed novel alternative motifs in the methylation system that are highly conserved in Alphaproteobacteria, illuminating the co-evolutionary history of the methylation system and host genome. Our findings highlight diverse unexplored DNA modifications that potentially affect the ecology and evolution of prokaryotes and viruses.


2019 ◽  
Author(s):  
Andrey N. Shkoporov ◽  
Adam G. Clooney ◽  
Thomas D.S. Sutton ◽  
Feargal J. Ryan ◽  
Karen M. Daly ◽  
...  

Summary. The human gut contains a vast array of viruses, mostly bacteriophages. The majority remain uncharacterised and their roles in shaping the gut microbiome and in impacting on human health remain poorly understood. Here we performed a longitudinal focused metagenomic study of faecal bacteriophage populations in healthy adults. Our results reveal high temporal stability and individual specificity of bacteriophage consortia which correlates with the bacterial microbiome. We report the existence of a stable, numerically predominant individual-specific persistent personal virome. Clustering of bacteriophage genomes and de novo taxonomic annotation identified several groups of crAss-like and Microviridae bacteriophages as the most stable colonizers of the human gut. CRISPR-based host prediction highlighted connections between these stable viral communities and highly predominant gut bacterial taxa such as Bacteroides, Prevotella and Faecalibacterium. This study provides insights into the structure of the human gut virome and serves as an important baseline for hypothesis-driven research.

