ContigExtender: a new approach to improving de novo sequence assembly for viral metagenomics data

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Zachary Deng ◽  
Eric Delwart

Abstract

Background: Metagenomics is the study of microbial genomes for pathogen detection and discovery in human clinical, animal, and environmental samples via Next-Generation Sequencing (NGS). Metagenome de novo sequence assembly is a crucial analytical step in which longer contigs, ideally whole chromosomes or genomes, are formed from shorter NGS reads. However, the contigs generated by de novo assembly are often highly fragmented and rarely longer than a few kilobase pairs (kb). A time-consuming extension process is therefore routinely performed on the de novo assembled contigs.

Results: To facilitate this process, we propose a new tool for metagenome contig extension after de novo assembly. ContigExtender employs a novel recursive extending strategy that explores multiple extending paths to achieve longer, highly accurate contigs. We demonstrate that ContigExtender outperforms existing tools on synthetic, animal, and human metagenomics datasets.

Conclusions: A novel software tool, ContigExtender, has been developed to assist and enhance the performance of metagenome de novo assembly. ContigExtender effectively extends contigs from a variety of sources and can be incorporated into most viral metagenomics analysis pipelines for a wide variety of applications, including pathogen detection and viral discovery.
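The idea of recursively extending a contig along several candidate paths can be illustrated with a toy sketch. The following is not ContigExtender's implementation, which the abstract does not describe in detail; it is a minimal illustration assuming exact suffix-prefix overlaps between the contig and the reads. The function name `extend_right` and the parameters `min_overlap` and `max_depth` are hypothetical.

```python
from typing import List


def extend_right(contig: str, reads: List[str],
                 min_overlap: int = 4, max_depth: int = 50) -> List[str]:
    """Toy recursive extension: return every maximal rightward extension
    of `contig` reachable through exact read overlaps (hypothetical helper)."""
    if max_depth == 0:
        # Depth cap keeps repeats from causing unbounded recursion.
        return [contig]

    # Collect the distinct tails that reads would append to the contig.
    tails = set()
    for read in reads:
        # Try the longest overlap first; stop at the first match.
        # k stops at len(read) - 1 so the appended tail is never empty.
        for k in range(min(len(read) - 1, len(contig)), min_overlap - 1, -1):
            if contig.endswith(read[:k]):
                tails.add(read[k:])
                break

    if not tails:
        # No read overlaps the contig end: this path is finished.
        return [contig]

    # Branch: follow every candidate tail instead of committing to one.
    results: List[str] = []
    for tail in sorted(tails):
        results.extend(extend_right(contig + tail, reads, min_overlap, max_depth - 1))
    return results


if __name__ == "__main__":
    reads = ["GTACGG", "ACGGTT", "GTACCA"]
    paths = extend_right("ACGTAC", reads)
    print(paths)                # all explored extensions
    print(max(paths, key=len))  # the longest one
```

In this example the contig end `GTAC` overlaps two reads, so two paths are explored and the longer one can then be selected. A real tool would also need to handle sequencing errors, leftward extension, and read coverage, none of which this sketch attempts.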

Viruses ◽  
2020 ◽  
Vol 12 (7) ◽  
pp. 758 ◽  
Author(s):  
Keylie M. Gibson ◽  
Margaret C. Steiner ◽  
Uzma Rentia ◽  
Matthew L. Bendall ◽  
Marcos Pérez-Losada ◽  
...  

Next-generation sequencing (NGS) offers a powerful opportunity to identify low-abundance, intra-host viral sequence variants, yet the focus of many bioinformatic tools on consensus sequence construction has precluded a thorough analysis of intra-host diversity. To take full advantage of the resolution of NGS data, we developed HAplotype PHylodynamics PIPEline (HAPHPIPE), an open-source tool for the de novo and reference-based assembly of viral NGS data, with both consensus sequence assembly and a focus on the quantification of intra-host variation through haplotype reconstruction. We validate and compare the consensus sequence assembly methods of HAPHPIPE to those of two alternative software packages, HyDRA and Geneious, using simulated HIV and empirical HIV, HCV, and SARS-CoV-2 datasets. Our validation methods included read mapping, genetic distance, and genetic diversity metrics. In simulated NGS data, HAPHPIPE generated pol consensus sequences significantly closer to the true consensus sequence than those produced by HyDRA and Geneious and performed comparably to Geneious for HIV gp120 sequences. Furthermore, using empirical data from multiple viruses, we demonstrate that HAPHPIPE can analyze larger sequence datasets due to its greater computational speed. Therefore, we contend that HAPHPIPE provides a more user-friendly platform for users with and without bioinformatics experience to implement current best practices for viral NGS assembly than other currently available options.


PLoS ONE ◽  
2020 ◽  
Vol 15 (8) ◽  
pp. e0237455
Author(s):  
Laila Sara Arroyo Mühr ◽  
Camilla Lagheden ◽  
Sadaf Sakina Hassan ◽  
Sara Nordqvist Kleppe ◽  
Emilie Hultin ◽  
...  

BMC Genomics ◽  
2011 ◽  
Vol 12 (1) ◽  
Author(s):  
Matthew G Links ◽  
Eric Holub ◽  
Rays HY Jiang ◽  
Andrew G Sharpe ◽  
Dwayne Hegedus ◽  
...  

PLoS ONE ◽  
2013 ◽  
Vol 8 (4) ◽  
pp. e60449 ◽  
Author(s):  
Ren Wang ◽  
Sheng Xu ◽  
Yumei Jiang ◽  
Jingwei Jiang ◽  
Xiaodan Li ◽  
...  
