New approaches for assembly of short-read metagenomic data

Author(s):  
Martin Ayling ◽  
Matthew D Clark ◽  
Richard M Leggett

In recent years, the use of longer-range read data combined with advances in assembly algorithms has stimulated big improvements in the contiguity and quality of genome assemblies. However, these advances have not directly transferred to metagenomic datasets, as assumptions made by the single genome assembly algorithms do not apply when assembling multiple genomes at varying levels of abundance. The development of dedicated assemblers for metagenomic data was a relatively late innovation and for many years, researchers had to make do using tools designed for single genomes. This has changed in the last few years and we have seen the emergence of a new type of tool built using different principles. In this review, we describe the challenges inherent in metagenomic assemblies and compare the different approaches taken by these novel assembly tools.

2018 ◽  
Author(s):  
Martin Ayling ◽  
Matthew D Clark ◽  
Richard M Leggett

In recent years, the use of longer-range read data combined with advances in assembly algorithms has stimulated big improvements in the contiguity and quality of genome assemblies. However, these advances have not directly transferred to metagenomic datasets, as assumptions made by the single genome assembly algorithms do not apply when assembling multiple genomes at varying levels of abundance. The development of dedicated assemblers for metagenomic data was a relatively late innovation and for many years, researchers had to make do using tools designed for single genomes. This has changed in the last few years and we have seen the emergence of a new type of tool built using different principles. In this review, we describe the challenges inherent in metagenomic assemblies and compare the different approaches taken by these novel assembly tools.


2019 ◽  
Vol 21 (2) ◽  
pp. 584-594 ◽  
Author(s):  
Martin Ayling ◽  
Matthew D Clark ◽  
Richard M Leggett

Abstract In recent years, the use of longer range read data combined with advances in assembly algorithms has stimulated big improvements in the contiguity and quality of genome assemblies. However, these advances have not directly transferred to metagenomic data sets, as assumptions made by the single genome assembly algorithms do not apply when assembling multiple genomes at varying levels of abundance. The development of dedicated assemblers for metagenomic data was a relatively late innovation and for many years, researchers had to make do using tools designed for single genomes. This has changed in the last few years and we have seen the emergence of a new type of tool built using different principles. In this review, we describe the challenges inherent in metagenomic assemblies and compare the different approaches taken by these novel assembly tools.


GigaScience ◽  
2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Kerstin Howe ◽  
William Chow ◽  
Joanna Collins ◽  
Sarah Pelan ◽  
Damon-Lee Pointon ◽  
...  

Abstract Genome sequence assemblies provide the basis for our understanding of biology. Generating error-free assemblies is therefore the ultimate, but sadly still unachieved goal of a multitude of research projects. Despite the ever-advancing improvements in data generation, assembly algorithms and pipelines, no automated approach has so far reliably generated near error-free genome assemblies for eukaryotes. Whilst working towards improved datasets and fully automated pipelines, assembly evaluation and curation is actively used to bridge this shortcoming and significantly reduce the number of assembly errors. In addition to this increase in product value, the insights gained from assembly curation are fed back into the automated assembly strategy and contribute to notable improvements in genome assembly quality. We describe our tried and tested approach for assembly curation using gEVAL, the genome evaluation browser. We outline the procedures applied to genome curation using gEVAL and also our recommendations for assembly curation in a gEVAL-independent context to facilitate the uptake of genome curation in the wider community.


BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Gokhan Yavas ◽  
Huixiao Hong ◽  
Wenming Xiao

Abstract Background Accurate de novo genome assembly has become reality with the advancements in sequencing technology. With the ever-increasing number of de novo genome assembly tools, assessing the quality of assemblies has become of great importance in genome research. Although many quality metrics have been proposed and software tools for calculating those metrics have been developed, the existing tools do not produce a unified measure to reflect the overall quality of an assembly. Results To address this issue, we developed the de novo Assembly Quality Evaluation Tool (dnAQET) that generates a unified metric for benchmarking the quality assessment of assemblies. Our framework first calculates individual quality scores for the scaffolds/contigs of an assembly by aligning them to a reference genome. Next, it computes a quality score for the assembly using its overall reference genome coverage, the quality score distribution of its scaffolds and the redundancy identified in it. Using synthetic assemblies randomly generated from the latest human genome build, various builds of the reference genomes for five organisms and six de novo assemblies for sample NA24385, we tested dnAQET to assess its capability for benchmarking quality evaluation of genome assemblies. For synthetic data, our quality score increased with decreasing number of misassemblies and redundancy and increasing average contig length and coverage, as expected. For genome builds, dnAQET quality score calculated for a more recent reference genome was better than the score for an older version. To compare with some of the most frequently used measures, 13 other quality measures were calculated. The quality score from dnAQET was found to be better than all other measures in terms of consistency with the known quality of the reference genomes, indicating that dnAQET is reliable for benchmarking quality assessment of de novo genome assemblies. 
Conclusions The dnAQET is a scalable framework designed to evaluate a de novo genome assembly based on the aggregated quality of its scaffolds (or contigs). Our results demonstrated that the dnAQET quality score is reliable for benchmarking quality assessment of genome assemblies. dnAQET can help researchers to identify the most suitable assembly tools and to select the high-quality assemblies generated.


2020 ◽  
Vol 8 (6) ◽  
pp. 4253-4259

A number of assembly algorithms have emerged, but owing to the constraints of genome sequencing techniques, none is perfect. Various methods for comparing assemblers have been developed, but none is yet a recognized standard. The problem of evaluating assemblies of previously unsequenced species has not been addressed, because most existing methods for comparing assemblies are applicable only to new assemblies of finished genomes. To compare and evaluate genome assemblies, we used QUAST (Quality Assessment Tool), which assesses the quality of leading assembly software by evaluating quality metrics. QUAST can evaluate assemblies both with and without a reference genome. It is a modern tool for genome assembly evaluation based on alignment of contigs to a reference. In this study, we demonstrate QUAST's performance by comparing several leading genome assemblers on three metagenomic datasets.
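Contiguity statistics such as N50 are among the quality metrics that QUAST reports. As an illustration only, the sketch below computes N50 from a list of contig lengths. The function name `n50` and the example lengths are hypothetical, and this is not QUAST's own implementation.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly: the length L such that contigs of
    length >= L together cover at least half of the total assembly size."""
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Stop at the first contig that pushes the running sum to half the total.
        if running * 2 >= total:
            return length


# Hypothetical contig lengths (in bases), not taken from any dataset in the study.
example_contigs = [100, 80, 60, 40, 20]
print(n50(example_contigs))  # 100 + 80 = 180 >= 150, so N50 is 80
```

A higher N50 indicates a more contiguous assembly, but it says nothing about correctness, which is why reference-based metrics are evaluated alongside it.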


2015 ◽  
Author(s):  
Alejandro Hernandez Wences ◽  
Michael Schatz

Genome assembly projects typically run multiple algorithms in an attempt to find the single best assembly, although those assemblies often have complementary, if untapped, strengths and weaknesses. We present our metassembler algorithm that merges multiple assemblies of a genome into a single superior sequence. We apply it to the four genomes from the Assemblathon competitions and show it consistently and substantially improves the contiguity and quality of each assembly. We also develop guidelines for metassembly by systematically evaluating 120 permutations of merging the top 5 assemblies of the first Assemblathon competition. The software is open-source at http://metassembler.sourceforge.net.


2021 ◽  
Author(s):  
Anurag Priyam ◽  
Alicja Witwicka ◽  
Anindita Brahma ◽  
Eckart Stolle ◽  
Yannick Wurm

Long-molecule sequencing is now routinely applied to generate high-quality reference genome assemblies. However, datasets differ in repeat composition, heterozygosity, read lengths and error profiles. The assembly parameters that provide the best results could thus differ across datasets. By integrating four complementary and biologically meaningful metrics, we show that simple fine-tuning of assembly parameters can substantially improve the quality of long-read genome assemblies. In particular, modifying estimates of sequencing error rates improves some metrics more than two-fold. We provide a flexible software, CompareGenomeQualities, that automates comparisons of assembly qualities for researchers wanting a straightforward mechanism for choosing among multiple assemblies.


2016 ◽  
Author(s):  
Charles H.D. Williamson ◽  
Andrew Sanchez ◽  
Adam Vazquez ◽  
Joshua Gutman ◽  
Jason W. Sahl

Abstract High-throughput comparative genomics has changed our view of bacterial evolution and relatedness. Many genomic comparisons, especially those regarding the accessory genome that is variably conserved across strains in a species, are performed using assembled genomes. For completed genomes, an assumption is made that the entire genome was incorporated into the genome assembly, while for draft assemblies, often constructed from short sequence reads, an assumption is made that the genome assembly is an approximation of the entire genome. To understand the potential effects of short read assemblies on the estimation of the complete genome, we downloaded all completed bacterial genomes from GenBank, simulated short reads, assembled the simulated short reads and compared the resulting assembly to the completed assembly. Although most simulated assemblies demonstrated little reduction, others were reduced by as much as 25%, which was correlated with the repeat structure of the genome. A comparative analysis of lost coding region sequences demonstrated that up to 48 CDSs, or up to ~112,000 bases of coding region sequence, were missing from some draft assemblies compared to their finished counterparts. Although this effect was observed to some extent in 32% of genomes, only minimal effects were observed on pan-genome statistics when using simulated draft genome assemblies. The benefits and limitations of using draft genome assemblies should be fully realized before interpreting data from assembly-based comparative analyses.
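The size comparison between a draft and a completed assembly can be illustrated with a minimal sketch. It assumes that "reduction" means the fraction of the completed genome's length that is absent from the draft assembly's total contig length, which the abstract does not state explicitly. The function name `assembly_reduction` and the example lengths are hypothetical.

```python
def assembly_reduction(complete_length, draft_contig_lengths):
    """Return the fractional size reduction of a draft assembly relative to
    the completed genome (assumed definition: 1 - draft_total / complete)."""
    if complete_length <= 0:
        raise ValueError("complete_length must be positive")
    draft_total = sum(draft_contig_lengths)
    return 1.0 - draft_total / complete_length


# Hypothetical lengths (in bases), not taken from the study.
reduction = assembly_reduction(4_000_000, [1_500_000, 1_000_000, 500_000])
print(f"{reduction:.1%}")  # 3,000,000 of 4,000,000 bases recovered: 25.0%
```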


Author(s):  
Kerstin Howe ◽  
William Chow ◽  
Joanna Collins ◽  
Sarah Pelan ◽  
Damon-Lee Pointon ◽  
...  

Abstract Background Genome sequence assemblies provide the basis for our understanding of biology. Generating error-free assemblies is therefore the ultimate, but sadly still unachieved goal of a multitude of research projects. Despite the ever-advancing improvements in data generation, assembly algorithms and pipelines, no automated approach has so far reliably generated near error-free genome assemblies for eukaryotes. Results Whilst working towards improved data sets and fully automated pipelines, assembly evaluation and curation is actively employed to bridge this shortcoming and significantly reduce the number of assembly errors. In addition to this increase in product value, the insights gained from assembly curation are fed back into the automated assembly strategy and contribute to notable improvements in genome assembly quality. Conclusions We describe our tried and tested approach for assembly curation using gEVAL, the genome evaluation browser. We outline the procedures applied to genome curation using gEVAL and also our recommendations for assembly curation in a gEVAL-independent context to facilitate the uptake of genome curation in the wider community.


2021 ◽  
pp. bmjebm-2021-111670
Author(s):  
Clara Locher ◽  
David Moher ◽  
Ioana Alina Cristea ◽  
Florian Naudet

During the COVID-19 pandemic, the rush to scientific and political judgements on the merits of hydroxychloroquine was fuelled by dubious papers which may have been published because the authors were not independent from the practices of the journals in which they appeared. This example leads us to consider a new type of illegitimate publishing entity, ‘self-promotion journals’ which could be deployed to serve the instrumentalisation of productivity-based metrics, with a ripple effect on decisions about promotion, tenure and grant funding, but also on the quality of manuscripts that are disseminated to the medical community and form the foundation of evidence-based medicine.

