Correction: Evaluation of Methods for De Novo Genome Assembly from High-Throughput Sequencing Reads Reveals Dependencies That Affect the Quality of the Results

Leishmania major is the main causative agent of cutaneous leishmaniasis in humans. The Friedlin strain of this species (LmjF) was chosen when a multi-laboratory consortium undertook the objective of deciphering the first genome sequence for a parasite of the genus Leishmania. The objective was successfully attained in 2005, and this represented a milestone for Leishmania molecular biology studies around the world. Although the LmjF genome sequence was done following a shotgun strategy and using classical Sanger sequencing, the results were excellent, and this genome assembly served as the reference for subsequent genome assemblies in other Leishmania species. Here, we present a new assembly for the genome of this strain (named LMJFC for clarity), generated by the combination of two high throughput sequencing platforms, Illumina short-read sequencing and PacBio Single Molecular Real-Time (SMRT) sequencing, which provides long-read sequences. Apart from resolving uncertain nucleotide positions, several genomic regions were reorganized and a more precise composition of tandemly repeated gene loci was attained. Additionally, the genome annotation was improved by adding 542 genes and more accurate coding-sequences defined for around two hundred genes, based on the transcriptome delimitation also carried out in this work. As a result, we are providing gene models (including untranslated regions and introns) for 11,238 genes. Genomic information ultimately determines the biology of every organism; therefore, our understanding of molecular mechanisms will depend on the availability of precise genome sequences and accurate gene annotations. In this regard, this work is providing an improved genome sequence and updated transcriptome annotations for the reference L. major Friedlin strain.

Download Full-text

dnAQET: a framework to compute a consolidated metric for benchmarking quality of de novo assemblies

BMC Genomics ◽

10.1186/s12864-019-6070-x ◽

2019 ◽

Vol 20 (1) ◽

Author(s):

Gokhan Yavas ◽

Huixiao Hong ◽

Wenming Xiao

Keyword(s):

Quality Assessment ◽

Genome Assembly ◽

Reference Genome ◽

De Novo ◽

Quality Score ◽

De Novo Genome Assembly ◽

Genome Assemblies ◽

Reference Genomes ◽

Better Than

Abstract Background Accurate de novo genome assembly has become reality with the advancements in sequencing technology. With the ever-increasing number of de novo genome assembly tools, assessing the quality of assemblies has become of great importance in genome research. Although many quality metrics have been proposed and software tools for calculating those metrics have been developed, the existing tools do not produce a unified measure to reflect the overall quality of an assembly. Results To address this issue, we developed the de novo Assembly Quality Evaluation Tool (dnAQET) that generates a unified metric for benchmarking the quality assessment of assemblies. Our framework first calculates individual quality scores for the scaffolds/contigs of an assembly by aligning them to a reference genome. Next, it computes a quality score for the assembly using its overall reference genome coverage, the quality score distribution of its scaffolds and the redundancy identified in it. Using synthetic assemblies randomly generated from the latest human genome build, various builds of the reference genomes for five organisms and six de novo assemblies for sample NA24385, we tested dnAQET to assess its capability for benchmarking quality evaluation of genome assemblies. For synthetic data, our quality score increased with decreasing number of misassemblies and redundancy and increasing average contig length and coverage, as expected. For genome builds, dnAQET quality score calculated for a more recent reference genome was better than the score for an older version. To compare with some of the most frequently used measures, 13 other quality measures were calculated. The quality score from dnAQET was found to be better than all other measures in terms of consistency with the known quality of the reference genomes, indicating that dnAQET is reliable for benchmarking quality assessment of de novo genome assemblies. Conclusions The dnAQET is a scalable framework designed to evaluate a de novo genome assembly based on the aggregated quality of its scaffolds (or contigs). Our results demonstrated that dnAQET quality score is reliable for benchmarking quality assessment of genome assemblies. The dnQAET can help researchers to identify the most suitable assembly tools and to select high quality assemblies generated.

Download Full-text

Improved hybrid de novo genome assembly of domesticated apple (Malus x domestica)

GigaScience ◽

10.1186/s13742-016-0139-0 ◽

2016 ◽

Vol 5 (1) ◽

Cited By ~ 28

Author(s):

Xuewei Li ◽

Ling Kui ◽

Jing Zhang ◽

Yinpeng Xie ◽

Liping Wang ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

Malus X Domestica ◽

De Novo Genome Assembly

Download Full-text

Meraculous: De Novo Genome Assembly with Short Paired-End Reads

PLoS ONE ◽

10.1371/journal.pone.0023501 ◽

2011 ◽

Vol 6 (8) ◽

pp. e23501 ◽

Cited By ~ 107

Author(s):

Jarrod A. Chapman ◽

Isaac Ho ◽

Sirisha Sunkara ◽

Shujun Luo ◽

Gary P. Schroth ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

De Novo Genome Assembly

Download Full-text

Molecular Characterization of Potato Virus Y (PVY) Using High-Throughput Sequencing: Constraints on Full Genome Reconstructions Imposed by Mixed Infection Involving Recombinant PVY Strains

Plants ◽

10.3390/plants10040753 ◽

2021 ◽

Vol 10 (4) ◽

pp. 753

Author(s):

Miroslav Glasa ◽

Richard Hančinský ◽

Katarína Šoltys ◽

Lukáš Predajňa ◽

Jana Tomašechová ◽

...

Keyword(s):

High Throughput ◽

Potato Virus ◽

Mixed Infection ◽

High Throughput Sequencing ◽

Potato Virus Y ◽

De Novo ◽

Sweet Pepper ◽

Complete Sequence ◽

Genome Region ◽

Mapping Parameters

In recent years, high throughput sequencing (HTS) has brought new possibilities to the study of the diversity and complexity of plant viromes. Mixed infection of a single plant with several viruses is frequently observed in such studies. We analyzed the virome of 10 tomato and sweet pepper samples from Slovakia, all showing the presence of potato virus Y (PVY) infection. Most datasets allow the determination of the nearly complete sequence of a single-variant PVY genome, belonging to one of the PVY recombinant strains (N-Wi, NTNa, or NTNb). However, in three to-mato samples (T1, T40, and T62) the presence of N-type and O-type sequences spanning the same genome region was documented, indicative of mixed infections involving different PVY strains variants, hampering the automated assembly of PVY genomes present in the sample. The N- and O-type in silico data were further confirmed by specific RT-PCR assays targeting UTR-P1 and NIa genomic parts. Although full genomes could not be de novo assembled directly in this situation, their deep coverage by relatively long paired reads allowed their manual re-assembly using very stringent mapping parameters. These results highlight the complexity of PVY infection of some host plants and the challenges that can be met when trying to precisely identify the PVY isolates involved in mixed infection.

Download Full-text

Ultra Efficient Acceleration for De Novo Genome Assembly via Near-Memory Computing

10.1109/pact52795.2021.00022 ◽

2021 ◽

Author(s):

Minxuan Zhou ◽

Lingxi Wu ◽

Muzhou Li ◽

Niema Moshiri ◽

Kevin Skadron ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

De Novo Genome Assembly

Download Full-text

Great differences in performance and outcome of high-throughput sequencing data analysis platforms for fungal metabarcoding

MycoKeys ◽

10.3897/mycokeys.39.28109 ◽

2018 ◽

Vol 39 ◽

pp. 29-40 ◽

Cited By ~ 21

Author(s):

Sten Anslan ◽

R. Henrik Nilsson ◽

Christian Wurzbacher ◽

Petr Baldrian ◽

Leho Tedersoo ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Computation Time ◽

Potential Effect ◽

Data Sets ◽

Sequencing Data ◽

Operational Taxonomic Units ◽

High Throughput Sequencing Data ◽

Recent Developments

Along with recent developments in high-throughput sequencing (HTS) technologies and thus fast accumulation of HTS data, there has been a growing need and interest for developing tools for HTS data processing and communication. In particular, a number of bioinformatics tools have been designed for analysing metabarcoding data, each with specific features, assumptions and outputs. To evaluate the potential effect of the application of different bioinformatics workflow on the results, we compared the performance of different analysis platforms on two contrasting high-throughput sequencing data sets. Our analysis revealed that the computation time, quality of error filtering and hence output of specific bioinformatics process largely depends on the platform used. Our results show that none of the bioinformatics workflows appears to perfectly filter out the accumulated errors and generate Operational Taxonomic Units, although PipeCraft, LotuS and PIPITS perform better than QIIME2 and Galaxy for the tested fungal amplicon dataset. We conclude that the output of each platform requires manual validation of the OTUs by examining the taxonomy assignment values.

Download Full-text

De novo Genome Assembly from Next-Generation Sequencing (NGS) Reads

Next-Generation Sequencing Data Analysis ◽

10.1201/b19532-11 ◽

2016 ◽

pp. 144-155

Keyword(s):

Next Generation Sequencing ◽

Genome Assembly ◽

De Novo ◽

Next Generation ◽

De Novo Genome Assembly ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Download Full-text