assembly software Latest Research Papers

Software choice and depth of sequence coverage can impact plastid genome assembly - A case study in the narrow endemic Calligonum bakuense

10.1101/2021.10.06.463392 ◽

2021 ◽

Author(s):

Eka Giorgashvili ◽

Katja Reichel ◽

Calvinna Caswara ◽

Vuqar Kerimov ◽

Thomas Borsch ◽

...

Keyword(s):

Genome Assembly ◽

Plastid Genome ◽

Computation Time ◽

Software Tools ◽

Whole Genome Sequencing Data ◽

Phylogenetic Position ◽

Sequence Variability ◽

Sequence Coverage ◽

The Impact ◽

Assembly Software

Most plastid genome sequences are assembled from short-read whole-genome sequencing data, yet the impact that sequence coverage and the choice of assembly software can have on the accuracy of the resulting assemblies is poorly understood. In this study, we test the impact of both factors on plastid genome assembly in the threatened and rare endemic shrub Calligonum bakuense, which forms a distinct lineage in the genus Calligonum. We aim to characterize the differences across plastid genome assemblies generated by different assembly software tools and levels of sequence coverage and to determine if these differences are large enough to affect the phylogenetic position inferred for C. bakuense. Four assembly software tools (FastPlast, GetOrganelle, IOGA, and NOVOPlasty) and three levels of sequence coverage (original depth, 2,000x, and 500x) are compared in our analyses. The resulting assemblies are evaluated with regard to reproducibility, contig number, gene complement, inverted repeat length, and computation time; the impact of sequence differences on phylogenetic tree inference is also assessed. Our results show that software choice can have a considerable impact on the accuracy and reproducibility of plastid genome assembly and that GetOrganelle produced the most consistent assemblies for C. bakuense. Moreover, we found that a cap in sequence coverage can reduce both the sequence variability across assembly contigs and computation time. While no evidence was found that the sequence variability across assemblies was large enough to affect the phylogenetic position inferred for C. bakuense, differences among the assemblies may influence genotype recognition at the population level.
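
The coverage cap mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how the reads were downsampled, so the random per-read sampling below is an assumption, and the function names `sampling_fraction` and `subsample_reads` and the example depth of 8,000x are hypothetical.

```python
import random


def sampling_fraction(current_depth, target_depth):
    """Fraction of reads to keep so mean depth drops to about target_depth."""
    if current_depth <= 0 or target_depth <= 0:
        raise ValueError("depths must be positive")
    # Never upsample: data already below the cap is kept in full.
    return min(1.0, target_depth / current_depth)


def subsample_reads(reads, fraction, seed=0):
    """Keep each read independently with probability `fraction` (assumed scheme)."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    return [read for read in reads if rng.random() < fraction]


# Hypothetical example: capping an 8,000x data set at 2,000x keeps a quarter of the reads.
fraction = sampling_fraction(8000, 2000)
```

For paired-end data the two mates of a pair would need to be kept or dropped together, which this sketch does not handle.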

The Evidential Statistics of Genetic Assembly: Bootstrapping a Reference Sequence

Frontiers in Ecology and Evolution ◽

10.3389/fevo.2021.614374 ◽

2021 ◽

Vol 9 ◽

Author(s):

Yukihiko Toquenaga ◽

Takuya Gagné

Keyword(s):

Edit Distance ◽

Reference Genome ◽

Reference Sequence ◽

Specific Method ◽

Data Sets ◽

Type Specimens ◽

Circular Genome ◽

Base Sequences ◽

Reference Sequences ◽

Assembly Software

Reference sequences play an essential role in genome assembly, like type specimens in taxonomy. Those references are also samples obtained at some time and location with a specific method. How can we evaluate or discriminate uncertainties of the reference itself and assembly methods? Here we bootstrapped 50 random read data sets from a small circular genome of an Escherichia coli bacteriophage, phiX174, and tried to reconstruct the reference with 14 free assembly programs. Nine out of 14 assembly programs were capable of circular genome reconstruction. Unicycler correctly reconstructed the reference for 44 out of 50 data sets, but each reconstructed contig of the failed six data sets had minor defects. The other assembly software could reconstruct the reference with minor defects. The defect regions differed among the assembly programs, and the defect locations were far from randomly distributed in the reference genome. All contigs of Trinity included one, but Minia had two perfect copies other than an imperfect reference copy. The centroid of contigs for assembly programs except Unicycler differed from the reference by 75 bases at most. Nonmetric multidimensional scaling (NMDS) plots of the centroids indicated that even the reference sequence was located slightly off from the estimated location of the true reference. We propose that the combination of bootstrapping a reference, making consensus contigs as centroids in an edit distance, and NMDS plotting will provide an evidential statistical approach to genetic assembly for non-fragmented base sequences.
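
The idea of a centroid under an edit distance can be sketched as follows. This is an illustration only, not the authors' procedure: it uses plain Levenshtein distance and picks a medoid (the contig with the smallest total distance to the others) as a stand-in for a centroid, and it ignores the rotation of circular genomes. The function names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion from a
                            curr[j - 1] + 1,            # insertion into a
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]


def medoid(contigs):
    """Contig with the smallest summed edit distance to all contigs."""
    if not contigs:
        raise ValueError("need at least one contig")
    return min(contigs, key=lambda c: sum(levenshtein(c, other) for other in contigs))


# Toy sequences, not real contigs.
representative = medoid(["ACGT", "ACGA", "ACGT", "TTTT"])
```

The quadratic-time distance used here is fine for a genome as small as phiX174 but would be slow for long sequences.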

Pincho: A Modular Approach to High Quality De Novo Transcriptomics

Genes ◽

10.3390/genes12070953 ◽

2021 ◽

Vol 12 (7) ◽

pp. 953

Author(s):

Randy Ortiz ◽

Priyanka Gera ◽

Christopher Rivera ◽

Juan C. Santos

Keyword(s):

Ad Hoc ◽

De Novo ◽

Transcriptome Assembly ◽

Software Tool ◽

Model Systems ◽

Short Read ◽

Bioinformatic Tools ◽

Modular Units ◽

De Novo Transcriptomics ◽

Assembly Software

Transcriptomic reconstructions without reference (i.e., de novo) are common for data samples derived from non-model biological systems. These assemblies involve massive parallel short read sequence reconstructions from experiments, but they usually employ ad-hoc bioinformatic workflows that exhibit limited standardization and customization. The growing number of transcriptome assembly programs leaves little room for standardization, a problem exacerbated by the lack of studies on modularity that compare the effects of assembler synergy. We developed a customizable management workflow for de novo transcriptomics that includes modular units for short read cleaning, assembly, validation, annotation, and expression analysis by connecting twenty-five individual bioinformatic tools. With our software tool, we were able to compare the assessment scores based on 129 distinct single-, bi- and tri-assembler combinations with diverse k-mer size selections. Our results demonstrate a drastic increase in the quality of transcriptome assemblies with bi- and tri-assembler combinations. We aim for our software to improve de novo transcriptome reconstructions for the ever-growing landscape of RNA-seq data derived from non-model systems. We offer guidance to ensure the most complete transcriptomic reconstructions via the inclusion of modular multi-assembly software controlled from a single master console.

GSER (a Genome Size Estimator using R): a pipeline for quality assessment of sequenced genome libraries through genome size estimation

Interface Focus ◽

10.1098/rsfs.2020.0077 ◽

2021 ◽

Vol 11 (4) ◽

pp. 20200077 ◽

Cited By ~ 1

Author(s):

Braulio Valdebenito-Maturana ◽

Gonzalo Riadi

Keyword(s):

Quality Control ◽

Quality Assessment ◽

Genome Size ◽

De Novo ◽

Second Step ◽

Size Estimation ◽

Genome Research ◽

A Genome ◽

The Relationship ◽

Assembly Software

The first step in any genome research after obtaining the read data is to perform a due quality control of the sequenced reads. In a de novo genome assembly project, the second step is to estimate two important features, the genome size and ‘best k-mer’, to start the assembly tests with different de novo assembly software and its parameters. However, the quality control of the sequenced genome libraries as a whole, instead of focusing on the reads only, is frequently overlooked and realized to be important only when the assembly tests did not render the expected results. We have developed GSER, a Genome Size Estimator using R, a pipeline to evaluate the relationship between k-mers and genome size, as a means for quality assessment of the sequenced genome libraries. GSER generates a set of charts that allow the analyst to evaluate the library datasets before starting the assembly. The script which runs the pipeline can be downloaded from http://www.mobilomics.org/GSER/downloads or http://github.com/mobilomics/GSER .
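
The relationship between k-mers and genome size can be illustrated with a widely used heuristic: divide the total number of counted k-mers by the depth at the main peak of the k-mer histogram. The abstract does not describe GSER's internals, and GSER itself is written in R, so the Python sketch below is an assumption-laden illustration rather than the tool's method. The function name, the `min_depth` cutoff and the toy histogram are hypothetical.

```python
def estimate_genome_size(histogram, min_depth=2):
    """Estimate genome size from a k-mer depth histogram.

    histogram maps depth -> number of distinct k-mers seen at that depth.
    K-mers below min_depth are treated as sequencing errors and ignored
    (an assumed cutoff, not a universal rule).
    """
    solid = {d: n for d, n in histogram.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(solid, key=solid.get)            # depth with the most distinct k-mers
    total_kmers = sum(d * n for d, n in solid.items())  # all k-mer occurrences
    return total_kmers / peak_depth


# Toy histogram: an error spike at depth 1 and a main peak at depth 10.
size = estimate_genome_size({1: 1000, 9: 50, 10: 100, 11: 50})
```

A histogram with no clear peak, or with several peaks, is the kind of library-level problem a chart-based check is meant to expose before assembly starts.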

Engineered yeast genomes accurately assembled from pure and mixed samples

Nature Communications ◽

10.1038/s41467-021-21656-9 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Joseph H. Collins ◽

Kevin W. Keating ◽

Trent R. Jones ◽

Shravani Balaji ◽

Celeste B. Marsan ◽

...

Keyword(s):

Genetic Engineering ◽

Metagenomic Sequencing ◽

Long Reads ◽

Whole Genomes ◽

Genome Features ◽

Engineered Yeast ◽

Yeast Genomes ◽

Yeast Genetic ◽

Yeast Plasmids ◽

Assembly Software

Abstract Yeast whole genome sequencing (WGS) lacks end-to-end workflows that identify genetic engineering. Here we present Prymetime, a tool that assembles yeast plasmids and chromosomes and annotates genetic engineering sequences. It is a hybrid workflow: it uses short and long reads as inputs to perform separate linear and circular assembly steps. This structure is necessary to accurately resolve genetic engineering sequences in plasmids and the genome. We show this by assembling diverse engineered yeasts, in some cases revealing unintended deletions and integrations. Furthermore, the resulting whole genomes are high quality, although the underlying assembly software does not consistently resolve highly repetitive genome features. Finally, we assemble plasmids and genome integrations from metagenomic sequencing, even with 1 engineered cell in 1000. This work is a blueprint for building WGS workflows and establishes WGS-based identification of yeast genetic engineering.

Assessment the Quality of Genome Assemblies by using QUAST Tool for Metagenomics

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.e6435.038620 ◽

2020 ◽

Vol 8 (6) ◽

pp. 4253-4259

Keyword(s):

Genome Sequencing ◽

Reference Genome ◽

Assessment Tool ◽

Quality Assessment Tool ◽

Assembly Evaluation ◽

Assembly Algorithms ◽

Genome Assemblies ◽

Modern Tool ◽

Assembly Software

A number of assembly algorithms have emerged, but owing to the constraints of genome sequencing techniques none is perfect. Various methods for comparing assemblers have been developed, but none is yet a recognized standard. The problem of evaluating assemblies of previously unsequenced species has not been considered, because most existing methods for comparing assemblies are applicable only to new assemblies of finished genomes. For comparing and evaluating genome assemblies we used QUAST (Quality Assessment Tool). This tool assesses the quality of leading assembly software by evaluating quality metrics. QUAST can evaluate assemblies both with and without a reference genome. It is a modern tool for genome assembly evaluation based on alignment of contigs to a reference. In this study we demonstrate QUAST performance by comparing several leading genome assemblers on three metagenomic datasets.

Choice of assembly software has a critical impact on virome characterisation

Microbiome ◽

10.1186/s40168-019-0626-5 ◽

2019 ◽

Vol 7 (1) ◽

Cited By ~ 24

Author(s):

Thomas D. S. Sutton ◽

Adam G. Clooney ◽

Feargal J. Ryan ◽

R. Paul Ross ◽

Colin Hill

Keyword(s):

Assembly Software

SGTK: a toolkit for visualization and assessment of scaffold graphs

Bioinformatics ◽

10.1093/bioinformatics/bty956 ◽

2018 ◽

Vol 35 (13) ◽

pp. 2303-2305 ◽

Cited By ~ 2

Author(s):

Olga Kunyavskaya ◽

Andrey D Prjibelski

Keyword(s):

Software Package ◽

Supplementary Information ◽

Sequencing Data ◽

Software Developers ◽

Long Reads ◽

Mate Pair ◽

Linkage Information ◽

Assembly Pipeline ◽

Genome Assemblies ◽

Assembly Software

Abstract Summary: Scaffolding is an important step in every genome assembly pipeline; it orders contigs into longer sequences using various types of linkage information, such as mate-pair libraries and long reads. In this work, we operate with a notion of a scaffold graph: a graph whose vertices correspond to the assembled contigs and whose edges represent connections between them. We present a software package called Scaffold Graph ToolKit that constructs and visualizes scaffold graphs using different kinds of sequencing data. We show that the scaffold graph appears to be useful for analyzing and assessing genome assemblies, and demonstrate several use cases that can be helpful for both assembly software developers and their users. Availability and implementation: SGTK is implemented in C++, Python and JavaScript and is freely available at https://github.com/olga24912/SGTK. Supplementary information: Supplementary data are available at Bioinformatics online.

An Improved Genome Assembly for Drosophila navojoa, the Basal Species in the mojavensis Cluster

Journal of Heredity ◽

10.1093/jhered/esy059 ◽

2018 ◽

Vol 110 (1) ◽

pp. 118-123 ◽

Cited By ~ 1

Author(s):

Thyago Vanderlinde ◽

Eduardo Guimarães Dupim ◽

Nestor O Nazario-Yepiz ◽

Antonio Bernardo Carvalho

Keyword(s):

Genome Assembly ◽

Chromosomal Rearrangements ◽

Substantial Improvement ◽

Insert Size ◽

Genetic Studies ◽

Cactophilic Drosophila ◽

Host Shifts ◽

Evolutionary Genetic ◽

Relationship Of ◽

Assembly Software

Abstract Three North American cactophilic Drosophila species, D. mojavensis, D. arizonae, and D. navojoa, are of considerable evolutionary interest owing to the shift from breeding in Opuntia cacti to columnar species. The 3 species form the “mojavensis cluster” of Drosophila. The genome of D. mojavensis was sequenced in 2007 and the genomes of D. navojoa and D. arizonae were sequenced together in 2016 using the same technology (Illumina) and assembly software (AllPaths-LG). Yet, unfortunately, the D. navojoa genome was considerably more fragmented and incomplete than that of its sister species, rendering it less useful for evolutionary genetic studies. The D. navojoa read dataset does not fully meet the strict insert size required by the assembler used (AllPaths-LG) and this incompatibility might explain its assembly problems. Accordingly, when we re-assembled the genome of D. navojoa with the SPAdes assembler, which does not have the strict AllPaths-LG requirements, we obtained a substantial improvement in all quality indicators such as N50 (from 84 kb to 389 kb) and BUSCO coverage (from 77% to 97%). Here we share a new, improved reference assembly for the D. navojoa genome, along with an RNAseq transcriptome. Given the basal relationship of the Opuntia breeding D. navojoa to the columnar breeding D. arizonae and D. mojavensis, the improved assembly and annotation will allow researchers to address a range of questions associated with the genomics of host shifts, chromosomal rearrangements and speciation in this group.
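
N50, one of the quality indicators cited in this abstract, is the contig length at which the contigs of that length or longer contain at least half of the assembled bases. A minimal sketch of the standard definition follows; the function name and the toy lengths are illustrative, not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            # This contig takes the cumulative sum past half of the assembly.
            return length


# Toy contig lengths: total is 20, and 6 + 5 = 11 covers half of it.
value = n50([2, 3, 4, 5, 6])
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is why the abstract pairs it with a completeness measure such as BUSCO coverage.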

Comparison of bacterial genome assembly software for MinION data and their applicability to medical microbiology

Microbial Genomics ◽

10.1099/mgen.0.000085 ◽

2016 ◽

Vol 2 (9) ◽

Cited By ~ 18

Author(s):

Kim Judge ◽

Martin Hunt ◽

Sandra Reuter ◽

Alan Tracey ◽

Michael A. Quail ◽

...

Keyword(s):

Genome Assembly ◽

Bacterial Genome ◽

Medical Microbiology ◽

Assembly Software
