Trycycler: consensus long-read assemblies for bacterial genomes

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ryan R. Wick ◽  
Louise M. Judd ◽  
Louise T. Cerdeira ◽  
Jane Hawkey ◽  
Guillaume Méric ◽  
...  

Abstract While long-read sequencing allows for the complete assembly of bacterial genomes, long-read assemblies contain a variety of errors. Here, we present Trycycler, a tool which produces a consensus assembly from multiple input assemblies of the same genome. Benchmarking showed that Trycycler assemblies contained fewer errors than assemblies constructed with a single tool. Post-assembly polishing further reduced errors and Trycycler+polishing assemblies were the most accurate genomes in our study. As Trycycler requires manual intervention, its output is not deterministic. However, we demonstrated that multiple users converge on similar assemblies that are consistently more accurate than those produced by automated assembly tools.

2021 ◽  
Author(s):  
Ryan R Wick ◽  
Louise M Judd ◽  
Louise T Cerdeira ◽  
Jane Hawkey ◽  
Guillaume Meric ◽  
...  

Assembly of bacterial genomes from long-read data (generated by Oxford Nanopore or Pacific Biosciences platforms) can often be complete: a single contig for each chromosome or plasmid in the genome. However, even complete bacterial genome assemblies constructed solely from long reads still contain a variety of errors, and different assemblies of the same genome often contain different errors. Here, we present Trycycler, a tool which produces a consensus assembly from multiple input assemblies of the same genome. Benchmarking using both simulated and real sequencing reads showed that Trycycler consensus assemblies contained fewer errors than any of those constructed with a single long-read assembler. Post-assembly polishing with Medaka and Pilon further reduced errors and yielded the most accurate genome assemblies in our study. As Trycycler can require human judgement and manual intervention, its output is not deterministic, and different users can produce different Trycycler assemblies from the same input data. However, we demonstrated that multiple users with minimal training converge on similar assemblies that are consistently more accurate than those produced by automated assembly tools. We therefore recommend Trycycler+Medaka+Pilon as an ideal approach for generating high-quality bacterial reference genomes.
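
As a toy illustration of why a consensus across several assemblies can cancel out errors that are specific to one assembler, the sketch below takes a per-column majority vote. It is not Trycycler's method, which does considerably more than this. It assumes the input sequences have already been aligned to equal length with `-` marking gaps, and the function name `majority_consensus` and the example sequences are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned):
    """Return a per-column majority-vote consensus of pre-aligned sequences.

    Assumption: every sequence has the same length and uses '-' for gaps.
    Ties are resolved in favour of the earliest-listed sequence, because
    Counter.most_common keeps first-seen order among equal counts.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")

    consensus = []
    for column in zip(*aligned):
        base, _count = Counter(column).most_common(1)[0]
        if base != "-":  # a majority gap means the column is dropped
            consensus.append(base)
    return "".join(consensus)


# Hypothetical example: the third assembly carries a substitution error,
# which the other two outvote.
substitution_case = ["ACGTAC", "ACGTAC", "ACCTAC"]
# Hypothetical example: the third assembly carries a one-base insertion.
insertion_case = ["ACG-TA", "ACG-TA", "ACGGTA"]

print(majority_consensus(substitution_case))
print(majority_consensus(insertion_case))
```

A vote like this only helps when the assemblies err at different positions; an error shared by most inputs would survive it.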


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Valentine Murigneux ◽  
Leah W. Roberts ◽  
Brian M. Forde ◽  
Minh-Duy Phan ◽  
Nguyen Thi Khanh Nhu ◽  
...  

Abstract Background Oxford Nanopore Technology (ONT) long-read sequencing has become a popular platform for microbial researchers due to the accessibility and affordability of its devices. However, easy and automated construction of high-quality bacterial genomes using nanopore reads remains challenging. Here we aimed to create a reproducible end-to-end bacterial genome assembly pipeline using ONT in combination with Illumina sequencing. Results We evaluated the performance of several popular tools used during genome reconstruction, including base-calling, filtering, assembly, and polishing. We also assessed overall genome accuracy using ONT both natively and with Illumina. All steps were validated using the high-quality complete reference genome for the Escherichia coli sequence type (ST)131 strain EC958. Software chosen at each stage were incorporated into our final pipeline, MicroPIPE. Further validation of MicroPIPE was carried out using 11 additional ST131 E. coli isolates, which demonstrated that complete circularised chromosomes and plasmids could be achieved without manual intervention. Twelve publicly available Gram-negative and Gram-positive bacterial genomes (with available raw ONT data and matched complete genomes) were also assembled using MicroPIPE. We found that revised basecalling and updated assembly of the majority of these genomes resulted in improved accuracy compared to the current publicly available complete genomes. Conclusions MicroPIPE is built in modules using Singularity container images and the bioinformatics workflow manager Nextflow, allowing changes and adjustments to be made in response to future tool development. Overall, MicroPIPE provides an easy-access, end-to-end solution for attaining high-quality bacterial genomes. MicroPIPE is available at https://github.com/BeatsonLab-MicrobialGenomics/micropipe.


DNA Research ◽  
2020 ◽  
Vol 27 (3) ◽  
Author(s):  
Rei Kajitani ◽  
Dai Yoshimura ◽  
Yoshitoshi Ogura ◽  
Yasuhiro Gotoh ◽  
Tetsuya Hayashi ◽  
...  

Abstract De novo assembly of short DNA reads remains an essential technology, especially for large-scale projects and high-resolution variant analyses in epidemiology. However, the existing tools often lack sufficient accuracy required to compare closely related strains. To facilitate such studies on bacterial genomes, we developed Platanus_B, a de novo assembler that employs iterations of multiple error-removal algorithms. The benchmarks demonstrated the superior accuracy and high contiguity of Platanus_B, in addition to its ability to enhance the hybrid assembly of both short and nanopore long reads. Although the hybrid strategies for short and long reads were effective in achieving near full-length genomes, we found that short-read-only assemblies generated with Platanus_B were sufficient to obtain ≥90% of exact coding sequences in most cases. In addition, while nanopore long-read-only assemblies lacked fine-scale accuracies, inclusion of short reads was effective in improving the accuracies. Platanus_B can, therefore, be used for comprehensive genomic surveillances of bacterial pathogens and high-resolution phylogenomic analyses of a wide range of bacteria.


2018 ◽  
Author(s):  
Eli L. Moss ◽  
Ami S. Bhatt

Abstract We present the first method for efficient recovery of complete, closed genomes directly from microbiomes using nanopore long-read sequencing and assembly. We apply our approach to three healthy human gut communities and compare results to short read and read cloud approaches. We obtain nine finished genomes including the first reported closed genome of Prevotella copri, an organism with highly repetitive genome structure prevalent in non-western human gut microbiomes.


PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e1839 ◽  
Author(s):  
Tom O. Delmont ◽  
A. Murat Eren

High-throughput sequencing provides a fast and cost-effective means to recover genomes of organisms from all domains of life. However, adequate curation of the assembly results against potential contamination of non-target organisms requires advanced bioinformatics approaches and practices. Here, we re-analyzed the sequencing data generated for the tardigrade Hypsibius dujardini, and created a holistic display of the eukaryotic genome assembly using DNA data originating from two groups and eleven sequencing libraries. By using bacterial single-copy genes, k-mer frequencies, and coverage values of scaffolds we could identify and characterize multiple near-complete bacterial genomes from the raw assembly, and curate a 182 Mbp draft genome for H. dujardini supported by RNA-Seq data. Our results indicate that most contaminant scaffolds were assembled from Moleculo long-read libraries, and most of these contaminants differed between library preparations. Our re-analysis shows that visualization and curation of eukaryotic genome assemblies can benefit from tools designed to address the needs of today’s microbiologists, who are constantly challenged by the difficulties associated with the identification of distinct microbial genomes in complex environmental metagenomes.


mBio ◽  
2020 ◽  
Vol 11 (2) ◽  
Author(s):  
Crystal L. Frost ◽  
Stefanos Siozios ◽  
Pol Nadal-Jimenez ◽  
Michael A. Brockhurst ◽  
Kayla C. King ◽  
...  

ABSTRACT Mobile elements—plasmids and phages—are important components of microbial function and evolution via traits that they encode and their capacity to shuttle genetic material between species. We here report the unusually rich array of mobile elements within the genome of Arsenophonus nasoniae, the son-killer symbiont of the parasitic wasp Nasonia vitripennis. This microbe’s genome has the highest prophage complement reported to date, with over 50 genomic regions that represent either intact or degraded phage material. Moreover, the genome is predicted to include 17 extrachromosomal genetic elements, which carry many genes predicted to be important at the microbe-host interface, derived from a diverse assemblage of insect-associated gammaproteobacteria. In our system, this diversity was previously masked by repetitive mobile elements that broke the assembly derived from short reads. These findings suggest that other complex bacterial genomes will be revealed in the era of long-read sequencing. IMPORTANCE The biology of many bacteria is critically dependent on genes carried on plasmid and phage mobile elements. These elements shuttle between microbial species, thus providing an important source of biological innovation across taxa. It has recently been recognized that mobile elements are also important in symbiotic bacteria, which form long-lasting interactions with their host. In this study, we report a bacterial symbiont genome that carries a highly complex array of these elements. Arsenophonus nasoniae is the son-killer microbe of the parasitic wasp Nasonia vitripennis and exists with the wasp throughout its life cycle. We completed its genome with the aid of recently developed long-read technology. This assembly contained over 50 chromosomal regions of phage origin and 17 extrachromosomal elements within the genome, encoding many important traits at the host-microbe interface. 
Thus, the biology of this symbiont is enabled by a complex array of mobile elements.


2019 ◽  
Vol 35 (21) ◽  
pp. 4239-4246 ◽  
Author(s):  
Pierre Marijon ◽  
Rayan Chikhi ◽  
Jean-Stéphane Varré

Abstract Motivation Long-read genome assembly tools are expected to reconstruct bacterial genomes nearly perfectly; however, they still produce fragmented assemblies in some cases. It would be beneficial to understand whether these cases are intrinsically impossible to resolve, or if assemblers are at fault, implying that genomes could be refined or even finished with little to no additional experimental cost. Results We propose a set of computational techniques to assist inspection of fragmented bacterial genome assemblies, through careful analysis of assembly graphs. By finding paths of overlapping raw reads between pairs of contigs, we recover potential short-range connections between contigs that were lost during the assembly process. We show that our procedure recovers 45% of missing contig adjacencies in fragmented Canu assemblies, on samples from the NCTC bacterial sequencing project. We also observe that a simple procedure based on enumerating weighted Hamiltonian cycles can suggest likely contig orderings. In our tests, the correct contig order is ranked first in half of the cases and within the top-three predictions in nearly all evaluated cases, providing a direction for finishing fragmented long-read assemblies. Availability and implementation https://gitlab.inria.fr/pmarijon/knot . Supplementary information Supplementary data are available at Bioinformatics online.
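
The contig-ordering idea in the abstract above can be illustrated with a minimal sketch. This is not the KNOT implementation: it assumes a small, hypothetical table of pairwise adjacency costs between contigs (lower meaning a better-supported connection) and brute-forces every cyclic ordering, which is only practical for a handful of contigs. The function name `rank_contig_orders` and the example weights are made up for illustration.

```python
from itertools import permutations


def rank_contig_orders(contigs, weight):
    """Enumerate Hamiltonian cycles over `contigs`, cheapest first.

    `weight` maps frozenset({a, b}) to a hypothetical adjacency cost.
    Pairs missing from `weight` are treated as unconnected, so any cycle
    that needs them is skipped. Returns a sorted list of (cost, order).
    """
    first, rest = contigs[0], contigs[1:]
    seen = set()
    ranked = []
    for perm in permutations(rest):
        cycle = (first,) + perm
        # A cycle and its reversal describe the same undirected ordering,
        # so keep one canonical representative of the pair.
        reverse = (first,) + tuple(reversed(perm))
        key = min(cycle, reverse)
        if key in seen:
            continue
        seen.add(key)
        edges = [
            frozenset((key[i], key[(i + 1) % len(key)]))
            for i in range(len(key))
        ]
        if all(edge in weight for edge in edges):
            ranked.append((sum(weight[edge] for edge in edges), key))
    ranked.sort()
    return ranked


# Hypothetical example: four contigs whose cheap links form A-B-C-D-A.
example_contigs = ["A", "B", "C", "D"]
example_weight = {
    frozenset(("A", "B")): 1,
    frozenset(("B", "C")): 1,
    frozenset(("C", "D")): 1,
    frozenset(("D", "A")): 1,
    frozenset(("A", "C")): 5,
    frozenset(("B", "D")): 5,
}

for cost, order in rank_contig_orders(example_contigs, example_weight):
    print(cost, " -> ".join(order))
```

Fixing the first contig and discarding reversed duplicates leaves (n-1)!/2 distinct cycles for n contigs, so exhaustive enumeration is feasible only when an assembly is split into a few pieces.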


2021 ◽  
Author(s):  
Elizabeth G. Wilbanks ◽  
Hugo Doré ◽  
Meredith H. Ashby ◽  
Cheryl Heiner ◽  
Richard J. Roberts ◽  
...  

Abstract The plasticity of bacterial and archaeal genomes makes examining their ecological and evolutionary dynamics both exciting and challenging. The same mechanisms that enable rapid genomic change and adaptation confound current approaches for recovering complete genomes from metagenomes. Here, we use strain-specific patterns of DNA methylation to resolve complex bacterial genomes from the long-read metagenome of a marine microbial consortium, the “pink berries” of the Sippewissett Marsh. Unique combinations of restriction-modification (RM) systems encoded by the bacteria produced distinctive methylation profiles that accurately binned and classified metagenomic sequences. We linked the methylation patterns of each metagenome-assembled genome with encoded DNA methyltransferases and discovered new RM defense systems, including novel associations of RM systems with RNase toxins. Using this approach, we finished the largest and most complex circularized bacterial genome ever recovered from a metagenome (7.9 Mb with >600 IS elements): that of Thiohalocapsa sp. PB-PSB1, the dominant bacterium in the consortium. From these methylation-binned genomes, we identified instances of lateral gene transfer between sulfur-cycling symbionts (Thiohalocapsa sp. PB-PSB1 and Desulfofustis sp. PB-SRB1), phage infection, and strain-level structural variation.


Author(s):  
Anton Bankevich ◽  
Pavel Pevzner

Abstract Long-read technologies revolutionized genome assembly and enabled resolution of bridged repeats (i.e., repeats that are spanned by some reads) in various genomes. However, the problem of resolving unbridged repeats (such as long segmental duplications in the human genome) remains largely unsolved, making it a major obstacle towards achieving the goal of complete genome assemblies. Moreover, the challenge of resolving unbridged repeats is not limited to eukaryotic genomes but also impairs assemblies of bacterial genomes and metagenomes. We describe the mosaicFlye algorithm for resolving complex unbridged repeats based on differences between various repeat copies and show how it improves assemblies of the human genome as well as bacterial genomes and metagenomes. In particular, we show that mosaicFlye results in a complete assembly of both arms of the human chromosome 6.

