Completing bacterial genome assemblies with multiplex MinION sequencing

2017
Author(s): Ryan R. Wick, Louise M. Judd, Claire L. Gorrie, Kathryn E. Holt

Abstract
Illumina sequencing platforms have enabled widespread bacterial whole genome sequencing. While Illumina data is appropriate for many analyses, its short read length limits its ability to resolve genomic structure. This has major implications for tracking the spread of mobile genetic elements, including those that carry antimicrobial resistance determinants. Fully resolving a bacterial genome requires long reads, such as those generated by Oxford Nanopore Technologies (ONT) platforms. Here we describe our use of the ONT MinION to sequence 12 isolates of Klebsiella pneumoniae on a single flow cell. We assembled each genome using a combination of ONT reads and previously available Illumina reads, and little to no manual intervention was needed to achieve fully resolved assemblies using the Unicycler hybrid assembler. Assembling only ONT reads with Canu was less effective, resulting in fewer resolved genomes and higher error rates, even after error correction with Nanopolish. We demonstrate that multiplexed ONT sequencing is a valuable tool for high-throughput bacterial genome finishing. Specifically, we advocate the use of Illumina sequencing as a first analysis step, followed by ONT reads as needed to resolve genomic structure.

Data summary: Sequence read files for all 12 isolates have been deposited in SRA, accessible through these NCBI BioSample accession numbers: SAMEA3357010, SAMEA3357043, SAMN07211279, SAMN07211280, SAMEA3357223, SAMEA3357193, SAMEA3357346, SAMEA3357374, SAMEA3357320, SAMN07211281, SAMN07211282, SAMEA3357405. A full list of SRA run accession numbers (both Illumina reads and ONT reads) for these samples is available in Table S1. Assemblies and sequencing reads corresponding to each stage of processing and analysis are provided in the following figshare project: https://figshare.com/projects/Completing_bacterial_genome_assemblies_with_multiplex_MinION_sequencing/23068
Source code is provided in the following public GitHub repositories:
https://github.com/rrwick/Bacterial-genome-assemblies-with-multiplex-MinION-sequencing
https://github.com/rrwick/Porechop
https://github.com/rrwick/Fast5-to-Fastq

Impact Statement: Like many research and public health laboratories, we frequently perform large-scale bacterial comparative genomics studies using Illumina sequencing, which assays gene content and provides the high-confidence variant calls needed for phylogenomics and transmission studies. However, problems often arise with resolving genome assemblies, particularly around the regions that matter most to our research, such as mobile genetic elements encoding antibiotic resistance or virulence genes. These complexities can often be resolved by long sequence reads generated with PacBio or Oxford Nanopore Technologies (ONT) platforms. While effective, this approach has proven difficult to scale, due to the relatively high cost of generating long reads and the manual intervention required for assembly.
Here we demonstrate the use of barcoded ONT libraries sequenced in multiplex on a single ONT MinION flow cell, coupled with hybrid assembly using Unicycler, to resolve 12 large bacterial genomes. Minor manual intervention was required to fully resolve small plasmids in five isolates; we found these small plasmids to be underrepresented in ONT data. Cost per sample for the ONT sequencing was equivalent to that of Illumina sequencing, and there is potential for significant savings by multiplexing more samples on the ONT run. This approach paves the way for making high-throughput, cost-effective generation of completely resolved bacterial genomes widely accessible.
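The cost argument above rests on simple arithmetic: the flow cell and shared reagents are paid for once per run and split across all barcoded samples, while barcoding adds a per-sample cost. The sketch below makes that explicit under a deliberately simple cost model; the function name and all prices are hypothetical placeholders, not figures from the study.

```python
def cost_per_sample(flow_cell_cost: float, shared_cost: float,
                    per_sample_cost: float, n_samples: int) -> float:
    """Per-sample cost when one flow cell is shared by n barcoded samples.

    Hypothetical cost model: flow_cell_cost and shared_cost are paid once per
    run and split evenly; per_sample_cost (e.g. barcoding) is paid per sample.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    return (flow_cell_cost + shared_cost) / n_samples + per_sample_cost


# Hypothetical prices, for illustration only.
for n in (12, 24):
    print(n, cost_per_sample(500.0, 100.0, 20.0, n))
```

Under this model, doubling the number of samples halves only the shared component, so savings level off once the per-sample cost dominates.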

2021
Author(s): Ryan R Wick, Louise M Judd, Louise T Cerdeira, Jane Hawkey, Guillaume Meric, ...

Assembly of bacterial genomes from long-read data (generated by Oxford Nanopore or Pacific Biosciences platforms) can often be complete: a single contig for each chromosome or plasmid in the genome. However, even complete bacterial genome assemblies constructed solely from long reads still contain a variety of errors, and different assemblies of the same genome often contain different errors. Here, we present Trycycler, a tool which produces a consensus assembly from multiple input assemblies of the same genome. Benchmarking using both simulated and real sequencing reads showed that Trycycler consensus assemblies contained fewer errors than any of those constructed with a single long-read assembler. Post-assembly polishing with Medaka and Pilon further reduced errors and yielded the most accurate genome assemblies in our study. As Trycycler can require human judgement and manual intervention, its output is not deterministic, and different users can produce different Trycycler assemblies from the same input data. However, we demonstrated that multiple users with minimal training converge on similar assemblies that are consistently more accurate than those produced by automated assembly tools. We therefore recommend Trycycler+Medaka+Pilon as an ideal approach for generating high-quality bacterial reference genomes.
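The premise behind Trycycler is that independent assemblies tend to make different errors, so agreement across them is informative. The sketch below illustrates only that intuition, using a plain column-wise majority vote over sequences that are assumed to be already aligned to equal length (gaps written as `-`). It is a simplification for illustration, not Trycycler's actual consensus procedure, which also clusters contigs, reconciles them and partitions reads; the function name is hypothetical.

```python
from collections import Counter


def majority_consensus(aligned: list[str]) -> str:
    """Column-wise majority vote over equal-length aligned sequences ('-' = gap)."""
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned):
        # most_common orders equal counts by first occurrence, so ties go to
        # the base seen in the earliest input sequence.
        base, _ = Counter(column).most_common(1)[0]
        if base != "-":  # a majority gap means the position is dropped
            consensus.append(base)
    return "".join(consensus)


# Three hypothetical assemblies, each with one error at a different position.
print(majority_consensus(["ACGTACGT", "ACCTACGT", "ACGTACTT"]))
```

Because each error appears in only one of the three inputs, the vote recovers `ACGTACGT`; errors shared by most inputs would survive, which is one reason real consensus tools do more than vote.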


2020
Author(s): E.G. Mogro, N. Ambrosis, M.J. Lozano

Abstract
Motivation: Bacterial genomes are composed of a core and an accessory genome. The first is composed of housekeeping and essential genes, while the second is composed mostly of mobile genetic elements, including transposable elements (TEs). Insertion sequences (ISs), the smallest TEs, have an important role in genome evolution, and contribute to bacterial genome plasticity and adaptability. ISs can spread in a genome, occupying different locations in closely related strains and producing phenotypic variations. Few tools are available that can identify differentially located ISs (DLIS) on assembled genomes.
Results: We developed ISCompare to profile IS mobilization events in related bacterial strains using complete or draft genome assemblies. ISCompare was validated using artificial genomes with simulated random IS insertions and real sequences, achieving the same or better results than other available tools, with the advantage that ISCompare can analyse multiple ISs at the same time and outputs a list of candidate DLIS. We think that ISCompare provides an easy and straightforward approach to look for differentially located ISs on bacterial genomes.
Availability and implementation: ISCompare was implemented in python3 and its source code is freely available for download at https://github.com/maurijlozano/ISCompare.
Supplementary information: Supplementary data are available at https://github.com/maurijlozano/ISCompare.


Genes, 2020, Vol 11 (4), pp. 381
Author(s): Olivier Tytgat, Yannick Gansemans, Jana Weymaere, Kaat Rubben, Dieter Deforce, ...

Nanopore sequencing for forensic short tandem repeat (STR) genotyping offers the advantages of massively parallel sequencing (MPS) without a high up-front device cost, but genotyping is inaccurate, partly due to the occurrence of homopolymers in STR loci. The goal of this study was to apply the latest progress in nanopore sequencing by Oxford Nanopore Technologies to STR genotyping. The experiments were performed using the state-of-the-art R9.4 flow cell and the most recent R10 flow cell, which was specifically designed to improve the consensus accuracy of homopolymers. Two single-contributor samples and one mixture sample were genotyped using Illumina sequencing, Nanopore R9.4 sequencing, and Nanopore R10 sequencing. Genotyping accuracy was comparable for both types of flow cell, although the R10 flow cell provided improved data quality for loci containing homopolymers. We identify locus-dependent characteristics that hinder accurate STR genotyping, providing insights for the design of a panel of STR loci suited to nanopore sequencing. Repeat number, the number of different reference alleles for the locus, repeat pattern complexity, flanking region complexity, and the presence of homopolymers are identified as unfavorable locus characteristics. For single-contributor samples and a limited set of commonly used STR loci, nanopore sequencing could be applied. However, the technology is not yet mature enough for implementation in routine forensic workflows.
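Two of the unfavorable locus characteristics listed above, repeat number and homopolymers, are easy to quantify on an error-free sequence. The sketch below counts the longest run of consecutive, in-frame copies of a repeat motif and the longest homopolymer run. The function names are hypothetical, and the code uses exact matching, so it does not model the indel errors in nanopore reads that make real STR genotyping harder.

```python
from itertools import groupby


def longest_repeat_run(seq: str, motif: str) -> int:
    """Largest number of consecutive exact copies of `motif`, trying every start offset."""
    if not motif:
        raise ValueError("motif must be non-empty")
    best = 0
    for start in range(len(seq)):
        count, pos = 0, start
        while seq.startswith(motif, pos):  # extend the run one motif copy at a time
            count += 1
            pos += len(motif)
        best = max(best, count)
    return best


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base (0 for an empty sequence)."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


read = "GGTCTATCTATCTATCTACC"  # hypothetical allele with 4 copies of TCTA
print(longest_repeat_run(read, "TCTA"), longest_homopolymer("ACGGGGTA"))
```

Scanning every start offset, rather than only non-overlapping matches, ensures a run that begins in a different frame from an earlier partial match is still found.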


Author(s): Ezequiel G Mogro, Nicolás M Ambrosis, Mauricio J Lozano

Abstract
Bacterial genomes are composed of core and accessory genomes. The first is composed of housekeeping and essential genes, while the second is highly enriched in mobile genetic elements, including transposable elements (TEs). Insertion sequences (ISs), the smallest TEs, have an important role in genome evolution, and contribute to bacterial genome plasticity and adaptability. ISs can spread in a genome, occupying different locations in closely related strains and producing phenotypic variations. Few tools are available that can identify differentially located ISs (DLISs) on assembled genomes. Here, we introduce ISCompare, a new program to profile IS mobilization events in related bacterial strains using complete or draft genome assemblies. ISCompare was validated using artificial genomes with simulated random IS insertions and real sequences, achieving the same or better results than other available tools, with the advantage that ISCompare can analyze multiple ISs at the same time and outputs a list of candidate DLISs. ISCompare provides an easy and straightforward approach to look for differentially located ISs on bacterial genomes.
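Conceptually, a differentially located IS is an insertion present at a site in one strain but absent from the corresponding site in the other. Assuming IS hits from both strains have already been mapped onto shared reference coordinates (an assumption made here for simplicity; ISCompare works from genome assemblies and its internal method is not reproduced), candidate DLISs reduce to hits with no partner of the same IS type nearby. The `ISHit` type, the function name and the 50 bp tolerance are all hypothetical.

```python
from typing import NamedTuple


class ISHit(NamedTuple):
    is_name: str   # IS element label (hypothetical naming)
    position: int  # insertion coordinate on a shared reference (assumed precomputed)


def candidate_dlis(ref_hits: list[ISHit], query_hits: list[ISHit],
                   tolerance: int = 50) -> tuple[list[ISHit], list[ISHit]]:
    """Return (reference-only, query-only) hits lacking a nearby partner of the same IS type."""

    def has_partner(hit: ISHit, others: list[ISHit]) -> bool:
        return any(o.is_name == hit.is_name and abs(o.position - hit.position) <= tolerance
                   for o in others)

    ref_only = [h for h in ref_hits if not has_partner(h, query_hits)]
    query_only = [h for h in query_hits if not has_partner(h, ref_hits)]
    return ref_only, query_only


ref = [ISHit("IS1", 1_000), ISHit("IS5", 50_000)]
query = [ISHit("IS1", 1_010), ISHit("IS5", 120_000)]
print(candidate_dlis(ref, query))
```

Here the IS1 copies sit within the tolerance and are treated as the same insertion, while the two IS5 copies are reported as a candidate relocation.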


2021
Author(s): Arang Rhie, Ann Mc Cartney, Kishwar Shafin, Michael Alonge, Andrey Bzikadze, ...

Abstract
Advances in long-read sequencing technologies and genome assembly methods have enabled the recent completion of the first Telomere-to-Telomere (T2T) human genome assembly, which resolves complex segmental duplications and large tandem repeats, including centromeric satellite arrays, in a complete hydatidiform mole (CHM13). Although the assembly was derived from highly accurate sequencing data, evaluation revealed evidence of small errors and structural misassemblies in the initial T2T draft. To correct these errors, we designed a novel repeat-aware polishing strategy that made accurate assembly corrections in large repeats without overcorrection, ultimately fixing 51% of the existing errors and improving the assembly QV to 73.9. By comparing our results to standard automated polishing tools, we outline common polishing errors and offer practical suggestions for genome projects with limited resources. We also show how sequencing biases in both PacBio HiFi and Oxford Nanopore Technologies reads cause signature assembly errors that can be corrected with a diverse panel of sequencing technologies.
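Assembly QV is expressed on the Phred scale, QV = -10 log10(p), where p is the per-base error rate. The short sketch below converts between the two to show what a QV of 73.9 means in practice. It only applies the formula; it does not reproduce how the error rate was estimated for CHM13, and the function names are hypothetical.

```python
import math


def qv_from_error_rate(error_rate: float) -> float:
    """Phred-scaled quality value for a per-base error rate in (0, 1)."""
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be in (0, 1)")
    return -10.0 * math.log10(error_rate)


def error_rate_from_qv(qv: float) -> float:
    """Inverse of qv_from_error_rate: per-base error probability for a given QV."""
    return 10.0 ** (-qv / 10.0)


p = error_rate_from_qv(73.9)
print(f"{p:.3g} errors per base, about 1 error per {1 / p / 1e6:.1f} Mb")
```

This prints roughly 4.07e-08 errors per base, or about one error per 24.5 Mb. Because the scale is logarithmic, each additional 10 QV points corresponds to a tenfold reduction in error rate.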


PeerJ, 2018, Vol 6, pp. e5018
Author(s): Mami Tanaka, Sayaka Mino, Yoshitoshi Ogura, Tetsuya Hayashi, Tomoo Sawabe

Whole genome sequence comparisons have become essential for establishing a robust scheme in bacterial taxonomy. To generalize this genome-based taxonomy, fast, reliable, and cost-effective genome sequencing methodologies are required. MinION, the palm-sized sequencer from Oxford Nanopore Technologies, enables rapid sequencing of bacterial genomes using minimal laboratory resources. Here we tested the suitability of Nanopore sequences for genome-based taxonomy of the Vibrionaceae and compared Nanopore-only assemblies to complete genomes of five Rumoiensis clade species: Vibrio aphrogenes, V. algivorus, V. casei, V. litoralis, and V. rumoiensis. Comparison of overall genome relatedness indices (OGRI) and multilocus sequence analysis (MLSA) between Nanopore-only assemblies and Illumina or hybrid assemblies revealed that errors in Nanopore-only assemblies do not influence average nucleotide identity (ANI), in silico DNA-DNA hybridization (DDH), G+C content, or MLSA tree topology in Vibrionaceae. Our results show that genome sequences from a Nanopore-based approach can be used for rapid species identification based on OGRI and MLSA.
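ANI, one of the OGRI compared above, is at heart an average of per-fragment identities between two genomes. The sketch below computes that average from fragment pairs that are assumed to be already aligned (gaps written as `-`). Real ANI tools also fragment the query genome, search the fragments against the reference and filter weak hits, none of which is shown; the optional `min_identity` cutoff is a hypothetical stand-in for such filtering, and the function names are hypothetical too.

```python
def fragment_identity(a: str, b: str) -> float:
    """Percent identity of two aligned fragments of equal length; gap-gap columns are ignored."""
    if len(a) != len(b):
        raise ValueError("fragments must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("empty alignment")
    matches = sum(1 for x, y in columns if x == y)  # gap-gap columns already removed
    return 100.0 * matches / len(columns)


def average_nucleotide_identity(pairs: list[tuple[str, str]],
                                min_identity: float = 0.0) -> float:
    """Mean identity over aligned fragment pairs at or above a (hypothetical) cutoff."""
    identities = [fragment_identity(a, b) for a, b in pairs]
    kept = [i for i in identities if i >= min_identity]
    if not kept:
        raise ValueError("no fragments passed the identity cutoff")
    return sum(kept) / len(kept)


pairs = [("ACGTACGTAC", "ACGTACGTAC"), ("ACGTACGTAC", "ACGTTCGTAC")]
print(average_nucleotide_identity(pairs))
```

With one identical fragment and one at 90% identity, the average is 95.0. Because ANI averages over many fragments, a small number of scattered assembly errors shifts it only slightly.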


2021
Author(s): Brian W Strehlow, Astrid Schuster, Warren R Francis, Donald E Canfield

Objectives: These data were collected to generate a novel reference metagenome for the sponge Halichondria panicea and its microbiome for subsequent differential expression analyses. Data description: These data include raw sequences from four separate sequencing runs of the metagenome of a single individual of H. panicea: one Illumina MiSeq (2x300 bp, paired-end) run and three Oxford Nanopore Technologies (ONT) long-read sequencing runs, generating 53.8 and 7.42 Gbp, respectively. Comparing Illumina, ONT and Illumina-ONT hybrid assemblies revealed the hybrid to be the best assembly, comprising 163 Mbp in 63,555 scaffolds (N50: 3,084). This assembly, however, was still highly fragmented and contained only 52% of core metazoan genes (with 77.9% partial genes), so it was also incomplete. Nevertheless, this sponge is an emerging model species for field and laboratory work, and there is considerable interest in genomic sequencing of this species. Although the resultant assemblies from the data presented here are suboptimal, this data note can inform future studies by providing an estimated genome size and coverage requirements for further sequencing, sharing additional data to potentially improve other suboptimal assemblies of this species, and outlining potential limitations and pitfalls of the combined Illumina and ONT approach to novel genome sequencing.
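The N50 quoted above summarizes assembly contiguity: it is the length L such that sequences of length L or longer together contain at least half of the total assembly length. A minimal sketch of that definition follows; the function name and example lengths are hypothetical.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of the total length."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding at exactly half
            return length
    raise AssertionError("unreachable: the loop always reaches half the total")


print(n50([2, 3, 4, 5, 6]))  # hypothetical scaffold lengths
```

Here the two longest sequences (6 + 5 = 11) already exceed half of the total of 20, so the N50 is 5. A fragmented assembly with many short scaffolds yields a small N50 even when the total length is large.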


2019, Vol 35 (21), pp. 4239-4246
Author(s): Pierre Marijon, Rayan Chikhi, Jean-Stéphane Varré

Abstract
Motivation: Long-read genome assembly tools are expected to reconstruct bacterial genomes nearly perfectly; however, they still produce fragmented assemblies in some cases. It would be beneficial to understand whether these cases are intrinsically impossible to resolve, or whether assemblers are at fault, implying that genomes could be refined or even finished with little to no additional experimental cost.
Results: We propose a set of computational techniques to assist inspection of fragmented bacterial genome assemblies through careful analysis of assembly graphs. By finding paths of overlapping raw reads between pairs of contigs, we recover potential short-range connections between contigs that were lost during the assembly process. We show that our procedure recovers 45% of missing contig adjacencies in fragmented Canu assemblies, on samples from the NCTC bacterial sequencing project. We also observe that a simple procedure based on enumerating weighted Hamiltonian cycles can suggest likely contig orderings. In our tests, the correct contig order is ranked first in half of the cases and within the top three predictions in nearly all evaluated cases, providing a direction for finishing fragmented long-read assemblies.
Availability and implementation: https://gitlab.inria.fr/pmarijon/knot
Supplementary information: Supplementary data are available at Bioinformatics online.
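The contig-ordering idea above can be illustrated with a brute-force search: treat contigs as nodes, give each pair a weight reflecting the evidence that they are adjacent (for instance, how many read paths connect them), and rank every cyclic ordering by its total weight, since bacterial chromosomes are circular. This is a simplified sketch under those assumptions, not the implementation in the repository linked above; it ignores contig orientation, its factorial running time limits it to a handful of contigs, and the function name and weights are hypothetical.

```python
from itertools import permutations


def rank_contig_orders(contigs: list[str], weights: dict[frozenset, float],
                       top: int = 3) -> list[tuple[float, tuple[str, ...]]]:
    """Rank cyclic contig orders by the summed weight of adjacent pairs (brute force)."""
    if len(contigs) < 3:
        raise ValueError("need at least three contigs for a meaningful cycle")
    # Fixing the first contig removes rotations of the same cycle.
    first, rest = contigs[0], contigs[1:]
    scored = []
    for perm in permutations(rest):
        # A cycle and its mirror image have the same score; keep only one of them.
        if perm[0] > perm[-1]:
            continue
        cycle = (first, *perm)
        score = sum(
            weights.get(frozenset((cycle[i], cycle[(i + 1) % len(cycle)])), 0.0)
            for i in range(len(cycle))
        )
        scored.append((score, cycle))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top]


w = {frozenset(p): s for p, s in [(("A", "B"), 5.0), (("B", "C"), 4.0), (("C", "D"), 6.0),
                                   (("D", "A"), 3.0), (("A", "C"), 1.0), (("B", "D"), 0.5)]}
for score, order in rank_contig_orders(["A", "B", "C", "D"], w):
    print(score, order)
```

With these hypothetical weights the order A, B, C, D scores 18.0 and is ranked first, followed by A, B, D, C (12.5) and A, C, B, D (8.5).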


DNA Research, 2019, Vol 26 (5), pp. 391-398
Author(s): Mitsuhiko P Sato, Yoshitoshi Ogura, Keiji Nakamura, Ruriko Nishida, Yasuhiro Gotoh, ...

Abstract
In bacterial genome and metagenome sequencing, Illumina sequencers are the most frequently used platforms owing to their high-throughput capacity, and multiple library preparation kits have been developed for Illumina platforms. Here, we systematically analysed and compared the sequencing bias generated by currently available library preparation kits for Illumina sequencing. Our analyses revealed that the Nextera XT kit introduces a strong sequencing bias in low-GC regions. The degree of bias depends on GC content: the lower the GC content, the stronger the bias. The other kits analysed did not introduce this strong bias. The GC content-associated bias introduced by Nextera XT was more pronounced in metagenome sequencing of a mock bacterial community and seriously affected estimation of the relative abundance of low-GC species. These results highlight the importance of selecting a library preparation kit suited to the purposes and targets of sequencing, particularly in metagenome sequencing, where microbial species spanning a wide range of GC content are present. Our data also indicate that special attention should be paid to which library preparation kit was used when analysing and interpreting publicly available metagenomic data.
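A GC-dependent bias of this kind is commonly examined by splitting a genome into windows, computing each window's GC content and comparing it with sequencing depth. The sketch below does this with non-overlapping windows grouped into 10% GC bins. The inputs, window size, bin width and function name are hypothetical, and it is not the analysis pipeline used in the study.

```python
from collections import defaultdict


def mean_depth_by_gc(seq: str, depth: list[float], window: int = 100,
                     bin_pct: int = 10) -> dict[int, float]:
    """Mean per-window depth grouped into GC% bins, keyed by each bin's lower edge."""
    if len(seq) != len(depth):
        raise ValueError("seq and depth must have the same length")
    bins: dict[int, list[float]] = defaultdict(list)
    # Non-overlapping windows; a trailing partial window is ignored.
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        gc_pct = (chunk.count("G") + chunk.count("C")) * 100 // window
        gc_bin = min(gc_pct // bin_pct * bin_pct, 100 - bin_pct)  # 100% GC joins the top bin
        bins[gc_bin].append(sum(depth[start:start + window]) / window)
    return {b: sum(v) / len(v) for b, v in sorted(bins.items())}


seq = "AT" * 50 + "ATGC" * 25 + "GC" * 50  # hypothetical 300 bp sequence
depth = [10.0] * 100 + [20.0] * 100 + [30.0] * 100  # hypothetical per-base depth
print(mean_depth_by_gc(seq, depth))
```

Integer arithmetic is used for the GC percentage so that bin boundaries are not affected by floating-point rounding.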


2020
Author(s): Dandan Lang, Shilai Zhang, Pingping Ren, Fan Liang, Zongyi Sun, ...

Abstract
The availability of reference genomes has revolutionized the study of biology. Multiple competing technologies have been developed to improve the quality and robustness of genome assemblies during the last decade. The two widely used long-read sequencing providers, PacBio (PB) and Oxford Nanopore Technologies (ONT), have recently updated their platforms: PB enables high-throughput HiFi reads with >99% base-level accuracy, and ONT has generated reads as long as 2 Mb. We applied the two up-to-date platforms to a single rice individual and compared the two assemblies to investigate the advantages and limitations of each. The ONT ultralong reads delivered higher contiguity, producing a total of 18 contigs, 10 of which were each assembled into a single chromosome, compared with 394 contigs, including three chromosome-level contigs, for the PB assembly. The ONT ultralong reads also prevented assembly errors caused by long repetitive regions: we observed a total of 44 falsely redundant genes and 10 falsely lost genes in the PB assembly, leading to over- or under-estimation of the gene families in those long repetitive regions. We also noted that the PB HiFi reads generated assemblies with considerably fewer errors at the level of single nucleotides and small InDels than the ONT assembly, which contained an average of 1.06 errors per kb and ultimately yielded 1,475 incorrect gene annotations via altered or truncated protein predictions.

