Nanopore sequencing enables high-resolution analysis of resistance determinants and mobile elements in the human gut microbiome

2018 ◽  
Author(s):  
Denis Bertrand ◽  
Jim Shaw ◽  
Manesh Kalathiappan ◽  
Amanda Hui Qi Ng ◽  
Senthil Muthiah ◽  
...  

Abstract The analysis of information-rich whole-metagenome datasets acquired from complex microbial communities is often restricted by the fragmented nature of assemblies from short-read sequencing. Long reads from third-generation sequencing technologies (e.g. PacBio or Oxford Nanopore) can in principle improve assembly quality, but high error rates and low throughput have limited their application in metagenomics. In this work, we describe the first hybrid metagenomic assembler that combines the advantages of short- and long-read technologies, providing an order-of-magnitude improvement in contiguity compared to short-read assemblies together with high base-pair-level accuracy. The proposed approach (OPERA-MS) integrates a novel assembly-based metagenome clustering technique with an exact scaffolding algorithm that can efficiently assemble repeat-rich sequences. Based on evaluations with defined in vitro communities and virtual gut microbiomes, we show that it is possible to assemble near-complete genomes from metagenomes with as little as 9× long-read coverage, thus enabling high-quality assembly of low-abundance species (<1%). Furthermore, OPERA-MS’s fine-grained clustering is able to deconvolute and assemble multiple genomes of the same species in a single sample, allowing us to study the complex dynamics of the human microbiome at the sub-species level. Applying nanopore sequencing to gut metagenomes of patients undergoing antibiotic treatment, we show that long reads can be obtained from stool samples in clinical studies to produce more meaningful metagenomic assemblies (up to 200× improvement over short-read assemblies), including closed assemblies of >80 putative plasmid/phage sequences and a 263 kbp jumbo phage. Our results highlight that high-quality hybrid assemblies provide an unprecedented view of the gut resistome in these patients, including strain dynamics and the identification of novel plasmid sequences.
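The 9× coverage figure and the <1% abundance threshold are linked by simple arithmetic: per-species coverage is the number of bases from that species divided by its genome size. The sketch below assumes reads are sampled in proportion to each species' share of the sequenced DNA and ignores read-length and mapping losses; the function name and the 5 Mbp genome are hypothetical illustrations, not values from the paper.

```python
def required_long_read_bases(target_coverage: float,
                             genome_size_bp: float,
                             relative_abundance: float) -> float:
    """Total long-read bases needed for one species to reach target_coverage.

    Assumption: the species contributes relative_abundance (a fraction in
    (0, 1]) of all sequenced bases; read-length and mapping losses are ignored.
    """
    if not 0 < relative_abundance <= 1:
        raise ValueError("relative_abundance must be in (0, 1]")
    # Coverage C = species_bases / G and species_bases = a * total_bases,
    # so total_bases = C * G / a.
    return target_coverage * genome_size_bp / relative_abundance


if __name__ == "__main__":
    # Hypothetical example: 9x coverage of a 5 Mbp genome at 1% abundance.
    total = required_long_read_bases(9, 5_000_000, 0.01)
    print(f"{total / 1e9:.1f} Gbp of long reads")  # 4.5 Gbp of long reads
```

Because the required yield scales as 1/a, halving a species' abundance doubles the sequencing needed to keep its coverage constant.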

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Seth Commichaux ◽  
Kiran Javkar ◽  
Padmini Ramachandran ◽  
Niranjan Nagarajan ◽  
Denis Bertrand ◽  
...  

Abstract Background Whole-genome sequencing of cultured pathogens is the state-of-the-art public health response for the bioinformatic source tracking of illness outbreaks. Quasimetagenomics can substantially reduce the amount of culturing needed before a high-quality genome can be recovered. Highly accurate short-read data are analyzed for single nucleotide polymorphisms and multi-locus sequence types to differentiate strains, but short reads cannot span many genomic repeats, resulting in highly fragmented assemblies. Long reads can span repeats, resulting in much more contiguous assemblies, but have lower accuracy than short reads. Results We evaluated the accuracy of Listeria monocytogenes assemblies from enrichments (quasimetagenomes) of naturally contaminated ice cream using long-read (Oxford Nanopore) and short-read (Illumina) sequencing data. The accuracy of ten assembly approaches, over a range of sequencing depths, was evaluated by comparing the sequence similarity of genes in the assemblies to a complete reference genome. Long-read assemblies reconstructed a circularized genome as well as a 71 kbp plasmid after 24 h of enrichment; however, high error rates prevented high-fidelity gene assembly, even at 150× depth of coverage. Short-read assemblies accurately reconstructed the core genes after 28 h of enrichment but produced highly fragmented genomes. Hybrid approaches demonstrated promising results but had biases based upon the initial assembly strategy. Short-read assemblies scaffolded with long reads accurately assembled the core genes after just 24 h of enrichment but were highly fragmented. Long-read assemblies polished with short reads reconstructed a circularized genome and plasmid and assembled all the genes after 24 h of enrichment, but with less fidelity for the core genes than the short-read assemblies.
Conclusion The integration of long- and short-read sequencing of quasimetagenomes expedited the reconstruction of a high-quality pathogen genome compared to either platform alone. A new and more complete level of information about genome structure, gene order and mobile elements can be added to the public health response by incorporating long-read analyses into the standard short-read WGS outbreak response.
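The study scored assemblies by comparing gene sequences against a complete reference, but the abstract does not say how similarity was computed. As a minimal sketch, assuming a plain Needleman-Wunsch global alignment with unit scores (match +1, mismatch -1, gap -1) and identity defined as matches divided by alignment columns, the function below shows how a single indel lowers a gene's identity. This is an illustration, not the authors' evaluation pipeline, and `global_identity` is a hypothetical name.

```python
def global_identity(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -1) -> float:
    """Identity (matches / alignment columns) of one optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, counting matches and columns.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                matches += a[i - 1] == b[j - 1]
                i, j = i - 1, j - 1
                columns += 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return matches / columns if columns else 1.0


if __name__ == "__main__":
    reference_gene = "ACGTACGT"
    assembled_gene = "ACGACGT"  # one base deleted, as a long-read indel might do
    print(global_identity(reference_gene, assembled_gene))  # 0.875
```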


2015 ◽  
Author(s):  
Ivan Sovic ◽  
Mile Sikic ◽  
Andreas Wilm ◽  
Shannon Nicole Fenlon ◽  
Swaine Chen ◽  
...  

Exploiting the power of nanopore sequencing requires the development of new bioinformatics approaches to deal with its specific error characteristics. We present the first nanopore read mapper (GraphMap), which uses a read-funneling paradigm to robustly handle variable error rates and fast graph traversal to align long reads with speed and very high precision (>95%). Evaluation on MinION sequencing datasets against short- and long-read mappers indicates that GraphMap increases mapping sensitivity by at least 15-80%. GraphMap alignments are the first to demonstrate consensus calling with <1 error in 100,000 bases, variant calling on the human genome with a 76% improvement in sensitivity over the next-best mapper (BWA-MEM), precise detection of structural variants from 100 bp to 4 kbp in length, and species- and strain-specific identification of pathogens using MinION reads. GraphMap is available open source under the MIT license at https://github.com/isovic/graphmap.
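For comparison with other tools, the consensus accuracy quoted in this abstract can be restated on the standard Phred scale, where an error probability p maps to a quality Q = -10 log10(p). This is only a unit conversion of the stated figure, not an additional result:

```latex
Q = -10 \log_{10} p, \qquad
p < \frac{1}{100{,}000} = 10^{-5}
\;\Longrightarrow\;
Q > -10 \log_{10}\!\left(10^{-5}\right) = 50
```

So a consensus with fewer than 1 error in 100,000 bases is better than Q50.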


BMC Genomics ◽  
2019 ◽  
Vol 20 (S11) ◽  
Author(s):  
Arghya Kusum Das ◽  
Sayan Goswami ◽  
Kisung Lee ◽  
Seung-Jong Park

Abstract Background Long-read sequencing has shown promise in overcoming the short-length limitations of second-generation sequencing by providing more complete assemblies. However, the computational analysis of long sequencing reads is challenged by their higher error rates (e.g., 13% vs. 1%) and higher cost ($0.3 vs. $0.03 per Mbp) compared to short reads. Methods In this paper, we present a new hybrid error correction tool, called ParLECH (Parallel Long-read Error Correction using Hybrid methodology). The error correction algorithm of ParLECH is distributed in nature and efficiently utilizes the k-mer coverage information of high-throughput Illumina short-read sequences to rectify PacBio long-read sequences. ParLECH first constructs a de Bruijn graph from the short reads, and then replaces the indel error regions of the long reads with their corresponding widest path (or maximum min-coverage path) in the short-read-based de Bruijn graph. ParLECH then utilizes the k-mer coverage information of the short reads to divide each long read into a sequence of low- and high-coverage regions, followed by majority voting to rectify each substituted error base. Results ParLECH outperforms the latest state-of-the-art hybrid error correction methods on real PacBio datasets. Our experimental evaluation results demonstrate that ParLECH can correct large-scale real-world datasets in an accurate and scalable manner. ParLECH can correct the indel errors of human genome PacBio long reads (312 GB) with Illumina short reads (452 GB) in less than 29 h using 128 compute nodes. ParLECH can align more than 92% of the bases of an E. coli PacBio dataset with the reference genome, demonstrating its accuracy. Conclusion ParLECH can scale to terabytes of sequencing data using hundreds of computing nodes. The proposed hybrid error correction methodology is novel and rectifies both indel and substitution errors present in the original long reads or newly introduced by the short reads.
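The widest-path step described above can be illustrated on a single machine. The sketch below builds k-mer counts from short reads and finds the path between two flanking k-mers that maximizes the minimum k-mer coverage, using a max-heap variant of Dijkstra's algorithm. It is not ParLECH's distributed implementation: it assumes the flanking "anchor" k-mers are already known, uses one strand only (no reverse complements), and all function names are hypothetical.

```python
import heapq
from collections import Counter
from typing import Optional


def kmer_counts(short_reads, k):
    """Count every k-mer in the short reads (single strand, toy version)."""
    counts = Counter()
    for read in short_reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def widest_path(counts, source, target) -> Optional[str]:
    """Spell the source->target path whose minimum k-mer count is largest.

    Nodes are k-mers; an edge u->v exists when v == u[1:] + base and v
    occurs in the short reads. Returns None if target is unreachable.
    """
    best = {source: counts[source]}
    parent = {source: None}
    heap = [(-counts[source], source)]  # max-heap via negated widths
    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if node == target:
            break  # popped widths never increase, so this one is final
        if width < best.get(node, 0):
            continue  # stale heap entry
        for base in "ACGT":
            nxt = node[1:] + base
            if nxt not in counts:
                continue
            w = min(width, counts[nxt])  # bottleneck of the extended path
            if w > best.get(nxt, 0):
                best[nxt] = w
                parent[nxt] = node
                heapq.heappush(heap, (-w, nxt))
    if target not in parent:
        return None
    path = [target]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    path.reverse()
    return path[0] + "".join(kmer[-1] for kmer in path[1:])


if __name__ == "__main__":
    # Toy data: three short reads support ...GTTG..., one supports ...GTAG...
    reads = ["ACGTTGCA"] * 3 + ["ACGTAGCA"]
    counts = kmer_counts(reads, k=3)
    # Replace a long-read stretch flanked by the anchors ACG and GCA.
    print(widest_path(counts, "ACG", "GCA"))  # ACGTTGCA
```

Maximizing the bottleneck rather than summing coverage favors a path supported throughout by many short reads, so a single low-coverage k-mer (often a short-read error) cannot pull the correction onto a spurious branch.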


2021 ◽  
Author(s):  
Yelena Chernyavskaya ◽  
Xiaofei Zhang ◽  
Jinze Liu ◽  
Jessica S. Blackburn

Nanopore sequencing technology has revolutionized the field of genome biology with its ability to generate extra-long reads that can resolve regions of the genome that were previously inaccessible to short-read sequencing platforms. Although long-read sequencing has been used to resolve several vertebrate genomes, a nanopore-based zebrafish assembly has not yet been released. Over 50% of the zebrafish genome consists of difficult-to-map, highly repetitive, low-complexity elements that pose inherent problems for short-read sequencers and assemblers. We used nanopore sequencing to improve upon and resolve the issues plaguing the current zebrafish reference assembly (GRCz11). Our long-read assembly improved the current resolution of the reference genome by identifying 1,697 novel insertions and deletions over 1 kb in length and placing 106 previously unlocalized scaffolds. We also discovered additional sites of retrotransposon integration previously unreported in GRCz11 and observed their expression in adult zebrafish under physiologic conditions, implying that they have active mobility in the zebrafish genome and contribute to the ever-changing genomic landscape.


Author(s):  
Leho Tedersoo ◽  
Mads Albertsen ◽  
Sten Anslan ◽  
Benjamin Callahan

Short-read, high-throughput sequencing (HTS) methods have yielded numerous important insights into microbial ecology and function. Yet, in many instances short-read HTS techniques are suboptimal, for example by providing insufficient phylogenetic resolution or low integrity of assembled genomes. Single-molecule and synthetic long-read (SLR) HTS methods have successfully ameliorated these limitations. In addition, nanopore sequencing has generated a number of unique analysis opportunities such as rapid molecular diagnostics and direct RNA sequencing, and both PacBio and nanopore sequencing support detection of epigenetic modifications. Although initially suffering from relatively low sequence quality, long-read sequencing technologies have greatly improved in accuracy thanks to recent advances. In spite of great technological progress in recent years, long-read HTS methods (PacBio and nanopore sequencing) are still relatively costly, require large amounts of high-quality starting material, and commonly need specific solutions in various analysis steps. Despite these challenges, long-read sequencing technologies offer high-quality, cutting-edge alternatives for testing hypotheses about microbiome structure and functioning as well as assembly of eukaryote genomes from complex environmental DNA samples.


2020 ◽  
Vol 21 (23) ◽  
pp. 9161
Author(s):  
Zhao Chen ◽  
David L. Erickson ◽  
Jianghong Meng

Oxford Nanopore sequencing can be used to obtain complete bacterial genomes. However, Oxford Nanopore long reads have higher error rates than Illumina short reads. Long-read assemblers using a variety of assembly algorithms have been developed to overcome this deficiency, but they have not been benchmarked for genomic analyses of bacterial pathogens using Oxford Nanopore long reads. In this study, long-read assemblers, namely Canu, Flye, Miniasm/Racon, Raven, Redbean, and Shasta, were therefore benchmarked using Oxford Nanopore long reads of bacterial pathogens. Ten species were tested with mediocre- and low-quality simulated reads, and 10 species were tested with real reads. Raven was the most robust assembler, obtaining complete and accurate genomes. All Miniasm/Racon and Raven assemblies of mediocre-quality reads provided accurate antimicrobial resistance (AMR) profiles, while the Raven assembly of Klebsiella variicola with low-quality reads was the only assembly with an accurate AMR profile among all assemblers and species. All assemblers functioned well for predicting virulence genes using mediocre-quality and real reads, whereas only the Raven assemblies of low-quality reads had accurate numbers of virulence genes. Regarding multilocus sequence typing (MLST), Miniasm/Racon was the most effective assembler for mediocre-quality reads, while only the Raven assemblies of Escherichia coli O157:H7 and K. variicola with low-quality reads showed positive MLST results. Miniasm/Racon and Raven were the best performers for MLST using real reads. The Miniasm/Racon and Raven assemblies showed accurate phylogenetic inference. For the pan-genome analyses, Raven was the strongest assembler for simulated reads, whereas Miniasm/Racon and Raven performed best for real reads. Overall, the most robust and accurate assembler was Raven, closely followed by Miniasm/Racon.


2021 ◽  
Author(s):  
Mantas Sereika ◽  
Rasmus Hansen Kirkegaard ◽  
Søren Michael Karst ◽  
Thomas Yssing Michaelsen ◽  
Emil Aarre Sørensen ◽  
...  

Short-read DNA sequencing has led to massive growth of genome databases, but mainly with highly fragmented metagenome-assembled genomes from environmental systems. The fragmentation results from closely related species, strains, and genome repeats that cannot be resolved with short reads. To confidently explore the functional potential of a microbial community, high-quality reference genomes are needed. In this study, we evaluated the use of different combinations of short-read (Illumina) and long-read technologies (Nanopore R9.4, R10.3, and PacBio CCS) for recovering high-quality metagenome-assembled genomes (HQ MAGs) from a complex microbial community (anaerobic digester). Depending on the sequencing approach, 33 to 86 HQ MAGs (encompassing up to 34% of the assembly and 49% of the reads) were recovered using long reads, with Nanopore R9 featuring the lowest sequencing cost per HQ MAG recovered. PacBio CCS was also found to be an effective platform for genome-centric metagenomics (74 HQ MAGs) and, as a stand-alone technology, produced the HQ MAGs with the lowest fragmentation (median of 9 contigs). Using PacBio CCS MAGs as a reference, we show that, although a high number of high-quality MAGs can be generated using Nanopore R9, systematic indel errors are still present, which can lead to truncated gene calling. However, polishing the Nanopore MAGs with short-read Illumina data enabled the recovery of MAGs of similar quality to those from PacBio CCS.
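The abstract counts "HQ MAGs" without stating the cut-offs. A common convention is the MIMAG standard, whose high-quality draft tier requires >90% completeness, <5% contamination, the 23S, 16S and 5S rRNA genes, and at least 18 tRNAs; whether this study applied exactly these thresholds is an assumption here. A minimal classifier under that assumption, with hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class MagStats:
    completeness: float   # percent, e.g. from a marker-gene tool
    contamination: float  # percent
    has_23s: bool
    has_16s: bool
    has_5s: bool
    n_trnas: int          # tRNA genes detected


def mimag_tier(mag: MagStats) -> str:
    """Assign a quality tier using MIMAG-style thresholds (assumed here)."""
    if (mag.completeness > 90 and mag.contamination < 5
            and mag.has_23s and mag.has_16s and mag.has_5s
            and mag.n_trnas >= 18):
        return "high"
    if mag.completeness >= 50 and mag.contamination < 10:
        return "medium"
    # Everything else, including >=10% contamination, is lumped as "low"
    # in this simplified sketch.
    return "low"


if __name__ == "__main__":
    # A nearly complete genome that lacks its 5S rRNA drops to "medium".
    print(mimag_tier(MagStats(95.0, 1.0, True, True, False, 20)))  # medium
```

The rRNA requirement is one reason fragmented short-read MAGs rarely reach the high-quality tier: repetitive rRNA operons are hard to assemble into the correct bin without long reads.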


2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Jean-Marc Aury ◽  
Benjamin Istace

Abstract Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and then, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This directly affects the base quality of genome assemblies, particularly in coding regions, where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture of the two haplotypes, which can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; generally, they inspect nucleotides one by one and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes, and they typically switch from one haplotype to another. Herein we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long reads) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.
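To make the frame-disruption point concrete, the toy example below deletes a single base from a short coding sequence and reports where the first in-frame stop codon falls before and after. The sequence and the helper names are hypothetical; the sketch only illustrates the effect of an uncorrected indel and says nothing about how Hapo-G itself works.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(cds: str) -> Optional[int]:
    """Codon index of the first in-frame stop codon, or None if there is none."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def delete_base(seq: str, pos: int) -> str:
    """Simulate a single-base deletion error at 0-based position pos."""
    return seq[:pos] + seq[pos + 1:]


if __name__ == "__main__":
    gene = "ATGGTGACCAAATAA"        # ATG GTG ACC AAA TAA
    print(first_stop(gene))          # 4 (the true stop codon)
    broken = delete_base(gene, 3)    # ATG TGA CCA AAT AA
    print(first_stop(broken))        # 1 (premature stop after one codon)
```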


2020 ◽  
Author(s):  
Andrew J. Page ◽  
Nabil-Fareed Alikhan ◽  
Michael Strinden ◽  
Thanh Le Viet ◽  
Timofey Skvortsov

Abstract Spoligotyping of Mycobacterium tuberculosis provides a subspecies classification of this major human pathogen. Spoligotypes can be predicted from short-read genome sequencing data; however, no methods exist for long-read sequence data such as from Nanopore or PacBio. We present a novel software package, Galru, which can rapidly detect the spoligotype of a Mycobacterium tuberculosis sample from as little as a single uncorrected long read. It allows near real-time spoligotyping from long-read data as they are being sequenced, giving rapid sample typing. We compared it to existing state-of-the-art software and found that its results are identical to those obtained from short-read sequencing data. Galru is freely available from https://github.com/quadram-institute-bioscience/galru under the GPLv3 open source licence.
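Spoligotypes are conventionally reported as presence/absence calls for 43 spacers, often compressed into a 15-digit octal code: spacers 1-42 are read in 14 groups of three as binary digits, and spacer 43 becomes the final digit. The sketch below shows only that encoding step and assumes the presence calls are already available; it is not Galru's code, and detecting spacers in error-prone long reads (Galru's actual task) is not attempted here.

```python
from typing import Sequence


def spoligotype_octal(spacers_present: Sequence[bool]) -> str:
    """Encode 43 spacer presence/absence calls as the 15-digit octal code."""
    if len(spacers_present) != 43:
        raise ValueError("expected exactly 43 spacer calls")
    bits = [1 if present else 0 for present in spacers_present]
    digits = []
    # Spacers 1-42: 14 triplets, each read as a 3-bit binary number (0-7).
    for i in range(0, 42, 3):
        a, b, c = bits[i:i + 3]
        digits.append(str(a * 4 + b * 2 + c))
    # Spacer 43 on its own becomes the last digit (0 or 1).
    digits.append(str(bits[42]))
    return "".join(digits)


if __name__ == "__main__":
    print(spoligotype_octal([True] * 43))  # 777777777777771
```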


2018 ◽  
Vol 7 (23) ◽  
Author(s):  
Narjol González-Escalona ◽  
Kuan Yao ◽  
Maria Hoffmann

Here we report the genome sequence of Salmonella enterica serovar Richmond strain CFSAN000191, isolated from tilapia from Thailand in 2005. The genome was determined by a combination of long-read and short-read sequencing.

