An End-to-end Oxford Nanopore Basecaller Using Convolution-augmented Transformer

2020 ◽  
Author(s):  
Xuan Lv ◽  
Zhiguang Chen ◽  
Yutong Lu ◽  
Yuedong Yang

Abstract Oxford Nanopore sequencing is rapidly becoming an active field in genomics, and it is critical to basecall nucleotide sequences from the complex electrical signals. Many efforts have been devoted to developing new basecalling tools over the years. However, the basecalled reads still suffer from a high error rate and slow speed. Here, we developed an open-source basecalling method, CATCaller, by simultaneously capturing global context through Attention and modeling local dependencies through dynamic convolution. The method was shown to consistently outperform the ONT default basecallers Albacore and Guppy, and a recently developed attention-based method, SACall, in read accuracy. More importantly, our method is fast, using a heterogeneous computational model that integrates both CPUs and GPUs. When compared to SACall, the method is nearly 4 times faster on a single GPU, and it is highly scalable in parallelization, with a further speedup of 3.3 on a four-GPU node.
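
The scaling figure quoted above can be turned into a parallel efficiency with one division. The sketch below is only an illustration of that arithmetic: the function name `parallel_efficiency` is hypothetical and is not part of CATCaller, and the only inputs are the speedup (3.3) and GPU count (4) stated in the abstract.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Return speedup divided by the number of workers (1.0 means ideal linear scaling)."""
    if workers <= 0:
        raise ValueError("workers must be a positive integer")
    return speedup / workers


# Figures taken from the abstract: a speedup of 3.3 on a four-GPU node.
efficiency = parallel_efficiency(3.3, 4)
print(f"parallel efficiency: {efficiency:.1%}")
```

An efficiency below 100% is expected in practice, since multi-GPU runs typically pay for data distribution and synchronization.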

2016 ◽  
Author(s):  
Matei David ◽  
L.J. Dursi ◽  
Delia Yao ◽  
Paul C. Boutros ◽  
Jared T. Simpson

ABSTRACT Motivation: The highly portable Oxford Nanopore MinION sequencer has enabled new applications of genome sequencing directly in the field. However, the MinION currently relies on a cloud computing platform, Metrichor (metrichor.com), for translating locally generated sequencing data into basecalls. Results: To allow offline and private analysis of MinION data, we created Nanocall. Nanocall is the first freely-available, open-source basecaller for Oxford Nanopore sequencing data and does not require an internet connection. On two E. coli and two human samples, with natural as well as PCR-amplified DNA, Nanocall reads have ~68% identity, directly comparable to Metrichor "1D" data. Further, Nanocall is efficient, processing ~500 Kbp of sequence per core hour, and fully parallelized. Using 8 cores, Nanocall could basecall a MinION sequencing run in real time. Metrichor provides the ability to integrate the "1D" sequencing of template and complement strands of a single DNA molecule, and create a "2D" read. Nanocall does not currently integrate this technology, and addition of this capability will be an important future development. In summary, Nanocall is the first open-source, freely available, off-line basecaller for Oxford Nanopore sequencing data. Availability: Nanocall is available at github.com/mateidavid/nanocall, released under the MIT license. Contact: matei.david at oicr.on.ca


2019 ◽  
Vol 85 (21) ◽  
Author(s):  
Kaire Loit ◽  
Kalev Adamson ◽  
Mohammad Bahram ◽  
Rasmus Puusepp ◽  
Sten Anslan ◽  
...  

ABSTRACT Culture-based molecular identification methods have revolutionized detection of pathogens, yet these methods are slow and may yield inconclusive results from environmental materials. The second-generation sequencing tools have much-improved precision and sensitivity of detection, but these analyses are costly and may take several days to months. Of the third-generation sequencing techniques, the portable MinION device (Oxford Nanopore Technologies) has received much attention because of its small size and possibility of rapid analysis at reasonable cost. Here, we compare the relative performances of two third-generation sequencing instruments, MinION and Sequel (Pacific Biosciences), in identification and diagnostics of fungal and oomycete pathogens from conifer (Pinaceae) needles and potato (Solanum tuberosum) leaves and tubers. We demonstrate that the Sequel instrument is efficient for metabarcoding of complex samples, whereas MinION is not suited for this purpose due to a high error rate and multiple biases. However, we find that MinION can be utilized for rapid and accurate identification of dominant pathogenic organisms and other associated organisms from plant tissues following both amplicon-based and PCR-free metagenomics approaches. Using the metagenomics approach with shortened DNA extraction and incubation times, we performed the entire MinION workflow, from sample preparation through DNA extraction, sequencing, bioinformatics, and interpretation, in 2.5 h. We advocate the use of MinION for rapid diagnostics of pathogens and potentially other organisms, but care needs to be taken to control or account for multiple potential technical biases. IMPORTANCE Microbial pathogens cause enormous losses to agriculture and forestry, but current combined culturing- and molecular identification-based detection methods are too slow for rapid identification and application of countermeasures. 
Here, we develop new and rapid protocols for Oxford Nanopore MinION-based third-generation diagnostics of plant pathogens that greatly improve the speed of diagnostics. However, due to high error rate and technical biases in MinION, the Pacific BioSciences Sequel platform is more useful for in-depth amplicon-based biodiversity monitoring (metabarcoding) from complex environmental samples.


2019 ◽  
Author(s):  
Kaire Loit ◽  
Kalev Adamson ◽  
Mohammad Bahram ◽  
Rasmus Puusepp ◽  
Sten Anslan ◽  
...  

ABSTRACT Culture-based molecular characterization methods have revolutionized detection of pathogens, yet these methods are either slow or imprecise. The second-generation sequencing tools have much improved precision and sensitivity of detection, but the analysis processes are costly and take several days. Of third-generation techniques, the portable Oxford Nanopore MinION device has received much attention because of its small size and possibility of rapid analysis at reasonable cost. Here, we compare the relative performance of two third-generation sequencing instruments, MinION and Pacific Biosciences Sequel, in identification and diagnostics of pathogens from conifer needles and potato leaves and tubers. We demonstrate that Sequel is efficient in metabarcoding of complex samples, whereas MinION is not suited for this purpose due to the high error rate and multiple biases. However, we find that MinION can be utilized for rapid and accurate identification of dominant pathogenic organisms from plant tissues following both amplicon-based and metagenomics-based approaches. Using the PCR-free approach with shortened extraction and incubation times, we performed the entire MinION workflow, from sample preparation through DNA extraction, sequencing, bioinformatics and interpretation, in two and a half hours. We advocate the use of MinION for rapid diagnostics of pathogens, but care needs to be taken to control or account for all potential technical biases. IMPORTANCE We develop new and rapid protocols for MinION-based third-generation diagnostics of plant pathogens that greatly improve the speed and precision of diagnostics. Due to the high error rate and technical biases in MinION, the PacBio Sequel platform is more useful for amplicon-based metabarcoding from complex biological samples.


2016 ◽  
Vol 33 (1) ◽  
pp. 49-55 ◽  
Author(s):  
Matei David ◽  
L. J. Dursi ◽  
Delia Yao ◽  
Paul C. Boutros ◽  
Jared T. Simpson

2019 ◽  
Author(s):  
Michael Hahn ◽  
Frank Keller ◽  
Yonatan Bisk ◽  
Yonatan Belinkov

Intuitively, human readers cope easily with errors in text; typos, misspelling, word substitutions, etc. do not unduly disrupt natural reading. Previous work indicates that letter transpositions result in increased reading times, but it is unclear if this effect generalizes to more natural errors. In this paper, we report an eye-tracking study that compares two error types (letter transpositions and naturally occurring misspelling) and two error rates (10% or 50% of all words contain errors). We find that human readers show unimpaired comprehension in spite of these errors, but error words cause more reading difficulty than correct words. Also, transpositions are more difficult than misspellings, and a high error rate increases difficulty for all words, including correct ones. We then present a computational model that uses character-based (rather than traditional word-based) surprisal to account for these results. The model explains that transpositions are harder than misspellings because they contain unexpected letter combinations. It also explains the error rate effect: upcoming words are more difficult to predict when the context is degraded, leading to increased surprisal.


PLoS ONE ◽  
2021 ◽  
Vol 16 (10) ◽  
pp. e0257521
Author(s):  
Clara Delahaye ◽  
Jacques Nicolas

Oxford Nanopore Technologies’ (ONT) long read sequencers offer access to longer DNA fragments than previous sequencer generations, at the cost of a higher error rate. While many papers have studied read correction methods, few have addressed the detailed characterization of observed errors, a task complicated by frequent changes in chemistry and software in ONT technology. The MinION sequencer is now more stable and this paper proposes an up-to-date view of its error landscape, using the most mature flowcell and basecaller. We studied Nanopore sequencing error biases on both bacterial and human DNA reads. We found that, although Nanopore sequencing is expected not to suffer from GC bias, it is a crucial parameter with respect to errors. In particular, low-GC reads have fewer errors than high-GC reads (about 6% and 8% respectively). The error profile for homopolymeric regions or regions with short repeats, the source of about half of all sequencing errors, also depends on the GC rate and mainly shows deletions, although there are some reads with long insertions. Another interesting finding is that the quality measure, although over-estimated, offers valuable information to predict the error rate as well as the abundance of reads. We supplemented this study with an analysis of a rapeseed RNA read set and showed a higher level of errors with a higher level of deletion in these data. Finally, we have implemented an open source pipeline for long-term monitoring of the error profile, which enables users to easily compute the various analyses presented in this work, including for future developments of the sequencing device. Overall, we hope this work will provide a basis for the design of better error-correction methods.
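
The comparison of error rates between low-GC and high-GC reads can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names, the 0.5 GC threshold, and the toy reads with their error rates are all assumptions made up for the example, chosen only so the two group means echo the roughly 6% and 8% figures in the abstract.

```python
from collections import defaultdict
from statistics import mean


def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def mean_error_by_gc(reads, threshold=0.5):
    """Group (sequence, error_rate) pairs into low/high GC and average the error rates.

    The threshold is an assumed cut-off for illustration, not a value from the paper.
    """
    groups = defaultdict(list)
    for seq, error_rate in reads:
        label = "high_gc" if gc_fraction(seq) >= threshold else "low_gc"
        groups[label].append(error_rate)
    return {label: mean(rates) for label, rates in groups.items()}


# Toy, made-up reads and per-read error rates (not real sequencing data).
toy_reads = [
    ("ATATATGC", 0.05),
    ("AATTAACG", 0.07),
    ("GCGCGGAT", 0.09),
    ("CCGGGCTA", 0.07),
]
summary = mean_error_by_gc(toy_reads)
print(summary)
```

In a real analysis the per-read error rate would come from aligning each read to a reference, and GC content would more likely be binned finely than split at a single threshold.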


Viruses ◽  
2021 ◽  
Vol 13 (8) ◽  
pp. 1424
Author(s):  
Lia W. Liefting ◽  
David W. Waite ◽  
Jeremy R. Thompson

The adoption of Oxford Nanopore Technologies (ONT) sequencing as a tool in plant virology has been relatively slow despite its promise in more recent years to yield large quantities of long nucleotide sequences in real time without the need for prior amplification. The portability of the MinION and Flongle platforms combined with lowering costs and continued improvements in read accuracy make ONT an attractive method for both low- and high-scale virus diagnostics. Here, we provide a detailed step-by-step protocol using the ONT Flongle platform that we have developed for the routine application on a range of symptomatic post-entry quarantine and domestic surveillance plant samples. The aim of this methods paper is to highlight ONT’s feasibility as a valuable component to the diagnostician’s toolkit and to hopefully stimulate other laboratories towards the eventual goal of integrating high-throughput sequencing technologies as validated plant virus diagnostic methods in their own right.


Author(s):  
Yunfan Fan ◽  
Andrew N Gale ◽  
Anna Bailey ◽  
Kali Barnes ◽  
Kiersten Colotti ◽  
...  

Abstract We present a highly contiguous genome and transcriptome of the pathogenic yeast, Candida nivariensis. We sequenced both the DNA and RNA of this species using both the Oxford Nanopore Technologies (ONT) and Illumina platforms. We assembled the genome into an 11.8 Mb draft composed of 16 contigs with an N50 of 886 Kb, including a circular mitochondrial sequence of 28 Kb. Using direct RNA nanopore sequencing and Illumina cDNA sequencing, we constructed an annotation of our new assembly, supplemented by lifting over genes from Saccharomyces cerevisiae and Candida glabrata.
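
For readers unfamiliar with the N50 statistic quoted above, the following sketch computes it under the standard definition: the length of the contig at which the cumulative length, counted from the longest contig down, first reaches half of the total assembly length. The function name `n50` and the contig lengths are hypothetical; they are not the Candida nivariensis assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (standard definition)."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half of the total length is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Made-up contig lengths in kb, for illustration only.
toy_contigs_kb = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs_kb))
```

Comparing `2 * running` with `total` keeps the check in integer arithmetic, which avoids rounding issues when the total length is odd.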

