Nanocall: an open source basecaller for Oxford Nanopore sequencing data

Abstract Motivation The highly portable Oxford Nanopore MinION sequencer has enabled new applications of genome sequencing directly in the field. However, the MinION currently relies on a cloud computing platform, Metrichor (metrichor.com), for translating locally generated sequencing data into basecalls. Results To allow offline and private analysis of MinION data, we created Nanocall. Nanocall is the first freely available, open-source basecaller for Oxford Nanopore sequencing data and does not require an internet connection. On two E. coli and two human samples, with natural as well as PCR-amplified DNA, Nanocall reads have ~68% identity, directly comparable to Metrichor "1D" data. Further, Nanocall is efficient, processing ~500 Kbp of sequence per core hour, and fully parallelized. Using 8 cores, Nanocall could basecall a MinION sequencing run in real time. Metrichor provides the ability to integrate the "1D" sequencing of template and complement strands of a single DNA molecule, and create a "2D" read. Nanocall does not currently integrate this technology, and addition of this capability will be an important future development. In summary, Nanocall is the first open-source, freely available, off-line basecaller for Oxford Nanopore sequencing data. Availability Nanocall is available at github.com/mateidavid/nanocall, released under the MIT license. Contact matei.david at oicr.on.ca
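
The real-time claim in the abstract above can be checked with simple arithmetic. The sketch below assumes perfectly linear scaling across cores, which the abstract's "fully parallelized" suggests but does not quantify; the function name is illustrative and not part of Nanocall.

```python
def basecalling_rate_bp_per_hour(cores, bp_per_core_hour=500_000):
    """Aggregate basecalling rate in bp/hour.

    Assumption: throughput scales linearly with the number of cores.
    The default of 500,000 bp per core hour is the ~500 Kbp figure
    quoted in the abstract.
    """
    return cores * bp_per_core_hour


rate = basecalling_rate_bp_per_hour(8)
print(f"{rate / 1e6:.1f} Mbp per hour")  # prints "4.0 Mbp per hour"
```

Under these assumptions, 8 cores keep pace with a run as long as the sequencer produces no more than about 4 Mbp per hour.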

SACall: a neural network basecaller for Oxford Nanopore sequencing data based on self-attention mechanism

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2020.3039244 ◽

2020 ◽

pp. 1-1

Author(s):

Neng Huang ◽

Fan Nie ◽

Peng Ni ◽

Feng Luo ◽

Jianxin Wang

Keyword(s):

Neural Network ◽

Attention Mechanism ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Oxford Nanopore

Evaluation of Germline Structural Variant Calling Methods for Nanopore Sequencing Data

Frontiers in Genetics ◽

10.3389/fgene.2021.761791 ◽

2021 ◽

Vol 12 ◽

Author(s):

Davide Bolognini ◽

Alberto Magi

Keyword(s):

Variant Calling ◽

Research Report ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Factors Affecting ◽

Sequencing Technologies ◽

Long Reads ◽

Oxford Nanopore ◽

Sequencing Studies ◽

Long Read

Structural variants (SVs) are genomic rearrangements that involve at least 50 nucleotides and are known to have a serious impact on human health. While prior short-read sequencing technologies have often proved inadequate for a comprehensive assessment of structural variation, more recent long reads from Oxford Nanopore Technologies have already been proven invaluable for the discovery of large SVs and hold the potential to facilitate the resolution of the full SV spectrum. With many long-read sequencing studies to follow, it is crucial to assess factors affecting current SV calling pipelines for nanopore sequencing data. In this brief research report, we evaluate and compare the performances of five long-read SV callers across four long-read aligners using both real and synthetic nanopore datasets. In particular, we focus on the effects of read alignment, sequencing coverage, and variant allele depth on the detection and genotyping of SVs of different types and size ranges and provide insights into precision and recall of SV callsets generated by integrating the various long-read aligners and SV callers. The computational pipeline we propose is publicly available at https://github.com/davidebolo1993/EViNCe and can be adjusted to further evaluate future nanopore sequencing datasets.
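
Precision and recall of an SV callset, as discussed in the abstract above, depend on how a call is matched to a truth-set variant. The following is a minimal sketch of one possible matching rule, not the EViNCe pipeline: the `SV` record, the 500 bp breakpoint tolerance, and the 0.7 size-ratio threshold are all assumptions chosen for illustration.

```python
from typing import NamedTuple


class SV(NamedTuple):
    chrom: str
    pos: int
    svtype: str
    length: int


def matches(call, truth, max_dist=500, min_size_ratio=0.7):
    """Hypothetical matching rule: same chromosome and type, nearby
    breakpoint, and similar size."""
    if call.chrom != truth.chrom or call.svtype != truth.svtype:
        return False
    if abs(call.pos - truth.pos) > max_dist:
        return False
    small, large = sorted((abs(call.length), abs(truth.length)))
    return large == 0 or small / large >= min_size_ratio


def precision_recall(calls, truths, **kwargs):
    """Greedy one-to-one matching of calls against the truth set."""
    matched = set()
    true_positives = 0
    for call in calls:
        for i, truth in enumerate(truths):
            if i not in matched and matches(call, truth, **kwargs):
                matched.add(i)
                true_positives += 1
                break
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truths) if truths else 0.0
    return precision, recall


truths = [SV("chr1", 1000, "DEL", 300), SV("chr2", 5000, "INS", 120)]
calls = [SV("chr1", 1040, "DEL", 280), SV("chr3", 700, "DUP", 900)]
print(precision_recall(calls, truths))  # prints (0.5, 0.5)
```

Tightening or loosening the tolerances changes both metrics, which is one reason benchmark results are only comparable when the matching rule is reported.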

Detection of Clinically Relevant Molecular Alterations in Chronic Lymphocytic Leukemia (CLL) By Nanopore Sequencing

Blood ◽

10.1182/blood-2018-99-110948 ◽

2018 ◽

Vol 132 (Supplement 1) ◽

pp. 1847-1847 ◽

Cited By ~ 1

Author(s):

Adam Burns ◽

David Robert Bruce ◽

Pauline Robbe ◽

Adele Timbs ◽

Basile Stamatopoulos ◽

...

Keyword(s):

Error Correction ◽

Low Cost ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Mutation Status ◽

Short Read ◽

Short Read Sequencing ◽

Oxford Nanopore ◽

Low Coverage ◽

Oxford Nanopore Technologies

Abstract Introduction Chronic Lymphocytic Leukaemia (CLL) is the most prevalent leukaemia in the Western world and is characterised by clinical heterogeneity. IgHV mutation status, mutations in the TP53 gene and deletions of the p-arm of chromosome 17 are currently used to predict an individual patient's response to therapy and give an indication as to their long-term prognosis. Current clinical guidelines recommend screening patients prior to initial, and any subsequent, treatment. Routine clinical laboratory practices for CLL involve three separate assays, each of which is time-consuming and requires significant investment in equipment. Nanopore sequencing offers a rapid, low-cost alternative, generating a full prognostic dataset on a single platform. In addition, Nanopore sequencing promises low failure rates on degraded material such as FFPE and excellent detection of structural variants due to the long read length of sequencing. Importantly, Nanopore technology does not require expensive equipment, is low-maintenance and ideal for patient-near testing, making it an attractive DNA sequencing device for low-to-middle-income countries. Methods Eleven untreated CLL samples were selected for the analysis, harbouring both mutated (n=5) and unmutated (n=6) IgHV genes, seven TP53 mutations (five missense, one stop gain and one frameshift) and two del(17p) events. Primers were designed to amplify all exons of TP53, along with the IgHV locus, and each primer included universal tails for individual sample barcoding. The resulting PCR amplicons were prepared for sequencing using a ligation sequencing kit (SQK-LSK108, Oxford Nanopore Technologies, Oxford, UK). All IgHV libraries were pooled and sequenced on one R9.4 flowcell, with the TP53 libraries pooled and sequenced on a second R9.4 flowcell.
Whole genome libraries were prepared from 400 ng genomic DNA for each sample using a rapid sequencing kit (SQK-RAD004, Oxford Nanopore Technologies, Oxford, UK), and each sample was sequenced on an individual flowcell on a MinION Mk1B instrument (Oxford Nanopore Technologies, Oxford, UK). We developed a bespoke bioinformatics pipeline to detect copy-number changes, TP53 mutations and IgHV mutation status from the Nanopore sequencing data. Results were compared to short-read sequencing data obtained earlier by targeted deep sequencing (MiSeq, Illumina Inc, San Diego, CA, USA) and whole genome sequencing (HiSeq 2500, Illumina Inc, San Diego, CA, USA). Results Following basecalling and adaptor trimming, the raw data were submitted to the IMGT database. In the absence of error correction, it was possible to identify the correct VH family for each sample; however, the germline homology was not sufficient to differentiate between IgHVmut and IgHVunmut CLL cases. Following bioinformatic error correction and consensus building, the percentage germline homology was the same as that obtained from short-read sequencing, and nanopore sequencing also called the same productive rearrangements in all cases. A total of 77 TP53 variants were identified, including 68 in non-coding regions, and three synonymous SNVs. The remaining 6 were predicted to be functional variants (eight missense and two stop-gains) and had all been identified in earlier MiSeq targeted sequencing. However, the frameshift mutation was not called by the analysis pipeline, although it is present in the aligned reads. Using the low-coverage WGS data, we were able to identify del(17p) events, of 19 Mb and 20 Mb length, in both patients with high confidence. Conclusions Here we demonstrate that characterization of the IgHV locus in CLL cases is possible using the MinION platform, provided sufficient downstream analysis, including error correction, is applied.
Furthermore, somatic SNVs in TP53 can be identified, although, as with second-generation sequencing, variant calling of small insertions and deletions is more problematic. Identification of del(17p) is possible from low-coverage WGS on the MinION and is inexpensive. Our data demonstrate that Nanopore sequencing can be a viable, patient-near, low-cost alternative to established screening methods, with the potential for diagnostic implementation in resource-poor regions of the world. Disclosures Schuh: Giles, Roche, Janssen, AbbVie: Honoraria.
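
The abstract above does not describe how its bespoke pipeline detects del(17p) from low-coverage WGS. One common approach to copy-number detection is to compare binned read depth between a sample and a control; the sketch below illustrates that general idea only. The bin counts, the pseudocount of 0.5, and the -0.4 threshold are hypothetical values, not taken from the study.

```python
import math
from statistics import median


def log2_ratios(sample_counts, control_counts):
    """Per-bin log2 depth ratio, each profile scaled by its own median.

    Assumes both medians are non-zero. A pseudocount of 0.5 keeps empty
    bins from producing log(0).
    """
    sample_median = median(sample_counts)
    control_median = median(control_counts)
    ratios = []
    for s, c in zip(sample_counts, control_counts):
        scaled_sample = (s + 0.5) / sample_median
        scaled_control = (c + 0.5) / control_median
        ratios.append(math.log2(scaled_sample / scaled_control))
    return ratios


def deleted_bins(ratios, threshold=-0.4):
    """Indices of bins whose log2 ratio falls below a (hypothetical) cutoff."""
    return [i for i, r in enumerate(ratios) if r < threshold]


# Toy read counts per genomic bin: bins 3 to 5 have roughly half the depth,
# as expected for a heterozygous deletion present in most cells.
sample = [100, 102, 98, 50, 49, 51, 101, 99]
control = [100] * 8
print(deleted_bins(log2_ratios(sample, control)))  # prints [3, 4, 5]
```

A real analysis would also need to handle GC bias, mappability, and segmentation of adjacent bins, none of which is attempted here.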

An attention-based neural network basecaller for Oxford Nanopore sequencing data

2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm47256.2019.8983231 ◽

2019 ◽

Author(s):

Neng Huang ◽

Fan Nie ◽

Peng Ni ◽

Feng Luo ◽

Jianxin Wang

Keyword(s):

Neural Network ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Oxford Nanopore

A comparative analysis of computational tools for the prediction of epigenetic DNA methylation from long-read sequencing data

10.1101/2021.04.24.441281 ◽

2021 ◽

Author(s):

Shruta Sandesh Pai ◽

Aimee Rachel Mathew ◽

Roy Anindya

Keyword(s):

Dna Methylation ◽

Dna Modification ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Computational Tools ◽

Specific Location ◽

Oxford Nanopore ◽

Long Read ◽

Dna Methylations ◽

Higher Eukaryotes

Abstract Recent development of Oxford Nanopore long-read sequencing has opened new avenues for identifying epigenetic DNA methylation. Among the different epigenetic DNA methylations, N6-methyladenosine is the most prevalent DNA modification in prokaryotes and 5-methylcytosine is common in higher eukaryotes. Here we investigated whether N6-methyladenosine and 5-methylcytosine modifications could be predicted from nanopore sequencing data. Using publicly available genome sequencing data of Saccharomyces cerevisiae, we compared open-access computational tools, including Tombo, mCaller, Nanopolish and DeepSignal, for predicting 6mA and 5mC. Our results suggest that Tombo and mCaller can predict DNA N6-methyladenosine modifications at a specific location, whereas Tombo dampened fraction, Nanopolish methylation likelihood and DeepSignal methylation probability have comparable efficiency for 5-methylcytosine prediction from Oxford Nanopore sequencing data.

DNAscent v2: Detecting Replication Forks in Nanopore Sequencing Data with Deep Learning

10.1101/2020.11.04.368225 ◽

2020 ◽

Author(s):

Michael A. Boemo

Keyword(s):

Single Molecule ◽

Cell Populations ◽

Nanopore Sequencing ◽

Replication Forks ◽

Sequencing Data ◽

Single Base ◽

Oxford Nanopore ◽

Experimental Protocols ◽

Single Base Resolution ◽

Oxford Nanopore Technologies

Abstract The detection of base analogues in Oxford Nanopore Technologies (ONT) sequencing reads has become a promising new method for the high-throughput measurement of DNA replication dynamics with single-molecule resolution. This paper introduces DNAscent v2, software that uses a residual neural network to achieve fast, accurate detection of the thymidine analogue BrdU with single-base resolution. DNAscent v2 comes equipped with an autoencoder that detects replication forks, origins, and termination sites in ONT sequencing reads from both synchronous and asynchronous cell populations, outcompeting previous versions and other tools across different experimental protocols. DNAscent v2 is open-source and available at https://github.com/MBoemo/DNAscent.

CRISPR-Cas off-target detection using Oxford Nanopore sequencing - is the mitochondrial genome more vulnerable to off-targets?

10.1101/741322 ◽

2019 ◽

Cited By ~ 1

Author(s):

Sandeep Chakraborty

Keyword(s):

Mitochondrial Genome ◽

Human Subjects ◽

Pcr Amplification ◽

Distal Region ◽

Error Rates ◽

Fast Method ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Oxford Nanopore ◽

Bona Fide

Abstract Oxford Nanopore sequencing of DNA molecules is fast gaining popularity for generating longer reads, albeit with higher error rates, in much less time, and without the error introduced by PCR amplification. Recently, CRISPR-Cas9 has been used to enrich genomic regions (nCATS [1]). This was applied to 10 genomic loci (median length=18kb). Here, using the sequencing data (Accid:PRJNA531320), it is shown that the same flow can be used to identify CRISPR-Cas9 off-target edits (OTE). OTEs are an important, but unfortunately underestimated, aspect of CRISPR-Cas gene editing. An OTE in the mitochondrial genome is shown having 7 mismatches with one of the 10 gRNAs used (GPX1), having as much enrichment as the targeted genomic loci in some samples. A previous study has shown that Cas9 binds to off-targets having as many as 10 mismatches in the PAM-distal region. This OTE has not been reported in the original study (still a pre-print), which states that sequences from parts other than the target locations arise 'from ligation of nanopore adaptors to random breakage points, with no clear evidence of off-target cleavage by Cas9' [1]. Furthermore, many reads aligning to the mitochondrial genome (sometimes full length) are inverted after the edit. It remains to be seen if these are bona fide translocations after the Cas9 edit, or ONP sequencing artifacts. This also raises the question of whether the mitochondrial genome is more prone to off-targets by virtue of being non-nuclear. Another locus in ChrX (13121412) has only 1 mismatch with the second BRAF gRNA (GACCAAGGATTTCGTGGTGA). Although the number of reads for this OTE is lower, it is very unlikely that this is random, since it occurs in 8 out of 11 samples. With the increasing use of (TALEN/ZFN/CRISPR-Cas9) on human subjects, this provides a fast method to query gRNAs for off-targets in cells obtained from the patient, which will have their own unique off-targets due to single nucleotide polymorphisms or other variants.
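
The mismatch counts quoted in the abstract above (7 mismatches for the mitochondrial site, 1 for the ChrX site) are position-wise comparisons between a gRNA and a candidate site. The sketch below shows that comparison on a toy sequence. It is an illustration only: it ignores the PAM, the reverse strand, and indels, and the toy sequence is made up rather than taken from the study's data.

```python
def count_mismatches(grna, site):
    """Number of positions at which two equal-length sequences differ."""
    if len(grna) != len(site):
        raise ValueError("gRNA and site must have the same length")
    return sum(a != b for a, b in zip(grna.upper(), site.upper()))


def scan_for_off_targets(grna, sequence, max_mismatches):
    """Slide the gRNA along the sequence; report (start, mismatches) hits."""
    hits = []
    k = len(grna)
    for start in range(len(sequence) - k + 1):
        mismatches = count_mismatches(grna, sequence[start:start + k])
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# The second BRAF gRNA quoted in the abstract.
braf_grna = "GACCAAGGATTTCGTGGTGA"
# Hypothetical site: the same sequence with one substitution at index 5.
toy_site = braf_grna[:5] + "C" + braf_grna[6:]
toy_sequence = "TTT" + toy_site + "AAA"
print(scan_for_off_targets(braf_grna, toy_sequence, max_mismatches=1))
# prints [(3, 1)]
```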

PyPore: a python toolbox for nanopore sequencing data handling

Bioinformatics ◽

10.1093/bioinformatics/btz269 ◽

2019 ◽

Vol 35 (21) ◽

pp. 4445-4447 ◽

Cited By ~ 1

Author(s):

Roberto Semeraro ◽

Alberto Magi

Keyword(s):

Open Source Software ◽

Reference Genome ◽

State Of The Art ◽

Supplementary Information ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Software Packages ◽

Technological Improvement ◽

Fastq Format ◽

Oxford Nanopore

Abstract Motivation The recent technological improvement of Oxford Nanopore sequencing has pushed the throughput of these devices to 10–20 Gb, allowing the generation of millions of reads. For these reasons, the availability of fast software packages for evaluating experimental quality by generating highly informative and interactive summary plots is of fundamental importance. Results We developed PyPore, a three-module Python toolbox designed to handle raw FAST5 files, from quality checking to alignment to a reference genome, and to explore their features through the generation of browsable HTML files. The first module provides an interface to explore and evaluate the information contained in FAST5 files and summarize it into informative quality measures. The second module converts raw data to FASTQ format, while the third module makes it easy to use three state-of-the-art aligners and collects mapping statistics. Availability and implementation PyPore is open-source software written in Python 2.7; the source code is freely available, for all OS platforms, on GitHub at https://github.com/rsemeraro/PyPore Supplementary information Supplementary data are available at Bioinformatics online.
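
To make the kind of quality summary described in the abstract above concrete, the sketch below computes read lengths and mean Phred quality from FASTQ text. It does not use PyPore or its API and is written for Python 3 rather than Python 2.7. It assumes strict four-line FASTQ records with Phred+33 quality encoding, and the two reads are invented for illustration.

```python
import io


def parse_fastq(handle):
    """Yield (read_id, sequence, quality) from four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input (or a blank line, treated as the end)
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality


def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 by assumption)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


fastq_text = "@read1\nACGT\n+\nIIII\n@read2\nACGTAC\n+\n!!!!!!\n"
summary = [
    (read_id, len(seq), mean_phred(qual))
    for read_id, seq, qual in parse_fastq(io.StringIO(fastq_text))
]
print(summary)  # prints [('read1', 4, 40.0), ('read2', 6, 0.0)]
```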

Real-time demultiplexing Nanopore barcoded sequencing data with npBarcode

10.1101/134155 ◽

2017 ◽

Cited By ~ 1

Author(s):

Son Hoang Nguyen ◽

Tania Duarte ◽

Lachlan J. M. Coin ◽

Minh Duc Cao

Keyword(s):

Real Time ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Analysis Pipeline ◽

Bioinformatic Tools ◽

Oxford Nanopore ◽

Pooled Sequencing ◽

Cross Platform ◽

Real Time Applications ◽

Friendly Graphical User Interface

Abstract Motivation The barcoding protocol recently introduced to Oxford Nanopore sequencing has increased the versatility of the technology. Several bioinformatic tools have been developed to demultiplex the barcoded reads, but none of them supports streaming analysis. This limits the use of pooled sequencing in real-time applications, which is one of the main advantages of the technology. Results We introduce npBarcode, an open-source and cross-platform tool for barcode demultiplexing in a streaming fashion. npBarcode can be seamlessly integrated into a streaming analysis pipeline. The tool also provides a friendly graphical user interface through npReader, allowing real-time visual monitoring of the sequencing progress of barcoded samples. We show that npBarcode achieves accuracy comparable to that of the alternatives. Availability npBarcode is bundled in Japsa - a Java tools kit for genome analysis, and is freely available at https://github.com/hsnguyen/npBarcode.
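
Streaming demultiplexing, as described in the abstract above, means assigning each read to a sample as it arrives rather than after the run completes. The sketch below illustrates the idea with a Python generator. It is not npBarcode's algorithm: it compares only the read prefix by Hamming distance, whereas nanopore reads contain indels and practical tools typically rely on alignment. The barcode names and sequences are hypothetical.

```python
def hamming(a, b):
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex_stream(reads, barcodes, max_mismatches=1):
    """Lazily assign each (read_id, sequence) to its best-matching barcode.

    Reads are consumed one at a time, so `reads` can be any iterator,
    including one fed by a sequencer that is still running.
    """
    for read_id, sequence in reads:
        best_name, best_distance = None, max_mismatches + 1
        for name, barcode in barcodes.items():
            prefix = sequence[:len(barcode)]
            if len(prefix) < len(barcode):
                continue  # read is shorter than the barcode
            distance = hamming(prefix, barcode)
            if distance < best_distance:
                best_name, best_distance = name, distance
        yield read_id, best_name or "unclassified"


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"BC01": "AAGGTTAA", "BC02": "CCTTGGCC"}
incoming = iter([
    ("r1", "AAGGTTAAACGT"),
    ("r2", "CCTTGGCAACGT"),
    ("r3", "GGGGGGGGACGT"),
])
assignments = list(demultiplex_stream(incoming, barcodes))
print(assignments)
# prints [('r1', 'BC01'), ('r2', 'BC02'), ('r3', 'unclassified')]
```

Because the function is a generator, downstream steps can start processing a sample's reads before sequencing has finished.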
