Real-time demultiplexing Nanopore barcoded sequencing data with npBarcode

AbstractMotivationThe recently introduced barcoding protocol to Oxford Nanopore sequencing has increased the versatility of the technology. Several bioinformatic tools have been developed to demultiplex the barcoded reads, but none of them support the streaming analysis. This limits the use of pooled sequencing in real-time applications, which is one of the main advantages of the technology.ResultsWe introduced npBarcode, an open source and cross platform tool for barcode demultiplex in streaming fashion. npBarcode can be seamlessly integrated into a streaming analysis pipeline. The tool also provides a friendly graphical user interface through npReader, allowing the real-time visual monitoring of the sequencing progress of barcoded samples. We show that npBarcode achieves comparable accuracies to the other alternatives.AvailabilitynpBarcode is bundled in Japsa - a Java tools kit for genome analysis, and is freely available at https://github.com/hsnguyen/npBarcode.

Download Full-text

TagSeqTools: a flexible and comprehensive analysis pipeline for NAD tagSeq data

10.1101/2020.03.09.982934 ◽

2020 ◽

Cited By ~ 1

Author(s):

Huan Zhong ◽

Zongwei Cai ◽

Zhu Yang ◽

Yiji Xia

Keyword(s):

Rna Sequencing ◽

Comprehensive Analysis ◽

Enzymatic Reactions ◽

Computational Tool ◽

Sequencing Data ◽

Analysis Pipeline ◽

Oxford Nanopore ◽

Long Read ◽

Identification And Characterization

AbstractNAD tagSeq has recently been developed for the identification and characterization of NAD+-capped RNAs (NAD-RNAs). This method adopts a strategy of chemo-enzymatic reactions to label the NAD-RNAs with a synthetic RNA tag before subjecting to the Oxford Nanopore direct RNA sequencing. A computational tool designed for analyzing the sequencing data of tagged RNA will facilitate the broader application of this method. Hence, we introduce TagSeqTools as a flexible, general pipeline for the identification and quantification of tagged RNAs (i.e., NAD+-capped RNAs) using long-read transcriptome sequencing data generated by NAD tagSeq method. TagSeqTools comprises two major modules, TagSeek for differentiating tagged and untagged reads, and TagSeqQuant for the quantitative and further characterization analysis of genes and isoforms. Besides, the pipeline also integrates some advanced functions to identify antisense or splicing, and supports the data reformation for visualization. Therefore, TagSeqTools provides a convenient and comprehensive workflow for researchers to analyze the data produced by the NAD tagSeq method or other tagging-based experiments using Oxford nanopore direct RNA sequencing. The pipeline is available at https://github.com/dorothyzh/TagSeqTools, under Apache License 2.0.

Download Full-text

SACall: a neural network basecaller for Oxford Nanopore sequencing data based on self-attention mechanism

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2020.3039244 ◽

2020 ◽

pp. 1-1

Author(s):

Neng Huang ◽

Fan Nie ◽

Peng Ni ◽

Feng Luo ◽

Jianxin Wang

Keyword(s):

Neural Network ◽

Attention Mechanism ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Oxford Nanopore

Download Full-text

Evaluation of Germline Structural Variant Calling Methods for Nanopore Sequencing Data

Frontiers in Genetics ◽

10.3389/fgene.2021.761791 ◽

2021 ◽

Vol 12 ◽

Author(s):

Davide Bolognini ◽

Alberto Magi

Keyword(s):

Variant Calling ◽

Research Report ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Factors Affecting ◽

Sequencing Technologies ◽

Long Reads ◽

Oxford Nanopore ◽

Sequencing Studies ◽

Long Read

Structural variants (SVs) are genomic rearrangements that involve at least 50 nucleotides and are known to have a serious impact on human health. While prior short-read sequencing technologies have often proved inadequate for a comprehensive assessment of structural variation, more recent long reads from Oxford Nanopore Technologies have already been proven invaluable for the discovery of large SVs and hold the potential to facilitate the resolution of the full SV spectrum. With many long-read sequencing studies to follow, it is crucial to assess factors affecting current SV calling pipelines for nanopore sequencing data. In this brief research report, we evaluate and compare the performances of five long-read SV callers across four long-read aligners using both real and synthetic nanopore datasets. In particular, we focus on the effects of read alignment, sequencing coverage, and variant allele depth on the detection and genotyping of SVs of different types and size ranges and provide insights into precision and recall of SV callsets generated by integrating the various long-read aligners and SV callers. The computational pipeline we propose is publicly available at https://github.com/davidebolo1993/EViNCe and can be adjusted to further evaluate future nanopore sequencing datasets.

Download Full-text

Detection of Clinically Relevant Molecular Alterations in Chronic Lymphocytic Leukemia (CLL) By Nanopore Sequencing

Blood ◽

10.1182/blood-2018-99-110948 ◽

2018 ◽

Vol 132 (Supplement 1) ◽

pp. 1847-1847 ◽

Cited By ~ 1

Author(s):

Adam Burns ◽

David Robert Bruce ◽

Pauline Robbe ◽

Adele Timbs ◽

Basile Stamatopoulos ◽

...

Keyword(s):

Error Correction ◽

Low Cost ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Mutation Status ◽

Short Read ◽

Short Read Sequencing ◽

Oxford Nanopore ◽

Low Coverage ◽

Oxford Nanopore Technologies

Abstract Introduction Chronic Lymphocytic Leukaemia (CLL) is the most prevalent leukaemia in the Western world and characterised by clinical heterogeneity. IgHV mutation status, mutations in the TP53 gene and deletions of the p-arm of chromosome 17 are currently used to predict an individual patient's response to therapy and give an indication as to their long-term prognosis. Current clinical guidelines recommend screening patients prior to initial, and any subsequent, treatment. Routine clinical laboratory practices for CLL involve three separate assays, each of which are time-consuming and require significant investment in equipment. Nanopore sequencing offers a rapid, low-cost alternative, generating a full prognostic dataset on a single platform. In addition, Nanopore sequencing also promises low failure rates on degraded material such as FFPE and excellent detection of structural variants due to long read length of sequencing. Importantly, Nanopore technology does not require expensive equipment, is low-maintenance and ideal for patient-near testing, making it an attractive DNA sequencing device for low-to-middle-income countries. Methods Eleven untreated CLL samples were selected for the analysis, harbouring both mutated (n=5) and unmutated (n=6) IgHV genes, seven TP53 mutations (five missense, one stop gain and one frameshift) and two del(17p) events. Primers were designed to amplify all exons of TP53, along with the IgHV locus, and each primer included universal tails for individual sample barcoding. The resulting PCR amplicons were prepared for sequencing using a ligation sequencing kit (SQK-LSK108, Oxford Nanopore Technologies, Oxford, UK). All IgHV libraries were pooled and sequenced on one R9.4 flowcell, with the TP53 libraries pooled and sequenced on a second R9.4 flowcell. Whole genome libraries were prepared from 400ng genomic DNA for each sample using a rapid sequencing kit (SQK-RAD004, Oxford Nanopore Technologies, Oxford, UK), and each sample sequenced on individual flowcells on a MinION mk1b instrument (Oxford Nanopore Technologies, Oxford, UK). We developed a bespoke bioinformatics pipeline to detect copy-number changes, TP53 mutations and IgHV mutation status from the Nanopore sequencing data. Results were compared to short-read sequencing data obtained earlier by targeted deep sequencing (MiSeq, Illumina Inc, San Diego, CA, USA) and whole genome sequencing (HiSeq 2500, Illumina Inc, San Diego CA, USA). Results Following basecalling and adaptor trimming, the raw data were submitted to the IMGT database. In the absence of error correction, it was possible to identify the correct VH family for each sample; however the germline homology was not sufficient to differentiate between IgHVmut and IgHVunmut CLL cases. Following bio-informatic error correction and consensus building, the percentage to germline homology was the same as that obtained from short-read sequencing and nanopore sequencing also called the same productive rearrangements in all cases. A total of 77 TP53 variants were identified, including 68 in non-coding regions, and three synonymous SNVs. The remaining 6 were predicted to be functional variants (eight missense and two stop-gains) and had all been identified in early MiSeq targeted sequencing. However, the frameshift mutation was not called by the analysis pipeline, although it is present in the aligned reads. Using the low-coverage WGS data, we were able to identify del(17p) events, of 19Mb and 20Mb length, in both patients with high confidence. Conclusions Here we demonstrate that characterization of the IgHV locus in CLL cases is possible using the MinION platform, provided sufficient downstream analysis, including error correction, is applied. Furthermore, somatic SNVs in TP53 can be identified, although similar to second generation sequencing, variant calling of small insertions and deletions is more problematic. Identification of del(17p) is possible from low-coverage WGS on the MinION and is inexpensive. Our data demonstrates that Nanopore sequencing can be a viable, patient-near, low-cost alternative to established screening methods, with the potential of diagnostic implementation in resource-poor regions of the world. Disclosures Schuh: Giles, Roche, Janssen, AbbVie: Honoraria.

Download Full-text

Real-time monitoring and analysis of SARS-CoV-2 nanopore sequencing with minoTour.

10.1101/2021.09.13.459777 ◽

2021 ◽

Author(s):

Rory James Munro ◽

Nadine Holmes ◽

Christopher Moore ◽

Matthew Carlile ◽

Alex Payne ◽

...

Keyword(s):

Real Time ◽

Phylogenetic Trees ◽

Sequencing Data ◽

Time Analysis ◽

Real Time Analysis ◽

Oxford Nanopore ◽

Individual Snps ◽

Wide Range ◽

Time Required ◽

Viral Sequencing

Motivation: The ongoing SARS-CoV-2 pandemic has demonstrated the utility of real-time analysis of sequencing data, with a wide range of databases and resources for analysis now available. Here we show how the real-time nature of Oxford Nanopore Technologies sequencers can accelerate consensus generation, lineage and variant status assignment. We exploit the fact that multiplexed viral sequencing libraries quickly generate sufficient data for the majority of samples, with diminishing returns on remaining samples as the sequencing run progresses. We demonstrate methods to determine when a sequencing run has passed this point in order to reduce the time required and cost of sequencing. Results: We extended MinoTour, our real-time analysis and monitoring platform for nanopore sequencers, to provide SARS-CoV2 analysis using ARTIC network pipelines. We additionally developed an algorithm to predict which samples will achieve sufficient coverage, automatically running the ARTIC medaka informatics pipeline once specific coverage thresholds have been reached on these samples. After testing on run data, we find significant run time savings are possible, enabling flow cells to be used more efficiently and enabling higher throughput data analysis. The resultant consensus genomes are assigned both PANGO lineage and variant status as defined by Public Health England. Samples from within individual runs are used to generate phylogenetic trees incorporating optional background samples as well as summaries of individual SNPs. As minoTour uses ARTIC pipelines, new primer schemes and pathogens can be added to allow minoTour to aid in real-time analysis of pathogens in the future.

Download Full-text

NanoR: a user-friendly R package to analyze and compare nanopore sequencing data

10.1101/514232 ◽

2019 ◽

Author(s):

Davide Bolognini ◽

Niccolò Bartalucci ◽

Alessandra Mingrino ◽

Alessandro Maria Vannucchi ◽

Alberto Magi

Keyword(s):

Real Time ◽

Low Cost ◽

R Package ◽

Sequencing Data ◽

High Performing ◽

Dna And Rna ◽

Oxford Nanopore ◽

The One ◽

User Friendly ◽

Oxford Nanopore Technologies

AbstractMinION and GridION X5 from Oxford Nanopore Technologies are devices for real-time DNA and RNA sequencing. On the one hand, MinION is the only real-time, low cost and portable sequencing device and, thanks to its unique properties, is becoming more and more popular among biologists; on the other, GridION X5, mainly for its costs, is less widespread but highly suitable for researchers with large sequencing projects. Despite the fact that Oxford Nanopore Technologies’ devices have been increasingly used in the last few years, there is a lack of high-performing and user-friendly tools to handle the data outputted by both MinION and GridION X5 platforms. Here we present NanoR, a cross-platform R package designed with the purpose to simplify and improve nanopore data visualization. Indeed, NanoR is built on few functions but overcomes the capabilities of existing tools to extract meaningful informations from MinION sequencing data; in addition, as exclusive features, NanoR can deal with GridION X5 sequencing outputs and allows comparison of both MinION and GridION X5 sequencing data in one command. NanoR is released as free package for R at https://github.com/davidebolo1993/NanoR.

Download Full-text

Nanocall: An Open Source Basecaller for Oxford Nanopore Sequencing Data

10.1101/046086 ◽

2016 ◽

Cited By ~ 10

Author(s):

Matei David ◽

L.J. Dursi ◽

Delia Yao ◽

Paul C. Boutros ◽

Jared T. Simpson

Keyword(s):

Open Source ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Computing Platform ◽

Internet Connection ◽

Oxford Nanopore ◽

Efficient Processing ◽

Cloud Computing Platform ◽

New Applications ◽

Single Dna Molecule

ABSTRACTMotivationThe highly portable Oxford Nanopore MinlON sequencer has enabled new applications of genome sequencing directly in the field. However, the MinlON currently relies on a cloud computing platform, Metrichor (metrichor.com), for translating locally generated sequencing data into basecalls.ResultsTo allow offline and private analysis of MinlON data, we created Nanocall. Nanocall is the first freely-available, open-source basecaller for Oxford Nanopore sequencing data and does not require an internet connection. On two E.coli and two human samples, with natural as well as PCR-amplified DNA, Nanocall reads have ~68% identity, directly comparable to Metrichor ”1D” data. Further, Nanocall is efficient, processing ~500Kbp of sequence per core hour, and fully parallelized. Using 8 cores, Nanocall could basecall a MinlON sequencing run in real time. Metrichor provides the ability to integrate the ”1D” sequencing of template and complement strands of a single DNA molecule, and create a ”2D” read. Nanocall does not currently integrate this technology, and addition of this capability will be an important future development. In summary, Nanocall is the first open-source, freely available, off-line basecaller for Oxford Nanopore sequencing data.AvailabilityNanocall is available at github.com/mateidavid/nanocall, released under the MIT license.Contactmatei.david at oicr.on.ca

Download Full-text

An attention-based neural network basecaller for Oxford Nanopore sequencing data

2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm47256.2019.8983231 ◽

2019 ◽

Author(s):

Neng Huang ◽

Fan Nie ◽

Peng Ni ◽

Feng Luo ◽

Jianxin Wang

Keyword(s):

Neural Network ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Oxford Nanopore

Download Full-text

BoardION: real-time monitoring of Oxford Nanopore sequencing instruments

BMC Bioinformatics ◽

10.1186/s12859-021-04161-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Aimeric Bruno ◽

Jean-Marc Aury ◽

Stefan Engelen

Keyword(s):

Real Time ◽

Web Application ◽

Data Streaming ◽

Access To Information ◽

Library Preparation ◽

Sequencing Data ◽

Time Data ◽

Interactive Interface ◽

Oxford Nanopore

Abstract Background One of the main advantages of the Oxford Nanopore Technology (ONT) is the possibility of real-time sequencing. This gives access to information during the experiment and allows either to control the sequencing or to stop the sequencing once the results have been obtained. However, the ONT sequencing interface is not sufficient to explore the quality of sequencing data in depth and existing quality control tools do not take full advantage of real-time data streaming. Results Herein, we present BoardION, an interactive web application to analyze the efficiency of ONT sequencing runs. The interactive interface of BoardION allows users to easily explore sequencing metrics and optimize the quantity and the quality of the data generated during the experiment. It also enables the comparison of multiple flowcells to assess library preparation protocols or the quality of input samples. Conclusion BoardION is dedicated to people who manage ONT sequencing instruments and allows them to remotely and in real time monitor their experiments and compare multiple sequencing runs. Source code, a Docker image and a demo version are available at http://www.genoscope.cns.fr/boardion/.

Download Full-text

SquiggleKit: A toolkit for manipulating nanopore signal data

10.1101/549741 ◽

2019 ◽

Cited By ~ 1

Author(s):

James M. Ferguson ◽

Martin A. Smith

Keyword(s):

Signal Processing ◽

Nucleotide Sequence ◽

Signal Analysis ◽

Data Extraction ◽

Nanopore Sequencing ◽

Sequence Motifs ◽

Sequencing Data ◽

Signal Space ◽

Memory Footprint ◽

Cross Platform

SummaryThe management of raw nanopore sequencing data poses a challenge that must be overcome to accelerate the development of new bioinformatics algorithms predicated on signal analysis. SquiggleKit is a toolkit for manipulating and interrogating nanopore data that simplifies file handling, data extraction, visualisation, and signal processing. Its modular tools can be used to reduce file numbers and memory footprint, identify poly-A tails, target barcodes, adapters, and find nucleotide sequence motifs in raw nanopore signal, amongst other applications. SquiggleKit serves as a bioinformatics portal into signal space, for novice and experienced users alike. It is comprehensively documented, simple to use, cross-platform compatible and freely available from (https://github.com/Psy-Fer/SquiggleKit).

Download Full-text