NanoSplicer: Accurate identification of splice junctions using Oxford Nanopore sequencing

2021 ◽  
Author(s):  
Yupei You ◽  
Michael B. Clark ◽  
Heejung Shim

Motivation: Long-read sequencing methods have considerable advantages for characterising RNA isoforms. Oxford Nanopore sequencing records changes in electrical current as a nucleic acid passes through a pore. However, basecalling of this raw signal (known as a squiggle) is error-prone, making it challenging to identify splice junctions accurately. Existing strategies use matched short-read data and/or annotated splice junctions to correct nanopore reads, but these either add expense or limit junctions to known (incomplete) annotations. A method that could accurately identify splice junctions solely from nanopore data would therefore have numerous advantages. Results: We developed "NanoSplicer" to identify splice junctions using raw nanopore signal (squiggles). For each splice junction, the observed squiggle is compared to candidate squiggles representing potential junctions to identify the correct candidate. Measuring squiggle similarity enables us to compute the probability of each candidate junction and find the most likely one. We tested our method using (1) synthetic mRNAs with known splice junctions and (2) biological mRNAs from a lung cancer cell line. The results from both datasets demonstrate that NanoSplicer improves splice junction identification, especially when the basecalling error rate near the splice junction is elevated. Our method is implemented in the software package NanoSplicer, available at https://github.com/shimlab/NanoSplicer.
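The idea of scoring candidate squiggles against an observed squiggle and normalising the scores into probabilities can be illustrated with a small sketch. This is not NanoSplicer's implementation: the similarity measure (plain dynamic time warping), the softmax-style normalisation with a uniform prior, the `scale` parameter, the function names and the toy signal values are all assumptions made for illustration only.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance with an absolute-difference step cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def candidate_probabilities(observed, candidates, scale=1.0):
    """Convert distances to probabilities (assumed: uniform prior, exp(-d/scale))."""
    distances = {name: dtw_distance(observed, sig) for name, sig in candidates.items()}
    best = min(distances.values())
    # Subtract the smallest distance before exponentiating for numerical stability.
    weights = {name: math.exp(-(d - best) / scale) for name, d in distances.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Toy current levels (hypothetical values, not real nanopore data).
observed = [80.0, 82.0, 95.0, 110.0, 108.0, 90.0]
candidates = {
    "junction_A": [80.0, 95.0, 110.0, 90.0],
    "junction_B": [80.0, 70.0, 60.0, 90.0],
}

probs = candidate_probabilities(observed, candidates)
best_candidate = max(probs, key=probs.get)
print(best_candidate, probs)
```

In this toy case the observed signal is much closer to `junction_A`, so nearly all of the probability mass goes to that candidate. A real method would need a principled model of signal noise rather than an arbitrary `scale`.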

2017 ◽  
Vol 2 ◽  
pp. 23 ◽  
Author(s):  
Jean-Michel Carter ◽  
Shobbir Hussain

Background: The ability to obtain long read lengths during DNA sequencing has several potentially important practical applications. Especially long read lengths have been reported using the Nanopore sequencing method, currently commercially available from Oxford Nanopore Technologies (ONT). However, early reports have demonstrated only limited levels of combined throughput and sequence accuracy. Recently, ONT released a new CsgG pore sequencing system as well as a 250 b/s translocation chemistry with potential for improvements. Methods: We made use of such components on ONT's miniature ‘MinION’ device and sequenced native genomic DNA obtained from the near-haploid cancer cell line HAP1. Analysis of our data was performed utilising recently described computational tools tailored for nanopore/long-read sequencing outputs, and here we present our key findings. Results: From a single sequencing run, we obtained ~240,000 high-quality mapped reads, comprising a total of ~2.3 billion bases. A mean read length of 9.6 kb and an N50 of ~17 kb were achieved, while sequences mapped to the reference with a mean identity of 85%. Notably, we obtained ~68X coverage of the mitochondrial genome and were able to achieve a mean consensus identity of 99.8% for sequenced mtDNA reads. Conclusions: With improved sequencing chemistries already released and higher-throughput instruments in the pipeline, this early study suggests that ONT CsgG-based sequencing may be a useful option for potential practical long-read applications.
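For readers unfamiliar with the read-length statistics quoted in this abstract, the following sketch shows how an N50 is conventionally computed and checks that the reported mean read length is consistent with the reported read count and base total. The function name and the toy read lengths are illustrative assumptions; only the approximate figures of 240,000 reads and 2.3 billion bases come from the abstract.

```python
def n50(read_lengths):
    """Return the length L such that reads of length >= L contain at least half of all bases."""
    total = sum(read_lengths)
    running = 0
    # Walk from the longest read downwards until half of the bases are covered.
    for length in sorted(read_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("read_lengths must not be empty")


# Toy read lengths (hypothetical, not data from the study).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_lengths)

# Consistency check using the approximate figures quoted in the abstract.
reported_reads = 240_000
reported_bases = 2_300_000_000
implied_mean_length = reported_bases / reported_reads

print(toy_n50)
print(round(implied_mean_length))
```

The implied mean of roughly 9,580 bases agrees with the reported 9.6 kb, given that both inputs are rounded. An N50 well above the mean, as reported here, indicates that a relatively small number of long reads carries a large share of the bases.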


2018 ◽  
Vol 2 ◽  
pp. 23 ◽  
Author(s):  
Jean-Michel Carter ◽  
Shobbir Hussain

Background: The ability to obtain long read lengths during DNA sequencing has several potentially important practical applications. Especially long read lengths have been reported using the Nanopore sequencing method, currently commercially available from Oxford Nanopore Technologies (ONT). However, early reports have demonstrated only limited levels of combined throughput and sequence accuracy. Recently, ONT released a new CsgG pore sequencing system as well as a 250 b/s translocation chemistry with potential for improvements. Methods: We made use of such components on ONT's miniature ‘MinION’ device and sequenced native genomic DNA obtained from the near-haploid cancer cell line HAP1. Analysis of our data was performed utilising recently described computational tools tailored for nanopore/long-read sequencing outputs, and here we present our key findings. Results: From a single sequencing run, we obtained ~240,000 high-quality mapped reads, comprising a total of ~2.3 billion bases. A mean read length of 9.6 kb and an N50 of ~17 kb were achieved, while sequences mapped to the reference with a mean identity of 85%. Notably, we obtained ~68X coverage of the mitochondrial genome and were able to achieve a mean consensus identity of 99.8% for sequenced mtDNA reads. Conclusions: With improved sequencing chemistries already released and higher-throughput instruments in the pipeline, this early study suggests that ONT CsgG-based sequencing may be a useful option for potential practical long-read applications with relevance to complex genomes.




2011 ◽  
Vol 31 (10) ◽  
pp. 1091-1095
Author(s):  
Xiao-lin LI ◽  
Yan-fang ZHANG ◽  
Kai TANG ◽  
Ying TANG ◽  
Ruo-bing JIN ◽  
...  

2013 ◽  
Vol 13 (3) ◽  
pp. 414-421 ◽  
Author(s):  
Raquel T. Lima ◽  
Gemma A. Barron ◽  
Joanna A. Grabowska ◽  
Giovanna Bermano ◽  
Simranjeet Kaur ◽  
...  

2021 ◽  
Vol 27 ◽  
pp. 101089
Author(s):  
Kazuo Ohara ◽  
Shintaro Kinoshita ◽  
Jun Ando ◽  
Yoko Azusawa ◽  
Midori Ishii ◽  
...  
