CONNET: Accurate Genome Consensus in Assembling Nanopore Sequencing Data via Deep Learning

iScience ◽  
2020 ◽  
Vol 23 (5) ◽  
pp. 101128 ◽  
Author(s):  
Yifan Zhang ◽  
Chi-Man Liu ◽  
Henry C.M. Leung ◽  
Ruibang Luo ◽  
Tak-Wah Lam


2021 ◽  
Author(s):  
Artem Danilevsky ◽  
Avital Luba Polsky ◽  
Noam Shomron

Abstract Nanopore sequencing is an emerging technology that uses a unique method of reading nucleic acid sequences while simultaneously detecting various chemical modifications. Deep learning has grown in popularity as a technique for solving many complex computational tasks. Selective sequencing has been widely used in genomic research; although it introduces several caveats to the sequencing process, its advantages outweigh them. In this study we demonstrate an alternative method of software-based selective sequencing that is performed in real time by combining nanopore sequencing and deep learning. Our results show the feasibility of using deep learning to classify signals from only the first 200 nucleotides in raw nanopore signal format. Using custom deep learning models and a script built on the "Read-Until" framework to target mitochondrial molecules in real time from a human cell line sample, we achieved significant separation and an enrichment of more than 2-fold. In a series of very short sequencing runs (10, 30, and 120 minutes), we identified genomic and mitochondrial reads with accuracy above 90%, even though mitochondrial DNA comprises only 0.1% of the total input material. We believe that our results will lay the foundation for rapid and selective sequencing using nanopore technology and will pave the way for future clinical applications using nanopore sequencing data.
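
The keep-or-eject decision described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use the actual Read-Until API: the function names, the 0.5 threshold, and the toy classifier standing in for the deep learning model are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# (read id, raw-signal prefix covering the start of the read)
Read = Tuple[str, Sequence[float]]


def select_reads(
    reads: Iterable[Read],
    classify: Callable[[Sequence[float]], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Tuple[List[str], List[str]]:
    """Split reads into (kept, ejected) based on a score for the signal prefix."""
    kept: List[str] = []
    ejected: List[str] = []
    for read_id, prefix in reads:
        score = classify(prefix)  # probability-like score that the read is a target
        if score >= threshold:
            kept.append(read_id)      # keep sequencing this molecule
        else:
            ejected.append(read_id)   # a real run would reject the molecule from the pore
    return kept, ejected


def toy_classifier(prefix: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained model: thresholds the mean current."""
    return 1.0 if sum(prefix) / len(prefix) > 100.0 else 0.0


def fold_enrichment(target_fraction_selected: float, target_fraction_control: float) -> float:
    """Ratio of the target read fraction with selection to that without selection."""
    return target_fraction_selected / target_fraction_control
```

For example, `select_reads([("a", [110.0, 120.0]), ("b", [80.0, 90.0])], toy_classifier)` keeps read `a` and ejects read `b`. If the target fraction rises from 0.1% to 0.2% of reads, `fold_enrichment` reports a 2-fold enrichment.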


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ratanond Koonchanok ◽  
Swapna Vidhur Daulatabad ◽  
Quoseena Mir ◽  
Khairi Reda ◽  
Sarath Chandra Janga

Abstract Background Direct-sequencing technologies, such as Oxford Nanopore’s, are delivering long RNA reads with great efficacy and convenience. These technologies afford an ability to detect post-transcriptional modifications at single-molecule resolution, promising new insights into the functional roles of RNA. However, realizing this potential requires new tools to analyze and explore this type of data. Results Here, we present Sequoia, a visual analytics tool that allows users to interactively explore nanopore sequences. Sequoia combines a Python-based backend with a multi-view visualization interface, enabling users to import raw nanopore sequencing data in Fast5 format, cluster sequences based on electric-current similarities, and drill down into signals to identify properties of interest. We demonstrate the application of Sequoia by generating and analyzing ~500k reads from direct RNA sequencing data of the human HeLa cell line. We focus on comparing signal features from m6A and m5C RNA modifications as the first step towards building automated classifiers. We show how, through iterative visual exploration and tuning of dimensionality reduction parameters, we can separate modified RNA sequences from their unmodified counterparts. We also document new, qualitative signal signatures that distinguish these modifications from otherwise normal RNA bases, which we were able to discover from the visualization. Conclusions Sequoia’s interactive features complement existing computational approaches in nanopore-based RNA workflows. The insights gleaned through visual analysis should help users in developing rationales, hypotheses, and insights into the dynamic nature of RNA. Sequoia is available at https://github.com/dnonatar/Sequoia.


2018 ◽  
Vol 6 (7) ◽  
Author(s):  
Annette Fagerlund ◽  
Solveig Langsrud ◽  
Birgitte Moen ◽  
Even Heir ◽  
Trond Møretrø

ABSTRACT Listeria monocytogenes is a foodborne pathogen that causes the often-fatal disease listeriosis. We present here the complete genome sequences of six L. monocytogenes isolates of sequence type 9 (ST9) collected from two different meat processing facilities in Norway. The genomes were assembled using Illumina and Nanopore sequencing data.


2020 ◽  
Author(s):  
Timour Baslan ◽  
Sam Kovaka ◽  
Fritz J. Sedlazeck ◽  
Yanming Zhang ◽  
Robert Wappel ◽  
...  

ABSTRACT Genome copy number is an important source of genetic variation in health and disease. In cancer, clinically actionable Copy Number Alterations (CNAs) can be inferred from short-read sequencing data, enabling genomics-based precision oncology. Emerging Nanopore sequencing technologies offer the potential for broader clinical utility, for example in smaller hospitals, due to lower instrument cost, higher portability, and ease of use. Nonetheless, Nanopore sequencing devices are limited in the number of retrievable sequencing reads/molecules compared to short-read sequencing platforms. This represents a challenge for applications that require high read counts, such as CNA inference. To address this limitation, we targeted the sequencing of short-length DNA molecules loaded at optimized concentration in an effort to increase sequence read/molecule yield from a single nanopore run. We show that sequencing short DNA molecules reproducibly returns high read counts and allows high-quality CNA inference. We demonstrate the clinical relevance of this approach by accurately inferring CNAs in acute myeloid leukemia samples. The data show that, compared to traditional approaches such as chromosome analysis/cytogenetics, short-molecule nanopore sequencing returns more sensitive, accurate copy number information in a cost-effective and expeditious manner, including for multiplexed samples. Our results provide a framework for the sequencing of relatively short DNA molecules on nanopore devices, with applications in research and medicine that include, but are not limited to, CNA inference.
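
Why CNA inference depends on high read counts can be seen in a minimal Python sketch of a generic read-depth approach: reads are counted in fixed genomic bins, each profile is normalized by its median, and a log2 ratio against a control is computed per bin. This is a common textbook scheme, not the pipeline used in the paper. The function name and the pseudocount of 0.5 are assumptions.

```python
import math
from statistics import median
from typing import List, Sequence


def log2_ratios(
    sample_counts: Sequence[int],
    control_counts: Sequence[int],
    pseudocount: float = 0.5,  # assumed value; avoids log2(0) in empty bins
) -> List[float]:
    """Per-bin log2 copy-number ratio of a sample against a control."""
    if len(sample_counts) != len(control_counts):
        raise ValueError("sample and control must cover the same bins")
    sample_median = median(sample_counts)
    control_median = median(control_counts)
    if sample_median == 0 or control_median == 0:
        raise ValueError("median bin count is zero; too few reads to normalize")
    ratios = []
    for s, c in zip(sample_counts, control_counts):
        s_norm = (s + pseudocount) / sample_median   # depth relative to typical bin
        c_norm = (c + pseudocount) / control_median
        ratios.append(math.log2(s_norm / c_norm))
    return ratios
```

With sample counts `[100, 100, 200, 100, 50]` against a flat control of 100 reads per bin, the third bin yields a ratio close to +1 (a gain) and the fifth close to -1 (a loss). With few reads per bin, counting noise swamps these differences.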


2021 ◽  
Author(s):  
Jiaqi Li ◽  
Lei Wei ◽  
Xianglin Zhang ◽  
Wei Zhang ◽  
Haochen Wang ◽  
...  

ABSTRACT Detecting cancer signals in cell-free DNA (cfDNA) high-throughput sequencing data is emerging as a novel non-invasive cancer detection method. Due to the high cost of sequencing, it is crucial to make robust and precise predictions with low-depth cfDNA sequencing data. Here we propose a novel approach named DISMIR, which can provide ultrasensitive and robust cancer detection by integrating DNA sequence and methylation information in plasma cfDNA whole genome bisulfite sequencing (WGBS) data. DISMIR introduces a new feature termed “switching region” to define cancer-specific differentially methylated regions, which can enrich the cancer-related signal at read resolution. DISMIR applies a deep learning model to predict the source of every single read based on its DNA sequence and methylation state, and then predicts the risk that the plasma donor has cancer. DISMIR exhibited high accuracy and robustness in hepatocellular carcinoma detection from plasma cfDNA WGBS data, even at ultra-low sequencing depths. Analysis showed that DISMIR tends to be insensitive to alterations in the methylation states of single CpG sites, which suggests that DISMIR could resist the technical noise of WGBS. Together, these results show that DISMIR has the potential to be a precise and robust method for low-cost early cancer detection.


2021 ◽  
Vol 8 ◽  
Author(s):  
Bo-Wei Han ◽  
Xu Yang ◽  
Shou-Fang Qu ◽  
Zhi-Wei Guo ◽  
Li-Min Huang ◽  
...  

Cell-free DNA (cfDNA) serves as a footprint of the nucleosome occupancy status of transcription start sites (TSSs), and has been subject to wide development for use in noninvasive health monitoring and disease detection. However, the requirement for high sequencing depth limits its clinical use. Here, we introduce a deep-learning pipeline designed for TSS coverage profiles generated from shallow cfDNA sequencing called the Autoencoder of cfDNA TSS (AECT) coverage profile. AECT outperformed existing single-cell sequencing imputation algorithms in terms of improvements to TSS coverage accuracy and the capture of latent biological features that distinguish sex or tumor status. We built classifiers for the detection of breast and rectal cancer using AECT-imputed shallow sequencing data, and their performance was close to that achieved by high-depth sequencing, suggesting that AECT could provide a broadly applicable noninvasive screening approach with high accuracy and at a moderate cost.


2019 ◽  
Vol 35 (22) ◽  
pp. 4586-4595 ◽  
Author(s):  
Peng Ni ◽  
Neng Huang ◽  
Zhi Zhang ◽  
De-Peng Wang ◽  
Fan Liang ◽  
...  

Abstract Motivation Oxford Nanopore sequencing enables direct detection of the methylation states of bases in DNA from reads without extra laboratory techniques. Novel computational methods are required to improve the accuracy and robustness of DNA methylation state prediction using Nanopore reads. Results In this study, we develop DeepSignal, a deep learning method to detect DNA methylation states from Nanopore sequencing reads. Testing on Nanopore reads of Homo sapiens (H. sapiens), Escherichia coli (E. coli) and pUC19 shows that DeepSignal achieves higher performance at both the read level and the genome level in detecting 6mA and 5mC methylation states compared to previous hidden Markov model (HMM) based methods. DeepSignal achieves similar performance across different DNA methylation bases, different DNA methylation motifs, and both singleton and mixed DNA CpGs. Moreover, DeepSignal requires much lower coverage than HMM and statistics based methods. DeepSignal can achieve above 90% accuracy for detecting 5mC and 6mA using only 2× coverage of reads. Furthermore, for DNA CpG methylation state prediction, DeepSignal achieves 90% correlation with bisulfite sequencing using just 20× coverage of reads, which is much better than HMM based methods. Notably, DeepSignal can predict methylation states of 5% more DNA CpGs that previously could not be predicted by bisulfite sequencing. DeepSignal can be a robust and accurate method for detecting methylation states of DNA bases. Availability and implementation DeepSignal is publicly available at https://github.com/bioinfomaticsCSU/deepsignal. Supplementary information Supplementary data are available at Bioinformatics online.
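
The genome-level comparison with bisulfite sequencing mentioned in this abstract can be sketched in Python as two generic steps: aggregate read-level calls into a per-site methylation frequency, then compute the Pearson correlation between the two methods over shared sites. This is a sketch under assumptions, not DeepSignal's own evaluation code. The function names and the `(site, is_methylated)` input format are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, Iterable, Sequence, Tuple


def site_frequencies(calls: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Fraction of reads called methylated at each site."""
    methylated: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for site, is_methylated in calls:
        total[site] += 1
        if is_methylated:
            methylated[site] += 1
    return {site: methylated[site] / total[site] for site in total}


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

In use, the nanopore frequencies and the bisulfite frequencies would be restricted to sites covered by both methods, ordered identically, and passed to `pearson`.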

