DeepSignal: detecting DNA methylation state from Nanopore sequencing reads using deep-learning

2018 ◽  
Author(s):  
Peng Ni ◽  
Neng Huang ◽  
Feng Luo ◽  
Jianxin Wang

Abstract. Oxford Nanopore sequencing makes it possible to detect methylation sites in DNA directly from reads, without extra laboratory techniques. In this study, we develop DeepSignal, a deep learning method to detect DNA methylated sites from Nanopore sequencing reads. DeepSignal constructs features from both raw electrical signals and signal sequences in Nanopore reads. Testing on Nanopore reads of pUC19, E. coli and human, we show that DeepSignal achieves higher read-level and genome-level accuracy in detecting 6mA and 5mC methylation than previous HMM-based methods. Moreover, DeepSignal achieves similar performance across different methylation bases and different methylation motifs. Furthermore, DeepSignal can detect 5mC and 6mA methylation states of genome sites with above 90% genome-level accuracy at just 5X coverage using controlled methylation data.

2019 ◽  
Vol 35 (22) ◽  
pp. 4586-4595 ◽  
Author(s):  
Peng Ni ◽  
Neng Huang ◽  
Zhi Zhang ◽  
De-Peng Wang ◽  
Fan Liang ◽  
...  

Abstract. Motivation: Oxford Nanopore sequencing makes it possible to detect the methylation states of bases in DNA directly from reads, without extra laboratory techniques. Novel computational methods are required to improve the accuracy and robustness of DNA methylation state prediction using Nanopore reads. Results: In this study, we develop DeepSignal, a deep learning method to detect DNA methylation states from Nanopore sequencing reads. Testing on Nanopore reads of Homo sapiens (H. sapiens), Escherichia coli (E. coli) and pUC19 shows that DeepSignal achieves higher performance at both read level and genome level in detecting 6mA and 5mC methylation states than previous hidden Markov model (HMM) based methods. DeepSignal achieves similar performance across different DNA methylation bases, different DNA methylation motifs, and both singleton and mixed DNA CpG. Moreover, DeepSignal requires much lower coverage than HMM-based and statistics-based methods. DeepSignal can achieve above 90% accuracy for detecting 5mC and 6mA using only 2× coverage of reads. Furthermore, for DNA CpG methylation state prediction, DeepSignal achieves 90% correlation with bisulfite sequencing using just 20× coverage of reads, which is much better than HMM-based methods. Notably, DeepSignal can predict the methylation states of 5% more DNA CpGs, which previously could not be predicted by bisulfite sequencing. DeepSignal can be a robust and accurate method for detecting methylation states of DNA bases. Availability and implementation: DeepSignal is publicly available at https://github.com/bioinfomaticsCSU/deepsignal. Supplementary information: Supplementary data are available at Bioinformatics online.
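The abstract above reports results at both read level and genome level, and a correlation with bisulfite sequencing. As a rough illustration of how read-level calls might be rolled up into a per-site methylation frequency and compared against a reference, here is a minimal sketch. It is not DeepSignal's code: the function names, the 0.5 probability threshold, the `min_coverage` filter and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def site_methylation_frequency(read_calls, min_coverage=1, threshold=0.5):
    """Aggregate per-read calls into a per-site methylation frequency.

    read_calls: iterable of (site_id, probability_methylated), one per read.
    Sites covered by fewer than min_coverage reads are dropped.
    (Hypothetical helper; threshold and coverage filter are assumptions.)
    """
    counts = defaultdict(lambda: [0, 0])  # site -> [methylated reads, total reads]
    for site, prob in read_calls:
        counts[site][0] += int(prob >= threshold)
        counts[site][1] += 1
    return {s: m / n for s, (m, n) in counts.items() if n >= min_coverage}


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of frequencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Hypothetical toy data: three CpG sites covered by 3, 2 and 1 reads.
read_calls = [
    ("chr1:100", 0.9), ("chr1:100", 0.8), ("chr1:100", 0.2),
    ("chr1:200", 0.1), ("chr1:200", 0.3),
    ("chr1:300", 0.95),
]
freqs = site_methylation_frequency(read_calls, min_coverage=2)
```

With `min_coverage=2`, the single-read site is excluded, which is the kind of filter that makes coverage matter for genome-level accuracy. The `pearson` helper could then be applied to frequencies from Nanopore calls and from bisulfite sequencing at the shared sites.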


2021 ◽  
Author(s):  
Sara Gombert ◽  
Kirsten Jahn ◽  
Hansi Pathak ◽  
Alexandra Burkert ◽  
Gunnar Schmidt ◽  
...  

Bisulfite sequencing has long been considered the gold standard for measurement of DNA methylation at single CpG resolution. In the meantime, several new approaches have been developed, which are regarded as less error-prone. Since these errors were shown to be sequence-specific, we aimed to verify the methylation data of a particular region of the TRPA1 promoter obtained from our previous studies. For this purpose, we compared methylation rates obtained via direct bisulfite sequencing and nanopore sequencing. Thus, we were able to confirm our previous findings to a large extent.


2019 ◽  
Vol 20 (S18) ◽  
Author(s):  
Zhenxing Wang ◽  
Yadong Wang

Abstract. Background: Lung cancer is one of the most malignant tumors, causing over 1,000,000 deaths each year worldwide. Deep learning has brought success in many domains in recent years. DNA methylation, an epigenetic factor, is used for model training in many studies. There is an opportunity for deep learning methods to analyze lung cancer epigenetic data to determine subtypes for appropriate treatment. Results: Here, we employ variational autoencoders (VAEs), an unsupervised deep learning framework, on 450K DNA methylation data of TCGA-LUAD and TCGA-LUSC to learn latent representations of the DNA methylation landscape. We extract a biologically relevant latent space of LUAD and LUSC samples. It is shown that bivariate classifiers on the further compressed latent features can classify the subtypes accurately. Through clustering of methylation-based latent space features, we demonstrate that VAEs can capture differential methylation patterns between subtypes of lung cancer. Conclusions: VAEs can distinguish the original subtypes in a manually mixed methylation data frame using the encoded features of the latent space. Further applications of VAEs should focus on fine-grained subtype identification for precision medicine.
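The abstract above does not state the training objective, so the following is only a sketch of the standard VAE ingredients: the reparameterization step, the KL divergence to a standard normal prior, and a reconstruction term. The binary cross-entropy reconstruction loss is an assumption, chosen because methylation beta values lie in [0, 1]. All function names are hypothetical.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def bce_reconstruction(x, x_hat, eps=1e-7):
    """Binary cross-entropy between input beta values and their reconstruction.

    Assumption: inputs are methylation beta values in [0, 1].
    """
    total = 0.0
    for xi, xh in zip(x, x_hat):
        xh = min(max(xh, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= xi * math.log(xh) + (1.0 - xi) * math.log(1.0 - xh)
    return total


def vae_loss(x, x_hat, mu, logvar):
    """Negative ELBO: reconstruction term plus KL regularizer."""
    return bce_reconstruction(x, x_hat) + kl_to_standard_normal(mu, logvar)


# Hypothetical 2-dimensional latent code for one sample.
rng = random.Random(0)
z = reparameterize([0.0, 1.0], [0.0, -2.0], rng)
```

In a real model the encoder would produce `mu` and `logvar` and the decoder would produce `x_hat`; here they are plain lists so that the loss terms can be inspected in isolation.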


Author(s):  
Luotong Wang ◽  
Li Qu ◽  
Longshu Yang ◽  
Yiying Wang ◽  
Huaiqiu Zhu

Abstract. Nanopore sequencing is regarded as one of the most promising third-generation sequencing (TGS) technologies. Since 2014, Oxford Nanopore Technologies (ONT) has developed a series of devices based on nanopore sequencing to produce very long reads, with an expected impact on genomics. However, nanopore sequencing reads are susceptible to a fairly high error rate owing to the difficulty of identifying DNA bases from the complex electrical signals. Although several basecalling tools have been developed for nanopore sequencing over the past years, it is still challenging to correct the sequences after applying the basecalling procedure. In this study, we developed an open-source DNA basecalling reviser, NanoReviser, based on a deep learning algorithm, to correct the basecalling errors introduced by the default basecallers. In our module, we re-segmented the raw electrical signals based on the basecalled sequences provided by the default basecallers. By employing convolutional neural networks (CNNs) and bidirectional long short-term memory (Bi-LSTM) networks, we took advantage of the information from both the raw electrical signals and the basecalled sequences. Our results showed that NanoReviser, as a post-basecalling reviser, significantly improves basecalling quality. After being trained on standard ONT sequencing reads from public E. coli and human NA12878 datasets, NanoReviser reduced the sequencing error rate by over 5% for both the E. coli dataset and the human dataset. The performance of NanoReviser was found to be better than that of all current basecalling tools. Furthermore, we analyzed the modified bases of the E. coli dataset and added the methylation information to train our module. With the methylation annotation, NanoReviser reduced the error rate by 7% for the E. coli dataset and by over 10% for regions of the sequence rich in methylated bases. To the best of our knowledge, NanoReviser is the first post-basecalling tool to accurately correct nanopore sequences without the time-consuming procedure of building a consensus sequence. The NanoReviser package is freely available at https://github.com/pkubioinformatics/NanoReviser.


Genes ◽  
2019 ◽  
Vol 10 (10) ◽  
pp. 778 ◽  
Author(s):  
Liu ◽  
Liu ◽  
Pan ◽  
Li ◽  
Yang ◽  
...  

For cancer diagnosis, many DNA methylation markers have been identified. However, few studies have tried to identify DNA methylation markers to diagnose diverse cancer types simultaneously, i.e., pan-cancers. In this study, we tried to identify DNA methylation markers to differentiate cancer samples from the respective normal samples in pan-cancers. We collected whole-genome methylation data of 27 cancer types containing 10,140 cancer samples and 3386 normal samples, and divided all samples into five data sets: one training data set, one validation data set and three test data sets. We applied machine learning to identify DNA methylation markers, and specifically, we constructed diagnostic prediction models by deep learning. We identified two categories of markers: 12 CpG markers and 13 promoter markers. Three of the 12 CpG markers and four of the 13 promoter markers are located in cancer-related genes. With the CpG markers, our model achieved an average sensitivity and specificity on the test data sets of 92.8% and 90.1%, respectively. For promoter markers, the average sensitivity and specificity on the test data sets were 89.8% and 81.1%, respectively. Furthermore, in cell-free DNA methylation data of 163 prostate cancer samples, the CpG markers achieved a sensitivity of 100%, and the promoter markers achieved 92%. For both marker types, the specificity on normal whole blood was 100%. To conclude, we identified methylation markers to diagnose pan-cancers, which might be applied to liquid biopsy of cancers.
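The sensitivity and specificity figures in the abstract above follow the usual confusion-matrix definitions. A minimal sketch of that arithmetic, using hypothetical labels (1 for cancer, 0 for normal) and a hypothetical function name, might look like this:

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = cancer, 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cancers called cancer
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cancers missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normals called normal
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normals called cancer
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy test set: four cancer samples, two normal samples.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Averaging the two values over several test sets, as the abstract does for its three test data sets, is then a plain mean of the per-set results.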


2021 ◽  
Author(s):  
Shruta Sandesh Pai ◽  
Aimee Rachel Mathew ◽  
Roy Anindya

Abstract. The recent development of Oxford Nanopore long-read sequencing has opened new avenues for identifying epigenetic DNA methylation. Among the different epigenetic DNA methylations, N6-methyladenosine is the most prevalent DNA modification in prokaryotes and 5-methylcytosine is common in higher eukaryotes. Here we investigated whether N6-methyladenosine and 5-methylcytosine modifications could be predicted from nanopore sequencing data. Using publicly available genome sequencing data of Saccharomyces cerevisiae, we compared open-access computational tools, including Tombo, mCaller, Nanopolish and DeepSignal, for predicting 6mA and 5mC. Our results suggest that Tombo and mCaller can predict DNA N6-methyladenosine modifications at a specific location, whereas the Tombo dampened fraction, the Nanopolish methylation likelihood and the DeepSignal methylation probability have comparable efficiency for 5-methylcytosine prediction from Oxford Nanopore sequencing data.


2020 ◽  
Author(s):  
Kaushik Panda ◽  
R. Keith Slotkin

Abstract. High-quality transcript-based annotations of genes facilitate both genome-wide analyses and detailed single-locus research. In contrast, transposable element (TE) annotations are rudimentary, consisting only of information on the location and type of TE. The repetitiveness and limited annotation of TEs make it difficult to distinguish between potentially functional expressed elements and degraded copies. To improve genome-wide TE bioinformatics, we performed long-read Oxford Nanopore sequencing of cDNAs from Arabidopsis lines deficient in multiple layers of TE repression. We used these uniquely mapping transcripts to identify the set of TEs able to generate mRNAs, and created a new transcript-based annotation of TEs that we have layered upon the existing high-quality community standard TAIR10 annotation. The improved annotation enables us to test specific standing hypotheses in the TE field. We demonstrate that inefficient TE splicing does not trigger small RNA production, and that the cell more strongly targets DNA methylation to TEs that have the potential to make mRNAs. This work provides a transcript-based TE annotation for Arabidopsis, and serves as a blueprint to reduce the genomic complexity associated with repetitive TEs in any organism.


Author(s):  
Rocío del Amor ◽  
Adrián Colomer ◽  
Carlos Monteagudo ◽  
Valery Naranjo

Abstract. Epigenetic alterations have an important role in the development of several types of cancer. Epigenetic studies generate a large amount of data, which makes it essential to develop novel models capable of dealing with large-scale data. In this work, we propose a deep embedded refined clustering method for breast cancer differentiation based on DNA methylation. Specifically, the deep learning system presented here uses CpG island methylation levels between 0 and 1. The proposed approach is composed of two main stages. The first stage is dimensionality reduction of the methylation data using an autoencoder. The second stage is a clustering algorithm based on the soft assignment of the latent space provided by the autoencoder. The whole method is optimized through a weighted loss function composed of two terms: a reconstruction term and a classification term. To the best of the authors’ knowledge, no previous studies have focused on dimensionality reduction algorithms linked to classification and trained end-to-end for DNA methylation analysis. The proposed method achieves an unsupervised clustering accuracy of 0.9927 and an error rate (%) of 0.73 on 137 breast tissue samples. In a second test of the deep-learning-based method using a different methylation database, an accuracy of 0.9343 and an error rate (%) of 6.57 on 45 breast tissue samples are obtained. Based on these results, the proposed algorithm outperforms other state-of-the-art methods evaluated under the same conditions for breast cancer classification based on DNA methylation data.
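The second stage described above relies on a soft assignment of latent points to clusters, and the reported error rates are the complement of the accuracies. The sketch below illustrates both. The Student's t kernel is an assumption borrowed from the general deep embedded clustering literature, since the abstract does not specify the kernel. The weight `gamma` and all function names are likewise hypothetical.

```python
def soft_assignment(z, centroids, alpha=1.0):
    """Soft cluster membership of latent point z using a Student's t kernel.

    Assumption: kernel form follows common deep embedded clustering practice.
    Returns one probability per centroid; the probabilities sum to 1.
    """
    weights = []
    for mu in centroids:
        dist_sq = sum((a - b) ** 2 for a, b in zip(z, mu))
        weights.append((1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0))
    total = sum(weights)
    return [w / total for w in weights]


def weighted_loss(reconstruction, classification, gamma=0.1):
    """Two-term objective: reconstruction plus a weighted classification term.

    gamma is a hypothetical weight; the abstract does not report its value.
    """
    return reconstruction + gamma * classification


def error_rate_percent(accuracy):
    """Error rate in percent as the complement of an accuracy in [0, 1]."""
    return 100.0 * (1.0 - accuracy)


# Hypothetical 2-D latent point sitting on the first of two centroids.
q = soft_assignment([0.0, 0.0], [[0.0, 0.0], [1.0, 0.0]])
```

Applying `error_rate_percent` to the reported accuracies 0.9927 and 0.9343 reproduces the reported error rates of 0.73 and 6.57, up to floating-point rounding.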


2021 ◽  
Author(s):  
Niki Thomas ◽  
Vinay Poodari ◽  
Miten Jain ◽  
Hugh Olsen ◽  
Mark Akeson ◽  
...  

We describe a method for direct tRNA sequencing using the Oxford Nanopore MinION. The principal technical advance is custom adapters that facilitate end-to-end sequencing of individual tRNA molecules. A second advance is a Nanopore sequencing pipeline optimized for tRNA. We tested this method using purified E. coli tRNAfMet, tRNALys, and tRNAPhe samples. Between 76% and 92% of individual aligned tRNA sequence reads were full length. As proof of concept, we showed that Nanopore sequencing detected all 42 expected tRNA isoacceptors in total E. coli tRNA. Alignment-based comparison between the three purified tRNAs and their synthetic controls revealed systematic miscalls at or adjacent to the positions of known nucleotide modifications. Systematic miscalls were also observed proximal to known modifications in total E. coli tRNA alignments.

