A comparative analysis of computational tools for the prediction of epigenetic DNA methylation from long-read sequencing data

2021 ◽  
Author(s):  
Shruta Sandesh Pai ◽  
Aimee Rachel Mathew ◽  
Roy Anindya

Abstract Recent development of Oxford Nanopore long-read sequencing has opened new avenues for identifying epigenetic DNA methylation. Among the different epigenetic DNA methylations, N6-methyladenosine is the most prevalent DNA modification in prokaryotes and 5-methylcytosine is common in higher eukaryotes. Here we investigated whether N6-methyladenosine (6mA) and 5-methylcytosine (5mC) modifications could be predicted from nanopore sequencing data. Using publicly available genome sequencing data of Saccharomyces cerevisiae, we compared the open-access computational tools Tombo, mCaller, Nanopolish and DeepSignal for predicting 6mA and 5mC. Our results suggest that Tombo and mCaller can predict DNA N6-methyladenosine modifications at a specific location, whereas the Tombo dampened fraction, the Nanopolish methylation likelihood and the DeepSignal methylation probability have comparable efficiency for 5-methylcytosine prediction from Oxford Nanopore sequencing data.

2021 ◽  
Vol 12 ◽  
Author(s):  
Davide Bolognini ◽  
Alberto Magi

Structural variants (SVs) are genomic rearrangements that involve at least 50 nucleotides and are known to have a serious impact on human health. While prior short-read sequencing technologies have often proved inadequate for a comprehensive assessment of structural variation, more recent long reads from Oxford Nanopore Technologies have already been proven invaluable for the discovery of large SVs and hold the potential to facilitate the resolution of the full SV spectrum. With many long-read sequencing studies to follow, it is crucial to assess factors affecting current SV calling pipelines for nanopore sequencing data. In this brief research report, we evaluate and compare the performances of five long-read SV callers across four long-read aligners using both real and synthetic nanopore datasets. In particular, we focus on the effects of read alignment, sequencing coverage, and variant allele depth on the detection and genotyping of SVs of different types and size ranges and provide insights into precision and recall of SV callsets generated by integrating the various long-read aligners and SV callers. The computational pipeline we propose is publicly available at https://github.com/davidebolo1993/EViNCe and can be adjusted to further evaluate future nanopore sequencing datasets.
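
The precision and recall mentioned in this abstract can be illustrated with a minimal sketch. It assumes that calls have already been matched against a truth set, so that only the three counts are needed; the function name and the toy counts are hypothetical and are not taken from the EViNCe pipeline or the study's results.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Summarise an SV callset against a truth set.

    precision = TP / (TP + FP): fraction of reported calls that are real.
    recall    = TP / (TP + FN): fraction of real variants that were reported.
    F1 is the harmonic mean of the two.
    """
    called = true_positives + false_positives
    truth = true_positives + false_negatives
    precision = true_positives / called if called else 0.0
    recall = true_positives / truth if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy counts (illustration only, not results from the report)
precision, recall, f1 = precision_recall_f1(
    true_positives=90, false_positives=10, false_negatives=30
)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

How a call is matched to a truth variant (breakpoint tolerance, SV type, size similarity) is a separate choice that this sketch deliberately leaves out.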


2017 ◽  
Vol 2 ◽  
pp. 23 ◽  
Author(s):  
Jean-Michel Carter ◽  
Shobbir Hussain

Background: The ability to obtain long read lengths during DNA sequencing has several potentially important practical applications. Especially long read lengths have been reported using the Nanopore sequencing method, currently commercially available from Oxford Nanopore Technologies (ONT). However, early reports have demonstrated only limited levels of combined throughput and sequence accuracy. Recently, ONT released a new CsgG pore sequencing system as well as a 250b/s translocation chemistry with potential for improvements. Methods: We made use of such components on ONT's miniature ‘MinION’ device and sequenced native genomic DNA obtained from the near haploid cancer cell line HAP1. Analysis of our data was performed utilising recently described computational tools tailored for nanopore/long-read sequencing outputs, and here we present our key findings. Results: From a single sequencing run, we obtained ~240,000 high-quality mapped reads, comprising a total of ~2.3 billion bases. A mean read length of 9.6kb and an N50 of ~17kb were achieved, while sequences mapped to reference with a mean identity of 85%. Notably, we obtained ~68X coverage of the mitochondrial genome and were able to achieve a mean consensus identity of 99.8% for sequenced mtDNA reads. Conclusions: With improved sequencing chemistries already released and higher-throughput instruments in the pipeline, this early study suggests that ONT CsgG-based sequencing may be a useful option for potential practical long-read applications.
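
The read N50 quoted alongside the mean read length can be computed as in the following minimal sketch. It uses the common definition (the length L such that reads of length at least L contain at least half of all sequenced bases); the function name and the toy read lengths are hypothetical and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths.

    Reads are sorted from longest to shortest and their lengths are
    accumulated; the N50 is the length of the read at which the running
    total first reaches half of all bases.
    """
    if not lengths:
        raise ValueError("need at least one read length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy read lengths in bases (illustration only)
toy_lengths = [2_000, 4_000, 9_000, 17_000, 20_000]
mean_length = sum(toy_lengths) / len(toy_lengths)
print(f"mean={mean_length:.0f} N50={n50(toy_lengths)}")
```

Because long reads contribute more bases, the N50 is typically larger than the mean read length, which is consistent with the abstract reporting an N50 well above the mean.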


2020 ◽  
Author(s):  
Kaushik Panda ◽  
R. Keith Slotkin

Abstract High-quality transcript-based annotations of genes facilitate both genome-wide analyses and detailed single-locus research. In contrast, transposable element (TE) annotations are rudimentary, consisting only of information on the location and type of TE. When analyzing TEs, their repetitiveness and limited annotation make it impossible to distinguish between potentially functional expressed elements and degraded copies. To improve genome-wide TE bioinformatics, we performed long-read Oxford Nanopore sequencing of cDNAs from Arabidopsis lines deficient in multiple layers of TE repression. We used these uniquely mapping transcripts to identify the set of TEs able to generate mRNAs, and created a new transcript-based annotation of TEs that we have layered upon the existing high-quality community standard TAIR10 annotation. The improved annotation enables us to test specific standing hypotheses in the TE field. We demonstrate that inefficient TE splicing does not trigger small RNA production, and that the cell more strongly targets DNA methylation to TEs that have the potential to make mRNAs. This work provides a transcript-based TE annotation for Arabidopsis, and serves as a blueprint to reduce the genomic complexity associated with repetitive TEs in any organism.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yang Liu ◽  
Wojciech Rosikiewicz ◽  
Ziwei Pan ◽  
Nathaniel Jillette ◽  
Ping Wang ◽  
...  

Abstract Background: Nanopore long-read sequencing technology greatly expands the capacity of long-range, single-molecule DNA-modification detection. A growing number of analytical tools have been developed to detect DNA methylation from nanopore sequencing reads. Here, we assess the performance of different methylation-calling tools to provide a systematic evaluation to guide researchers performing human epigenome-wide studies. Results: We compare seven analytic tools for detecting DNA methylation from nanopore long-read sequencing data generated from human natural DNA at a whole-genome scale. We evaluate the per-read and per-site performance of CpG methylation prediction across different genomic contexts, CpG site coverage, and computational resources consumed by each tool. The seven tools exhibit different performances across the evaluation criteria. We show that the methylation prediction at regions with discordant DNA methylation patterns, intergenic regions, low CG density regions, and repetitive regions shows room for improvement across all tools. Furthermore, we demonstrate that 5hmC levels at least partly contribute to the discrepancy between bisulfite and nanopore sequencing. Lastly, we provide an online DNA methylation database (https://nanome.jax.org) to display the DNA methylation levels detected by nanopore sequencing and bisulfite sequencing data across different genomic contexts. Conclusions: Our study is the first systematic benchmark of computational methods for detection of mammalian whole-genome DNA modifications in nanopore sequencing. We provide a broad foundation for cross-platform standardization and an evaluation of analytical tools designed for genome-scale modified base detection using nanopore sequencing.


2018 ◽  
Vol 2 ◽  
pp. 23 ◽  
Author(s):  
Jean-Michel Carter ◽  
Shobbir Hussain

Background: The ability to obtain long read lengths during DNA sequencing has several potentially important practical applications. Especially long read lengths have been reported using the Nanopore sequencing method, currently commercially available from Oxford Nanopore Technologies (ONT). However, early reports have demonstrated only limited levels of combined throughput and sequence accuracy. Recently, ONT released a new CsgG pore sequencing system as well as a 250b/s translocation chemistry with potential for improvements. Methods: We made use of such components on ONT's miniature ‘MinION’ device and sequenced native genomic DNA obtained from the near haploid cancer cell line HAP1. Analysis of our data was performed utilising recently described computational tools tailored for nanopore/long-read sequencing outputs, and here we present our key findings. Results: From a single sequencing run, we obtained ~240,000 high-quality mapped reads, comprising a total of ~2.3 billion bases. A mean read length of 9.6kb and an N50 of ~17kb were achieved, while sequences mapped to reference with a mean identity of 85%. Notably, we obtained ~68X coverage of the mitochondrial genome and were able to achieve a mean consensus identity of 99.8% for sequenced mtDNA reads. Conclusions: With improved sequencing chemistries already released and higher-throughput instruments in the pipeline, this early study suggests that ONT CsgG-based sequencing may be a useful option for potential practical long-read applications with relevance to complex genomes.


2020 ◽  
Vol 22 (Supplement_3) ◽  
pp. iii406-iii406
Author(s):  
Julien Masliah-Planchon ◽  
Elodie Girard ◽  
Philipp Euskirchen ◽  
Christine Bourneix ◽  
Delphine Lequin ◽  
...  

Abstract Medulloblastoma (MB) can be classified into four molecular subgroups (WNT group, SHH group, group 3, and group 4). The gold standard for assigning the molecular subgroup through DNA methylation profiling uses the Illumina EPIC array. However, this tool is limited by its cost and turnaround time, which make it difficult to obtain results soon enough for clinical use. We present an alternative DNA methylation assay based on nanopore sequencing that allows rapid, cheaper, and reliable subgrouping of clinical MB samples. Low-depth whole-genome long-read single-molecule nanopore sequencing was used to simultaneously assess the copy number profile and MB subgrouping based on DNA methylation. The DNA methylation data generated by nanopore sequencing were compared to a publicly available reference cohort comprising over 2,800 brain tumors including the four subgroups of MB (Capper et al. Nature; 2018) to generate a score that estimates the confidence of a tumor group assignment. All 24 MB analyzed with nanopore sequencing (six WNT, nine SHH, five group 3, and four group 4) were classified in the appropriate subgroup established by expression-based Nanostring subgrouping. In addition to the subgrouping, we also examined the genomic profile: all previously identified clinically relevant genomic rearrangements (mostly MYC and MYCN amplifications) were also detected with our assay. In conclusion, we confirm the full reliability of nanopore sequencing as a novel rapid and cheap assay for methylation-based MB subgrouping. We now plan to extend this technology to other embryonal tumors of the central nervous system.


Author(s):  
Karlijn Doorenspleet ◽  
Lara Jansen ◽  
Saskia Oosterbroek ◽  
Oscar Bos ◽  
Pauline Kamermans ◽  
...  

To monitor the effect of nature restoration projects in North Sea ecosystems, accurate and intensive biodiversity assessments are vital. DNA-based techniques, and especially environmental DNA (eDNA) metabarcoding from seawater, are becoming a powerful monitoring tool. However, current approaches are based on genetic target regions of <500 nucleotides, which offer limited taxonomic resolution. This study aims to develop and validate a long-read nanopore sequencing method for eDNA that enables improved identification of fish species. We designed a universal primer pair targeting a 2kb region covering the 12S and 16S rRNA genes of fish mitochondria. eDNA was amplified and sequenced using the Oxford Nanopore MinION. Sequence data were processed using the new pipeline Decona, and accurate consensus identities of above 99.9% were retrieved. The primer set efficiency was tested with eDNA from a 3,000,000 L zoo aquarium with 31 species of bony fish and elasmobranchs. Over 55% of the species present were identified at species level and over 75% at genus level. Next, our long-read eDNA metabarcoding approach was applied to North Sea eDNA field samples collected at shipwreck sites, the Gemini Offshore Wind Farm, the Borkum Reef Grounds and a bare sand bottom. Here, location-specific fish and vertebrate communities were obtained. Incomplete reference databases still form a major bottleneck in further developing high-resolution long-read metabarcoding. Yet, the method has great potential for rapid and accurate fish species monitoring in marine field studies.


Author(s):  
Huan Zhong ◽  
Zongwei Cai ◽  
Zhu Yang ◽  
Yiji Xia

Abstract NAD tagSeq has recently been developed for the identification and characterization of NAD+-capped RNAs (NAD-RNAs). This method adopts a strategy of chemo-enzymatic reactions to label the NAD-RNAs with a synthetic RNA tag before subjecting them to Oxford Nanopore direct RNA sequencing. A computational tool designed for analyzing the sequencing data of tagged RNA will facilitate the broader application of this method. Hence, we introduce TagSeqTools as a flexible, general pipeline for the identification and quantification of tagged RNAs (i.e., NAD+-capped RNAs) using long-read transcriptome sequencing data generated by the NAD tagSeq method. TagSeqTools comprises two major modules: TagSeek, for differentiating tagged and untagged reads, and TagSeqQuant, for quantification and further characterization of genes and isoforms. The pipeline also integrates some advanced functions to identify antisense transcripts or splicing, and supports data reformatting for visualization. TagSeqTools therefore provides a convenient and comprehensive workflow for researchers to analyze the data produced by the NAD tagSeq method or other tagging-based experiments using Oxford Nanopore direct RNA sequencing. The pipeline is available at https://github.com/dorothyzh/TagSeqTools, under Apache License 2.0.


2018 ◽  
Vol 5 (suppl_1) ◽  
pp. S364-S364
Author(s):  
Roby Bhattacharyya ◽  
Alejandro Pironti ◽  
Bruce J Walker ◽  
Abigail Manson ◽  
Virginia Pierce ◽  
...  

Abstract Background: Carbapenem-resistant Enterobacteriaceae (CRE) are a major public health threat. We report four clonally related Citrobacter freundii isolates harboring the blaKPC-3 carbapenemase in April–May 2017 that are nearly identical to a strain from 2014 at the same institution. Despite differing by ≤5 single nucleotide polymorphisms (SNPs), these isolates exhibited dramatic differences in carbapenemase plasmid architecture. Methods: We sequenced four carbapenem-resistant C. freundii isolates from 2017 and compared them with an ongoing CRE surveillance project at our institution. SNPs were identified from Illumina MiSeq data aligned to a reference genome using the variant caller Pilon. Plasmids were assembled from Illumina and Oxford Nanopore sequencing data using Unicycler. Results: The four 2017 isolates differed from one another by 0–5 chromosomal SNPs; two were identical. With one exception, these isolates differed by >38,000 SNPs from 25 C. freundii isolates sequenced from 2013 to 2017 at the same institution for CRE surveillance. The exception was a 2014 isolate that differed by 13–16 SNPs from each 2017 isolate, with 13 SNPs common to all four. Each C. freundii isolate harbored wild-type blaKPC-3. Despite the close relationship among the 2017 cluster, the plasmids harboring the blaKPC-3 genes differed dramatically: the carbapenemase occurred in one of the two different plasmids, with rearrangements between these plasmids across isolates. The related 2014 isolate harbored both plasmids, each with a separate copy of blaKPC-3. No transmission chains were found between any of the affected patients. Conclusion: WGS confirmed clonality among four contemporaneous blaKPC-3-containing C. freundii isolates, and marked similarity with a 2014 isolate, within an institution. That only 13–16 SNPs varied between the 2014 and 2017 isolates suggests durable persistence of the blaKPC-3 gene within this lineage in a hospital ecosystem. The plasmids harboring these carbapenemase genes proved remarkably plastic, with plasmid loss and rearrangements occurring on the same time scale as two to three chromosomal point mutations. Combining short and long-read sequencing in a case cluster uniquely revealed unexpectedly rapid dynamics of carbapenemase plasmids, providing critical insight into their manner of spread. Disclosures: M. J. Ferraro, SeLux Diagnostics: Scientific Advisor and Shareholder, Consulting fee. D. C. Hooper, SeLux Diagnostics: Scientific Advisor, Consulting fee.

