The SEQC2 epigenomics quality control (EpiQC) study

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Jonathan Foox ◽  
Jessica Nordlund ◽  
Claudia Lalancette ◽  
Ting Gong ◽  
Michelle Lacey ◽  
...  

Abstract. Background: Cytosine modifications in DNA such as 5-methylcytosine (5mC) underlie a broad range of developmental processes, maintain cellular lineage specification, and can define or stratify types of cancer and other diseases. However, the wide variety of approaches available to interrogate these modifications has created a need for harmonized materials, methods, and rigorous benchmarking to improve genome-wide methylome sequencing applications in clinical and basic research. Here, we present a multi-platform assessment and cross-validated resource for epigenetics research from the FDA’s Epigenomics Quality Control Group. Results: Each sample is processed in multiple replicates by three whole-genome bisulfite sequencing (WGBS) protocols (TruSeq DNA methylation, Accel-NGS MethylSeq, and SPLAT), oxidative bisulfite sequencing (TrueMethyl), an enzymatic deamination method (EMSeq), targeted methylation sequencing (Illumina Methyl Capture EPIC), single-molecule long-read nanopore sequencing from Oxford Nanopore Technologies, and 850k Illumina methylation arrays. After rigorous quality assessment, comparison to Illumina EPIC methylation microarrays, and testing on a range of algorithms (Bismark, BitMapperBS, and bwa-meth), we find overall high concordance between assays, but also differences in efficiency of read mapping, CpG capture, coverage, and platform performance, and variable performance across 26 microarray normalization algorithms. Conclusions: The data provided herein can guide the use of these DNA reference materials in epigenomics research, as well as provide best practices for experimental design in future studies. By leveraging seven human cell lines that are designated as publicly available reference materials, these data can be used as a baseline to advance epigenomics research.
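As a rough illustration of what "concordance between assays" can mean in practice, the sketch below computes a Pearson correlation of per-CpG methylation fractions between two assays, restricted to sites that both assays cover adequately. The data layout (a dictionary keyed by chromosome and position), the function names, and the `min_coverage` threshold are assumptions made for this example; the abstract does not specify how concordance was measured in the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def assay_concordance(assay_a, assay_b, min_coverage=10):
    """Correlate methylation fractions at CpG sites shared by two assays.

    Each assay is a dict mapping (chrom, pos) -> (methylated_reads, total_reads).
    This layout and the min_coverage default are hypothetical choices for
    illustration only. Returns (correlation, number_of_shared_sites).
    """
    shared = sorted(
        site
        for site in assay_a.keys() & assay_b.keys()
        if assay_a[site][1] >= min_coverage and assay_b[site][1] >= min_coverage
    )
    fractions_a = [assay_a[s][0] / assay_a[s][1] for s in shared]
    fractions_b = [assay_b[s][0] / assay_b[s][1] for s in shared]
    return pearson(fractions_a, fractions_b), len(shared)
```

Filtering on coverage before correlating matters because methylation fractions estimated from a handful of reads are noisy, which would understate agreement between otherwise consistent assays.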

2021 ◽  
Author(s):  
Iacopo Bicci ◽  
Claudia Calabrese ◽  
Zoe J. Golder ◽  
Aurora Gomez-Duran ◽  
Patrick F Chinnery

Summary: Methylation on CpG residues is one of the most important epigenetic modifications of nuclear DNA, regulating gene expression. Methylation of mitochondrial DNA (mtDNA) has been studied using whole genome bisulfite sequencing (WGBS), but recent evidence has uncovered major technical issues which introduce a potential bias during methylation quantification. Here, we validate the technical concerns with WGBS, and then develop and assess the accuracy of a protocol for variant-specific methylation identification using long-read Oxford Nanopore Sequencing. Our approach circumvents mtDNA-specific confounders, while enriching for native full-length molecules over nuclear DNA. Variant calling analysis against Illumina deep re-sequencing showed that all expected mtDNA variants can be reliably identified. Methylation calling revealed negligible mtDNA methylation levels in multiple human primary and cancer cell lines. In conclusion, our protocol enables the reliable analysis of epigenetic modifications of mtDNA at the single-molecule level and at single-base resolution, with potential applications beyond methylation.
Motivation: Although whole genome bisulfite sequencing (WGBS) is the gold-standard approach to determine base-level CpG methylation in the nuclear genome, emerging technical issues raise questions about its reliability for evaluating mitochondrial DNA (mtDNA) methylation. Concerns include mtDNA strand asymmetry rendering the C-rich light strand disproportionately vulnerable to the chemical modifications introduced with WGBS. Also, short-read sequencing can result in a co-amplification of nuclear sequences originating from ancestral mtDNA with a high nucleotide similarity. Lastly, calling mtDNA alleles with varying proportions (heteroplasmy) is complicated by the C-to-T conversion introduced by WGBS on unmethylated CpGs. Here, we propose an alternative protocol to quantify methyl-CpGs in mtDNA, at single-molecule level, using Oxford Nanopore Sequencing (ONS). By optimizing the standard ONS library preparation, we achieved selective enrichment of native mtDNA and accurate single nucleotide variant and CpG methylation calling, thus overcoming previous limitations.
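The heteroplasmy concern raised above can be made concrete with a toy in-silico bisulfite conversion. The sketch below is a deliberately simplified model, not the authors' pipeline: it treats a single strand only, converts every unmethylated C to T, and ignores strand orientation, PCR, and incomplete conversion. The function name and the way methylated positions are passed in are assumptions made for this example.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Return seq with every unmethylated C read as T.

    methylated_positions holds 0-based indices of cytosines that are
    methylated and therefore protected from conversion (toy model).
    """
    return "".join(
        "T" if base == "C" and index not in methylated_positions else base
        for index, base in enumerate(seq)
    )


# A reference allele and a C>T variant allele at position 1.
reference_allele = "ACGT"
variant_allele = "ATGT"

# With no methylation, both alleles give the same converted read, so a
# genuine C>T variant cannot be told apart from an unmethylated C.
converted_reference = bisulfite_convert(reference_allele)
converted_variant = bisulfite_convert(variant_allele)
indistinguishable = converted_reference == converted_variant
```

In this toy case the two alleles collapse to the same read after conversion, which is why allele proportions at C/T sites are hard to estimate from bisulfite data alone. Native long-read sequencing avoids the problem because the bases are read without chemical conversion.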


2020 ◽  
Author(s):  
Jonathan Foox ◽  
Jessica Nordlund ◽  
Claudia Lalancette ◽  
Ting Gong ◽  
Michelle Lacey ◽  
...  

Abstract: Detection of DNA cytosine modifications such as 5-methylcytosine (5mC) and 5-hydroxy-methylcytosine (5hmC) is essential for understanding the epigenetic changes that guide development, cellular lineage specification, and disease. The wide variety of approaches available to interrogate these modifications has created a need for harmonized materials, methods, and rigorous benchmarking to improve genome-wide methylome sequencing applications in clinical and basic research. We present a multi-platform assessment and a global resource for epigenetics research from the FDA’s Epigenomics Quality Control (EpiQC) Group. The study design leverages seven human cell lines that are publicly available from the National Institute of Standards and Technology (NIST) and Genome in a Bottle (GIAB) consortium. These genomes were subject to a variety of genome-wide methylation interrogation approaches across six independent laboratories. Our primary focus was on cytosine modifications found in mammalian genomes (5mC, 5hmC). Each sample was processed in two or more technical replicates by three whole-genome bisulfite sequencing (WGBS) protocols (TruSeq DNA methylation, Accel-NGS, SPLAT), oxidative bisulfite sequencing (oxBS), Enzymatic Methyl-seq (EM-seq), Illumina EPIC targeted-methylation sequencing, and ATAC-seq. Each library was sequenced to high coverage on an Illumina NovaSeq 6000. The data were subject to rigorous quality assessment and subsequently compared to Illumina EPIC methylation microarrays. We provide a wide range of sequence data for commonly used genomics reference materials, as well as best practices for epigenomics research. These findings can serve as a guide for researchers to enable epigenomic analysis of cellular identity in development, health, and disease.


2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Jean-Marc Aury ◽  
Benjamin Istace

Abstract: Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and then, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture between the two haplotypes and can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; generally, they inspect the nucleotides one by one and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes and they typically switch from one haplotype to another. Herein we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long reads) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.


2019 ◽  
Author(s):  
Dóra Tombácz ◽  
Zsolt Balázs ◽  
Gábor Gulyás ◽  
Zsolt Csabai ◽  
Miklós Boldogkoi ◽  
...  

Abstract: Long-read sequencing (LRS) has become increasingly important in RNA research due to its strength in resolving complex transcriptomic architectures. In this regard, currently two LRS platforms have demonstrated adequate performance: the Single Molecule Real-Time Sequencing by Pacific Biosciences (PacBio) and the nanopore sequencing by Oxford Nanopore Technologies (ONT). Even though these techniques produce lower coverage and are more error prone than short-read sequencing, they continue to be more successful in identifying transcript isoforms including polycistronic and multi-spliced RNA molecules, as well as transcript overlaps. Recent reports have successfully applied LRS for the investigation of the transcriptome of viruses belonging to various families. These studies have substantially increased the number of previously known viral RNA molecules. In this work, we used the Sequel and MinION techniques from PacBio and ONT, respectively, to characterize the lytic transcriptome of the herpes simplex virus type 1 (HSV-1). In most samples, we analyzed the poly(A) fraction of the transcriptome, but we also performed random oligonucleotide-based sequencing. Besides cDNA sequencing, we also carried out native RNA sequencing. Our investigations identified more than 160 previously undetected transcripts, including coding and non-coding RNAs, multi-splice transcripts, as well as polycistronic and complex transcripts. Furthermore, we determined previously unsubstantiated transcriptional start sites, polyadenylation sites, and splice sites. A large number of novel transcriptional overlaps were also detected. Random-primed sequencing revealed that each convergent gene pair produces non-polyadenylated read-through RNAs overlapping the partner genes. Furthermore, we identified novel replication-associated transcripts overlapping the HSV-1 replication origins, and novel LAT variants with very long 5’ regions, which are co-terminal with the LAT-0.7kb transcript. Overall, our results demonstrated that the HSV-1 transcripts form an extremely complex pattern of overlaps, and that the entire viral genome is transcriptionally active. In most viral genes, if not in all, both DNA strands are expressed.


2020 ◽  
Vol 10 (4) ◽  
pp. 1193-1196
Author(s):  
Yoshinori Fukasawa ◽  
Luca Ermini ◽  
Hai Wang ◽  
Karen Carty ◽  
Min-Sin Cheung

We propose LongQC as an easy and automated quality control tool for genomic datasets generated by third generation sequencing (TGS) technologies such as Oxford Nanopore Technologies (ONT) and SMRT sequencing from Pacific Biosciences (PacBio). Key statistics were optimized for long read data, and LongQC covers all major TGS platforms. LongQC processes and visualizes those statistics automatically and quickly.


Plant Disease ◽  
2020 ◽  
Vol 104 (4) ◽  
pp. 1011-1012
Author(s):  
Stephen P. Cohen ◽  
Emily K. Luna ◽  
Jillian M. Lang ◽  
Janet Ziegle ◽  
Christine Chang ◽  
...  

The bacterial plant pathogen Xanthomonas hyacinthi is the causal agent of yellow disease of Hyacinthus and other ornamental plant genera. There is no available complete genome for X. hyacinthi, limiting basic research for this pathogen. Here, we release a high-quality complete genome sequence for the X. hyacinthi type strain, CFBP 1156. Single-molecule real-time (SMRT) sequencing with a mean coverage of 306× revealed two contigs of 4,918,645 and 44,381 bp in size. This was the first characterized plant-disease-causing species of Xanthomonas and this genome provides a resource to better understand the biology of yellow disease of hyacinth.


2019 ◽  
Author(s):  
Alejandro R. Gener

Abstract. Objective(s): To evaluate nanopore DNA sequencing for sequencing full-length HIV-1 provirus. Design: I used nanopore sequencing to sequence full-length HIV-1 from a plasmid (pHXB2). Methods: pHXB2 plasmid was processed with the Rapid PCR-Barcoding library kit and sequenced on the MinION sequencer (Oxford Nanopore Technologies, Oxford, UK). Raw fast5 reads were converted into fastq (base called) with Albacore, Guppy, and FlipFlop base callers. Reads were first aligned to the reference with BWA-MEM to evaluate sample coverage manually. Reads were then assembled with Canu into contigs, and contigs manually finished in SnapGene. Results: I sequenced full-length HXB2 HIV-1 from 5’ to 3’ LTR (100%), with median per-base coverage of over 9000x in one 12-barcoded experiment on a single MinION flow cell. The longest HIV-spanning read to date was generated, at a length of 11,487 bases, which included full-length HIV-1 and plasmid backbone on either side. At least 20 variants were discovered in pHXB2 compared to reference. Conclusions: The MinION sequencer performed as expected, covering full-length HIV. The discovery of variants in a dogmatic reference plasmid demonstrates the need for single-molecule sequence verification moving forward. These results illustrate the utility of long read sequencing to advance the study of HIV at single integration site resolution.


2018 ◽  
Author(s):  
Alexander Lim ◽  
Bryan Naidenov ◽  
Haley Bates ◽  
Karyn Willyerd ◽  
Timothy Snider ◽  
...  

Abstract: Disruptive innovations in long-range, cost-effective direct template nucleic acid sequencing are transforming clinical and diagnostic medicine. A multidrug resistant strain and a pan-susceptible strain of Mannheimia haemolytica, isolated from pneumonic bovine lung samples, were respectively sequenced at 146x and 111x coverage with Oxford Nanopore Technologies MinION. De novo assembly produced a complete genome for the non-resistant strain and a nearly complete assembly for the drug resistant strain. Functional annotation using RAST (Rapid Annotations using Subsystems Technology), CARD (Comprehensive Antibiotic Resistance Database) and ResFinder databases identified genes conferring resistance to different classes of antibiotics including beta lactams, tetracyclines, lincosamides, phenicols, aminoglycosides, sulfonamides and macrolides. Antibiotic resistance phenotypes of the M. haemolytica strains were confirmed with minimum inhibitory concentration (MIC) assays. The sequencing capacity of highly portable MinION devices was verified by sub-sampling sequencing reads; potential for antimicrobial resistance determined by identification of resistance genes in the draft assemblies with as little as 5,437 MinION reads corresponded to all classes of MIC assays. The resulting quality assemblies and AMR gene annotation highlight efficiency of ultra long-read, whole-genome sequencing (WGS) as a valuable tool in diagnostic veterinary medicine.


2019 ◽  
Author(s):  
Søren M. Karst ◽  
Ryan M. Ziels ◽  
Rasmus H. Kirkegaard ◽  
Emil A. Sørensen ◽  
Daniel McDonald ◽  
...  

Abstract: High-throughput amplicon sequencing of large genomic regions remains challenging for short-read technologies. Here, we report a high-throughput amplicon sequencing approach combining unique molecular identifiers (UMIs) with Oxford Nanopore Technologies or Pacific Biosciences CCS sequencing, yielding high accuracy single-molecule consensus sequences of large genomic regions. Our approach generates amplicon and genomic sequences of >10,000 bp in length with a mean error-rate of 0.0049-0.0006% and chimera rate <0.022%.
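The general idea behind UMI-based consensus calling can be sketched as follows: reads carrying the same UMI are assumed to derive from the same template molecule, so random sequencing errors can be voted out position by position. This is a minimal illustration only, assuming reads that are already trimmed and aligned to equal length; the function name and the `min_reads` threshold are hypothetical, and the published approach may use alignment and polishing steps that are not shown here.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, min_reads=3):
    """Build one majority-vote consensus sequence per UMI.

    reads is an iterable of (umi, sequence) pairs. UMIs supported by fewer
    than min_reads reads are skipped, since a vote over so few reads gives
    little error correction. Returns a dict mapping umi -> consensus.
    """
    groups = defaultdict(list)
    for umi, sequence in reads:
        groups[umi].append(sequence)

    consensus = {}
    for umi, sequences in groups.items():
        if len(sequences) < min_reads:
            continue
        length = len(sequences[0])
        if any(len(s) != length for s in sequences):
            raise ValueError("this sketch assumes equal-length, pre-aligned reads")
        # zip(*sequences) yields one column of bases per position.
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*sequences)
        )
    return consensus
```

Because errors in individual reads are largely independent, the per-position vote becomes more reliable as the number of reads per UMI grows, which is how consensus accuracy can far exceed the accuracy of any single long read.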

