Identifying and correcting repeat-calling errors in nanopore sequencing of telomeres

Nanopore long-read genome sequencing is emerging as a potential approach for studying genomes, including long repetitive elements such as telomeres. Here, we report extensive basecalling-induced errors at telomere repeats across nanopore datasets, sequencing platforms, basecallers, and basecalling models. We found that telomeres, which are represented by (TTAGGG)n and (CCCTAA)n repeats in many organisms, were frequently miscalled (~40-50% of reads) as (TTAAAA)n, or as (CTTCTT)n and (CCCTGG)n repeats, respectively, in a strand-specific manner during nanopore sequencing. We showed that this miscalling is likely caused by the high similarity of current profiles between telomeric repeats and these repeat artefacts, leading to mis-assignment of electrical current profiles during basecalling. We further demonstrated that tuning nanopore basecalling models and selectively applying the tuned models to telomeric reads improved recovery and analysis of telomeric regions, with little detected negative impact on basecalling of other genomic regions. Our study thus highlights the importance of verifying nanopore basecalls in long, repetitive, and poorly defined regions of the genome, and showcases how such artefacts in regions like telomeres can potentially be resolved by improvements in nanopore basecalling models.
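The repeat patterns named in this abstract can be screened for directly in basecalled reads. The following is a minimal sketch, not the authors' pipeline: the function names, the minimum run length of four copies, and the 20% coverage threshold are illustrative assumptions.

```python
import re

# Canonical telomeric repeats and the artefact repeats named in the abstract.
# G-strand: TTAGGG miscalled as TTAAAA; C-strand: CCCTAA miscalled as CTTCTT or CCCTGG.
MOTIFS = {
    "TTAGGG": "telomere_G",
    "CCCTAA": "telomere_C",
    "TTAAAA": "artefact_G",
    "CTTCTT": "artefact_C",
    "CCCTGG": "artefact_C",
}


def repeat_fractions(read, min_copies=4):
    """Fraction of the read covered by tandem runs of each motif class.

    min_copies is a hypothetical cutoff for what counts as a tandem run.
    """
    read = read.upper()
    covered = {label: 0 for label in set(MOTIFS.values())}
    for motif, label in MOTIFS.items():
        # Match min_copies or more back-to-back copies of the motif.
        for m in re.finditer(f"(?:{motif}){{{min_copies},}}", read):
            covered[label] += m.end() - m.start()
    n = max(len(read), 1)
    return {label: bases / n for label, bases in covered.items()}


def classify(read, threshold=0.2):
    """Label a read; threshold is an assumed coverage cutoff, not a published value."""
    fr = repeat_fractions(read)
    if fr["artefact_G"] + fr["artefact_C"] >= threshold:
        return "suspect_miscall"
    if fr["telomere_G"] + fr["telomere_C"] >= threshold:
        return "telomeric"
    return "other"
```

Reads flagged as `suspect_miscall` would be candidates for re-basecalling with a tuned model, in the spirit of the selective approach the abstract describes.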

How Long Are Long Tandem Repeats? A Challenge for Current Methods of Whole-Genome Sequence Assembly: The Case of Satellites in Caenorhabditis elegans

Genes ◽

10.3390/genes9100500 ◽

2018 ◽

Vol 9 (10) ◽

pp. 500

Author(s):

Juan A. Subirana ◽

Xavier Messeguer

Keyword(s):

Caenorhabditis Elegans ◽

Sanger Sequencing ◽

Tandem Repeats ◽

Whole Genome Sequence ◽

Nanopore Sequencing ◽

Original Sequence ◽

Genome Sequence Assembly ◽

Long Read ◽

Genomic Regions ◽

Caenorhabditis Elegans Genome

Repetitive genome regions have been difficult to sequence, mainly because of the comparatively small size of the fragments used in assembly. Satellites or tandem repeats are very abundant in nematodes and offer an excellent playground to evaluate different assembly methods. Here, we compare the structure of satellites found in three different assemblies of the Caenorhabditis elegans genome: the original sequence obtained by Sanger sequencing, an assembly based on PacBio technology, and an assembly using Nanopore sequencing reads. In general, satellites were found in equivalent genomic regions, but the new long-read methods (PacBio and Nanopore) tended to result in longer assembled satellites. Important differences exist between the assemblies resulting from the two long-read technologies, such as the sizes of long satellites. Our results also suggest that the lengths of some annotated genes with internal repeats which were assembled using Sanger sequencing are likely to be incorrect.

Nanopore sequencing provides rapid and reliable insight into microbial profiles of Intensive Care Units

10.1101/2021.05.14.444165 ◽

2021 ◽

Author(s):

Guilherme Marcelino Viana de Siqueira ◽

Felipe Marcelo Pereira-dos-Santos ◽

Rafael Silva-Rocha ◽

Maria-Eugenia Guazzaroni

Keyword(s):

Intensive Care ◽

Intensive Care Units ◽

Nanopore Sequencing ◽

Accurate Identification ◽

Complex Samples ◽

Healthcare Settings ◽

Long Read ◽

Single Use ◽

Sequencing Platforms ◽

Insight Into

Fast and accurate identification of pathogens is an essential task in healthcare settings. Next-generation sequencing platforms such as Illumina have greatly expanded the capacity with which different organisms can be detected in hospital samples, and third-generation nanopore-driven sequencing devices such as Oxford Nanopore's MinION have recently emerged as ideal sequencing platforms for routine healthcare surveillance due to their long-read capacity and high portability. Despite this great potential, protocols and analysis pipelines for nanopore sequencing are still being extensively validated. In this work, we assess the ability of nanopore sequencing to provide reliable community profiles based on 16S rRNA sequencing in comparison to traditional Illumina platforms, using samples collected from Intensive Care Units in a hospital in Brazil. While our results indicate that lower throughput may be a shortcoming of the method for more complex samples, we show that single-use Flongle flow cells in nanopore sequencing runs can provide insightful information on community composition in healthcare settings.

Long-Read Sequencing of the Zebrafish Genome Reorganizes Genomic Architecture

10.1101/2021.08.27.457855 ◽

2021 ◽

Author(s):

Yelena Chernyavskaya ◽

Xiaofei Zhang ◽

Jinze Liu ◽

Jessica S. Blackburn

Keyword(s):

Low Complexity ◽

Zebrafish Genome ◽

Nanopore Sequencing ◽

Sequencing Technology ◽

Short Read ◽

Short Read Sequencing ◽

Genomic Landscape ◽

Long Reads ◽

Long Read ◽

Sequencing Platforms

Nanopore sequencing technology has revolutionized the field of genome biology with its ability to generate extra-long reads that can resolve regions of the genome that were previously inaccessible to short-read sequencing platforms. Although long-read sequencing has been used to resolve several vertebrate genomes, a nanopore-based zebrafish assembly has not yet been released. Over 50% of the zebrafish genome consists of difficult-to-map, highly repetitive, low-complexity elements that pose inherent problems for short-read sequencers and assemblers. We used nanopore sequencing to improve upon and resolve the issues plaguing the current zebrafish reference assembly (GRCz11). Our long-read assembly improved the current resolution of the reference genome by identifying 1,697 novel insertions and deletions over 1 kb in length and placing 106 previously unlocalized scaffolds. We also discovered additional sites of retrotransposon integration previously unreported in GRCz11 and observed their expression in adult zebrafish under physiologic conditions, implying they have active mobility in the zebrafish genome and contribute to the ever-changing genomic landscape.

Cas9 targeted enrichment of mobile elements using nanopore sequencing

10.1101/2021.02.10.430605 ◽

2021 ◽

Author(s):

Torrin L. McDonald ◽

Weichen Zhou ◽

Christopher Castro ◽

Camille Mumm ◽

Jessica A. Switzenberg ◽

...

Keyword(s):

Genetic Disorders ◽

Mobile Element ◽

Flow Cell ◽

Read Length ◽

Nanopore Sequencing ◽

Short Read Sequencing ◽

Human Genomes ◽

Long Read ◽

Genomic Regions ◽

Targeted Enrichment

Abstract Mobile element insertions (MEIs) are highly repetitive genomic sequences that contribute to inter- and intra-individual genetic variation and can lead to genetic disorders. Targeted and whole-genome approaches using short-read sequencing have been developed to identify reference and non-reference MEIs; however, the read length hampers detection of these elements in complex genomic regions. Here, we pair Cas9 targeted nanopore sequencing with computational methodologies to capture active MEIs in human genomes. We demonstrate parallel enrichment for distinct classes of MEIs, averaging 44% of reads on targeted signals. We show an individual flow cell can recover a remarkable fraction of MEIs (97% L1Hs, 93% AluYb, 51% AluYa, 99% SVA_F, and 65% SVA_E). We identify twenty-one non-reference MEIs in GM12878 overlooked by modern, long-read analysis pipelines, primarily in repetitive genomic regions. This work introduces the utility of nanopore sequencing for MEI enrichment and lays the foundation for rapid discovery of elusive, repetitive genetic elements.

Accurate profiling of forensic autosomal STRs using the Oxford Nanopore Technologies MinION device

10.1101/2021.07.01.450747 ◽

2021 ◽

Author(s):

Courtney L. Hall ◽

Rupesh K. Kesharwani ◽

Nicole R. Phillips ◽

John V. Planz ◽

Fritz J. Sedlazeck ◽

...

Keyword(s):

Specific Method ◽

Nanopore Sequencing ◽

Autosomal Strs ◽

Str Typing ◽

Str Loci ◽

Oxford Nanopore ◽

Long Read ◽

Flanking Region ◽

Sequencing Platforms ◽

Oxford Nanopore Technologies

The high variability characteristic of short tandem repeat (STR) markers is harnessed for human identification in forensic genetic analyses. Despite the power and reliability of current typing techniques, sequence-level information both within and around STRs is masked in the length-based profiles generated. Forensic STR typing using next-generation sequencing (NGS) has therefore gained attention as an alternative to traditional capillary electrophoresis (CE) approaches. In this proof-of-principle study, we evaluate the forensic applicability of the newest and smallest NGS platform available, the Oxford Nanopore Technologies (ONT) MinION device. Although nanopore sequencing on the handheld MinION offers numerous advantages, including on-site sample processing, the relatively high error rate and lack of forensic-specific analysis software have prevented accurate profiling across STR panels in previous studies. Here we present STRspy, a streamlined method capable of producing length- and sequence-based STR allele designations from noisy, long-read data. To demonstrate the capabilities of STRspy, seven reference samples (female: n = 2; male: n = 5) were amplified at 15 and 30 PCR cycles using the Promega PowerSeq 46GY System and sequenced on the ONT MinION device in triplicate. Basecalled reads were processed with STRspy using a custom database containing alleles reported in the STRSeq BioProject NIST 1036 dataset. Resultant STR allele designations and flanking region single nucleotide polymorphism (SNP) calls were compared to the manufacturer-validated genotypes for each sample. STRspy generated robust and reliable genotypes across all autosomal STR loci amplified with 30 PCR cycles, achieving 100% concordance based on both length and sequence. Furthermore, we were able to identify flanking region SNPs with >90% accuracy. These results demonstrate that nanopore sequencing platforms are capable of revealing additional variation in and around STR loci depending on read coverage. As the first long-read platform-specific method to successfully profile the entire panel of autosomal STRs amplified by a commercially available multiplex, STRspy significantly increases the feasibility of nanopore sequencing in forensic applications.
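To illustrate what a length-based STR allele designation means, the sketch below counts copies of a repeat unit in each read and takes a majority vote across reads. This is a simplified illustration and not how STRspy works; the function names and the "GATA" unit are hypothetical choices for the example.

```python
import re
from collections import Counter


def longest_repeat_run(seq, unit):
    """Number of copies in the longest uninterrupted tandem run of `unit`."""
    best = 0
    for m in re.finditer(f"(?:{re.escape(unit)})+", seq.upper()):
        best = max(best, (m.end() - m.start()) // len(unit))
    return best


def length_based_call(reads, unit):
    """Majority-vote repeat count across reads (ties resolved arbitrarily)."""
    counts = Counter(longest_repeat_run(r, unit) for r in reads)
    return counts.most_common(1)[0][0]
```

A majority vote over many reads is one simple way to tolerate the per-read errors the abstract mentions; sequence-based designations would additionally require comparing the repeat and flanking sequence against known alleles.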

Latest techniques to study DNA methylation

Essays in Biochemistry ◽

10.1042/ebc20190027 ◽

2019 ◽

Vol 63 (6) ◽

pp. 639-648 ◽

Cited By ~ 9

Author(s):

Quentin Gouil ◽

Andrew Keniry

Keyword(s):

Dna Methylation ◽

Bisulfite Sequencing ◽

Dna Degradation ◽

Regions Of Interest ◽

Nanopore Sequencing ◽

Short Read ◽

Long Read ◽

Base Modifications ◽

Genomic Regions ◽

Derivatives Of

Abstract Bisulfite sequencing is a powerful technique to detect 5-methylcytosine in DNA that has immensely contributed to our understanding of epigenetic regulation in plants and animals. Meanwhile, research on other base modifications, including 6-methyladenine and 4-methylcytosine that are frequent in prokaryotes, has been impeded by the lack of a comparable technique. Bisulfite sequencing also suffers from a number of drawbacks that are difficult to surmount, including DNA degradation, lack of specificity, and short reads with low sequence diversity. In this review, we explore the recent refinements to bisulfite sequencing protocols that enable targeting genomic regions of interest, detecting derivatives of 5-methylcytosine, and mapping single-cell methylomes. We then present the unique advantage of long-read sequencing in detecting base modifications in native DNA and highlight the respective strengths and weaknesses of PacBio and Nanopore sequencing for this application. Although analysing epigenetic data from long-read platforms remains challenging, the ability to detect various modified bases from a universal sample preparation, in addition to the mapping and phasing advantages of the longer read lengths, provides long-read sequencing with a decisive edge over short-read bisulfite sequencing for an expanding number of applications across kingdoms.

Cas9 targeted enrichment of mobile elements using nanopore sequencing

Nature Communications ◽

10.1038/s41467-021-23918-y ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Torrin L. McDonald ◽

Weichen Zhou ◽

Christopher P. Castro ◽

Camille Mumm ◽

Jessica A. Switzenberg ◽

...

Keyword(s):

Genetic Disorders ◽

Mobile Element ◽

Read Length ◽

Whole Genome ◽

Nanopore Sequencing ◽

Short Read Sequencing ◽

Human Genomes ◽

Long Read ◽

Genomic Regions ◽

Targeted Enrichment

Abstract Mobile element insertions (MEIs) are repetitive genomic sequences that contribute to genetic variation and can lead to genetic disorders. Targeted and whole-genome approaches using short-read sequencing have been developed to identify reference and non-reference MEIs; however, the read length hampers detection of these elements in complex genomic regions. Here, we pair Cas9-targeted nanopore sequencing with computational methodologies to capture active MEIs in human genomes. We demonstrate parallel enrichment for distinct classes of MEIs, averaging 44% of reads on targeted signals and exhibiting a 13.4-54x enrichment over whole-genome approaches. We show an individual flow cell can recover most MEIs (97% L1Hs, 93% AluYb, 51% AluYa, 99% SVA_F, and 65% SVA_E). We identify seventeen non-reference MEIs in GM12878 overlooked by modern, long-read analysis pipelines, primarily in repetitive genomic regions. This work introduces the utility of nanopore sequencing for MEI enrichment and lays the foundation for rapid discovery of elusive, repetitive genetic elements.

Assessment of Evolutionary Relationships for Prioritization of Myxobacteria for Natural Product Discovery

Microorganisms ◽

10.3390/microorganisms9071376 ◽

2021 ◽

Vol 9 (7) ◽

pp. 1376

Author(s):

Andrew Ahearne ◽

Hanan Albataineh ◽

Scot E. Dowd ◽

D. Cole Stevens

Keyword(s):

Comparative Genomics ◽

Natural Product ◽

Genome Sequencing ◽

High Similarity ◽

Taxonomic Assignment ◽

Natural Product Discovery ◽

16S Gene ◽

Long Read ◽

Traditional Approaches

Discoveries of novel myxobacteria have started to unveil the potentially vast phylogenetic diversity within the family Myxococcaceae and have brought about an updated approach to myxobacterial classification. While traditional approaches focused on morphology, 16S gene sequences, and biochemistry, modern methods including comparative genomics have provided a more thorough assessment of myxobacterial taxonomy. Herein, we utilize long-read genome sequencing for two myxobacteria previously classified as Archangium primigenium and Chondrococcus macrosporus, as well as four environmental myxobacteria newly isolated for this study. Average nucleotide identity and digital DNA–DNA hybridization scores from comparative genomics suggest that the strain previously classified as A. primigenium is instead a novel member of the genus Melittangium, that C. macrosporus is a potentially novel member of the genus Corallococcus with high similarity to Corallococcus exercitus, and that the four isolated myxobacteria include another novel Corallococcus species, a novel Pyxidicoccus species, a strain of Corallococcus exiguus, and a potentially novel Myxococcus species with high similarity to Myxococcus stipitatus. We assess the biosynthetic potential of each sequenced myxobacterium and suggest that genus-level conservation of biosynthetic pathways supports our preliminary taxonomic assignment. Altogether, we suggest that long-read genome sequencing benefits the classification of myxobacteria and improves determination of biosynthetic potential for prioritization of natural product discovery.

Integrative utility of long read sequencing-based whole genome analysis and phenotypic assay on differentiating isoniazid-resistant signature of Mycobacterium tuberculosis

Journal of Biomedical Science ◽

10.1186/s12929-021-00783-x ◽

2021 ◽

Vol 28 (1) ◽

Author(s):

Ming-Chih Yu ◽

Ching-Sheng Hung ◽

Chun-Kai Huang ◽

Cheng-Hui Wang ◽

Yu-Chih Liang ◽

...

Keyword(s):

Mycobacterium Tuberculosis ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Susceptibility Test ◽

Whole Genome ◽

Nanopore Sequencing ◽

Drug Resistant ◽

Clinical Scenarios ◽

Wide Range ◽

Long Read

Abstract Background: With the advancement of next-generation sequencing (NGS) technologies, whole-genome sequencing (WGS) has been deployed in a wide range of clinical scenarios. Rapid and accurate classification of drug-resistant Mycobacterium tuberculosis (MTB) would be advantageous in reducing the amplification of additional drug resistance and disease transmission. Methods: In this study, a long-read sequencing approach was applied to the whole-genome sequencing of clinical MTB clones with susceptibility test profiles, including isoniazid (INH) susceptible clones (n = 10) and INH resistant clones (n = 42) isolated from clinical specimens. Non-synonymous variants within the katG or inhA gene associated with INH resistance were identified using Nanopore sequencing coupled with a corresponding analytical workflow. Results: In total, 54 nucleotide variants within the katG gene and 39 variants within the inhA gene associated with INH resistance were identified. Based on consistency among the results of genotypic profiles, susceptibility tests, and minimal inhibitory concentrations, the high-INH resistance signature was estimated using the area under the receiver operating characteristic curve for the presence of Ser315Thr (AUC = 0.822) or Thr579Asn (AUC = 0.875). Conclusions: Taken together, we curated lists of coding variants associated with differential INH resistance using Nanopore sequencing, which may constitute an emerging platform for rapid and accurate identification of drug-resistant MTB clones.
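For readers unfamiliar with how an AUC is obtained for a single present/absent variant, the sketch below computes it as the probability that a randomly chosen resistant isolate scores higher than a randomly chosen susceptible one, counting ties as one half. The function name and any data passed to it are illustrative assumptions; this does not reproduce the study's analysis or its values.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5).

    pos_scores: marker values in the positive class (e.g. 1 = variant present).
    neg_scores: marker values in the negative class.
    """
    pairs = len(pos_scores) * len(neg_scores)
    if pairs == 0:
        raise ValueError("both classes need at least one observation")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / pairs
```

For a binary marker this quantity equals the mean of sensitivity and specificity, which is why a single variant can be summarised with one AUC value.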

NanoSplicer: Accurate identification of splice junctions using Oxford Nanopore sequencing

10.1101/2021.10.23.465402 ◽

2021 ◽

Author(s):

Yupei You ◽

Michael B. Clark ◽

Heejung Shim

Keyword(s):

Cancer Cell Line ◽

Splice Junction ◽

Electrical Current ◽

Lung Cancer Cell Line ◽

Nanopore Sequencing ◽

Lung Cancer Cell ◽

Accurate Identification ◽

Oxford Nanopore ◽

Long Read ◽

Splice Junctions

Motivation: Long-read sequencing methods have considerable advantages for characterising RNA isoforms. Oxford Nanopore sequencing records changes in electrical current when a nucleic acid traverses a pore. However, basecalling of this raw signal (known as a squiggle) is error prone, making it challenging to accurately identify splice junctions. Existing strategies include utilising matched short-read data and/or annotated splice junctions to correct nanopore reads, but these add expense or limit junctions to known (incomplete) annotations. Therefore, a method that could accurately identify splice junctions solely from nanopore data would have numerous advantages. Results: We developed "NanoSplicer" to identify splice junctions using raw nanopore signal (squiggles). For each splice junction, the observed squiggle is compared to candidate squiggles representing potential junctions to identify the correct candidate. Measuring squiggle similarity enables us to compute the probability of each candidate junction and find the most likely one. We tested our method using (1) synthetic mRNAs with known splice junctions and (2) biological mRNAs from a lung cancer cell line. The results from both datasets demonstrate that NanoSplicer improves splice junction identification, especially when the basecalling error rate near the splice junction is elevated. Our method is implemented in the software package NanoSplicer, available at https://github.com/shimlab/NanoSplicer.
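The idea of turning squiggle similarity into a probability for each candidate junction can be sketched as follows. This is a toy illustration and not NanoSplicer's model: the mean squared distance, the softmax normalisation, the `scale` parameter, and the requirement that signals have equal length are all assumptions made for the example.

```python
import math


def mean_squared_distance(a, b):
    """Mean squared difference between two equal-length signals."""
    if len(a) != len(b):
        raise ValueError("signals must have the same length in this sketch")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def candidate_probabilities(observed, candidates, scale=1.0):
    """Normalise similarity scores into probabilities over candidates.

    candidates maps a junction label to its expected signal; scale is a
    hypothetical noise parameter controlling how sharply scores separate.
    """
    scores = {
        name: -mean_squared_distance(observed, signal) / scale
        for name, signal in candidates.items()
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}
```

The most likely junction is then the label with the highest probability, for example `max(probs, key=probs.get)`.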