Rapid characterization of complex genomic regions using Cas9 enrichment and Nanopore sequencing

Abstract Long-read sequencing approaches have considerably improved the quality and contiguity of genome assemblies. Such platforms bear the potential to resolve even extremely complex regions, such as multigenic families and repetitive stretches of DNA. Deep sequencing coverage, however, is required to overcome low nucleotide accuracy, especially in regions with high homopolymer density, copy number variation, and sequence similarity, such as the MHC and KIR gene clusters of the immune system. Therefore, we have adapted a targeted enrichment protocol in combination with long-read sequencing to efficiently annotate complex genomic regions. Using Cas9 endonuclease activity, segments of the complex KIR gene cluster were enriched and sequenced on an Oxford Nanopore Technologies platform. This provided sufficient coverage to accurately resolve and phase highly complex KIR haplotypes. Our strategy facilitates rapid characterization of large and complex multigenic regions, including their epigenetic footprint, in multiple species, even in the absence of a reference genome.

Rapid Characterization of Complex Killer Cell Immunoglobulin-Like Receptor (KIR) Regions Using Cas9 Enrichment and Nanopore Sequencing

Frontiers in Immunology ◽

10.3389/fimmu.2021.722181 ◽

2021 ◽

Vol 12 ◽

Author(s):

Jesse Bruijnesteijn ◽

Marit van der Wiel ◽

Natasja G. de Groot ◽

Ronald E. Bontrop

Keyword(s):

Sequence Similarity ◽

Killer Cell ◽

Gene Clusters ◽

Oxford Nanopore ◽

Long Read ◽

Number Variation ◽

Rapid Characterization ◽

Multiple Species ◽

Genome Assemblies

Long-read sequencing approaches have considerably improved the quality and contiguity of genome assemblies. Such platforms bear the potential to resolve even extremely complex regions, such as multigenic immune families and repetitive stretches of DNA. Deep sequencing coverage, however, is required to overcome low nucleotide accuracy, especially in regions with high homopolymer density, copy number variation, and sequence similarity, such as the MHC and KIR gene clusters of the immune system. Therefore, we have adapted a targeted enrichment protocol in combination with long-read sequencing to efficiently annotate complex KIR gene regions. Using Cas9 endonuclease activity, segments of the KIR gene cluster were enriched and sequenced on an Oxford Nanopore Technologies platform. This provided sufficient coverage to accurately resolve and phase highly complex KIR haplotypes. Our strategy eliminates PCR-induced amplification errors, facilitates rapid characterization of large and complex multigenic regions, including their epigenetic footprint, and is applicable in multiple species, even in the absence of a reference genome.

Deciphering Neurodegenerative Diseases Using Long-Read Sequencing

Neurology ◽

10.1212/wnl.0000000000012466 ◽

2021

Author(s):

Yun Su ◽

Liyuan Fan ◽

Changhe Shi ◽

Tai Wang ◽

Huimin Zheng ◽

...

Keyword(s):

Neurodegenerative Diseases ◽

Single Molecule ◽

Direct Detection ◽

Gc Content ◽

Sequencing Technologies ◽

Oxford Nanopore ◽

Long Read ◽

Repeat Expansions ◽

Genomic Regions

Neurodegenerative diseases exhibit chronic progressive lesions in the central and peripheral nervous systems with unclear causes. The search for pathogenic mutations in human neurodegenerative diseases has benefited from massively parallel short-read sequencers. However, genomic regions, including repetitive elements, especially with high/low GC content, are far beyond the capability of conventional approaches. Recently, long-read single-molecule DNA sequencing technologies have emerged and enabled researchers to study genomes, transcriptomes, and metagenomes at unprecedented resolutions. The identification of novel mutations in unresolved neurodegenerative disorders, the characterization of causative repeat expansions, and the direct detection of epigenetic modifications on naive DNA by virtue of long-read sequencers will further expand our understanding of neurodegenerative diseases. In this paper, we review and compare two prevailing long-read sequencing technologies, Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), and discuss their applications in neurodegenerative diseases.

Genome assembly and characterization of a complex zfBED-NLR gene-containing disease resistance locus in Carolina Gold Select rice with Nanopore sequencing

10.1101/675678 ◽

2019 ◽

Author(s):

Andrew C. Read ◽

Matthew J. Moscou ◽

Aleksey V. Zimin ◽

Geo Pertea ◽

Rachel S. Meyer ◽

...

Keyword(s):

Disease Resistance ◽

Genome Assembly ◽

Sequence Similarity ◽

Rice Variety ◽

Gene Families ◽

Resistance Locus ◽

High Sequence Similarity ◽

Long Read ◽

Genomic Regions

Abstract Background: Long-read sequencing facilitates assembly of complex genomic regions. In plants, loci containing nucleotide-binding, leucine-rich repeat (NLR) disease resistance genes are an important example of such regions. NLR genes make up one of the largest gene families in plants and are often clustered, evolving via duplication, contraction, and transposition. We recently mapped the Xo1 locus for resistance to bacterial blight and bacterial leaf streak, found in the American heirloom rice variety Carolina Gold Select, to a region that in the Nipponbare reference genome is rich in NLR genes. Results: Toward identification of the Xo1 gene, we combined Nanopore and Illumina reads to generate a high-quality genome assembly for Carolina Gold Select. We identified 529 full or partial NLR genes and discovered, relative to the reference, an expansion of NLR genes at the Xo1 locus. One NLR gene at Xo1 has high sequence similarity to the cloned, functionally similar Xa1 gene. Both harbor an integrated zfBED domain and near-identical, tandem, C-terminal repeats. Across diverse Oryzeae, we identified two sub-clades of such NLR genes, varying in the presence of the zfBED domain and the number of repeats. Conclusions: Whole genome sequencing combining Nanopore and Illumina reads effectively resolves NLR gene loci, providing context as well as content. Our identification of an Xo1 candidate is an important step toward mechanistic characterization, including the role(s) of the zfBED domain. Further, the Carolina Gold Select genome assembly will facilitate identification and exploitation of other useful traits in this historically important rice variety.

Hapo-G, haplotype-aware polishing of genome assemblies with accurate reads

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab034 ◽

2021 ◽

Vol 3 (2) ◽

Author(s):

Jean-Marc Aury ◽

Benjamin Istace

Keyword(s):

Single Molecule ◽

Direct Consequence ◽

High Quality ◽

Sequencing Errors ◽

Coding Regions ◽

Sequencing Technologies ◽

Long Reads ◽

Oxford Nanopore ◽

Long Read ◽

Genome Assemblies

Abstract Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and then, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions, where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture of the two haplotypes and can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; generally, they inspect nucleotides one by one and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes, and they typically switch from one haplotype to another. Here we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long reads) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.

Dual Isoform Sequencing Reveals a Multifaceted Transcriptional Architecture of a Prototype Baculovirus

10.21203/rs.3.rs-637036/v1 ◽

2021 ◽

Author(s):

Gábor Torma ◽

Dóra Tombácz ◽

Norbert Moldován ◽

Ádám Fülöp ◽

István Prazsák ◽

...

Keyword(s):

Protein Coding ◽

Rna Molecules ◽

Non Coding Rna ◽

Oxford Nanopore ◽

The Pacific ◽

Viral Genes ◽

Long Read ◽

Oxford Nanopore Technologies ◽

Overlapping Transcripts

Abstract In this study, we used two long-read sequencing (LRS) techniques, Sequel from Pacific Biosciences and MinION from Oxford Nanopore Technologies, for the transcriptional characterization of a prototype baculovirus, Autographa californica multiple nucleopolyhedrovirus. LRS is able to read full-length RNA molecules, and thereby to distinguish between transcript isoforms, mono- and polycistronic RNAs, and overlapping transcripts. Altogether, we detected 875 transcripts, of which 759 are novel and 116 have been annotated previously. These RNA molecules include 41 novel putative protein-coding transcripts (each containing 5’-truncated in-frame ORFs), 14 monocistronic transcripts, 99 multicistronic RNAs, 101 non-coding RNAs, and 504 length isoforms. We also detected RNA methylation in 12 viral genes and RNA hyper-editing in the longer 5’-UTR transcript isoform of the ORF 19 gene.

Microbial diversity characterization of seawater in a pilot study using Oxford Nanopore Technologies long-read sequencing

10.21203/rs.3.rs-17068/v2 ◽

2020 ◽

Author(s):

Michael Liem ◽

Tonny Regensburg-Tuïnk ◽

Christiaan Henkel ◽

Hans Jansen ◽

Herman Spaink

Keyword(s):

Microbial Diversity ◽

Environmental Samples ◽

Sea Water ◽

Flow Cells ◽

Oxford Nanopore ◽

Challenging Tasks ◽

Long Read ◽

Close Relatives ◽

Oxford Nanopore Technologies

Abstract Objective: Currently, the majority of non-culturable microbes in sea water are yet to be discovered. Nanopore sequencing offers a solution to the challenging task of identifying the genomes and complex composition of oceanic microbiomes. In this study we evaluate the utility of Oxford Nanopore Technologies (ONT) sequencing to characterize microbial diversity in seawater from multiple locations. We compared the microbial species diversity of retrieved environmental samples from two different locations and time points. Results: With only three ONT flow cells we were able to identify thousands of organisms, including bacteriophages, a large part of them at species level. It was possible to assemble genomes from environmental samples with Flye. In several cases this resulted in >1 Mbp contigs, and in the particular case of a Thioglobus singularis species it even produced a near-complete genome. k-mer analysis reveals that a large part of the data represents species of which close relatives have not yet been deposited in the database. These results show that our approach is suitable for scalable genomic investigations such as monitoring oceanic biodiversity, and that it provides a new platform for education in biodiversity.

TagSeqTools: a flexible and comprehensive analysis pipeline for NAD tagSeq data

10.1101/2020.03.09.982934 ◽

2020 ◽

Cited By ~ 1

Author(s):

Huan Zhong ◽

Zongwei Cai ◽

Zhu Yang ◽

Yiji Xia

Keyword(s):

Rna Sequencing ◽

Comprehensive Analysis ◽

Enzymatic Reactions ◽

Computational Tool ◽

Sequencing Data ◽

Analysis Pipeline ◽

Oxford Nanopore ◽

Long Read ◽

Identification And Characterization

Abstract NAD tagSeq has recently been developed for the identification and characterization of NAD+-capped RNAs (NAD-RNAs). This method adopts a strategy of chemo-enzymatic reactions to label the NAD-RNAs with a synthetic RNA tag before subjecting them to Oxford Nanopore direct RNA sequencing. A computational tool designed for analyzing the sequencing data of tagged RNA will facilitate the broader application of this method. Hence, we introduce TagSeqTools as a flexible, general pipeline for the identification and quantification of tagged RNAs (i.e., NAD+-capped RNAs) using long-read transcriptome sequencing data generated by the NAD tagSeq method. TagSeqTools comprises two major modules: TagSeek, for differentiating tagged and untagged reads, and TagSeqQuant, for the quantitative and further characterization analysis of genes and isoforms. In addition, the pipeline integrates some advanced functions to identify antisense transcripts or splicing, and supports reformatting the data for visualization. TagSeqTools therefore provides a convenient and comprehensive workflow for researchers to analyze the data produced by the NAD tagSeq method or other tagging-based experiments using Oxford Nanopore direct RNA sequencing. The pipeline is available at https://github.com/dorothyzh/TagSeqTools, under Apache License 2.0.

Purification and characterization of a DNA-binding heterodimer of 52 and 100 kDa from HeLa cells

Biochemical Journal ◽

10.1042/bj2900267 ◽

1993 ◽

Vol 290 (1) ◽

pp. 267-272 ◽

Cited By ~ 36

Author(s):

W W Zhang ◽

L X Zhang ◽

R K Busch ◽

J Farrés ◽

H Busch

Keyword(s):

Dna Binding ◽

Hela Cells ◽

Topoisomerase Ii ◽

Sequence Similarity ◽

Peptide Sequence ◽

Upstream Region ◽

Cell Nuclei ◽

Sperm Dna ◽

Multiple Species

In studies of protein binding to the upstream region of the human proliferation-associated antigen p120 gene, a heterodimer of 52 and 100 kDa proteins was purified from HeLa cells. A 1:1 ratio of p52 and p100 was constant throughout the purification. The heterodimer was localized to cell nuclei, as shown by immunofluorescence. The pI values of the p52 and p100 were 7.8 and 8.6 respectively. The peptide sequences obtained for p52 (QSNKTFNLEKQNHTPRKKHQ and PLRGKQLRVRFAAHSASLTVR) and for p100 (PGGPKPGGGPGLSTPGGHPKPPHRGGGEPPRGRQ and GPGPGQSGPKPPIPPPPPHQQ) were not found in the computer databanks. One p52 peptide sequence, PLRGKQLRVRFA, shows considerable sequence similarity to a conserved motif in topoisomerase II of multiple species. The p52/100 heterodimer bound to different DNA probes. The binding was competed by poly(dI-dC), sonicated salmon sperm DNA, and circular or linearized plasmid DNA. The optimal DNA binding for the heterodimer was at pH 7-9 with low salt. The DNA-binding subunit of the heterodimer was the p100 polypeptide, as shown by u.v.-cross-linking assays and Southwestern blots.

Plasmidome analysis of carbapenem-resistant Enterobacteriaceae isolated in Vietnam

10.1101/2020.03.18.996710 ◽

2020 ◽

Author(s):

Aki Hirabayashi ◽

Koji Yahara ◽

Satomi Mitsuhashi ◽

So Nakagawa ◽

Tadashi Imanishi ◽

...

Keyword(s):

Carbapenem Resistance ◽

Genomic Epidemiology ◽

Carbapenem Resistant ◽

Oxford Nanopore ◽

Carbapenemase Gene ◽

Long Read ◽

Severe Infections ◽

Oxford Nanopore Technologies ◽

Carbapenem Resistant Enterobacteriaceae

Carbapenem-resistant Enterobacteriaceae (CRE) represent a serious threat to public health due to limited management of severe infections and high mortality. The rate of resistance of Enterobacteriaceae isolates to major antimicrobials, including carbapenems, is much higher in Vietnam than in Western countries, but the reasons remain unknown due to the lack of genomic epidemiology research. A previous study suggested that carbapenem resistance genes, such as the carbapenemase gene blaNDM-1, spread via plasmids among Enterobacteriaceae in Vietnam. In this study, we performed detection and molecular characterization of blaNDM-1-carrying plasmids in CRE isolated in Vietnam, and identified several possible cases of horizontal transfer of plasmids both within and among species of bacteria. Twenty-five carbapenem-resistant isolates from Enterobacteriaceae clinically isolated in a reference medical institution in Hanoi were sequenced on Illumina short-read sequencers, and 12 isolates harboring blaNDM-1 were sequenced on an Oxford Nanopore Technologies long-read sequencer to obtain complete plasmid sequences. Most of the plasmids co-carried genes conferring resistance to clinically relevant antimicrobials, including third-generation cephalosporins, aminoglycosides, and fluoroquinolones, in addition to blaNDM-1, leading to multidrug resistance of their bacterial hosts. These results provide insight into the genetic basis of CRE in Vietnam, and could help control nosocomial infections.

Microbial diversity characterization of seawater in a pilot study using Oxford Nanopore Technologies long-read sequencing

10.21203/rs.3.rs-17068/v1 ◽

2020 ◽

Author(s):

Michael Liem ◽

A.J.G. Regensburg-Tuïnk ◽

C.V. Henkel ◽

H.P. Spaink

Keyword(s):

Microbial Diversity ◽

Environmental Samples ◽

Sea Water ◽

Flow Cells ◽

Oxford Nanopore ◽

Challenging Tasks ◽

Long Read ◽

Close Relatives ◽

Oxford Nanopore Technologies

Abstract Objective: Currently, the majority of non-culturable microbes in sea water are yet to be discovered. Nanopore sequencing offers a solution to the challenging task of identifying the genomes and complex composition of oceanic microbiomes. In this study we evaluate the utility of Oxford Nanopore Technologies (ONT) sequencing to characterize microbial diversity in seawater from multiple locations. We compared the microbial species diversity of retrieved environmental samples from two different locations and time points. Results: With only three ONT flow cells we were able to identify thousands of organisms, including bacteriophages, a large part of them at species level. It was possible to assemble genomes from environmental samples with Flye. In several cases this resulted in >1 Mbp contigs, and in the particular case of a Thioglobus singularis species it even produced a near-complete genome. k-mer analysis reveals that a large part of the data represents species of which close relatives have not yet been deposited in the database. These results show that our approach is suitable for scalable genomic investigations such as monitoring oceanic biodiversity, and that it provides a new platform for education in biodiversity.
