Rapid Characterization of Complex Killer Cell Immunoglobulin-Like Receptor (KIR) Regions Using Cas9 Enrichment and Nanopore Sequencing

Long-read sequencing approaches have considerably improved the quality and contiguity of genome assemblies. Such platforms bear the potential to resolve even extremely complex regions, such as multigenic immune families and repetitive stretches of DNA. Deep sequencing coverage, however, is required to overcome low nucleotide accuracy, especially in regions with high homopolymer density, copy number variation, and sequence similarity, such as the MHC and KIR gene clusters of the immune system. Therefore, we have adapted a targeted enrichment protocol in combination with long-read sequencing to efficiently annotate complex KIR gene regions. Using Cas9 endonuclease activity, segments of the KIR gene cluster were enriched and sequenced on an Oxford Nanopore Technologies platform. This provided sufficient coverage to accurately resolve and phase highly complex KIR haplotypes. Our strategy eliminates PCR-induced amplification errors, facilitates rapid characterization of large and complex multigenic regions, including their epigenetic footprint, and is applicable in multiple species, even in the absence of a reference genome.

Rapid characterization of complex genomic regions using Cas9 enrichment and Nanopore sequencing

10.1101/2021.03.11.434935 ◽

2021 ◽

Author(s):

Jesse Bruijnesteijn ◽

Marit van der Wiel ◽

Natasja G. de Groot ◽

Ronald E. Bontrop

Keyword(s):

Sequence Similarity ◽

Gene Clusters ◽

Oxford Nanopore ◽

Long Read ◽

Number Variation ◽

Rapid Characterization ◽

Multiple Species ◽

Genomic Regions ◽

Genome Assemblies

Abstract Long-read sequencing approaches have considerably improved the quality and contiguity of genome assemblies. Such platforms bear the potential to resolve even extremely complex regions, such as multigenic families and repetitive stretches of DNA. Deep sequencing coverage, however, is required to overcome low nucleotide accuracy, especially in regions with high homopolymer density, copy number variation, and sequence similarity, such as the MHC and KIR gene clusters of the immune system. Therefore, we have adapted a targeted enrichment protocol in combination with long-read sequencing to efficiently annotate complex genomic regions. Using Cas9 endonuclease activity, segments of the complex KIR gene cluster were enriched and sequenced on an Oxford Nanopore Technologies platform. This provided sufficient coverage to accurately resolve and phase highly complex KIR haplotypes. Our strategy facilitates rapid characterization of large and complex multigenic regions, including their epigenetic footprint, in multiple species, even in the absence of a reference genome.

Hapo-G, haplotype-aware polishing of genome assemblies with accurate reads

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab034 ◽

2021 ◽

Vol 3 (2) ◽

Author(s):

Jean-Marc Aury ◽

Benjamin Istace

Keyword(s):

Single Molecule ◽

Direct Consequence ◽

High Quality ◽

Sequencing Errors ◽

Coding Regions ◽

Sequencing Technologies ◽

Long Reads ◽

Oxford Nanopore ◽

Long Read ◽

Genome Assemblies

Abstract Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and then, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture of the two haplotypes and can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; generally, they inspect the nucleotides one by one and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes and they typically switch from one haplotype to another. Herein we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long reads) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.

Dual Isoform Sequencing Reveals a Multifaceted Transcriptional Architecture of a Prototype Baculovirus

10.21203/rs.3.rs-637036/v1 ◽

2021 ◽

Author(s):

Gábor Torma ◽

Dóra Tombácz ◽

Norbert Moldován ◽

Ádám Fülöp ◽

István Prazsák ◽

...

Keyword(s):

Protein Coding ◽

Rna Molecules ◽

Non Coding Rna ◽

Oxford Nanopore ◽

The Pacific ◽

Viral Genes ◽

Long Read ◽

Oxford Nanopore Technologies ◽

Overlapping Transcripts

Abstract In this study, we used two long-read sequencing (LRS) techniques, Sequel from Pacific Biosciences and MinION from Oxford Nanopore Technologies, for the transcriptional characterization of a prototype baculovirus, Autographa californica multiple nucleopolyhedrovirus. LRS is able to read full-length RNA molecules, and thereby to distinguish between transcript isoforms, mono- and polycistronic RNAs, and overlapping transcripts. Altogether, we detected 875 transcripts, of which 759 are novel and 116 have been annotated previously. These RNA molecules include 41 novel putative protein-coding transcripts (each containing 5’-truncated in-frame ORFs), 14 monocistronic transcripts, 99 multicistronic RNAs, 101 non-coding RNAs, and 504 length isoforms. We also detected RNA methylation in 12 viral genes and RNA hyper-editing in the longer 5’-UTR transcript isoform of the ORF 19 gene.

Microbial diversity characterization of seawater in a pilot study using Oxford Nanopore Technologies long-read sequencing

10.21203/rs.3.rs-17068/v2 ◽

2020 ◽

Author(s):

Michael Liem ◽

Tonny Regensburg-Tuïnk ◽

Christiaan Henkel ◽

Hans Jansen ◽

Herman Spaink

Keyword(s):

Microbial Diversity ◽

Environmental Samples ◽

Sea Water ◽

Flow Cells ◽

Oxford Nanopore ◽

Challenging Tasks ◽

Long Read ◽

Close Relatives ◽

Oxford Nanopore Technologies

Abstract Objective: Currently, the majority of non-culturable microbes in sea water are yet to be discovered. Nanopore sequencing offers a solution to the challenging task of identifying the genomes and complex composition of oceanic microbiomes. In this study we evaluate the utility of Oxford Nanopore Technologies (ONT) sequencing to characterize microbial diversity in seawater from multiple locations. We compared the microbial species diversity of retrieved environmental samples from two different locations and time points. Results: With only three ONT flow cells we were able to identify thousands of organisms, including bacteriophages, a large part of which were identified at species level. It was possible to assemble genomes from environmental samples with Flye. In several cases this resulted in >1 Mbp contigs, and in the particular case of a Thioglobus singularis species it even produced a near-complete genome. k-mer analysis reveals that a large part of the data represents species of which close relatives have not yet been deposited to the database. These results show that our approach is suitable for scalable genomic investigations such as monitoring oceanic biodiversity and provides a new platform for education in biodiversity.

TagSeqTools: a flexible and comprehensive analysis pipeline for NAD tagSeq data

10.1101/2020.03.09.982934 ◽

2020 ◽

Cited By ~ 1

Author(s):

Huan Zhong ◽

Zongwei Cai ◽

Zhu Yang ◽

Yiji Xia

Keyword(s):

Rna Sequencing ◽

Comprehensive Analysis ◽

Enzymatic Reactions ◽

Computational Tool ◽

Sequencing Data ◽

Analysis Pipeline ◽

Oxford Nanopore ◽

Long Read ◽

Identification And Characterization

Abstract NAD tagSeq has recently been developed for the identification and characterization of NAD+-capped RNAs (NAD-RNAs). This method adopts a strategy of chemo-enzymatic reactions to label the NAD-RNAs with a synthetic RNA tag before subjecting them to Oxford Nanopore direct RNA sequencing. A computational tool designed for analyzing the sequencing data of tagged RNA will facilitate the broader application of this method. Hence, we introduce TagSeqTools as a flexible, general pipeline for the identification and quantification of tagged RNAs (i.e., NAD+-capped RNAs) using long-read transcriptome sequencing data generated by the NAD tagSeq method. TagSeqTools comprises two major modules: TagSeek for differentiating tagged and untagged reads, and TagSeqQuant for the quantification and further characterization of genes and isoforms. In addition, the pipeline integrates some advanced functions to identify antisense transcription or splicing, and supports reformatting the data for visualization. Therefore, TagSeqTools provides a convenient and comprehensive workflow for researchers to analyze the data produced by the NAD tagSeq method or other tagging-based experiments using Oxford Nanopore direct RNA sequencing. The pipeline is available at https://github.com/dorothyzh/TagSeqTools, under Apache License 2.0.

Purification and characterization of a DNA-binding heterodimer of 52 and 100 kDa from HeLa cells

Biochemical Journal ◽

10.1042/bj2900267 ◽

1993 ◽

Vol 290 (1) ◽

pp. 267-272 ◽

Cited By ~ 36

Author(s):

W W Zhang ◽

L X Zhang ◽

R K Busch ◽

J Farrés ◽

H Busch

Keyword(s):

Dna Binding ◽

Hela Cells ◽

Topoisomerase Ii ◽

Sequence Similarity ◽

Peptide Sequence ◽

Upstream Region ◽

Cell Nuclei ◽

Sperm Dna ◽

Multiple Species

In studies of protein binding to the upstream region of the human proliferation-associated antigen p120 gene, a heterodimer of 52 and 100 kDa proteins was purified from HeLa cells. A 1:1 ratio of p52 and p100 was constant throughout the purification. The heterodimer was localized to cell nuclei, as shown by immunofluorescence. The pI values of the p52 and p100 were 7.8 and 8.6 respectively. The peptide sequences obtained for p52 (QSNKTFNLEKQNHTPRKKHQ and PLRGKQLRVRFAAHSASLTVR) and for p100 (PGGPKPGGGPGLSTPGGHPKPPHRGGGEPPRGRQ and GPGPGQSGPKPPIPPPPPHQQ) were not found in the computer databanks. One p52 peptide sequence, PLRGKQLRVRFA, shows considerable sequence similarity to a conserved motif in topoisomerase II of multiple species. The p52/100 heterodimer bound to different DNA probes. The binding was competed by poly(dI-dC), sonicated salmon sperm DNA, and circular or linearized plasmid DNA. The optimal DNA binding for the heterodimer was at pH 7-9 with low salt. The DNA-binding subunit of the heterodimer was the p100 polypeptide, as shown by u.v.-cross-linking assays and Southwestern blots.

Plasmidome analysis of carbapenem-resistant Enterobacteriaceae isolated in Vietnam

10.1101/2020.03.18.996710 ◽

2020 ◽

Author(s):

Aki Hirabayashi ◽

Koji Yahara ◽

Satomi Mitsuhashi ◽

So Nakagawa ◽

Tadashi Imanishi ◽

...

Keyword(s):

Carbapenem Resistance ◽

Genomic Epidemiology ◽

Carbapenem Resistant ◽

Oxford Nanopore ◽

Carbapenemase Gene ◽

Long Read ◽

Severe Infections ◽

Oxford Nanopore Technologies ◽

Carbapenem Resistant Enterobacteriaceae

Carbapenem-resistant Enterobacteriaceae (CRE) represent a serious threat to public health due to limited management of severe infections and high mortality. The rate of resistance of Enterobacteriaceae isolates to major antimicrobials, including carbapenems, is much higher in Vietnam than in Western countries, but the reasons remain unknown due to the lack of genomic epidemiology research. A previous study suggested that carbapenem resistance genes, such as the carbapenemase gene blaNDM-1, spread via plasmids among Enterobacteriaceae in Vietnam. In this study, we performed detection and molecular characterization of blaNDM-1-carrying plasmids in CRE isolated in Vietnam, and identified several possible cases of horizontal transfer of plasmids both within and among species of bacteria. Twenty-five carbapenem-resistant isolates from Enterobacteriaceae clinically isolated in a reference medical institution in Hanoi were sequenced on Illumina short-read sequencers, and 12 isolates harboring blaNDM-1 were sequenced on an Oxford Nanopore Technologies long-read sequencer to obtain complete plasmid sequences. Most of the plasmids co-carried genes conferring resistance to clinically relevant antimicrobials, including third-generation cephalosporins, aminoglycosides, and fluoroquinolones, in addition to blaNDM-1, leading to multidrug resistance of their bacterial hosts. These results provide insight into the genetic basis of CRE in Vietnam, and could help control nosocomial infections.

Microbial diversity characterization of seawater in a pilot study using Oxford Nanopore Technologies long-read sequencing

10.21203/rs.3.rs-17068/v1 ◽

2020 ◽

Author(s):

Michael Liem ◽

A.J.G. Regensburg-Tuïnk ◽

C.V. Henkel ◽

H.P. Spaink

Keyword(s):

Microbial Diversity ◽

Environmental Samples ◽

Sea Water ◽

Flow Cells ◽

Oxford Nanopore ◽

Challenging Tasks ◽

Long Read ◽

Close Relatives ◽

Oxford Nanopore Technologies

Abstract Objective: Currently, the majority of non-culturable microbes in sea water are yet to be discovered. Nanopore sequencing offers a solution to the challenging task of identifying the genomes and complex composition of oceanic microbiomes. In this study we evaluate the utility of Oxford Nanopore Technologies (ONT) sequencing to characterize microbial diversity in seawater from multiple locations. We compared the microbial species diversity of retrieved environmental samples from two different locations and time points. Results: With only three ONT flow cells we were able to identify thousands of organisms, including bacteriophages, a large part of which were identified at species level. It was possible to assemble genomes from environmental samples with Flye. In several cases this resulted in >1 Mbp contigs, and in the particular case of a Thioglobus singularis species it even produced a near-complete genome. k-mer analysis reveals that a large part of the data represents species of which close relatives have not yet been deposited to the database. These results show that our approach is suitable for scalable genomic investigations such as monitoring oceanic biodiversity and provides a new platform for education in biodiversity.

Highly contiguous assemblies of 101 drosophilid genomes

10.1101/2020.12.14.422775 ◽

2020 ◽

Author(s):

Bernard Y Kim ◽

Jeremy Wang ◽

Danny E. Miller ◽

Olga Barmina ◽

Emily K. Delaney ◽

...

Keyword(s):

Community Resource ◽

High Quality ◽

Public Resource ◽

Oxford Nanopore ◽

Starting Point ◽

Long Read ◽

Wet Lab ◽

Species Groups ◽

Genome Assemblies ◽

High Quality Genome

Over 100 years of studies in Drosophila melanogaster and related species in the genus Drosophila have facilitated key discoveries in genetics, genomics, and evolution. While high-quality genome assemblies exist for several species in this group, they only encompass a small fraction of the genus. Recent advances in long-read sequencing allow high-quality genome assemblies for tens or even hundreds of species to be generated. Here, we utilize Oxford Nanopore sequencing to build an open community resource of high-quality assemblies for 101 lines of 95 drosophilid species encompassing 14 species groups and 35 sub-groups, with an average contig N50 of 10.5 Mb and greater than 97% BUSCO completeness in 97/101 assemblies. These assemblies, along with detailed wet lab protocols and assembly pipelines, are released as a public resource and will serve as a starting point for addressing broad questions of genetics, ecology, and evolution within this key group.
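
For readers unfamiliar with the contig N50 statistic quoted in this abstract, the following is a minimal sketch of how it is commonly computed: contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the assembly size. The function name `contig_n50` is hypothetical and is not taken from the authors' pipeline.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (in bases).

    Illustrative helper only; not part of the published assembly pipeline.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig where the running total covers half the assembly.
        if 2 * running >= total:
            return length


# Toy example with five contigs totalling 20 bases.
print(contig_n50([2, 3, 4, 5, 6]))
```

A higher N50 generally indicates a more contiguous assembly, although it says nothing about base-level accuracy.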

Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies

10.21203/rs.3.rs-712747/v1 ◽

2021 ◽

Author(s):

Arang Rhie ◽

Ann Mc Cartney ◽

Kishwar Shafin ◽

Michael Alonge ◽

Andrey Bzikadze ◽

...

Keyword(s):

Genome Assembly ◽

Tandem Repeats ◽

Hydatidiform Mole ◽

Segmental Duplications ◽

Sequencing Technologies ◽

Oxford Nanopore ◽

Human Genome Assembly ◽

Long Read ◽

Genome Assemblies ◽

Oxford Nanopore Technologies

Abstract Advances in long-read sequencing technologies and genome assembly methods have enabled the recent completion of the first Telomere-to-Telomere (T2T) human genome assembly, which resolves complex segmental duplications and large tandem repeats, including centromeric satellite arrays in a complete hydatidiform mole (CHM13). Though derived from highly accurate sequencing, evaluation revealed that the initial T2T draft assembly had evidence of small errors and structural misassemblies. To correct these errors, we designed a novel repeat-aware polishing strategy that made accurate assembly corrections in large repeats without overcorrection, ultimately fixing 51% of the existing errors and improving the assembly QV to 73.9. By comparing our results to standard automated polishing tools, we outline common polishing errors and offer practical suggestions for genome projects with limited resources. We also show how sequencing biases in both PacBio HiFi and Oxford Nanopore Technologies reads cause signature assembly errors that can be corrected with a diverse panel of sequencing technologies.
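
To put the reported assembly QV in perspective, the sketch below converts between a quality value and a per-base error probability, assuming QV follows the usual Phred scale, QV = -10 * log10(p). The function names are hypothetical, and the abstract does not state how the authors estimated QV, so this only illustrates the scale itself.

```python
import math


def qv_to_error_rate(qv):
    """Per-base error probability implied by a Phred-scaled quality value."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(p):
    """Phred-scaled quality value implied by a per-base error probability."""
    return -10 * math.log10(p)


# Assumption: the QV of 73.9 quoted in the abstract is Phred-scaled.
p = qv_to_error_rate(73.9)
print(f"error rate per base: {p:.2e}")
print(f"about one error every {1 / p / 1e6:.1f} million bases")
```

Under that assumption, a QV of 73.9 corresponds to an error rate on the order of 4 in 100 million bases.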
