Nanopore sequencing of DNA concatemers reveals higher-order features of chromatin structure

Abstract Higher-order chromatin structure arises from the combinatorial physical interactions of many genomic loci. To investigate this aspect of genome architecture we developed Pore-C, which couples chromatin conformation capture with Oxford Nanopore Technologies (ONT) long reads to directly sequence multi-way chromatin contacts without amplification. In GM12878, we demonstrate that the pairwise interaction patterns implicit in Pore-C multi-way contacts are consistent with gold standard Hi-C pairwise contact maps at the compartment, TAD, and loop scales. In addition, Pore-C also detects higher-order chromatin structure at 18.5-fold higher efficiency and greater fidelity than SPRITE, a previously published higher-order chromatin profiling technology. We demonstrate Pore-C’s ability to detect and visualize multi-locus hubs associated with histone locus bodies and active / inactive nuclear compartments in GM12878. In the breast cancer cell line HCC1954, Pore-C contacts enable the reconstruction of complex and aneuploid rearranged alleles spanning multiple megabases and chromosomes. Finally, we apply Pore-C to generate a chromosome-scale de novo assembly of the HG002 genome. Our results establish Pore-C as the most simple and scalable assay for the genome-wide assessment of combinatorial chromatin interactions, with additional applications for cancer rearrangement reconstruction and de novo genome assembly.
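As an illustration of what "pairwise interaction patterns implicit in multi-way contacts" can mean, the following is a minimal sketch and not the Pore-C pipeline itself. It assumes a concatemer read has already been aligned to a set of distinct genomic fragments; a read touching n fragments then implies n(n-1)/2 pairwise contacts. The function name `pairwise_contacts` and the `(chromosome, position)` tuple layout are hypothetical.

```python
from itertools import combinations

def pairwise_contacts(fragments):
    """Expand one multi-way contact into its implied pairwise contacts.

    `fragments` is a list of (chromosome, position) tuples, one per aligned
    segment of a single concatemer read (hypothetical layout).
    """
    # Deduplicate and sort so every pair is emitted once, in a canonical order.
    return list(combinations(sorted(set(fragments)), 2))

# A three-way contact implies three pairwise contacts.
read = [("chr1", 1_000_000), ("chr1", 5_200_000), ("chr6", 26_000_000)]
pairs = pairwise_contacts(read)
print(len(pairs))  # 3
```

Summing these implied pairs over all reads would give a matrix that can be compared with a Hi-C contact map, which is the kind of comparison the abstract describes.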

De novo genome assembly of the olive fruit fly (Bactrocera oleae) developed through a combination of linked-reads and long-read technologies

10.1101/505040 ◽

2018 ◽

Cited By ~ 1

Author(s):

Haig Djambazian ◽

Anthony Bayega ◽

Konstantina T. Tsoumani ◽

Efthimia Sagri ◽

Maria-Eleni Gregoriou ◽

...

Keyword(s):

Y Chromosome ◽

De Novo ◽

Fruit Fly ◽

Bactrocera Oleae ◽

Olive Fruit Fly ◽

Olive Fruit ◽

Long Reads ◽

Oxford Nanopore ◽

Long Read ◽

Oxford Nanopore Technologies

Abstract Long-read sequencing has greatly contributed to the generation of high-quality assemblies, albeit at a high cost. It is also not always clear how to combine sequencing platforms. We sequenced the genome of the olive fruit fly (Bactrocera oleae), the most important pest in the olive fruit agribusiness industry, using Illumina short reads, mate-pairs, 10x Genomics linked-reads, Pacific Biosciences (PacBio), and Oxford Nanopore Technologies (ONT). The 10x linked-reads assembly gave the most contiguous assembly, with an N50 of 2.16 Mb. Scaffolding the linked-reads assembly using long reads from ONT gave a more contiguous assembly with a scaffold N50 of 4.59 Mb. We also present the most extensive transcriptome datasets of the olive fly, derived from different tissues and stages of development. Finally, we used the Chromosome Quotient method to identify Y-chromosome scaffolds and show that the long-read-based assembly generates a highly contiguous Y-chromosome assembly. JR is a member of the MinION Access Program (MAP) and has received free-of-charge flow cells and sequencing kits from Oxford Nanopore Technologies for other projects. JR has had no other financial support from ONT. AB has received reimbursement for travel costs associated with attending the Nanopore Community Meeting 2018, a meeting organized by Oxford Nanopore Technologies.
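The N50 values quoted above follow the standard definition: the length of the shortest contig (or scaffold) in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal sketch of that calculation is shown below; the function name `n50` and the toy lengths are illustrative and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("no lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

# Toy example: total length 20, so the 6 and 5 together reach the halfway mark.
print(n50([2, 3, 4, 5, 6]))  # 5
```

Because N50 depends only on the length distribution, scaffolding that joins existing sequences raises it without adding any new bases.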

Fast and accurate de novo genome assembly from long uncorrected reads

10.1101/068122 ◽

2016 ◽

Cited By ~ 8

Author(s):

Robert Vaser ◽

Ivan Sović ◽

Niranjan Nagarajan ◽

Mile Šikić

Keyword(s):

Error Correction ◽

De Novo ◽

High Quality ◽

De Novo Genome Assembly ◽

Consensus Sequences ◽

Long Reads ◽

Oxford Nanopore ◽

Order Of Magnitude ◽

Correction Step ◽

Consensus Module

The assembly of long reads from Pacific Biosciences and Oxford Nanopore Technologies typically requires resource-intensive error correction and consensus generation steps to obtain high-quality assemblies. We show that the error correction step can be omitted and high-quality consensus sequences can be generated efficiently with a SIMD-accelerated, partial-order-alignment-based stand-alone consensus module called Racon. Based on tests with PacBio and Oxford Nanopore datasets, we show that Racon coupled with Miniasm enables consensus genomes with similar or better quality than state-of-the-art methods while being an order of magnitude faster. Racon is available open source under the MIT license at https://github.com/isovic/racon.git.

De novo clustering of long-read transcriptome data using a greedy, quality-value based algorithm

10.1101/463463 ◽

2018 ◽

Cited By ~ 8

Author(s):

Kristoffer Sahlin ◽

Paul Medvedev

Keyword(s):

Clustering Algorithm ◽

De Novo ◽

Substantial Improvement ◽

Error Rates ◽

Reconstruction Algorithms ◽

Long Reads ◽

Oxford Nanopore ◽

Long Read ◽

Transcript Reconstruction ◽

Oxford Nanopore Technologies

Abstract Long-read sequencing of transcripts with PacBio Iso-Seq and Oxford Nanopore Technologies has proven to be central to the study of complex isoform landscapes in many organisms. However, current de novo transcript reconstruction algorithms from long-read data are limited, leaving the potential of these technologies unfulfilled. A common bottleneck is the dearth of scalable and accurate algorithms for clustering long reads according to their gene family of origin. To address this challenge, we develop isONclust, a clustering algorithm that is greedy (in order to scale) and makes use of quality values (in order to handle variable error rates). We test isONclust on three simulated and five biological datasets, across a breadth of organisms, technologies, and read depths. Our results demonstrate that isONclust is a substantial improvement over previous approaches, both in terms of overall accuracy and/or scalability to large datasets. Our tool is available at https://github.com/ksahlin/isONclust.
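The general idea of a greedy, quality-aware clustering pass can be sketched as follows. This is not isONclust's actual algorithm: it is a simplified stand-in that orders reads by a mean quality score and assigns each read to the first cluster whose representative shares enough k-mers with it. The names `kmers` and `greedy_cluster`, the parameters `k` and `min_shared`, and the `(name, sequence, mean_quality)` input layout are all assumptions made for illustration.

```python
def kmers(seq, k=5):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def greedy_cluster(reads, k=5, min_shared=0.5):
    """Cluster reads in a single greedy pass.

    `reads` is a list of (name, sequence, mean_quality) tuples. Reads are
    visited from highest to lowest quality, so the first (best) read of each
    cluster becomes its representative.
    """
    clusters = []  # list of (representative_kmers, [member names])
    for name, seq, _ in sorted(reads, key=lambda r: r[2], reverse=True):
        read_kmers = kmers(seq, k)
        for rep_kmers, members in clusters:
            # Fraction of this read's k-mers found in the representative.
            shared = len(read_kmers & rep_kmers) / max(len(read_kmers), 1)
            if shared >= min_shared:
                members.append(name)
                break
        else:
            # No existing cluster matched: this read starts a new one.
            clusters.append((read_kmers, [name]))
    return [members for _, members in clusters]

reads = [
    ("a", "ACGTACGTTGCA", 30.0),
    ("b", "ACGTACGTTGCA", 20.0),
    ("c", "TTTTGGGGCCCCAAAA", 25.0),
]
print(greedy_cluster(reads))  # [['a', 'b'], ['c']]
```

Each read is compared only against cluster representatives rather than against every other read, which is what lets a greedy scheme of this kind scale to large datasets.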

Clustering de Novo by Gene of Long Reads from Transcriptomics Data

10.1101/170035 ◽

2017 ◽

Cited By ~ 3

Author(s):

Camille Marchet ◽

Lolita Lecompte ◽

Corinne Da Silva ◽

Corinne Cruaud ◽

Jean-Marc Aury ◽

...

Keyword(s):

De Novo ◽

Free Access ◽

Sequencing Data ◽

Base Pairs ◽

Long Reads ◽

Oxford Nanopore ◽

Processing Step ◽

Whole Transcriptome Sequencing ◽

Long Read ◽

Transcriptomics Data

Abstract Long-read sequencing currently provides sequences of several thousand base pairs. This makes it possible to obtain complete transcripts, offering an unprecedented view of the cellular transcriptome. However, the literature lacks tools to cluster such data de novo, in particular for Oxford Nanopore Technologies reads, because of their inherently high error rate compared to short reads. Our goal is to process reads from whole transcriptome sequencing data accurately and without a reference genome in order to reliably group reads coming from the same gene. This de novo approach is therefore particularly suitable for non-model species, but can also serve as a useful pre-processing step to improve read mapping. Our contribution is both a new algorithm adapted to clustering reads by gene and a practical, freely accessible tool that scales to the complete processing of eukaryotic transcriptomes. We sequenced a mouse RNA sample using the MinION device; this dataset is used to compare our solution to other algorithms used in the context of biological clustering. We demonstrate that it is better suited to transcriptomics long reads. When a reference is available and mapping is thus possible, we show that it stands as an alternative method that predicts complementary clusters.

Overlapping long sequence reads: Current innovations and challenges in developing sensitive, specific and scalable algorithms

10.1101/081596 ◽

2016 ◽

Author(s):

Justin Chu ◽

Hamid Mohamadi ◽

René L Warren ◽

Chen Yang ◽

Inanc Birol

Keyword(s):

De Novo ◽

Supplementary Information ◽

Scalable Algorithms ◽

Sequencing Technologies ◽

Memory Efficiency ◽

Computational Performance ◽

Resource Needs ◽

Long Reads ◽

Oxford Nanopore ◽

Alignment Problem

Abstract Identifying overlaps between error-prone long reads, specifically those from Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PB), is essential for certain downstream applications, including error correction and de novo assembly. Though akin to the read-to-reference alignment problem, read-to-read overlap detection is a distinct problem that can benefit from specialized algorithms that perform efficiently and robustly on high-error-rate long reads. Here, we review the current state-of-the-art read-to-read overlap tools for error-prone long reads, including BLASR, DALIGNER, MHAP, GraphMap, and Minimap. These specialized bioinformatics tools differ not just in their algorithmic designs and methodology, but also in their robustness of performance on a variety of datasets, time and memory efficiency, and scalability. We highlight the algorithmic features of these tools, as well as their potential issues and biases when utilizing any particular method. We benchmarked these tools, tracking their resource needs and computational performance, and assessed the specificity and precision of each. The concepts surveyed may apply to future sequencing technologies, as scalability is becoming more relevant with increased sequencing throughput. Supplementary information: Supplementary data are available at Bioinformatics online.
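To make the read-to-read overlap problem concrete, here is a minimal sketch of seed-based candidate detection using exact shared k-mers. It does not reproduce any of the reviewed tools, which rely on more refined seeding and filtering; it ignores reverse-complement matches and sequencing errors entirely. The function name `candidate_overlaps` and the parameters `k` and `min_shared` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def candidate_overlaps(reads, k=8, min_shared=3):
    """Report read pairs that share at least `min_shared` distinct k-mers.

    `reads` maps read name -> sequence. Returns {(name_a, name_b): count},
    where count is the number of distinct k-mers present in both reads.
    """
    # Index: k-mer -> set of reads containing it.
    index = defaultdict(set)
    for name, seq in reads.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)

    # Every k-mer shared by two reads is one vote for that pair.
    counts = defaultdict(int)
    for names in index.values():
        for a, b in combinations(sorted(names), 2):
            counts[(a, b)] += 1

    return {pair: n for pair, n in counts.items() if n >= min_shared}

reads = {
    "r1": "ACGTTGCAAGGCTTAC",   # suffix AAGGCTTAC ...
    "r2": "AAGGCTTACGGATCCA",   # ... matches this prefix
    "r3": "TTTTTTTTTTTTTTTT",
}
print(candidate_overlaps(reads, k=6, min_shared=3))  # {('r1', 'r2'): 4}
```

Indexing k-mers first avoids comparing every pair of reads directly, which is the scalability concern the abstract raises; candidate pairs found this way would still need alignment to confirm a true overlap.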

De-novo Assembly of Limnospira fusiformis Using Ultra-Long Reads

Frontiers in Microbiology ◽

10.3389/fmicb.2021.657995 ◽

2021 ◽

Vol 12 ◽

Author(s):

McKenna Hicks ◽

Thuy-Khanh Tran-Dao ◽

Logan Mulroney ◽

David L. Bernick

Keyword(s):

Phylogenetic Analysis ◽

Type Strain ◽

Reference Genome ◽

De Novo ◽

Illumina Miseq ◽

Long Reads ◽

Oxford Nanopore ◽

Long Read ◽

Oxford Nanopore Technologies ◽

Rdna Analysis

The Limnospira genus is a recently established clade that is economically important due to its worldwide use in biotechnology and agriculture. This genus includes organisms that were reclassified from Arthrospira, which are commercially marketed as “Spirulina.” Limnospira are photoautotrophic organisms that are widely used for research in nutrition, medicine, bioremediation, and biomanufacturing. Despite its widespread use, there is no closed genome for the Limnospira genus, and no reference genome for the type strain, Limnospira fusiformis. In this work, the L. fusiformis genome was sequenced using Oxford Nanopore Technologies MinION and assembled using only ultra-long reads (>35 kb). This assembly was polished with Illumina MiSeq reads sourced from an axenic L. fusiformis culture; axenicity was verified via microscopy and rDNA analysis. Ultra-long read sequencing resulted in a 6.42 Mb closed genome assembled as a single contig with no plasmid. Phylogenetic analysis placed L. fusiformis in the Limnospira clade; some Arthrospira were also placed in this clade, suggesting a misclassification of these strains. This work provides a fully closed and accurate reference genome for the economically important type strain, L. fusiformis. We also present a rapid axenicity method to isolate L. fusiformis. These contributions enable future biotechnological development of L. fusiformis by way of genetic engineering.

De novo whole-genome assembly of Chrysanthemum makinoi, a key wild chrysanthemum

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab358 ◽

2021 ◽

Author(s):

Natascha van Lieshout ◽

Martijn van Kaauwen ◽

Linda Kodde ◽

Paul Arens ◽

Marinus J M Smulders ◽

...

Keyword(s):

Ab Initio ◽

Genome Assembly ◽

De Novo Assembly ◽

De Novo ◽

Its Sequence ◽

Whole Genome ◽

Annotation Pipeline ◽

Long Reads ◽

Oxford Nanopore ◽

The World

Abstract Chrysanthemum is among the top ten cut, potted and perennial garden flowers in the world. Despite this, to date, only the genomes of two wild diploid chrysanthemums have been sequenced and assembled. Here we present the most complete and contiguous chrysanthemum de novo assembly published so far, as well as a corresponding ab initio annotation. The cultivated hexaploid varieties are thought to originate from a hybrid of wild chrysanthemums, among which the diploid Chrysanthemum makinoi has been mentioned. Using a combination of Oxford Nanopore long reads, Pacific Biosciences long reads, Illumina short reads, Dovetail sequences and a genetic map, we assembled 3.1 Gb of its sequence into 9 pseudochromosomes, with an N50 of 330 Mb and BUSCO complete score of 92.1%. Our ab initio annotation pipeline predicted 95 074 genes and marked 80.0% of the genome as repetitive. This genome assembly of C. makinoi provides an important step forward in understanding the chrysanthemum genome, evolution and history.

Pushing the limits of de novo genome assembly for complex prokaryotic genomes harboring very long, near identical repeats

10.1101/300186 ◽

2018 ◽

Cited By ~ 3

Author(s):

Michael Schmid ◽

Daniel Frei ◽

Andrea Patrignani ◽

Ralph Schlapbach ◽

Jürg E. Frey ◽

...

Keyword(s):

Dark Matter ◽

Genome Assembly ◽

De Novo ◽

Bacterial Genomes ◽

De Novo Genome Assembly ◽

Assembly Algorithm ◽

Long Reads ◽

Oxford Nanopore ◽

Prokaryotic Genomes ◽

Genome Assemblies

Abstract Generating a complete, de novo genome assembly for prokaryotes is often considered a solved problem. However, we here show that Pseudomonas koreensis P19E3 harbors multiple, near-identical repeat pairs up to 70 kilobase pairs in length. Beyond long repeats, the P19E3 assembly was further complicated by a shufflon region. Its complex genome could not be de novo assembled with long reads produced by Pacific Biosciences’ technology, but required very long reads from Oxford Nanopore Technologies. Another important factor for a full genomic resolution was the choice of assembly algorithm. Importantly, a repeat analysis indicated that very complex bacterial genomes represent a general phenomenon beyond Pseudomonas. Roughly 10% of 9331 complete bacterial and a handful of 293 complete archaeal genomes represented this dark matter for de novo genome assembly of prokaryotes. Several of these dark matter genome assemblies contained repeats far beyond the resolution of the sequencing technology employed and likely contain errors; other genomes were closed employing labor-intensive steps like cosmid libraries, primer walking or optical mapping. Using very long sequencing reads in combination with assemblers capable of resolving long, near-identical repeats will bring most prokaryotic genomes within reach of fast and complete de novo genome assembly.

Hybrid Genome Assembly of a Neotropical Mutualistic Ant

Genome Biology and Evolution ◽

10.1093/gbe/evz159 ◽

2019 ◽

Vol 11 (8) ◽

pp. 2306-2311

Author(s):

Juliane Hartke ◽

Tilman Schell ◽

Evelien Jongepier ◽

Hanno Schmidt ◽

Philipp P Sprenger ◽

...

Keyword(s):

Communication System ◽

De Novo ◽

Draft Genome ◽

Cuticular Hydrocarbon ◽

Mechanistic Explanation ◽

High Quality ◽

Great Opportunity ◽

Long Reads ◽

Oxford Nanopore ◽

Hybrid Genome

Abstract The success of social insects is largely intertwined with their highly advanced chemical communication system that facilitates recognition and discrimination of species and nest-mates, recruitment, and division of labor. Hydrocarbons, which cover the cuticle of insects, not only serve as waterproofing agents but also constitute a major component of this communication system. Two cryptic Crematogaster species, which share their nest with Camponotus ants, show striking diversity in their cuticular hydrocarbon (CHC) profile. This mutualistic system therefore offers a great opportunity to study the genetic basis of CHC divergence between sister species. As a basis for further genome-wide studies high-quality genomes are needed. Here, we present the annotated draft genome for Crematogaster levior A. By combining the three most commonly used sequencing techniques—Illumina, PacBio, and Oxford Nanopore—we constructed a high-quality de novo ant genome. We show that even low coverage of long reads can add significantly to overall genome contiguity. Annotation of desaturase and elongase genes, which play a role in CHC biosynthesis revealed one of the largest repertoires in ants and a higher number of desaturases in general than in other Hymenoptera. This may provide a mechanistic explanation for the high diversity observed in C. levior CHC profiles.

Robust Benchmark Structural Variant Calls of An Asian Using the State-of-Art Long Fragment Sequencing Technologies

10.1101/2020.08.10.245308 ◽

2020 ◽

Author(s):

Xiao Du ◽

Lili Li ◽

Fan Liang ◽

Sanyang Liu ◽

Wenxin Zhang ◽

...

Keyword(s):

B Lymphocyte ◽

De Novo ◽

Structural Variants ◽

High Confidence ◽

False Negatives ◽

Sequencing Technologies ◽

Long Reads ◽

Oxford Nanopore ◽

Long Read ◽

Circular Consensus Sequencing

Abstract The importance of structural variants (SVs) for phenotypes and human diseases is now recognized. Although a variety of SV detection platforms and strategies that vary in sensitivity and specificity have been developed, few benchmarking procedures are available to confidently assess their performance in biological and clinical research. To facilitate the validation and application of those approaches, our work established an Asian reference material comprising identified benchmark regions and high-confidence SV calls. We established a high-confidence SV callset with 8,938 SVs in an EBV-immortalized B lymphocyte line by integrating four alignment-based SV callers [from 109× PacBio continuous long read (CLR), 22× PacBio circular consensus sequencing (CCS) reads, 104× Oxford Nanopore long reads, and 114× optical mapping platform (Bionano)] and one de novo assembly-based SV caller using CCS reads. A total of 544 randomly selected SVs were validated by PCR and Sanger sequencing, confirming the robustness of our SV calls. Combining trio-binning-based haplotype assemblies, we established an SV benchmark for identification of false negatives and false positives by constructing the continuous high confident regions (CHCRs), which cover 1.46 Gb and 6,882 SVs supported by at least one diploid haplotype assembly. Establishing high-confidence SV calls for a benchmark sample that has been characterized by multiple technologies provides a valuable resource for investigating SVs in human biology, disease, and clinical diagnosis.
