Identifying the ‘unidentified’ fungi: a global-scale long-read third-generation sequencing approach

Abstract DNA sequencing was dominated by Sanger’s chain termination method until the mid-2000s, when it was progressively supplanted by new sequencing technologies that can generate much larger quantities of data in a shorter time. At the forefront of these developments, long-read sequencing technologies (third-generation sequencing) can produce reads that are several kilobases in length. This greatly improves the accuracy of genome assemblies by spanning the highly repetitive segments that cause difficulty for second-generation short-read technologies. Third-generation sequencing is especially appealing for plant genomes, which can be extremely large with long stretches of highly repetitive DNA. Until recently, the low basecalling accuracy of third-generation technologies meant that accurate genome assembly required expensive, high-coverage sequencing followed by computational analysis to correct for errors. However, today’s long-read technologies are more accurate and less expensive, making them the method of choice for the assembly of complex genomes. Oxford Nanopore Technologies (ONT), a third-generation platform for the sequencing of native DNA strands, is particularly suitable for the generation of high-quality assemblies of highly repetitive plant genomes. Here we discuss the benefits of ONT, especially for the plant science community, and describe the issues that remain to be addressed when using ONT for plant genome sequencing.

Download Full-text

LongQC: A Quality Control Tool for Third Generation Sequencing Long Read Data

G3 Genes|Genome|Genetics ◽

10.1534/g3.119.400864 ◽

2020 ◽

Vol 10 (4) ◽

pp. 1193-1196

Author(s):

Yoshinori Fukasawa ◽

Luca Ermini ◽

Hai Wang ◽

Karen Carty ◽

Min-Sin Cheung

Keyword(s):

Quality Control ◽

Third Generation ◽

Third Generation Sequencing ◽

Oxford Nanopore ◽

Quality Control Tool ◽

Long Read ◽

Automated Quality Control ◽

Oxford Nanopore Technologies ◽

Generation Sequencing ◽

Control Tool

We propose LongQC as an easy and automated quality control tool for genomic datasets generated by third generation sequencing (TGS) technologies such as Oxford Nanopore technologies (ONT) and SMRT sequencing from Pacific Bioscience (PacBio). Key statistics were optimized for long read data, and LongQC covers all major TGS platforms. LongQC processes and visualizes those statistics automatically and quickly.

Download Full-text

Third-Generation Sequencing: The Spearhead towards the Radical Transformation of Modern Genomics

Life ◽

10.3390/life12010030 ◽

2021 ◽

Vol 12 (1) ◽

pp. 30

Author(s):

Konstantina Athanasopoulou ◽

Michaela A. Boti ◽

Panagiotis G. Adamopoulos ◽

Paraskevi C. Skourou ◽

Andreas Scorilas

Keyword(s):

De Novo ◽

Direct Detection ◽

Transcriptional Profiling ◽

Third Generation ◽

De Novo Genome Assembly ◽

Rna Molecules ◽

Third Generation Sequencing ◽

Long Reads ◽

Long Read ◽

Generation Sequencing

Although next-generation sequencing (NGS) technology revolutionized sequencing, offering a tremendous sequencing capacity with groundbreaking depth and accuracy, it continues to demonstrate serious limitations. In the early 2010s, the introduction of a novel set of sequencing methodologies, presented by two platforms, Pacific Biosciences (PacBio) and Oxford Nanopore Sequencing (ONT), gave birth to third-generation sequencing (TGS). The innovative long-read technologies turn genome sequencing into an ease-of-handle procedure by greatly reducing the average time of library construction workflows and simplifying the process of de novo genome assembly due to the generation of long reads. Long sequencing reads produced by both TGS methodologies have already facilitated the decipherment of transcriptional profiling since they enable the identification of full-length transcripts without the need for assembly or the use of sophisticated bioinformatics tools. Long-read technologies have also provided new insights into the field of epitranscriptomics, by allowing the direct detection of RNA modifications on native RNA molecules. This review highlights the advantageous features of the newly introduced TGS technologies, discusses their limitations and provides an in-depth comparison regarding their scientific background and available protocols as well as their potential utility in research and clinical applications.

Download Full-text

Evaluating approaches to find exon chains based on long reads

10.1101/066241 ◽

2016 ◽

Author(s):

Anna Kuosmanen ◽

Veli Mäkinen

Keyword(s):

Second Generation ◽

Simulated Data ◽

Error Rates ◽

Third Generation ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Long Reads ◽

Long Read ◽

Second Generation Sequencing ◽

Generation Sequencing

AbstractMotivationTranscript prediction can be modelled as a graph problem where exons are modelled as nodes and reads spanning two or more exons are modelled as exon chains. PacBio third-generation sequencing technology produces significantly longer reads than earlier second-generation sequencing technologies, which gives valuable information about longer exon chains in a graph. However, with the high error rates of third-generation sequencing, aligning long reads correctly around the splice sites is a challenging task. Incorrect alignments lead to spurious nodes and arcs in the graph, which in turn lead to incorrect transcript predictions.ResultsWe survey several approaches to find the exon chains corresponding to long reads in a splicing graph, and experimentally study the performance of these methods using simulated data to allow for sensitivity / precision analysis. Our experiments show that short reads from second-generation sequencing can be used to significantly improve exon chain correctness either by error-correcting the long reads before splicing graph creation, or by using them to create a splicing graph on which the long read alignments are then projected. We also study the memory and time consumption of various modules, and show that accurate exon chains lead to significantly increased transcript prediction accuracy.AvailabilityThe simulated data and in-house scripts used for this article are available at http://cs.helsinki.fi/u/aekuosma/exon_chain_evaluation_publish.tar.gz.

Download Full-text

HASLR: Fast Hybrid Assembly of Long Reads

10.1101/2020.01.27.921817 ◽

2020 ◽

Cited By ~ 1

Author(s):

Ehsan Haghshenas ◽

Hossein Asghari ◽

Jens Stoye ◽

Cedric Chauve ◽

Faraz Hach

Keyword(s):

Unmet Need ◽

Effective Length ◽

Third Generation ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Long Reads ◽

Oxford Nanopore ◽

Long Read ◽

The One ◽

Generation Sequencing

AbstractThird generation sequencing technologies from platforms such as Oxford Nanopore Technologies and Pacific Biosciences have paved the way for building more contiguous assemblies and complete reconstruction of genomes. The larger effective length of the reads generated with these technologies has provided a mean to overcome the challenges of short to mid-range repeats. Currently, accurate long read assemblers are computationally expensive while faster methods are not as accurate. Therefore, there is still an unmet need for tools that are both fast and accurate for reconstructing small and large genomes. Despite the recent advances in third generation sequencing, researchers tend to generate second generation reads for many of the analysis tasks. Here, we present HASLR, a hybrid assembler which uses both second and third generation sequencing reads to efficiently generate accurate genome assemblies. Our experiments show that HASLR is not only the fastest assembler but also the one with the lowest number of misassemblies on all the samples compared to other tested assemblers. Furthermore, the generated assemblies in terms of contiguity and accuracy are on par with the other tools on most of the samples.AvailabilityHASLR is an open source tool available at https://github.com/vpc-ccg/haslr.

Download Full-text

Long Read Error Correction Algorithm Based on the de Bruijn Graph for the Third-generation Sequencing

10.1109/icicsp54369.2021.9611869 ◽

2021 ◽

Author(s):

Bin Hou ◽

Rongshu Wang ◽

Jianhua Chen

Keyword(s):

De Bruijn Graph ◽

Third Generation ◽

Correction Algorithm ◽

The Third ◽

Third Generation Sequencing ◽

Long Read ◽

Read Error Correction ◽

De Bruijn ◽

Error Correction Algorithm ◽

Generation Sequencing

Download Full-text

Overcoming uncollapsed haplotypes in long-read assemblies of non-model organisms

10.1101/2020.03.16.993428 ◽

2020 ◽

Cited By ~ 1

Author(s):

Nadège Guiglielmoni ◽

Antoine Houtain ◽

Alessandro Derzelle ◽

Karine van Doninck ◽

Jean-François Flot

Keyword(s):

Genome Assembly ◽

Model Organism ◽

Model Organisms ◽

Third Generation ◽

Post Processing ◽

Third Generation Sequencing ◽

Long Reads ◽

Long Read ◽

Generation Sequencing ◽

Chromosome Level

Third-generation sequencing, also called long-read sequencing, is revolutionizing genome assembly: as PacBio and Nanopore technologies become more accessible in technicity and in cost, long-read assemblers flourish and are starting to deliver chromosome-level assemblies. However, these long reads are also error-prone, making the generation of a haploid reference out of a diploid genome a difficult enterprise. Although failure to properly collapse haplotypes results in fragmented and/or structurally incorrect assemblies and wreaks havoc on orthology inference pipelines, this serious issue is rarely acknowledged and dealt with in genomic projects, and an independent, comparative benchmark of the capacity of assemblers and post-processing tools to properly collapse or purge haplotypes is still lacking. To fill this gap, we tested different assembly strategies on the genome of the rotifer Adineta vaga, a non-model organism for which high coverages of both PacBio and Nanopore reads were available. The assemblers we tested (Canu, Flye, NextDenovo, Ra, Raven, Shasta and wtdbg2) exhibited strikingly different behaviors when dealing with highly heterozygous regions, resulting in variable amounts of uncollapsed haplotypes. Filtering out shorter reads generally improved haploid assemblies, and we also benchmarked three post-processing tools aimed at detecting and purging uncollapsed haplotypes in long-read assemblies: HaploMerger2, purge_haplotigs and purge_dups. Testing these strategies separately and in combination revealed several approaches able to generate haploid assemblies with genome sizes, coverage distributions, and completeness close to expectations.

Download Full-text

BUILDING CATALOGUE OF LIFE: ULTRAHIGH THROUGHPUT DNA BARCODING USING THIRD GENERATION SEQUENCING

MOLECULAR PHYLOGENETICS ◽

10.30826/molphy2018-05 ◽

2018 ◽

Author(s):

P.D.N. HEBERT ◽

◽

T.W.A. BRAUKMANN ◽

S.W.J. PROSSER ◽

S. RATNASINGHAM ◽

...

Keyword(s):

Dna Barcoding ◽

Third Generation ◽

Third Generation Sequencing ◽

Generation Sequencing

Download Full-text

IsoDetect: Detection of splice isoforms from third generation long reads based on short feature sequences

Current Bioinformatics ◽

10.2174/1574893615666200316101205 ◽

2020 ◽

Vol 15 ◽

Author(s):

Hongdong Li ◽

Wenjing Zhang ◽

Yuwen Luo ◽

Jianxin Wang

Keyword(s):

Sequence Similarity ◽

Detection Methods ◽

Sequence Information ◽

Third Generation ◽

Sequencing Data ◽

Splice Isoforms ◽

Third Generation Sequencing ◽

Long Reads ◽

Feature Sequence ◽

Generation Sequencing

Aims: Accurately detect isoforms from third generation sequencing data. Background: Transcriptome annotation is the basis for the analysis of gene expression and regulation. The transcriptome annotation of many organisms such as humans is far from incomplete, due partly to the challenge in the identification of isoforms that are produced from the same gene through alternative splicing. Third generation sequencing (TGS) reads provide unprecedented opportunity for detecting isoforms due to their long length that exceeds the length of most isoforms. One limitation of current TGS reads-based isoform detection methods is that they are exclusively based on sequence reads, without incorporating the sequence information of known isoforms. Objective: Develop an efficient method for isoform detection. Method: Based on annotated isoforms, we propose a splice isoform detection method called IsoDetect. First, the sequence at exon-exon junction is extracted from annotated isoforms as the “short feature sequence”, which is used to distinguish different splice isoforms. Second, we aligned these feature sequences to long reads and divided long reads into groups that contain the same set of feature sequences, thereby avoiding the pair-wise comparison among the large number of long reads. Third, clustering and consensus generation are carried out based on sequence similarity. For the long reads that do not contain any short feature sequence, clustering analysis based on sequence similarity is performed to identify isoforms. Result: Tested on two datasets from Calypte Anna and Zebra Finch, IsoDetect showed higher speed and compelling accuracy compared with four existing methods. Conclusion: IsoDetect is a promising method for isoform detection. Other: This paper was accepted by the CBC2019 conference.

Download Full-text