Moving Towards Third-Generation Sequencing Technologies

Abstract Motivation: Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs. These long reads are used to construct an assembly (i.e. the subject’s genome), which is then used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates, and a large proportion of base pairs in these long reads is incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by fixing errors in the assembly using information from alignments between reads and the assembly (i.e. read-to-assembly alignment information). However, current assembly polishing algorithms can only polish an assembly using reads from a specific sequencing technology, or can only polish a small assembly. This technology dependency and assembly-size dependency force researchers to (i) run multiple polishing algorithms to use all available read sets and (ii) split a large genome into small chunks to polish it. Results: We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e. both large and small genomes) using reads from all sequencing technologies (i.e. second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo (i) models an assembly as a profile hidden Markov model (pHMM), (ii) uses read-to-assembly alignment to train the pHMM with the Forward–Backward algorithm and (iii) decodes the trained model with the Viterbi algorithm to produce a polished assembly. Our experiments with real read sets demonstrate that Apollo is the only algorithm that (i) uses reads from any sequencing technology within a single run and (ii) scales well to polish large assemblies without splitting the assembly into multiple parts. Availability and implementation: Source code is available at https://github.com/CMU-SAFARI/Apollo. Supplementary information: Supplementary data are available at Bioinformatics online.
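
Step (iii) of the Apollo description above decodes a trained model with the Viterbi algorithm. As a minimal sketch of that decoding step only, the following shows log-space Viterbi decoding over a toy two-state HMM. The state names, the probabilities and the `viterbi` function are illustrative assumptions; this is not Apollo's pHMM (which has match, insertion and deletion states) and not its code.

```python
import math


def to_log(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {key: math.log(value) for key, value in table.items()}


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most probable state path for a non-empty observation list."""
    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    # back[t][s]: predecessor of s on that best path
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][obs[t]]
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: state X mostly emits "A", state Y mostly emits "C".
STATES = ("X", "Y")
LOG_START = to_log({"X": 0.5, "Y": 0.5})
LOG_TRANS = {
    "X": to_log({"X": 0.9, "Y": 0.1}),
    "Y": to_log({"X": 0.1, "Y": 0.9}),
}
LOG_EMIT = {
    "X": to_log({"A": 0.9, "C": 0.1}),
    "Y": to_log({"A": 0.1, "C": 0.9}),
}

if __name__ == "__main__":
    print(viterbi(list("AAACCC"), STATES, LOG_START, LOG_TRANS, LOG_EMIT))
```

Working in log space avoids numerical underflow on long sequences, which matters when the sequence being decoded is as long as an assembly contig.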

DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies

Scientific Reports ◽

10.1038/srep31900 ◽

2016 ◽

Vol 6 (1) ◽

Cited By ~ 133

Author(s):

Chengxi Ye ◽

Christopher M. Hill ◽

Shigang Wu ◽

Jue Ruan ◽

Zhanshan (Sam) Ma

Keyword(s):

Third Generation ◽

The Third ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Generation Sequencing ◽

Large Genomes

RNA Transcriptome Mapping with GraphMap

10.1101/160085 ◽

2017 ◽

Cited By ~ 1

Author(s):

Krešimir Križanović ◽

Ivan Sović ◽

Ivan Krpelnik ◽

Mile Šikić

Keyword(s):

Third Generation ◽

Sequencing Data ◽

Mapping Algorithm ◽

Gene Annotations ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Oxford Nanopore ◽

Rna Mapping ◽

Synthetic Datasets ◽

Generation Sequencing

Abstract Next-generation sequencing (NGS) technologies have made RNA sequencing widely accessible and applicable in many areas of research. In recent years, third-generation sequencing technologies have matured and are slowly replacing NGS for DNA sequencing. This paper presents a novel tool for RNA mapping guided by gene annotations. The tool is an adapted version of a previously developed DNA mapper, GraphMap, tailored for third-generation sequencing data, such as those produced by Pacific Biosciences or Oxford Nanopore Technologies devices. It uses gene annotations to generate a transcriptome, uses a DNA mapping algorithm to map reads to the transcriptome, and finally transforms the mappings back to genome coordinates. The modified version of GraphMap is compared on several synthetic datasets with state-of-the-art RNA-seq mappers that can work with third-generation sequencing data. The results show that our tool outperforms the other tools in general mapping quality.
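
The last step described above, transforming a mapping on the transcriptome back to genome coordinates, can be illustrated with a small sketch. It assumes a forward-strand transcript whose exons are given as 0-based, half-open genome intervals in transcript order; the function name `transcript_to_genome` and the example coordinates are hypothetical and are not taken from GraphMap.

```python
def transcript_to_genome(exons, tpos):
    """Map a 0-based transcript offset to a 0-based genome coordinate.

    exons: list of (start, end) half-open genome intervals, listed in
    transcript order (forward strand assumed).
    """
    if tpos < 0:
        raise ValueError("transcript position must be non-negative")
    offset = tpos
    for start, end in exons:
        length = end - start
        if offset < length:
            # The position falls inside this exon.
            return start + offset
        # Skip this exon and keep walking along the transcript.
        offset -= length
    raise ValueError("transcript position lies beyond the last exon")


if __name__ == "__main__":
    # Hypothetical transcript with two exons separated by a 100 bp intron.
    exons = [(100, 200), (300, 350)]
    print(transcript_to_genome(exons, 0))    # first base of exon 1
    print(transcript_to_genome(exons, 100))  # first base of exon 2
```

A reverse-strand transcript would need the exon order and the offsets within each exon flipped; that case is left out here to keep the sketch short.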

Oxford Nanopore sequencing: new opportunities for plant genomics?

Journal of Experimental Botany ◽

10.1093/jxb/eraa263 ◽

2020 ◽

Vol 71 (18) ◽

pp. 5313-5322 ◽

Cited By ~ 2

Author(s):

Kathryn Dumschott ◽

Maximilian H-W Schmidt ◽

Harmeet Singh Chawla ◽

Rod Snowdon ◽

Björn Usadel

Keyword(s):

Plant Genome ◽

Third Generation ◽

Plant Genomics ◽

High Coverage ◽

Plant Genomes ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Oxford Nanopore ◽

Long Read ◽

Generation Sequencing

Abstract DNA sequencing was dominated by Sanger’s chain termination method until the mid-2000s, when it was progressively supplanted by new sequencing technologies that can generate much larger quantities of data in a shorter time. At the forefront of these developments, long-read sequencing technologies (third-generation sequencing) can produce reads that are several kilobases in length. This greatly improves the accuracy of genome assemblies by spanning the highly repetitive segments that cause difficulty for second-generation short-read technologies. Third-generation sequencing is especially appealing for plant genomes, which can be extremely large with long stretches of highly repetitive DNA. Until recently, the low basecalling accuracy of third-generation technologies meant that accurate genome assembly required expensive, high-coverage sequencing followed by computational analysis to correct for errors. However, today’s long-read technologies are more accurate and less expensive, making them the method of choice for the assembly of complex genomes. Oxford Nanopore Technologies (ONT), a third-generation platform for the sequencing of native DNA strands, is particularly suitable for the generation of high-quality assemblies of highly repetitive plant genomes. Here we discuss the benefits of ONT, especially for the plant science community, and describe the issues that remain to be addressed when using ONT for plant genome sequencing.

Third generation sequencing technologies applied to diagnostic microbiology: benefits and challenges in applications and data analysis

Expert Review of Molecular Diagnostics ◽

10.1080/14737159.2016.1217158 ◽

2016 ◽

Vol 16 (9) ◽

pp. 1011-1023 ◽

Cited By ~ 18

Author(s):

Enrico Lavezzo ◽

Luisa Barzon ◽

Stefano Toppo ◽

Giorgio Palù

Keyword(s):

Data Analysis ◽

Third Generation ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Diagnostic Microbiology ◽

Generation Sequencing

TRiCoLOR: tandem repeat profiling using whole-genome long-read sequencing data

GigaScience ◽

10.1093/gigascience/giaa101 ◽

2020 ◽

Vol 9 (10) ◽

Cited By ~ 1

Author(s):

Davide Bolognini ◽

Alberto Magi ◽

Vladimir Benes ◽

Jan O Korbel ◽

Tobias Rausch

Keyword(s):

Tandem Repeat ◽

Error Rates ◽

Sequencing Error ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Third Generation ◽

Sequencing Data ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Generation Sequencing

Abstract Background: Tandem repeat sequences are widespread in the human genome, and their expansions cause multiple repeat-mediated disorders. Genome-wide discovery approaches are needed to fully elucidate their roles in health and disease, but resolving tandem repeat variation accurately remains a challenging task. While traditional mapping-based approaches using short-read data have severe limitations in the size and type of tandem repeats they can resolve, recent third-generation sequencing technologies exhibit substantially higher sequencing error rates, which complicates repeat resolution. Results: We developed TRiCoLOR, a freely available tool for tandem repeat profiling using error-prone long reads from third-generation sequencing technologies. The method can identify repetitive regions in sequencing data without prior knowledge of their motifs or locations and resolve repeat multiplicity and period size in a haplotype-specific manner. The tool includes methods to interactively visualize the identified repeats and to trace their Mendelian consistency in pedigrees. Conclusions: TRiCoLOR demonstrates excellent performance and improved sensitivity and specificity compared with alternative tools on synthetic data. For real human whole-genome sequencing data, TRiCoLOR achieves high validation rates, suggesting its suitability to identify tandem repeat variation in personal genomes.

On the study of microbial transcriptomes using second- and third-generation sequencing technologies

The Journal of Microbiology ◽

10.1007/s12275-016-6233-2 ◽

2016 ◽

Vol 54 (8) ◽

pp. 527-536 ◽

Cited By ~ 5

Author(s):

Sang Chul Choi

Keyword(s):

Third Generation ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Generation Sequencing

First Insights into the Guar (Cyamopsis tetragonoloba (L.) Taub.) Genome of the ‘Vavilovskij 130’ Accession, Using Second and Third-Generation Sequencing Technologies

Russian Journal of Genetics ◽

10.1134/s102279541911005x ◽

2019 ◽

Vol 55 (11) ◽

pp. 1406-1416 ◽

Cited By ~ 2

Author(s):

E. Grigoreva ◽

P. Ulianich ◽

C. Ben ◽

L. Gentzbittel ◽

E. Potokina

Keyword(s):

Cyamopsis Tetragonoloba ◽

Third Generation ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Generation Sequencing

The Elusive Mitochondrial Genomes of Apicomplexa: Where Are We Now?

Frontiers in Microbiology ◽

10.3389/fmicb.2021.751775 ◽

2021 ◽

Vol 12 ◽

Author(s):

Luisa Berná ◽

Natalia Rego ◽

María E. Francia

Keyword(s):

Drug Targets ◽

Genome Structure ◽

Cellular Respiration ◽

Mitochondrial Genomes ◽

Third Generation ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Sequencing Studies ◽

Promising Source ◽

Generation Sequencing

Mitochondria are vital organelles of eukaryotic cells, participating in key metabolic pathways such as cellular respiration, thermogenesis, maintenance of cellular redox potential, calcium homeostasis, cell signaling, and cell death. The phylum Apicomplexa is entirely composed of obligate intracellular parasites, which cause a plethora of severe diseases in humans and in wild and domestic animals. These pathogens include the causative agents of malaria, cryptosporidiosis, neosporosis, East Coast fever and toxoplasmosis, among others. The mitochondrion in Apicomplexa has been put forward as a promising source of undiscovered drug targets, and it has been validated as the target of atovaquone, a drug currently used in the clinic to counter malaria. Apicomplexans present a single tubular mitochondrion that varies widely both in structure and in genomic content across the phylum. The organelle is characterized by massive gene migrations to the nucleus, sequence rearrangements and drastic functional reductions in some species. Recent third-generation sequencing studies have reignited interest in elucidating the extensive diversity displayed by the mitochondrial genomes of apicomplexans and their intriguing genomic features. The underlying mechanisms of gene transcription and translation are also poorly understood. In this review, we present the state of the art on mitochondrial genome structure, composition and organization in the apicomplexan phylum, revisiting topological and biochemical information gathered through classical techniques. We contextualize this in light of the genomic insight gained by second- and, more recently, third-generation sequencing technologies. We discuss the mitochondrial genomic and mechanistic features found in evolutionarily related alveolates, and discuss the common and distinct origins of the peculiarities of apicomplexan mitochondria.

Evaluating approaches to find exon chains based on long reads

10.1101/066241 ◽

2016 ◽

Author(s):

Anna Kuosmanen ◽

Veli Mäkinen

Keyword(s):

Second Generation ◽

Simulated Data ◽

Error Rates ◽

Third Generation ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Long Reads ◽

Long Read ◽

Second Generation Sequencing ◽

Generation Sequencing

Abstract Motivation: Transcript prediction can be modelled as a graph problem where exons are modelled as nodes and reads spanning two or more exons are modelled as exon chains. PacBio third-generation sequencing technology produces significantly longer reads than earlier second-generation sequencing technologies, which gives valuable information about longer exon chains in a graph. However, with the high error rates of third-generation sequencing, aligning long reads correctly around the splice sites is a challenging task. Incorrect alignments lead to spurious nodes and arcs in the graph, which in turn lead to incorrect transcript predictions. Results: We survey several approaches to find the exon chains corresponding to long reads in a splicing graph, and experimentally study the performance of these methods using simulated data to allow for sensitivity/precision analysis. Our experiments show that short reads from second-generation sequencing can be used to significantly improve exon chain correctness either by error-correcting the long reads before splicing graph creation, or by using them to create a splicing graph on which the long read alignments are then projected. We also study the memory and time consumption of various modules, and show that accurate exon chains lead to significantly increased transcript prediction accuracy. Availability: The simulated data and in-house scripts used for this article are available at http://cs.helsinki.fi/u/aekuosma/exon_chain_evaluation_publish.tar.gz.
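
The graph model in the Motivation above (exons as nodes, a read spanning several exons as an exon chain) can be sketched in a few lines. This is only an illustration under simplifying assumptions: exons are sorted, non-overlapping, half-open intervals, and a read is given as a sorted list of aligned blocks. The function names `exon_chain` and `add_chain` and all coordinates are hypothetical and do not come from the article's scripts.

```python
def exon_chain(exons, blocks):
    """Return the indices of the exons touched by a read, in read order.

    exons:  sorted, non-overlapping (start, end) half-open intervals (nodes).
    blocks: sorted (start, end) half-open aligned blocks of one read.
    """
    chain = []
    for b_start, b_end in blocks:
        for i, (e_start, e_end) in enumerate(exons):
            overlaps = b_start < e_end and e_start < b_end
            # Avoid repeating a node when consecutive blocks hit the same exon.
            if overlaps and (not chain or chain[-1] != i):
                chain.append(i)
    return chain


def add_chain(arcs, chain):
    """Count one supporting read for every arc between consecutive exons."""
    for u, v in zip(chain, chain[1:]):
        arcs[(u, v)] = arcs.get((u, v), 0) + 1
    return arcs


if __name__ == "__main__":
    exons = [(0, 100), (200, 300), (400, 500)]
    arcs = {}
    # One read covering all three exons, one read skipping the middle exon.
    add_chain(arcs, exon_chain(exons, [(50, 100), (200, 300), (400, 450)]))
    add_chain(arcs, exon_chain(exons, [(50, 100), (400, 450)]))
    print(arcs)
```

In this toy model, a block misaligned near a splice site would overlap the wrong exon and add a spurious arc, which is the failure mode the abstract describes.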
