BUILDING CATALOGUE OF LIFE: ULTRAHIGH THROUGHPUT DNA BARCODING USING THIRD GENERATION SEQUENCING

Aims: Accurately detect isoforms from third generation sequencing data. Background: Transcriptome annotation is the basis for the analysis of gene expression and regulation. The transcriptome annotation of many organisms, including humans, is far from complete, partly because it is difficult to identify isoforms that are produced from the same gene through alternative splicing. Third generation sequencing (TGS) reads provide an unprecedented opportunity for detecting isoforms because their length exceeds that of most isoforms. One limitation of current TGS read-based isoform detection methods is that they rely exclusively on the sequence reads, without incorporating the sequence information of known isoforms. Objective: Develop an efficient method for isoform detection. Method: Based on annotated isoforms, we propose a splice isoform detection method called IsoDetect. First, the sequence at each exon-exon junction is extracted from annotated isoforms as a "short feature sequence", which is used to distinguish different splice isoforms. Second, we align these feature sequences to the long reads and divide the long reads into groups that contain the same set of feature sequences, thereby avoiding pairwise comparison among the large number of long reads. Third, clustering and consensus generation are carried out within each group based on sequence similarity. Long reads that do not contain any short feature sequence are clustered by sequence similarity to identify isoforms. Result: Tested on two datasets from Calypte anna and zebra finch, IsoDetect showed higher speed and compelling accuracy compared with four existing methods. Conclusion: IsoDetect is a promising method for isoform detection. Other: This paper was accepted by the CBC2019 conference.
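The grouping step described in the abstract above can be illustrated with a minimal Python sketch. It is not the IsoDetect implementation: the function names, the `flank` parameter, and the toy sequences are hypothetical, and exact substring matching stands in for the alignment step, which in practice must tolerate sequencing errors.

```python
from collections import defaultdict


def junction_features(exons, flank=5):
    """Build one feature per exon-exon junction: the last `flank` bases of
    an exon followed by the first `flank` bases of the next exon."""
    return [left[-flank:] + right[:flank] for left, right in zip(exons, exons[1:])]


def group_reads_by_features(reads, features):
    """Group read names by the set of feature IDs each read contains.

    Assumption: a feature "matches" when it occurs as an exact substring.
    Reads sharing no feature end up under the empty set and would be
    clustered separately by sequence similarity.
    """
    groups = defaultdict(list)
    for name, seq in reads.items():
        key = frozenset(fid for fid, fseq in features.items() if fseq in seq)
        groups[key].append(name)
    return dict(groups)


# Toy example: two annotated isoforms of a three-exon gene.
e1, e2, e3 = "AAAAACCCCC", "GGGGGTTTTT", "ACGTACGTAC"
j12, j23 = junction_features([e1, e2, e3])   # isoform with all three exons
(j13,) = junction_features([e1, e3])         # isoform skipping exon 2
features = {"j12": j12, "j23": j23, "j13": j13}

reads = {"r1": e1 + e2 + e3, "r2": e1 + e3, "r3": "TTTTTTTT"}
groups = group_reads_by_features(reads, features)
```

Because reads are bucketed by a hashable key, only reads within the same bucket need to be compared during clustering, which is the source of the speed-up the abstract describes.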

Comparative and comprehensive analysis on bacterial communities of two full-scale wastewater treatment plants by second and third-generation sequencing

Bioresource Technology Reports ◽

10.1016/j.biteb.2020.100450 ◽

2020 ◽

Vol 11 ◽

pp. 100450

Author(s):

Bin Ji ◽

Shulian Wang ◽

Dabin Guo ◽

Heliang Pang

Keyword(s):

Wastewater Treatment ◽

Bacterial Communities ◽

Wastewater Treatment Plants ◽

Comprehensive Analysis ◽

Full Scale ◽

Third Generation ◽

Third Generation Sequencing ◽

Generation Sequencing

Apollo: a sequencing-technology-independent, scalable and accurate assembly polishing algorithm

Bioinformatics ◽

10.1093/bioinformatics/btaa179 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3669-3679 ◽

Cited By ~ 3

Author(s):

Can Firtina ◽

Jeremie S Kim ◽

Mohammed Alser ◽

Damla Senol Cali ◽

A Ercument Cicek ◽

...

Keyword(s):

Genome Analysis ◽

Supplementary Information ◽

Third Generation ◽

Sequencing Technology ◽

Base Pairs ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Long Reads ◽

Generation Sequencing ◽

Large Genomes

Motivation: Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs. These long reads are used to construct an assembly (i.e. the subject's genome), which is further used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates and a large proportion of base pairs in these long reads is incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing or fixing errors in the assembly by using information from alignments between reads and the assembly (i.e. read-to-assembly alignment information). However, current assembly polishing algorithms can only polish an assembly using reads from either a certain sequencing technology or a small assembly. Such technology-dependency and assembly-size dependency require researchers to (i) run multiple polishing algorithms and (ii) use small chunks of a large genome to use all available readsets and polish large genomes, respectively. Results: We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e. both large and small genomes) using reads from all sequencing technologies (i.e. second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo (i) models an assembly as a profile hidden Markov model (pHMM), (ii) uses read-to-assembly alignment to train the pHMM with the Forward–Backward algorithm and (iii) decodes the trained model with the Viterbi algorithm to produce a polished assembly. Our experiments with real readsets demonstrate that Apollo is the only algorithm that (i) uses reads from any sequencing technology within a single run and (ii) scales well to polish large assemblies without splitting the assembly into multiple parts. Availability and implementation: Source code is available at https://github.com/CMU-SAFARI/Apollo. Supplementary information: Supplementary data are available at Bioinformatics online.
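Step (iii) in the abstract above relies on Viterbi decoding. The following is a minimal Python sketch of Viterbi decoding on a toy two-state HMM, not Apollo's profile HMM or its code: the states, probabilities, and the `viterbi` helper are all illustrative assumptions. It works in log space and assumes every probability is non-zero.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state path for `obs` under a simple HMM.

    All probabilities are assumed to be strictly positive, because the
    recursion is carried out on log-probabilities.
    """
    # scores[s] = best log-probability of any path ending in state s so far
    scores = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    backpointers = []
    for symbol in obs[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            # Best predecessor for state s at this position
            prev = max(states, key=lambda p: scores[p] + math.log(trans_p[p][s]))
            new_scores[s] = (scores[prev] + math.log(trans_p[prev][s])
                             + math.log(emit_p[s][symbol]))
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy model (hypothetical): an AT-rich and a GC-rich region emitting bases.
states = ["AT_rich", "GC_rich"]
start_p = {"AT_rich": 0.5, "GC_rich": 0.5}
trans_p = {"AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
           "GC_rich": {"AT_rich": 0.1, "GC_rich": 0.9}}
emit_p = {"AT_rich": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
          "GC_rich": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4}}

path = viterbi("AATTGGCC", states, start_p, trans_p, emit_p)
```

In a polishing setting the hidden states would correspond to match, insertion, and deletion positions along the assembly rather than to two region types; the dynamic-programming recursion is the same.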

Micropathogen community identification in ticks (Acari: Ixodidae) using third-generation sequencing

International Journal for Parasitology Parasites and Wildlife ◽

10.1016/j.ijppaw.2021.06.003 ◽

2021 ◽

Author(s):

Jin Luo ◽

Qiaoyun Ren ◽

Wenge Liu ◽

Xiangrui Li ◽

Hong Yin ◽

...

Keyword(s):

Third Generation ◽

Third Generation Sequencing ◽

Community Identification ◽

Generation Sequencing

Workflow for generating HMW plant DNA for third generation sequencing with high N50 and high accuracy v1 (protocols.io.bafmibk6)

protocols.io ◽

10.17504/protocols.io.bafmibk6 ◽

2019 ◽

Author(s):

Patrick Driguez ◽

Karen Carty ◽

Alexander Putra ◽

Luca Ermini

Keyword(s):

High Accuracy ◽

Third Generation ◽

Plant DNA ◽

Third Generation Sequencing ◽

Generation Sequencing

Rapid virulence prediction and identification of Newcastle disease virus genotypes using third-generation sequencing

Virology Journal ◽

10.1186/s12985-018-1077-5 ◽

2018 ◽

Vol 15 (1) ◽

Cited By ~ 10

Author(s):

Salman L. Butt ◽

Tonya L. Taylor ◽

Jeremy D. Volkening ◽

Kiril M. Dimitrov ◽

Dawn Williams-Coplin ◽

...

Keyword(s):

Newcastle Disease Virus ◽

Disease Virus ◽

Newcastle Disease ◽

Third Generation ◽

Third Generation Sequencing ◽

Virus Genotypes ◽

Generation Sequencing

Analysis of prospective microbiology research using third-generation sequencing technology

Biodiversity Science ◽

10.17520/biods.2018201 ◽

2019 ◽

Vol 27 (5) ◽

pp. 534-542

Author(s):

Xu Yakun ◽

Ma Yue ◽

Hu Xiaoxi ◽

Wang Jun ◽

...

Keyword(s):

Third Generation ◽

Sequencing Technology ◽

Generation Sequencing Technology ◽

Third Generation Sequencing ◽

Microbiology Research ◽

Generation Sequencing

Third-generation sequencing revises the molecular karyotype for Toxoplasma gondii and identifies emerging copy number variants in sexual recombinants

Genome Research ◽

10.1101/gr.262816.120 ◽

2021 ◽

Vol 31 (5) ◽

pp. 834-851

Author(s):

Jing Xia ◽

Aarthi Venkat ◽

Rachel E. Bainbridge ◽

Michael L. Reese ◽

Karine G. Le Roch ◽

...

Keyword(s):

Toxoplasma Gondii ◽

Copy Number ◽

Copy Number Variants ◽

Third Generation ◽

Third Generation Sequencing ◽

Molecular Karyotype ◽

Generation Sequencing

Third-generation Sequencing Reveals Extensive Polycistronism and Transcriptional Overlapping in a Baculovirus

Scientific Reports ◽

10.1038/s41598-018-26955-8 ◽

2018 ◽

Vol 8 (1) ◽

Cited By ~ 27

Author(s):

Norbert Moldován ◽

Dóra Tombácz ◽

Attila Szűcs ◽

Zsolt Csabai ◽

Zsolt Balázs ◽

...

Keyword(s):

Third Generation ◽

Third Generation Sequencing ◽

Generation Sequencing

Quality of Third Generation Sequencing

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.9630 ◽

2020 ◽

Vol 17 (12) ◽

pp. 5205-5209

Author(s):

Ali Elbialy ◽

M. A. El-Dosuky ◽

Ibrahim M. El-Henawy

Keyword(s):

Neural Network ◽

Deep Neural Network ◽

GC Content ◽

Error Rates ◽

Third Generation ◽

Third Generation Sequencing ◽

Long Reads ◽

Generation Sequencing

Third generation sequencing (TGS) produces long reads, but with relatively high error rates. The quality of TGS data, and how to deal with its errors, is therefore an active topic. This paper combines and investigates three quality-related metrics: basecalling accuracy, Phred quality scores, and GC content. For basecalling accuracy, a deep neural network is adopted. The measured loss does not exceed 5.42.
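Two of the metrics named in the abstract above have standard definitions that are easy to compute: a Phred score Q corresponds to an error probability P through Q = -10 log10(P), and GC content is the fraction of G and C bases in a sequence. The Python sketch below illustrates these definitions only. It is not the paper's code, the function names are hypothetical, and the Phred+33 ASCII offset is an assumption about the FASTQ encoding in use.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability (0 < p <= 1) to a Phred quality score."""
    return -10 * math.log10(p)


def decode_fastq_quality(qual, offset=33):
    """Decode a FASTQ quality string into Phred scores.

    Assumes the Phred+33 encoding unless another offset is given.
    """
    return [ord(ch) - offset for ch in qual]


def gc_content(seq):
    """Return the fraction of G and C bases in `seq` (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


scores = decode_fastq_quality("I5!")
mean_error = sum(phred_to_error_prob(q) for q in scores) / len(scores)
```

Averaging error probabilities, as in `mean_error`, is generally more meaningful than averaging the Phred scores themselves, because the Phred scale is logarithmic.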
