TOGGLe, a flexible framework for easily building complex workflows and performing robust large-scale NGS analyses

Abstract: The advent of NGS has intensified the need for robust pipelines to perform high-performance automated analyses. The required software depends on the sequencing method used to produce the raw data (e.g. whole-genome sequencing, genotyping by sequencing, RNA-Seq) as well as on the kind of analyses to carry out (GWAS, population structure, differential expression). These tools have to be generic and scalable, and should meet biologists' needs. Here, we present the new version of TOGGLe (Toolbox for Generic NGS Analyses), a simple and highly flexible framework to easily and quickly generate pipelines for large-scale second- and third-generation sequencing analyses, including multi-sample and multi-threading support. TOGGLe is a workflow manager designed to be as effortless as possible for biologists to use, so the focus can remain on the analyses. Pipelines are easily customizable and supported analyses are reproducible and shareable. TOGGLe is designed as a generic, adaptable and rapidly evolving solution, and has been tested and used in large-scale projects on various organisms. It is freely available at http://toggle.southgreen.fr/ under the GNU GPLv3/CeCill-C licenses and can be deployed on HPC clusters as well as on local machines.

The optimal standard protocols for whole-genome sequencing of antibiotic-resistant pathogenic bacteria using third-generation sequencing platforms

Molecular & Cellular Toxicology ◽

10.1007/s13273-021-00157-2 ◽

2021 ◽

Author(s):

Tae-Min La ◽

Ji-hoon Kim ◽

Taesoo Kim ◽

Hong-Jae Lee ◽

Yoonsuk Lee ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Pathogenic Bacteria ◽

Whole Genome ◽

Third Generation ◽

Antibiotic Resistant ◽

Third Generation Sequencing ◽

Sequencing Platforms ◽

Generation Sequencing

SMOOTH-seq: single-cell genome sequencing of human cells on a third-generation sequencing platform

Genome Biology ◽

10.1186/s13059-021-02406-y ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Xiaoying Fan ◽

Cheng Yang ◽

Wen Li ◽

Xiuzhen Bai ◽

Xin Zhou ◽

...

Keyword(s):

Single Cell ◽

Genome Sequencing ◽

Single Molecule ◽

Human Cancer ◽

Whole Genome ◽

Third Generation ◽

Sequencing Platform ◽

Human Cancer Cell Lines ◽

Third Generation Sequencing ◽

Generation Sequencing

Abstract: There is no effective way to detect structural variations (SVs) and extra-chromosomal circular DNAs (ecDNAs) at the single-cell whole-genome level. Here, we develop a novel single-cell whole-genome sequencing (scWGS) method based on a third-generation sequencing platform, named SMOOTH-seq (single-molecule real-time sequencing of long fragments amplified through transposon insertion). We evaluate the method for detecting CNVs, SVs, and SNVs in human cancer cell lines and a colorectal cancer sample and show that SMOOTH-seq reliably and effectively detects SVs and ecDNAs in individual cells, but shows relatively limited accuracy in the detection of CNVs and SNVs. SMOOTH-seq opens a new chapter in scWGS as it generates high-fidelity reads that are kilobases long.

Genome Survey Sequencing of Betula platyphylla

Forests ◽

10.3390/f10100826 ◽

2019 ◽

Vol 10 (10) ◽

pp. 826 ◽

Cited By ~ 4

Author(s):

Sui Wang ◽

Su Chen ◽

Caixia Liu ◽

Yi Liu ◽

Xiyang Zhao ◽

...

Keyword(s):

Genome Size ◽

Large Scale ◽

Repetitive Sequences ◽

Northern China ◽

Third Generation ◽

Betula Platyphylla ◽

Genome Survey ◽

Third Generation Sequencing ◽

Ssr Loci ◽

Generation Sequencing

Research Highlights: A rigorous genome survey helped us to estimate the genomic characteristics, remove the DNA contamination, and determine the sequencing scheme of Betula platyphylla. Background and Objectives: B. platyphylla is a common tree species in northern China that has high economic and medicinal value. However, there is a lack of complete genomic information for this species, which severely constrains the progress of relevant research. The objective of this study was to survey the genome of B. platyphylla and determine the large-scale sequencing scheme of this species. Materials and Methods: Next-generation sequencing was used to survey the genome. The genome size, heterozygosity rate, and repetitive sequences were estimated by k-mer analysis. After preliminary genome assembly, sequence contamination was identified and filtered by sequence alignment. Finally, we obtained sterilized plantlets of B. platyphylla by plant tissue culture, which can be used for third-generation sequencing. Results: We estimated the genome size to be 432.9 Mb and the heterozygosity rate to be 1.22%, with repetitive sequences accounting for 62.2%. Bacterial contamination was observed in the leaves taken from the field, and most of the contaminants may be from the genus Mycobacterium. A total of 249,784 simple sequence repeat (SSR) loci were also identified in the B. platyphylla genome. Among the SSRs, only 11,326 can be used as candidates to distinguish the three Betula species. Conclusions: The B. platyphylla genome is complex and highly heterozygous and repetitive. Higher-depth third-generation sequencing may yield better assembly results. Sterilized plantlets can be used for sequencing to avoid contamination.
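The abstract above does not say which k-mer estimator was used. As a rough illustration only, a common simple approach divides the total number of k-mers by the depth of the main peak in the k-mer depth histogram. The function name, the `min_depth` cutoff for discarding likely error k-mers, and the histogram values below are all hypothetical and are not taken from the study.

```python
def estimate_genome_size(histogram, min_depth=3):
    """Estimate genome size from a k-mer depth histogram.

    histogram maps k-mer depth -> number of distinct k-mers seen at that depth.
    min_depth is an assumed cutoff below which k-mers are treated as errors.
    """
    # Drop low-depth k-mers, which mostly come from sequencing errors.
    solid = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    # The depth with the most distinct k-mers approximates the coverage peak.
    peak_depth = max(solid, key=solid.get)
    # Total k-mer occurrences divided by the peak depth approximates genome size.
    total_kmers = sum(depth * n for depth, n in solid.items())
    return total_kmers / peak_depth


# Hypothetical toy histogram: error k-mers at depth 1-2, main peak at depth 20,
# and a small repeat-derived bump at depth 40.
toy_histogram = {1: 5000, 2: 800, 19: 200, 20: 600, 21: 200, 40: 50}
genome_size = estimate_genome_size(toy_histogram)
```

For a highly heterozygous genome such as the one described here, the histogram typically has a separate heterozygous peak, so picking the single tallest peak can be misleading; dedicated tools model both peaks explicitly.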

TRiCoLOR: tandem repeat profiling using whole-genome long-read sequencing data

GigaScience ◽

10.1093/gigascience/giaa101 ◽

2020 ◽

Vol 9 (10) ◽

Cited By ~ 1

Author(s):

Davide Bolognini ◽

Alberto Magi ◽

Vladimir Benes ◽

Jan O Korbel ◽

Tobias Rausch

Keyword(s):

Tandem Repeat ◽

Error Rates ◽

Sequencing Error ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Third Generation ◽

Sequencing Data ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Generation Sequencing

Background: Tandem repeat sequences are widespread in the human genome, and their expansions cause multiple repeat-mediated disorders. Genome-wide discovery approaches are needed to fully elucidate their roles in health and disease, but resolving tandem repeat variation accurately remains a challenging task. While traditional mapping-based approaches using short-read data have severe limitations in the size and type of tandem repeats they can resolve, recent third-generation sequencing technologies exhibit substantially higher sequencing error rates, which complicates repeat resolution. Results: We developed TRiCoLOR, a freely available tool for tandem repeat profiling using error-prone long reads from third-generation sequencing technologies. The method can identify repetitive regions in sequencing data without prior knowledge of their motifs or locations and resolve repeat multiplicity and period size in a haplotype-specific manner. The tool includes methods to interactively visualize the identified repeats and to trace their Mendelian consistency in pedigrees. Conclusions: TRiCoLOR demonstrates excellent performance and improved sensitivity and specificity compared with alternative tools on synthetic data. For real human whole-genome sequencing data, TRiCoLOR achieves high validation rates, suggesting its suitability to identify tandem repeat variation in personal genomes.

Genome assembly of Vitis rotundifolia Michx. using third-generation sequencing (Oxford Nanopore Technologies)

PROCEEDINGS ON APPLIED BOTANY GENETICS AND BREEDING ◽

10.30901/2227-8834-2021-2-63-71 ◽

2021 ◽

Vol 182 (2) ◽

pp. 63-71

Author(s):

M. M. Agakhanov ◽

E. A. Grigoreva ◽

E. K. Potokina ◽

P. S. Ulianich ◽

Y. V. Ukhatova

Keyword(s):

Genome Sequence ◽

Genome Assembly ◽

De Novo ◽

Whole Genome Sequence ◽

Whole Genome ◽

Third Generation ◽

Vitis Rotundifolia ◽

Third Generation Sequencing ◽

Genome Sequence Assembly ◽

Generation Sequencing

The immune North American grapevine species Vitis rotundifolia Michaux (subgen. Muscadinia Planch.) is regarded as a potential donor of disease resistance genes, withstanding such dangerous diseases of grapes as powdery and downy mildews. The cultivar ‘Dixie’ is the only representative of this species preserved ex situ in Russia: it is maintained by the N.I. Vavilov All-Russian Institute of Plant Genetic Resources (VIR) in the orchards of its branch, Krymsk Experiment Breeding Station. Third-generation sequencing on the MinION platform was performed to obtain information on the primary structure of the cultivar’s genomic DNA, employing also the results of Illumina sequencing available in databases. A detailed description of the technique with modifications at various stages is presented, as it was used for grapevine genome sequencing and whole-genome sequence assembly. The modified technique included the main stages of the original protocol recommended by the MinION producer: 1) DNA extraction; 2) preparation of libraries for sequencing; 3) MinION sequencing and bioinformatic data processing; 4) de novo whole-genome sequence assembly using only MinION data or hybrid assembly (MinION+Illumina data); and 5) functional annotation of the whole-genome assembly. Stage 4 included not only de novo sequencing, but also the analysis of the available bioinformatic data, thus minimizing errors and increasing precision during the assembly of the studied genome. The DNA isolated from the leaves of cv. ‘Dixie’ was sequenced using two MinION flow cells (R9.4.1).

Whole-Genome Sequencing and Potassium-Solubilizing Mechanism of Bacillus aryabhattai SK1-7

Frontiers in Microbiology ◽

10.3389/fmicb.2021.722379 ◽

2022 ◽

Vol 12 ◽

Author(s):

Yifan Chen ◽

Hui Yang ◽

Zizhu Shen ◽

Jianren Ye

Keyword(s):

High Performance ◽

Fermentation Broth ◽

Culture Conditions ◽

Whole Genome ◽

Expression Levels ◽

Bacillus Aryabhattai ◽

Third Generation Sequencing ◽

Second Generation Sequencing ◽

Sulfuric Acid Method ◽

Generation Sequencing

This study aimed to analyze the whole genome of Bacillus aryabhattai strain SK1-7 and explore its potassium solubilization characteristics and mechanism, thus providing a theoretical basis for analyzing the utilization and improvement of insoluble potassium resources in soil. Genome information for Bacillus aryabhattai SK1-7 was obtained by using Illumina NovaSeq second-generation sequencing and GridION Nanopore ONT third-generation sequencing technology. The contents of organic acids and polysaccharides in the fermentation broth of Bacillus aryabhattai SK1-7 were determined by high-performance liquid chromatography and the anthrone sulfuric acid method, and the expression levels of the potassium solubilization-related genes ackA, epsB, gltA, mdh and ppc were compared by real-time fluorescence quantitative PCR under different potassium source culture conditions. The whole genome of the strain consisted of a complete chromosome sequence and four plasmid sequences. The sequence sizes of the chromosome and plasmids P1, P2, P3 and P4 were 5,188,391 bp, 136,204 bp, 124,862 bp, 67,200 bp and 12,374 bp, respectively. The corresponding GC contents were 38.2, 34.4, 33.6, 32.8, and 33.7%. Strain SK1-7 mainly secreted malic, formic, acetic and citric acids under culture with an insoluble potassium source. The polysaccharide content produced with an insoluble potassium source was higher than that with a soluble potassium source. The expression levels of the five potassium solubilization-related genes with the insoluble potassium source were higher than those with the soluble potassium source.

BUILDING CATALOGUE OF LIFE: ULTRAHIGH THROUGHPUT DNA BARCODING USING THIRD GENERATION SEQUENCING

MOLECULAR PHYLOGENETICS ◽

10.30826/molphy2018-05 ◽

2018 ◽

Author(s):

P.D.N. HEBERT ◽

T.W.A. BRAUKMANN ◽

S.W.J. PROSSER ◽

S. RATNASINGHAM ◽

...

Keyword(s):

Dna Barcoding ◽

Third Generation ◽

Third Generation Sequencing ◽

Generation Sequencing

IsoDetect: Detection of splice isoforms from third generation long reads based on short feature sequences

Current Bioinformatics ◽

10.2174/1574893615666200316101205 ◽

2020 ◽

Vol 15 ◽

Author(s):

Hongdong Li ◽

Wenjing Zhang ◽

Yuwen Luo ◽

Jianxin Wang

Keyword(s):

Sequence Similarity ◽

Detection Methods ◽

Sequence Information ◽

Third Generation ◽

Sequencing Data ◽

Splice Isoforms ◽

Third Generation Sequencing ◽

Long Reads ◽

Feature Sequence ◽

Generation Sequencing

Aims: Accurately detect isoforms from third-generation sequencing data. Background: Transcriptome annotation is the basis for the analysis of gene expression and regulation. The transcriptome annotation of many organisms, such as humans, is far from complete, due partly to the challenge of identifying isoforms that are produced from the same gene through alternative splicing. Third-generation sequencing (TGS) reads provide an unprecedented opportunity for detecting isoforms because their length exceeds that of most isoforms. One limitation of current TGS read-based isoform detection methods is that they are based exclusively on sequence reads, without incorporating the sequence information of known isoforms. Objective: Develop an efficient method for isoform detection. Method: Based on annotated isoforms, we propose a splice isoform detection method called IsoDetect. First, the sequence at each exon-exon junction is extracted from annotated isoforms as the “short feature sequence”, which is used to distinguish different splice isoforms. Second, we align these feature sequences to long reads and divide the long reads into groups that contain the same set of feature sequences, thereby avoiding pair-wise comparison among the large number of long reads. Third, clustering and consensus generation are carried out based on sequence similarity. For the long reads that do not contain any short feature sequence, clustering analysis based on sequence similarity is performed to identify isoforms. Result: Tested on two datasets from Calypte anna and zebra finch, IsoDetect showed higher speed and compelling accuracy compared with four existing methods. Conclusion: IsoDetect is a promising method for isoform detection. Other: This paper was accepted by the CBC2019 conference.
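The grouping step described in the Method section can be sketched as follows. This is a minimal illustration, not IsoDetect's implementation: exact substring matching stands in for the alignment of feature sequences to reads, and the function name, junction labels and sequences are all hypothetical.

```python
from collections import defaultdict


def group_reads_by_features(reads, features):
    """Group read IDs by the set of feature sequences each read contains."""
    groups = defaultdict(list)
    for read_id, seq in reads.items():
        # Assumption: exact substring matching replaces the alignment step.
        key = frozenset(name for name, feat in features.items() if feat in seq)
        groups[key].append(read_id)
    return dict(groups)


# Hypothetical exon-exon junction sequences and toy long reads.
features = {"e1-e2": "ACGTTG", "e2-e3": "GGCATC", "e1-e3": "ACGATC"}
reads = {
    "r1": "TTACGTTGAAGGCATCAA",
    "r2": "CCACGTTGTTGGCATCGG",
    "r3": "TTACGATCAA",
    "r4": "TTTTTTTT",
}
groups = group_reads_by_features(reads, features)
```

Reads that share a key can then be clustered within their own group, which is how the pair-wise comparison across all reads is avoided. Reads with an empty key (here `r4`) correspond to the reads without any short feature sequence, which the abstract says are clustered by sequence similarity instead.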

Comparative and comprehensive analysis on bacterial communities of two full-scale wastewater treatment plants by second and third-generation sequencing

Bioresource Technology Reports ◽

10.1016/j.biteb.2020.100450 ◽

2020 ◽

Vol 11 ◽

pp. 100450

Author(s):

Bin Ji ◽

Shulian Wang ◽

Dabin Guo ◽

Heliang Pang

Keyword(s):

Wastewater Treatment ◽

Bacterial Communities ◽

Wastewater Treatment Plants ◽

Comprehensive Analysis ◽

Full Scale ◽

Third Generation ◽

Third Generation Sequencing ◽

Generation Sequencing

Apollo: a sequencing-technology-independent, scalable and accurate assembly polishing algorithm

Bioinformatics ◽

10.1093/bioinformatics/btaa179 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3669-3679 ◽

Cited By ~ 3

Author(s):

Can Firtina ◽

Jeremie S Kim ◽

Mohammed Alser ◽

Damla Senol Cali ◽

A Ercument Cicek ◽

...

Keyword(s):

Genome Analysis ◽

Supplementary Information ◽

Third Generation ◽

Sequencing Technology ◽

Base Pairs ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Long Reads ◽

Generation Sequencing ◽

Large Genomes

Motivation: Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs. These long reads are used to construct an assembly (i.e. the subject’s genome), which is further used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates and a large proportion of base pairs in these long reads is incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing or fixing errors in the assembly by using information from alignments between reads and the assembly (i.e. read-to-assembly alignment information). However, current assembly polishing algorithms can only polish an assembly using reads from either a certain sequencing technology or a small assembly. Such technology-dependency and assembly-size dependency require researchers to (i) run multiple polishing algorithms and (ii) use small chunks of a large genome to use all available readsets and polish large genomes, respectively. Results: We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e. both large and small genomes) using reads from all sequencing technologies (i.e. second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo (i) models an assembly as a profile hidden Markov model (pHMM), (ii) uses read-to-assembly alignment to train the pHMM with the Forward–Backward algorithm and (iii) decodes the trained model with the Viterbi algorithm to produce a polished assembly. Our experiments with real readsets demonstrate that Apollo is the only algorithm that (i) uses reads from any sequencing technology within a single run and (ii) scales well to polish large assemblies without splitting the assembly into multiple parts. Availability and implementation: Source code is available at https://github.com/CMU-SAFARI/Apollo. Supplementary information: Supplementary data are available at Bioinformatics online.
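To illustrate the decoding step named in the abstract, here is a minimal sketch of the Viterbi algorithm on a generic two-state hidden Markov model. It is not Apollo's profile HMM or its code: the states, symbols and probabilities are hypothetical toy values chosen only to show how the most probable state path is recovered.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for obs, computed in log space."""
    # Each column maps state -> (best log-probability ending there, previous state).
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][obs[0]]), None)
             for s in states}]
    for symbol in obs[1:]:
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path score into state s.
            prev, score = max(
                ((p, best[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (score + math.log(emit_p[s][symbol]), prev)
        best.append(column)
    # Trace back from the best final state through the stored predecessors.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


# Hypothetical toy model (all probabilities are illustrative assumptions).
states = ["S1", "S2"]
start_p = {"S1": 0.6, "S2": 0.4}
trans_p = {"S1": {"S1": 0.7, "S2": 0.3}, "S2": {"S1": 0.4, "S2": 0.6}}
emit_p = {
    "S1": {"x": 0.5, "y": 0.4, "z": 0.1},
    "S2": {"x": 0.1, "y": 0.3, "z": 0.6},
}
path = viterbi(["x", "y", "z"], states, start_p, trans_p, emit_p)
```

Working in log space avoids numerical underflow on long sequences, which matters when the model spans a whole assembly. In a profile HMM the states would correspond to match, insertion and deletion positions along the assembly, with parameters learned from the read alignments.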
