Global analysis insights into coupling relationship between lignan and lignin metabolism of flax

Abstract Background: Flax (Linum usitatissimum L.) is one of the most important economic crops in the world. The lignin content of flax stems directly determines the quality of flax fibers. Flax seeds contain lignans that are highly beneficial for human health. Results: To elucidate the metabolic relationship between these compounds and the regulatory nodes of their metabolic processes, third-generation (PacBio Iso-Seq) and second-generation (Illumina) sequencing technologies were used to sequence the transcriptomes of a pair of flax cultivars with significant differences in lignan content. The differentially expressed genes (DEGs) were significantly enriched in the lignin and lignan biosynthesis pathways. Furthermore, seven significantly differentially expressed genes were annotated as UDP-glucosyl transferases (UGTs). We found that lignan and lignin content are significantly negatively correlated. SEM observations of flax bast fibers provided further evidence of this relationship. Conclusions: This is the first full-length transcriptome analysis of flax plants using third-generation sequencing technologies, and it is also the first study to observe a negative correlation between the lignin and lignan content of flax plants. Furthermore, UGTs are likely to be regulatory node genes for lignan and lignin metabolism.

Apollo: a sequencing-technology-independent, scalable and accurate assembly polishing algorithm

Bioinformatics ◽

10.1093/bioinformatics/btaa179 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3669-3679 ◽

Cited By ~ 3

Author(s):

Can Firtina ◽

Jeremie S Kim ◽

Mohammed Alser ◽

Damla Senol Cali ◽

A Ercument Cicek ◽

...

Keyword(s):

Genome Analysis ◽

Supplementary Information ◽

Third Generation ◽

Sequencing Technology ◽

Base Pairs ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Long Reads ◽

Generation Sequencing ◽

Large Genomes

Abstract Motivation: Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs. These long reads are used to construct an assembly (i.e. the subject’s genome), which is further used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates and a large proportion of base pairs in these long reads is incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing or fixing errors in the assembly by using information from alignments between reads and the assembly (i.e. read-to-assembly alignment information). However, current assembly polishing algorithms can only polish an assembly using reads from either a certain sequencing technology or a small assembly. Such technology-dependency and assembly-size dependency require researchers to (i) run multiple polishing algorithms and (ii) use small chunks of a large genome to use all available readsets and polish large genomes, respectively. Results: We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e. both large and small genomes) using reads from all sequencing technologies (i.e. second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo (i) models an assembly as a profile hidden Markov model (pHMM), (ii) uses read-to-assembly alignment to train the pHMM with the Forward–Backward algorithm and (iii) decodes the trained model with the Viterbi algorithm to produce a polished assembly. Our experiments with real readsets demonstrate that Apollo is the only algorithm that (i) uses reads from any sequencing technology within a single run and (ii) scales well to polish large assemblies without splitting the assembly into multiple parts. Availability and implementation: Source code is available at https://github.com/CMU-SAFARI/Apollo. Supplementary information: Supplementary data are available at Bioinformatics online.
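The decoding step (iii) of the Apollo abstract can be illustrated on a toy model. The sketch below runs the Viterbi algorithm on a small, hand-written two-state HMM. The state names, probabilities and the `viterbi` function are illustrative assumptions; this is not Apollo's profile HMM or its code, and the Forward–Backward training step is omitted.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state path for the observations.

    Scores are kept in log space to avoid underflow on long sequences.
    All probabilities used here must be non-zero.
    """
    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical toy model: S1 mostly emits "A", S2 mostly emits "C".
STATES = ("S1", "S2")
START = {"S1": 0.5, "S2": 0.5}
TRANS = {"S1": {"S1": 0.5, "S2": 0.5}, "S2": {"S1": 0.5, "S2": 0.5}}
EMIT = {"S1": {"A": 0.9, "C": 0.1}, "S2": {"A": 0.1, "C": 0.9}}

decoded = viterbi("AAC", STATES, START, TRANS, EMIT)
```

In a polishing setting the decoded path would be read out as a corrected sequence; here it only shows how the most likely path is recovered from a trained model.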

Quality of Third Generation Sequencing

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.9630 ◽

2020 ◽

Vol 17 (12) ◽

pp. 5205-5209

Author(s):

Ali Elbialy ◽

M. A. El-Dosuky ◽

Ibrahim M. El-Henawy

Keyword(s):

Neural Network ◽

Deep Neural Network ◽

GC Content ◽

Error Rates ◽

Third Generation ◽

Third Generation Sequencing ◽

Long Reads ◽

Generation Sequencing

Third-generation sequencing (TGS) produces long reads but with relatively high error rates, so read quality is an active research topic. This paper combines and investigates three quality-related metrics: basecalling accuracy, Phred quality scores, and GC content. For basecalling accuracy, a deep neural network is adopted. The measured loss does not exceed 5.42.
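Two of the three metrics named above have simple standard definitions. The sketch below computes GC content and converts Phred scores to error probabilities using P = 10^(-Q/10). The function names are hypothetical, and the quality strings are assumed to use the Phred+33 ASCII encoding common in FASTQ files; the paper's own pipeline is not described in the abstract and is not reproduced here.

```python
def gc_content(seq):
    """Fraction of bases that are G or C (case-insensitive); 0.0 for an empty read."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def phred_to_error_prob(q):
    """Convert a Phred quality score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def decode_phred33(quality_string):
    """Decode a quality string, assuming Phred+33 ASCII encoding."""
    return [ord(ch) - 33 for ch in quality_string]

def mean_error_prob(quality_string):
    """Average per-base error probability of a read; 0.0 for an empty string."""
    scores = decode_phred33(quality_string)
    if not scores:
        return 0.0
    return sum(phred_to_error_prob(q) for q in scores) / len(scores)
```

Averaging error probabilities rather than raw Phred scores is a deliberate choice in this sketch: Phred scores are logarithmic, so their arithmetic mean understates the expected number of errors in a read.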

A Sequence-Based Novel Approach for Quality Evaluation of Third-Generation Sequencing Reads

Genes ◽

10.3390/genes10010044 ◽

2019 ◽

Vol 10 (1) ◽

pp. 44 ◽

Cited By ~ 1

Author(s):

Wenjing Zhang ◽

Neng Huang ◽

Jiantao Zheng ◽

Xingyu Liao ◽

Jianxin Wang ◽

...

Keyword(s):

Quality Evaluation ◽

Training Data ◽

Third Generation ◽

Contig Assembly ◽

High Quality ◽

Promising Alternative ◽

Third Generation Sequencing ◽

Long Reads ◽

Generation Sequencing

The advent of third-generation sequencing (TGS) technologies, such as the Pacific Biosciences (PacBio) and Oxford Nanopore machines, provides new possibilities for contig assembly, scaffolding, and high-performance computing in bioinformatics due to their long reads. However, the high error rate and poor quality of TGS reads pose new challenges for accurate genome assembly and long-read alignment. Efficient processing methods are needed to prioritize high-quality reads and thereby improve the results of error correction and assembly. In this study, we propose a novel Read Quality Evaluation and Selection Tool (REQUEST) for evaluating the quality of third-generation long reads. REQUEST generates training data of high-quality and low-quality reads, which are characterized by their nucleotide combinations. A linear regression model was built to score the quality of reads. The method was tested on three datasets from different species. The results showed that the top-scored reads prioritized by REQUEST achieved higher alignment accuracies. The contig assembly results based on the top-scored reads also outperformed conventional approaches that use all reads. REQUEST is able to distinguish high-quality reads from low-quality ones without using reference genomes, making it a promising alternative sequence-quality evaluation method to alignment-based algorithms.
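The idea of scoring a read from its nucleotide combinations with a linear model can be sketched as follows. This is an assumption-laden illustration, not REQUEST itself: the use of k-mer frequencies as the features, the choice k = 2, and the function names are all hypothetical, and in practice the weights would be fitted by regression on labeled reads.

```python
from itertools import product

def kmer_frequencies(read, k=2):
    """Relative frequency of every DNA k-mer in the read, in lexicographic order."""
    read = read.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]

def linear_score(features, weights, bias=0.0):
    """Score = bias + sum of weight * feature (the prediction of a linear model)."""
    return bias + sum(w * x for w, x in zip(weights, features))
```

Because the features depend only on the read sequence, a score of this kind needs no reference genome, which matches the reference-free property the abstract emphasizes.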

DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies

Scientific Reports ◽

10.1038/srep31900 ◽

2016 ◽

Vol 6 (1) ◽

Cited By ~ 133

Author(s):

Chengxi Ye ◽

Christopher M. Hill ◽

Shigang Wu ◽

Jue Ruan ◽

Zhanshan (Sam) Ma

Keyword(s):

Third Generation ◽

The Third ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Generation Sequencing ◽

Large Genomes

Influence of hemicelluloses and lignin content on structure and sorption properties of flax fibers (Linum usitatissimum L.)

Cellulose ◽

10.1007/s10570-017-1575-4 ◽

2017 ◽

Vol 25 (1) ◽

pp. 697-709 ◽

Cited By ~ 11

Author(s):

Biljana D. Lazić ◽

Biljana M. Pejić ◽

Ana D. Kramar ◽

Marija M. Vukčević ◽

Katarina R. Mihajlovski ◽

...

Keyword(s):

Linum Usitatissimum ◽

Lignin Content ◽

Sorption Properties ◽

Flax Fibers ◽

Linum Usitatissimum L ◽

Structure And Sorption Properties

Moving Towards Third-Generation Sequencing Technologies

Tag-Based Next Generation Sequencing ◽

10.1002/9783527644582.ch20 ◽

2012 ◽

pp. 323-336 ◽

Cited By ~ 1

Author(s):

Karolina Janitz ◽

Michal Janitz

Keyword(s):

Third Generation ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Generation Sequencing

RNA Transcriptome Mapping with GraphMap

10.1101/160085 ◽

2017 ◽

Cited By ~ 1

Author(s):

Krešimir Križanović ◽

Ivan Sović ◽

Ivan Krpelnik ◽

Mile Šikić

Keyword(s):

Third Generation ◽

Sequencing Data ◽

Mapping Algorithm ◽

Gene Annotations ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Oxford Nanopore ◽

RNA Mapping ◽

Synthetic Datasets ◽

Generation Sequencing

Abstract Next-generation sequencing (NGS) technologies have made RNA sequencing widely accessible and applicable in many areas of research. In recent years, third-generation sequencing technologies have matured and are slowly replacing NGS for DNA sequencing. This paper presents a novel tool for RNA mapping guided by gene annotations. The tool is an adapted version of a previously developed DNA mapper, GraphMap, tailored for third-generation sequencing data, such as those produced by Pacific Biosciences or Oxford Nanopore Technologies devices. It uses gene annotations to generate a transcriptome, uses a DNA mapping algorithm to map reads to the transcriptome, and finally transforms the mappings back to genome coordinates. The modified version of GraphMap is compared on several synthetic datasets with state-of-the-art RNA-seq mappers that can work with third-generation sequencing data. The results show that our tool outperforms other tools in general mapping quality.
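The last step of that pipeline, transforming a transcriptome mapping back to genome coordinates, can be sketched for a single position. The function name and the exon representation (sorted, half-open, 0-based intervals on the forward strand) are assumptions made for illustration; this is not GraphMap's code.

```python
def transcript_to_genome(pos, exons):
    """Map a 0-based transcript position to a 0-based genome position.

    exons: list of (start, end) half-open genome intervals, sorted by start,
    describing a forward-strand transcript.
    """
    if pos < 0:
        raise ValueError("position must be non-negative")
    offset = pos
    for start, end in exons:
        length = end - start
        if offset < length:
            return start + offset  # the position falls inside this exon
        offset -= length  # skip this exon and the intron that follows it
    raise ValueError("position lies beyond the end of the transcript")

# Hypothetical transcript with two exons of 10 and 20 bases.
EXONS = [(100, 110), (200, 220)]
```

A read aligned across transcript positions 9 and 10 would then span genome positions 109 and 200, which is how a contiguous transcriptome alignment becomes a spliced genome alignment.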

Oxford Nanopore sequencing: new opportunities for plant genomics?

Journal of Experimental Botany ◽

10.1093/jxb/eraa263 ◽

2020 ◽

Vol 71 (18) ◽

pp. 5313-5322 ◽

Cited By ~ 2

Author(s):

Kathryn Dumschott ◽

Maximilian H-W Schmidt ◽

Harmeet Singh Chawla ◽

Rod Snowdon ◽

Björn Usadel

Keyword(s):

Plant Genome ◽

Third Generation ◽

Plant Genomics ◽

High Coverage ◽

Plant Genomes ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Oxford Nanopore ◽

Long Read ◽

Generation Sequencing

Abstract DNA sequencing was dominated by Sanger’s chain termination method until the mid-2000s, when it was progressively supplanted by new sequencing technologies that can generate much larger quantities of data in a shorter time. At the forefront of these developments, long-read sequencing technologies (third-generation sequencing) can produce reads that are several kilobases in length. This greatly improves the accuracy of genome assemblies by spanning the highly repetitive segments that cause difficulty for second-generation short-read technologies. Third-generation sequencing is especially appealing for plant genomes, which can be extremely large with long stretches of highly repetitive DNA. Until recently, the low basecalling accuracy of third-generation technologies meant that accurate genome assembly required expensive, high-coverage sequencing followed by computational analysis to correct for errors. However, today’s long-read technologies are more accurate and less expensive, making them the method of choice for the assembly of complex genomes. Oxford Nanopore Technologies (ONT), a third-generation platform for the sequencing of native DNA strands, is particularly suitable for the generation of high-quality assemblies of highly repetitive plant genomes. Here we discuss the benefits of ONT, especially for the plant science community, and describe the issues that remain to be addressed when using ONT for plant genome sequencing.

Third generation sequencing technologies applied to diagnostic microbiology: benefits and challenges in applications and data analysis

Expert Review of Molecular Diagnostics ◽

10.1080/14737159.2016.1217158 ◽

2016 ◽

Vol 16 (9) ◽

pp. 1011-1023 ◽

Cited By ~ 18

Author(s):

Enrico Lavezzo ◽

Luisa Barzon ◽

Stefano Toppo ◽

Giorgio Palù

Keyword(s):

Data Analysis ◽

Third Generation ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Diagnostic Microbiology ◽

Generation Sequencing

TRiCoLOR: tandem repeat profiling using whole-genome long-read sequencing data

GigaScience ◽

10.1093/gigascience/giaa101 ◽

2020 ◽

Vol 9 (10) ◽

Cited By ~ 1

Author(s):

Davide Bolognini ◽

Alberto Magi ◽

Vladimir Benes ◽

Jan O Korbel ◽

Tobias Rausch

Keyword(s):

Tandem Repeat ◽

Error Rates ◽

Sequencing Error ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Third Generation ◽

Sequencing Data ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Generation Sequencing

Abstract Background: Tandem repeat sequences are widespread in the human genome, and their expansions cause multiple repeat-mediated disorders. Genome-wide discovery approaches are needed to fully elucidate their roles in health and disease, but resolving tandem repeat variation accurately remains a challenging task. While traditional mapping-based approaches using short-read data have severe limitations in the size and type of tandem repeats they can resolve, recent third-generation sequencing technologies exhibit substantially higher sequencing error rates, which complicates repeat resolution. Results: We developed TRiCoLOR, a freely available tool for tandem repeat profiling using error-prone long reads from third-generation sequencing technologies. The method can identify repetitive regions in sequencing data without prior knowledge of their motifs or locations and resolve repeat multiplicity and period size in a haplotype-specific manner. The tool includes methods to interactively visualize the identified repeats and to trace their Mendelian consistency in pedigrees. Conclusions: TRiCoLOR demonstrates excellent performance and improved sensitivity and specificity compared with alternative tools on synthetic data. For real human whole-genome sequencing data, TRiCoLOR achieves high validation rates, suggesting its suitability to identify tandem repeat variation in personal genomes.
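The terms period size and repeat multiplicity can be made concrete with a deliberately simplified sketch. The hypothetical function below finds the shortest motif whose exact repetition reproduces a sequence. It assumes an error-free, perfect repeat, so it is not TRiCoLOR's method, which is designed for error-prone long reads.

```python
def smallest_period(seq):
    """Return (motif, copies) for the shortest motif whose exact repetition yields seq.

    The period size is len(motif); the multiplicity is copies.
    """
    n = len(seq)
    for period in range(1, n + 1):
        if n % period == 0 and seq[:period] * (n // period) == seq:
            return seq[:period], n // period
    return "", 0  # only reached for an empty sequence
```

For example, a CAG expansion written as `"CAGCAGCAG"` has period size 3 and multiplicity 3, while a non-repetitive sequence is reported as a single copy of itself.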
