A high resolution single molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis

2021 ◽  
Author(s):  
Runxuan Zhang ◽  
Richard Kuo ◽  
Max Coulter ◽  
Cristiane P.G. Calixto ◽  
Juan Carlos Entizne ◽  
...  

Background: Accurate and comprehensive annotation of transcript sequences is essential for transcript quantification and differential gene and transcript expression analysis. Single molecule long read sequencing technologies provide improved integrity of transcript structures including alternative splicing, and transcription start and polyadenylation sites. However, accuracy is significantly affected by sequencing errors, mRNA degradation or incomplete cDNA synthesis. Results: We present a new and comprehensive Arabidopsis thaliana Reference Transcript Dataset 3 (AtRTD3). AtRTD3 contains over 160k transcripts - twice that of the best current Arabidopsis transcriptome and including over 1,500 novel genes. 79% of transcripts are from Iso-seq with accurately defined splice junctions and transcription start and end sites. We developed novel methods to determine splice junctions and transcription start and end sites accurately. Mis-match profiles around splice junctions provided a powerful feature to distinguish correct splice junctions and remove false splice junctions. Stratified approaches identified high confidence transcription start/end sites and removed fragmentary transcripts due to degradation. AtRTD3 is a major improvement over existing transcriptomes as demonstrated by analysis of an Arabidopsis cold response RNA-seq time-series. AtRTD3 provided higher resolution of transcript expression profiling and identified cold- and light-induced differential transcription start and polyadenylation site usage. Conclusions: AtRTD3 is the most comprehensive Arabidopsis transcriptome currently available. It improves the precision of differential gene and transcript expression, differential alternative splicing, and transcription start/end site usage from RNA-seq data. The novel methods for identifying accurate splice junctions and transcription start/end sites are widely applicable and will improve single molecule sequencing analysis from any species.

2017 ◽  
Author(s):  
Christopher Wilks ◽  
Phani Gaddipati ◽  
Abhinav Nellore ◽  
Ben Langmead

As more and larger genomics studies appear, there is a growing need for comprehensive and queryable cross-study summaries. Snaptron is a search engine for summarized RNA sequencing data with a query planner that leverages R-tree, B-tree and inverted indexing strategies to rapidly execute queries over 146 million exon-exon splice junctions from over 70,000 human RNA-seq samples. Queries can be tailored by constraining which junctions and samples to consider. Snaptron can also rank and score junctions according to tissue specificity or other criteria. Further, Snaptron can rank and score samples according to the relative frequency of different splicing patterns. We outline biological questions that can be explored with Snaptron queries, including a study of novel exons in annotated genes, of exonization of repetitive element loci, and of a recently discovered alternative transcription start site for the ALK gene. Web app and documentation are at http://snaptron.cs.jhu.edu. Source code is at https://github.com/ChristopherWilks/snaptron under the MIT license.


2012 ◽  
Vol 7 (3) ◽  
pp. 562-578 ◽  
Author(s):  
Cole Trapnell ◽  
Adam Roberts ◽  
Loyal Goff ◽  
Geo Pertea ◽  
Daehwan Kim ◽  
...  

2020 ◽  
Author(s):  
Sudeep Mehrotra ◽  
Revital Bronstein ◽  
Daniel Navarro-Gomez ◽  
Ayellet V. Segrè ◽  
Eric A. Pierce

High-throughput transcriptome sequencing has become a powerful tool in the study of human diseases. Identification of causal mechanisms may entail analysis of differential gene expression (DGE), differential transcript/isoform expression (DTE), and identification, classification and quantification of alternative splicing (AS) and/or detection of novel AS events. Such global transcriptome profiling requires multi-level data analysis methodologies. Each level presents its own unique challenges, and questions about their performance remain. In this work we present results from a systematic and consistent assessment and comparison of a number of widely used methods for detecting DGE, DTE and AS using internal control “spike-in” sequences (Sequins) in RNA-seq data. We demonstrate that including internal controls in RNA-seq experiments allows accurate determination of lower-bound detection levels and better assessment of the accuracy and sensitivity of DGE, DTE and AS detection. Tools for RNA-seq read alignment and detection of DGE performed reasonably. More effort is needed to improve the specificity and sensitivity of DTE and AS detection. Low isoform expression, together with sequencing depth, affects the sensitivity and specificity of DTE and AS tools.


2021 ◽  
Vol 12 ◽  
Author(s):  
Jingli Yang ◽  
Wanqiu Lv ◽  
Liying Shao ◽  
Yanrui Fu ◽  
Haimei Liu ◽  
...  

In eukaryotes, alternative splicing (AS) is a crucial regulatory mechanism that modulates mRNA diversity and stability. The contribution of AS to stress responses is known in many species, but the posttranscriptional mechanism in poplar under cold stress is still unclear. Recent studies have used single-molecule real-time (SMRT) sequencing technology from Pacific Biosciences (PacBio) to identify full-length transcripts. We therefore used a combination of single-molecule long-read sequencing and Illumina RNA sequencing (RNA-Seq) for a global analysis of AS in two poplar species (Populus trichocarpa and P. ussuriensis) under cold stress. We identified 1,261 AS events in P. trichocarpa and 2,101 in P. ussuriensis, among which intron retention, with a frequency of more than 30%, was the most prominent type under cold stress. RNA-Seq data analysis and annotation revealed the importance of calcium, abscisic acid, and reactive oxygen species signaling in the cold stress response. In addition, low temperature rapidly induced multiple splicing factors, transcription factors, and differentially expressed genes through AS. In P. ussuriensis, AS events occurred rapidly, which for the first time provides insight into the complexity and regulation of AS during the cold stress response in different poplar species.


2016 ◽  
Author(s):  
Hui Y. Xiong ◽  
Leo J. Lee ◽  
Hannes Bretschneider ◽  
Jiexin Gao ◽  
Nebojsa Jojic ◽  
...  

When estimating expression of a transcript or part of a transcript using RNA-seq data, it is commonly assumed that reads are generated uniformly from positions within the transcript. While this assumption is acceptable for long transcript sequences where reads from many positions are averaged, it frequently leads to large errors for short sequences, e.g., less than 100 bp. Analysis of short sequences, such as when studying splice junctions and microRNAs, is increasingly important and necessitates addressing errors in short-sequence expression estimation. Indeed, when we examined RNA-seq data from diverse studies, we found that large errors are introduced by variations in RNA-seq coverage due to sequence content, experimental conditions and sample preparation. We developed a technique that we call the positional bootstrap, which quantifies the level of uncertainty in expression induced by non-uniform coverage. Unlike methods that attempt to correct for biases in coverage, but do so by making strong assumptions about the form of those biases, the positional bootstrap can quantify the noise induced by all types of bias, including unknown ones. Results obtained using independently generated RNA-seq datasets show that the positional bootstrap increases the accuracy of estimates of alternative splicing levels, tissue-differential alternative splicing and tissue differential expression, by a factor of up to 10. A Python implementation of the algorithm to quantify splicing levels is freely available from github.com/PSI-Lab/BENTO-Seq.
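
The idea of resampling positions to expose coverage-induced uncertainty can be illustrated with a generic bootstrap over per-position read counts. This is only a minimal sketch under stated assumptions, not the BENTO-Seq implementation: the function names `positional_bootstrap` and `percentile_interval`, the example counts, and the choice of the mean as the expression estimate are all hypothetical and chosen for illustration.

```python
import random
from statistics import mean


def positional_bootstrap(counts, n_boot=1000, seed=0):
    """Resample per-position read counts with replacement.

    Returns one mean-coverage estimate per bootstrap replicate.
    (Illustrative only; not the published algorithm.)
    """
    rng = random.Random(seed)
    n = len(counts)
    return [mean(rng.choices(counts, k=n)) for _ in range(n_boot)]


def percentile_interval(samples, lower=0.025, upper=0.975):
    """Simple percentile interval over the bootstrap estimates."""
    ordered = sorted(samples)
    lo_idx = int(lower * (len(ordered) - 1))
    hi_idx = int(upper * (len(ordered) - 1))
    return ordered[lo_idx], ordered[hi_idx]


# Hypothetical read counts at 10 positions of a short region.
# Both regions have the same mean coverage (3 reads per position).
uneven = [0, 0, 25, 1, 0, 3, 0, 0, 1, 0]   # one position dominates
even = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]      # perfectly uniform coverage

uneven_ci = percentile_interval(positional_bootstrap(uneven))
even_ci = percentile_interval(positional_bootstrap(even))

print("uneven coverage interval:", uneven_ci)
print("even coverage interval:  ", even_ci)
```

Although both hypothetical regions have the same average coverage, the interval for the uneven region is wide while the interval for the uniform region collapses to a single value, which is the kind of uncertainty a naive uniform-coverage estimate would hide.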


2015 ◽  
Author(s):  
Stefan Canzar ◽  
Sandro Andreotti ◽  
David Weese ◽  
Knut Reinert ◽  
Gunnar W. Klau

We present CIDANE, a novel framework for genome-based transcript reconstruction and quantification from RNA-seq reads. CIDANE assembles transcripts with significantly higher sensitivity and precision than existing tools, while competing in speed with the fastest methods. In addition to reconstructing transcripts ab initio, the algorithm can also make use of the growing annotation of known splice sites, transcription start and end sites, or full-length transcripts, which are available for most model organisms. CIDANE supports the integrated analysis of RNA-seq and additional gene-boundary data and recovers splice junctions that are invisible to other methods. CIDANE is available at http://ccb.jhu.edu/software/cidane/.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Sam Kovaka ◽  
Aleksey V. Zimin ◽  
Geo M. Pertea ◽  
Roham Razaghi ◽  
Steven L. Salzberg ◽  
...  

RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads. StringTie2 includes new methods to handle the high error rate of long reads and offers the ability to work with full-length super-reads assembled from short reads, which further improves the quality of short-read assemblies. StringTie2 is more accurate and faster and uses less memory than all comparable short-read and long-read analysis tools.


2019 ◽  
Vol 23 (4) ◽  
pp. 521-535
Author(s):  
Alexander Atkins ◽  
Pratyush Gupta ◽  
Bing Melody Zhang ◽  
Wen-Sy Tsai ◽  
Julian Lucas ◽  
...  

2019 ◽  
Author(s):  
Sam Kovaka ◽  
Aleksey V. Zimin ◽  
Geo M. Pertea ◽  
Roham Razaghi ◽  
Steven L. Salzberg ◽  
...  

RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads. StringTie2 includes new computational methods to handle the high error rate of long-read sequencing technology, which previous assemblers could not tolerate. It also offers the ability to work with full-length super-reads assembled from short reads, which further improves the quality of assemblies. On 33 short-read datasets from humans and two plant species, StringTie2 is 47.3% more precise and 3.9% more sensitive than Scallop. On multiple long read datasets, StringTie2 on average correctly assembles 8.3 and 2.6 times as many transcripts as FLAIR and Traphlor, respectively, with substantially higher precision. StringTie2 is also faster and has a smaller memory footprint than all comparable tools.

