A high resolution single molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis

Background: Accurate and comprehensive annotation of transcript sequences is essential for transcript quantification and differential gene and transcript expression analysis. Single molecule long read sequencing technologies provide improved integrity of transcript structures, including alternative splicing and transcription start and polyadenylation sites. However, accuracy is significantly affected by sequencing errors, mRNA degradation or incomplete cDNA synthesis.

Results: We present a new and comprehensive Arabidopsis thaliana Reference Transcript Dataset 3 (AtRTD3). AtRTD3 contains over 160k transcripts, twice as many as the best current Arabidopsis transcriptome, and includes over 1,500 novel genes. 79% of transcripts are from Iso-seq with accurately defined splice junctions and transcription start and end sites. We developed novel methods to determine splice junctions and transcription start and end sites accurately. Mismatch profiles around splice junctions provided a powerful feature to distinguish correct splice junctions and remove false splice junctions. Stratified approaches identified high confidence transcription start/end sites and removed fragmentary transcripts due to degradation. AtRTD3 is a major improvement over existing transcriptomes, as demonstrated by analysis of an Arabidopsis cold response RNA-seq time-series. AtRTD3 provided higher resolution of transcript expression profiling and identified cold- and light-induced differential transcription start and polyadenylation site usage.

Conclusions: AtRTD3 is the most comprehensive Arabidopsis transcriptome currently available. It improves the precision of differential gene and transcript expression, differential alternative splicing, and transcription start/end site usage from RNA-seq data. The novel methods for identifying accurate splice junctions and transcription start/end sites are widely applicable and will improve single molecule sequencing analysis from any species.
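The idea of using mismatches near a splice junction as a filter can be illustrated with a minimal sketch. The abstract does not give the actual rule or thresholds, so everything here is an assumption: the `JunctionSupport` record, the `filter_junctions` function, the 10 bp window, and the rule that a junction is kept when at least one supporting read has no alignment error inside the window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JunctionSupport:
    """One read's evidence for a splice junction (hypothetical record)."""
    junction: tuple          # (chrom, intron_start, intron_end, strand)
    mismatch_offsets: tuple  # signed distances (bp) from the junction to each alignment error


def filter_junctions(supports, window=10):
    """Return junctions backed by at least one read that is error-free near the junction.

    `window` is an assumed threshold, not a value taken from the paper.
    """
    kept = set()
    for s in supports:
        # A read counts as clean if every error lies strictly outside the window.
        if all(abs(offset) > window for offset in s.mismatch_offsets):
            kept.add(s.junction)
    return kept


supports = [
    JunctionSupport(("Chr1", 100, 200, "+"), (3,)),       # error 3 bp from the junction
    JunctionSupport(("Chr1", 100, 200, "+"), (-25, 40)),  # errors far from the junction
    JunctionSupport(("Chr1", 300, 400, "+"), (-2, 5)),    # only error-prone support
]
high_confidence = filter_junctions(supports)
```

In this toy input the first junction survives because one read aligns cleanly around it, while the second junction is dropped because its only support has errors next to the splice site.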


Faculty Opinions recommendation of Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.14267340.15779565 ◽

2012 ◽

Author(s):

Marylyn Ritchie ◽

Stephen Turner

Keyword(s):

Expression Analysis ◽

Transcript Expression ◽

Rna Seq ◽

Differential Gene


Snaptron: querying and visualizing splicing across tens of thousands of RNA-seq samples

10.1101/097881 ◽

2017 ◽

Cited By ~ 2

Author(s):

Christopher Wilks ◽

Phani Gaddipati ◽

Abhinav Nellore ◽

Ben Langmead

Keyword(s):

Tissue Specificity ◽

Rna Seq ◽

Sequencing Data ◽

Transcription Start ◽

Link Type ◽

Alternative Transcription ◽

Web App ◽

Inverted Indexing ◽

Splice Junctions ◽

Splicing Patterns

Abstract: As more and larger genomics studies appear, there is a growing need for comprehensive and queryable cross-study summaries. Snaptron is a search engine for summarized RNA sequencing data with a query planner that leverages R-tree, B-tree and inverted indexing strategies to rapidly execute queries over 146 million exon-exon splice junctions from over 70,000 human RNA-seq samples. Queries can be tailored by constraining which junctions and samples to consider. Snaptron can also rank and score junctions according to tissue specificity or other criteria. Further, Snaptron can rank and score samples according to the relative frequency of different splicing patterns. We outline biological questions that can be explored with Snaptron queries, including a study of novel exons in annotated genes, of exonization of repetitive element loci, and of a recently discovered alternative transcription start site for the ALK gene. Web app and documentation are at http://snaptron.cs.jhu.edu. Source code is at https://github.com/ChristopherWilks/snaptron under the MIT license.
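To make the phrase "constraining which junctions and samples to consider" concrete, here is a toy sketch of an inverted index from samples to junctions combined with a coordinate filter. It is not Snaptron's implementation or API: the `JunctionIndex` class and its methods are hypothetical, and the coordinate constraint is applied with a plain scan where a real system would use an R-tree or B-tree.

```python
from collections import defaultdict


class JunctionIndex:
    """Toy index over splice junctions (hypothetical; for illustration only)."""

    def __init__(self):
        self._by_sample = defaultdict(set)  # inverted index: sample id -> junction ids
        self._coords = {}                   # junction id -> (chrom, start, end)

    def add(self, junction_id, chrom, start, end, sample_ids):
        self._coords[junction_id] = (chrom, start, end)
        for sample_id in sample_ids:
            self._by_sample[sample_id].add(junction_id)

    def query(self, chrom, start, end, sample_ids):
        """Junctions lying fully inside [start, end] on `chrom`, seen in any given sample."""
        candidates = set()
        for sample_id in sample_ids:
            # .get avoids creating empty entries for unknown samples.
            candidates |= self._by_sample.get(sample_id, set())
        return sorted(
            j for j in candidates
            if self._coords[j][0] == chrom
            and self._coords[j][1] >= start
            and self._coords[j][2] <= end
        )


index = JunctionIndex()
index.add("j1", "chr2", 100, 200, ["s1", "s2"])
index.add("j2", "chr2", 500, 900, ["s2"])
index.add("j3", "chr3", 100, 200, ["s1"])
hits = index.query("chr2", 0, 1000, ["s2"])
```

The sample constraint narrows the candidate set first through the inverted index, and the region constraint is then checked only on those candidates.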


Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks

Nature Protocols ◽

10.1038/nprot.2012.016 ◽

2012 ◽

Vol 7 (3) ◽

pp. 562-578 ◽

Cited By ~ 7062

Author(s):

Cole Trapnell ◽

Adam Roberts ◽

Loyal Goff ◽

Geo Pertea ◽

Daehwan Kim ◽

...

Keyword(s):

Expression Analysis ◽

Transcript Expression ◽

Rna Seq ◽

Differential Gene


Evaluating Methods for Differential Gene Expression And Alternative Splicing Using Internal Synthetic Controls

10.1101/2020.08.05.238295 ◽

2020 ◽

Author(s):

Sudeep Mehrotra ◽

Revital Bronstein ◽

Daniel Navarro-Gomez ◽

Ayellet V. Segrè ◽

Eric A. Pierce

Keyword(s):

Gene Expression ◽

Alternative Splicing ◽

Differential Gene Expression ◽

Internal Control ◽

Accurate Determination ◽

Transcriptome Profiling ◽

Rna Seq ◽

Specificity And Sensitivity ◽

Differential Gene ◽

Synthetic Controls

Abstract: High-throughput transcriptome sequencing has become a powerful tool in the study of human diseases. Identification of causal mechanisms may entail analysis of differential gene expression (DGE), differential transcript/isoform expression (DTE), and identification, classification and quantification of alternative splicing (AS) and/or detection of novel AS events. Such global transcriptome profiling requires multi-level data analysis methodologies. Each level presents its own unique challenges, and questions about their performance remain. In this work we present results from a systematic and consistent assessment and comparison of a number of widely used methods for detecting DGE, DTE and AS using internal control “spike-in” sequences (Sequins) in RNA-seq data. We demonstrate that inclusion of internal controls in RNA-seq experiments allows accurate determination of lower bounds on detection levels, and better assessment of DGE, DTE and AS accuracy and sensitivity. Tools for RNA-seq read alignment and detection of DGE performed reasonably well. More effort is needed to improve the specificity and sensitivity of DTE and AS detection. Low isoform expression, together with sequencing depth, affects the sensitivity and specificity of DTE and AS tools.
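Because spike-in controls have a known ground truth, sensitivity and specificity can be computed directly from a tool's calls. The sketch below shows only that arithmetic; the function name, the dictionary layout and the toy feature ids are assumptions and do not come from the study.

```python
import math


def sensitivity_specificity(truth, calls):
    """Compare calls against known spike-in truth.

    truth, calls: dicts mapping a feature id to True (differential) or False.
    Features missing from `calls` are treated as not called.
    """
    tp = fp = tn = fn = 0
    for feature_id, is_differential in truth.items():
        called = calls.get(feature_id, False)
        if is_differential and called:
            tp += 1
        elif is_differential and not called:
            fn += 1
        elif not is_differential and called:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    return sensitivity, specificity


# Toy example with four hypothetical spike-in features.
truth = {"a": True, "b": True, "c": False, "d": False}
calls = {"a": True, "b": False, "c": False, "d": True}
result = sensitivity_specificity(truth, calls)
```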


PacBio and Illumina RNA Sequencing Identify Alternative Splicing Events in Response to Cold Stress in Two Poplar Species

Frontiers in Plant Science ◽

10.3389/fpls.2021.737004 ◽

2021 ◽

Vol 12 ◽

Author(s):

Jingli Yang ◽

Wanqiu Lv ◽

Liying Shao ◽

Yanrui Fu ◽

Haimei Liu ◽

...

Keyword(s):

Alternative Splicing ◽

Stress Response ◽

Cold Stress ◽

Rna Sequencing ◽

Single Molecule ◽

Intron Retention ◽

Global Analysis ◽

Populus Trichocarpa ◽

Rna Seq ◽

Long Read

In eukaryotes, alternative splicing (AS) is a crucial regulatory mechanism that modulates mRNA diversity and stability. The contribution of AS to stress responses is known in many species, but the posttranscriptional mechanism in poplar under cold stress is still unclear. Recent studies have used single-molecule real-time (SMRT) sequencing technology from Pacific Biosciences (PacBio) to identify full-length transcripts. We therefore used a combination of single-molecule long-read sequencing and Illumina RNA sequencing (RNA-Seq) for a global analysis of AS in two poplar species (Populus trichocarpa and P. ussuriensis) under cold stress. We identified 1,261 AS events in P. trichocarpa and 2,101 in P. ussuriensis, among which intron retention, with a frequency of more than 30%, was the most prominent type under cold stress. RNA-Seq data analysis and annotation revealed the importance of calcium, abscisic acid, and reactive oxygen species signaling in the cold stress response. In addition, low temperature rapidly induced multiple splicing factors, transcription factors, and differentially expressed genes through AS. In P. ussuriensis, AS events occurred rapidly, which provides new insight into the complexity and regulation of AS during the cold stress response in different poplar species.


Probabilistic estimation of short sequence expression using RNA-Seq data and the “positional bootstrap”

10.1101/046474 ◽

2016 ◽

Cited By ~ 2

Author(s):

Hui Y. Xiong ◽

Leo J. Lee ◽

Hannes Bretschneider ◽

Jiexin Gao ◽

Nebojsa Jojic ◽

...

Keyword(s):

Alternative Splicing ◽

Sample Preparation ◽

Differential Expression ◽

Short Sequence ◽

Rna Seq ◽

Experimental Conditions ◽

Probabilistic Estimation ◽

Uniform Coverage ◽

Splice Junctions ◽

Do So

Abstract: When estimating expression of a transcript or part of a transcript using RNA-seq data, it is commonly assumed that reads are generated uniformly from positions within the transcript. While this assumption is acceptable for long transcript sequences where reads from many positions are averaged, it frequently leads to large errors for short sequences, e.g., less than 100 bp. Analysis of short sequences, such as when studying splice junctions and microRNAs, is increasingly important and necessitates addressing errors in short-sequence expression estimation. Indeed, when we examined RNA-seq data from diverse studies, we found that large errors are introduced by variations in RNA-seq coverage due to sequence content, experimental conditions and sample preparation. We developed a technique that we call the positional bootstrap, which quantifies the level of uncertainty in expression induced by non-uniform coverage. Unlike methods that attempt to correct for biases in coverage, but do so by making strong assumptions about the form of those biases, the positional bootstrap can quantify the noise induced by all types of bias, including unknown ones. Results obtained using independently generated RNA-seq datasets show that the positional bootstrap increases the accuracy of estimates of alternative splicing levels, tissue-differential alternative splicing and tissue differential expression, by a factor of up to 10. A Python implementation of the algorithm to quantify splicing levels is freely available from github.com/PSI-Lab/BENTO-Seq.
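The core idea described above, resampling read start positions to see how much an estimate moves under non-uniform coverage, can be sketched as follows. This is a simplified illustration and not the BENTO-Seq implementation: the function names, the use of mean per-position coverage, and the percentile interval are all assumptions made for the example.

```python
import random


def positional_bootstrap_psi(inc_counts, exc_counts, n_boot=1000, seed=0):
    """Bootstrap a percent-spliced-in (PSI) estimate by resampling positions.

    inc_counts / exc_counts: read counts per start position for the inclusion
    and exclusion junctions. Positions are resampled with replacement, so
    uneven coverage shows up as spread in the returned estimates.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        inc = sum(rng.choices(inc_counts, k=len(inc_counts))) / len(inc_counts)
        exc = sum(rng.choices(exc_counts, k=len(exc_counts))) / len(exc_counts)
        if inc + exc > 0:  # skip replicates with no reads at all
            estimates.append(inc / (inc + exc))
    return estimates


def percentile_interval(values, lower=0.05, upper=0.95):
    """Crude percentile interval over a non-empty list of bootstrap estimates."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]


# Perfectly uniform coverage gives no spread; spiky coverage gives a wide interval.
uniform = positional_bootstrap_psi([5] * 10, [5] * 10, n_boot=50)
spiky = positional_bootstrap_psi([0, 20, 1, 15, 3], [2, 2, 3, 2, 2], n_boot=200, seed=1)
spiky_low, spiky_high = percentile_interval(spiky)
```

A wide interval signals that the point estimate depends heavily on a few positions, which is the kind of uncertainty the abstract says uniform-coverage assumptions hide.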


CIDANE: Comprehensive isoform discovery and abundance estimation

10.1101/017939 ◽

2015 ◽

Cited By ~ 1

Author(s):

Stefan Canzar ◽

Sandro Andreotti ◽

David Weese ◽

Knut Reinert ◽

Gunnar W. Klau

Keyword(s):

Boundary Data ◽

Model Organisms ◽

Integrated Analysis ◽

Abundance Estimation ◽

Rna Seq ◽

Splice Sites ◽

Transcription Start ◽

Transcript Reconstruction ◽

Splice Junctions ◽

Higher Sensitivity

We present CIDANE, a novel framework for genome-based transcript reconstruction and quantification from RNA-seq reads. CIDANE assembles transcripts with significantly higher sensitivity and precision than existing tools, while competing in speed with the fastest methods. In addition to reconstructing transcripts ab initio, the algorithm can also make use of the growing annotation of known splice sites, transcription start and end sites, or full-length transcripts, which are available for most model organisms. CIDANE supports the integrated analysis of RNA-seq and additional gene-boundary data and recovers splice junctions that are invisible to other methods. CIDANE is available at http://ccb.jhu.edu/software/cidane/.


Transcriptome assembly from long-read RNA-seq alignments with StringTie2

Genome Biology ◽

10.1186/s13059-019-1910-1 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 54

Author(s):

Sam Kovaka ◽

Aleksey V. Zimin ◽

Geo M. Pertea ◽

Roham Razaghi ◽

Steven L. Salzberg ◽

...

Keyword(s):

Single Molecule ◽

Transcriptome Assembly ◽

Rna Seq ◽

Ability To Work ◽

Single Molecule Sequencing ◽

Short Read ◽

New Methods ◽

Long Reads ◽

Long Read


Detection of Circulating Tumor DNA with a Single-Molecule Sequencing Analysis Validated for Targeted and Immunotherapy Selection

Molecular Diagnosis & Therapy ◽

10.1007/s40291-019-00406-0 ◽

2019 ◽

Vol 23 (4) ◽

pp. 521-535

Author(s):

Alexander Atkins ◽

Pratyush Gupta ◽

Bing Melody Zhang ◽

Wen-Sy Tsai ◽

Julian Lucas ◽

...

Keyword(s):

Single Molecule ◽

Circulating Tumor Dna ◽

Sequencing Analysis ◽

Single Molecule Sequencing ◽

Tumor Dna


Transcriptome assembly from long-read RNA-seq alignments with StringTie2

10.1101/694554 ◽

2019 ◽

Author(s):

Sam Kovaka ◽

Aleksey V. Zimin ◽

Geo M. Pertea ◽

Roham Razaghi ◽

Steven L. Salzberg ◽

...

Keyword(s):

Single Molecule ◽

Transcriptome Assembly ◽

Rna Seq ◽

High Error Rate ◽

Sequencing Technology ◽

Ability To Work ◽

Single Molecule Sequencing ◽

Long Reads ◽

Long Read

Abstract: RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads. StringTie2 includes new computational methods to handle the high error rate of long-read sequencing technology, which previous assemblers could not tolerate. It also offers the ability to work with full-length super-reads assembled from short reads, which further improves the quality of assemblies. On 33 short-read datasets from humans and two plant species, StringTie2 is 47.3% more precise and 3.9% more sensitive than Scallop. On multiple long read datasets, StringTie2 on average correctly assembles 8.3 and 2.6 times as many transcripts as FLAIR and Traphlor, respectively, with substantially higher precision. StringTie2 is also faster and has a smaller memory footprint than all comparable tools.
