Identifying differential isoform abundance with RATs: a universal tool and a warning

Abstract Motivation The biological importance of changes in gene and transcript expression is well recognised and is reflected by the wide variety of tools available to characterise these changes. Regulation via Differential Transcript Usage (DTU) is emerging as an important phenomenon. Several tools exist for the detection of DTU from read alignment or assembly data, but options for detection of DTU from alignment-free quantifications are limited. Results We present an R package named RATs (Relative Abundance of Transcripts) that identifies DTU transcriptome-wide directly from transcript abundance estimations. RATs is agnostic to quantification methods and exploits bootstrapped quantifications, if available, to inform the significance of detected DTU events. RATs contextualises the DTU results and shows good False Discovery performance (median FDR ≤0.05) at all replication levels. We applied RATs to a human RNA-seq dataset associated with idiopathic pulmonary fibrosis with three DTU events validated by qRT-PCR. RATs found that all three genes exhibited statistically significant changes in isoform proportions based on Ensembl v60 annotations, but the DTU for two of them was not reliably reproduced across bootstrapped quantifications. RATs also identified 500 novel DTU events that are enriched for eleven GO terms related to regulation of the response to stimulus, regulation of immune system processes, and symbiosis/parasitism. Repeating this analysis with the Ensembl v87 annotation showed that the isoform abundance profiles of two of the three validated DTU genes changed radically. RATs identified 414 novel DTU events that are enriched for five GO terms, none of which are in common with those previously identified. Only 141 of the DTU events are common between the two analyses, and only 8 are among the 248 reported by the original study. Furthermore, the original qRT-PCR probes no longer match uniquely to their original transcripts, calling into question the interpretation of these data. We suggest parallel full-length isoform sequencing, annotation pre-filtering and sequencing of the transcripts captured by qRT-PCR primers as possible ways to improve the validation of RNA-seq results in future experiments. Availability The package is available through Github at https://github.com/bartongroup/Rats.
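
The change in isoform proportions that underlies a DTU call can be illustrated with a minimal Python sketch. This is not the RATs implementation (RATs is an R package and adds statistical testing and bootstrap-based reproducibility checks); the function names, transcript IDs and abundance values are hypothetical and chosen only for illustration.

```python
def isoform_proportions(abundances):
    """Convert per-transcript abundances of one gene into proportions."""
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("gene has no expression; proportions are undefined")
    return {tx: value / total for tx, value in abundances.items()}


def proportion_shift(cond_a, cond_b):
    """Per-transcript change in proportion going from condition A to B."""
    prop_a = isoform_proportions(cond_a)
    prop_b = isoform_proportions(cond_b)
    transcripts = sorted(set(prop_a) | set(prop_b))
    # A transcript missing from one condition is treated as proportion 0.
    return {tx: prop_b.get(tx, 0.0) - prop_a.get(tx, 0.0) for tx in transcripts}


# Hypothetical abundances for one gene with two isoforms.
control = {"tx1": 30.0, "tx2": 10.0}
disease = {"tx1": 10.0, "tx2": 30.0}

shifts = proportion_shift(control, disease)
print(shifts)
```

In this toy example the gene-level total is identical in both conditions while the dominant isoform switches, which is the kind of change a gene-level expression comparison would not detect.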


Relative Abundance of Transcripts (RATs): Identifying differential isoform abundance from RNA-seq

F1000Research ◽

10.12688/f1000research.17916.1 ◽

2019 ◽

Vol 8 ◽

pp. 213 ◽

Cited By ~ 6

Author(s):

Kimon Froussios ◽

Kira Mourão ◽

Gordon Simpson ◽

Geoff Barton ◽

Nicholas Schurch

Keyword(s):

Matthews Correlation Coefficient ◽

Transcript Abundance ◽

R Package ◽

Effect Sizes ◽

Rna Seq ◽

Threshold Values ◽

Qrt Pcr ◽

False Discovery ◽

Abundance Estimates ◽

Higher Sensitivity

The biological importance of changes in RNA expression is reflected by the wide variety of tools available to characterise these changes from RNA-seq data. Several tools exist for detecting differential transcript isoform usage (DTU) from aligned or assembled RNA-seq data, but few exist for DTU detection from alignment-free RNA-seq quantifications. We present RATs, an R package that identifies DTU transcriptome-wide directly from transcript abundance estimates. RATs is unique in applying bootstrapping to estimate the reliability of detected DTU events and shows good performance at all replication levels (median false positive fraction < 0.05). We compare RATs to two existing DTU tools, DRIM-Seq and SUPPA2, using two publicly available simulated RNA-seq datasets and a published human RNA-seq dataset, in which 248 genes have been previously identified as displaying significant DTU. RATs with default threshold values on the simulated human data has a sensitivity of 0.55, a Matthews correlation coefficient of 0.71 and a false discovery rate (FDR) of 0.04, outperforming both other tools. Applying the same thresholds for SUPPA2 results in a higher sensitivity (0.61) but poorer FDR performance (0.33). RATs and DRIM-Seq use different methods for measuring DTU effect sizes, which complicates the comparison of results between these tools. However, for a likelihood-ratio threshold of 30, DRIM-Seq has similar FDR performance to RATs (0.06) but worse sensitivity (0.47). These differences persist for the simulated Drosophila dataset. On the published human RNA-seq dataset the greatest agreement between the tools tested is 53%, observed between RATs and SUPPA2. The bootstrapping quality filter in RATs is responsible for removing the majority of DTU events called by SUPPA2 that are not reported by RATs. All methods, including the previously published qRT-PCR of three of the 248 detected DTU events, were found to be sensitive to annotation differences between Ensembl v60 and v87.
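
For readers unfamiliar with the three performance measures quoted above, the following Python sketch shows how sensitivity, false discovery rate and the Matthews correlation coefficient are derived from a confusion matrix. The function name and the counts are hypothetical; they are not the counts behind the figures reported in the abstract.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, FDR and Matthews correlation coefficient from counts."""
    nan = float("nan")
    # Sensitivity: fraction of true events that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    # FDR: fraction of reported events that are false.
    fdr = fp / (tp + fp) if (tp + fp) else nan
    # MCC: correlation between predicted and true labels, in [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else nan
    return {"sensitivity": sensitivity, "fdr": fdr, "mcc": mcc}


# Hypothetical benchmark outcome: 90 true DTU genes, 910 non-DTU genes.
metrics = classification_metrics(tp=50, fp=10, tn=900, fn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the MCC uses all four cells of the confusion matrix, it stays informative when true DTU genes are a small minority of those tested, which is why it is often reported alongside sensitivity and FDR.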


Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation

10.1101/025767 ◽

2015 ◽

Cited By ~ 6

Author(s):

Michael I Love ◽

John B Hogenesch ◽

Rafael A Irizarry

Keyword(s):

Computational Methods ◽

Gc Content ◽

Science Research ◽

Transcript Abundance ◽

Transcript Expression ◽

Rna Seq ◽

Fold Reduction ◽

Visualization Tools ◽

Positive Results ◽

Sequence Bias

RNA-seq technology is widely used in biomedical and basic science research. These studies rely on complex computational methods that quantify expression levels for observed transcripts. We find that current computational methods can lead to hundreds of false positive results related to alternative isoform usage. This flaw in the current methodology stems from a lack of modeling sample-specific bias that leads to drops in coverage and is related to sequence features like fragment GC content and GC stretches. By incorporating features that explain this bias into transcript expression models, we greatly increase the specificity of transcript expression estimates, with more than a four-fold reduction in the number of false positives for reported changes in expression. We introduce alpine, a method for estimation of bias-corrected transcript abundance. The method is available as a Bioconductor package that includes data visualization tools useful for bias discovery.
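
As a rough illustration of the sequence features named above, the Python sketch below computes the GC fraction of a fragment and the length of its longest uninterrupted run of G/C bases. It is not the alpine model, which is an R/Bioconductor package; the function names and the example sequence are made up for illustration.

```python
def gc_content(fragment):
    """Fraction of bases in the fragment that are G or C."""
    if not fragment:
        raise ValueError("fragment must not be empty")
    seq = fragment.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def longest_gc_stretch(fragment):
    """Length of the longest consecutive run of G/C bases."""
    longest = current = 0
    for base in fragment.upper():
        current = current + 1 if base in "GC" else 0
        longest = max(longest, current)
    return longest


# Hypothetical fragment sequence.
fragment = "ATGGGCCATGC"
print(gc_content(fragment), longest_gc_stretch(fragment))
```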


BP4RNAseq: a babysitter package for retrospective and newly generated RNA-seq data analyses using both alignment-based and alignment-free quantification method

Bioinformatics ◽

10.1093/bioinformatics/btaa832 ◽

2020 ◽

Author(s):

Shanwen Sun ◽

Lei Xu ◽

Quan Zou ◽

Guohua Wang

Keyword(s):

R Package ◽

Supplementary Information ◽

Rna Seq ◽

Technical Parameters ◽

Alignment Free ◽

Data Analyses ◽

Gene Expression Quantification ◽

Free Quantification ◽

Automated Tool ◽

Expression Quantification

Abstract Summary Processing raw reads of RNA-sequencing (RNA-seq) data, whether public or newly sequenced, involves many specialized tools and technical configurations that are often unfamiliar and time-consuming to learn for non-bioinformatics researchers. Here, we develop the R package BP4RNAseq, which integrates state-of-the-art tools from both alignment-based and alignment-free quantification workflows. The BP4RNAseq package is a highly automated tool using an optimized pipeline to improve the sensitivity and accuracy of RNA-seq analyses. It takes only two non-technical parameters and outputs six formatted gene expression quantifications at the gene and transcript levels. The package applies to both retrospective and newly generated bulk RNA-seq data analyses and is also applicable to single-cell RNA-seq analyses. It therefore greatly facilitates the application of RNA-seq. Availability and implementation The BP4RNAseq package for R and its documentation are freely available at https://github.com/sunshanwen/BP4RNAseq. Supplementary information Supplementary data are available at Bioinformatics online.


Transcriptomes That Confer to Plant Defense against Powdery Mildew Disease in Lagerstroemia indica

International Journal of Genomics ◽

10.1155/2015/528395 ◽

2015 ◽

Vol 2015 ◽

pp. 1-12 ◽

Cited By ~ 4

Author(s):

Xinwang Wang ◽

Weibing Shi ◽

Timothy Rinehart

Keyword(s):

Powdery Mildew ◽

Plant Defense ◽

De Novo ◽

Antioxidant Activities ◽

Transcript Abundance ◽

Flavonoid Biosynthesis ◽

Biosynthesis Pathway ◽

Rna Seq ◽

Qrt Pcr ◽

Flavonoid Biosynthesis Pathway

Transcriptome analysis was conducted in two popular Lagerstroemia cultivars: “Natchez” (NAT), a white flower and powdery mildew resistant interspecific hybrid, and “Carolina Beauty” (CAB), a red flower and powdery mildew susceptible L. indica cultivar. RNA-seq reads were generated from Erysiphe australiana infected leaves and de novo assembled. A total of 37,035 unigenes from 224,443 assembled contigs in both genotypes were identified. Approximately 85% of these unigenes have known function. Of them, 475 KEGG genes were found to differ significantly between the two genotypes. Five of the top ten differentially expressed genes (DEGs) were involved in the biosynthesis of secondary metabolites (plant defense) and four in the flavonoid biosynthesis pathway (antioxidant activities or flower coloration). Furthermore, 5 of the 12 assembled unigenes in benzoxazinoid biosynthesis and 7 of 11 in flavonoid biosynthesis showed higher transcript abundance in NAT. The relative abundance of transcripts for 16 candidate DEGs (9 from CAB and 7 from NAT) detected by qRT-PCR showed general agreement with the abundances of the assembled transcripts in NAT. This study provided the first transcriptome analyses in L. indica. The differential transcript abundance between the two genotypes indicates that it is possible to identify candidate genes that are associated with plant defenses or flower coloration.


Concordance between RNA-sequencing data and DNA microarray data in transcriptome analysis of proliferative and quiescent fibroblasts

Royal Society Open Science ◽

10.1098/rsos.150402 ◽

2015 ◽

Vol 2 (9) ◽

pp. 150402 ◽

Cited By ~ 14

Author(s):

Brett Trost ◽

Catherine A. Moir ◽

Zoe E. Gillespie ◽

Anthony Kusalik ◽

Jennifer A. Mitchell ◽

...

Keyword(s):

Rna Sequencing ◽

Microarray Data ◽

Dna Microarrays ◽

Geometric Mean ◽

Transcript Abundance ◽

Cdna Libraries ◽

Rna Seq ◽

Sequencing Data ◽

High Throughput Analysis ◽

Qrt Pcr

DNA microarrays and RNA sequencing (RNA-seq) are major technologies for performing high-throughput analysis of transcript abundance. Recently, concerns have been raised regarding the concordance of data derived from the two techniques. Using cDNA libraries derived from normal human foreskin fibroblasts, we measured changes in transcript abundance as cells transitioned from proliferative growth to quiescence using both DNA microarrays and RNA-seq. The internal reproducibility of the RNA-seq data was greater than that of the microarray data. Correlations between the RNA-seq data and the individual microarrays were low, but correlations between the RNA-seq values and the geometric mean of the microarray values were moderate. The two technologies had good agreement when considering probes with the largest (both positive and negative) fold change (FC) values. An independent technique, quantitative reverse-transcription PCR (qRT-PCR), was used to measure the FC of 76 genes between proliferative and quiescent samples, and a higher correlation was observed between the qRT-PCR data and the RNA-seq data than between the qRT-PCR data and the microarray data.
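
The comparison against the geometric mean of the microarray values can be sketched in Python as follows. This is only an illustration of the idea, assuming strictly positive expression values and a Pearson correlation; the abstract does not state which correlation measure the authors used, and all names and numbers below are hypothetical.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values, computed in log space."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one value, all strictly positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Hypothetical data: one row per gene, one column per microarray.
microarray = [[1.0, 4.0], [2.0, 8.0], [3.0, 12.0]]
rnaseq = [2.1, 3.9, 6.2]

combined = [geometric_mean(row) for row in microarray]
r = pearson(combined, rnaseq)
print(combined, round(r, 4))
```

Averaging the arrays before correlating reduces the influence of noise in any single array, which is one plausible reason the combined values agree better with RNA-seq than individual arrays do.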


Faculty Opinions recommendation of Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.14267340.15779565 ◽

2012 ◽

Author(s):

Marylyn Ritchie ◽

Stephen Turner

Keyword(s):

Expression Analysis ◽

Transcript Expression ◽

Rna Seq ◽

Differential Gene


Comprehensive Analysis of Transcriptomics and Metabolomics between the Heads and Tails of Angelica Sinensis: Genes Related to Phenylpropanoid Biosynthesis Pathway

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207323999201103221952 ◽

2020 ◽

Vol 23 ◽

Author(s):

Jie Yang ◽

Chi Zhang ◽

Wei-Hong Li ◽

Tian-Er Zhang ◽

Guang-Zhong Fan ◽

...

Keyword(s):

Ferulic Acid ◽

Metabolic Regulation ◽

Comprehensive Analysis ◽

Angelica Sinensis ◽

Rna Seq ◽

Targeted Metabolomics ◽

Kegg Pathways ◽

Qrt Pcr ◽

Phenylpropanoid Biosynthesis ◽

Combined Strategy

Background: In Traditional Chinese Medicine (TCM), the heads and tails of Angelica sinensis (Oliv.) Diels (AS) are used in treating different diseases due to their different pharmaceutical efficacies. The underlying mechanisms, however, have not been fully explored. Objective: Novel mechanisms responsible for the discrepant activities between AS heads and tails were explored by a combined strategy of transcriptomics and metabolomics. Method: Six pairs of the heads and tails of AS roots were collected in Min County, China. Total RNA and metabolites, which were used for RNA-seq and untargeted metabolomics analysis, were isolated from each AS sample (0.1 g) with Trizol and methanol reagent, respectively. Subsequently, differentially expressed genes (DEGs) and discrepant pharmaceutical metabolites were identified by comparing AS heads and tails. Key DEGs and metabolites were quantified by qRT-PCR and targeted metabolomics experiments. Results: Comprehensive analysis of the transcriptomics and metabolomics results suggested that five KEGG pathways with significant differences included 57 DEGs. In particular, fourteen DEGs and six key metabolites were related to the metabolic regulation of the phenylpropanoid biosynthesis (PB) pathway. Results of qRT-PCR and targeted metabolomics indicated that higher levels of expression of crucial genes in the PB pathway, such as PAL, CAD, COMT and peroxidase, in the tails of AS were positively correlated with levels of ferulic acid-related metabolites. The average content of ferulic acid in tails (569.58 ± 162.39 nmol/g) was higher than that in the heads (168.73 ± 67.30 nmol/g) (P < 0.01); caffeic acid in tails (3.82 ± 0.88 nmol/g) vs heads (1.37 ± 0.41 nmol/g) (P < 0.01), and cinnamic acid in tails (0.24 ± 0.09 nmol/g) vs heads (0.14 ± 0.02 nmol/g) (P < 0.05). Conclusion: Our work demonstrated that overexpressed genes and accumulated metabolites derived from the PB pathway might be responsible for the discrepant pharmaceutical efficacies between AS heads and tails.


ExperimentSubset: an R package to manage subsets of Bioconductor Experiment objects

Bioinformatics ◽

10.1093/bioinformatics/btab179 ◽

2021 ◽

Author(s):

Irzam Sarfraz ◽

Muhammad Asif ◽

Joshua D Campbell

Keyword(s):

Single Cell ◽

R Package ◽

Poor Quality ◽

Data Matrix ◽

Supplementary Information ◽

Data Provenance ◽

Rna Seq ◽

Efficient Management ◽

The Matrix ◽

The Relationship

Abstract Motivation R Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for storing one or more matrix-like assays along with associated row and column data. These objects have been used to facilitate the storage and analysis of high-throughput genomic data generated from technologies such as single-cell RNA sequencing. One common computational task in many genomics analysis workflows is to perform subsetting of the data matrix before applying down-stream analytical methods. For example, one may need to subset the columns of the assay matrix to exclude poor-quality samples or subset the rows of the matrix to select the most variable features. Traditionally, a second object is created that contains the desired subset of assay from the original object. However, this approach is inefficient as it requires the creation of an additional object containing a copy of the original assay and leads to challenges with data provenance. Results To overcome these challenges, we developed an R package called ExperimentSubset, which is a data container that implements classes for efficient storage and streamlined retrieval of assays that have been subsetted by rows and/or columns. These classes are able to inherently provide data provenance by maintaining the relationship between the subsetted and parent assays. We demonstrate the utility of this package on a single-cell RNA-seq dataset by storing and retrieving subsets at different stages of the analysis while maintaining a lower memory footprint. Overall, the ExperimentSubset is a flexible container for the efficient management of subsets. Availability and implementation ExperimentSubset package is available at Bioconductor: https://bioconductor.org/packages/ExperimentSubset/ and Github: https://github.com/campbio/ExperimentSubset. Supplementary information Supplementary data are available at Bioinformatics online.


BloodGen3Module: Blood transcriptional module repertoire analysis and visualization using R

Bioinformatics ◽

10.1093/bioinformatics/btab121 ◽

2021 ◽

Author(s):

Darawan Rinchai ◽

Jessica Roelands ◽

Mohammed Toufiq ◽

Wouter Hendrickx ◽

Matthew C Altman ◽

...

Keyword(s):

Transcript Abundance ◽

R Package ◽

Supplementary Information ◽

Illustrative Case ◽

Bioinformatic Tools ◽

Transcriptional Module ◽

Wide Range ◽

Downstream Analysis ◽

Computing Module ◽

Parallel Workflow

Abstract Motivation We previously described the construction and characterization of generic and reusable blood transcriptional module repertoires. More recently we released a third iteration (“BloodGen3” module repertoire) that comprises 382 functionally annotated gene sets (modules) and encompasses 14,168 transcripts. Custom bioinformatic tools are needed to support downstream analysis, visualization and interpretation relying on such fixed module repertoires. Results We have developed and describe here an R package, BloodGen3Module. The functions of our package permit group comparison analyses to be performed at the module level and the results to be displayed as annotated fingerprint grid plots. A parallel workflow for computing module repertoire changes for individual samples rather than groups of samples is also available; these results are displayed as fingerprint heatmaps. An illustrative case is used to demonstrate the steps involved in generating blood transcriptome repertoire fingerprints of septic patients. Taken together, this resource could facilitate the analysis and interpretation of changes in blood transcript abundance observed across a wide range of pathological and physiological states. Availability The BloodGen3Module package and documentation are freely available from Github: https://github.com/Drinchai/BloodGen3Module Supplementary information Supplementary data are available at Bioinformatics online.


movAPA: modeling and visualization of dynamics of alternative polyadenylation across biological samples

Bioinformatics ◽

10.1093/bioinformatics/btaa997 ◽

2020 ◽

Author(s):

Wenbin Ye ◽

Tao Liu ◽

Hongjuan Fu ◽

Congting Ye ◽

Guoli Ji ◽

...

Keyword(s):

Biological Samples ◽

Tissue Specificity ◽

Single Cells ◽

Alternative Polyadenylation ◽

R Package ◽

Supplementary Information ◽

Rna Seq ◽

Mouse Sperm ◽

High Scalability ◽

A Site

Abstract Motivation Alternative polyadenylation (APA) is widely recognized as a widespread, dynamically modulated mechanism. Studies based on 3′ end sequencing and/or RNA-seq have profiled poly(A) sites in various species with diverse pipelines, yet no unified and easy-to-use toolkit is available for comprehensive APA analyses. Results We developed an R package called movAPA for modeling and visualization of dynamics of alternative polyadenylation across biological samples. movAPA incorporates rich functions for preprocessing, annotation and statistical analyses of poly(A) sites, identification of poly(A) signals, profiling of APA dynamics and visualization. In particular, seven metrics are provided for measuring the tissue specificity or usage of APA sites across samples. Three methods are used for identifying 3′ UTR shortening/lengthening events between conditions. APA site switching involving non-3′ UTR polyadenylation can also be explored. Using poly(A) site data from rice and mouse sperm cells, we demonstrated the high scalability and flexibility of movAPA in profiling APA dynamics across tissues and single cells. Availability and implementation https://github.com/BMILAB/movAPA. Supplementary information Supplementary data are available at Bioinformatics online.
