Identifying differential isoform abundance with RATs: a universal tool and a warning

2017 ◽  
Author(s):  
Kimon Froussios ◽  
Kira Mourão ◽  
Gordon G. Simpson ◽  
Geoffrey J. Barton ◽  
Nick J. Schurch

Abstract Motivation The biological importance of changes in gene and transcript expression is well recognised and is reflected by the wide variety of tools available to characterise these changes. Regulation via Differential Transcript Usage (DTU) is emerging as an important phenomenon. Several tools exist for the detection of DTU from read alignment or assembly data, but options for detection of DTU from alignment-free quantifications are limited. Results We present an R package named RATs (Relative Abundance of Transcripts) that identifies DTU transcriptome-wide directly from transcript abundance estimates. RATs is agnostic to quantification methods and exploits bootstrapped quantifications, if available, to inform the significance of detected DTU events. RATs contextualises the DTU results and shows good False Discovery performance (median FDR ≤ 0.05) at all replication levels. We applied RATs to a human RNA-seq dataset associated with idiopathic pulmonary fibrosis, in which three DTU events had been validated by qRT-PCR. RATs found that all three genes exhibited statistically significant changes in isoform proportions based on Ensembl v60 annotations, but the DTU for two of them was not reliably reproduced across bootstrapped quantifications. RATs also identified 500 novel DTU events that are enriched for eleven GO terms related to regulation of the response to stimulus, regulation of immune system processes, and symbiosis/parasitism. Repeating this analysis with the Ensembl v87 annotation showed that the isoform abundance profiles of two of the three validated DTU genes changed radically. RATs identified 414 novel DTU events that are enriched for five GO terms, none of which are in common with those previously identified. Only 141 of the DTU events are common to the two analyses, and only 8 are among the 248 reported by the original study.
Furthermore, the original qRT-PCR probes no longer match uniquely to their original transcripts, calling into question the interpretation of these data. We suggest parallel full-length isoform sequencing, annotation pre-filtering and sequencing of the transcripts captured by qRT-PCR primers as possible ways to improve the validation of RNA-seq results in future experiments. Availability The package is available through Github at https://github.com/bartongroup/Rats.
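
DTU asks whether the proportions of a gene's isoforms shift between conditions, rather than whether the gene's total expression changes. RATs additionally checks how reproducible each call is across bootstrapped quantifications. The Python sketch below illustrates only those two ideas and is not the RATs implementation (an R package). The function names, the maximum-proportion-shift rule and the 0.2 cutoff are hypothetical choices made for illustration, and no significance test is included.

```python
from typing import List, Sequence

# Hypothetical illustration of DTU logic; not the RATs API or its statistical tests.


def isoform_proportions(abundances: Sequence[float]) -> List[float]:
    """Convert one gene's per-isoform abundances into proportions that sum to 1."""
    total = sum(abundances)
    if total == 0:
        return [0.0] * len(abundances)
    return [a / total for a in abundances]


def max_proportion_shift(cond_a: Sequence[float], cond_b: Sequence[float]) -> float:
    """Largest absolute change in any isoform's proportion between two conditions."""
    pa = isoform_proportions(cond_a)
    pb = isoform_proportions(cond_b)
    return max(abs(x - y) for x, y in zip(pa, pb))


def is_dtu(cond_a, cond_b, min_shift: float = 0.2) -> bool:
    """Call DTU when some isoform's proportion moves by at least min_shift (assumed cutoff)."""
    return max_proportion_shift(cond_a, cond_b) >= min_shift


def bootstrap_reproducibility(boot_a, boot_b, min_shift: float = 0.2) -> float:
    """Fraction of paired bootstrap iterations in which the DTU call still holds."""
    calls = [is_dtu(a, b, min_shift) for a, b in zip(boot_a, boot_b)]
    return sum(calls) / len(calls)


# Made-up abundances for two isoforms of one gene; the dominant isoform switches.
point_a, point_b = [80, 20], [30, 70]
boot_a = [[80, 20], [75, 25], [50, 50]]
boot_b = [[30, 70], [35, 65], [45, 55]]
print(is_dtu(point_a, point_b))                   # True
print(bootstrap_reproducibility(boot_a, boot_b))  # call holds in 2 of 3 iterations
```

A call that holds for the point estimates but only in a minority of bootstrap iterations would be treated as unreliable. This is the kind of filtering the abstract applies to the qRT-PCR-validated events.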

F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 213 ◽  
Author(s):  
Kimon Froussios ◽  
Kira Mourão ◽  
Gordon Simpson ◽  
Geoff Barton ◽  
Nicholas Schurch

The biological importance of changes in RNA expression is reflected by the wide variety of tools available to characterise these changes from RNA-seq data. Several tools exist for detecting differential transcript isoform usage (DTU) from aligned or assembled RNA-seq data, but few exist for DTU detection from alignment-free RNA-seq quantifications. We present RATs, an R package that identifies DTU transcriptome-wide directly from transcript abundance estimates. RATs is unique in applying bootstrapping to estimate the reliability of detected DTU events and shows good performance at all replication levels (median false positive fraction < 0.05). We compare RATs to two existing DTU tools, DRIM-Seq and SUPPA2, using two publicly available simulated RNA-seq datasets and a published human RNA-seq dataset in which 248 genes have previously been identified as displaying significant DTU. With default threshold values on the simulated human data, RATs has a sensitivity of 0.55, a Matthews correlation coefficient of 0.71 and a false discovery rate (FDR) of 0.04, outperforming both other tools. Applying the same thresholds to SUPPA2 results in a higher sensitivity (0.61) but poorer FDR performance (0.33). RATs and DRIM-Seq use different methods for measuring DTU effect sizes, which complicates the comparison of results between these tools. However, for a likelihood-ratio threshold of 30, DRIM-Seq has an FDR (0.06) similar to that of RATs but worse sensitivity (0.47). These differences persist for the simulated Drosophila dataset. On the published human RNA-seq dataset, the greatest agreement between the tools tested is 53%, observed between RATs and SUPPA2. Most of the DTU events called by SUPPA2 but not reported by RATs are removed by the bootstrapping quality filter in RATs. All methods, including the previously published qRT-PCR of three of the 248 detected DTU events, were found to be sensitive to annotation differences between Ensembl v60 and v87.
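
The sensitivity, FDR and Matthews correlation coefficient (MCC) quoted above are standard functions of the confusion matrix obtained by comparing DTU calls against the simulated ground truth. A minimal Python sketch of these textbook definitions follows. The counts in the example are made up and are not taken from the study.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true DTU genes that are called: TP / (TP + FN)."""
    return tp / (tp + fn)


def false_discovery_rate(tp: int, fp: int) -> float:
    """Fraction of called DTU genes that are not truly DTU: FP / (TP + FP)."""
    return fp / (tp + fp)


def matthews_cc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient; 0.0 by convention if any margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical confusion-matrix counts, for illustration only.
tp, fp, fn, tn = 8, 2, 2, 8
print(sensitivity(tp, fn))           # 0.8
print(false_discovery_rate(tp, fp))  # 0.2
print(matthews_cc(tp, fp, fn, tn))   # 0.6
```

Because MCC also uses true negatives, it rewards a tool for correctly leaving non-DTU genes uncalled. Sensitivity and FDR alone do not capture this.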


2015 ◽  
Author(s):  
Michael I Love ◽  
John B Hogenesch ◽  
Rafael A Irizarry

RNA-seq technology is widely used in biomedical and basic science research. These studies rely on complex computational methods that quantify expression levels for observed transcripts. We find that current computational methods can lead to hundreds of false positive results related to alternative isoform usage. This flaw in the current methodology stems from a failure to model sample-specific bias, which leads to drops in coverage and is related to sequence features such as fragment GC content and GC stretches. By incorporating features that explain this bias into transcript expression models, we greatly increase the specificity of transcript expression estimates, with more than a four-fold reduction in the number of false positives for reported changes in expression. We introduce alpine, a method for estimation of bias-corrected transcript abundance. The method is available as a Bioconductor package that includes data visualization tools useful for bias discovery.
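
alpine models coverage bias using fragment-level sequence features. The sketch below illustrates the two features named above. It computes a fragment's GC fraction and, as a simplified stand-in for a "GC stretch", the length of its longest uninterrupted run of G/C bases. These are hypothetical Python helpers, not alpine's Bioconductor API, and alpine's own feature definitions (for example, window-based stretch detection) may differ.

```python
def gc_fraction(fragment: str) -> float:
    """Fraction of G or C bases in a fragment sequence (case-insensitive)."""
    seq = fragment.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def longest_gc_run(fragment: str) -> int:
    """Length of the longest consecutive run of G/C bases (simplified 'GC stretch')."""
    best = current = 0
    for base in fragment.upper():
        if base in "GC":
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


print(gc_fraction("ATGGCCGTA"))     # 5 of 9 bases are G/C
print(longest_gc_run("ATGGCCGTA"))  # 5 (the run GGCCG)
```

In a bias-aware model, per-fragment features like these would act as covariates that explain coverage drops. Transcripts would then not be mistaken for changing in abundance when they merely have GC-related dropouts.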


Author(s):  
Shanwen Sun ◽  
Lei Xu ◽  
Quan Zou ◽  
Guohua Wang

Abstract Summary Processing raw RNA-sequencing (RNA-seq) reads, whether from public or newly sequenced data, involves many specialized tools and technical configurations that are often unfamiliar and time-consuming to learn for non-bioinformatics researchers. Here, we develop the R package BP4RNAseq, which integrates state-of-the-art tools from both alignment-based and alignment-free quantification workflows. BP4RNAseq is a highly automated tool that uses an optimized pipeline to improve the sensitivity and accuracy of RNA-seq analyses. It requires only two non-technical parameters and outputs six formatted gene expression quantifications at the gene and transcript levels. The package applies to both retrospective and newly generated bulk RNA-seq data analyses and is also applicable to single-cell RNA-seq analyses. It therefore greatly facilitates the application of RNA-seq. Availability and implementation The BP4RNAseq package for R and its documentation are freely available at https://github.com/sunshanwen/BP4RNAseq. Supplementary information Supplementary data are available at Bioinformatics online.


2015 ◽  
Vol 2015 ◽  
pp. 1-12 ◽  
Author(s):  
Xinwang Wang ◽  
Weibing Shi ◽  
Timothy Rinehart

Transcriptome analysis was conducted in two popular Lagerstroemia cultivars: “Natchez” (NAT), a white-flowered, powdery mildew-resistant interspecific hybrid, and “Carolina Beauty” (CAB), a red-flowered, powdery mildew-susceptible L. indica cultivar. RNA-seq reads were generated from Erysiphe australiana-infected leaves and de novo assembled. A total of 37,035 unigenes from 224,443 assembled contigs in both genotypes were identified. Approximately 85% of these unigenes have known functions. Of these, 475 KEGG genes differed significantly between the two genotypes. Five of the top ten differentially expressed genes (DEGs) were involved in the biosynthesis of secondary metabolites (plant defense) and four in the flavonoid biosynthesis pathway (antioxidant activities or flower coloration). Furthermore, 5 of the 12 assembled unigenes in benzoxazinoid biosynthesis and 7 of 11 in flavonoid biosynthesis showed higher transcript abundance in NAT. The relative abundance of transcripts for 16 candidate DEGs (9 from CAB and 7 from NAT) detected by qRT-PCR showed general agreement with the abundances of the assembled transcripts in NAT. This study provides the first transcriptome analysis of L. indica. The differential transcript abundance between the two genotypes indicates that it is possible to identify candidate genes associated with plant defenses or flower coloration.


2015 ◽  
Vol 2 (9) ◽  
pp. 150402 ◽  
Author(s):  
Brett Trost ◽  
Catherine A. Moir ◽  
Zoe E. Gillespie ◽  
Anthony Kusalik ◽  
Jennifer A. Mitchell ◽  
...  

DNA microarrays and RNA sequencing (RNA-seq) are major technologies for performing high-throughput analysis of transcript abundance. Recently, concerns have been raised regarding the concordance of data derived from the two techniques. Using cDNA libraries derived from normal human foreskin fibroblasts, we measured changes in transcript abundance as cells transitioned from proliferative growth to quiescence using both DNA microarrays and RNA-seq. The internal reproducibility of the RNA-seq data was greater than that of the microarray data. Correlations between the RNA-seq data and the individual microarrays were low, but correlations between the RNA-seq values and the geometric mean of the microarray values were moderate. The two technologies had good agreement when considering probes with the largest (both positive and negative) fold change (FC) values. An independent technique, quantitative reverse-transcription PCR (qRT-PCR), was used to measure the FC of 76 genes between proliferative and quiescent samples, and a higher correlation was observed between the qRT-PCR data and the RNA-seq data than between the qRT-PCR data and the microarray data.
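
The comparison described above can be sketched as follows. For each gene, the fold changes from the individual microarrays are combined with a geometric mean, which suits ratios because it averages on the log scale. The combined values are then correlated with the RNA-seq fold changes. The Python below uses made-up fold changes and plain Pearson correlation on log2 values. The abstract does not state the study's exact correlation measure or transformation, so both are assumptions here.

```python
import math
from typing import Sequence


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean of positive values, computed on the log scale."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical fold changes (proliferative vs quiescent) for three genes.
microarray_fc = [[2.0, 8.0], [0.5, 0.5], [1.0, 4.0]]  # two arrays per gene
rnaseq_fc = [4.5, 0.4, 2.1]

combined = [geometric_mean(fcs) for fcs in microarray_fc]  # roughly [4.0, 0.5, 2.0]
r = pearson([math.log2(v) for v in combined], [math.log2(v) for v in rnaseq_fc])
print(round(r, 3))
```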


Author(s):  
Jie Yang ◽  
Chi Zhang ◽  
Wei-Hong Li ◽  
Tian-Er Zhang ◽  
Guang-Zhong Fan ◽  
...  

Background: In Traditional Chinese Medicine (TCM), the heads and tails of Angelica sinensis (Oliv.) Diels (AS) are used to treat different diseases because of their different pharmaceutical efficacies. The underlying mechanisms, however, have not been fully explored. Objective: Novel mechanisms responsible for the discrepant activities of AS heads and tails were explored by a combined transcriptomic and metabolomic strategy. Method: Six pairs of heads and tails of AS roots were collected in Min County, China. Total RNA and metabolites, used for RNA-seq and untargeted metabolomics analysis respectively, were isolated from each AS sample (0.1 g) using Trizol and methanol, respectively. Differentially expressed genes (DEGs) and discrepant pharmaceutical metabolites were then identified by comparing AS heads and tails. Key DEGs and metabolites were quantified by qRT-PCR and targeted metabolomics experiments. Results: Comprehensive analysis of the transcriptomic and metabolomic results suggested that five KEGG pathways with significant differences included 57 DEGs. In particular, fourteen DEGs and six key metabolites were related to the metabolic regulation of the Phenylpropanoid biosynthesis (PB) pathway. qRT-PCR and targeted metabolomics indicated that higher expression of crucial PB pathway genes in AS tails, such as PAL, CAD, COMT and peroxidase, was positively correlated with levels of ferulic acid-related metabolites. The average content of ferulic acid was higher in tails (569.58 ± 162.39 nmol/g) than in heads (168.73 ± 67.30 nmol/g) (P < 0.01), as was that of caffeic acid (tails 3.82 ± 0.88 nmol/g vs heads 1.37 ± 0.41 nmol/g; P < 0.01) and cinnamic acid (tails 0.24 ± 0.09 nmol/g vs heads 0.14 ± 0.02 nmol/g; P < 0.05).
Conclusion: Our work demonstrated that overexpressed genes and accumulated metabolites derived from the PB pathway might be responsible for the discrepant pharmaceutical efficacies of AS heads and tails.


Author(s):  
Irzam Sarfraz ◽  
Muhammad Asif ◽  
Joshua D Campbell

Abstract Motivation R Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for storing one or more matrix-like assays along with associated row and column data. These objects have been used to facilitate the storage and analysis of high-throughput genomic data generated from technologies such as single-cell RNA sequencing. One common computational task in many genomics analysis workflows is to subset the data matrix before applying downstream analytical methods. For example, one may need to subset the columns of the assay matrix to exclude poor-quality samples or subset the rows of the matrix to select the most variable features. Traditionally, a second object is created that contains the desired subset of the assay from the original object. However, this approach is inefficient because it requires the creation of an additional object containing a copy of the original assay, and it leads to challenges with data provenance. Results To overcome these challenges, we developed an R package called ExperimentSubset, which is a data container that implements classes for efficient storage and streamlined retrieval of assays that have been subsetted by rows and/or columns. These classes inherently provide data provenance by maintaining the relationship between the subsetted and parent assays. We demonstrate the utility of this package on a single-cell RNA-seq dataset by storing and retrieving subsets at different stages of the analysis while maintaining a lower memory footprint. Overall, ExperimentSubset is a flexible container for the efficient management of subsets. Availability and implementation The ExperimentSubset package is available at Bioconductor: https://bioconductor.org/packages/ExperimentSubset/ and Github: https://github.com/campbio/ExperimentSubset. Supplementary information Supplementary data are available at Bioinformatics online.
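
The core idea can be illustrated in a few lines: a subset is kept as a reference to rows and columns of the parent assay rather than as a copy. The Python class below is a hypothetical, language-neutral sketch of that design. It is not the ExperimentSubset API, which is an R/Bioconductor package built on the SummarizedExperiment classes.

```python
from typing import Dict, List, Sequence, Tuple


class SubsetStore:
    """Hypothetical container: one parent assay plus named row/column index subsets."""

    def __init__(self, assay: List[List[float]]):
        self.assay = assay  # parent matrix, stored once
        self.subsets: Dict[str, Tuple[str, List[int], List[int]]] = {}

    def add_subset(self, name: str, rows: Sequence[int], cols: Sequence[int],
                   parent: str = "assay") -> None:
        """Record a subset as indices into its parent (the assay or another subset)."""
        if parent != "assay":
            # Compose indices so every subset ultimately points into the parent assay.
            _, prow, pcol = self.subsets[parent]
            rows = [prow[i] for i in rows]
            cols = [pcol[j] for j in cols]
        self.subsets[name] = (parent, list(rows), list(cols))

    def get(self, name: str) -> List[List[float]]:
        """Materialise a subset on demand; no values are copied until retrieval."""
        _, rows, cols = self.subsets[name]
        return [[self.assay[i][j] for j in cols] for i in rows]

    def provenance(self, name: str) -> List[str]:
        """Chain of parents from this subset back to the original assay."""
        chain = [name]
        while chain[-1] != "assay":
            chain.append(self.subsets[chain[-1]][0])
        return chain


# Rows are features, columns are cells; all values are made up.
store = SubsetStore([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
store.add_subset("good_cells", rows=[0, 1, 2], cols=[0, 2])  # drop cell 1
store.add_subset("top_features", rows=[1, 2], cols=[0, 1], parent="good_cells")
print(store.get("top_features"))         # [[4, 6], [7, 9]]
print(store.provenance("top_features"))  # ['top_features', 'good_cells', 'assay']
```

Storing only index lists means each subset costs memory proportional to its dimensions rather than to its values. The recorded parent name gives a simple form of provenance.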


Author(s):  
Darawan Rinchai ◽  
Jessica Roelands ◽  
Mohammed Toufiq ◽  
Wouter Hendrickx ◽  
Matthew C Altman ◽  
...  

Abstract Motivation We previously described the construction and characterization of generic and reusable blood transcriptional module repertoires. More recently we released a third iteration (“BloodGen3” module repertoire) that comprises 382 functionally annotated gene sets (modules) and encompasses 14,168 transcripts. Custom bioinformatic tools are needed to support downstream analysis, visualization and interpretation relying on such fixed module repertoires. Results We have developed, and describe here, an R package, BloodGen3Module. The functions of our package permit group comparison analyses to be performed at the module level and the results to be displayed as annotated fingerprint grid plots. A parallel workflow for computing module repertoire changes for individual samples rather than groups of samples is also available; these results are displayed as fingerprint heatmaps. An illustrative case is used to demonstrate the steps involved in generating blood transcriptome repertoire fingerprints of septic patients. Taken together, this resource could facilitate the analysis and interpretation of changes in blood transcript abundance observed across a wide range of pathological and physiological states. Availability The BloodGen3Module package and documentation are freely available from Github: https://github.com/Drinchai/BloodGen3Module. Supplementary information Supplementary data are available at Bioinformatics online.
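
A module-level group comparison of the kind described above can be summarised per module as a single score. The sketch below assumes that score is the percentage of the module's transcripts called significantly increased minus the percentage called decreased. It also assumes per-transcript calls are already computed (+1 up, -1 down, 0 unchanged). The module and transcript names are hypothetical, and this is not the BloodGen3Module API.

```python
from typing import Dict, List


def module_scores(modules: Dict[str, List[str]], calls: Dict[str, int]) -> Dict[str, float]:
    """Percent of each module's transcripts called up minus percent called down.

    calls maps transcript -> +1 (up), -1 (down) or 0 (no change); transcripts
    missing from calls are treated as unchanged.
    """
    scores = {}
    for module, transcripts in modules.items():
        up = sum(1 for t in transcripts if calls.get(t, 0) > 0)
        down = sum(1 for t in transcripts if calls.get(t, 0) < 0)
        scores[module] = 100.0 * (up - down) / len(transcripts)
    return scores


modules = {"M_hypothetical_1": ["t1", "t2", "t3", "t4"], "M_hypothetical_2": ["t5", "t6"]}
calls = {"t1": 1, "t2": 1, "t3": 0, "t4": -1, "t5": -1, "t6": -1}
print(module_scores(modules, calls))
# {'M_hypothetical_1': 25.0, 'M_hypothetical_2': -100.0}
```

A grid of such per-module percentages, coloured by sign, is one plausible reading of a "fingerprint" plot.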


Author(s):  
Wenbin Ye ◽  
Tao Liu ◽  
Hongjuan Fu ◽  
Congting Ye ◽  
Guoli Ji ◽  
...  

Abstract Motivation Alternative polyadenylation (APA) is widely recognized as a pervasive, dynamically modulated mechanism. Studies based on 3′ end sequencing and/or RNA-seq have profiled poly(A) sites in various species with diverse pipelines, yet no unified and easy-to-use toolkit is available for comprehensive APA analyses. Results We developed an R package called movAPA for modeling and visualization of dynamics of alternative polyadenylation across biological samples. movAPA incorporates rich functions for preprocessing, annotation and statistical analyses of poly(A) sites, identification of poly(A) signals, profiling of APA dynamics and visualization. In particular, seven metrics are provided for measuring the tissue specificity or usage of APA sites across samples. Three methods are provided for identifying 3′ UTR shortening/lengthening events between conditions. APA site switching involving non-3′ UTR polyadenylation can also be explored. Using poly(A) site data from rice and mouse sperm cells, we demonstrated the high scalability and flexibility of movAPA in profiling APA dynamics across tissues and single cells. Availability and implementation https://github.com/BMILAB/movAPA. Supplementary information Supplementary data are available at Bioinformatics online.
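
The abstract does not name movAPA's seven metrics. As an illustration only, the Python sketch below computes two generic quantities of that kind. The first is the relative usage of each poly(A) site within a gene. The second is the tau index, a widely used tissue-specificity score that ranges from 0 (uniform across samples) to 1 (restricted to one sample). Whether tau is among movAPA's metrics is an assumption, and the function names are hypothetical.

```python
from typing import List, Sequence


def site_usage(site_counts: Sequence[float]) -> List[float]:
    """Relative usage of each poly(A) site of one gene: counts / gene total."""
    total = sum(site_counts)
    return [c / total for c in site_counts] if total else [0.0] * len(site_counts)


def tau(expression: Sequence[float]) -> float:
    """Tissue-specificity index: 0 = uniform across samples, 1 = one sample only."""
    n, peak = len(expression), max(expression)
    if n < 2 or peak == 0:
        return 0.0
    return sum(1 - x / peak for x in expression) / (n - 1)


# Hypothetical read counts at a proximal and a distal poly(A) site in one sample.
print(site_usage([30, 10]))  # [0.75, 0.25]
# Hypothetical counts for one poly(A) site across tissues.
print(tau([5, 0, 0, 0]))     # 1.0
print(tau([3, 3, 3]))        # 0.0
```

For genes with tandem 3′ UTR poly(A) sites, an increase in proximal-site usage between conditions corresponds to 3′ UTR shortening. This is the kind of shift that such between-condition tests look for.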

