Gene-level differential analysis at transcript-level resolution

Bayesian inference of differentially expressed transcripts and their abundance from multi-condition RNA-seq data

10.1101/638817 ◽

2019 ◽

Author(s):

Xi Chen

Keyword(s):

Breast Cancer ◽

Simulation Analysis ◽

Differential Expression Analysis ◽

Cancer Recurrence ◽

Transcript Level ◽

Differentially Expressed ◽

Superior Performance ◽

Differential Analysis ◽

Rna Seq ◽

Transcription Expression

Abstract. Deep sequencing of bulk RNA enables differential expression analysis at the transcript level. We develop a Bayesian approach that identifies differentially expressed transcripts directly from RNA-seq data, featuring a novel joint model of sample variability and the differential state of individual transcripts. For each transcript, to minimize errors in the differential state caused by abundance estimation, we estimate its expression abundance and its differential state together, iteratively, which also enables the differential analysis of weakly expressed transcripts. Simulation analysis demonstrates that the proposed approach outperforms conventional methods (which estimate transcript expression first and then identify the differential state), particularly for lowly expressed transcripts. We further apply the proposed approach to a breast cancer RNA-seq dataset from patients treated with tamoxifen and identify a set of differentially expressed transcripts, providing insights into key signaling pathways associated with breast cancer recurrence.

Faculty of 1000 evaluation for Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences.

F1000 - Post-publication peer review of the biomedical literature ◽

10.3410/f.726079641.793513319 ◽

2016 ◽

Author(s):

Wolfgang Huber

Keyword(s):

Transcript Level ◽

Rna Seq ◽

Gene Level

Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification

F1000Research ◽

10.12688/f1000research.15398.1 ◽

2018 ◽

Vol 7 ◽

pp. 952 ◽

Cited By ~ 26

Author(s):

Michael I. Love ◽

Charlotte Soneson ◽

Rob Patro

Keyword(s):

Software Package ◽

Gene Expression Analysis ◽

Real Data ◽

Transcript Level ◽

Bioinformatic Analysis ◽

Rna Seq ◽

Statistical Framework ◽

Gene Level ◽

Show Evidence ◽

Differential Gene

Detection of differential transcript usage (DTU) from RNA-seq data is an important bioinformatic analysis that complements differential gene expression analysis. Here we present a simple workflow using a set of existing R/Bioconductor packages for analysis of DTU. We show how these packages can be used downstream of RNA-seq quantification using the Salmon software package. The entire pipeline is fast, benefiting from inference steps by Salmon to quantify expression at the transcript level. The workflow includes live, runnable code chunks for analysis using DRIMSeq and DEXSeq, as well as for performing two-stage testing of DTU using the stageR package, a statistical framework to screen at the gene level and then confirm which transcripts within the significant genes show evidence of DTU. We evaluate these packages and other related packages on a simulated dataset with parameters estimated from real data.
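
As a rough illustration of what "transcript usage" means in the abstract above, the following minimal Python sketch computes each transcript's share of its gene's total count in two conditions. The function name `transcript_usage`, the transcript and gene identifiers, and the counts are all made up for illustration; this is not the DRIMSeq, DEXSeq, or stageR workflow, and it performs no statistical testing.

```python
from collections import defaultdict


def transcript_usage(counts, tx2gene):
    """Return each transcript's share of its gene's total count."""
    gene_totals = defaultdict(float)
    for tx, count in counts.items():
        gene_totals[tx2gene[tx]] += count
    usage = {}
    for tx, count in counts.items():
        total = gene_totals[tx2gene[tx]]
        # A gene with zero total count has undefined usage proportions.
        usage[tx] = count / total if total > 0 else float("nan")
    return usage


# Hypothetical transcript-to-gene map and counts (illustrative values only).
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
control = {"tx1": 80.0, "tx2": 20.0, "tx3": 50.0}
treated = {"tx1": 40.0, "tx2": 160.0, "tx3": 100.0}

usage_control = transcript_usage(control, tx2gene)
usage_treated = transcript_usage(treated, tx2gene)

for tx in tx2gene:
    print(tx, round(usage_control[tx], 3), round(usage_treated[tx], 3))
```

In this toy input, geneB doubles its total count without any change in usage (a gene-level expression change only), whereas geneA shifts its dominant transcript from tx1 to tx2, which is the kind of pattern a DTU analysis targets.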

Gene-level differential analysis at transcript-level resolution

Genome Biology ◽

10.1186/s13059-018-1419-z ◽

2018 ◽

Vol 19 (1) ◽

Cited By ~ 37

Author(s):

Lynn Yi ◽

Harold Pimentel ◽

Nicolas L. Bray ◽

Lior Pachter

Keyword(s):

Transcript Level ◽

Differential Analysis ◽

Gene Level

A general and powerful stage-wise testing procedure for differential expression and differential transcript usage

10.1101/109082 ◽

2017 ◽

Cited By ~ 1

Author(s):

Koen Van den Berge ◽

Charlotte Soneson ◽

Mark D. Robinson ◽

Lieven Clement

Keyword(s):

Multiple Testing ◽

Statistical Power ◽

Error Control ◽

Transcript Level ◽

Testing Procedure ◽

Cancer Case ◽

Rna Seq ◽

Post Hoc Analysis ◽

Gene Level ◽

Post Hoc

Abstract. Background: Reductions in sequencing cost and innovations in expression quantification have prompted an emergence of RNA-seq studies with complex designs and data analysis at transcript resolution. These applications involve multiple hypotheses per gene, leading to challenging multiple testing problems. Conventional approaches provide separate top-lists for every contrast and false discovery rate (FDR) control at individual hypothesis level. Hence, they fail to establish proper gene-level error control, which compromises downstream validation experiments. Tests that aggregate individual hypotheses are more powerful and provide gene-level FDR control, but in the RNA-seq literature no methods are available for post-hoc analysis of individual hypotheses. Results: We introduce a two-stage procedure that leverages the increased power of aggregated hypothesis tests while maintaining high biological resolution by post-hoc analysis of genes passing the screening hypothesis. Our method is evaluated on simulated and real RNA-seq experiments. It provides gene-level FDR control in studies with complex designs while boosting power for interaction effects without compromising the discovery of main effects. In a differential transcript usage/expression context, stage-wise testing gains power by aggregating hypotheses at the gene level, while providing transcript-level assessment of genes passing the screening stage. Finally, a prostate cancer case study highlights the relevance of combining gene with transcript level results. Conclusion: Stage-wise testing is a general paradigm that can be adopted whenever individual hypotheses can be aggregated. In our context, it achieves an optimal middle ground between biological resolution and statistical power while providing gene-level FDR control, which is beneficial for downstream biological interpretation and validation.
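
The screen-then-confirm idea in the abstract above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' procedure: the screening p-value is taken to be a Bonferroni-corrected minimum of the transcript p-values, gene-level FDR uses Benjamini-Hochberg, and the confirmation stage uses a plain Bonferroni correction at the reduced level alpha * R / G (R genes passing out of G tested). The published method uses its own aggregation tests and within-gene corrections, which are not reproduced here. The names `bh_adjust` and `stagewise` and the p-values are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def stagewise(tx_pvals_by_gene, alpha=0.05):
    """Two-stage testing: screen genes, then confirm transcripts within them."""
    genes = list(tx_pvals_by_gene)
    # Stage 1 (screening): one aggregated p-value per gene.
    # Assumption: Bonferroni-corrected minimum transcript p-value.
    screen_p = [
        min(1.0, min(tx_pvals_by_gene[g].values()) * len(tx_pvals_by_gene[g]))
        for g in genes
    ]
    screen_adj = bh_adjust(screen_p)
    passed = [g for g, q in zip(genes, screen_adj) if q <= alpha]
    # Stage 2 (confirmation): test transcripts only within passing genes,
    # at a level reduced by the fraction of genes that passed screening.
    level = alpha * len(passed) / len(genes)
    confirmed = {}
    for g in passed:
        pvals = tx_pvals_by_gene[g]
        confirmed[g] = [tx for tx, p in pvals.items() if p * len(pvals) <= level]
    return passed, confirmed


# Hypothetical transcript-level p-values grouped by gene.
example = {
    "geneA": {"tx1": 0.0001, "tx2": 0.6},
    "geneB": {"tx3": 0.3, "tx4": 0.8},
    "geneC": {"tx5": 0.004, "tx6": 0.005, "tx7": 0.9},
    "geneD": {"tx8": 0.5},
}
passed_genes, confirmed_tx = stagewise(example)
print(passed_genes)
print(confirmed_tx)
```

The design point the sketch tries to convey is that transcripts are only examined inside genes that survive the gene-level screen, so the gene list carries the FDR guarantee while the second stage adds transcript-level resolution.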

Terminus enables the discovery of data-driven, robust transcript groups from RNA-seq data

Bioinformatics ◽

10.1093/bioinformatics/btaa448 ◽

2020 ◽

Vol 36 (Supplement_1) ◽

pp. i102-i110

Author(s):

Hirak Sarkar ◽

Avi Srivastava ◽

Héctor Corrada Bravo ◽

Michael I Love ◽

Rob Patro

Keyword(s):

Transcript Level ◽

Data Driven ◽

Supplementary Information ◽

Rna Seq ◽

Large Numbers ◽

Inference Algorithms ◽

Data Driven Approach ◽

Downstream Analysis ◽

Transcriptional Output ◽

Level Analysis

Abstract. Motivation: Advances in sequencing technology, inference algorithms and differential testing methodology have enabled transcript-level analysis of RNA-seq data. Yet, the inherent inferential uncertainty in transcript-level abundance estimation, even among the most accurate approaches, means that robust transcript-level analysis often remains a challenge. Conversely, gene-level analysis remains a common and robust approach for understanding RNA-seq data, but it coarsens the resulting analysis to the level of genes, even if the data strongly support specific transcript-level effects. Results: We introduce a new data-driven approach for grouping together transcripts in an experiment based on their inferential uncertainty. Transcripts that share large numbers of ambiguously-mapping fragments with other transcripts, in complex patterns, often cannot have their abundances confidently estimated. Yet, the total transcriptional output of that group of transcripts will have greatly reduced inferential uncertainty, thus allowing more robust and confident downstream analysis. Our approach, implemented in the tool terminus, groups together transcripts in a data-driven manner allowing transcript-level analysis where it can be confidently supported, and deriving transcriptional groups where the inferential uncertainty is too high to support a transcript-level result. Availability and implementation: Terminus is implemented in Rust, and is freely available and open source. It can be obtained from https://github.com/COMBINE-lab/Terminus. Supplementary information: Supplementary data are available at Bioinformatics online.
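
The claim that a group's total output is far less uncertain than its members can be illustrated with a toy calculation. The sketch below assumes a hypothetical uncertainty measure, `relative_variance` (variance across inferential replicates divided by their mean), and made-up replicate values for two transcripts whose estimates trade off against each other. It does not reproduce the grouping criterion or algorithm used by Terminus.

```python
from statistics import mean, pvariance


def relative_variance(replicates):
    """Variance across inferential replicates divided by their mean."""
    m = mean(replicates)
    return pvariance(replicates) / m if m > 0 else 0.0


# Made-up inferential replicates (e.g. bootstrap or Gibbs draws) for two
# transcripts that share most of their ambiguously mapping fragments:
# when one estimate goes up, the other goes down.
tx_a = [10.0, 90.0, 30.0, 70.0, 50.0]
tx_b = [90.0, 12.0, 68.0, 31.0, 49.0]

# Summing draw by draw gives the replicates for the group as a whole.
group = [a + b for a, b in zip(tx_a, tx_b)]

print("tx_a :", relative_variance(tx_a))
print("tx_b :", relative_variance(tx_b))
print("group:", relative_variance(group))
```

Because the two estimates are strongly anti-correlated across draws, their sum is nearly constant, so the group-level quantity is far more stable than either transcript alone.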

A comprehensive RNA-Seq pipeline includes meta-analysis, interactivity and automatic reporting

10.7287/peerj.preprints.27317v1 ◽

2018 ◽

Author(s):

Giulio Spinozzi ◽

Valentina Tini ◽

Laura Mincarelli ◽

Brunangelo Falini ◽

Maria Paola Martelli

Keyword(s):

Gene Ontology ◽

Acute Myeloid Leukemia ◽

Myeloid Leukemia ◽

Meta Analysis ◽

Differential Expression Analysis ◽

Differential Analysis ◽

Rna Seq ◽

Shiny App ◽

Automated Pipeline ◽

Acute Myeloid

There are many methods available for each phase of RNA-Seq analysis, and each of them uses different algorithms. It is therefore useful to identify a pipeline that combines the best tools in terms of time and results. For this purpose, we compared five different pipelines, obtained by combining the most widely used tools in RNA-Seq analysis. Using RNA-Seq data from samples of different Acute Myeloid Leukemia (AML) cell lines, we compared the five pipelines from alignment through differential expression analysis (DEA). For each one we evaluated peak RAM usage and run time, and then compared the differentially expressed genes identified by each pipeline. The pipeline with shorter run times, lower RAM consumption and more reliable results was the one that uses HISAT2 for alignment, featureCounts for quantification and edgeR for differential analysis. Finally, we developed an automated pipeline that uses this combination by default but also allows the user to choose between different tools. In addition, the pipeline performs a final meta-analysis that includes Gene Ontology and pathway analysis. The results can be viewed in an interactive Shiny app and exported as a report (PDF, Word or HTML formats).

A tool for the comparison of transcript differential expression analysis pipelines

10.7287/peerj.preprints.2212 ◽

2016 ◽

Author(s):

Stefano Beretta ◽

Yuri Pirola ◽

Valeria Ranzani ◽

Grazisa Rossetti ◽

Raoul Bonnal ◽

...

Keyword(s):

Differential Expression ◽

Expression Analysis ◽

State Of The Art ◽

A Priori ◽

Differential Expression Analysis ◽

Workflow Management ◽

Transcript Level ◽

Rna Seq ◽

Art Methods ◽

Transcript Assembly

MOTIVATION Long non-coding RNAs (lncRNAs) have recently gained interest, especially for their involvement in controlling several cell processes, but a full understanding of their role is lacking. Differential Expression (DE) analysis is one of the most important tasks in the analysis of RNA-seq data, since it can point to genes involved in the regulation of the condition under study. However, a classical analysis at gene level may disregard the role of Alternative Splicing (AS) in regulating cell conditions. This is the case, for example, when a given gene is expressed in all the conditions but the expressed isoform differs significantly between them (that is, an isoform switch). A transcript-level analysis may shed better light on this case, especially in studies whose goal is, for example, a better understanding of the behavior of lncRNAs in T lymphocytes, which are fundamental in studies of specific diseases such as cancer. After Cufflinks/Cuffdiff, several approaches for DE analysis at isoform/transcript level have been proposed. However, their results are often sensitive to upstream analysis steps such as read mapping, transcript reconstruction and quantification, and it is often hard to choose "a priori" the most appropriate combination of tools. This work presents a tool for assisting the user in this choice, and lays the groundwork for a study devoted to the characterization of lncRNAs and the identification of isoform switch events. Our tool includes a framework for the description and the execution of a set of DE pipelines over the same input dataset, as well as a set of tools for reconciling and comparing the results. METHOD We designed an automated and easily customizable tool which is able to execute a set of existing pipelines for DE analysis at transcript level starting from RNA-seq data. Our method is built upon Snakemake, a workflow management system, with the specific goal of reducing the complexity of creating workflows. This approach guarantees that the experimentation is fully replicable and easy to customize. Each considered pipeline is structured in three steps: (i) transcript assembly, (ii) quantification, and (iii) DE analysis. By default, our tool builds and compares 9 different pipelines, each taking as input the same set of RNA-seq reads, obtained by combining different state-of-the-art methods to perform the transcript assembly (TA step) with different state-of-the-art methods to perform quantification and differential expression analysis (Q+DE step). More precisely, the 9 pipelines are obtained by combining two tools (Cufflinks and StringTie) and a Reference Annotation (Ensembl annotated transcripts) for the TA step, with three tools (Cuffquant+Cuffdiff, StringTie-B+Ballgown and Kallisto+Sleuth) for the Q+DE step. Abstract truncated at 3,000 characters - the full version is available in the pdf file
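
The nine default pipelines described in the abstract above are the Cartesian product of three transcript-assembly options and three quantification-plus-DE options. A minimal Python sketch of that enumeration follows; the variable names and dictionary keys are illustrative assumptions, and the sketch does not reflect how the authors' Snakemake-based tool is actually configured.

```python
from itertools import product

# Options for the TA step, as listed in the abstract.
ta_steps = ["Cufflinks", "StringTie", "Ensembl reference annotation"]
# Options for the Q+DE step, as listed in the abstract.
qde_steps = ["Cuffquant+Cuffdiff", "StringTie-B+Ballgown", "Kallisto+Sleuth"]

# Every TA option is paired with every Q+DE option.
pipelines = [
    {"assembly": ta, "quant_de": qde} for ta, qde in product(ta_steps, qde_steps)
]

for number, pipeline in enumerate(pipelines, start=1):
    print(number, pipeline["assembly"], "->", pipeline["quant_de"])
```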

Nonparametric expression analysis using inferential replicate counts

10.1101/561084 ◽

2019 ◽

Author(s):

Anqi Zhu ◽

Avi Srivastava ◽

Joseph G. Ibrahim ◽

Rob Patro ◽

Michael I. Love

Keyword(s):

False Discovery Rate ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Transcript Level ◽

Parametric Model ◽

Statistical Testing ◽

Rna Seq ◽

Nonparametric Models ◽

False Discovery

Abstract. A primary challenge in the analysis of RNA-seq data is to identify differentially expressed genes or transcripts while controlling for technical biases present in the observations. Ideally, a statistical testing procedure should incorporate information about the inherent uncertainty of the abundance estimates, whether at the gene or transcript level, that arises from quantification of abundance. Most popular methods for RNA-seq differential expression analysis fit a parametric model to the counts or scaled counts for each gene or transcript, and a subset of methods can incorporate information about the uncertainty of the counts. Previous work has shown that nonparametric models for RNA-seq differential expression may in some cases have better control of the false discovery rate, and adapt well to new data types without requiring reformulation of a parametric model. Existing nonparametric models do not take into account the inferential uncertainty of the observations, leading to an inflated false discovery rate, in particular at the transcript level. Here we propose a nonparametric model for differential expression analysis using inferential replicate counts, extending the existing SAMseq method to account for inferential uncertainty, batch effects, and sample pairing. We compare our method, “SAMseq With Inferential Samples Helps”, or Swish, with popular differential expression analysis methods. Swish has improved control of the false discovery rate, in particular for transcripts with high inferential uncertainty. We apply Swish to a single-cell RNA-seq dataset, assessing sensitivity to recover DE genes between sub-populations of cells, and compare its performance to the Wilcoxon rank sum test.
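
One simple way to fold inferential replicates into a rank-based test is to compute a rank statistic on each replicate and average it, then compare the average against a permutation null. The Python sketch below does this for a single feature with a two-group Wilcoxon rank-sum statistic. It is an illustration under stated assumptions only: the function names and data are hypothetical, and it does not reproduce Swish itself, which also handles batch effects, sample pairing and false discovery rate estimation across features.

```python
import random


def rank_sum(values, labels):
    """Wilcoxon rank-sum statistic for group 1, using midranks for ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    return sum(r for r, lab in zip(ranks, labels) if lab == 1)


def mean_rank_sum(inf_reps, labels):
    """Average the rank-sum statistic over all inferential replicates."""
    return sum(rank_sum(rep, labels) for rep in inf_reps) / len(inf_reps)


def permutation_pvalue(inf_reps, labels, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the averaged statistic."""
    rng = random.Random(seed)
    observed = mean_rank_sum(inf_reps, labels)
    n, n1 = len(labels), sum(labels)
    center = n1 * (n + 1) / 2  # expected rank sum under the null
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(mean_rank_sum(inf_reps, shuffled) - center) >= abs(observed - center):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Made-up data: 6 samples (3 per condition), 3 inferential replicates each.
labels = [0, 0, 0, 1, 1, 1]
inf_reps = [
    [10.0, 12.0, 11.0, 30.0, 28.0, 35.0],
    [9.0, 14.0, 12.0, 27.0, 31.0, 33.0],
    [11.0, 13.0, 10.0, 29.0, 13.0, 36.0],
]
observed_stat, p_value = permutation_pvalue(inf_reps, labels)
print(observed_stat, p_value)
```

In the third replicate one sample in condition 1 drops to the level of condition 0, so the averaged statistic is slightly less extreme than the first two replicates alone would suggest, which is how uncertainty in the abundance estimates tempers the evidence.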

A comprehensive RNA-Seq pipeline includes meta-analysis, interactivity and automatic reporting

10.7287/peerj.preprints.27317 ◽

2018 ◽

Author(s):

Giulio Spinozzi ◽

Valentina Tini ◽

Laura Mincarelli ◽

Brunangelo Falini ◽

Maria Paola Martelli

Keyword(s):

Gene Ontology ◽

Acute Myeloid Leukemia ◽

Myeloid Leukemia ◽

Meta Analysis ◽

Differential Expression Analysis ◽

Differential Analysis ◽

Rna Seq ◽

Shiny App ◽

Automated Pipeline ◽

Acute Myeloid

There are many methods available for each phase of RNA-Seq analysis, and each of them uses different algorithms. It is therefore useful to identify a pipeline that combines the best tools in terms of time and results. For this purpose, we compared five different pipelines, obtained by combining the most widely used tools in RNA-Seq analysis. Using RNA-Seq data from samples of different Acute Myeloid Leukemia (AML) cell lines, we compared the five pipelines from alignment through differential expression analysis (DEA). For each one we evaluated peak RAM usage and run time, and then compared the differentially expressed genes identified by each pipeline. The pipeline with shorter run times, lower RAM consumption and more reliable results was the one that uses HISAT2 for alignment, featureCounts for quantification and edgeR for differential analysis. Finally, we developed an automated pipeline that uses this combination by default but also allows the user to choose between different tools. In addition, the pipeline performs a final meta-analysis that includes Gene Ontology and pathway analysis. The results can be viewed in an interactive Shiny app and exported as a report (PDF, Word or HTML formats).