How to normalize metatranscriptomic count data for differential expression analysis

Abstract

Background: Differential expression analysis on the basis of RNA-Seq count data has become a standard tool in transcriptomics. Several studies have shown that prior normalization of the data is crucial for reliable detection of transcriptional differences. Until now it has not been clear whether and how the transcriptomic approach can be used for differential expression analysis in metatranscriptomics. The potential side effects that may result from direct application of transcriptomic tools to metatranscriptomic count data have not been studied so far.

Methods: We propose a model for differential expression in metatranscriptomics that explicitly accounts for variations in the taxonomic composition of transcripts across different samples. As a main consequence, the correct normalization of metatranscriptomic count data requires the taxonomic separation of the data into organism-specific bins. Taxon-specific scaling of the organism profiles then yields a valid normalization and allows the scaled profiles to be recombined into a metatranscriptomic count matrix. This matrix can then be analyzed with statistical tools for transcriptomic count data. For taxon-specific scaling and recombination of scaled counts we provide a simple R script.

Results: When transcriptomic tools for differential expression analysis are applied directly to metatranscriptomic data, the organism-independent (global) scaling of counts implies a high risk of falsely predicted functional differences. In simulation studies we show that incorrect normalization not only tends to lose significant differences but, in particular, can produce a large number of false positives. In contrast, taxon-specific scaling can equalize the variation of relative library sizes from different organisms and therefore shows a reliable detection of significant differences in all simulations. On real metatranscriptomic data the results from taxon-specific and global scaling can differ considerably. In our study, global scaling shows a high number of extra predictions which are not supported by single transcriptome analyses. Inspection of the scaling error suggests that these extra predictions may actually correspond to artifacts of an incorrect normalization.

Conclusions: As in transcriptomics, a proper normalization of count data is also essential for differential expression analysis in metatranscriptomics. Our model implies a taxon-specific scaling of counts for normalization of the data. The application of taxon-specific scaling consequently removes taxonomic composition variations from functional profiles and therefore effectively prevents the risk of false predictions due to incorrect normalization.
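
The scale-then-recombine step described in the Methods can be sketched in a few lines. This is only a minimal illustration in Python with NumPy, not the authors' R script: the function name `taxon_specific_scaling`, the (taxa, features, samples) array layout, and the use of simple total-count scaling to each taxon's mean library size are all assumptions made for the example. The original work may use a different scaling method within each organism bin.

```python
import numpy as np

def taxon_specific_scaling(counts):
    """Scale each taxon's count profile separately, then recombine.

    counts: raw counts with shape (taxa, features, samples); this layout
    is an assumption of the sketch. Returns a (features, samples) matrix.
    """
    counts = np.asarray(counts, dtype=float)
    lib = counts.sum(axis=1)                  # (taxa, samples) per-taxon library sizes
    target = lib.mean(axis=1, keepdims=True)  # (taxa, 1) reference size per taxon
    # Scale factor per taxon and sample; samples with an empty bin get factor 0.
    factor = np.divide(target * np.ones_like(lib), lib,
                       out=np.zeros_like(lib), where=lib > 0)
    scaled = counts * factor[:, None, :]      # broadcast the factor over features
    return scaled.sum(axis=0)                 # recombine the scaled taxon profiles

# Hypothetical toy data: two taxa, two features, two samples.
# Each taxon keeps the same relative profile; only its abundance changes.
toy = np.array([
    [[10, 20], [30, 60]],   # taxon A doubles in sample 2
    [[40, 20], [10, 5]],    # taxon B halves in sample 2
])
combined = taxon_specific_scaling(toy)
print(combined)  # [[45.  45. ] [52.5 52.5]]
```

Because neither organism changes its relative expression in this toy case, both samples end up with identical recombined profiles, so a downstream test would see no difference to report.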

How to normalize metatranscriptomic count data for differential expression analysis

PeerJ ◽

10.7717/peerj.3859 ◽

2017 ◽

Vol 5 ◽

pp. e3859 ◽

Cited By ~ 12

Author(s):

Heiner Klingenberg ◽

Peter Meinicke

Keyword(s):

Differential Expression ◽

Expression Analysis ◽

Count Data ◽

Differential Expression Analysis ◽

Real Data ◽

Taxonomic Composition ◽

Standard Tool ◽

Global Scaling ◽

Transcriptomic Changes ◽

Functional Profiles

Background: Differential expression analysis on the basis of RNA-Seq count data has become a standard tool in transcriptomics. Several studies have shown that prior normalization of the data is crucial for a reliable detection of transcriptional differences. Until now it has not been clear whether and how the transcriptomic approach can be used for differential expression analysis in metatranscriptomics.

Methods: We propose a model for differential expression in metatranscriptomics that explicitly accounts for variations in the taxonomic composition of transcripts across different samples. As a main consequence the correct normalization of metatranscriptomic count data under this model requires the taxonomic separation of the data into organism-specific bins. Then the taxon-specific scaling of organism profiles yields a valid normalization and allows us to recombine the scaled profiles into a metatranscriptomic count matrix. This matrix can then be analyzed with statistical tools for transcriptomic count data. For taxon-specific scaling and recombination of scaled counts we provide a simple R script.

Results: When applying transcriptomic tools for differential expression analysis directly to metatranscriptomic data with an organism-independent (global) scaling of counts the resulting differences may be difficult to interpret. The differences may correspond to changing functional profiles of the contributing organisms but may also result from a variation of taxonomic abundances. Taxon-specific scaling eliminates this variation and therefore the resulting differences actually reflect a different behavior of organisms under changing conditions. In simulation studies we show that the divergence between results from global and taxon-specific scaling can be drastic. In particular, the variation of organism abundances can imply a considerable increase of significant differences with global scaling. Also, on real metatranscriptomic data, the predictions from taxon-specific and global scaling can differ widely. Our studies indicate that in real data applications performed with global scaling it might be impossible to distinguish between differential expression in terms of transcriptomic changes and differential composition in terms of changing taxonomic proportions.

Conclusions: As in transcriptomics, a proper normalization of count data is also essential for differential expression analysis in metatranscriptomics. Our model implies a taxon-specific scaling of counts for normalization of the data. The application of taxon-specific scaling consequently removes taxonomic composition variations from functional profiles and therefore provides a clear interpretation of the observed functional differences.
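
The confounding of differential expression with differential composition under global scaling can be made concrete with a small numerical example. The sketch below uses Python with NumPy and hypothetical toy counts; the helper `global_scale` and its scaling to the mean total library size are assumptions chosen for simplicity, not the procedure of any particular tool.

```python
import numpy as np

# Hypothetical toy data: rows are functional features, columns are two samples.
# Each organism keeps the same relative profile; only its abundance changes.
taxon_a = np.array([[10.0, 20.0],
                    [30.0, 60.0]])   # organism A doubles in sample 2
taxon_b = np.array([[40.0, 20.0],
                    [10.0, 5.0]])    # organism B halves in sample 2

def global_scale(matrix):
    """Scale every sample (column) to the mean total library size."""
    lib = matrix.sum(axis=0)
    return matrix * (lib.mean() / lib)

# Pool the organisms as if taxonomy were unknown, then scale globally.
pooled = taxon_a + taxon_b
scaled = global_scale(pooled)
log2_fc = np.log2(scaled[:, 1] / scaled[:, 0])

# Relative profiles within each organism, for comparison.
rel_a = taxon_a / taxon_a.sum(axis=0)
rel_b = taxon_b / taxon_b.sum(axis=0)

print("log2 fold change after global scaling:", log2_fc)
print("organism A profile unchanged:", np.allclose(rel_a[:, 0], rel_a[:, 1]))
print("organism B profile unchanged:", np.allclose(rel_b[:, 0], rel_b[:, 1]))
```

In this toy case the first feature appears down-regulated and the second up-regulated after global scaling, although neither organism changed its expression profile. The apparent fold changes come entirely from the shift in taxonomic proportions.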

TCC-GUI: a Shiny-based application for differential expression analysis of RNA-Seq count data

BMC Research Notes ◽

10.1186/s13104-019-4179-2 ◽

2019 ◽

Vol 12 (1) ◽

Cited By ~ 11

Author(s):

Wei Su ◽

Jianqiang Sun ◽

Kentaro Shimizu ◽

Koji Kadota

Keyword(s):

Differential Expression ◽

Expression Analysis ◽

Count Data ◽

Differential Expression Analysis ◽

Rna Seq

A comparison of per sample global scaling and per gene normalization methods for differential expression analysis of RNA-seq data

PLoS ONE ◽

10.1371/journal.pone.0176185 ◽

2017 ◽

Vol 12 (5) ◽

pp. e0176185 ◽

Cited By ~ 32

Author(s):

Xiaohong Li ◽

Guy N. Brock ◽

Eric C. Rouchka ◽

Nigel G. F. Cooper ◽

Dongfeng Wu ◽

...

Keyword(s):

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Rna Seq ◽

Gene Normalization ◽

Normalization Methods ◽

Global Scaling ◽

Per Gene

Evaluation of methods for differential expression analysis on multi-group RNA-seq count data

BMC Bioinformatics ◽

10.1186/s12859-015-0794-7 ◽

2015 ◽

Vol 16 (1) ◽

Cited By ~ 27

Author(s):

Min Tang ◽

Jianqiang Sun ◽

Kentaro Shimizu ◽

Koji Kadota

Keyword(s):

Differential Expression ◽

Expression Analysis ◽

Count Data ◽

Differential Expression Analysis ◽

Rna Seq ◽

Evaluation Of Methods

Differential Expression Analysis on RNA-Seq Count Data Based on Penalized Matrix Decomposition

IEEE Transactions on NanoBioscience ◽

10.1109/tnb.2013.2296978 ◽

2014 ◽

Vol 13 (1) ◽

pp. 12-18 ◽

Cited By ~ 11

Author(s):

Jin-Xing Liu ◽

Ying-Lian Gao ◽

Yong Xu ◽

Chun-Hou Zheng ◽

Jane You

Keyword(s):

Differential Expression ◽

Expression Analysis ◽

Count Data ◽

Differential Expression Analysis ◽

Matrix Decomposition ◽

Rna Seq

Faculty Opinions recommendation of Differential expression analysis for sequence count data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.6932959.7123057 ◽

2010 ◽

Author(s):

Sarah Teichmann ◽

Daniel Hebenstreit

Keyword(s):

Differential Expression ◽

Expression Analysis ◽

Count Data ◽

Differential Expression Analysis

Best practices on the differential expression analysis of multi-species RNA-seq

Genome Biology ◽

10.1186/s13059-021-02337-8 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Matthew Chung ◽

Vincent M. Bruno ◽

David A. Rasko ◽

Christina A. Cuomo ◽

José F. Muñoz ◽

...

Keyword(s):

Best Practices ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Single Species ◽

Rna Seq ◽

Species Analysis ◽

Differential Gene ◽

Multiple Species ◽

Downstream Analysis

Abstract: Advances in transcriptome sequencing allow for simultaneous interrogation of differentially expressed genes from multiple species originating from a single RNA sample, termed dual or multi-species transcriptomics. Compared to single-species differential expression analysis, the design of multi-species differential expression experiments must account for the relative abundances of each organism of interest within the sample, often requiring enrichment methods and yielding differences in total read counts across samples. The analysis of multi-species transcriptomics datasets requires modifications to the alignment, quantification, and downstream analysis steps compared to the single-species analysis pipelines. We describe best practices for multi-species transcriptomics and differential gene expression.

The long and the short of it: unlocking nanopore long-read RNA sequencing data with short-read differential expression analysis tools

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab028 ◽

2021 ◽

Vol 3 (2) ◽

Author(s):

Xueyi Dong ◽

Luyi Tian ◽

Quentin Gouil ◽

Hasaru Kariyawasam ◽

Shian Su ◽

...

Keyword(s):

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Transcriptomic Analysis ◽

Statistical Testing ◽

Rna Seq ◽

Sequencing Data ◽

Short Read ◽

Sequencing Platform ◽

Long Read

Abstract: Application of Oxford Nanopore Technologies’ long-read sequencing platform to transcriptomic analysis is increasing in popularity. However, such analysis can be challenging due to the high sequence error and small library sizes, which decrease quantification accuracy and reduce power for statistical testing. Here, we report the analysis of two nanopore RNA-seq datasets with the goal of obtaining gene- and isoform-level differential expression information. A dataset of synthetic, spliced, spike-in RNAs (‘sequins’) as well as a mouse neural stem cell dataset from samples with a null mutation of the epigenetic regulator Smchd1 were analysed using a mix of long-read specific tools for preprocessing together with established short-read RNA-seq methods for downstream analysis. We used limma-voom to perform differential gene expression analysis, and the novel FLAMES pipeline to perform isoform identification and quantification, followed by DRIMSeq and limma-diffSplice (with stageR) to perform differential transcript usage analysis. We compared results from the sequins dataset to the ground truth, and results of the mouse dataset to a previous short-read study on equivalent samples. Overall, our work shows that transcriptomic analysis of long-read nanopore data using long-read specific preprocessing methods together with short-read differential expression methods and software that are already in wide use can yield meaningful results.
