How well do RNA-Seq differential gene expression tools perform in a complex eukaryote? A case study in Arabidopsis thaliana

2019, Vol 35 (18), pp. 3372-3377
Author(s): Kimon Froussios, Nick J Schurch, Katarzyna Mackinnon, Marek Gierliński, Céline Duc, ...

Abstract
Motivation: RNA-seq experiments are usually carried out with three or fewer replicates. To work well with so few samples, differential gene expression (DGE) tools typically assume the form of the underlying gene expression distribution. In this paper, the statistical properties of gene expression from RNA-seq are investigated in the complex eukaryote Arabidopsis thaliana, extending and generalizing the results of previous work in the simple eukaryote Saccharomyces cerevisiae.
Results: We show that, in line with the results in S. cerevisiae, more gene expression measurements in A. thaliana are consistent with being drawn from an underlying negative binomial distribution than from either a log-normal or a normal distribution, and that the size and complexity of the A. thaliana transcriptome do not influence the false positive rate performance of the nine widely used DGE tools tested here. We therefore recommend the use of DGE tools that are based on the negative binomial distribution.
Availability and implementation: The raw data for the 17 WT Arabidopsis thaliana datasets are available from the European Nucleotide Archive (E-MTAB-5446). The processed and aligned data can be visualized in context using IGB (Freese et al., 2016), or downloaded directly, from our publicly available IGB quickload server at https://compbio.lifesci.dundee.ac.uk/arabidopsisQuickload/public_quickload/ under ‘RNAseq>Froussios2019’. All scripts and commands are available from GitHub at https://github.com/bartongroup/KF_arabidopsis-GRNA.
Supplementary information: Supplementary data are available at Bioinformatics online.
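To make the distributional comparison concrete, the sketch below scores one gene's replicate counts under the three candidate models by log-likelihood. It is a hypothetical illustration, not the goodness-of-fit procedure used in the paper, and it rests on stated assumptions: the negative binomial parameters come from a simple method-of-moments fit (not a maximum-likelihood fit), the normal density is evaluated at integer counts as a stand-in for a probability mass, and the log-normal is fitted to counts + 1 to avoid log(0). All function names are illustrative.

```python
import math
from statistics import fmean, pvariance


def nb_moment_fit(counts):
    """Method-of-moments negative binomial fit; returns (size r, probability p)."""
    m = fmean(counts)
    v = pvariance(counts, m)
    if v <= m:
        raise ValueError("counts are not overdispersed; the NB moment fit is undefined")
    r = m * m / (v - m)  # size parameter; the NB dispersion is 1 / r
    p = r / (r + m)      # chosen so that the NB mean r(1 - p) / p equals m
    return r, p


def nb_loglik(counts):
    """NB log-likelihood at the moment estimates (not the maximum-likelihood fit)."""
    r, p = nb_moment_fit(counts)
    return sum(
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log1p(-p)
        for k in counts
    )


def normal_loglik(xs):
    """Normal log-likelihood at the maximum-likelihood mean and variance."""
    mu = fmean(xs)
    s2 = pvariance(xs, mu)
    return sum(-0.5 * math.log(2 * math.pi * s2) - (x - mu) ** 2 / (2 * s2) for x in xs)


def lognormal_loglik(counts):
    """Log-normal log-likelihood of counts + 1 (the pseudocount avoids log(0))."""
    logs = [math.log(k + 1) for k in counts]
    # The density of y = k + 1 is the normal density of log(y) divided by y.
    return normal_loglik(logs) - sum(logs)


def compare_models(counts):
    """Log-likelihood of one gene's replicate counts under each candidate model."""
    return {
        "negative binomial": nb_loglik(counts),
        "normal": normal_loglik(counts),
        "log-normal": lognormal_loglik(counts),
    }
```

For example, `compare_models([12, 30, 7, 55, 18, 9, 41])` returns a dictionary of log-likelihoods, and under these assumptions the model with the largest value fits that gene best. Each model here has two parameters, so the raw log-likelihoods can be compared directly without a complexity penalty.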

2016
Author(s): Kimon Froussios, Nick J. Schurch, Katarzyna Mackinnon, Marek Gierliński, Céline Duc, ...

Abstract
RNA-seq experiments are usually carried out with three or fewer replicates. To work well with so few samples, differential gene expression (DGE) tools typically assume the form of the underlying distribution of gene expression. A recent highly replicated study revealed that RNA-seq gene expression measurements in yeast are best represented as being drawn from an underlying negative binomial distribution. In this paper, the statistical properties of gene expression in the higher eukaryote Arabidopsis thaliana are shown to be essentially identical to those from yeast, despite the large increase in the size and complexity of the transcriptome. Gene expression measurements from this model plant species are consistent with being drawn from an underlying negative binomial or log-normal distribution, and the false positive rate performance of nine widely used DGE tools is not strongly affected by the additional size and complexity of the A. thaliana transcriptome. For RNA-seq data, we therefore recommend the use of DGE tools that are based on the negative binomial distribution.
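One common way to estimate a false positive rate is a comparison in which no gene is truly differentially expressed, for example two artificial groups drawn from replicates of a single condition; every gene called significant is then a false positive. The minimal sketch below assumes a DGE tool has already produced one raw p-value per gene for such a null comparison, applies the Benjamini-Hochberg adjustment, and reports the fraction of genes called at a chosen threshold. The function names and the default 0.05 threshold are illustrative assumptions, not details taken from the paper.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity with a running minimum.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def false_positive_rate(null_pvalues, alpha=0.05):
    """Fraction of genes called significant in a comparison where none is truly DE."""
    adjusted = benjamini_hochberg(null_pvalues)
    return sum(q < alpha for q in adjusted) / len(adjusted)
```

For instance, `false_positive_rate([0.01, 0.04, 0.03, 0.2])` adjusts the p-values to roughly [0.04, 0.0533, 0.0533, 0.2] and returns 0.25, because one of the four null genes is called.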


2018
Author(s): Brandon Monier, Adam McDermaid, Jing Zhao, Anne Fennell, Qin Ma

Abstract
Motivation: Next-generation sequencing has made far more large-scale genomic and transcriptomic data available. Studies with RNA sequencing (RNA-seq) data typically involve generating gene expression profiles that can be analyzed further, often by differential gene expression (DGE) analysis. This process enables comparison across samples from two or more factor levels. A recurring issue with DGE analyses is the complexity of the comparisons to be made, in which a variety of factor combinations, pairwise comparisons, and main or blocked main effects need to be tested.
Results: Here we present IRIS-DGE, a server-based DGE analysis tool developed using Shiny. It provides a straightforward, user-friendly platform for comprehensive DGE analysis, together with crucial analyses that help to design hypotheses and determine key genomic features. IRIS-DGE integrates the three most commonly used R-based DGE tools to determine differentially expressed genes (DEGs) and includes numerous methods for preliminary analysis of user-provided gene expression data. Additionally, it integrates a variety of highly interactive visualizations for improved interpretation of preliminary and DGE analyses.
Availability: IRIS-DGE is freely available at http://bmbl.sdstate.edu/IRIS/.
Supplementary information: Supplementary data are available at Bioinformatics online.
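The kinds of comparisons described above are usually expressed through a design matrix. The sketch below builds a treatment-coded matrix with an intercept, in the spirit of R's `model.matrix(~ block + condition)`, where the first sorted level of each factor serves as the reference, and enumerates every pairwise comparison between the levels of one factor. It is a hypothetical plain-Python illustration, not IRIS-DGE's implementation; the function names and the sample layout (one dict per sample) are assumptions.

```python
from itertools import combinations


def design_matrix(samples, factors):
    """Treatment-coded design matrix with an intercept.

    samples: list of dicts mapping factor name -> level for each sample.
    The first level of each factor (after sorting) is the reference level,
    so it gets no column of its own.
    """
    levels = {f: sorted({s[f] for s in samples}) for f in factors}
    columns = ["(Intercept)"]
    for f in factors:
        columns += [f + level for level in levels[f][1:]]
    rows = []
    for s in samples:
        row = [1]
        for f in factors:
            row += [1 if s[f] == level else 0 for level in levels[f][1:]]
        rows.append(row)
    return columns, rows


def pairwise_contrasts(levels):
    """Every pairwise comparison between the levels of one factor."""
    return [f"{a} vs {b}" for a, b in combinations(levels, 2)]
```

Listing `block` before `condition` is one way to express a blocked main effect: the condition column is then estimated after accounting for block-to-block differences. With three genotypes, `pairwise_contrasts(["WT", "mutA", "mutB"])` yields three comparisons, which illustrates how quickly the number of tests grows with the number of levels.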


Author(s): Yanming Di, Daniel W Schafer, Jason S Cumbie, Jeff H Chang

We propose a new statistical test for assessing differential gene expression using RNA sequencing (RNA-Seq) data. Commonly used probability distributions, such as the binomial or Poisson, cannot appropriately model the count variability in RNA-Seq data because of overdispersion. The small sample sizes typical of this type of data also make the uncritical use of tools derived from large-sample asymptotic theory inappropriate. The proposed test is based on the NBP parameterization of the negative binomial distribution and extends an exact test proposed by Robinson and Smyth (2007, 2008). In one version of Robinson and Smyth’s test, a constant dispersion parameter is used to model the count variability between biological replicates. We introduce an additional parameter that allows the dispersion parameter to depend on the mean. Our parametric method complements nonparametric regression approaches for modeling the dispersion parameter. We apply the proposed test to an Arabidopsis data set and a range of simulated data sets. The results show that the test is simple, powerful and reasonably robust against departures from model assumptions.
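The role of the extra parameter can be illustrated by simulation. The sketch below assumes the power form usually associated with the NBP parameterization, in which a count with mean μ has variance μ + φμ^α: α = 2 gives a constant negative binomial dispersion φ (the constant-dispersion case described above), while other values of α let the effective dispersion φμ^(α-2) change with the mean. Counts are drawn with Python's standard library through the gamma-Poisson mixture representation of the negative binomial. The function names and parameter values are illustrative assumptions, not the authors' software.

```python
import math
import random
from statistics import fmean, pvariance


def poisson(rng, lam):
    """Knuth's multiplication method; adequate for the moderate means used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def nbp_sample(rng, mu, phi, alpha):
    """One count with mean mu and variance mu + phi * mu**alpha (assumed NBP form)."""
    size = mu ** (2 - alpha) / phi  # NB size r, so that mu**2 / r == phi * mu**alpha
    lam = rng.gammavariate(size, mu / size)  # gamma: mean mu, variance mu**2 / size
    return poisson(rng, lam)


def simulate_moments(rng, mu, phi, alpha, n):
    """Sample mean and variance of n simulated counts."""
    draws = [nbp_sample(rng, mu, phi, alpha) for _ in range(n)]
    return fmean(draws), pvariance(draws)
```

With `alpha=2.0` every mean shares the same dispersion `phi`; with `alpha=1.5` the effective dispersion `phi * mu ** (alpha - 2)` shrinks as the mean grows. For example, `simulate_moments(random.Random(1), 20, 0.2, 2.0, 20000)` should give a sample variance near 20 + 0.2 * 20**2 = 100.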


2019, Vol 12 (1), pp. 11-19
Author(s): Jun-Young Shin, Sang-Heon Choi, Da-Woon Choi, Ye-Jin An, Jae-Hyuk Seo, ...

1969, Vol 101 (8), pp. 883-889
Author(s): Gene D. Amman

Abstract
Populations of the balsam woolly aphid on Fraser fir trees were sampled without replacement. Sampling frequency was based on the embryological period of the aphid at mean monthly temperatures in the field. The sample for each date consisted of 16 pieces of bark, 1/2 in. in diameter, from each of 10 trees. Precision of the method was usually within ±10% of the mean. The largest proportion of variance was within trees. Stratifying samples by level within trees decreased variance estimates. Frequency distributions of counts of most stages of the aphid approximated the negative binomial distribution. Therefore, data were transformed to logarithms to approximate the normal distribution.
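The steps described above can be sketched for a single sample of counts: a standard moment estimate of the negative binomial exponent, k = m² / (s² - m), the relative precision expressed as the standard error divided by the mean, and a logarithmic transformation. The abstract does not state which logarithmic transform was used; log10(x + 1) is assumed here because it is a common choice for counts that include zeros, and defining precision as SE/mean is likewise an assumption. The function names are hypothetical.

```python
import math
from statistics import mean, variance


def nb_k_moments(counts):
    """Moment estimate of the negative binomial exponent k = m^2 / (s^2 - m)."""
    m = mean(counts)
    s2 = variance(counts)  # sample variance (n - 1 denominator)
    if s2 <= m:
        raise ValueError("no overdispersion: k is undefined or infinite")
    return m * m / (s2 - m)


def relative_precision(counts):
    """Standard error of the mean divided by the mean (0.10 means within 10%)."""
    return math.sqrt(variance(counts) / len(counts)) / mean(counts)


def log_transform(counts):
    """log10(x + 1); the +1 keeps zero counts defined."""
    return [math.log10(x + 1) for x in counts]
```

Under the assumed definition, a relative precision of 0.10 or less corresponds to the "within ±10% of the mean" criterion, and a small k indicates strong clumping of the counts.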

