On the utility of RNA sample pooling to optimize cost and statistical power in RNA sequencing experiments
Abstract
In gene expression studies, RNA sample pooling is sometimes considered because of budget constraints or a lack of sufficient input material. With microarray technology, RNA sample pooling strategies have been reported to optimize both the cost of data generation and the statistical power for differential gene expression (DGE) analysis. For RNA sequencing, with its different quantitative output in terms of counts and its tunable dynamic range, the adequacy of RNA sample pooling strategies has not yet been evaluated or empirically validated. In this study, we comprehensively assessed the utility of pooling strategies in RNA-seq experiments using empirical and simulated data. Mathematical descriptions of the data-generating mechanism in pooled experiments are used to reinforce our interpretations of the empirical and simulation studies. The results demonstrate that pooling strategies in RNA-seq studies can be both cost-effective and powerful when the number of pools, the pool size and the sequencing depth are optimally defined. When within-group gene expression variability is high, small pools of RNA samples are effective in reducing the variability and compensating for the reduced number of replicates. Unlike typical cost-saving strategies, such as reducing the sequencing depth or the number of biological samples, an adequate pooling strategy is effective in maintaining the power of testing for DGE for genes with low to medium abundance levels, along with a substantial reduction in the total cost of the experiment.
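The variance-reduction argument for pooling can be illustrated with a minimal simulation sketch. It is not the data-generating model used in this study: it assumes that the expression level of a pool is the plain average of its members' levels (equal RNA contributions), that biological variability is Gaussian, and that technical and count noise are absent. The function name `simulate_pool_variance` and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_pool_variance(n_individuals=12000, pool_size=3,
                           mean=100.0, sd=30.0, seed=1):
    """Return (variance across individuals, variance across pools) for one gene."""
    rng = random.Random(seed)
    # Hypothetical per-individual expression levels (biological variability only).
    individuals = [rng.gauss(mean, sd) for _ in range(n_individuals)]
    # Non-overlapping pools; each pool is modeled as the mean of its members
    # (assumption: every individual contributes the same amount of RNA).
    pools = [
        statistics.fmean(individuals[i:i + pool_size])
        for i in range(0, n_individuals - pool_size + 1, pool_size)
    ]
    return statistics.variance(individuals), statistics.variance(pools)


var_ind, var_pool = simulate_pool_variance()
# Under the assumptions above, the ratio should be close to 1 / pool_size.
ratio = var_pool / var_ind
print(f"individual variance: {var_ind:.1f}, "
      f"pooled variance: {var_pool:.1f}, ratio: {ratio:.3f}")
```

Under these simplifying assumptions, pooling q samples divides the between-replicate biological variance by roughly q, which is why a smaller number of pooled replicates can offset the loss of individual replicates. Real RNA-seq counts add sequencing noise that pooling does not remove.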