RNAseqPS: A Web Tool for Estimating Sample Size and Power for RNAseq Experiment

2014, Vol 13s6, pp. CIN.S17688
Author(s): Yan Guo, Shilin Zhao, Chung-I Li, Quanhu Sheng, Yu Shyr

Sample size and power determination is the first step in the experimental design of a successful study. Sample size and power calculation is required for applications for National Institutes of Health (NIH) funding. Sample size and power calculation is well established for traditional biological studies such as mouse-model studies, genome-wide association studies (GWAS), and microarray studies. Recent developments in high-throughput sequencing technology have allowed RNAseq to replace microarrays as the technology of choice for high-throughput gene expression profiling. However, sample size and power analysis for RNAseq experiments remains underdeveloped. Here, we present RNAseqPS, an advanced online RNAseq power and sample size calculation tool based on the Poisson and negative binomial distributions. RNAseqPS was built using the Shiny package in R. It provides an interactive graphical user interface that allows users to easily conduct sample size and power analysis for RNAseq experimental design. RNAseqPS can be accessed directly at http://cqs.mc.vanderbilt.edu/shiny/RNAseqPS/ .
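The abstract does not state the formulas RNAseqPS uses, so the following is only a rough sketch of how a Poisson or negative binomial model feeds a sample size calculation for a single gene. It assumes a two-group Wald test on the log fold change with the approximation n · Var(log fold change) ≈ 1/μ₀ + 1/μ₁ + 2φ, where μ is the mean read count per sample and φ the dispersion (φ = 0 gives the Poisson case). The function names, parameter values and the normal approximation are assumptions for illustration, not RNAseqPS's method.

```python
from math import ceil, log, sqrt
from statistics import NormalDist


def nb_sample_size(mu0, fold_change, dispersion, alpha=0.05, power=0.8):
    """Approximate per-group sample size to detect fold_change for one gene.

    Counts in group g are assumed negative binomial with mean mu_g and variance
    mu_g + dispersion * mu_g**2 (dispersion = 0 is the Poisson case).
    """
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    mu1 = mu0 * fold_change
    var_unit = 1 / mu0 + 1 / mu1 + 2 * dispersion  # n * Var(log fold-change estimate)
    return ceil(z_sum ** 2 * var_unit / log(fold_change) ** 2)


def nb_power(n, mu0, fold_change, dispersion, alpha=0.05):
    """Approximate two-sided power of the same Wald test with n samples per group."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    se = sqrt((1 / mu0 + 1 / (mu0 * fold_change) + 2 * dispersion) / n)
    shift = abs(log(fold_change)) / se
    return z.cdf(shift - z_a) + z.cdf(-shift - z_a)


print(nb_sample_size(10, 2, 0.1))  # 6
print(nb_sample_size(10, 2, 0.0))  # 3 (Poisson)
```

With a mean of 10 reads, a twofold change and dispersion 0.1, this gives 6 samples per group, and 3 in the Poisson case. A genome-wide design would also have to account for multiple testing, for example with a much smaller per-gene α; the sketch leaves that out.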

Author(s): Hyun Kang

Appropriate sample size calculation and power analysis have become major issues in research and publication. However, calculating sample size and power is complex and requires broad statistical knowledge, personnel with programming skills are scarce, and commercial programs are often too expensive for routine use. This review explains the basic concepts of sample size calculation and power analysis, the process of sample size estimation, and how to calculate sample size using G*Power software (latest ver. 3.1.9.7; Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany), with 5 statistical examples. The null and alternative hypotheses, effect size, power, alpha, type I error, and type II error should be described when calculating the sample size or power. G*Power is recommended for sample size and power calculations for various statistical methods (F, t, χ², Z, and exact tests) because it is free and easy to use. The process of sample size estimation consists of establishing research goals and hypotheses, choosing appropriate statistical tests, choosing one of 5 possible power analysis methods, inputting the required variables for analysis, and selecting the “Calculate” button. The software helps researchers estimate sample size and conduct power analysis.
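To show the kind of arithmetic G*Power automates, here is a normal-approximation sketch for one common case: a two-sided independent two-sample t-test with Cohen's d as the effect size. G*Power itself works with the noncentral t distribution; the normal approximation, the z²/4 small-sample correction term and the function name are assumptions of this sketch, so its answers may differ slightly from the program's.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_ttest(d, alpha=0.05, power=0.8, two_sided=True):
    """Approximate per-group n for an independent two-sample t-test.

    d is the standardized effect size (Cohen's d). Uses the normal approximation
    2 * (z_alpha + z_beta)^2 / d^2 plus a z_alpha^2 / 4 small-sample correction.
    """
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_b = z.inv_cdf(power)
    return ceil(2 * (z_a + z_b) ** 2 / d ** 2 + z_a ** 2 / 4)


print(n_per_group_ttest(0.5))  # 64
```

For d = 0.5, α = 0.05 and 80% power this returns 64 per group, the figure usually quoted for a "medium" effect; smaller effects raise n roughly in proportion to 1/d².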


2020, Vol 26 (Supplement_1), pp. S9-S9
Author(s): Svetlana Lakunina, Zipporah Iheozor-Ejiofor, Morris Gordon, Daniel Akintelure, Vassiliki Sinopoulou

Inflammatory bowel disease is a collection of disorders of the gastrointestinal tract characterised by relapsing and remitting inflammation. Studies have reported that several pharmacological and non-pharmacological interventions are effective in managing the disease. For a trial to detect the effect of an intervention, its sample size must be estimated with a power calculation. This project critically evaluates the sample size estimations and power calculations reported by randomised controlled trials of inflammatory bowel disease management, to determine whether their results are adequately supported. We searched the Cochrane database to identify systematic reviews, screened their reference lists, and selected the studies that met the inclusion criteria. Data on power calculation parameters and outcomes were extracted, and the results were analysed and summarised as percentages, means and graphs. We screened almost all trials on the management of inflammatory bowel disease published in the past 25 years. Of the 232 studies analysed, 167 reported a power calculation. Fewer than half (48%) of these achieved their target sample size, which they needed in order to conclude reliably that the interventions were effective. Moreover, the average minimal difference these studies aimed to detect was 30%, which may not be adequate to demonstrate the effect of an intervention. In conclusion, inaccurate power calculations and failure to achieve target sample sizes can lead to erroneous conclusions about how effective an intervention is in managing inflammatory bowel disease.
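The cost of missing a target sample size can be made concrete with the standard normal-approximation formulas for comparing two proportions (two-sided z-test, pooled variance under the null). The response rates of 30% and 60% below are hypothetical and are not taken from the reviewed trials; the sketch only shows how power falls when 70% of the planned participants are enrolled.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()


def n_two_proportions(p1, p2, alpha=0.05, power=0.8):
    """Per-group n to compare two proportions (two-sided z-test, pooled null variance)."""
    z_a, z_b = Z.inv_cdf(1 - alpha / 2), Z.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    num = z_a * sqrt(2 * p_bar * (1 - p_bar)) + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return ceil(num ** 2 / (p1 - p2) ** 2)


def power_two_proportions(p1, p2, n, alpha=0.05):
    """Approximate power of the same test with n participants per group."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n)
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n)
    return Z.cdf((abs(p1 - p2) - z_a * se_null) / se_alt)


target = n_two_proportions(0.30, 0.60)  # hypothetical response rates
print(target)  # 42
achieved = power_two_proportions(0.30, 0.60, round(0.7 * target))  # only 70% enrolled
print(round(achieved, 2))
```

Under these assumptions the trial needs 42 participants per group for 80% power, but enrolling only 29 per group leaves power at roughly 64%, so a true effect would be missed about a third of the time.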


2011, Vol 17 (10), pp. 1211-1217
Author(s): Richard Nicholas, Sebastian Straube, Heinz Schmidli, Simon Schneider, Tim Friede

Background: Sample size calculation is a key aspect of planning any trial. Planning a randomized placebo-controlled trial in relapsing–remitting multiple sclerosis (RRMS) requires knowledge of the annualized relapse rate (ARR) in the placebo group. Objectives: This paper aims (i) to characterize the uncertainty in ARR by conducting a systematic review of placebo-controlled, randomized trials in RRMS and by modelling the ARR over time; and (ii) to assess the feasibility and utility of blinded sample size re-estimation (BSSR) procedures in RRMS. Methods: A systematic literature review was carried out by searching PubMed, Ovid Medline and the Cochrane Register of Controlled Trials. The placebo ARRs were modelled by negative binomial regression. Computer simulations were conducted to assess the utility of BSSR in RRMS. Results: Data from 26 placebo-controlled randomized trials were included in this analysis. The placebo ARR decreased by 6.2% per year (p < 0.0001; 95% CI 4.2% to 8.1%), resulting in substantial uncertainty in the planning of future trials. BSSR was shown to be feasible and to maintain power at a prespecified level even when the ARR was misspecified in the planning phase. Conclusions: Our investigations confirmed previously reported trends in ARR. In this context, adaptive strategies such as BSSR designs are recommended for consideration in the planning of future trials in RRMS.
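The planning problem can be sketched with a Wald-type sample size formula for comparing two negative binomial event rates, plus a simplified blinded re-estimation step that backs the placebo rate out of the pooled interim rate under the planned rate ratio. The formula, the common fixed follow-up time, the known dispersion, the function names and all numbers are assumptions for illustration; practical BSSR procedures may also re-estimate the dispersion from the blinded data.

```python
from math import ceil, log
from statistics import NormalDist


def nb_rate_n(rate0, rate_ratio, dispersion, follow_up=1.0, alpha=0.05, power=0.8):
    """Per-group n for a two-arm trial comparing negative binomial event rates.

    Assumes every patient is followed for `follow_up` years, counts have mean
    mu = rate * follow_up and variance mu + dispersion * mu**2, and a Wald test
    on the log rate ratio is used.
    """
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    mu0 = rate0 * follow_up
    mu1 = rate0 * rate_ratio * follow_up
    return ceil(z_sum ** 2 * (1 / mu0 + 1 / mu1 + 2 * dispersion) / log(rate_ratio) ** 2)


def blinded_reestimate(total_events, total_years, rate_ratio, dispersion, **kwargs):
    """Recompute n from blinded interim totals (1:1 allocation, dispersion treated as known).

    The pooled rate mixes both arms, so under the planned rate ratio
    pooled = rate0 * (1 + rate_ratio) / 2, which is solved for rate0.
    """
    pooled_rate = total_events / total_years
    rate0 = 2 * pooled_rate / (1 + rate_ratio)
    return nb_rate_n(rate0, rate_ratio, dispersion, **kwargs)


planned = nb_rate_n(0.5, 0.7, 0.8, follow_up=2.0)             # hypothetical planning values
interim = blinded_reestimate(68, 200, 0.7, 0.8, follow_up=2.0)  # hypothetical blinded totals
print(planned, interim)
```

With these hypothetical values, a pooled interim rate lower than expected implies a lower placebo ARR and therefore a larger required sample size, which is the situation BSSR is meant to catch. The reported trend compounds quickly: a 6.2% annual decline leaves about 0.938^10 ≈ 0.53 of the original placebo rate after ten years.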


Author(s): Chung-I Li, Yu Shyr

As RNA-seq rapidly develops and costs continue to decrease, the quantity and frequency of samples being sequenced will grow exponentially. With proteomic investigations becoming more multivariate and quantitative, determining a study’s optimal sample size is now a vital step in experimental design. Current methods for calculating a study’s required sample size are mostly based on the hypothesis testing framework, which assumes that each gene count can be modeled by a Poisson or negative binomial distribution; however, these methods are limited in their ability to accommodate covariates. To address this limitation, we propose an estimating procedure based on the generalized linear model. This easy-to-use method constructs a representative exemplary dataset and estimates the conditional power, without requiring complicated mathematical approximations or formulas. In addition, the downstream analysis can be performed with current R/Bioconductor packages. To demonstrate the practicability and efficiency of this method, we apply it to three real-world studies and introduce our online calculator for determining the optimal sample size for an RNA-seq study.
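The exemplary-dataset idea can be sketched for a single gene: build the expected-value dataset for a design with a treatment indicator and one binary covariate, take the Fisher information of a log-link negative binomial GLM at the hypothesized coefficients (for such a dataset the maximum likelihood fit returns exactly those coefficients), and convert the Wald noncentrality into power. The known dispersion, the binary covariate, the function names and the hand-written solver are assumptions of this sketch, not the authors' implementation.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def glm_power(n_per_group, base_mean, fold_change, cov_effect, cov_frac, dispersion, alpha=0.05):
    """Wald-test power for the treatment effect in a log-link NB GLM with one binary covariate.

    cov_frac = (share of controls with covariate = 1, share of treated with covariate = 1).
    Each treatment-by-covariate cell contributes n * share observations at its expected mean.
    """
    beta = [log(base_mean), log(fold_change), log(cov_effect)]
    info = [[0.0] * 3 for _ in range(3)]
    for trt in (0, 1):
        for cov, share in ((0, 1 - cov_frac[trt]), (1, cov_frac[trt])):
            x = [1.0, float(trt), float(cov)]
            mu = exp(sum(b * v for b, v in zip(beta, x)))
            weight = n_per_group * share * mu / (1 + dispersion * mu)  # NB working weight
            for i in range(3):
                for j in range(3):
                    info[i][j] += weight * x[i] * x[j]
    var_trt = solve(info, [0.0, 1.0, 0.0])[1]  # treatment entry of the inverse information
    shift = abs(beta[1]) / sqrt(var_trt)
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    return z.cdf(shift - z_a) + z.cdf(-shift - z_a)


print(glm_power(10, 20, 2.0, 1.5, (0.5, 0.5), 0.2))  # covariate balanced across groups
print(glm_power(10, 20, 2.0, 1.5, (0.2, 0.8), 0.2))  # covariate confounded with treatment
```

When the covariate is balanced and has no effect, the treatment variance reduces to the familiar two-group value (1/μ₀ + 1/μ₁ + 2φ)/n; imbalance between groups inflates it and lowers power, which is the kind of covariate effect that formula-based methods cannot express.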


2013, Vol 33 (3), pp. 376-387
Author(s): Haiyuan Zhu, Hassan Lakkis

2021, Vol 1 (2), pp. 47-63
Author(s): Xiaohong Li, Shesh N. Rai, Eric C. Rouchka, Timothy E. O’Toole, Nigel G. F. Cooper

Sample size calculation for adequate power analysis is critical in optimizing RNA-seq experimental design. However, directly estimating sample size becomes more complex when confounding covariates are taken into consideration. Although a number of approaches for sample size calculation have been proposed for RNA-seq data, most ignore potential heterogeneity. In this study, we implemented a simulation-based, confounder-adjusted method to provide sample size recommendations for RNA-seq differential expression analysis. Data were generated by Monte Carlo simulation, given an underlying distribution of confounding covariates and the parameters of a negative binomial distribution. The relationship of sample size to power and to parameters such as dispersion, fold change and mean read count can be visualized. We demonstrate that the adjusted sample size for a desired power and type I error rate α is usually larger when confounding covariates are taken into account. More importantly, our simulation study reveals that existing methods may underestimate sample size when a confounding covariate is present in RNA-seq data, and this underestimate could reduce the detection power of the differential expression analysis. We therefore incorporate confounding covariates into sample size estimation for heterogeneous RNA-seq data.
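A minimal Monte Carlo sketch of a confounder-adjusted power estimate for one gene: simulate negative binomial counts for two groups whose covariate distributions differ, fit a log-link NB GLM with treatment and covariate terms, and count how often the Wald test for treatment rejects. The gamma-Poisson generator, the IRLS fit with known dispersion, the fixed (non-random) covariate allocation, the function names and all parameter values are assumptions of this sketch; it is not the authors' code.

```python
import random
from math import exp, log, sqrt
from statistics import NormalDist


def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rnbinom(mu, dispersion, rng):
    """NB draw with mean mu and variance mu + dispersion * mu**2 (gamma-Poisson mixture)."""
    lam = rng.gammavariate(1 / dispersion, dispersion * mu)
    # Knuth's Poisson sampler; adequate for the moderate means used here.
    limit, k, p = exp(-lam), 0, rng.random()
    while p > limit:
        p *= rng.random()
        k += 1
    return k


def wald_z(y, X, dispersion, iters=25):
    """IRLS fit of a log-link NB GLM with known dispersion; Wald z for coefficient 1."""
    p = len(X[0])
    beta = [log(sum(y) / len(y) + 0.5)] + [0.0] * (p - 1)
    for _ in range(iters):
        xtwx = [[0.0] * p for _ in range(p)]
        xtwz = [0.0] * p
        for yi, xi in zip(y, X):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = exp(eta)
            w = mu / (1 + dispersion * mu)  # working weight for the NB log link
            z = eta + (yi - mu) / mu         # working response
            for j in range(p):
                xtwz[j] += w * xi[j] * z
                for k in range(p):
                    xtwx[j][k] += w * xi[j] * xi[k]
        beta = solve(xtwx, xtwz)
    # Information from the final iteration (essentially at the converged beta).
    var_trt = solve(xtwx, [1.0 if j == 1 else 0.0 for j in range(p)])[1]
    return beta[1] / sqrt(var_trt)


def mc_power(n_per_group, base_mean, fold_change, cov_effect, cov_frac, dispersion,
             n_sim=200, alpha=0.05, seed=1):
    """Monte Carlo power of the covariate-adjusted Wald test for treatment.

    cov_frac = (share of controls with covariate = 1, share of treated with covariate = 1);
    the shares must keep the covariate from being constant or identical to treatment.
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sim):
        y, X = [], []
        for trt in (0, 1):
            n_cov = round(cov_frac[trt] * n_per_group)
            for i in range(n_per_group):
                cov = 1 if i < n_cov else 0
                mu = base_mean * fold_change ** trt * cov_effect ** cov
                y.append(rnbinom(mu, dispersion, rng))
                X.append([1.0, float(trt), float(cov)])
        if abs(wald_z(y, X, dispersion)) > crit:
            hits += 1
    return hits / n_sim
```

Running `mc_power` over a grid of `n_per_group` values and taking the smallest n that reaches the target power gives a simulation-based sample size; comparing `cov_frac = (0.5, 0.5)` with an imbalanced setting shows how much a confounding covariate raises the requirement.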

