Grouped False-Discovery Rate for Removing the Gene-set-Level Bias of RNA-seq

Abstract Detection and quantification of circular RNAs (circRNAs) face several significant challenges, including a high false discovery rate, uneven rRNA depletion and RNase R treatment efficiency, and underestimation of back-spliced junction reads. Here, we propose a novel algorithm, CIRIquant, for accurate circRNA quantification and differential expression analysis. By constructing a pseudo-circular reference for re-alignment of RNA-seq reads and employing sophisticated statistical models to correct RNase R treatment biases, CIRIquant can provide more accurate expression values for circRNAs with a significantly reduced false discovery rate. We further develop a one-stop differential expression analysis pipeline implementing two independent measures, which helps unveil the regulation of competitive splicing between circRNAs and their linear counterparts. We apply CIRIquant to RNA-seq datasets of hepatocellular carcinoma, and characterize two important groups of linear-circular switching and circular transcript usage switching events, which demonstrate the promising ability to explore extensive transcriptomic changes in liver tumorigenesis.

Sample size reassessment for a two-stage design controlling the false discovery rate

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2014-0025 ◽

2015 ◽

Vol 14 (5) ◽

Cited By ~ 1

Author(s):

Sonja Zehetmayer ◽

Alexandra C. Graf ◽

Martin Posch

Keyword(s):

Sample Size ◽

False Discovery Rate ◽

Effect Sizes ◽

Rna Seq ◽

Two Stage ◽

Stage Design ◽

False Discovery ◽

Sample Size Calculations ◽

Two Stage Design ◽

High Uncertainty

Abstract Sample size calculations for gene expression microarray and NGS RNA-Seq experiments are challenging because the overall power depends on unknown quantities such as the proportion of true null hypotheses and the distribution of the effect sizes under the alternative. We propose a two-stage design with an adaptive interim analysis where these quantities are estimated from the interim data. The second-stage sample size is chosen based on these estimates to achieve a specific overall power. The proposed procedure controls the power in all considered scenarios except for very low first-stage sample sizes. The false discovery rate (FDR) is controlled despite the data-dependent choice of sample size. The two-stage design can be a useful tool to determine the sample size of high-dimensional studies if, in the planning phase, there is high uncertainty regarding the expected effect sizes and variability.
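To make the interim estimation step concrete, here is a minimal sketch of how the proportion of true null hypotheses might be estimated from first-stage p-values. The abstract does not say which estimator the authors use. The threshold-based (Storey-type) estimator below, the function name `estimate_pi0`, and the tuning parameter `lam` are all assumptions chosen for illustration.

```python
def estimate_pi0(p_values, lam=0.5):
    """Threshold-based estimate of the proportion of true nulls.

    Assumption (not taken from the abstract): p-values above `lam`
    come mostly from true null hypotheses, whose p-values are
    uniform on [0, 1]. The share of p-values above `lam` is then
    rescaled by 1 / (1 - lam).
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one p-value")
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    n_above = sum(1 for p in p_values if p > lam)
    # Cap at 1 because the raw ratio can exceed 1 in small samples.
    return min(1.0, n_above / (m * (1.0 - lam)))


# Hypothetical interim p-values, for illustration only.
interim_p = [0.01, 0.02, 0.03, 0.04, 0.6, 0.9, 0.2, 0.1]
pi0_hat = estimate_pi0(interim_p)
```

An estimate of this kind, together with an estimate of the effect size distribution, would be one input to choosing the second-stage sample size.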

Controlling the false-discovery rate by procedures adapted to the length bias of RNA-Seq

Journal of the Korean Statistical Society ◽

10.1016/j.jkss.2017.08.001 ◽

2018 ◽

Vol 47 (1) ◽

pp. 13-23 ◽

Cited By ~ 1

Author(s):

Tae Young Yang ◽

Seongmun Jeong

Keyword(s):

False Discovery Rate ◽

Rna Seq ◽

Length Bias ◽

False Discovery

Nonparametric expression analysis using inferential replicate counts

10.1101/561084 ◽

2019 ◽

Author(s):

Anqi Zhu ◽

Avi Srivastava ◽

Joseph G. Ibrahim ◽

Rob Patro ◽

Michael I. Love

Keyword(s):

False Discovery Rate ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Transcript Level ◽

Parametric Model ◽

Statistical Testing ◽

Rna Seq ◽

Nonparametric Models ◽

False Discovery

Abstract A primary challenge in the analysis of RNA-seq data is to identify differentially expressed genes or transcripts while controlling for technical biases present in the observations. Ideally, a statistical testing procedure should incorporate information about the inherent uncertainty of the abundance estimates, whether at the gene or transcript level, that arises from quantification of abundance. Most popular methods for RNA-seq differential expression analysis fit a parametric model to the counts or scaled counts for each gene or transcript, and a subset of methods can incorporate information about the uncertainty of the counts. Previous work has shown that nonparametric models for RNA-seq differential expression may in some cases have better control of the false discovery rate, and adapt well to new data types without requiring reformulation of a parametric model. Existing nonparametric models do not take into account the inferential uncertainty of the observations, leading to an inflated false discovery rate, in particular at the transcript level. Here we propose a nonparametric model for differential expression analysis using inferential replicate counts, extending the existing SAMseq method to account for inferential uncertainty, batch effects, and sample pairing. We compare our method, “SAMseq With Inferential Samples Helps”, or Swish, with popular differential expression analysis methods. Swish has improved control of the false discovery rate, in particular for transcripts with high inferential uncertainty. We apply Swish to a single-cell RNA-seq dataset, assessing sensitivity to recover DE genes between sub-populations of cells, and compare its performance to the Wilcoxon rank sum test.

Nonparametric expression analysis using inferential replicate counts

Nucleic Acids Research ◽

10.1093/nar/gkz622 ◽

2019 ◽

Vol 47 (18) ◽

pp. e105-e105 ◽

Cited By ~ 10

Author(s):

Anqi Zhu ◽

Avi Srivastava ◽

Joseph G Ibrahim ◽

Rob Patro ◽

Michael I Love

Keyword(s):

False Discovery Rate ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Parametric Model ◽

Statistical Testing ◽

Wilcoxon Test ◽

Rna Seq ◽

Nonparametric Models ◽

False Discovery

Abstract A primary challenge in the analysis of RNA-seq data is to identify differentially expressed genes or transcripts while controlling for technical biases. Ideally, a statistical testing procedure should incorporate the inherent uncertainty of the abundance estimates arising from the quantification step. Most popular methods for RNA-seq differential expression analysis fit a parametric model to the counts for each gene or transcript, and a subset of methods can incorporate uncertainty. Previous work has shown that nonparametric models for RNA-seq differential expression may have better control of the false discovery rate, and adapt well to new data types without requiring reformulation of a parametric model. Existing nonparametric models do not take into account inferential uncertainty, leading to an inflated false discovery rate, in particular at the transcript level. We propose a nonparametric model for differential expression analysis using inferential replicate counts, extending the existing SAMseq method to account for inferential uncertainty. We compare our method, Swish, with popular differential expression analysis methods. Swish has improved control of the false discovery rate, in particular for transcripts with high inferential uncertainty. We apply Swish to a single-cell RNA-seq dataset, assessing differential expression between sub-populations of cells, and compare its performance to the Wilcoxon test.

dearseq: a variance component score test for RNA-seq differential analysis that effectively controls the false discovery rate

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa093 ◽

2020 ◽

Vol 2 (4) ◽

Author(s):

Marine Gauthier ◽

Denis Agniel ◽

Rodolphe Thiébaut ◽

Boris P Hejblum

Keyword(s):

False Discovery Rate ◽

Statistical Power ◽

Differential Expression Analysis ◽

Score Test ◽

Real Data ◽

Differential Analysis ◽

Rna Seq ◽

Data Set ◽

Mathematical Proofs ◽

False Discovery

Abstract RNA-seq studies are growing in size and popularity. We provide evidence that the most commonly used methods for differential expression analysis (DEA) may yield too many false positive results in some situations. We present dearseq, a new method for DEA that controls the false discovery rate (FDR) without making any assumption about the true distribution of RNA-seq data. We show that dearseq controls the FDR while maintaining strong statistical power compared to the most popular methods. We demonstrate this behavior with mathematical proofs, simulations and a real data set from a study of tuberculosis, where our method produces fewer apparent false positives.

Faculty Opinions recommendation of An investigation of the false discovery rate and the misinterpretation of p-values.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.725432010.793514527 ◽

2016 ◽

Author(s):

Geoffrey Goodhill

Keyword(s):

False Discovery Rate ◽

P Values ◽

False Discovery

A simple yet efficient method of local false discovery rate estimation designed for genome-wide association data analysis

Statistical Methods & Applications ◽

10.1007/s10260-021-00560-y ◽

2021 ◽

Author(s):

Ali Karimnezhad

Keyword(s):

Data Analysis ◽

False Discovery Rate ◽

Efficient Method ◽

Genome Wide Association ◽

Local False Discovery Rate ◽

Rate Estimation ◽

False Discovery ◽

Genome Wide ◽

False Discovery Rate Estimation ◽

Association Data

False Discovery Rate Control Under General Dependence By Symmetrized Data Aggregation

Journal of the American Statistical Association ◽

10.1080/01621459.2021.1945459 ◽

2021 ◽

pp. 1-34

Author(s):

Lilun Du ◽

Xu Guo ◽

Wenguang Sun ◽

Changliang Zou

Keyword(s):

False Discovery Rate ◽

Rate Control ◽

Data Aggregation ◽

False Discovery Rate Control ◽

False Discovery

False Discovery Rate in Linkage and Association Genome Screens for Complex Disorders

Genetics ◽

10.1093/genetics/164.2.829 ◽

2003 ◽

Vol 164 (2) ◽

pp. 829-833

Author(s):

Chiara Sabatti ◽

Susan Service ◽

Nelson Freimer

Keyword(s):

Gene Mapping ◽

False Discovery Rate ◽

Disease Gene ◽

Susceptibility Genes ◽

Complex Disorders ◽

Disease Gene Mapping ◽

False Discovery ◽

Simple Step ◽

Multiple Comparison Procedure ◽

Step Down

Abstract We explore the implications of the false discovery rate (FDR) controlling procedure in disease gene mapping. With the aid of simulations, we show how, under models commonly used, the simple step-down procedure introduced by Benjamini and Hochberg controls the FDR for the dependent tests on which linkage and association genome screens are based. This adaptive multiple comparison procedure may offer an important tool for mapping susceptibility genes for complex diseases.
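For readers unfamiliar with the Benjamini and Hochberg procedure named in this abstract, the following is a minimal sketch of its standard textbook form, which is usually described as a step-up rule: sort the p-values, find the largest rank k whose p-value is at most k·α/m, and reject the hypotheses with the k smallest p-values. The function name `benjamini_hochberg` and the example p-values are illustrative assumptions, not taken from the paper.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Standard Benjamini-Hochberg rule: find the largest rank k such
    that the k-th smallest p-value is <= k * alpha / m, then reject
    the hypotheses with the k smallest p-values.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff_rank = rank  # keep the largest rank that passes

    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, for illustration only.
example_p = [0.02, 0.9, 0.035, 0.03]
example_rejections = benjamini_hochberg(example_p, alpha=0.05)
```

In this example the two smallest p-values fail their own thresholds, but the third smallest (0.035) passes its threshold of 3·0.05/4, so all three are rejected. This sketch covers only the decision rule; the abstract's point, that the FDR stays controlled for the dependent tests arising in genome screens, rests on the authors' simulations and is not something the code demonstrates.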
