The Functional False Discovery Rate with Applications to Genomics

2017 ◽  
Author(s):  
Xiongzhi Chen ◽  
David G. Robinson ◽  
John D. Storey

Abstract
The false discovery rate measures the proportion of false discoveries among a set of hypothesis tests called significant. This quantity is typically estimated based on p-values or test statistics. In some scenarios, additional information is available that may be used to estimate the false discovery rate more accurately. We develop a new framework for formulating and estimating false discovery rates and q-values when an additional piece of information, which we call an “informative variable”, is available. For a given test, the informative variable provides information about the prior probability that the null hypothesis is true or about the power of that particular test. The false discovery rate is then treated as a function of this informative variable. We consider two applications in genomics. The first is a genetics of gene expression (eQTL) experiment in yeast in which every genetic marker and gene expression trait pair is tested for association. The informative variable in this case is the distance between each genetic marker and gene. The second application is detecting differentially expressed genes in an RNA-seq study carried out in mice. The informative variable in this study is the per-gene read depth. The framework we develop is quite general, and it should be useful in a broad range of scientific applications.
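As a rough illustration of the functional FDR idea (a crude binned sketch, not the paper's actual smooth estimator), one could stratify tests on a scalar informative variable z and apply Storey's null-proportion estimator within each stratum, letting the estimated FDR vary with z:

```python
import numpy as np

def pi0_by_bin(p, z, n_bins=10, lam=0.5):
    """Estimate the null proportion pi0 within quantile bins of the
    informative variable z, using Storey's estimator per bin."""
    p, z = np.asarray(p), np.asarray(z)
    edges = np.quantile(z, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.digitize(z, edges[1:-1]), 0, n_bins - 1)
    pi0 = np.empty(n_bins)
    for b in range(n_bins):
        in_bin = p[idx == b]
        # Fraction of p-values above lam, rescaled; default to 1 if the bin is empty.
        pi0[b] = min(1.0, np.mean(in_bin > lam) / (1 - lam)) if in_bin.size else 1.0
    return pi0, idx
```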

2015 ◽  
Author(s):  
Simina M. Boca ◽  
Jeffrey T. Leek

Abstract
Modern scientific studies from many diverse areas of research abound with multiple hypothesis testing concerns. The false discovery rate is one of the most commonly used error rates for measuring and controlling rates of false discoveries when performing multiple tests. Adaptive false discovery rates rely on an estimate of the proportion of null hypotheses among all the hypotheses being tested. This proportion is typically estimated once for each collection of hypotheses. Here we propose a regression framework to estimate the proportion of null hypotheses conditional on observed covariates. This estimate may then be used as a multiplication factor with the Benjamini-Hochberg adjusted p-values, leading to a plug-in false discovery rate estimator. Our case study concerns a genome-wide association meta-analysis of associations with body mass index. In our framework, we are able to use the sample sizes for the individual genomic loci and the minor allele frequencies as covariates. We further evaluate our approach via a number of simulation scenarios.
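A hedged sketch of this regression idea (illustrative only, not the authors' exact estimator or code): since E[1(p > λ)]/(1 − λ) approximates the null proportion, regressing that scaled indicator on covariates gives a per-test estimate of π0(x) that can scale the Benjamini-Hochberg adjusted p-values:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from statsmodels.stats.multitest import multipletests

def covariate_plugin_fdr(p, X, lam=0.8):
    """Plug-in FDR estimate: a per-test null-proportion estimate from a
    regression on covariates X, times the BH adjusted p-values."""
    p = np.asarray(p)
    y = (p > lam).astype(float) / (1.0 - lam)
    pi0_hat = np.clip(LinearRegression().fit(X, y).predict(X), 0.0, 1.0)
    bh = multipletests(p, method="fdr_bh")[1]  # BH adjusted p-values
    return np.minimum(pi0_hat * bh, 1.0)
```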


2018 ◽  
Author(s):  
Uri Keich ◽  
Kaipo Tamura ◽  
William Stafford Noble

Abstract
Decoy database search with target-decoy competition (TDC) provides an intuitive, easy-to-implement method for estimating the false discovery rate (FDR) associated with spectrum identifications from shotgun proteomics data. However, the procedure can yield different results for a fixed dataset analyzed with different decoy databases, and this decoy-induced variability is particularly problematic for smaller FDR thresholds, datasets, or databases. In such cases, the nominal FDR might be 1% while the true proportion of false discoveries is 10%. The averaged TDC (aTDC) protocol combats this problem by exploiting multiple independently shuffled decoy databases to provide an FDR estimate with reduced variability. We provide a tutorial introduction to aTDC, describe an improved variant of the protocol that offers increased statistical power, and discuss how to deploy aTDC in practice using the Crux software toolkit.
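For intuition, here is a simplified sketch of TDC-style FDR estimation and the averaging idea behind aTDC; it glosses over the per-spectrum target-decoy competition step and is not the Crux implementation:

```python
import numpy as np

def tdc_fdr(target_scores, decoy_scores, threshold):
    """Simplified TDC-style estimate at a score threshold:
    (decoys above threshold + 1) / (targets above threshold)."""
    targets = np.sum(np.asarray(target_scores) >= threshold)
    decoys = np.sum(np.asarray(decoy_scores) >= threshold)
    return min(1.0, (decoys + 1) / max(targets, 1))

def averaged_tdc_fdr(target_scores, decoy_score_sets, threshold):
    """Average the decoy count over several independently shuffled decoy
    databases before forming the estimate, which is the variance-reduction
    idea behind aTDC."""
    targets = np.sum(np.asarray(target_scores) >= threshold)
    mean_decoys = np.mean([np.sum(np.asarray(d) >= threshold)
                           for d in decoy_score_sets])
    return min(1.0, (mean_decoys + 1) / max(targets, 1))
```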


Blood ◽  
2011 ◽  
Vol 118 (21) ◽  
pp. 1238-1238
Author(s):  
Anita D'Souza ◽  
Sebastian M. Armasu ◽  
Mariza de Andrade ◽  
John A. Heit

Abstract 1238

Background: SNPs within the genes encoding factor XI (F11) and fibrinogen (FGA, FGG), and within other candidate genes in the procoagulant, anticoagulant, fibrinolytic, innate immunity, and endocrine pathways, have been reported as associated with VTE. However, the independent risk of VTE associated with many of these SNPs after controlling for factor V Leiden, prothrombin G20210A, and ABO blood group non-O carrier status is uncertain.

Objective: To replicate candidate gene SNPs previously reported as associated with VTE.

Methods: As part of a large replication study, we included 17 SNPs previously reported as associated with VTE in a custom Illumina GoldenGate genotyping array (total n = 1093 SNPs). We genotyped 1270 non-Hispanic adults of European ancestry with objectively diagnosed VTE (cases; no cancer, venous catheter, or antiphospholipid antibodies) and 1302 controls (frequency-matched on case age, gender, race, and MI/stroke status). Genotyping results from high-quality control DNA (SNP call rate ≥ 95%) were used to generate a cluster algorithm. The primary outcome was VTE status, a binary measure. The covariates were age at interview or blood sample collection, sex, stroke and/or MI status, and state of residence. To adjust for population stratification, we performed multidimensional scaling (MDS) in PLINK v1.07 to identify outliers in our population using the ancestry-informative markers. We tested for an association between each SNP and VTE using unconditional logistic regression, adjusting for age, sex, stroke/MI status, state of residence, and ABO rs514659 (in high linkage disequilibrium with non-O blood type). The analyses were corrected for multiple comparisons using an extension of false discovery rates. The false discovery rate (reported as a Q-value) is an analogue of the p-value that takes into account the number of statistical tests and estimates the expected proportion of false positive tests incurred when a particular SNP is called significant. All analyses were performed using PLINK v1.07.

Results: MDS gave no evidence of population stratification. Genotyping was unsuccessful for two of the 17 SNPs. We found significant associations between VTE and SNPs in F11, FGG, TC2D, and FGA (Table). However, the false discovery rates for all significant SNPs except F11 rs3756008 were >0.05, suggesting that the observed associations were likely false positives due to multiple comparisons. Even at a false discovery rate of Q-value = 0.0099, one would expect ∼11 SNPs (0.0099 × 1093 SNPs) to be falsely associated with VTE due to multiple comparisons. Consequently, even our observed association between F11 rs3756008 and VTE remains tentative.

Conclusions: We were unable to replicate reported associations between 15 SNPs and VTE. Our results emphasize the necessity of replication studies in different populations to confirm reported associations of SNPs with VTE.

Disclosures: Heit: Daiichi Sankyo: Consultancy, Honoraria.
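A worked version of the abstract's expectation arithmetic, taking the array's stated SNP count at face value:

```python
q_value = 0.0099   # smallest observed Q-value (F11 rs3756008)
n_snps = 1093      # SNPs on the custom genotyping array
expected_false = q_value * n_snps  # ~10.8, i.e. roughly 11 false associations
```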


mSystems ◽  
2017 ◽  
Vol 2 (6) ◽  
Author(s):  
Lingjing Jiang ◽  
Amnon Amir ◽  
James T. Morton ◽  
Ruth Heller ◽  
Ery Arias-Castro ◽  
...  

ABSTRACT
Differential abundance testing is a critical task in microbiome studies that is complicated by the sparsity of data matrices. Here we adapt for microbiome studies a solution from the field of gene expression analysis to produce a new method, discrete false-discovery rate (DS-FDR), that greatly improves the power to detect differential taxa by exploiting the discreteness of the data. Additionally, DS-FDR is relatively robust to the number of noninformative features, and thus removes the problem of filtering taxonomy tables by an arbitrary abundance threshold. We show, using a combination of simulations and reanalysis of nine real-world microbiome data sets, that this new method outperforms existing methods at the differential abundance testing task, producing a false-discovery rate that is up to threefold more accurate and halving the number of samples required to find a given difference (thus considerably increasing the efficiency of microbiome experiments). We therefore expect DS-FDR to be widely applied in microbiome studies.

IMPORTANCE
DS-FDR can achieve higher statistical power to detect significant findings in sparse and noisy microbiome data compared to the commonly used Benjamini-Hochberg procedure and other FDR-controlling procedures.
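DS-FDR builds on permutation-based FDR estimation. A bare-bones sketch of that underlying estimator (not the DS-FDR implementation itself, and omitting its discreteness-aware refinements):

```python
import numpy as np

def permutation_fdr(data, labels, stat_fn, threshold, n_perm=1000, seed=0):
    """Permutation FDR estimate at a test-statistic threshold: the average
    number of label-permuted (null) statistics exceeding the threshold,
    divided by the number of observed statistics exceeding it."""
    rng = np.random.default_rng(seed)
    observed = stat_fn(data, labels)            # one statistic per feature
    discoveries = np.sum(observed >= threshold)
    null_hits = [np.sum(stat_fn(data, rng.permutation(labels)) >= threshold)
                 for _ in range(n_perm)]
    return min(1.0, np.mean(null_hits) / max(discoveries, 1))
```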


2018 ◽  
Author(s):  
LM Hall ◽  
AE Hendricks

Abstract

Background: Recently, there has been increasing concern about the replicability, or lack thereof, of published research. An especially high rate of false discoveries has been reported in some areas, motivating the creation of resource-intensive collaborations to estimate the replication rate of published research by repeating a large number of studies. The substantial resources required by these replication projects limit the number of studies that can be repeated and, consequently, the generalizability of the findings.

Methods and findings: In 2013, Jager and Leek developed a method to estimate the empirical false discovery rate from journal abstracts and applied it to five high-profile journals. Here, we use the relative efficiency of Jager and Leek's method to gather p-values from over 30,000 abstracts and to estimate the false discovery rate for 94 journals over a five-year span. We model the empirical false discovery rate by journal subject area (cancer or general medicine), impact factor, and Open Access status. We find that the empirical false discovery rate is higher for cancer than for general medicine journals (p = 5.14E-6). Within cancer journals, this relationship is further modified by journal impact factor: a lower journal impact factor is associated with a higher empirical false discovery rate (p = 0.012, 95% CI: -0.010, -0.001). We find no significant difference, on average, in the false discovery rate for Open Access vs. closed access journals (p = 0.256, 95% CI: -0.014, 0.051).

Conclusions: We find evidence of a higher false discovery rate in cancer journals compared to general medicine journals, especially those with a lower journal impact factor. For cancer journals, a one-point decrease in journal impact factor is associated with a 0.006 increase in the empirical false discovery rate, on average; for a false discovery rate of 0.05, this is an increase of more than 10%, to 0.056. Conversely, we find no significant evidence of a higher false discovery rate, on average, for Open Access vs. closed access journals from InCites. Our results identify areas of research that may need additional scrutiny and support to facilitate replicable science. Given our publicly available R code and data, others can complete a broad assessment of the empirical false discovery rate across other subject areas and characteristics of published research.
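The abstract-mining step the study scales up can be pictured with a toy, hypothetical regular expression for pulling reported p-values out of abstract text; Jager and Leek's actual pipeline handles rounding, truncation, and context far more carefully:

```python
import re

# Hypothetical pattern: matches forms like "p = 0.012", "P<0.05", "p = 5.14E-6".
P_VALUE = re.compile(r"\b[pP]\s*[=<]\s*(\d*\.?\d+(?:[eE]-?\d+)?)")

def extract_p_values(abstract):
    return [float(m) for m in P_VALUE.findall(abstract)]

extract_p_values("Higher for cancer journals (p = 5.14E-6) than OA status (p = 0.256).")
# -> [5.14e-06, 0.256]
```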


2021 ◽  
Vol 2 (2) ◽  
pp. p1
Author(s):  
Kirk Davis ◽  
Rodney Maiden

Although the limitations of null hypothesis significance testing (NHST) are well documented in the psychology literature, the accuracy paradox, which concisely states an important limitation of published research, is never mentioned. The accuracy paradox arises when a test with higher accuracy does a poorer job of correctly classifying a particular outcome than a test with lower accuracy, which suggests that accuracy is not always the best measure of a test's usefulness. Since accuracy is a function of the type I and type II error rates, it can be misleading to interpret a study's results as accurate simply because these errors are minimized. Once a decision has been made regarding statistical significance, type I and type II error rates are not directly informative to the reader. Instead, the false discovery and false omission rates are more informative when evaluating the results of a study. Given the prevalence of publication bias and small effect sizes in the literature, the possibility of a false discovery is especially important to consider. When false discovery rates are estimated, it is easy to understand why many studies in psychology cannot be replicated.
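A small numeric illustration of the paradox, with hypothetical counts chosen so that true effects are rare:

```python
# Hypothetical outcomes for 1000 significance tests in a field where true
# effects are rare: overall accuracy looks excellent, yet most discoveries
# are false.
TP, FP, FN, TN = 8, 12, 2, 978

accuracy = (TP + TN) / (TP + FP + FN + TN)  # 0.986
fdr = FP / (FP + TP)                        # false discovery rate: 0.60
false_omission = FN / (FN + TN)             # false omission rate: ~0.002
```

Despite 98.6% accuracy, 60% of the significant results here are false discoveries, which is exactly the information type I and type II error rates alone do not convey.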


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 441
Author(s):  
Megan H. Murray ◽  
Jeffrey D. Blume

False discovery rates (FDRs) are an essential component of statistical inference, representing the propensity for an observed result to be mistaken. FDR estimates should accompany observed results to help the user contextualize the relevance and potential impact of findings. This paper introduces a new user-friendly R package for estimating FDRs and computing adjusted p-values for FDR control. The roles of these two quantities are often confused in practice, and some software packages even report the adjusted p-values as the estimated FDRs. A key contribution of this package is that it distinguishes between these two quantities while also offering a broad array of refined algorithms for estimating them. For example, it includes newly augmented methods for estimating the null proportion of findings, an important part of the FDR estimation procedure. The package is broad, encompassing a variety of adjustment methods for FDR estimation and FDR control, and includes plotting functions for easy display of results. Through extensive illustrations, we strongly encourage wider reporting of false discovery rates for observed findings.
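The distinction the abstract draws can be sketched in a few lines of hedged, illustrative code (not the package's API): thresholding BH adjusted p-values controls the FDR, while an FDR estimate additionally scales them by an estimated null proportion:

```python
import numpy as np

def bh_adjusted(p):
    """Benjamini-Hochberg adjusted p-values (thresholding these at q
    controls the FDR at level q)."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    adj = np.minimum.accumulate(scaled[::-1])[::-1]  # running min from the right
    out = np.empty(m)
    out[order] = np.minimum(adj, 1.0)
    return out

def fdr_estimates(p, lam=0.5):
    """Point estimates of the FDR: BH adjusted p-values scaled by a
    Storey-type estimate of the null proportion pi0."""
    p = np.asarray(p, dtype=float)
    pi0 = min(1.0, np.mean(p > lam) / (1.0 - lam))
    return np.minimum(pi0 * bh_adjusted(p), 1.0)
```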



