False Discovery Rates: A New Deal

Mapping Intimacies ◽

10.1101/038216 ◽

2016 ◽

Cited By ~ 4

Author(s):

Matthew Stephens

Keyword(s):

Empirical Bayes ◽

Large Scale ◽

R Package ◽

False Discovery Rates ◽

Interval Estimates ◽

False Discovery ◽

Unobserved Effects ◽

The One ◽

Significance Measures ◽

Discovery Rates

AbstractWe introduce a new Empirical Bayes approach for large-scale hypothesis testing, including estimating False Discovery Rates (FDRs), and effect sizes. This approach has two key differences from existing approaches to FDR analysis. First, it assumes that the distribution of the actual (unobserved) effects is unimodal, with a mode at 0. This “unimodal assumption” (UA), although natural in many contexts, is not usually incorporated into standard FDR analysis, and we demonstrate how incorporating it brings many benefits. Specifically, the UA facilitates efficient and robust computation – estimating the unimodal distribution involves solving a simple convex optimization problem – and enables more accurate inferences provided that it holds. Second, the method takes as its input two numbers for each test (an effect size estimate, and corresponding standard error), rather than the one number usually used (p value, or z score). When available, using two numbers instead of one helps account for variation in measurement precision across tests. It also facilitates estimation of effects, and unlike standard FDR methods our approach provides interval estimates (credible regions) for each effect in addition to measures of significance. To provide a bridge between interval estimates and significance measures we introduce the term “local false sign rate” to refer to the probability of getting the sign of an effect wrong, and argue that it is a superior measure of significance than the local FDR because it is both more generally applicable, and can be more robustly estimated. Our methods are implemented in an R package ashr available from http://github.com/stephens999/ashr.

Download Full-text

Significance estimation for large scale untargeted metabolomics annotations

10.1101/109389 ◽

2017 ◽

Cited By ~ 4

Author(s):

Kerstin Scheubert ◽

Franziska Hufsky ◽

Daniel Petras ◽

Mingxun Wang ◽

Louis-Félix Nothias ◽

...

Keyword(s):

Small Molecules ◽

Empirical Bayes ◽

Large Scale ◽

Estimation Methods ◽

Scale Analysis ◽

Reference Library ◽

False Discovery Rates ◽

False Discovery ◽

Large Scale Analysis ◽

Discovery Rates

AbstractThe annotation of small molecules in untargeted mass spectrometry relies on the matching of fragment spectra to reference library spectra. While various spectrum-spectrum match scores exist, the field lacks statistical methods for estimating the false discovery rates (FDR) of these annotations. We present empirical Bayes and target-decoy based methods to estimate the false discovery rate. Relying on estimations of false discovery rates, we explore the effect of different spectrum-spectrum match criteria on the number and the nature of the molecules annotated. We show that the spectral matching settings needs to be adjusted for each project. By adjusting the scoring parameters and thresholds, the number of annotations rose, on average, by +139% (ranging from −92% up to +5705%) when compared to a default parameter set available at GNPS. The FDR estimation methods presented will enable a user to define the scoring criteria for large scale analysis of untargeted small molecule data that has been essential in the advancement of large scale proteomics, transcriptomics, and genomics science.

Download Full-text

Large and ancient linguistic areas

Language Dispersal, Diversification, and Contact ◽

10.1093/oso/9780198723813.003.0005 ◽

2020 ◽

pp. 78-100

Author(s):

Balthasar Bickel

Keyword(s):

Regression Models ◽

Large Scale ◽

Population History ◽

False Discovery Rates ◽

Ancient Population ◽

False Discovery ◽

Language Universals ◽

Discovery Rates ◽

Pacific Area

Large-scale areal patterns point to ancient population history and form a well-known confound for language universals. Despite their importance, demonstrating such patterns remains a challenge. This chapter argues that large-scale area hypotheses are better tested by modeling diachronic family biases than by controlling for genealogical relations in regression models. A case study of the Trans-Pacific area reveals that diachronic bias estimates do not depend much on the amount of phylogenetic information that is used when inferring them. After controlling for false discovery rates, about 39 variables in WALS and AUTOTYP show diachronic biases that differ significantly inside vs. outside the Trans-Pacific area. Nearly three times as many biases hold outside than inside the Trans-Pacific area, indicating that the Trans-Pacific area is not so much characterized by the spread of biases but rather by the retention of earlier diversity, in line with earlier suggestions in the literature.

Download Full-text

False discovery rates for large-scale model checking under certain dependence

Communication in Statistics- Theory and Methods ◽

10.1080/03610926.2017.1300279 ◽

2017 ◽

Vol 47 (1) ◽

pp. 64-79

Author(s):

Lu Deng ◽

Xuemin Zi ◽

Zhonghua Li

Keyword(s):

Model Checking ◽

Large Scale ◽

Scale Model ◽

False Discovery Rates ◽

False Discovery ◽

Large Scale Model ◽

Discovery Rates

Download Full-text

benchmarkR: an R package for benchmarking genome-scale methods

10.1101/018200 ◽

2015 ◽

Author(s):

Xiaobei Zhou ◽

Charity W Law ◽

Mark D Robinson

Keyword(s):

Receiver Operating Characteristic ◽

Statistical Methods ◽

Large Scale ◽

Operating Characteristic ◽

Scale Validation ◽

Roc Curves ◽

R Package ◽

False Discovery Rates ◽

False Discovery ◽

Receiver Operating

benchmarkR is an R package designed to assess and visualize the performance of statistical methods for datasets that have an independent truth (e.g., simulations or datasets with large-scale validation), in particular for methods that claim to control false discovery rates (FDR). We augment some of the standard performance plots (e.g., receiver operating characteristic, or ROC, curves) with information about how well the methods are calibrated (i.e., whether they achieve their expected FDR control). For example, performance plots are extended with a point to highlight the power or FDR at a user-set threshold (e.g., at a method's estimated 5% FDR). The package contains general containers to store simulation results (SimResults) and methods to create graphical summaries, such as receiver operating characteristic curves (rocX), false discovery plots (fdX) and power-to-achieved FDR plots (powerFDR); each plot is augmented with some form of calibration information. We find these plots to be an improved way to interpret relative performance of statistical methods for genomic datasets where many hypothesis tests are performed. The strategies, however, are general and will find applications in other domains.

Download Full-text

Quantifying, and correcting for, the impact of questionable research practices on false discovery rates in psychological science

10.31234/osf.io/fu9gy ◽

2020 ◽

Cited By ~ 1

Author(s):

Dwight Kravitz ◽

Stephen Mitroff

Keyword(s):

Large Scale ◽

Unintended Consequences ◽

Research Practices ◽

False Discovery Rates ◽

Questionable Research Practices ◽

Real Effects ◽

False Discovery ◽

Replication Crisis ◽

The Impact ◽

Discovery Rates

Large-scale replication failures have shaken confidence in the social sciences, psychology in particular. Most researchers acknowledge the problem, yet there is widespread debate about the causes and solutions. Using “big data,” the current project demonstrates that unintended consequences of three common questionable research practices (retaining pilot data, adding data after checking for significance, and not publishing null findings) can explain the lion’s share of the replication failures. A massive dataset was randomized to create a true null effect between two conditions, and then these three practices were applied. They produced false discovery rates far greater than 5% (the generally accepted rate), and were strong enough to obscure, or even reverse, the direction of real effects. These demonstrations suggest that much of the replication crisis might be explained by simple, misguided experimental choices. This approach also produces empirically-based corrections to account for these practices when they are unavoidable, providing a viable path forward.

Download Full-text

Fast and Accurate Protein False Discovery Rates on Large-Scale Proteomics Data Sets with Percolator 3.0

Journal of the American Society for Mass Spectrometry ◽

10.1007/s13361-016-1460-7 ◽

2016 ◽

Vol 27 (11) ◽

pp. 1719-1727 ◽

Cited By ~ 103

Author(s):

Matthew The ◽

Michael J. MacCoss ◽

William S. Noble ◽

Lukas Käll

Keyword(s):

Large Scale ◽

Data Sets ◽

Proteomics Data ◽

False Discovery Rates ◽

False Discovery ◽

Discovery Rates

Download Full-text

Dimension constraints improve hypothesis testing for large-scale, graph-associated, brain-image data

Biostatistics ◽

10.1093/biostatistics/kxab001 ◽

2021 ◽

Author(s):

Tien Vo ◽

Akshay Mishra ◽

Vamsi Ithapu ◽

Vikas Singh ◽

Michael A Newton

Keyword(s):

Empirical Bayes ◽

Large Scale ◽

Image Data ◽

Imaging Data ◽

False Discovery Rates ◽

Testing Procedures ◽

False Discovery ◽

Large Scale Testing ◽

Brain Changes ◽

Connected Subgraphs

Summary For large-scale testing with graph-associated data, we present an empirical Bayes mixture technique to score local false-discovery rates (FDRs). Compared to procedures that ignore the graph, the proposed Graph-based Mixture Model (GraphMM) method gains power in settings where non-null cases form connected subgraphs, and it does so by regularizing parameter contrasts between testing units. Simulations show that GraphMM controls the FDR in a variety of settings, though it may lose control with excessive regularization. On magnetic resonance imaging data from a study of brain changes associated with the onset of Alzheimer’s disease, GraphMM produces greater yield than conventional large-scale testing procedures.

Download Full-text