Large and ancient linguistic areas

Author(s):  
Balthasar Bickel

Large-scale areal patterns point to ancient population history and form a well-known confound for language universals. Despite their importance, demonstrating such patterns remains a challenge. This chapter argues that large-scale area hypotheses are better tested by modeling diachronic family biases than by controlling for genealogical relations in regression models. A case study of the Trans-Pacific area reveals that diachronic bias estimates depend little on the amount of phylogenetic information used to infer them. After controlling for false discovery rates, about 39 variables in WALS and AUTOTYP show diachronic biases that differ significantly inside vs. outside the Trans-Pacific area. Nearly three times as many biases hold outside the Trans-Pacific area as inside it, indicating that the area is characterized not so much by the spread of biases as by the retention of earlier diversity, in line with earlier suggestions in the literature.
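The abstract does not spell out the false-discovery-rate step; as a minimal sketch, the following Python snippet applies the standard Benjamini-Hochberg procedure to one p-value per typological variable. The procedure is the generic textbook one, and the p-values are invented placeholders rather than values from the study.

```python
# Minimal sketch: Benjamini-Hochberg control of the false discovery rate over
# many per-variable tests. The p-values below are invented placeholders.
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask marking which p-values are rejected at FDR level alpha."""
    pvals = np.asarray(pvals, dtype=float)
    m = len(pvals)
    order = np.argsort(pvals)
    thresholds = alpha * np.arange(1, m + 1) / m   # rank i gets threshold i/m * alpha
    passed = pvals[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()            # largest rank meeting its threshold
        reject[order[:k + 1]] = True               # reject that test and all smaller p-values
    return reject

# One hypothetical p-value per variable, e.g. inside-vs-outside bias tests.
pvals = [0.001, 0.20, 0.012, 0.03, 0.47, 0.004]
print(benjamini_hochberg(pvals))
```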

2020
Author(s):
Dwight Kravitz
Stephen Mitroff

Large-scale replication failures have shaken confidence in the social sciences, psychology in particular. Most researchers acknowledge the problem, yet there is widespread debate about the causes and solutions. Using “big data,” the current project demonstrates that unintended consequences of three common questionable research practices (retaining pilot data, adding data after checking for significance, and not publishing null findings) can explain the lion’s share of the replication failures. A massive dataset was randomized to create a true null effect between two conditions, and then these three practices were applied. They produced false discovery rates far greater than 5% (the generally accepted rate), and were strong enough to obscure, or even reverse, the direction of real effects. These demonstrations suggest that much of the replication crisis might be explained by simple, misguided experimental choices. This approach also produces empirically based corrections to account for these practices when they are unavoidable, providing a viable path forward.
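As an illustration of one of the practices named above (adding data after checking for significance), here is a minimal Python sketch, not the authors' "big data" pipeline: it simulates studies with a true null effect between two conditions, retests after each batch of added data, and reports how far the false positive rate climbs above the nominal 5%. The sample sizes, number of looks, and number of simulated studies are arbitrary choices for illustration.

```python
# Minimal sketch (not the authors' pipeline): simulate how adding data after
# checking for significance inflates the false positive rate above the nominal
# 5% even though the null hypothesis is true.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def optional_stopping_study(n_initial=20, n_added=10, max_looks=5, alpha=0.05):
    """One simulated study: test, and if not significant, add data and test again."""
    a = list(rng.normal(size=n_initial))   # condition A, true effect = 0
    b = list(rng.normal(size=n_initial))   # condition B, true effect = 0
    for _ in range(max_looks):
        if stats.ttest_ind(a, b).pvalue < alpha:
            return True                    # declared "significant" despite a true null
        a.extend(rng.normal(size=n_added))
        b.extend(rng.normal(size=n_added))
    return False

n_studies = 2000
false_positives = sum(optional_stopping_study() for _ in range(n_studies))
print(f"false positive rate: {false_positives / n_studies:.3f} (nominal alpha = 0.05)")
```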


2016
Author(s):  
Matthew Stephens

We introduce a new empirical Bayes approach for large-scale hypothesis testing, including the estimation of false discovery rates (FDRs) and effect sizes. This approach has two key differences from existing approaches to FDR analysis. First, it assumes that the distribution of the actual (unobserved) effects is unimodal, with a mode at 0. This “unimodal assumption” (UA), although natural in many contexts, is not usually incorporated into standard FDR analysis, and we demonstrate how incorporating it brings many benefits. Specifically, the UA facilitates efficient and robust computation (estimating the unimodal distribution reduces to a simple convex optimization problem) and enables more accurate inferences provided that it holds. Second, the method takes as its input two numbers for each test (an effect size estimate and its corresponding standard error), rather than the single number usually used (a p value or z score). When available, using two numbers instead of one helps account for variation in measurement precision across tests. It also facilitates estimation of effects, and unlike standard FDR methods, our approach provides interval estimates (credible regions) for each effect in addition to measures of significance. To provide a bridge between interval estimates and significance measures, we introduce the term “local false sign rate” for the probability of getting the sign of an effect wrong, and argue that it is a better measure of significance than the local FDR because it is more generally applicable and can be estimated more robustly. Our methods are implemented in an R package, ashr, available from http://github.com/stephens999/ashr.
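As a purely illustrative sketch (not the ashr package itself), the following Python snippet computes a local false sign rate for a single effect under an assumed normal posterior, i.e. the posterior probability that the reported sign of the effect is wrong. In ashr the posterior is a mixture that may include a point mass at zero; this simplified version ignores that component.

```python
# Purely illustrative sketch (not the ashr package): the local false sign rate
# for one effect with an assumed normal posterior. A small value means the sign
# of the effect is estimated with high confidence.
from scipy.stats import norm

def local_false_sign_rate(post_mean, post_sd):
    """lfsr = min(P(effect <= 0), P(effect >= 0)) under a Normal(post_mean, post_sd) posterior."""
    p_nonpositive = norm.cdf(0.0, loc=post_mean, scale=post_sd)
    return min(p_nonpositive, 1.0 - p_nonpositive)

# Hypothetical posterior for one effect: mean 0.8, standard deviation 0.5.
print(round(local_false_sign_rate(0.8, 0.5), 3))
```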


2017
Author(s):
Kerstin Scheubert
Franziska Hufsky
Daniel Petras
Mingxun Wang
Louis-Félix Nothias
...  

The annotation of small molecules in untargeted mass spectrometry relies on matching fragment spectra to reference library spectra. While various spectrum-spectrum match scores exist, the field lacks statistical methods for estimating the false discovery rate (FDR) of these annotations. We present empirical Bayes and target-decoy based methods to estimate the false discovery rate. Using these FDR estimates, we explore the effect of different spectrum-spectrum match criteria on the number and the nature of the molecules annotated. We show that spectral matching settings need to be adjusted for each project. By adjusting the scoring parameters and thresholds, the number of annotations rose, on average, by 139% (ranging from −92% to +5705%) when compared to a default parameter set available at GNPS. The FDR estimation methods presented here will enable users to define scoring criteria for large-scale analysis of untargeted small-molecule data, a capability that has been essential to the advancement of large-scale proteomics, transcriptomics, and genomics.
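The abstract does not detail the estimators; as a minimal sketch of the target-decoy idea (the empirical Bayes variant is not shown), the following Python snippet estimates the FDR among spectrum-spectrum matches above a score threshold as the ratio of decoy hits to target hits passing that threshold. The scores are hypothetical placeholders, and this is not the GNPS implementation.

```python
# Minimal sketch (not the GNPS implementation): target-decoy FDR estimation for
# spectrum-spectrum matches. Above a score threshold,
#     estimated FDR = (number of decoy hits) / (number of target hits).
# All scores below are hypothetical placeholders.

def target_decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate the FDR among matches whose score is at least `threshold`."""
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    return n_decoy / n_target if n_target else 0.0

target_scores = [0.92, 0.88, 0.75, 0.71, 0.64, 0.55, 0.41]   # matches against library spectra
decoy_scores = [0.58, 0.44, 0.39, 0.31, 0.22, 0.18, 0.10]    # matches against decoy spectra
for threshold in (0.4, 0.6, 0.8):
    print(f"threshold {threshold:.1f}: estimated FDR = "
          f"{target_decoy_fdr(target_scores, decoy_scores, threshold):.2f}")
```

Raising the threshold lowers the estimated FDR at the cost of fewer annotations, which is the trade-off behind tuning the match criteria for each project.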


2015
Vol 15 (3)
pp. 989-1006
Author(s):
Gene Hart-Smith
Daniel Yagoub
Aidan P. Tay
Russell Pickford
Marc R. Wilkins

Biometrika
2011
Vol 98 (2)
pp. 251-271
Author(s):
Bradley Efron
Nancy R. Zhang
