Threshold selection in feature screening for error rate control

Author(s):  
Xu Guo ◽  
Haojie Ren ◽  
Changliang Zou ◽  
Runze Li
2020 ◽  
Vol 17 (6) ◽  
pp. 2062-2073 ◽  
Author(s):  
Zengyou He ◽  
Can Zhao ◽  
Hao Liang ◽  
Bo Xu ◽  
Quan Zou

2017 ◽  
Vol 13 (1) ◽  
Author(s):  
Inna Gerlovina ◽  
Mark J. van der Laan ◽  
Alan Hubbard

Abstract: Multiple comparisons and small sample size, common characteristics of many types of “Big Data”, including those that are produced by genomic studies, present specific challenges that affect the reliability of inference. Use of multiple testing procedures necessitates calculation of very small tail probabilities of a test statistic distribution. Results based on large deviation theory provide a formal condition that is necessary to guarantee error rate control given practical sample sizes, linking the number of tests and the sample size; this condition, however, is rarely satisfied. Using methods that are based on Edgeworth expansions (relying especially on the work of Peter Hall), we explore the impact of departures of sampling distributions from typical assumptions on actual error rates. Our investigation illustrates how far the actual error rates can be from the declared nominal levels, suggesting potentially widespread problems with error rate control, specifically excessive false positives. This is an important factor that contributes to the “reproducibility crisis”. We also review some other commonly used methods (such as permutation and methods based on finite sampling inequalities) in their application to multiple testing/small sample data. We point out that Edgeworth expansions, providing higher order approximations to the sampling distribution, offer a promising direction for data analysis that could improve the reliability of studies relying on large numbers of comparisons with modest sample sizes.
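
To make concrete how small the tail probabilities mentioned in the abstract become, here is a minimal sketch that computes a Bonferroni per-test level and the matching two-sided cutoff. The standard normal reference distribution, the function name `bonferroni_cutoff`, and the figure of 20,000 tests are illustrative assumptions, not taken from the paper; the abstract's point is that such approximations can be unreliable this far into the tail when samples are small.

```python
from statistics import NormalDist


def bonferroni_cutoff(num_tests, alpha=0.05):
    """Return the per-test level and two-sided z cutoff under Bonferroni.

    Assumes (for illustration only) that each test statistic is
    standard normal under its null hypothesis.
    """
    per_test = alpha / num_tests
    # Two-sided test: half of the per-test level goes in each tail.
    z = NormalDist().inv_cdf(1.0 - per_test / 2.0)
    return per_test, z


# Hypothetical study with 20,000 simultaneous tests.
per_test, z = bonferroni_cutoff(20_000)
print(f"per-test level = {per_test:.2e}, |z| cutoff = {z:.2f}")
```

With thousands of tests the per-test level falls to the order of one in a million, which is the regime where the accuracy of the assumed sampling distribution matters most.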


2021 ◽  
pp. 096228022098338
Author(s):  
Jinjin Tian ◽  
Aaditya Ramdas

Biological research often involves testing a growing number of null hypotheses as new data are accumulated over time. We study the problem of online control of the familywise error rate, that is, testing an a priori unbounded sequence of hypotheses (p-values) one by one over time without knowing the future, such that with high probability there are no false discoveries in the entire sequence. This paper unifies algorithmic concepts developed for offline (single batch) familywise error rate control and online false discovery rate control to develop novel online familywise error rate control methods. Though many offline familywise error rate methods (e.g., Bonferroni, fallback procedures and Sidak’s method) can trivially be extended to the online setting, our main contribution is the design of new, powerful, adaptive online algorithms that control the familywise error rate when the p-values are independent or locally dependent in time. Our numerical experiments demonstrate substantial gains in power, which are also formally proved in an idealized Gaussian sequence model. A promising application to the International Mouse Phenotyping Consortium is described.
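
As an illustration of the simple baseline the abstract says extends trivially to the online setting, the sketch below implements an online Bonferroni rule: the i-th hypothesis is tested at level alpha * gamma_i, where the gamma_i sum to one, so a union bound keeps the familywise error rate at or below alpha. The particular spending sequence gamma_i = 6 / (pi^2 i^2), the function name, and the example p-values are assumptions made for illustration; this is not the adaptive method the paper proposes.

```python
import math


def online_bonferroni(p_values, alpha=0.05):
    """Yield (index, test level, reject?) for each p-value as it arrives.

    Uses the spending sequence gamma_i = 6 / (pi^2 * i^2), which sums
    to 1 over i = 1, 2, ..., so the test levels sum to at most alpha.
    Each decision depends only on the current p-value and its index.
    """
    for i, p in enumerate(p_values, start=1):
        level = alpha * 6.0 / (math.pi ** 2 * i ** 2)
        yield i, level, p <= level


# Hypothetical stream of p-values, processed one at a time.
stream = [1e-4, 0.02, 1e-3, 0.5]
decisions = list(online_bonferroni(stream))
for i, level, reject in decisions:
    print(f"H{i}: level = {level:.5f}, reject = {reject}")
```

The test levels shrink quickly with the index, which is why this baseline loses power over long sequences and why more adaptive schemes are of interest.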


2016 ◽  
Vol 10 (1) ◽  
pp. 960-975 ◽  
Author(s):  
Lucas Janson ◽  
Weijie Su
