Smoothed Nested Testing on Directed Acyclic Graphs

Biometrika ◽  
2021 ◽  
Author(s):  
J H Loper ◽  
L Lei ◽  
W Fithian ◽  
W Tansey

Summary We consider the problem of multiple hypothesis testing when there is a logical nested structure to the hypotheses. When one hypothesis is nested inside another, the outer hypothesis must be false if the inner hypothesis is false. We model the nested structure as a directed acyclic graph, including chain and tree graphs as special cases. Each node in the graph is a hypothesis, and rejecting a node requires also rejecting all of its ancestors. We propose a general framework for adjusting node-level test statistics using the known logical constraints. Within this framework, we study a smoothing procedure that combines each node with all of its descendants to form a more powerful statistic. We prove that a broad class of smoothing strategies can be used with existing selection procedures to control the familywise error rate, false discovery exceedance rate, or false discovery rate, so long as the original test statistics are independent under the null. When the null statistics are not independent but are derived from positively correlated normal observations, we prove control for all three error rates when the smoothing method is arithmetic averaging of the observations. Simulations and an application to a real biology dataset demonstrate that smoothing leads to substantial power gains.
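The averaging step lends itself to a compact illustration. Below is a minimal Python sketch, not the authors' code, of the arithmetic-averaging smoother on a toy DAG: each node's statistic becomes the mean of its own observation and those of all its descendants, so a strong signal deep in the graph strengthens its ancestors. The graph, z-scores, and all names are hypothetical, and the downstream selection step that enforces the ancestor constraint is omitted.

```python
import numpy as np

# Hypothetical DAG: node -> list of child nodes (children nested inside parents).
dag = {"root": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
z = {"root": 0.4, "a": 1.9, "b": 2.3, "c": 2.8}  # toy z-scores, one per hypothesis

def descendants(node, graph):
    """All nodes reachable from `node`, excluding `node` itself."""
    seen = set()
    stack = list(graph[node])
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(graph[v])
    return seen

# Smoothed statistic: arithmetic mean of the node and every descendant.
smoothed = {
    v: np.mean([z[v]] + [z[u] for u in descendants(v, dag)]) for v in dag
}
print(smoothed)  # the deep signal at "c" now also lifts its ancestors
```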

Biometrika ◽  
2020 ◽  
Vol 107 (3) ◽  
pp. 761-768 ◽  
Author(s):  
E Dobriban

Summary Multiple hypothesis testing problems arise naturally in science. This note introduces a new fast closed testing method for multiple testing that controls the familywise error rate. Controlling the familywise error rate is state-of-the-art in many important application areas and is preferred over false discovery rate control for many reasons, including that it leads to stronger reproducibility. The closure principle rejects an individual hypothesis if all global nulls of subsets containing it are rejected using some test statistics, which takes exponential time in the worst case. When the tests are symmetric and monotone, the proposed method is an exact algorithm for computing the closure, with running time quadratic in the number of tests and linear in the number of discoveries. Our framework generalizes most examples of closed testing, such as Holm's method and the Bonferroni method. As a special case of the method, we propose the Simes and higher criticism fusion test, which is powerful both for detecting a few strong signals and for detecting many moderate signals.
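For intuition, here is a brute-force reference implementation of the closure principle, exponential in the number of tests rather than quadratic as in the note's shortcut, using a Bonferroni local test; with this local test, closed testing reduces to Holm's method. All names are illustrative.

```python
from itertools import combinations

def bonferroni_local_test(pvals_subset, alpha):
    """Reject the global null of a subset if min p <= alpha / |subset|."""
    return min(pvals_subset) <= alpha / len(pvals_subset)

def closed_testing(pvals, alpha=0.05):
    m = len(pvals)
    rejected = []
    for i in range(m):
        # H_i is rejected iff every subset containing i is rejected locally.
        ok = all(
            bonferroni_local_test([pvals[j] for j in S], alpha)
            for k in range(1, m + 1)
            for S in combinations(range(m), k)
            if i in S
        )
        if ok:
            rejected.append(i)
    return rejected

print(closed_testing([0.001, 0.02, 0.8]))  # [0, 1], matching Holm at alpha = 0.05
```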


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e6035 ◽  
Author(s):  
Simina M. Boca ◽  
Jeffrey T. Leek

Modern scientific studies from many diverse areas of research abound with multiple hypothesis testing concerns. The false discovery rate (FDR) is one of the most commonly used approaches for measuring and controlling error rates when performing multiple tests. Adaptive FDRs rely on an estimate of the proportion of null hypotheses among all the hypotheses being tested. This proportion is typically estimated once for each collection of hypotheses. Here, we propose a regression framework to estimate the proportion of null hypotheses conditional on observed covariates. This may then be used as a multiplication factor with the Benjamini–Hochberg adjusted p-values, leading to a plug-in FDR estimator. We apply our method to a genome-wide association meta-analysis for body mass index. In our framework, we are able to use the sample sizes for the individual genomic loci and the minor allele frequencies as covariates. We further evaluate our approach via a number of simulation scenarios. We provide an implementation of this novel method for estimating the proportion of null hypotheses in a regression framework as part of the Bioconductor package swfdr.
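The construction can be sketched in a few lines. The published implementation is the R/Bioconductor package swfdr; the Python version below is only an illustration on simulated data. It uses the fact that, for a threshold lambda, E[1(p > lambda) | x] is approximately (1 - lambda) * pi0(x), so regressing that indicator on the covariates and rescaling by 1/(1 - lambda) estimates pi0(x), which then multiplies the BH-adjusted p-values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
m = 5000
x = rng.uniform(size=(m, 1))                         # one covariate, e.g. scaled sample size
is_null = rng.uniform(size=m) < 0.6 + 0.3 * x[:, 0]  # null proportion increases with x
p = np.where(is_null, rng.uniform(size=m),
             rng.beta(0.2, 3.0, size=m))             # alternatives give small p-values

lam = 0.8
pi0_hat = LinearRegression().fit(x, p > lam).predict(x) / (1 - lam)
pi0_hat = np.clip(pi0_hat, 0, 1)                     # covariate-specific null proportion

# Plug-in FDR: scale BH-adjusted p-values by the estimated pi0(x).
bh = multipletests(p, method="fdr_bh")[1]
fdr_hat = np.minimum(pi0_hat * bh, 1)
print((fdr_hat <= 0.05).sum(), "discoveries at estimated FDR 5%")
```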


Biometrika ◽  
2019 ◽  
Vol 106 (4) ◽  
pp. 841-856 ◽  
Author(s):  
Jelle J Goeman ◽  
Rosa J Meijer ◽  
Thijmen J P Krebs ◽  
Aldo Solari

Summary Closed testing procedures are classically used for familywise error rate control, but they can also be used to obtain simultaneous confidence bounds for the false discovery proportion in all subsets of the hypotheses, allowing for inference robust to post hoc selection of subsets. In this paper we investigate the special case of closed testing with Simes local tests. We construct a novel fast and exact shortcut and use it to investigate the power of this approach when the number of hypotheses goes to infinity. We show that if a minimal level of signal is present, the average power to detect false hypotheses at any desired false discovery proportion does not vanish. Additionally, we show that the confidence bounds for false discovery proportion are consistent estimators for the true false discovery proportion for every nonvanishing subset. We also show close connections between Simes-based closed testing and the procedure of Benjamini and Hochberg.
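To make the construction concrete, the sketch below brute-forces Simes-based closed testing to bound the number of true nulls in an arbitrary selected set; it is exponential in the number of hypotheses, whereas the paper's contribution is a fast exact shortcut. All names and the toy p-values are illustrative.

```python
from itertools import combinations

def simes(pvals, alpha):
    """Simes global test: reject if p_(i) <= i * alpha / n for some i."""
    s = sorted(pvals)
    n = len(s)
    return any(s[i] <= (i + 1) * alpha / n for i in range(n))

def closed_reject(pvals, T, alpha):
    """Closed testing rejects H_T iff Simes rejects every superset of T."""
    rest = [j for j in range(len(pvals)) if j not in T]
    return all(
        simes([pvals[j] for j in T] + [pvals[j] for j in U], alpha)
        for r in range(len(rest) + 1)
        for U in combinations(rest, r)
    )

def true_null_bound(pvals, S, alpha=0.05):
    """(1 - alpha)-confidence upper bound on the number of true nulls in S:
    the size of the largest subset of S not rejected by closed testing."""
    for k in range(len(S), 0, -1):
        if any(not closed_reject(pvals, set(T), alpha)
               for T in combinations(S, k)):
            return k
    return 0

p = [0.001, 0.004, 0.03, 0.5]
S = [0, 1, 2]
b = true_null_bound(p, S)
print(f"at most {b} of the {len(S)} selected hypotheses are true nulls")
```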


2015 ◽  
Author(s):  
Simina M. Boca ◽  
Jeffrey T. Leek

Abstract Modern scientific studies from many diverse areas of research abound with multiple hypothesis testing concerns. The false discovery rate is one of the most commonly used error rates for measuring and controlling rates of false discoveries when performing multiple tests. Adaptive false discovery rates rely on an estimate of the proportion of null hypotheses among all the hypotheses being tested. This proportion is typically estimated once for each collection of hypotheses. Here we propose a regression framework to estimate the proportion of null hypotheses conditional on observed covariates. This may then be used as a multiplication factor with the Benjamini-Hochberg adjusted p-values, leading to a plug-in false discovery rate estimator. Our case study concerns a genome-wide association meta-analysis which considers associations with body mass index. In our framework, we are able to use the sample sizes for the individual genomic loci and the minor allele frequencies as covariates. We further evaluate our approach via a number of simulation scenarios.


2021 ◽  
Vol 11 (2) ◽  
Author(s):  
Cynthia Dwork ◽  
Weijie Su ◽  
Li Zhang

Differential privacy provides a rigorous framework for privacy-preserving data analysis. This paper proposes the first differentially private procedure for controlling the false discovery rate (FDR) in multiple hypothesis testing. Inspired by the Benjamini-Hochberg procedure (BHq), our approach at each iteration adds noise to the logarithms of the p-values to ensure differential privacy and selects an approximately smallest p-value as a promising candidate; the selected p-values are then supplied to the BHq, and our private procedure releases only the rejected ones. Moreover, we develop a new technique based on a backward submartingale for proving FDR control of a broad class of multiple testing procedures, including our private procedure and both the BHq step-up and step-down procedures. As a novel aspect, the proof works for arbitrary dependence between the true null and false null test statistics, while FDR control is maintained up to a small multiplicative factor.
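A heavily simplified sketch of the peeling step is given below: at each round, Laplace noise is added to the log p-values and the approximately smallest one is selected; the selected p-values are then passed to a standard BH step-up rule. Calibrating the noise scale to the privacy budget is the substance of the paper and is omitted here, so every constant below is a placeholder rather than a privacy guarantee.

```python
import numpy as np

def private_candidates(pvals, k, b, rng):
    """Select k candidate indices by repeated noisy argmin on log p-values."""
    logp = np.log(pvals)
    chosen = []
    for _ in range(k):
        noisy = logp + rng.laplace(scale=b, size=logp.size)
        noisy[chosen] = np.inf          # peel off already-selected indices
        chosen.append(int(np.argmin(noisy)))
    return chosen

def bh_reject(pvals, q):
    """Standard BH step-up; returns the number of rejections."""
    s = np.sort(pvals)
    m = len(s)
    passed = np.nonzero(s <= q * np.arange(1, m + 1) / m)[0]
    return 0 if passed.size == 0 else passed[-1] + 1

rng = np.random.default_rng(1)
p = np.concatenate([rng.beta(0.1, 5, 50), rng.uniform(size=450)])  # toy p-values
idx = private_candidates(p, k=60, b=0.5, rng=rng)  # b is an uncalibrated placeholder
print(bh_reject(p[idx], q=0.1), "rejections released")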


2019 ◽  
Vol 21 (Supplement_3) ◽  
pp. iii71-iii71 ◽  
Author(s):  
T Kaisman-Elbaz ◽  
Y Elbaz ◽  
V Merkin ◽  
L Dym ◽  
A Noy ◽  
...  

Abstract BACKGROUND Glioblastoma is known for its dismal prognosis, yet the dependence of that prognosis on readily available red blood cell (RBC) parameters defining the patient's anemic status, such as hemoglobin level and red blood cell distribution width (RDW), is not fully established. Several studies have demonstrated a connection between low hemoglobin levels or high RDW values and overall survival of glioblastoma patients, but others found no clear connection. This study addresses that uncertainty. MATERIAL AND METHODS In this work, 170 glioblastoma patients diagnosed and treated at Soroka University Medical Center (SUMC) over the last 12 years were retrospectively inspected for the dependence of their survival on pre-operative RBC parameters, using multivariate analysis followed by a false discovery rate procedure to account for multiple hypothesis testing. A survival stratification tree and Kaplan-Meier survival curves indicating the patients' prognosis according to these parameters were prepared. RESULTS Besides KPS>70 and tumor resection supplemented by oncological treatment, age<70 (HR=0.4, 95% CI 0.24–0.65), low hemoglobin level (HR=1.79, 95% CI 1.06–2.99) and RDW<14% (HR=0.57, 95% CI 0.37–0.88) were found to be prognostic for patients' overall survival in multivariate analysis, at a false discovery rate below 5%. CONCLUSION The survival stratification highlighted a non-anemic subgroup of nearly 30% of the cohort's patients whose median overall survival was 21.1 months (95% CI 16.2–27.2), higher than the median overall survival of about 15 months under the Stupp protocol. A discussion of the beneficial or detrimental effects of RBC parameters on glioblastoma prognosis, and of their possible causes, is given.
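For readers who want to reproduce this kind of stratified comparison, here is a generic illustration, not the study's code or data, using the lifelines package: a Kaplan-Meier fit per RDW stratum at the 14% cutoff mentioned above, with a log-rank comparison. The cohort is synthetic and all column names are hypothetical.

```python
import numpy as np
import pandas as pd
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

# Hypothetical cohort: overall survival in months, death indicator, RDW (%).
rng = np.random.default_rng(0)
n = 170
df = pd.DataFrame({
    "rdw_percent": rng.normal(14.5, 1.5, n),
    "death_observed": rng.integers(0, 2, n),
    "os_months": rng.exponential(18, n),
})
low_rdw = df["rdw_percent"] < 14  # the RDW < 14% stratum from the abstract

kmf = KaplanMeierFitter()
for label, mask in [("RDW < 14%", low_rdw), ("RDW >= 14%", ~low_rdw)]:
    kmf.fit(df.loc[mask, "os_months"], df.loc[mask, "death_observed"],
            label=label)
    print(label, "median OS (months):", round(kmf.median_survival_time_, 1))

res = logrank_test(df.loc[low_rdw, "os_months"], df.loc[~low_rdw, "os_months"],
                   df.loc[low_rdw, "death_observed"],
                   df.loc[~low_rdw, "death_observed"])
print("log-rank p-value:", res.p_value)
```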


Genetics ◽  
2002 ◽  
Vol 161 (2) ◽  
pp. 905-914 ◽  
Author(s):  
Hakkyo Lee ◽  
Jack C M Dekkers ◽  
M Soller ◽  
Massoud Malek ◽  
Rohan L Fernando ◽  
...  

Abstract Controlling the false discovery rate (FDR) has been proposed as an alternative to controlling the genomewise error rate (GWER) for detecting quantitative trait loci (QTL) in genome scans. The objective here was to implement FDR in the context of regression interval mapping for multiple traits. Data on five traits from an F2 swine breed cross were used. FDR was implemented using tests at every 1 cM (FDR1) and using tests with the highest test statistic for each marker interval (FDRm). For the latter, a method was developed to predict comparison-wise error rates. At low error rates, FDR1 behaved erratically; FDRm was more stable but gave similar significance thresholds and number of QTL detected. At the same error rate, methods to control FDR gave less stringent significance thresholds and more QTL detected than methods to control GWER. Although testing across traits had limited impact on FDR, single-trait testing was recommended because there is no theoretical reason to pool tests across traits for FDR. FDR based on FDRm was recommended for QTL detection in interval mapping because it provides significance tests that are meaningful, yet not overly stringent, such that a more complete picture of QTL is revealed.
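The two implementations can be contrasted on toy data. The sketch below applies the BH procedure to p-values at every position (FDR1) and, separately, to one p-value per marker interval taken at the position with the strongest statistic (FDRm); the paper's correction for the within-interval maximum is omitted, so this is only illustrative.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(2)
n_intervals, positions_per_interval = 100, 10   # toy genome scan at 1-cM steps
p = rng.uniform(size=(n_intervals, positions_per_interval))
p[:5] *= 0.001                                  # plant a few strong QTL signals

# FDR1: BH over every tested position; FDRm: BH over one p-value per interval.
fdr1 = multipletests(p.ravel(), method="fdr_bh")[0].sum()
fdrm = multipletests(p.min(axis=1), method="fdr_bh")[0].sum()
print("FDR1 rejections:", fdr1, "| FDRm rejections:", fdrm)
```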


2000 ◽  
Vol 25 (1) ◽  
pp. 60-83 ◽  
Author(s):  
Yoav Benjamini ◽  
Yosef Hochberg

A new approach to problems of multiple significance testing was presented in Benjamini and Hochberg (1995), which calls for controlling the expected ratio of the number of erroneous rejections to the number of rejections, the false discovery rate (FDR). The procedure given there was shown to control the FDR for independent test statistics. When some of the hypotheses are in fact false, that procedure is too conservative. We present here an adaptive procedure, where the number of true null hypotheses is estimated first as in Hochberg and Benjamini (1990), and this estimate is used in the procedure of Benjamini and Hochberg (1995). The result is still a simple stepwise procedure, to which we also give a graphical companion. The new procedure is used in several examples drawn from educational and behavioral studies, addressing problems in multi-center studies, subset analysis and meta-analysis. The examples vary in the number of hypotheses tested and in the implications of the new procedure for the conclusions. In a large simulation study of independent test statistics, the adaptive procedure is shown to control the FDR and to have substantially better power than the previously suggested FDR controlling method, which is itself more powerful than the traditional familywise error rate controlling methods. In cases where most of the tested hypotheses are far from being true, there is hardly any penalty due to the simultaneous testing of many hypotheses.
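The two-stage idea is easy to state in code. The sketch below substitutes a simple tail-count estimate of the number of true nulls for the Hochberg and Benjamini (1990) estimator the paper actually uses, then runs the 1995 step-up rule with the threshold inflated by m/m0; treat it as an illustration of the adaptive principle, not the paper's exact procedure.

```python
import numpy as np

def adaptive_bh(pvals, q=0.05, lam=0.5):
    p = np.sort(np.asarray(pvals))
    m = len(p)
    # Stand-in m0 estimate: p-values above lam are mostly nulls, so the tail
    # count scaled by 1 / (1 - lam) estimates the number of true nulls.
    m0_hat = min(m, (p > lam).sum() / (1 - lam) + 1)
    passed = np.nonzero(p <= q * np.arange(1, m + 1) / m0_hat)[0]
    return 0 if passed.size == 0 else passed[-1] + 1   # number of rejections

rng = np.random.default_rng(3)
p = np.concatenate([rng.beta(0.1, 8, 200), rng.uniform(size=300)])
print("adaptive BH rejections:", adaptive_bh(p))
```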


2021 ◽  
Author(s):  
András Lánczky ◽  
Balázs Győrffy

Survival analysis is a cornerstone of medical research, enabling the assessment of clinical outcomes for disease progression and treatment efficacy. Despite its central importance, commonly used spreadsheet software cannot handle it, and no web server exists for its computation. Here we introduce a web-based tool capable of performing uni- and multivariate Cox proportional hazards survival analysis using data generated by genomic, transcriptomic, proteomic, or metabolomic studies. We implemented different methods to establish cutoff values for the trichotomization or dichotomization of continuous data. The false discovery rate is computed to correct for multiple hypothesis testing. A multivariate analysis option enables comparing omics data with clinical variables. The registration-free web service is available at https://kmplot.com/custom_data. The tool fills a gap and will be an invaluable help for basic medical and clinical research.
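As an illustration of the workflow such a tool automates (this is not the kmplot.com implementation), the sketch below dichotomizes each hypothetical gene at its median, fits a Cox proportional hazards model per gene with the lifelines package, and applies BH correction across genes with statsmodels.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from statsmodels.stats.multitest import multipletests

# Hypothetical inputs: follow-up time, event indicator, and a small expression
# matrix; in the real tool these come from the user's uploaded omics data.
rng = np.random.default_rng(0)
n, genes = 200, [f"gene_{i}" for i in range(20)]
data = pd.DataFrame(rng.normal(size=(n, len(genes))), columns=genes)
data["time"] = rng.exponential(24, n)
data["event"] = rng.integers(0, 2, n)

pvals = []
for g in genes:
    df = data[["time", "event"]].copy()
    df["high"] = (data[g] > data[g].median()).astype(int)  # median cutoff
    cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
    pvals.append(cph.summary.loc["high", "p"])

qvals = multipletests(pvals, method="fdr_bh")[1]  # BH correction across genes
print(sum(q < 0.05 for q in qvals), "genes significant at FDR 5%")
```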

