Visualizing the Costs and Benefits of Correcting P-Values for Multiple Hypothesis Testing in Omics Data

2021
Author(s): Steven R. Shuken, Margaret W. McNerney

Abstract: The multiple hypothesis testing problem is inherent in high-throughput quantitative genomic, transcriptomic, proteomic, and other “omic” screens. The correction of p-values for multiple testing is a critical element of quantitative omic data analysis, yet many researchers are unfamiliar with the sensitivity costs and false discovery rate (FDR) benefits of p-value correction. We developed models of quantitative omic experiments, modeled the costs and benefits of p-value correction, and visualized the results with color-coded volcano plots. We developed an R Shiny web application for further exploration of these models, which we call the Simulator of P-value Multiple Hypothesis Correction (SIMPLYCORRECT). We modeled experiments in which no analytes were truly differential between the control and test groups (all null hypotheses true), all analytes were differential, or a mixture of differential and non-differential analytes was present. We corrected p-values using the Benjamini-Hochberg (BH), Bonferroni, and permutation FDR methods and compared the costs and benefits of each. By manipulating variables in the models, we demonstrated that increasing sample size or decreasing variability can reduce or eliminate the sensitivity cost of p-value correction, and that permutation FDR correction can yield more hits than BH-adjusted and even unadjusted p-values in strongly differential data. SIMPLYCORRECT can serve as a tool in education and research to show how p-value adjustment and various parameters affect the results of quantitative omics experiments.
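
The comparison described above can be reproduced in miniature outside the Shiny app. The following Python sketch (an independent illustration, not the authors' SIMPLYCORRECT code; all parameter values are assumed for demonstration) simulates an experiment with a mixture of differential and non-differential analytes and counts hits under unadjusted, Bonferroni-corrected, and BH-corrected thresholds.

```python
# Minimal sketch of the kind of simulation described above (not the authors'
# SIMPLYCORRECT code): simulate an omics experiment in which a fraction of
# analytes is truly differential, then compare hit counts before and after
# p-value correction.
import numpy as np
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
n_analytes, n_per_group = 5000, 5       # assumed toy dimensions
n_true = 500                            # truly differential analytes
effect, sigma = 1.0, 1.0                # assumed effect size and noise SD

# Group means: the first n_true analytes are shifted in the test group.
shift = np.concatenate([np.full(n_true, effect), np.zeros(n_analytes - n_true)])
control = rng.normal(0.0, sigma, size=(n_analytes, n_per_group))
test = rng.normal(shift[:, None], sigma, size=(n_analytes, n_per_group))

# Per-analyte two-sample t-tests (Welch), as in a typical volcano-plot analysis.
_, pvals = ttest_ind(test, control, axis=1, equal_var=False)

alpha = 0.05
raw_hits = np.sum(pvals < alpha)
bonf_hits = np.sum(multipletests(pvals, alpha=alpha, method="bonferroni")[0])
bh_hits = np.sum(multipletests(pvals, alpha=alpha, method="fdr_bh")[0])
print(f"raw: {raw_hits}  Bonferroni: {bonf_hits}  BH: {bh_hits}")
```

Increasing n_per_group or lowering sigma in this sketch mirrors the paper's observation that larger sample sizes or lower variability reduce the sensitivity cost of p-value correction.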

2013
Vol. 143 (4), pp. 764-770
Author(s): Shunpu Zhang, Huann-Sheng Chen, Ruth M. Pfeiffer

PLoS ONE
2021
Vol. 16 (6), pp. e0245824
Author(s): Otília Menyhart, Boglárka Weltz, Balázs Győrffy

Scientists from nearly all disciplines face the problem of simultaneously evaluating many hypotheses. Conducting multiple comparisons increases the likelihood that a non-negligible proportion of associations will be false positives, clouding real discoveries. Drawing valid conclusions requires taking into account the number of statistical tests performed and adjusting the statistical confidence measures accordingly. Several strategies exist to overcome the problem of multiple hypothesis testing. We aim to summarize critical statistical concepts and widely used correction approaches while also drawing attention to frequently misinterpreted notions of statistical inference. We provide a step-by-step description of each multiple-testing correction method with clear examples and present an easy-to-follow guide for selecting the most suitable correction technique. To facilitate multiple-testing corrections, we developed a fully automated solution that requires neither programming skills nor the use of a command line. Our registration-free online tool is available at www.multipletesting.com and compiles the five most frequently used adjustment methods, including the Bonferroni, Holm (step-down), and Hochberg (step-up) corrections, and also calculates False Discovery Rates (FDR) and q-values. The current summary provides a much-needed practical synthesis of basic statistical concepts regarding multiple hypothesis testing in comprehensible language with well-illustrated examples. The web tool will fill the gap for life science researchers by providing a user-friendly substitute for command-line alternatives.
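
For readers who prefer a scriptable alternative to the web tool, the corrections named above are also available in standard statistics libraries. The sketch below is illustrative only: the p-values are made up, and statsmodels is used as a stand-in for www.multipletesting.com. It applies the Bonferroni, Holm, Hochberg, and Benjamini-Hochberg FDR adjustments to the same set of p-values.

```python
# Apply the correction methods discussed above to one example p-value vector.
from statsmodels.stats.multitest import multipletests

pvals = [0.001, 0.008, 0.012, 0.03, 0.04, 0.045, 0.20, 0.55]  # example values only
alpha = 0.05

# 'simes-hochberg' is statsmodels' name for the Hochberg step-up procedure;
# 'fdr_bh' returns Benjamini-Hochberg FDR-adjusted p-values.
for method in ("bonferroni", "holm", "simes-hochberg", "fdr_bh"):
    reject, adjusted, _, _ = multipletests(pvals, alpha=alpha, method=method)
    print(f"{method:>14}: {reject.sum()} rejections, "
          f"adjusted p = {[round(p, 3) for p in adjusted]}")
```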


2018
Author(s): Tuomas Puoliväli, Satu Palva, J. Matias Palva

Abstract

Background: Reproducibility of research findings has recently been questioned in many fields of science, including psychology and the neurosciences. One factor influencing reproducibility is the simultaneous testing of multiple hypotheses, which increases the number of false positive findings unless the p-values are carefully corrected. While this multiple testing problem is well known and has been studied for decades, it continues to be both a theoretical and a practical problem.

New Method: Here we assess the reproducibility of research involving multiple testing corrected for family-wise error rate (FWER) or false discovery rate (FDR) using techniques based on random field theory (RFT), cluster-mass based permutation testing, adaptive FDR, and several classical methods. We also investigate the performance of these methods under two different models.

Results: We found that permutation testing is the most powerful of the considered approaches to multiple testing, and that grouping hypotheses based on prior knowledge can improve power. We also found that emphasizing primary and follow-up studies equally produced the most reproducible outcomes.

Comparison with Existing Method(s): We have extended the use of the two-group and separate-classes models for analyzing reproducibility and provide a new open-source software package, “MultiPy”, for multiple hypothesis testing.

Conclusions: Our results suggest that performing strict corrections for multiple testing is not sufficient to improve the reproducibility of neuroimaging experiments. The methods are freely available as a Python toolkit, “MultiPy”, and we intend this study to help improve statistical data analysis practices and to assist in conducting power and reproducibility analyses for new experiments.
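
As a rough indication of how the permutation approach highlighted above works, the following sketch is an independent toy example, not MultiPy code, and its data dimensions and effect size are assumptions. It builds a null distribution of the maximum |t| statistic by shuffling group labels and derives FWER-adjusted p-values from it (a Westfall-Young-style maxT correction).

```python
# Toy permutation (maxT) correction: shuffle labels to build a null
# distribution of the maximum |t| across all hypotheses, then adjust each
# observed statistic against that distribution (controls FWER).
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
n_tests, n_per_group, n_perm = 200, 10, 1000   # assumed toy dimensions
group_a = rng.normal(0.0, 1.0, size=(n_tests, n_per_group))
group_b = rng.normal(0.3, 1.0, size=(n_tests, n_per_group))  # assumed shift
data = np.concatenate([group_a, group_b], axis=1)
labels = np.array([0] * n_per_group + [1] * n_per_group)

def max_abs_t(x, lab):
    """Maximum |t| over all tests for a given labeling of the columns."""
    t, _ = ttest_ind(x[:, lab == 1], x[:, lab == 0], axis=1)
    return np.max(np.abs(t))

t_obs, _ = ttest_ind(group_b, group_a, axis=1)
null_max = np.array([max_abs_t(data, rng.permutation(labels))
                     for _ in range(n_perm)])

# Adjusted p-value: fraction of permutations whose maximum statistic
# reaches or exceeds each observed |t| (with the usual +1 correction).
p_adj = (1 + np.sum(null_max[None, :] >= np.abs(t_obs)[:, None], axis=1)) / (1 + n_perm)
print("hypotheses rejected at FWER 0.05:", np.sum(p_adj < 0.05))
```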


2018
Author(s): Martin J. Zhang, Fei Xia, James Zou

Multiple hypothesis testing is an essential component of modern data science. Its goal is to maximize the number of discoveries while controlling the fraction of false discoveries. In many settings, additional information/covariates for each hypothesis are available beyond the p-value. For example, in eQTL studies, each hypothesis tests the correlation between a variant and the expression of a gene. We also have additional covariates such as the location, conservation, and chromatin status of the variant, which could inform how likely the association is to be due to noise. However, popular multiple hypothesis testing approaches, such as the Benjamini-Hochberg procedure (BH) and independent hypothesis weighting (IHW), either ignore these covariates or assume the covariate to be univariate. We introduce AdaFDR, a fast and flexible method that adaptively learns the optimal p-value threshold from covariates to significantly improve detection power. On eQTL analysis of the GTEx data, AdaFDR discovers 32% and 27% more associations than BH and IHW, respectively, at the same false discovery rate. We prove that AdaFDR controls the false discovery proportion, and show that it makes substantially more discoveries while controlling FDR in extensive experiments. AdaFDR is computationally efficient: it can process more than 100 million hypotheses within an hour and supports multi-dimensional covariates with both numeric and categorical values. It also provides exploratory plots for the user to interpret how each covariate affects the significance of hypotheses, making it broadly useful across many applications.
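
As a loose intuition for why covariates help, the sketch below is a simplified stratified-BH toy example, not the AdaFDR algorithm, and all simulated quantities are assumptions. It bins hypotheses by a single covariate and applies BH within each bin, so bins enriched for true signal can yield more discoveries than a single global BH pass.

```python
# Toy illustration of covariate-aware multiple testing (NOT AdaFDR): hypotheses
# are stratified by one covariate and BH is applied within each stratum.
# Unlike AdaFDR, this heuristic learns nothing adaptively and carries no
# formal FDR guarantee; it only illustrates the potential power gain.
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(2)
n = 10_000
covariate = rng.uniform(0, 1, n)                     # e.g. a variant annotation score
is_signal = rng.random(n) < 0.2 * (1 - covariate)    # signal enriched at low covariate
pvals = np.where(is_signal, rng.beta(0.2, 5, n), rng.uniform(0, 1, n))

# Baseline: one global BH pass over all hypotheses.
reject_bh = multipletests(pvals, alpha=0.1, method="fdr_bh")[0]

# Covariate-binned BH: four strata along the covariate.
bins = np.minimum((covariate * 4).astype(int), 3)
reject_binned = np.zeros(n, dtype=bool)
for b in range(4):
    mask = bins == b
    reject_binned[mask] = multipletests(pvals[mask], alpha=0.1, method="fdr_bh")[0]

print("global BH discoveries:", reject_bh.sum())
print("covariate-binned BH discoveries:", reject_binned.sum())
```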


2020
Vol. 17 (2), pp. 443-451
Author(s): Bindu Punathumparambath, Kannan Vadakkadath Meethal
