Magnitude Based Inference in Relation to One-sided Hypotheses Testing Procedures

2020 ◽  
Author(s):  
Janet Aisbett ◽  
Daniel Lakens ◽  
Kristin Sainani

Magnitude based inference (MBI) was widely adopted by sport science researchers as an alternative to null hypothesis significance tests. It has been criticized for lacking a theoretical framework, mixing Bayesian and frequentist thinking, and encouraging researchers to run small studies with high Type 1 error rates. MBI terminology describes the position of confidence intervals in relation to smallest meaningful effect sizes. We show these positions correspond to combinations of one-sided tests of hypotheses about the presence or absence of meaningful effects, and formally describe MBI as a multiple decision procedure. MBI terminology operates as if tests are conducted at multiple alpha levels. We illustrate how error rates can be controlled by limiting each one-sided hypothesis test to a single alpha level. To provide transparent error control in a Neyman-Pearson framework and encourage the use of standard statistical software, we recommend replacing MBI with one-sided tests against smallest meaningful effects, or pairs of such tests as in equivalence testing. Researchers should pre-specify their hypotheses and alpha levels, perform a priori sample size calculations, and justify all assumptions. Our recommendations show researchers what tests to use and how to design and report their statistical analyses to accord with standard frequentist practice.
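As a rough illustration of what replacing MBI with one-sided and equivalence tests looks like in practice, the sketch below computes a one-sided test against a smallest meaningful effect and the corresponding TOST pair. The data, the smallest meaningful difference of 0.5, and the pooled-variance t-test are assumptions for illustration, not the authors' code.

```python
# A minimal sketch, assuming two independent groups, a smallest meaningful
# difference of 0.5 units, and a pooled-variance t-test. Data are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
treat = rng.normal(0.6, 1.0, 30)   # hypothetical treatment group scores
ctrl = rng.normal(0.0, 1.0, 30)    # hypothetical control group scores
delta = 0.5                        # smallest meaningful difference (assumed)

diff = treat.mean() - ctrl.mean()
n1, n2 = len(treat), len(ctrl)
sp2 = ((n1 - 1) * treat.var(ddof=1) + (n2 - 1) * ctrl.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
df = n1 + n2 - 2

# One-sided test against the smallest meaningful effect: H0: difference <= delta.
p_meaningful = stats.t.sf((diff - delta) / se, df)

# Equivalence test (TOST): H0a: difference <= -delta, H0b: difference >= delta.
p_lower = stats.t.sf((diff + delta) / se, df)
p_upper = stats.t.cdf((diff - delta) / se, df)
p_equivalence = max(p_lower, p_upper)

print(f"meaningful-effect test p = {p_meaningful:.3f}")
print(f"equivalence (TOST) p = {p_equivalence:.3f}")
```

A meaningful effect is claimed only if p_meaningful falls below the pre-specified alpha; equivalence (no meaningful effect) is claimed only if both TOST p-values do.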

2021 ◽  
Vol 18 (5) ◽  
pp. 521-528
Author(s):  
Eric S Leifer ◽  
James F Troendle ◽  
Alexis Kolecki ◽  
Dean A Follmann

Background/aims: The two-by-two factorial design randomizes participants to receive treatment A alone, treatment B alone, both treatments A and B (AB), or neither treatment (C). When the combined effect of A and B is less than the sum of the A and B effects, called a subadditive interaction, there can be low power to detect the A effect using an overall test, that is, a factorial analysis, which compares the A and AB groups to the C and B groups. Such an interaction may have occurred in the Action to Control Cardiovascular Risk in Diabetes blood pressure trial (ACCORD BP), which simultaneously randomized participants to receive intensive or standard blood pressure control and intensive or standard glycemic control. For the primary outcome of major cardiovascular events, the overall test for efficacy of intensive blood pressure control was nonsignificant. In such an instance, simple effect tests of A versus C and B versus C may be useful since they are not affected by a subadditive interaction, but they can have lower power since they use half the participants of the overall trial. We investigate multiple testing procedures which exploit the overall tests’ sample size advantage and the simple tests’ robustness to a potential interaction. Methods: In the time-to-event setting, we use the stratified and ordinary logrank statistics’ asymptotic means to calculate the power of the overall and simple tests under various scenarios. We consider the A and B research questions to be unrelated and allocate a 0.05 significance level to each. For each question, we investigate three multiple testing procedures which allocate the type 1 error in different proportions to the overall and simple effects as well as the AB effect. The Equal Allocation 3 procedure allocates equal type 1 error to each of the three effects, the Proportional Allocation 2 procedure allocates 2/3 of the type 1 error to the overall A (respectively, B) effect and the remaining type 1 error to the AB effect, and the Equal Allocation 2 procedure allocates equal amounts to the simple A (respectively, B) and AB effects. These procedures are applied to ACCORD BP. Results: Across various scenarios, Equal Allocation 3 had robust power for detecting a true effect. For ACCORD BP, all three procedures would have detected a benefit of intensive glycemia control. Conclusions: When there is no interaction, Equal Allocation 3 has less power than a factorial analysis. However, Equal Allocation 3 often has greater power when there is an interaction. The R package factorial2x2 can be used to explore the power gain or loss for different scenarios.
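To make the allocations concrete, the sketch below applies Bonferroni-style versions of Equal Allocation 3 and Proportional Allocation 2 to hypothetical p-values. The published procedures, implemented in the R package factorial2x2, use correlation-adjusted critical values, so this Python simplification is only illustrative and somewhat conservative.

```python
# Bonferroni-style sketch of the alpha allocations described above (illustrative
# only; the published procedures adjust for correlation among the statistics).

def equal_allocation_3(p_overall, p_simple, p_ab, alpha=0.05):
    """Give each of the overall A, simple A, and AB effects an equal share of alpha."""
    cut = alpha / 3
    return {"overall A": p_overall <= cut,
            "simple A (A vs C)": p_simple <= cut,
            "AB (AB vs C)": p_ab <= cut}

def proportional_allocation_2(p_overall, p_ab, alpha=0.05):
    """Give 2/3 of alpha to the overall A effect and 1/3 to the AB effect."""
    return {"overall A": p_overall <= 2 * alpha / 3,
            "AB (AB vs C)": p_ab <= alpha / 3}

# Hypothetical p-values from the stratified/ordinary logrank tests.
print(equal_allocation_3(p_overall=0.030, p_simple=0.012, p_ab=0.200))
print(proportional_allocation_2(p_overall=0.030, p_ab=0.200))
```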


1986 ◽  
Vol 20 (2) ◽  
pp. 189-200 ◽  
Author(s):  
Kevin D. Bird ◽  
Wayne Hall

Statistical power is neglected in much psychiatric research, with the consequence that many studies do not provide a reasonable chance of detecting differences between groups if they exist in the population. This paper attempts to improve current practice by providing an introduction to the essential quantities required for performing a power analysis (sample size, effect size, type 1 and type 2 error rates). We provide simplified tables for estimating the sample size required to detect a specified size of effect with a type 1 error rate of α and a type 2 error rate of β, and for estimating the power provided by a given sample size for detecting a specified size of effect with a type 1 error rate of α. We show how to modify these tables to perform power analyses for multiple comparisons in univariate and some multivariate designs. Power analyses for each of these types of design are illustrated by examples.
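A present-day counterpart to the paper's tables, for the simple two-group case, is shown below; the effect size, alpha, and power values are arbitrary examples, and the calculation uses the statsmodels power module rather than the authors' tables.

```python
# Sample size and power for a two-group t-test, given Cohen's d, alpha, and
# the desired power (1 - beta). Values are illustrative only.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Participants per group needed to detect d = 0.5 at alpha = 0.05 with 80% power.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)

# Power achieved by 30 participants per group for the same effect size and alpha.
achieved_power = analysis.power(effect_size=0.5, nobs1=30, alpha=0.05)

print(f"n per group = {n_per_group:.1f}, power with n = 30: {achieved_power:.2f}")
```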


2018 ◽  
Vol 28 (6) ◽  
pp. 1879-1892 ◽  
Author(s):  
Alexandra Christine Graf ◽  
Gernot Wassmer ◽  
Tim Friede ◽  
Roland Gerard Gera ◽  
Martin Posch

With the advent of personalized medicine, clinical trials studying treatment effects in subpopulations are receiving increasing attention. Besides demonstrating a treatment effect in the overall population, the objectives of such studies include identifying biomarker-defined subpopulations in which the treatment has a beneficial effect. Continuous biomarkers are often dichotomized using a threshold to define two subpopulations with low and high biomarker levels. If there is insufficient information on how the outcome depends on the biomarker, several thresholds may be investigated. The nested structure of such subpopulations is similar to the structure in group sequential trials. It has therefore been proposed to use the corresponding critical boundaries to test such nested subpopulations. We show that for biomarkers with a prognostic effect that is not adjusted for in the statistical model, the variability of the outcome may vary across subpopulations, which may lead to an inflation of the family-wise type 1 error rate. Using simulations, we quantify the potential inflation of testing procedures based on group sequential designs. Furthermore, alternative hypothesis tests that control the family-wise type 1 error rate under minimal assumptions are proposed. The methodological approaches are illustrated by a trial in depression.
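The inflation mechanism can be seen in a small simulation. The sketch below is my own illustration under arbitrary assumptions (a nested subpopulation, a biomarker that changes only the outcome variance, and a roughly two-look Pocock critical value); it is not the simulation study from the paper.

```python
# Rough simulation of how a prognostic biomarker that changes the outcome
# variance across nested subpopulations can inflate the family-wise type 1
# error when a group sequential boundary is reused. All numbers are assumed.
import numpy as np

rng = np.random.default_rng(7)
n, frac_sub = 400, 0.5           # total sample size, subpopulation fraction (assumed)
z_crit = 2.18                    # roughly a two-look Pocock boundary, one-sided alpha 0.025
n_sim = 20_000
sd_sub, sd_rest = 1.0, 3.0       # unequal outcome variance driven by the biomarker

rejections = 0
for _ in range(n_sim):
    in_sub = rng.random(n) < frac_sub              # high-biomarker subpopulation
    treated = rng.random(n) < 0.5                  # 1:1 randomization
    y = rng.normal(0.0, np.where(in_sub, sd_sub, sd_rest))  # no true treatment effect

    def z_stat(mask):
        t, c = y[mask & treated], y[mask & ~treated]
        se = np.sqrt(t.var(ddof=1) / len(t) + c.var(ddof=1) / len(c))
        return (t.mean() - c.mean()) / se

    if max(z_stat(np.ones(n, dtype=bool)), z_stat(in_sub)) > z_crit:
        rejections += 1

print(f"empirical family-wise error ≈ {rejections / n_sim:.3f} (nominal 0.025)")
```

Because the unequal variances reduce the correlation between the subpopulation and full-population statistics below the value the boundary assumes, the empirical familywise error exceeds the nominal level.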


Filomat ◽  
2016 ◽  
Vol 30 (3) ◽  
pp. 681-688
Author(s):  
Farshin Hormozinejad

In this article, the author considers statistical hypothesis testing for making decisions among hypotheses concerning many families of probability distributions. The statistician would like to control the overall error rate in order to draw statistically valid conclusions from each test, while being as efficient as possible. The familywise error (FWE) rate metric and a hypothesis testing procedure that controls both the type I and type II FWEs are generalized. The proposed procedure offers simultaneously greater reliability and less conservative error control relative to fixed-sample and other recently proposed sequential procedures. Also, the characteristics of logarithmically asymptotically optimal (LAO) hypothesis testing are studied. The purpose of this research is to express the optimal functional relation among the reliabilities of LAO hypothesis testing and to assess it with the FWE metric.
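For readers unfamiliar with the terminology, the standard definitions behind it are sketched below in my own notation, which may differ from the article's: the two familywise error rates and the reliability (error exponent) of a test.

```latex
% Standard definitions (notation is mine, not necessarily the article's).
\begin{align}
  \mathrm{FWE}_{\mathrm{I}}  &= \Pr(\text{at least one true hypothesis is rejected}),\\
  \mathrm{FWE}_{\mathrm{II}} &= \Pr(\text{at least one false hypothesis is not rejected}),\\
  E &= \lim_{N \to \infty} -\tfrac{1}{N} \log \alpha_N .
\end{align}
```

Here $\alpha_N$ denotes an error probability at sample size $N$; an LAO test maximizes a chosen subset of the reliabilities subject to lower bounds on the remaining ones.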


2018 ◽  
Author(s):  
James Liley ◽  
Chris Wallace

High-dimensional hypothesis testing is ubiquitous in the biomedical sciences, and informative covariates may be employed to improve power. The conditional false discovery rate (cFDR) is a widely used approach suited to the setting where the covariate is a set of p-values for the equivalent hypotheses for a second trait. Although related to the Benjamini-Hochberg procedure, it does not permit easy control of the type 1 error rate, and existing methods are over-conservative. We propose a new method for type 1 error rate control based on identifying mappings from the unit square to the unit interval defined by the estimated cFDR, and splitting observations so that each map is independent of the observations it is used to test. We also propose an adjustment to the existing cFDR estimator which further improves power. We show by simulation that the new method more than doubles the potential improvement in power over unconditional analyses achieved by existing methods. We demonstrate our method on transcriptome-wide association studies, and show that the method can be used in an iterative way, enabling the use of multiple covariates successively. Our methods substantially improve the power and applicability of cFDR analysis.
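For orientation, the basic empirical cFDR estimator that this line of work builds on can be computed directly; the sketch below uses simulated p-values and omits the authors' leave-out splitting and adjusted estimator.

```python
# Basic empirical cFDR estimate:
# cFDR(p_i, q_i) ≈ p_i * #{q_j <= q_i} / #{p_j <= p_i, q_j <= q_i}.
# Illustrative only; not the adjusted estimator or splitting scheme proposed above.
import numpy as np

def cfdr_estimate(p, q):
    p, q = np.asarray(p), np.asarray(q)
    out = np.empty(len(p))
    for i in range(len(p)):
        joint = np.sum((p <= p[i]) & (q <= q[i]))   # always >= 1 (includes i itself)
        out[i] = p[i] * np.sum(q <= q[i]) / joint
    return np.minimum(out, 1.0)

# Hypothetical p-values for the principal trait and the conditioning trait.
rng = np.random.default_rng(0)
p_main = rng.uniform(size=1000) ** 2
p_cond = rng.uniform(size=1000) ** 2
print(cfdr_estimate(p_main, p_cond)[:5])
```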


2020 ◽  
Vol 103 (6) ◽  
pp. 1667-1679
Author(s):  
Shizhen S Wang

Background: There are several statistical methods for detecting a difference in detection rates between alternative and reference qualitative microbiological assays in a single-laboratory validation study with a paired design. Objective: We compared the performance of eight methods: McNemar’s test, the sign test, the Wilcoxon signed-rank test, the paired t-test, regression methods based on conditional logistic (CLOGIT), mixed effects complementary log-log (MCLOGLOG), and mixed effects logistic (MLOGIT) models, and a linear mixed effects model (LMM). Methods: We first compared the minimum detectable difference in the proportion of detections between the alternative and reference detection methods among these statistical methods for a varied number of test portions. We then compared the power and type 1 error rates of these methods using simulated data. Results: The MCLOGLOG and MLOGIT models had the lowest minimum detectable difference, followed by the LMM and paired t-test. The MCLOGLOG and MLOGIT models had the highest average power but were anticonservative when the correlation between the pairs of outcome values of the alternative and reference methods was high. The LMM and paired t-test mostly had the highest average power when the correlation was low and the second highest average power when the correlation was high. Type 1 error rates of these last two methods approached the nominal significance level when the number of test portions was moderately large (n > 20). Highlights: The LMM and paired t-test are better choices than the other competing methods, and we provide an example using real data.
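Two of the compared tests are easy to reproduce on a small paired data set; the detection outcomes below are invented for illustration and are not the study's data.

```python
# McNemar's test on the paired detected/not-detected table and a paired t-test
# on the 0/1 outcomes, for hypothetical results from 30 test portions.
import numpy as np
from scipy import stats
from statsmodels.stats.contingency_tables import mcnemar

alt = np.array([1] * 22 + [0] * 8)                       # alternative method (1 = detected)
ref = np.array([1] * 18 + [0] * 4 + [1] * 2 + [0] * 6)   # reference method

table = [[np.sum((alt == 1) & (ref == 1)), np.sum((alt == 1) & (ref == 0))],
         [np.sum((alt == 0) & (ref == 1)), np.sum((alt == 0) & (ref == 0))]]

print(mcnemar(table, exact=True).pvalue)   # based on the discordant pairs only
print(stats.ttest_rel(alt, ref).pvalue)    # paired t-test on the 0/1 outcomes
```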


2002 ◽  
Vol 51 (3) ◽  
pp. 524-527 ◽  
Author(s):  
Mark Wilkinson ◽  
Pedro R. Peres-Neto ◽  
Peter G. Foster ◽  
Clive B. Moncrieff

2021 ◽  
Author(s):  
Essi Laajala ◽  
Viivi Halla-aho ◽  
Toni Grönroos ◽  
Ubaid Ullah ◽  
Mari Vähä-Mäkilä ◽  
...  

Background: The aim of this study was to detect differential methylation in umbilical cord blood that is associated with maternal and pregnancy-related variables, such as maternal age and gestational weight gain. These have been studied earlier with 450K microarrays but not with bisulfite sequencing. Methods: Reduced representation bisulfite sequencing (RRBS) analysis was performed on 200 umbilical cord blood samples. Altogether 24 clinical and technical covariates were included in a binomial mixed effects model, which was fit separately for each high-coverage CpG site, followed by spatial and multiple testing adjustment of P values. Inflation of the spatially adjusted P values was discovered in a permutation analysis, which was then applied for empirical type 1 error control. Results: Empirical type 1 error control decreased the number of findings associated with each covariate to zero or to a small fraction of the number that would have been discovered with standard cutoffs. In this collection of samples, some differential methylation was associated with sex, the use of an epidural anesthetic during delivery, the 1-minute Apgar score, maternal age and height, gestational weight gain, maternal smoking, and maternal insulin-treated diabetes, but not with the birth weight of the newborn infant, maternal pre-pregnancy BMI, the number of earlier miscarriages, the mode of delivery, labor induction, or the cosine-transformed month of birth. Conclusions: The autocorrelation-adjusted Z-test is a convenient tool for detecting differentially methylated regions, but significance should be determined either empirically or before the spatial adjustment. With appropriate significance thresholds, the detected differentially methylated regions were reproducible across studies, technologies, and statistical models. Our RRBS data analysis workflow is available at https://github.com/EssiLaajala/RRBS_workflow. Keywords: DNA methylation, bisulfite sequencing, RRBS, umbilical cord blood, pregnancy, sex, spatial correlation, type 1 error, differential methylation, analysis workflow
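The permutation idea for empirical type 1 error control is generic and can be sketched independently of the binomial mixed model; the stand-in per-site test, data dimensions, and number of permutations below are arbitrary choices, and the authors' actual workflow is in the linked repository.

```python
# Permutation-based empirical significance cutoff: permute the covariate of
# interest, recompute per-site P values, and take a low quantile of the
# per-permutation minimum P value as a family-wise threshold. Stand-in data
# and a simple correlation test replace the binomial mixed model here.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_samples, n_sites = 200, 300
methylation = rng.normal(size=(n_samples, n_sites))  # placeholder methylation values
covariate = rng.normal(size=n_samples)               # e.g. maternal age (simulated)

def site_pvalues(x, y):
    """Per-site P values from a simple correlation test (stand-in model)."""
    return np.array([stats.pearsonr(x, y[:, j])[1] for j in range(y.shape[1])])

observed = site_pvalues(covariate, methylation)

min_p = [site_pvalues(rng.permutation(covariate), methylation).min()
         for _ in range(100)]
cutoff = np.quantile(min_p, 0.05)   # empirical family-wise 5% threshold
print(f"empirical cutoff = {cutoff:.2e}, sites passing = {(observed <= cutoff).sum()}")
```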

