Multiple Testing and Statistical Power With Modified Bonferroni Procedures

1997 ◽  
Vol 22 (4) ◽  
pp. 389-406 ◽  
Author(s):  
Stephen Olejnik ◽  
Jianmin Li ◽  
Suchada Supattathum ◽  
Carl J. Huberty

The difference in statistical power between the original Bonferroni and five modified Bonferroni procedures that control the overall Type I error rate is examined in the context of a correlation matrix where multiple null hypotheses, H0: ρij = 0 for all i ≠ j, are tested. Using 50 real correlation matrices reported in educational and psychological journals, a difference in the number of hypotheses rejected of less than 4% was observed among the procedures. When simulated data were used, very small differences were found among the six procedures in detecting at least one true relationship, but in detecting all true relationships the power of the modified Bonferroni procedures exceeded that of the original Bonferroni procedure by at least .18 and by as much as .55 when all null hypotheses were false. The power difference decreased as the number of true relationships decreased. Power differences obtained for the average power were of a much smaller magnitude but still favored the modified Bonferroni procedures. For the five modified Bonferroni procedures, power differences less than .05 were typically observed; the Holm procedure had the lowest power, and the Rom procedure had the highest.
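As a concrete illustration of the kind of comparison discussed above, the sketch below tests every pairwise correlation in a small simulated data set and counts rejections under the original Bonferroni and the Holm step-down adjustment at an overall alpha of .05. The data, variable count, and effect sizes are hypothetical and are not the matrices analysed in the article; the Rom procedure is omitted because it is not available in statsmodels.

```python
# Illustrative comparison of the original Bonferroni and the Holm step-down
# procedure when testing H0: rho_ij = 0 for every pair of variables.
# (hypothetical data; not the matrices analysed in the article)
import numpy as np
from scipy.stats import pearsonr
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
n, k = 100, 6                     # 100 cases, 6 variables -> 15 correlations
X = rng.normal(size=(n, k))
X[:, 1] += 0.5 * X[:, 0]          # build in a few true relationships
X[:, 2] += 0.4 * X[:, 0]

pvals = []
for i in range(k):
    for j in range(i + 1, k):
        r, p = pearsonr(X[:, i], X[:, j])
        pvals.append(p)

for method in ("bonferroni", "holm"):
    reject, _, _, _ = multipletests(pvals, alpha=0.05, method=method)
    print(f"{method:10s}: {reject.sum()} of {len(pvals)} hypotheses rejected")
```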

Author(s):  
Thomas Verron ◽  
Xavier Cahours ◽  
Stéphane Colard

Summary. During the last two decades, an increase in tobacco product reporting requirements from regulators, such as those in Europe, Canada and the USA, has been observed. However, the capacity to compare and discriminate accurately between two products is affected by the number of constituents used for the comparison. Performing a large number of simultaneous independent hypothesis tests increases the probability of rejecting a null hypothesis when it should not be rejected, which virtually guarantees the presence of Type I errors among the findings. Correction methods such as the Bonferroni and Benjamini & Hochberg procedures have been developed to overcome this issue. The performance of these methods was assessed by comparing identical tobacco products using data sets of different sizes. Results showed that multiple comparisons lead to erroneous conclusions if the risk of Type I error is not corrected. Unfortunately, reducing the Type I error rate reduces the statistical power of the tests. Consequently, strategies for dealing with the multiplicity of data should strike a reasonable balance between testing requirements and the statistical power of differentiation. Multiple testing for product comparison is less of a problem if studies are restricted to the most relevant parameters for comparison.
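The multiplicity problem the authors describe can be reproduced in a few lines: below, two identical simulated products are compared on many constituents, and rejections are counted with no correction, with Bonferroni, and with Benjamini & Hochberg. The constituent count, sample sizes, and distributions are illustrative assumptions, not the tobacco data used in the study.

```python
# Two *identical* simulated products compared on many constituents: without
# correction roughly 5% of tests reject by chance alone; Bonferroni and
# Benjamini-Hochberg keep those false findings in check.
import numpy as np
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(7)
m_constituents, n_per_product = 100, 10
prod_a = rng.normal(0, 1, size=(m_constituents, n_per_product))
prod_b = rng.normal(0, 1, size=(m_constituents, n_per_product))  # same distribution

_, pvals = ttest_ind(prod_a, prod_b, axis=1)

print("uncorrected rejections:", int((pvals < 0.05).sum()))
print("Bonferroni rejections :", int(multipletests(pvals, 0.05, "bonferroni")[0].sum()))
print("Benjamini-Hochberg    :", int(multipletests(pvals, 0.05, "fdr_bh")[0].sum()))
```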


2019 ◽  
Vol 227 (4) ◽  
pp. 261-279 ◽  
Author(s):  
Frank Renkewitz ◽  
Melanie Keiner

Abstract. Publication biases and questionable research practices are assumed to be two of the main causes of low replication rates. Both of these problems lead to severely inflated effect size estimates in meta-analyses. Methodologists have proposed a number of statistical tools to detect such bias in meta-analytic results. We present an evaluation of the performance of six of these tools. To assess the Type I error rate and the statistical power of these methods, we simulated a large variety of literatures that differed with regard to true effect size, heterogeneity, number of available primary studies, and sample sizes of these primary studies; furthermore, simulated studies were subjected to different degrees of publication bias. Our results show that across all simulated conditions, no method consistently outperformed the others. Additionally, all methods performed poorly when true effect sizes were heterogeneous or primary studies had a small chance of being published, irrespective of their results. This suggests that in many actual meta-analyses in psychology, bias will remain undiscovered no matter which detection method is used.
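For readers unfamiliar with how such bias-detection tools operate, here is a minimal sketch of one common approach, an Egger-type funnel-plot asymmetry regression, applied to a simulated literature in which non-significant studies are rarely published. The simulation settings are arbitrary assumptions, and this implementation is not necessarily one of the six methods evaluated in the article.

```python
# Minimal sketch: simulate a publication-biased literature (true effect = 0,
# non-significant studies published only 20% of the time) and apply an
# Egger-type regression of the standardized effect on precision; an intercept
# far from zero suggests funnel-plot asymmetry.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
true_d, k_published, effects, ses = 0.0, 30, [], []

while len(effects) < k_published:
    n = rng.integers(20, 200)              # per-group sample size
    se = np.sqrt(2 / n)                    # approximate SE of Cohen's d
    d = rng.normal(true_d, se)
    significant = abs(d / se) > 1.96
    if significant or rng.random() < 0.2:  # non-significant studies rarely published
        effects.append(d)
        ses.append(se)

effects, ses = np.array(effects), np.array(ses)
precision = 1 / ses
model = sm.OLS(effects / ses, sm.add_constant(precision)).fit()
print("Egger intercept:", round(model.params[0], 2),
      " p-value:", round(model.pvalues[0], 4))
```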


Biostatistics ◽  
2017 ◽  
Vol 18 (3) ◽  
pp. 477-494 ◽  
Author(s):  
Jakub Pecanka ◽  
Marianne A. Jonker ◽  
Zoltan Bochdanovits ◽  
Aad W. Van Der Vaart ◽  

Summary. For over a decade functional gene-to-gene interaction (epistasis) has been suspected to be a determinant in the “missing heritability” of complex traits. However, searching for epistasis on the genome-wide scale has been challenging due to the prohibitively large number of tests, which results in a serious loss of statistical power as well as in computational challenges. In this article, we propose a two-stage method applicable to existing case-control data sets, which aims to lessen both of these problems by pre-assessing whether a candidate pair of genetic loci is involved in epistasis before it is actually tested for interaction with respect to a complex phenotype. The pre-assessment is based on a two-locus genotype independence test performed in the sample of cases. Only the pairs of loci that exhibit non-equilibrium frequencies are analyzed via a logistic regression score test, thereby reducing the multiple testing burden. Since only the computationally simple independence tests are performed for all pairs of loci, while the more demanding score tests are restricted to the most promising pairs, a genome-wide association study (GWAS) for epistasis becomes feasible. By design our method provides strong control of the Type I error. Its favourable power properties, especially under the practically relevant misspecification of the interaction model, are illustrated. Ready-to-use software is available. Using the method we analyzed Parkinson’s disease in four cohorts and identified possible interactions within several SNP pairs in multiple cohorts.
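A hedged sketch of the two-stage logic described above: stage 1 screens a SNP pair with a genotype independence test in cases only, and stage 2 tests the interaction in a logistic regression only if the screen is passed. The function name, screening threshold, and the use of a Wald test in place of the authors' score test are illustrative assumptions; this is not the published software.

```python
# Stage 1: chi-square independence test on the 3x3 two-locus genotype table in
# cases only. Stage 2: logistic regression interaction test (Wald here, as an
# illustrative stand-in for the score test), run only for pairs passing stage 1.
import numpy as np
from scipy.stats import chi2_contingency
import statsmodels.api as sm

def two_stage_pair_test(snp1, snp2, case_status, screen_alpha=1e-3):
    """snp1, snp2: genotypes coded 0/1/2; case_status: 0/1 array (hypothetical names)."""
    cases = case_status == 1
    table = np.zeros((3, 3))
    for g1, g2 in zip(snp1[cases], snp2[cases]):
        table[g1, g2] += 1
    _, p_screen, _, _ = chi2_contingency(table + 0.5)  # small pad avoids empty cells

    if p_screen > screen_alpha:
        return p_screen, None          # pair filtered out at stage 1

    X = np.column_stack([snp1, snp2, snp1 * snp2])
    fit = sm.Logit(case_status, sm.add_constant(X)).fit(disp=False)
    return p_screen, fit.pvalues[3]    # p-value for the interaction term

# toy data, independent loci and random status, so most pairs never reach stage 2
rng = np.random.default_rng(0)
snp1 = rng.integers(0, 3, 2000)
snp2 = rng.integers(0, 3, 2000)
status = rng.integers(0, 2, 2000)
print(two_stage_pair_test(snp1, snp2, status))
```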


Author(s):  
Luan L. Lee ◽  
Miguel G. Lizarraga ◽  
Natanael R. Gomes ◽  
Alessandro L. Koerich

This paper describes a prototype for Brazilian bankcheck recognition. The description is divided into three topics: bankcheck information extraction, digit amount recognition and signature verification. In bankcheck information extraction, our algorithms provide signature and digit amount images free of background patterns and bankcheck printed information. In digit amount recognition, we dealt with digit amount segmentation and the implementation of a complete numeral character recognition system involving image processing, feature extraction and neural classification. In signature verification, we designed and implemented a static signature verification system suitable for banking and commercial applications. Our signature verification algorithm is capable of detecting simple, random and skilled forgeries. The proposed automatic bankcheck recognition prototype was extensively tested with real as well as simulated bankcheck data, yielding the following performance results: for skilled forgeries, a 4.7% equal error rate; for random forgeries, zero Type I error and a 7.3% Type II error; for bankcheck numerals, a 92.7% correct recognition rate.
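The equal error rate quoted above can be made concrete with a short sketch: sweep the decision threshold over verification scores and find the point where the false-rejection (Type I) and false-acceptance (Type II) rates meet. The score distributions below are invented for illustration and are not taken from the prototype.

```python
# Reading an equal error rate (EER) off a set of verification scores by
# sweeping the decision threshold until false rejection meets false acceptance.
# (synthetic scores, not the bankcheck prototype's data)
import numpy as np

rng = np.random.default_rng(42)
genuine_scores = rng.normal(0.8, 0.10, 500)   # genuine signatures score high
forgery_scores = rng.normal(0.5, 0.15, 500)   # skilled forgeries score lower

thresholds = np.linspace(0, 1, 1001)
frr = np.array([(genuine_scores < t).mean() for t in thresholds])   # false rejection
far = np.array([(forgery_scores >= t).mean() for t in thresholds])  # false acceptance

i = int(np.argmin(np.abs(frr - far)))
print(f"EER ~ {(frr[i] + far[i]) / 2:.3f} at threshold {thresholds[i]:.2f}")
```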


2019 ◽  
Author(s):  
Rob Cribbie ◽  
Nataly Beribisky ◽  
Udi Alter

Many bodies recommend that a sample planning procedure, such as a traditional NHST a priori power analysis, be conducted during the planning stages of a study. Power analysis allows the researcher to estimate how many participants are required in order to detect a minimally meaningful effect size at a specific level of power and Type I error rate. However, there are several drawbacks to the procedure that render it “a mess.” Specifically, identifying the minimally meaningful effect size is often difficult yet unavoidable for conducting the procedure properly, and the procedure is neither precision oriented nor does it guide the researcher to collect as many participants as is feasible. In this study, we explore how these three theoretical issues are reflected in applied psychological research in order to better understand whether they are concerns in practice. To investigate how power analysis is currently used, we reviewed the reporting of 443 power analyses in high-impact psychology journals in 2016 and 2017. We found that researchers rarely use the minimally meaningful effect size as a rationale for the effect size chosen in a power analysis. Further, precision-based approaches and collecting the maximum sample size feasible are almost never used in tandem with power analyses. In light of these findings, we suggest that researchers focus on tools beyond traditional power analysis when planning samples, such as collecting the maximum sample size feasible.
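For reference, the kind of a priori power analysis the review examined looks like the following in practice: solving for the per-group sample size needed to detect an assumed minimally meaningful effect at a given alpha and power. The effect size of d = 0.3 is a placeholder assumption, not a recommendation.

```python
# A priori power analysis for an independent-samples t-test: sample size per
# group to detect an assumed minimally meaningful effect of d = 0.3 at
# alpha = .05 with 80% power.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.3, alpha=0.05, power=0.80,
                                    alternative="two-sided")
print(f"required n per group: {n_per_group:.1f}")   # roughly 175 per group
```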


2010 ◽  
Vol 23 (2) ◽  
pp. 200-229 ◽  
Author(s):  
Anna L. Macready ◽  
Laurie T. Butler ◽  
Orla B. Kennedy ◽  
Judi A. Ellis ◽  
Claire M. Williams ◽  
...  

In recent years there has been a rapid growth of interest in exploring the relationship between nutritional therapies and the maintenance of cognitive function in adulthood. Emerging evidence reveals an increasingly complex picture with respect to the benefits of various food constituents on learning, memory and psychomotor function in adults. However, to date, there has been little consensus in human studies on the range of cognitive domains to be tested or the particular tests to be employed. To illustrate the potential difficulties that this poses, we conducted a systematic review of existing human adult randomised controlled trial (RCT) studies that have investigated the effects of 24 d to 36 months of supplementation with flavonoids and micronutrients on cognitive performance. There were thirty-nine studies employing a total of 121 different cognitive tasks that met the criteria for inclusion. Results showed that less than half of these studies reported positive effects of treatment, with some important cognitive domains either under-represented or not explored at all. Although there was some evidence of sensitivity to nutritional supplementation in a number of domains (for example, executive function, spatial working memory), interpretation is currently difficult given the prevailing ‘scattergun approach’ for selecting cognitive tests. Specifically, the practice means that it is often difficult to distinguish between a boundary condition for a particular nutrient and a lack of task sensitivity. We argue that for significant future progress to be made, researchers need to pay much closer attention to existing human RCT and animal data, as well as to more basic issues surrounding task sensitivity, statistical power and type I error.


2020 ◽  
Vol 6 (2) ◽  
pp. 106-113
Author(s):  
A. M. Grjibovski ◽  
M. A. Gorbatova ◽  
A. N. Narkevich ◽  
K. A. Vinogradov

Sample size calculation in the planning phase is still uncommon in Russian research practice. This situation threatens the validity of conclusions and may introduce Type II errors, where a false null hypothesis is accepted due to a lack of statistical power to detect an existing difference between the means. Comparing two means using the unpaired Student's t-test is the most common statistical procedure in the Russian biomedical literature. However, calculations of the minimal required sample size or retrospective calculations of statistical power were observed in only a few publications. In this paper we demonstrate how to calculate the required sample size for comparing means in unpaired samples using WinPepi and Stata software. In addition, we produced tables of the minimal required sample size for studies in which two means are to be compared and body mass index and blood pressure are the variables of interest. The tables were constructed for unpaired samples for different levels of statistical power and standard deviations obtained from the literature.
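A minimal sketch of the underlying calculation, the same quantity WinPepi and Stata return, uses the normal approximation n = 2(z1-α/2 + z1-β)² σ² / Δ² per group. The blood-pressure inputs below (SD of 10 mmHg, minimal important difference of 5 mmHg) are illustrative and are not taken from the published tables.

```python
# Minimal required sample size per group for comparing two means,
# normal approximation: n = 2 * (z_{1-a/2} + z_{1-b})^2 * sigma^2 / delta^2.
from math import ceil
from scipy.stats import norm

def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return ceil(2 * (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2)

# e.g. blood pressure: SD = 10 mmHg, clinically relevant difference = 5 mmHg
print(n_per_group(sigma=10, delta=5))   # about 63 subjects per group
```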


2021 ◽  
pp. 096228022110605
Author(s):  
Ujjwal Das ◽  
Ranojoy Basu

We consider partially observed binary matched-pair data and assume that the incompletely observed subjects are missing at random. Within this missing-data framework, we propose an EM-algorithm-based approach to construct an interval estimator of the proportion difference that incorporates all the subjects. In conjunction with the proposed method, we also present two improvements to the interval estimator through correction factors. The performance of the three competing methods is then evaluated through extensive simulation. Recommendations are given based on each method's ability to preserve the Type I error rate across various sample sizes. Finally, the methods are illustrated on two real-world data sets. An R function is developed to implement the three proposed methods.
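To make the EM idea concrete, the sketch below re-estimates the 2 × 2 cell probabilities by allocating the partially observed subjects to cells in proportion to the current estimates, assuming missingness at random, and then reports the proportion difference. It is a simplified illustration only; the article's interval estimators, correction factors, and R function are not reproduced, and the toy counts are assumptions.

```python
# EM for partially observed matched pairs: E-step allocates subjects with only
# one member of the pair observed to cells in proportion to current estimates;
# M-step re-normalizes the expected counts into cell probabilities.
import numpy as np

def em_matched_pairs(n_complete, only_x, only_y, n_iter=200):
    """n_complete: 2x2 counts of fully observed pairs (rows = X, cols = Y);
    only_x, only_y: length-2 counts where only X (resp. only Y) was observed."""
    n_complete = np.asarray(n_complete, float)
    only_x, only_y = np.asarray(only_x, float), np.asarray(only_y, float)
    total = n_complete.sum() + only_x.sum() + only_y.sum()
    p = np.full((2, 2), 0.25)                      # initial cell probabilities
    for _ in range(n_iter):
        m = n_complete.copy()                       # E-step: expected cell counts
        m += only_x[:, None] * p / p.sum(axis=1, keepdims=True)
        m += only_y[None, :] * p / p.sum(axis=0, keepdims=True)
        p = m / total                               # M-step: update probabilities
    return p

# toy counts: complete pairs, then X-only and Y-only subjects
p_hat = em_matched_pairs([[40, 10], [5, 45]], only_x=[8, 6], only_y=[7, 9])
delta = p_hat[1, :].sum() - p_hat[:, 1].sum()       # P(X=1) - P(Y=1)
print(np.round(p_hat, 3), " delta =", round(delta, 3))
```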


Author(s):  
Shengjie Liu ◽  
Jun Gao ◽  
Yuling Zheng ◽  
Lei Huang ◽  
Fangrong Yan

Abstract. Bioequivalence (BE) studies are an integral component of the new drug development process and play an important role in the approval and marketing of generic drug products. However, existing design and evaluation methods fall largely within the frequentist framework, and few implement Bayesian ideas. Based on a bioequivalence predictive probability model and a sample re-estimation strategy, we propose a new Bayesian two-stage adaptive design and explore its application in bioequivalence testing. The new design differs from existing two-stage designs (such as Potvin’s methods B and C) in the following respects. First, it not only incorporates historical and expert information but also combines it flexibly with the experimental data to aid decision-making. Second, its sample re-estimation strategy is based on the ratio of the information available at the interim analysis to the total information, which is simpler to calculate than Potvin’s approach. Simulation results showed that the two-stage design can be combined with various stopping boundary functions, with differing results. Moreover, the proposed method requires a smaller sample size than Potvin’s method while keeping the Type I error rate below 0.05 and achieving a statistical power of at least 80%.
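A much simplified sketch of the predictive-probability idea at the heart of such a design: given an interim estimate of the log test/reference ratio, simulate the probability that the final 90% confidence interval will fall inside the (0.80, 1.25) bioequivalence limits. The normal model, vague prior, and all numerical inputs are assumptions for illustration; this is neither the proposed design nor Potvin's method.

```python
# Monte Carlo predictive probability of declaring bioequivalence at the final
# analysis, given an interim estimate of the log(T/R) ratio (vague prior,
# normal approximation; purely illustrative).
import numpy as np
from scipy.stats import t as t_dist

def predictive_prob_be(d1, se1, n1, n_total, n_sim=20000, seed=1):
    """d1, se1: interim estimate and SE of log(T/R) from n1 subjects (hypothetical inputs)."""
    rng = np.random.default_rng(seed)
    n2 = n_total - n1
    se2 = se1 * np.sqrt(n1 / n2)              # SE of the stage-2 estimate
    theta = rng.normal(d1, se1, n_sim)        # draws from the approximate posterior
    d2 = rng.normal(theta, se2)               # predictive stage-2 estimates
    d_pooled = (n1 * d1 + n2 * d2) / n_total
    se_pooled = se1 * np.sqrt(n1 / n_total)
    margin = t_dist.ppf(0.95, df=n_total - 2) * se_pooled
    lo, hi = d_pooled - margin, d_pooled + margin
    return np.mean((lo > np.log(0.8)) & (hi < np.log(1.25)))

print(predictive_prob_be(d1=0.05, se1=0.08, n1=24, n_total=48))
```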

