Comparison of Power for Multiple Comparison Procedures

2013 ◽  
Vol 4 (1) ◽  
pp. 20 ◽  
Author(s):  
Robert S. Rodger ◽  
Mark Roberts

The number of methods for evaluating, and possibly making statistical decisions about, null contrasts - or their small sub-set, multiple comparisons - has grown extensively since the early 1950s. That demonstrates how important the subject is, but most of the growth consists of modest variations of the early methods. This paper examines nine fairly basic procedures, six of which are methods designed to evaluate contrasts chosen post hoc, i.e., after an examination of the test data. Three of these use experimentwise or familywise type 1 error rates (Scheffé 1953, Tukey 1953, Newman-Keuls 1939 and 1952), two use decision-based type 1 error rates (Duncan 1951 and Rodger 1975a), and one (Fisher's LSD 1935) uses a mixture of the two type 1 error rate definitions. The other three methods examined are for evaluating, and possibly deciding about, a limited number of null contrasts that have been chosen independently of the sample data - preferably before the data are collected. One of these (planned t-tests) uses decision-based type 1 error rates, and the other two (one based on Bonferroni's Inequality 1936, the other Dunnett's 1964 Many-One procedure) use a familywise type 1 error rate. The use of these different type 1 error rate definitions creates quite large discrepancies in the capacities of the methods to detect true non-zero effects in the contrasts being evaluated. This article describes those discrepancies in power and, especially, how they are exacerbated by increases in the size of an investigation (i.e., an increase in J, the number of samples being examined). The capacity of a multiple contrast procedure to 'unpick' 'true' differences from the sample data is also influenced by the type of contrast the procedure permits. 
For example, multiple range procedures (such as that of Newman-Keuls and that of Duncan) permit only comparisons (i.e., two-group differences), and that greatly limits their discriminating capacity (which is not, technically speaking, their power). Many methods (those of Scheffé, Tukey's HSD, Newman-Keuls, Fisher's LSD, Bonferroni and Dunnett) place their emphasis on one particular question, "Are there any differences at all among the groups?" Some other procedures (those of Duncan, Rodger and Planned Contrasts) concentrate on individual contrasts, and so are more concerned with how many false null contrasts the method can detect. This results in two basically different definitions of detection capacity. Finally, there is a categorical difference between what post hoc methods and those evaluating pre-planned contrasts can find. The success of the latter depends on how wisely (or how honestly well informed) the user has been in planning the limited number of statistically revealing contrasts to test. That can greatly affect the method's discriminating success, but it is often not included in power evaluations. These matters are elaborated upon as they arise in the exposition below. DOI:10.2458/azu_jmmss_v4i1_rodger
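The power cost of familywise control described in this abstract grows with J. A minimal Python sketch (an illustration under an assumed independence of tests, not a procedure from the paper) shows how the Bonferroni per-test alpha shrinks, and the uncorrected familywise error rate balloons, as J increases:

```python
# Illustration only: familywise vs. per-test type 1 error rates as the
# number of samples J (hence pairwise comparisons m) grows. Assumes
# independent tests, which real pairwise comparisons are not exactly.

def familywise_rate(alpha_per_test: float, m: int) -> float:
    """Probability of at least one type 1 error among m independent tests."""
    return 1.0 - (1.0 - alpha_per_test) ** m

def bonferroni_alpha(alpha_family: float, m: int) -> float:
    """Per-test alpha that bounds the familywise rate at alpha_family."""
    return alpha_family / m

for J in (3, 5, 10):
    m = J * (J - 1) // 2  # number of pairwise comparisons among J means
    print(f"J={J:2d}  m={m:2d}  "
          f"Bonferroni per-test alpha={bonferroni_alpha(0.05, m):.5f}  "
          f"uncorrected familywise rate={familywise_rate(0.05, m):.3f}")
```

At J = 10 the per-test alpha has fallen to about 0.001, which is one numerical face of the power discrepancy the article analyses.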



1986 ◽  
Vol 20 (1) ◽  
pp. 46-54 ◽  
Author(s):  
Wayne Hall ◽  
Kevin D. Bird

Methods are presented for using linear contrasts to make inferences about differences between the means of several populations on continuous dependent variables. These methods control the experimentwise error rate (the probability of committing one or more type 1 errors in the set of decisions made within the experiment) for linear contrasts which compare some sub-sets of populations with others. Appropriate methods are outlined for testing contrasts which have been planned (i.e., specified independently of the data on which they are tested) and defined post hoc (i.e., after an inspection of the data). We show how these methods can be adapted to the analysis of data from factorial analysis of variance research designs.
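The kind of linear contrast the abstract describes reduces to a weighted sum of group means. A brief sketch (the group names, means, and error mean square below are invented for illustration, and this shows only the contrast estimate and its standard error, not any particular error-rate correction from the paper):

```python
# Hypothetical planned contrast: control group vs. the average of two
# treatment groups. All numbers are assumptions for illustration.
import math

means = {"control": 10.0, "drugA": 14.0, "drugB": 16.0}
coeffs = {"control": -1.0, "drugA": 0.5, "drugB": 0.5}  # weights sum to 0
n_per_group = 12
mse = 9.0  # error mean square from the ANOVA (assumed)

# Contrast estimate: psi = sum of c_j * mean_j
psi = sum(coeffs[g] * means[g] for g in means)

# Standard error: sqrt(MSE * sum of c_j^2 / n_j)
se = math.sqrt(mse * sum(c * c / n_per_group for c in coeffs.values()))

t = psi / se  # compared against a critical value chosen by the procedure
print(f"psi={psi:.2f}  se={se:.4f}  t={t:.3f}")
```

Which critical value t is compared against (planned t, Scheffé, Bonferroni-adjusted, etc.) is exactly the choice the paper's methods govern.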


1986 ◽  
Vol 20 (2) ◽  
pp. 189-200 ◽  
Author(s):  
Kevin D. Bird ◽  
Wayne Hall

Statistical power is neglected in much psychiatric research, with the consequence that many studies do not provide a reasonable chance of detecting differences between groups if they exist in the population. This paper attempts to improve current practice by providing an introduction to the essential quantities required for performing a power analysis (sample size, effect size, type 1 and type 2 error rates). We provide simplified tables for estimating the sample size required to detect a specified size of effect with a type 1 error rate of α and a type 2 error rate of β, and for estimating the power provided by a given sample size for detecting a specified size of effect with a type 1 error rate of α. We show how to modify these tables to perform power analyses for multiple comparisons in univariate and some multivariate designs. Power analyses for each of these types of design are illustrated by examples.
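The four quantities the abstract names (sample size, effect size, and the type 1 and type 2 error rates) are linked by a standard normal-approximation formula for a two-group comparison. A hedged sketch (this is the conventional textbook approximation, not necessarily the exact basis of the paper's tables):

```python
# Normal-approximation sample size per group for a two-sided two-group
# comparison of means with standardized effect size d, type 1 rate alpha,
# and type 2 rate beta. A sketch of the standard formula:
#   n = 2 * ((z_{1-alpha/2} + z_{1-beta}) / d)^2
import math
from statistics import NormalDist

def n_per_group(d: float, alpha: float = 0.05, beta: float = 0.20) -> int:
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical z
    z_beta = NormalDist().inv_cdf(1 - beta)        # z for power = 1 - beta
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# A "medium" standardized effect (d = 0.5) at alpha = .05 and power = .80:
print(n_per_group(0.5))
```

Exact t-based tables give slightly larger values than this normal approximation, which is one reason published tables are preferred for small samples.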


2021 ◽  
Vol 82 ◽  
pp. 99
Author(s):  
Daniel Joseph Tancredi ◽  
Danielle J. Harvey ◽  
Suzette Smiley-Jewell ◽  
Danh V. Nguyen
2011 ◽  
Vol 2 (2) ◽  
pp. 63
Author(s):  
Mark Roberts

R.S. Rodger fully developed, more than three decades ago, probably the most powerful methodology which exists for detecting real differences among population means (μ’s) following an analysis of variance. Since it is a post hoc method, a theoretically infinite number of potential statistical decisions may be considered, but Rodger’s method limits the final number of decisions to a single set which contains exactly J-1 (i.e., v1, the number of means in a study minus one) of them. It also constrains the number of these J-1 decisions that may be declared statistically “significant.” Rodger’s method utilizes a decision-based error rate, and ensures that the expected rate of rejecting null contrasts that should not have been rejected (i.e., the type 1 error rate) will be less than or equal to either five or one percent, regardless of the number of contrasts examined by a researcher prior to finally deciding upon the scientifically optimal set of decisions. The greatest virtue of Rodger's method, though, is not its considerable power, but its explicit specification of the magnitude of the differences that the researcher will claim to exist among the population parameters. The implied true means that this method calculates are the theoretical population μ’s that are logically implied, and mathematically entailed, by the J-1 statistical decisions that the researcher has made. These implied true means can assist other researchers in confirming or disconfirming population parameter claims made by those who use Rodger’s method. A free computer program (SPS) that instantiates Rodger’s method, and thereby makes its use accessible to every researcher who has access to a Windows-based computer, is available from the author. DOI:10.2458/azu_jmmss_v2i2_roberts



2019 ◽  
Vol 28 (4) ◽  
pp. 1411-1431 ◽  
Author(s):  
Lauren Bislick ◽  
William D. Hula

Purpose: This retrospective analysis examined group differences in error rate across 4 contextual variables (clusters vs. singletons, syllable position, number of syllables, and articulatory phonetic features) in adults with apraxia of speech (AOS) and adults with aphasia only. Group differences in the distribution of error type across contextual variables were also examined. Method: Ten individuals with acquired AOS and aphasia and 11 individuals with aphasia participated in this study. In the context of a 2-group experimental design, the influence of 4 contextual variables on error rate and error type distribution was examined via repetition of 29 multisyllabic words. Error rates were analyzed using Bayesian methods, whereas distribution of error type was examined via descriptive statistics. Results: There were 4 findings of robust differences between the 2 groups. These differences were found for syllable position, number of syllables, manner of articulation, and voicing. Group differences were less robust for clusters versus singletons and place of articulation. Results of error type distribution show a high proportion of distortion and substitution errors in speakers with AOS and a high proportion of substitution and omission errors in speakers with aphasia. Conclusion: Findings add to the continued effort to improve the understanding and assessment of AOS and aphasia. Several contextual variables more consistently influenced breakdown in participants with AOS compared to participants with aphasia and should be considered during the diagnostic process. Supplemental Material https://doi.org/10.23641/asha.9701690


2014 ◽  
Vol 56 (4) ◽  
pp. 614-630 ◽  
Author(s):  
Alexandra C. Graf ◽  
Peter Bauer ◽  
Ekkehard Glimm ◽  
Franz Koenig

2016 ◽  
Vol 148 (8) ◽  
pp. 24-31
Author(s):  
Kayode Ayinde ◽  
John Olatunde ◽  
Gbenga Sunday
