Simultaneous Multiple Comparison Procedures in Psychiatric Research

1986 ◽  
Vol 20 (1) ◽  
pp. 46-54 ◽  
Author(s):  
Wayne Hall ◽  
Kevin D. Bird

Methods are presented for using linear contrasts to make inferences about differences between the means of several populations on continuous dependent variables. These methods control the experimentwise error rate (the probability of committing one or more type 1 errors in the set of decisions made within the experiment) for linear contrasts which compare some sub-sets of populations with others. Appropriate methods are outlined for testing contrasts which have been planned (i.e., specified independently of the data on which they are tested) and defined post hoc (i.e., after an inspection of the data). We show how these methods can be adapted to the analysis of data from factorial analysis of variance research designs.
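The experimentwise error rate defined above can be illustrated with a short sketch (illustrative only, not from the paper): for m independent tests each run at per-test level α, the probability of at least one type 1 error is 1 − (1 − α)^m, and a Bonferroni adjustment (testing each contrast at α/m) holds that probability at roughly α.

```python
# Illustrative sketch: how the experimentwise type 1 error rate grows
# with the number of independent contrasts tested, and how a Bonferroni
# adjustment (alpha / m per contrast) bounds it at about alpha.

def experimentwise_rate(alpha: float, m: int) -> float:
    """P(at least one type 1 error) for m independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** m

def bonferroni_rate(alpha: float, m: int) -> float:
    """Experimentwise rate when each of the m tests is run at alpha / m."""
    return experimentwise_rate(alpha / m, m)

for m in (1, 3, 10):
    print(m, round(experimentwise_rate(0.05, m), 3), round(bonferroni_rate(0.05, m), 3))
# The unadjusted rate climbs toward 0.401 at m = 10; Bonferroni stays near 0.049.
```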

2013 ◽  
Vol 4 (1) ◽  
pp. 20 ◽  
Author(s):  
Robert S. Rodger ◽  
Mark Roberts

The number of methods for evaluating, and possibly making statistical decisions about, null contrasts - or their small sub-set, multiple comparisons - has grown extensively since the early 1950s. That demonstrates how important the subject is, but most of the growth consists of modest variations of the early methods. This paper examines nine fairly basic procedures, six of which are methods designed to evaluate contrasts chosen post hoc, i.e., after an examination of the test data. Three of these use experimentwise or familywise type 1 error rates (Scheffé 1953, Tukey 1953, Newman-Keuls 1939 and 1952), two use decision-based type 1 error rates (Duncan 1951 and Rodger 1975a), and one (Fisher's LSD 1935) uses a mixture of the two type 1 error rate definitions. The other three methods examined are for evaluating, and possibly deciding about, a limited number of null contrasts that have been chosen independently of the sample data - preferably before the data are collected. One of these (planned t-tests) uses decision-based type 1 error rates and the other two (one based on Bonferroni's Inequality 1936, the other on Dunnett's 1964 Many-One procedure) use a familywise type 1 error rate. The use of these different type 1 error rate definitions creates quite large discrepancies in the capacities of the methods to detect true non-zero effects in the contrasts being evaluated. This article describes those discrepancies in power and, especially, how they are exacerbated by increases in the size of an investigation (i.e., an increase in J, the number of samples being examined). The capacity of a multiple contrast procedure to 'unpick' true differences from the sample data is also influenced by the type of contrast the procedure permits. 
For example, multiple range procedures (such as those of Newman-Keuls and Duncan) permit only comparisons (i.e., two-group differences), and that greatly limits their discriminating capacity (which is not, technically speaking, their power). Many methods (those of Scheffé, Tukey's HSD, Newman-Keuls, Fisher's LSD, Bonferroni and Dunnett) place their emphasis on one particular question, "Are there any differences at all among the groups?" Other procedures (those of Duncan, Rodger and planned contrasts) concentrate on individual contrasts, and so are more concerned with how many false null contrasts the method can detect. This results in two basically different definitions of detection capacity. Finally, there is a categorical difference between what post hoc methods and methods evaluating pre-planned contrasts can find. The success of the latter depends on how wisely (or how honestly well informed) the user has been in planning the limited number of statistically revealing contrasts to test. That can greatly affect the method's discriminating success, but it is often not included in power evaluations. These matters are elaborated upon as they arise in the exposition below. DOI: 10.2458/azu_jmmss_v4i1_rodger
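The size effect described above can be seen in a small sketch (illustrative, not the authors' calculation): under familywise control of the Bonferroni type, the per-contrast level must shrink as J grows, because the number of pairwise comparisons among J means is J(J − 1)/2.

```python
# Illustrative only: the number of pairwise comparisons grows
# quadratically with J, so a familywise (Bonferroni-style) correction
# forces an ever-smaller per-contrast alpha as the investigation grows.

def n_pairwise(j: int) -> int:
    """Number of two-group comparisons among J means."""
    return j * (j - 1) // 2

def per_contrast_alpha(alpha: float, j: int) -> float:
    """Bonferroni per-contrast level that holds the familywise rate at alpha."""
    return alpha / n_pairwise(j)

for j in (3, 5, 10):
    print(j, n_pairwise(j), per_contrast_alpha(0.05, j))
# J = 10 already demands alpha = 0.05 / 45, about 0.0011 per comparison.
```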


2005 ◽  
Vol 100 (2) ◽  
pp. 488-492 ◽  
Author(s):  
Neala Ambrosi-Randić ◽  
Alessandra Pokrajac-Bulian ◽  
Vladimir Takšić

320 Croatian female students (M = 20.4 yr.) were recruited to examine the validity and reliability of figural scales using different numbers of stimuli (3, 5, 7, and 9) and different orders of presentation (serial and nonserial). A two-way analysis of variance (4 numbers × 2 orders of stimuli) was performed on ratings of current self-size and ideal size as dependent variables. Analysis indicated a significant main effect of number of stimuli. This, together with post hoc tests, indicated that ratings on the three-figure scale differed significantly from those on the scales with more figures, which in turn did not differ among themselves. The main effect of order of stimuli and the interaction were not significant. The results support the hypothesis that the optimal number of figures on a scale is seven plus or minus two.


2019 ◽  
Vol 64 (1) ◽  
pp. 56-60 ◽  
Author(s):  
Francis L. Huang

Multivariate analysis of variance (MANOVA) is a statistical procedure commonly used in fields such as education and psychology. However, MANOVA’s popularity may actually be for the wrong reasons. The large majority of published research using MANOVA focuses on univariate research questions rather than on the multivariate questions that MANOVA is said to specifically address. Given the more complicated and limited nature of interpreting MANOVA effects (which researchers may not actually be interested in, judging by the post hoc strategies they employ), and given that various flexible and well-known statistical alternatives are available, I suggest that researchers use these better-known, robust, and flexible procedures instead, matched to the research question of interest. Just because a researcher has multiple dependent variables of interest does not mean that a MANOVA should be used at all.


1986 ◽  
Vol 20 (2) ◽  
pp. 189-200 ◽  
Author(s):  
Kevin D. Bird ◽  
Wayne Hall

Statistical power is neglected in much psychiatric research, with the consequence that many studies do not provide a reasonable chance of detecting differences between groups if they exist in the population. This paper attempts to improve current practice by providing an introduction to the essential quantities required for performing a power analysis (sample size, effect size, type 1 and type 2 error rates). We provide simplified tables for estimating the sample size required to detect a specified size of effect with a type 1 error rate of α and a type 2 error rate of β, and for estimating the power provided by a given sample size for detecting a specified size of effect with a type 1 error rate of α. We show how to modify these tables to perform power analyses for multiple comparisons in univariate and some multivariate designs. Power analyses for each of these types of design are illustrated by examples.
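The sample-size calculation tabulated in such papers can be sketched with a normal approximation (a rough illustration using the same four quantities, not the authors' exact t-based tables): for a two-group comparison, n per group ≈ 2((z₁₋α/₂ + z₁₋β)/d)², where d is the standardized effect size.

```python
# Normal-approximation sketch of a two-group sample-size calculation
# (illustrative; exact tables use the noncentral t distribution).
# Inputs mirror the quantities named in the abstract: effect size d,
# type 1 error rate alpha, type 2 error rate beta.
import math
from statistics import NormalDist

def n_per_group(d: float, alpha: float = 0.05, beta: float = 0.20) -> int:
    """Approximate n per group to detect standardized difference d
    with a two-sided test at level alpha and power 1 - beta."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)   # e.g. 1.96 for alpha = 0.05
    z_b = z.inv_cdf(1 - beta)        # e.g. 0.84 for power = 0.80
    return math.ceil(2 * ((z_a + z_b) / d) ** 2)

print(n_per_group(0.5))   # medium effect, alpha .05, power .80 -> 63
print(n_per_group(0.2))   # a small effect needs far larger groups
```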


2021 ◽  
Vol 82 ◽  
pp. 99 ◽  
Author(s):  
Daniel Joseph Tancredi ◽  
Danielle J. Harvey ◽  
Suzette Smiley-Jewell ◽  
Danh V. Nguyen

2020 ◽  
Vol 44 ◽  
Author(s):  
Ben Dêivide de Oliveira Batista ◽  
Daniel Furtado Ferreira

ABSTRACT In the search for an ideal multiple comparison procedure, this study aimed to develop two tests, similar to the Tukey and SNK tests, based on the distribution of the externally studentized range. The tests are named Tukey Midrange (TM) and SNK Midrange (SNKM). The tests were evaluated on experimentwise error rate and power, using Monte Carlo simulation. The results showed that the TM test could be an alternative to the Tukey test, since it presented superior performance in some simulated scenarios. The SNKM test, on the other hand, performed worse than the SNK test.
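A Monte Carlo evaluation of an experimentwise error rate, of the kind described above, can be sketched in miniature (illustrative only: this sketch uses simple z-tests with known σ = 1 in place of studentized-range statistics). Simulate J equal groups under the complete null and count how often at least one unadjusted pairwise test rejects.

```python
# Miniature Monte Carlo check of an experimentwise error rate
# (illustrative; the paper's tests are based on the studentized range,
# this sketch uses unadjusted pairwise z-tests with known sigma = 1).
import math
import random
from itertools import combinations

def experimentwise_rate(j: int, n: int, sims: int, seed: int = 1) -> float:
    """Estimate P(at least one pairwise rejection) under the complete null."""
    rng = random.Random(seed)
    crit = 1.959964              # two-sided z critical value at alpha = 0.05
    se = math.sqrt(2.0 / n)      # SE of a difference of two means, sigma = 1
    hits = 0
    for _ in range(sims):
        means = [sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
                 for _ in range(j)]
        if any(abs(a - b) / se > crit for a, b in combinations(means, 2)):
            hits += 1
    return hits / sims

rate = experimentwise_rate(j=5, n=20, sims=2000)
print(rate)  # well above the nominal 0.05 of a single comparison
```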


2011 ◽  
Vol 2 (2) ◽  
pp. 63 ◽  
Author(s):  
Mark Roberts

R.S. Rodger fully developed, more than three decades ago, probably the most powerful methodology that exists for detecting real differences among population means (μ’s) following an analysis of variance. Since it is a post hoc method, a theoretically infinite number of potential statistical decisions may be considered, but Rodger’s method limits the final number of decisions to a single set which contains exactly J-1 (i.e., ν1, the number of means in a study minus one) of them. It also constrains the number of these J-1 decisions that may be declared statistically “significant.” Rodger’s method utilizes a decision-based error rate, and ensures that the expected rate of rejecting null contrasts that should not have been rejected (i.e., the type 1 error rate) will be less than or equal to either five or one percent, regardless of the number of contrasts examined by a researcher prior to finally deciding upon the scientifically optimal set of decisions. The greatest virtue of Rodger's method, though, is not its considerable power, but its explicit specification of the magnitude of the differences that the researcher will claim to exist among the population parameters. The implied true means that this method calculates are the theoretical population μ’s that are logically implied, and mathematically entailed, by the J-1 statistical decisions that the researcher has made. These implied true means can assist other researchers in confirming or disconfirming population parameter claims made by those who use Rodger’s method. A free computer program (SPS) that instantiates Rodger’s method, and thereby makes its use accessible to every researcher who has access to a Windows-based computer, is available from the author. DOI: 10.2458/azu_jmmss_v2i2_roberts

