Statistical Conclusion Validity of Early Intervention Research with Handicapped Children

1989 ◽  
Vol 55 (6) ◽  
pp. 534-540 ◽  
Author(s):  
Kenneth J. Ottenbacher

The statistical conclusion validity of early intervention research studies was examined by conducting a post hoc power analysis of 484 statistical tests from 49 early intervention articles. Statistical power was determined using Cohen's (1977) criteria for small, medium, and large effect sizes. The analysis revealed that the median power to detect small, medium, and large effect sizes ranged from .08 to .46. Only 4% of the studies had adequate power (.80 or greater) to detect medium intervention effects, and 18% had adequate power to detect large effects. These power values suggest poor statistical conclusion validity in the analyzed research and should alert investigators to the possibility of Type II errors in the early intervention research literature. The argument is made that low statistical conclusion validity has practical consequences for program evaluation and cost-effectiveness determinations.
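As a rough illustration of the kind of post hoc power analysis described here, the sketch below computes the power of a two-sided, two-sample t-test at Cohen's conventional benchmarks (d = .2, .5, and .8) using Python's statsmodels. The per-group sample size of 20 is a hypothetical value chosen for illustration, not a figure from the study.

```python
# Hypothetical post hoc power analysis in the spirit of the study above.
# The per-group n of 20 is an illustrative assumption, not the paper's data.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    # Power of a two-sided, two-sample t-test at alpha = .05
    power = analysis.power(effect_size=d, nobs1=20, alpha=0.05)
    print(f"{label} effect (d = {d}): power = {power:.2f}")
```

With samples this small, even the large-effect test falls short of the conventional .80 threshold, which mirrors the low median power values reported above.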

2021 ◽  
pp. 39-55
Author(s):  
R. Barker Bausell

This chapter explores three empirical concepts (the p-value, the effect size, and statistical power) integral to the avoidance of false positive scientific results. Their relationship to reproducibility is explained in a nontechnical manner, without formulas or statistical jargon; p-values and statistical power are presented as probabilities ranging from zero to 1.0, with the two values of most interest to scientists being 0.05 (synonymous with a positive, hence publishable, result) and 0.80 (the most commonly recommended probability that a positive result will be obtained if the hypothesis that generated it is correct and the study is properly designed and conducted). Unfortunately, many scientists circumvent both by artifactually inflating the 0.05 criterion, overstating the available statistical power, and engaging in a number of other questionable research practices. These issues are discussed via statistical models from the genetic and psychological fields and then extended across a range of p-values, statistical power levels, effect sizes, and prevalences of “true” effects expected to exist in the research literature. A basic conclusion of these modeling efforts is that employing more stringent p-values and larger sample sizes constitutes the most effective statistical approach for increasing the reproducibility of published results in all empirically based scientific literatures. This chapter thus lays the necessary foundation for understanding and appreciating the effects of appropriate p-values, sufficient statistical power, realistic effect sizes, and the avoidance of questionable research practices on the production of reproducible results.
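The chapter's core modeling argument can be summarized in one standard calculation: the probability that a positive result is true (its positive predictive value, PPV) depends jointly on the alpha level, statistical power, and the prevalence of true effects among tested hypotheses. The sketch below is a minimal illustration of that relationship; the prevalence values are assumptions chosen for illustration, not the chapter's own figures.

```python
# Minimal PPV model: what fraction of "positive" results are true, given
# alpha, power, and the prevalence of true effects among tested hypotheses.
# The prevalence values below are illustrative assumptions.
def ppv(alpha: float, power: float, prevalence: float) -> float:
    """PPV = true positives / (true positives + false positives)."""
    true_pos = power * prevalence
    false_pos = alpha * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

for prevalence in (0.5, 0.1):
    for alpha in (0.05, 0.005):
        print(f"prevalence={prevalence}, alpha={alpha}: "
              f"PPV={ppv(alpha, 0.80, prevalence):.2f}")
```

Holding power at .80, tightening alpha from .05 to .005 raises the PPV sharply, especially when true effects are rare, which is the intuition behind the chapter's call for more stringent p-values.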


2007 ◽  
Vol 25 (23) ◽  
pp. 3482-3487 ◽  
Author(s):  
Philippe L. Bedard ◽  
Monika K. Krzyzanowska ◽  
Melania Pintilie ◽  
Ian F. Tannock

Purpose: To investigate the prevalence of underpowered randomized controlled trials (RCTs) presented at American Society of Clinical Oncology (ASCO) annual meetings.
Methods: We surveyed all two-arm phase III RCTs presented at ASCO annual meetings from 1995 to 2003 for which negative results were obtained. Post hoc calculations were performed using a power of 80% and an α level of .05 (two sided) to determine the sample sizes required to detect small, medium, and large effect sizes. For studies reporting a proportion or a time-to-event as the primary end point, effect size was expressed as an odds ratio (OR) or hazard ratio (HR), respectively, with a small effect size defined as OR/HR ≥ 1.3, a medium effect size as OR/HR ≥ 1.5, and a large effect size as OR/HR ≥ 2.0. Logistic regression was used to identify factors associated with lack of statistical power.
Results: Of the 423 negative RCTs for which post hoc sample size calculations could be performed, 45 (10.6%), 138 (32.6%), and 233 (55.1%) had adequate sample size to detect small, medium, and large effect sizes, respectively. Only 35 negative RCTs (7.1%) reported a reason for the inadequate sample size. In a multivariable model, studies presented at oral sessions (P = .0038), multicenter studies supported by a cooperative group (P < .0001), and studies with a time-to-event primary outcome (P < .0001) were more likely to have adequate sample size.
Conclusion: More than half of the negative RCTs presented at ASCO annual meetings do not have an adequate sample size to detect a medium-sized treatment effect.
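For time-to-event endpoints, a standard way to perform this kind of calculation is Schoenfeld's approximation for the number of events required by a two-sided log-rank test with equal allocation. The sketch below is a generic illustration of that computation for the three target hazard ratios, not necessarily the authors' exact method.

```python
# Schoenfeld's approximation for the total number of events required to
# detect a given hazard ratio with a two-sided log-rank test, assuming
# 1:1 allocation. A standard textbook formula, not the paper's own code.
import math
from scipy.stats import norm

def required_events(hr: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Total events needed to detect hazard ratio `hr` (equal allocation)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return 4 * (z_alpha + z_beta) ** 2 / math.log(hr) ** 2

for hr in (1.3, 1.5, 2.0):
    print(f"HR = {hr}: about {math.ceil(required_events(hr))} events")
```

On the order of 450, 190, and 65 events are needed for HRs of 1.3, 1.5, and 2.0, respectively, which makes clear why so few negative trials were adequately powered for small effects.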


2019 ◽  
Vol 50 (5-6) ◽  
pp. 292-304 ◽  
Author(s):  
Mario Wenzel ◽  
Marina Lind ◽  
Zarah Rowland ◽  
Daniela Zahn ◽  
Thomas Kubiak

Evidence on the existence of the ego depletion phenomenon, as well as on the size of the effect and its potential moderators and mediators, is ambiguous. Building on a crossover design that enables superior statistical power within a single study, we investigated the robustness of the ego depletion effect between and within subjects, as well as moderating and mediating influences of the ego depletion manipulation checks. Our results, based on a sample of 187 participants, demonstrated that (a) the between- and within-subject ego depletion effects had only negligible effect sizes, (b) there was large interindividual variability, and (c) this variability could not be explained by differences in ego depletion manipulation checks. We discuss the implications of these results and outline a future research agenda.
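The power advantage of the crossover design the authors exploit can be illustrated with a standard conversion: with correlation r between repeated measures, a between-subjects effect d corresponds to a within-subject (paired) effect of d / sqrt(2(1 − r)). The sketch below compares the two designs for the same participant pool; the effect size d = 0.3 and correlation r = 0.6 are illustrative assumptions, and only the n of 187 echoes the reported sample.

```python
# Illustrative comparison (not the authors' analysis) of between- vs
# within-subject power for the same participant pool.
import math
from statsmodels.stats.power import TTestIndPower, TTestPower

d, r, n = 0.3, 0.6, 187           # d and r are assumed for illustration
d_z = d / math.sqrt(2 * (1 - r))  # equivalent paired-design effect size

# Between subjects: split the pool into two groups of roughly n/2 each
between = TTestIndPower().power(effect_size=d, nobs1=n // 2, alpha=0.05)
# Within subject: every participant contributes to both conditions
within = TTestPower().power(effect_size=d_z, nobs=n, alpha=0.05)
print(f"between-subjects power: {between:.2f}, within-subject power: {within:.2f}")
```

Under these assumptions the within-subject test approaches certainty while the between-subjects test hovers near a coin flip, which is why a crossover design can detect, or confidently rule out, small effects with a modest sample.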


2016 ◽  
Vol 52 (9) ◽  
pp. 1409-1421 ◽  
Author(s):  
Jelena Obradović ◽  
Aisha K. Yousafzai ◽  
Jenna E. Finch ◽  
Muneera A. Rasheed

2001 ◽  
Vol 88 (3_suppl) ◽  
pp. 1194-1198 ◽  
Author(s):  
F. Stephen Bridges ◽  
C. Bennett Williamson ◽  
Donna Rae Jarvis

Of 75 letters “lost” in the Florida Panhandle, 33 (44%) were returned in the mail by their finders (the altruistic response). Addressees' affiliations were significantly associated with different rates of return: fewer letters addressed to the emotive Intercontinental Gay and Lesbian Outdoors Organization were returned than letters with nonemotive addressees. Gillett's (1996) technique for power analysis, applied to data from an earlier study, indicated that our sample of 75 subjects would still yield the desired power level of .80 for the likely effect sizes. Statistical power was .83, and the effect was medium in size at .34.
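Under the assumption that the power analysis concerned a df = 1 chi-square test of return rates (the abstract does not name the test), the reported figures can be approximately reproduced with statsmodels; the effect size w = .34 and N = 75 are taken from the abstract.

```python
# Approximate reproduction of the reported power, assuming a df = 1
# chi-square test (n_bins = 2); w = .34 and N = 75 come from the abstract.
from statsmodels.stats.power import GofChisquarePower

power = GofChisquarePower().power(effect_size=0.34, nobs=75, n_bins=2, alpha=0.05)
print(f"power = {power:.2f}")  # roughly .84, in line with the reported .83
```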


Author(s):  
H. S. Styn ◽  
S. M. Ellis

The determination of the significance of differences in means and of relationships between variables is important in many empirical studies. Usually only statistical significance is reported, which does not necessarily indicate an important (practically significant) difference or relationship. In studies based on probability samples, effect size indices should be reported in addition to statistical significance tests in order to comment on practical significance. Where complete populations or convenience samples are used, the determination of statistical significance is, strictly speaking, no longer relevant, and effect size indices can instead serve as a basis for judging significance. This article discusses the use of effect size indices to establish practical significance, shows how these indices are employed in a few fields of statistical application, and reviews how they are treated in the statistical literature and in computer packages. The use of effect sizes is illustrated with a few examples from the research literature.
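A minimal sketch of the practice the article recommends, using simulated data: report an effect size index (here Cohen's d with a pooled standard deviation) alongside, or for non-probability samples in place of, the significance test.

```python
# Report an effect size index alongside the p-value. Data are simulated
# purely for illustration; the means and SDs are arbitrary assumptions.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
a = rng.normal(loc=100.0, scale=15.0, size=60)
b = rng.normal(loc=106.0, scale=15.0, size=60)

t, p = ttest_ind(a, b)

# Cohen's d with a pooled standard deviation
pooled_sd = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                    / (len(a) + len(b) - 2))
d = (b.mean() - a.mean()) / pooled_sd
print(f"p = {p:.3f}, Cohen's d = {d:.2f}")  # judge practical significance from d
```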

