Obtaining Evidence for No Effect

2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Zoltan Dienes

Obtaining evidence that something does not exist requires knowing how big it would be were it to exist. Testing a theory that predicts an effect thus entails specifying the range of effect sizes consistent with the theory, in order to know when the evidence counts against the theory. Indeed, a theoretically relevant effect size must be specified for power calculations, equivalence testing, and Bayes factors in order that the inferential statistics test the theory. Specifying relevant effect sizes for power, or the equivalence region for equivalence testing, or the scale factor for Bayes factors, is necessary for many journal formats, such as registered reports, and should be necessary for all articles that use hypothesis testing. Yet there is little systematic advice on how to approach this problem. This article offers some principles and practical advice for specifying theoretically relevant effect sizes for hypothesis testing.
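As a concrete illustration of how a specified effect size enters the inference, the sketch below computes a Bayes factor in the style Dienes describes: a point null is compared against a model of H1 represented by a half-normal distribution whose scale is set to the effect size the theory predicts. The observed mean difference, its standard error, and the predicted effect are illustrative values, not taken from the article.

```python
import numpy as np
from scipy import stats, integrate

def bayes_factor_halfnormal(mean_obs, se_obs, predicted_effect):
    """Bayes factor for H1 (half-normal prior scaled by the predicted effect)
    against H0 (point null), using a normal likelihood for the observed
    effect estimate."""
    # Likelihood of the observed estimate given a population effect delta
    def likelihood(delta):
        return stats.norm.pdf(mean_obs, loc=delta, scale=se_obs)

    # Marginal likelihood under H1: integrate the likelihood over the
    # half-normal model of plausible effect sizes (scale = predicted effect)
    def integrand(delta):
        prior = 2 * stats.norm.pdf(delta, loc=0.0, scale=predicted_effect)
        return likelihood(delta) * prior

    marginal_h1, _ = integrate.quad(integrand, 0.0, np.inf)
    marginal_h0 = likelihood(0.0)   # point null: delta = 0
    return marginal_h1 / marginal_h0

# Illustrative numbers: an observed difference of 2 units (SE = 3) tested
# against a theory that predicts effects of roughly 5 units.
print(bayes_factor_halfnormal(mean_obs=2.0, se_obs=3.0, predicted_effect=5.0))
```

A Bayes factor well below 1 here would count as evidence against the theory precisely because the theory committed to a range of plausible effect sizes in advance.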


2020 ◽  
Author(s):  
Robbie Cornelis Maria van Aert ◽  
Joris Mulder

Meta-analysis methods are used to synthesize results of multiple studies on the same topic. The most frequently used statistical model in meta-analysis is the random-effects model, containing parameters for the average effect, the between-study variance in the primary studies' true effect sizes, and random effects for the study-specific effects. We propose Bayesian hypothesis testing and estimation methods using the marginalized random-effects meta-analysis (MAREMA) model, in which the study-specific true effects are regarded as nuisance parameters and integrated out of the model. A flat prior distribution is placed on the overall effect size for estimation, and a proper unit-information prior for the overall effect size is proposed for hypothesis testing. For the between-study variance in true effect size, a proper uniform prior is placed on the proportion of total variance that can be attributed to between-study variability. Bayes factors are used for hypothesis testing, allowing both point and one-sided hypotheses to be tested. The proposed methodology has several attractive properties. First, the MAREMA model encompasses models with a zero, negative, and positive between-study variance, which enables testing a zero between-study variance because it is not a boundary problem. Second, the methodology is suitable for default Bayesian meta-analyses as it requires no prior information about the unknown parameters. Third, the methodology can be used even in the extreme case when only two studies are available, because Bayes factors are not based on large-sample theory. We illustrate the developed methods by applying them to two meta-analyses and introduce easy-to-use software in the R package BFpack to compute the proposed Bayes factors.
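A minimal sketch of the marginalized likelihood the MAREMA model builds on: once the study-specific effects are integrated out, each observed effect y_i is normal with mean mu and variance v_i + tau^2. The code below computes a Bayes factor for mu = 0 against a normal prior on mu, holding tau^2 fixed at a plug-in value for simplicity; the authors' full method instead places a prior on the proportion of total variance due to between-study variability and is implemented in the R package BFpack. The effect sizes, sampling variances, and prior scale here are illustrative.

```python
import numpy as np
from scipy import stats, integrate

# Illustrative data: observed effect sizes and their sampling variances
y = np.array([0.30, 0.10, 0.45, 0.25])
v = np.array([0.04, 0.06, 0.05, 0.03])
tau2 = 0.02        # plug-in between-study variance (fixed for this sketch)

def marginal_loglik(mu):
    """Log-likelihood of the marginalized model: y_i ~ N(mu, v_i + tau2)."""
    return stats.norm.logpdf(y, loc=mu, scale=np.sqrt(v + tau2)).sum()

prior_sd = 0.5     # illustrative prior scale on the overall effect size

def integrand(mu):
    # Likelihood weighted by the prior on the overall effect under H1
    return np.exp(marginal_loglik(mu)) * stats.norm.pdf(mu, 0.0, prior_sd)

marginal_h1, _ = integrate.quad(integrand, -np.inf, np.inf)
marginal_h0 = np.exp(marginal_loglik(0.0))   # point null: mu = 0
print("BF10 =", marginal_h1 / marginal_h0)
```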


2019 ◽  
Vol 28 (4) ◽  
pp. 468-485 ◽  
Author(s):  
Paul HP Hanel ◽  
David MA Mehler

Transparent communication of research is key to fostering understanding within and beyond the scientific community. An increased focus on reporting effect sizes, in addition to p value–based significance statements or Bayes factors, may improve scientific communication with the general public. Across three studies (N = 652), we compared subjective informativeness ratings for five effect sizes, the Bayes factor, and commonly used significance statements. Results showed that Cohen's U3 was rated as most informative. For example, 440 participants (69%) found U3 more informative than Cohen's d, while 95 (15%) found d more informative than U3, and 99 participants (16%) found both effect sizes equally informative. This effect was not moderated by level of education. We therefore suggest that, in general, Cohen's U3 be used when scientific findings are communicated. However, the choice of effect size may vary depending on what a researcher wants to highlight (e.g. differences or similarities).
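For reference, Cohen's U3 is the proportion of the treatment group scoring above the mean of the control group; under normality with equal variances it has the closed form U3 = Φ(d). A small sketch with an illustrative d:

```python
from scipy.stats import norm

def cohens_u3(d):
    """Cohen's U3: proportion of the treatment group above the control-group
    mean, assuming normal distributions with equal variances."""
    return norm.cdf(d)

# Example: d = 0.5 corresponds to roughly 69% of the treated group scoring
# above the control-group mean.
print(f"U3 for d = 0.5: {cohens_u3(0.5):.2%}")
```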


2016 ◽  
Author(s):  
Felix D. Schönbrodt ◽  
Eric-Jan Wagenmakers ◽  
Michael Zehetleitner ◽  
Marco Perugini

Unplanned optional stopping rules have been criticized for inflating Type I error rates under the null hypothesis significance testing (NHST) paradigm. Despite these criticisms, this research practice is not uncommon, probably because it appeals to researchers' intuition to collect more data in order to push an indecisive result into a decisive region. In this contribution we investigate the properties of a procedure for Bayesian hypothesis testing that allows optional stopping with unlimited multiple testing, even after each participant. In this procedure, which we call Sequential Bayes Factors (SBF), Bayes factors are computed until an a priori defined level of evidence is reached. This allows flexible sampling plans and does not depend on correct effect size guesses in an a priori power analysis. We investigated the long-term rate of misleading evidence, the average expected sample sizes, and the bias of effect size estimates when an SBF design is applied to a test of mean differences between two groups. Compared to optimal NHST, the SBF design typically needs 50% to 70% smaller samples to reach a conclusion about the presence of an effect, while having the same or a lower long-term rate of wrong inference.
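The procedure can be sketched as follows: after each added observation (or batch), recompute the Bayes factor and stop once it crosses a pre-set evidence threshold in either direction. The sketch below uses a simple normal-approximation Bayes factor on the standardized mean difference rather than the default Bayes factor used in the article; the thresholds, prior scale, safety cap, and simulated data are all illustrative.

```python
import numpy as np
from scipy import stats, integrate

rng = np.random.default_rng(1)

def bf10_mean_difference(x, y, prior_sd=1.0):
    """Normal-approximation Bayes factor for a two-group mean difference:
    H1 places a normal prior (SD = prior_sd) on the standardized difference."""
    nx, ny = len(x), len(y)
    sp = np.sqrt(((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1))
                 / (nx + ny - 2))
    d_obs = (np.mean(x) - np.mean(y)) / sp     # observed standardized difference
    se = np.sqrt(1 / nx + 1 / ny)              # approximate standard error of d_obs
    like = lambda d: stats.norm.pdf(d_obs, loc=d, scale=se)
    marg_h1, _ = integrate.quad(lambda d: like(d) * stats.norm.pdf(d, 0, prior_sd),
                                -np.inf, np.inf)
    return marg_h1 / like(0.0)

# Sequential design: add one participant per group until the Bayes factor
# crosses a symmetric evidence threshold (or a safety cap on n is reached).
x = list(rng.normal(0.4, 1, 10))   # illustrative true effect of d = 0.4
y = list(rng.normal(0.0, 1, 10))
bf = bf10_mean_difference(np.array(x), np.array(y))
while 1 / 10 < bf < 10 and len(x) < 1000:
    x.append(rng.normal(0.4, 1))
    y.append(rng.normal(0.0, 1))
    bf = bf10_mean_difference(np.array(x), np.array(y))
print(f"Stopped at n = {len(x)} per group with BF10 = {bf:.2f}")
```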


1998 ◽  
Vol 21 (2) ◽  
pp. 210-211 ◽  
Author(s):  
Stephan Lewandowsky ◽  
Murray Maybery

We take up two issues discussed by Chow: the claim by critics of hypothesis testing that the null hypothesis (H0) is always false, and the claim that reporting effect sizes is more appropriate than relying on statistical significance. Concerning the former, we agree with Chow's sentiment despite noting serious shortcomings in his discussion. Concerning the latter, we agree with Chow that effect size need not translate into scientific relevance, and furthermore reiterate that with small samples effect size measures cannot substitute for significance.


Author(s):  
Robbie C. M. van Aert ◽  
Joris Mulder

Abstract. Meta-analysis methods are used to synthesize results of multiple studies on the same topic. The most frequently used statistical model in meta-analysis is the random-effects model, containing parameters for the overall effect, the between-study variance in the primary studies' true effect sizes, and random effects for the study-specific effects. We propose Bayesian hypothesis testing and estimation methods using the marginalized random-effects meta-analysis (MAREMA) model, in which the study-specific true effects are regarded as nuisance parameters and integrated out of the model. We propose using a flat prior distribution on the overall effect size for estimation and a proper unit-information prior for the overall effect size for hypothesis testing. For the between-study variance (which can attain negative values under the MAREMA model), a proper uniform prior is placed on the proportion of total variance that can be attributed to between-study variability. Bayes factors are used for hypothesis testing, allowing both point and one-sided hypotheses to be tested. The proposed methodology has several attractive properties. First, the MAREMA model encompasses models with a zero, negative, and positive between-study variance, which enables testing a zero between-study variance because it is not a boundary problem. Second, the methodology is suitable for default Bayesian meta-analyses as it requires no prior information about the unknown parameters. Third, the proposed Bayes factors can be used even in the extreme case when only two studies are available, because Bayes factors are not based on large-sample theory. We illustrate the developed methods by applying them to two meta-analyses and introduce easy-to-use software in the R package BFpack to compute the proposed Bayes factors.


Methodology ◽  
2019 ◽  
Vol 15 (3) ◽  
pp. 97-105
Author(s):  
Rodrigo Ferrer ◽  
Antonio Pardo

Abstract. In a recent paper, Ferrer and Pardo (2014) tested several distribution-based methods designed to assess when test scores obtained before and after an intervention reflect a statistically reliable change. However, we still do not know how these methods perform with respect to false negatives. For this purpose, we simulated change scenarios (different effect sizes in a pre-post-test design) with distributions of different shapes and with different sample sizes. For each simulated scenario, we generated 1,000 samples. In each sample, we recorded the false-negative rate of the five distribution-based methods that had performed best with respect to false positives. Our results reveal unacceptable rates of false negatives even with very large effects, ranging from 31.8% in an optimistic scenario (effect size of 2.0 and a normal distribution) to 99.9% in the worst scenario (effect size of 0.2 and a highly skewed distribution). Therefore, our results suggest that the widely used distribution-based methods must be applied with caution in a clinical context, because they need huge effect sizes to detect a true change. However, we offer some considerations regarding the effect size and the commonly used cut-off points that allow our estimates to be more precise.
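One widely cited distribution-based method, useful here to illustrate what such methods do, is the Jacobson–Truax reliable change index (RCI): the pre–post difference divided by the standard error of the difference implied by the measure's reliability. The scores, baseline SD, and reliability below are illustrative, and the article evaluates several such methods, not this one alone.

```python
import math

def reliable_change_index(pre, post, sd_baseline, reliability):
    """Jacobson-Truax RCI: the pre-post difference divided by the standard
    error of a difference score implied by the measure's reliability."""
    sem = sd_baseline * math.sqrt(1 - reliability)   # standard error of measurement
    se_diff = math.sqrt(2) * sem                     # SE of a difference score
    return (post - pre) / se_diff

# Illustrative values: a 6-point improvement on a scale with SD = 10 and
# test-retest reliability of .80; |RCI| > 1.96 is the usual criterion.
rci = reliable_change_index(pre=30, post=24, sd_baseline=10, reliability=0.80)
print(f"RCI = {rci:.2f}")
```

In this illustrative case the 6-point change falls short of the 1.96 criterion, which echoes the article's point that distribution-based methods can require very large effects to flag a true change.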


2018 ◽  
Author(s):  
Nataly Beribisky ◽  
Heather Davidson ◽  
Rob Cribbie

Researchers often need to consider the practical significance of a relationship. For example, interpreting the magnitude of an effect size or establishing bounds in equivalence testing requires knowledge of how meaningful a relationship is. However, there has been little research exploring the degree of relationship among variables (e.g., correlation, mean difference) necessary for an association to be interpreted as meaningful or practically significant. In this study, we presented statistically trained and untrained participants with a collection of figures that displayed varying degrees of mean difference between groups or correlation among variables, and participants indicated whether or not each relationship was meaningful. The results suggest that statistically trained and untrained participants differ in how they qualify a meaningful relationship, and that there is substantial variability in how large a relationship must be before it is labeled meaningful. The results also shed some light on what degree of relationship is considered meaningful by individuals in a context-free setting.
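To make concrete how a judgment of meaningfulness feeds into equivalence testing, the sketch below runs two one-sided tests (TOST) on a mean difference against bounds chosen to represent the smallest difference considered meaningful; the bounds and the simulated data are illustrative.

```python
import numpy as np
from scipy import stats

def tost_mean_difference(x, y, low, high):
    """Two one-sided tests (TOST): the mean difference is declared equivalent
    to zero if it is significantly above `low` AND significantly below `high`."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    se = np.sqrt(sp2 * (1 / nx + 1 / ny))
    diff = np.mean(x) - np.mean(y)
    df = nx + ny - 2
    p_lower = 1 - stats.t.cdf((diff - low) / se, df)   # H0: diff <= low
    p_upper = stats.t.cdf((diff - high) / se, df)      # H0: diff >= high
    return max(p_lower, p_upper)   # reject both one-sided nulls to claim equivalence

rng = np.random.default_rng(0)
x, y = rng.normal(0.0, 1, 80), rng.normal(0.05, 1, 80)
# Bounds of +/- 0.4 encode the judgment that smaller differences are not meaningful.
print("TOST p-value:", round(tost_mean_difference(x, y, low=-0.4, high=0.4), 4))
```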

