Low statistical power and overestimated anthropogenic impacts, exacerbated by publication bias, dominate field studies in global change biology

2021 ◽  
Author(s):  
Yefeng Yang ◽  
Helmut Hillebrand ◽  
Malgorzata Lagisz ◽  
Ian Cleasby ◽  
Shinichi Nakagawa

Field studies are essential to reliably quantify ecological responses to global change because they are exposed to realistic climate manipulations. Yet such studies are typically limited in replication, resulting in low statistical power and, therefore, unreliable effect estimates. Further, while manipulative field experiments are assumed to be more powerful than non-manipulative observations, this assumption has rarely been scrutinized with extensive data. Here, using 3,847 field experiments designed to estimate the effects of environmental stressors on ecosystems, we systematically quantified their statistical power and their magnitude (Type M) and sign (Type S) errors. Our investigation focused on the reliability of field experiments for assessing the effects of stressors on both the magnitude and the variability of ecosystem responses. When controlling for publication bias, single experiments were underpowered to detect response magnitude (median power: 18% – 38%, depending on the mean-difference metric). Single experiments also had much lower power to detect response variability (6% – 12%, depending on the variance-difference metric) than response magnitude. Such underpowered studies could exaggerate estimates of response magnitude by 2 – 3 times (Type M error) and of variability by 4 – 10 times. Type S errors were comparatively rare. These observations indicate that low power, coupled with publication bias, inflates estimates of anthropogenic impacts. Importantly, we found that meta-analyses largely mitigated the issues of low power and exaggerated effect size estimates. Rather surprisingly, manipulative experiments and non-manipulative observations yielded very similar results in terms of power and Type M and S errors; the assumed superiority of manipulative experiments in terms of power is therefore overstated. These results call for highly powered field studies to reliably inform theory building and policymaking, through more collaboration and team science and through large-scale ecosystem facilities. Future studies also require transparent reporting and open science practices to achieve reproducible and reliable empirical work and evidence synthesis.
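
To make the Type M and Type S terminology concrete, the following is a minimal simulation sketch in the spirit of Gelman and Carlin's "retrodesign" calculations; it is not the authors' code, and the assumed true effect size and per-group sample size are purely illustrative.

```python
# Illustrative sketch: power, Type M (exaggeration), and Type S (wrong sign) errors
# for a two-group comparison. The true effect and sample size are assumptions.
import numpy as np
from scipy import stats

def retrodesign(true_d=0.2, n_per_group=10, alpha=0.05, n_sim=100_000, seed=1):
    rng = np.random.default_rng(seed)
    se = np.sqrt(2 / n_per_group)            # approx. SE of a standardized mean difference
    df = 2 * n_per_group - 2
    t_crit = stats.t.ppf(1 - alpha / 2, df)  # two-sided critical value
    d_hat = rng.normal(true_d, se, n_sim)    # estimates scattered around the assumed true effect
    significant = np.abs(d_hat / se) > t_crit
    power = significant.mean()
    type_s = (np.sign(d_hat[significant]) != np.sign(true_d)).mean()  # wrong sign, given significance
    type_m = np.abs(d_hat[significant]).mean() / abs(true_d)          # exaggeration ratio
    return power, type_m, type_s

power, type_m, type_s = retrodesign()
print(f"power={power:.2f}, Type M={type_m:.1f}x, Type S={type_s:.3f}")
```

With power this low, the significant estimates that survive the filter overstate the true effect severalfold, which is the inflation mechanism the abstract describes.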

2021 ◽  
Vol 8 (10) ◽  
Author(s):  
David R. Shanks ◽  
Miguel A. Vadillo

Research on goal priming asks whether the subtle activation of an achievement goal can improve task performance. Studies in this domain employ a range of priming methods, such as surreptitiously displaying a photograph of an athlete winning a race, and a range of dependent variables, including measures of creativity and workplace performance. Chen, Latham, Piccolo and Itzchakov (Chen et al. 2021 J. Appl. Psychol. 70, 216–253) recently undertook a meta-analysis of this research and reported positive overall effects in both laboratory and field studies, with field studies yielding a moderate-to-large effect that was significantly larger than that obtained in laboratory experiments. We highlight a number of issues with Chen et al.'s selection of field studies and then report a new meta-analysis (k = 13, N = 683) that corrects them. The new meta-analysis reveals suggestive evidence of publication bias and low power in goal priming field studies. We conclude that the available evidence falls short of demonstrating goal priming effects in the workplace, and we offer proposals for how future research can provide stronger tests.
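
For readers unfamiliar with the pooling step behind a re-analysis like this, here is a minimal DerSimonian-Laird random-effects sketch; the effect sizes and standard errors are placeholders, not the data from either meta-analysis.

```python
# Minimal random-effects (DerSimonian-Laird) pooling sketch; input values are placeholders.
import numpy as np

def random_effects_pool(effects, ses):
    effects, ses = np.asarray(effects, float), np.asarray(ses, float)
    w = 1 / ses**2                                  # fixed-effect weights
    fixed = np.sum(w * effects) / np.sum(w)
    q = np.sum(w * (effects - fixed) ** 2)          # Cochran's Q
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)   # between-study variance estimate
    w_star = 1 / (ses**2 + tau2)                    # random-effects weights
    pooled = np.sum(w_star * effects) / np.sum(w_star)
    return pooled, np.sqrt(1 / np.sum(w_star)), tau2

pooled, se, tau2 = random_effects_pool([0.30, 0.05, 0.42, -0.10], [0.20, 0.25, 0.18, 0.30])
print(f"pooled effect = {pooled:.2f} +/- {1.96 * se:.2f}, tau^2 = {tau2:.3f}")
```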


2019 ◽  
Vol 227 (4) ◽  
pp. 261-279 ◽  
Author(s):  
Frank Renkewitz ◽  
Melanie Keiner

Abstract. Publication biases and questionable research practices are assumed to be two of the main causes of low replication rates. Both of these problems lead to severely inflated effect size estimates in meta-analyses. Methodologists have proposed a number of statistical tools to detect such bias in meta-analytic results. We present an evaluation of the performance of six of these tools. To assess the Type I error rate and the statistical power of these methods, we simulated a large variety of literatures that differed with regard to true effect size, heterogeneity, number of available primary studies, and sample sizes of these primary studies; furthermore, simulated studies were subjected to different degrees of publication bias. Our results show that across all simulated conditions, no method consistently outperformed the others. Additionally, all methods performed poorly when true effect sizes were heterogeneous or primary studies had a small chance of being published, irrespective of their results. This suggests that in many actual meta-analyses in psychology, bias will remain undiscovered no matter which detection method is used.
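
As an illustration of the kind of simulation design the abstract describes, the sketch below generates a small, selectively published "literature" and applies Egger's regression test for funnel-plot asymmetry; the true effect, heterogeneity, study sizes, and selection rule are assumptions, and Egger's test is only one example of the class of detection tools the authors evaluated.

```python
# Illustrative sketch: simulate a selectively published literature, then run
# Egger's regression test. All parameter values below are assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)

def simulate_literature(k=30, true_d=0.2, tau=0.2, p_publish_nonsig=0.2):
    effects, ses = [], []
    while len(effects) < k:
        n = rng.integers(20, 100)            # per-group sample size
        theta = rng.normal(true_d, tau)      # study-level true effect (heterogeneity tau)
        se = np.sqrt(2 / n)
        d = rng.normal(theta, se)
        if abs(d / se) > 1.96 or rng.random() < p_publish_nonsig:  # selective publication
            effects.append(d)
            ses.append(se)
    return np.array(effects), np.array(ses)

def eggers_test(effects, ses):
    # Regress standardized effects on precision; a nonzero intercept signals asymmetry.
    X = sm.add_constant(1.0 / ses)
    fit = sm.OLS(effects / ses, X).fit()
    return fit.params[0], fit.pvalues[0]     # intercept and its p-value

effects, ses = simulate_literature()
intercept, p_value = eggers_test(effects, ses)
print(f"Egger intercept = {intercept:.2f}, p = {p_value:.3f}")
```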


Insects ◽  
2021 ◽  
Vol 12 (4) ◽  
pp. 321 ◽  
Author(s):  
Stefan Cristian Prazaru ◽  
Giulia Zanettin ◽  
Alberto Pozzebon ◽  
Paola Tirello ◽  
Francesco Toffoletto ◽  
...  

Outbreaks of the Nearctic leafhopper Erasmoneura vulnerata represent a threat to vine-growers in Southern Europe, in particular in North-eastern Italy. Outbreaks of the pest are frequent in organic vineyards because insecticides labeled for organic viticulture show limited effectiveness against leafhoppers. On the other hand, the naturally occurring predators and parasitoids of E. vulnerata in vineyards are often unable to keep leafhopper densities at levels acceptable to vine-growers. In this study, we evaluated the potential of two generalist, commercially available predators, Chrysoperla carnea and Orius majusculus, to suppress E. vulnerata. Laboratory and semi-field experiments were carried out to evaluate both species’ predation capacity on E. vulnerata nymphs. The experiments were conducted on grapevine leaves inside Petri dishes (laboratory) and on potted and caged grapevines (semi-field); in both experiments, the leaves or potted plants were infested with E. vulnerata nymphs prior to predator releases. Both predator species exhibited remarkable voracity and significantly reduced leafhopper densities in the laboratory and semi-field experiments. Field studies were therefore carried out over two growing seasons in two vineyards, in which we released 4 O. majusculus adults and 30 C. carnea larvae per m² of canopy. Predator releases in vineyards reduced leafhopper densities by about 30% compared with control plots. The results show that the two predators have the potential to suppress pest densities, but more research is required to define appropriate predator–prey release ratios and release timing. Studies on intraguild interactions and on competition with naturally occurring predators are also suggested.


2021 ◽  
Author(s):  
Jen Overbeck ◽  
Leigh Tost ◽  
Abbie Wazlawek

Monitoring is a common tactic used to constrain the behavior of organizational actors. Agency theory research on monitoring focuses, at the institutional level, on factors such as incentives, contracts, or self-interest, and is largely directed at those with high power. At the same time, significant monitoring is clearly directed at low-power workers, whose performance and compliance behaviors may be rigidly controlled; arguably, the degree to which monitoring is directed at low-power rather than high-power actors is disproportionate. In this paper, we examine a psychological predictor of decisions about whom to monitor. Specifically, we contend that people’s judgments of someone’s ethicality, and thereby trustworthiness, are predicted by the target’s power, and that these power-based inferences affect decisions about whom to monitor. As a consequence, institutions may excuse powerful actors from the monitoring requirements that should constrain any ethical lapses. That is, an overly credulous view of the powerful, or misdirected suspicion toward the powerless, may create conditions that enable abuses by the powerful. We examine these predictions in a series of five studies (three experiments and two field studies). Our findings challenge the notion that people subscribe to a “power corrupts” view when evaluating powerholders, and our research highlights how the very mechanism organizations put in place to constrain powerholders’ behaviors (i.e., monitoring) may, because of psychological biases in power-based inferences, be directed away from its intended targets.


Author(s):  
Thomas Groß

Background. In recent years, cyber security user studies have been appraised in meta-research, mostly focusing on the completeness of their statistical inferences and the fidelity of their statistical reporting. However, estimates of the field’s distribution of statistical power and of its publication bias have not received much attention. Aim. In this study, we aim to estimate the effect sizes present in the field and their standard errors, as well as the implications for statistical power and publication bias. Method. We built upon a published systematic literature review of 146 user studies in cyber security (2006–2016). We took into account 431 statistical inferences, including t-, χ²-, r-, one-way F-, and Z-tests. In addition, we coded the corresponding total sample sizes, group sizes, and test families. Given these data, we established the observed effect sizes and evaluated the overall publication bias. We further computed statistical power against parametrized population thresholds to obtain unbiased estimates of the power distribution. Results. We obtained a distribution of effect sizes and their conversion into comparable log odds ratios together with their standard errors. We further gained funnel-plot estimates of the publication bias present in the sample, as well as insights into the power distribution and its consequences. Conclusions. Through the lenses of power and publication bias, we shed light on the statistical reliability of the studies in the field. The upshot of this introspection is practical recommendations on conducting and evaluating studies to advance the field.
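
The conversion of heterogeneous test statistics into comparable log odds ratios mentioned in the Results typically relies on standard effect-size transformations (e.g., those collected by Borenstein et al.); the sketch below shows the usual formulas with illustrative input values, not the paper's own pipeline.

```python
# Standard effect-size conversions (illustrative values, not the paper's pipeline).
import math

def d_to_log_odds(d, n1, n2):
    """Cohen's d (with group sizes) -> log odds ratio and its standard error."""
    se_d = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    scale = math.pi / math.sqrt(3)           # logistic-to-normal scaling constant
    return d * scale, se_d * scale

def r_to_d(r, n):
    """Correlation r -> Cohen's d and an approximate standard error."""
    d = 2 * r / math.sqrt(1 - r**2)
    var_r = (1 - r**2) ** 2 / (n - 1)
    var_d = 4 * var_r / (1 - r**2) ** 3
    return d, math.sqrt(var_d)

log_or, se_log_or = d_to_log_odds(d=0.5, n1=30, n2=30)
print(f"log OR = {log_or:.2f} (SE {se_log_or:.2f})")
```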


Author(s):  
Valentin Amrhein ◽  
Fränzi Korner-Nievergelt ◽  
Tobias Roth

The widespread use of 'statistical significance' as a license for making a claim of a scientific finding leads to considerable distortion of the scientific process (American Statistical Association, Wasserstein & Lazar 2016). We review why degrading p-values into 'significant' and 'nonsignificant' contributes to making studies irreproducible, or to making them seem irreproducible. A major problem is that we tend to take small p-values at face value but mistrust results with larger p-values. In either case, p-values tell us little about the reliability of research, because they are hardly replicable even if an alternative hypothesis is true. Significance itself (p ≤ 0.05) is also hardly replicable: at a realistic statistical power of 40%, given that there is a true effect, only one in six studies will significantly replicate the significant result of another study. Even at a good power of 80%, results from two studies will conflict, in terms of significance, in one third of cases if there is a true effect. This means that a replication cannot be interpreted as having failed only because it is nonsignificant. Many apparent replication failures may thus reflect faulty judgment based on significance thresholds rather than a crisis of unreplicable research. Reliable conclusions on the replicability and practical importance of a finding can only be drawn using cumulative evidence from multiple independent studies. However, applying significance thresholds makes cumulative knowledge unreliable. One reason is that, with anything but ideal statistical power, significant effect sizes will be biased upwards. Interpreting inflated significant results while ignoring nonsignificant results will thus lead to wrong conclusions. Yet current incentives to hunt for significance lead to publication bias against nonsignificant findings. Data dredging, p-hacking, and publication bias should be addressed by removing fixed significance thresholds. Consistent with the recommendations of the late Ronald Fisher, p-values should be interpreted as graded measures of the strength of evidence against the null hypothesis. Larger p-values also offer some evidence against the null hypothesis, and they cannot be interpreted as supporting the null hypothesis; doing so falsely concludes that 'there is no effect'. Information on possible true effect sizes that are compatible with the data must instead be obtained from the observed effect size (e.g., a sample average) and from a measure of uncertainty, such as a confidence interval. We review how confusion about the interpretation of larger p-values can be traced back to historical disputes among the founders of modern statistics. We further discuss potential arguments against removing significance thresholds, such as 'we need more stringent decision rules', 'sample sizes will decrease', or 'we need to get rid of p-values'.
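
The replication arithmetic in the abstract follows directly from the stated power levels; the short check below simply restates that calculation and is not the authors' code.

```python
# If a true effect exists and each study has the given power, the chance that two
# independent studies are both significant, or disagree on significance, is simple arithmetic.
def replication_odds(power):
    both_significant = power * power
    conflicting = 2 * power * (1 - power)   # one significant, one not
    return both_significant, conflicting

for p in (0.40, 0.80):
    both, conflict = replication_odds(p)
    print(f"power={p:.0%}: both significant={both:.0%}, conflicting={conflict:.0%}")
# power=40%: both significant=16% (about one in six), conflicting=48%
# power=80%: both significant=64%, conflicting=32% (about one in three)
```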


2016 ◽  
Vol 7 (2) ◽  
pp. 9-12
Author(s):  
Ryo Oda ◽  
Ryota Ichihashi

Previous field experiments have found that artificial surveillance cues facilitated prosocial behaviors such as charitable donation and reduced antisocial behaviors such as littering. Several previous field studies found that the effect of artificial surveillance cues was stronger when few individuals were in the vicinity; others, however, reported that the effect was stronger in large groups of people. Here, we report the results of a field study examining the effect of an artificial surveillance cue (stylized eyes) on charitable giving. Three collection boxes were placed in different locations around an izakaya (a Japanese-style tavern) for 84 days. The amount donated was counted on each experimental day, and the izakaya staff provided the number of patrons who visited each day. We found that the effect of the stylized eyes was more salient when fewer patrons were in the izakaya. Our findings suggest that the effect of the artificial surveillance cue is similar to that of “real” cues and that the effect on charitable giving may weaken when people habituate to being watched by “real” eyes.

