Low statistical power and overestimated anthropogenic impacts, exacerbated by publication bias, dominate field studies in global change biology

2021 ◽  
Author(s):  
Yefeng Yang ◽  
Helmut Hillebrand ◽  
Malgorzata Lagisz ◽  
Ian Cleasby ◽  
Shinichi Nakagawa

Field studies are essential to reliably quantify ecological responses to global change because they are exposed to realistic climate manipulations. Yet such studies are typically limited in replication, resulting in low statistical power and, therefore, unreliable effect estimates. Further, while manipulative field experiments are assumed to be more powerful than non-manipulative observations, this assumption has rarely been scrutinized with extensive data. Here, using 3,847 field experiments designed to estimate the effects of environmental stressors on ecosystems, we systematically quantified their statistical power and their magnitude (Type M) and sign (Type S) errors. Our investigation focused on the reliability of field experiments for assessing the effects of stressors on both the magnitude and the variability of ecosystem responses. When controlling for publication bias, single experiments were underpowered to detect response magnitude (median power: 18% – 38%, depending on the mean-difference metric). Single experiments also had much lower power to detect response variability (6% – 12%, depending on the variance-difference metric) than response magnitude. Such underpowered studies could exaggerate estimates of response magnitude by 2 – 3 times (Type M error) and of variability by 4 – 10 times. Type S errors were comparatively rare. These observations indicate that low power, coupled with publication bias, inflates estimates of anthropogenic impacts. Importantly, we found that meta-analyses largely mitigated the issues of low power and exaggerated effect size estimates. Rather surprisingly, manipulative experiments and non-manipulative observations yielded very similar results in terms of power and Type M and S errors; the assumed superiority of manipulative experiments in terms of power is therefore overstated. These results call for highly powered field studies to reliably inform theory building and policymaking, through more collaboration and team science and through large-scale ecosystem facilities. Future studies also require transparent reporting and open science practices to achieve reproducible and reliable empirical work and evidence synthesis.
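
To make the Type M and Type S terminology concrete, the following is a minimal simulation sketch in the spirit of Gelman and Carlin's "retrodesign" calculations; it is not the authors' code, and the assumed true effect size and per-group sample size are purely illustrative.

```python
# Illustrative sketch: power, Type M (exaggeration), and Type S (wrong sign) errors
# for a two-group comparison. The true effect and sample size are assumptions.
import numpy as np
from scipy import stats

def retrodesign(true_d=0.2, n_per_group=10, alpha=0.05, n_sim=100_000, seed=1):
    rng = np.random.default_rng(seed)
    se = np.sqrt(2 / n_per_group)            # approx. SE of a standardized mean difference
    df = 2 * n_per_group - 2
    t_crit = stats.t.ppf(1 - alpha / 2, df)  # two-sided critical value
    d_hat = rng.normal(true_d, se, n_sim)    # estimates scattered around the assumed true effect
    significant = np.abs(d_hat / se) > t_crit
    power = significant.mean()
    type_s = (np.sign(d_hat[significant]) != np.sign(true_d)).mean()  # wrong sign, given significance
    type_m = np.abs(d_hat[significant]).mean() / abs(true_d)          # exaggeration ratio
    return power, type_m, type_s

power, type_m, type_s = retrodesign()
print(f"power={power:.2f}, Type M={type_m:.1f}x, Type S={type_s:.3f}")
```

With power this low, the significant estimates that survive the filter overstate the true effect severalfold, which is the inflation mechanism the abstract describes.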

2021 ◽  
Vol 8 (10) ◽  
Author(s):  
David R. Shanks ◽  
Miguel A. Vadillo

Research on goal priming asks whether the subtle activation of an achievement goal can improve task performance. Studies in this domain employ a range of priming methods, such as surreptitiously displaying a photograph of an athlete winning a race, and a range of dependent variables, including measures of creativity and workplace performance. Chen, Latham, Piccolo and Itzchakov (Chen et al. 2021 J. Appl. Psychol. 70, 216–253) recently undertook a meta-analysis of this research and reported positive overall effects in both laboratory and field studies, with field studies yielding a moderate-to-large effect that was significantly larger than that obtained in laboratory experiments. We highlight a number of issues with Chen et al.'s selection of field studies and then report a new meta-analysis (k = 13, N = 683) that corrects them. The new meta-analysis reveals suggestive evidence of publication bias and low power in goal priming field studies. We conclude that the available evidence falls short of demonstrating goal priming effects in the workplace, and we offer proposals for how future research can provide stronger tests.
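
For readers unfamiliar with the pooling step behind a re-analysis like this, here is a minimal DerSimonian-Laird random-effects sketch; the effect sizes and standard errors are placeholders, not the data from either meta-analysis.

```python
# Minimal random-effects (DerSimonian-Laird) pooling sketch; input values are placeholders.
import numpy as np

def random_effects_pool(effects, ses):
    effects, ses = np.asarray(effects, float), np.asarray(ses, float)
    w = 1 / ses**2                                  # fixed-effect weights
    fixed = np.sum(w * effects) / np.sum(w)
    q = np.sum(w * (effects - fixed) ** 2)          # Cochran's Q
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)   # between-study variance estimate
    w_star = 1 / (ses**2 + tau2)                    # random-effects weights
    pooled = np.sum(w_star * effects) / np.sum(w_star)
    return pooled, np.sqrt(1 / np.sum(w_star)), tau2

pooled, se, tau2 = random_effects_pool([0.30, 0.05, 0.42, -0.10], [0.20, 0.25, 0.18, 0.30])
print(f"pooled effect = {pooled:.2f} +/- {1.96 * se:.2f}, tau^2 = {tau2:.3f}")
```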


2019 ◽  
Vol 227 (4) ◽  
pp. 261-279 ◽  
Author(s):  
Frank Renkewitz ◽  
Melanie Keiner

Abstract. Publication biases and questionable research practices are assumed to be two of the main causes of low replication rates. Both of these problems lead to severely inflated effect size estimates in meta-analyses. Methodologists have proposed a number of statistical tools to detect such bias in meta-analytic results. We present an evaluation of the performance of six of these tools. To assess the Type I error rate and the statistical power of these methods, we simulated a large variety of literatures that differed with regard to true effect size, heterogeneity, number of available primary studies, and sample sizes of these primary studies; furthermore, simulated studies were subjected to different degrees of publication bias. Our results show that across all simulated conditions, no method consistently outperformed the others. Additionally, all methods performed poorly when true effect sizes were heterogeneous or primary studies had a small chance of being published, irrespective of their results. This suggests that in many actual meta-analyses in psychology, bias will remain undiscovered no matter which detection method is used.
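
As an illustration of the kind of simulation design the abstract describes, the sketch below generates a small, selectively published "literature" and applies Egger's regression test for funnel-plot asymmetry; the true effect, heterogeneity, study sizes, and selection rule are assumptions, and Egger's test is only one example of the class of detection tools the authors evaluated.

```python
# Illustrative sketch: simulate a selectively published literature, then run
# Egger's regression test. All parameter values below are assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)

def simulate_literature(k=30, true_d=0.2, tau=0.2, p_publish_nonsig=0.2):
    effects, ses = [], []
    while len(effects) < k:
        n = rng.integers(20, 100)            # per-group sample size
        theta = rng.normal(true_d, tau)      # study-level true effect (heterogeneity tau)
        se = np.sqrt(2 / n)
        d = rng.normal(theta, se)
        if abs(d / se) > 1.96 or rng.random() < p_publish_nonsig:  # selective publication
            effects.append(d)
            ses.append(se)
    return np.array(effects), np.array(ses)

def eggers_test(effects, ses):
    # Regress standardized effects on precision; a nonzero intercept signals asymmetry.
    X = sm.add_constant(1.0 / ses)
    fit = sm.OLS(effects / ses, X).fit()
    return fit.params[0], fit.pvalues[0]     # intercept and its p-value

effects, ses = simulate_literature()
intercept, p_value = eggers_test(effects, ses)
print(f"Egger intercept = {intercept:.2f}, p = {p_value:.3f}")
```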


Insects ◽  
2021 ◽  
Vol 12 (4) ◽  
pp. 321 ◽  
Author(s):  
Stefan Cristian Prazaru ◽  
Giulia Zanettin ◽  
Alberto Pozzebon ◽  
Paola Tirello ◽  
Francesco Toffoletto ◽  
...  

Outbreaks of the Nearctic leafhopper Erasmoneura vulnerata represent a threat to vine-growers in Southern Europe, in particular in North-eastern Italy. Outbreaks of the pest are frequent in organic vineyards because insecticides labeled for organic viticulture show limited effectiveness against leafhoppers. On the other hand, the naturally occurring predators and parasitoids of E. vulnerata in vineyards are often unable to keep leafhopper densities at levels acceptable to vine-growers. In this study, we evaluated the potential of two generalist, commercially available predators, Chrysoperla carnea and Orius majusculus, to suppress E. vulnerata. Laboratory and semi-field experiments were carried out to evaluate both species’ predation capacity on E. vulnerata nymphs. The experiments were conducted on grapevine leaves inside Petri dishes (laboratory) and on potted and caged grapevines (semi-field); in both experiments, the leaves or potted plants were infested with E. vulnerata nymphs prior to predator releases. Both predator species exhibited remarkable voracity and significantly reduced leafhopper densities in the laboratory and semi-field experiments. Field studies were therefore carried out over two growing seasons in two vineyards, in which we released 4 O. majusculus adults and 30 C. carnea larvae per m² of canopy. Predator releases in vineyards reduced leafhopper densities by about 30% compared with control plots. The results show that the two predators have the potential to suppress pest densities, but more research is required to define appropriate predator–prey release ratios and release timing. Studies on intraguild interactions and on competition with naturally occurring predators are also suggested.


2021 ◽  
Author(s):  
Jen Overbeck ◽  
Leigh Tost ◽  
Abbie Wazlawek

Monitoring is a common tactic used to constrain the behavior of organizational actors. Agency theory research on monitoring focuses, at the institutional level, on factors such as incentives, contracts, or self-interest, and is largely directed at those with high power. At the same time, significant monitoring is clearly directed at low-power workers, whose performance and compliance behaviors may be rigidly controlled; arguably, the degree to which monitoring is directed at low-power rather than high-power actors is disproportionate. In this paper, we examine a psychological predictor of decisions about whom to monitor. Specifically, we contend that people’s judgments of someone’s ethicality, and thereby trustworthiness, are predicted by the target’s power, and that these power-based inferences affect decisions about whom to monitor. As a consequence, institutions may excuse powerful actors from the monitoring requirements that should constrain any ethical lapses. That is, an overly credulous view of the powerful, or misdirected suspicion toward the powerless, may create conditions that enable abuses by the powerful. We examine these predictions in a series of five studies (three experiments and two field studies). Our findings challenge the notion that people subscribe to a “power corrupts” view when evaluating powerholders, and our research highlights how the very mechanism organizations put in place to constrain powerholders’ behaviors (i.e., monitoring) may, because of psychological biases in power-based inferences, be directed away from its intended targets.


Author(s):  
Thomas Groß

Background. In recent years, cyber security user studies have been appraised in meta-research, mostly focusing on the completeness of their statistical inferences and the fidelity of their statistical reporting. However, estimates of the field’s distribution of statistical power and of its publication bias have not received much attention. Aim. In this study, we aim to estimate the effect sizes present in the field and their standard errors, as well as the implications for statistical power and publication bias. Method. We built upon a published systematic literature review of 146 user studies in cyber security (2006–2016). We took into account 431 statistical inferences, including t-, χ²-, r-, one-way F-, and Z-tests. In addition, we coded the corresponding total sample sizes, group sizes, and test families. Given these data, we established the observed effect sizes and evaluated the overall publication bias. We further computed statistical power against parametrized population thresholds to obtain unbiased estimates of the power distribution. Results. We obtained a distribution of effect sizes and their conversion into comparable log odds ratios together with their standard errors. We further gained funnel-plot estimates of the publication bias present in the sample, as well as insights into the power distribution and its consequences. Conclusions. Through the lenses of power and publication bias, we shed light on the statistical reliability of the studies in the field. The upshot of this introspection is practical recommendations on conducting and evaluating studies to advance the field.
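
The conversion of heterogeneous test statistics into comparable log odds ratios mentioned in the Results typically relies on standard effect-size transformations (e.g., those collected by Borenstein et al.); the sketch below shows the usual formulas with illustrative input values, not the paper's own pipeline.

```python
# Standard effect-size conversions (illustrative values, not the paper's pipeline).
import math

def d_to_log_odds(d, n1, n2):
    """Cohen's d (with group sizes) -> log odds ratio and its standard error."""
    se_d = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    scale = math.pi / math.sqrt(3)           # logistic-to-normal scaling constant
    return d * scale, se_d * scale

def r_to_d(r, n):
    """Correlation r -> Cohen's d and an approximate standard error."""
    d = 2 * r / math.sqrt(1 - r**2)
    var_r = (1 - r**2) ** 2 / (n - 1)
    var_d = 4 * var_r / (1 - r**2) ** 3
    return d, math.sqrt(var_d)

log_or, se_log_or = d_to_log_odds(d=0.5, n1=30, n2=30)
print(f"log OR = {log_or:.2f} (SE {se_log_or:.2f})")
```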


Author(s):  
Valentin Amrhein ◽  
Fränzi Korner-Nievergelt ◽  
Tobias Roth

The widespread use of 'statistical significance' as a license for making a claim of a scientific finding leads to considerable distortion of the scientific process (American Statistical Association, Wasserstein & Lazar 2016). We review why degrading p-values into 'significant' and 'nonsignificant' contributes to making studies irreproducible, or to making them seem irreproducible. A major problem is that we tend to take small p-values at face value but mistrust results with larger p-values. In either case, p-values tell us little about the reliability of research, because they are hardly replicable even if an alternative hypothesis is true. Significance itself (p ≤ 0.05) is also hardly replicable: at a realistic statistical power of 40%, given that there is a true effect, only one in six studies will significantly replicate the significant result of another study. Even at a good power of 80%, results from two studies will conflict, in terms of significance, in one third of cases if there is a true effect. This means that a replication cannot be interpreted as having failed only because it is nonsignificant. Many apparent replication failures may thus reflect faulty judgment based on significance thresholds rather than a crisis of unreplicable research. Reliable conclusions on the replicability and practical importance of a finding can only be drawn using cumulative evidence from multiple independent studies. However, applying significance thresholds makes cumulative knowledge unreliable. One reason is that, with anything but ideal statistical power, significant effect sizes will be biased upwards. Interpreting inflated significant results while ignoring nonsignificant results will thus lead to wrong conclusions. Yet current incentives to hunt for significance lead to publication bias against nonsignificant findings. Data dredging, p-hacking, and publication bias should be addressed by removing fixed significance thresholds. Consistent with the recommendations of the late Ronald Fisher, p-values should be interpreted as graded measures of the strength of evidence against the null hypothesis. Larger p-values also offer some evidence against the null hypothesis, and they cannot be interpreted as supporting the null hypothesis; doing so falsely concludes that 'there is no effect'. Information on possible true effect sizes that are compatible with the data must instead be obtained from the observed effect size (e.g., a sample average) and from a measure of uncertainty, such as a confidence interval. We review how confusion about the interpretation of larger p-values can be traced back to historical disputes among the founders of modern statistics. We further discuss potential arguments against removing significance thresholds, such as 'we need more stringent decision rules', 'sample sizes will decrease', or 'we need to get rid of p-values'.
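
The replication arithmetic in the abstract follows directly from the stated power levels; the short check below simply restates that calculation and is not the authors' code.

```python
# If a true effect exists and each study has the given power, the chance that two
# independent studies are both significant, or disagree on significance, is simple arithmetic.
def replication_odds(power):
    both_significant = power * power
    conflicting = 2 * power * (1 - power)   # one significant, one not
    return both_significant, conflicting

for p in (0.40, 0.80):
    both, conflict = replication_odds(p)
    print(f"power={p:.0%}: both significant={both:.0%}, conflicting={conflict:.0%}")
# power=40%: both significant=16% (about one in six), conflicting=48%
# power=80%: both significant=64%, conflicting=32% (about one in three)
```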


2016 ◽  
Vol 7 (2) ◽  
pp. 9-12
Author(s):  
Ryo Oda ◽  
Ryota Ichihashi

Previous field experiments have found that artificial surveillance cues facilitated prosocial behaviors such as charitable donation and reduced antisocial behaviors such as littering. Several previous field studies found that the effect of artificial surveillance cues was stronger when few individuals were in the vicinity; others, however, reported that the effect was stronger in large groups of people. Here, we report the results of a field study examining the effect of an artificial surveillance cue (stylized eyes) on charitable giving. Three collection boxes were placed in different locations around an izakaya (a Japanese-style tavern) for 84 days. The amount donated was counted on each experimental day, and the izakaya staff provided the number of patrons who visited each day. We found that the effect of the stylized eyes was more salient when fewer patrons were in the izakaya. Our findings suggest that the effect of the artificial surveillance cue is similar to that of “real” cues and that the effect on charitable giving may weaken when people habituate to being watched by “real” eyes.

