Investigating the replicability of preclinical cancer biology

eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Timothy M Errington ◽  
Maya Mathur ◽  
Courtney K Soderberg ◽  
Alexandria Denis ◽  
Nicole Perfito ◽  
...  

Replicability is an important feature of scientific research, but aspects of contemporary research culture, such as an emphasis on novelty, can make replicability seem less important than it should be. The Reproducibility Project: Cancer Biology was set up to provide evidence about the replicability of preclinical research in cancer biology by repeating selected experiments from high-impact papers. A total of 50 experiments from 23 papers were repeated, generating data about the replicability of a total of 158 effects. Most of the original effects were positive effects (136), with the rest being null effects (22). A majority of the original effect sizes were reported as numerical values (117), with the rest being reported as representative images (41). We employed seven methods to assess replicability, and some of these methods were not suitable for all the effects in our sample. One method compared effect sizes: for positive effects, the median effect size in the replications was 85% smaller than the median effect size in the original experiments, and 92% of replication effect sizes were smaller than the original. The other methods were binary – the replication was either a success or a failure – and five of these methods could be used to assess both positive and null effects when effect sizes were reported as numerical values. For positive effects, 40% of replications (39/97) succeeded according to three or more of these five methods, and for null effects 80% of replications (12/15) were successful on this basis; combining positive and null effects, the success rate was 46% (51/112). A successful replication does not definitively confirm an original finding or its theoretical interpretation. Equally, a failure to replicate does not disconfirm a finding, but it does suggest that additional investigation is needed to establish its reliability.
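The "three or more of five methods" rule and the reported percentages can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name and the example list of per-method outcomes are hypothetical, and only the counts (39/97, 12/15, 51/112) come from the abstract.

```python
def replicated(method_outcomes, threshold=3):
    """Return True if at least `threshold` binary criteria count as a success.

    `method_outcomes` is a list of booleans, one per binary method.
    The default threshold of 3 mirrors the "three or more of five" rule.
    """
    return sum(method_outcomes) >= threshold


def success_rate(successes, total):
    """Success rate as a whole-number percentage."""
    return round(100 * successes / total)


# Hypothetical outcome for one effect: three of five methods agree.
example_effect = [True, True, False, True, False]
print(replicated(example_effect))

# Counts reported in the abstract.
print(success_rate(39, 97))    # positive effects
print(success_rate(12, 15))    # null effects
print(success_rate(51, 112))   # combined
```

The combined denominator is simply 97 + 15 = 112, and the combined numerator 39 + 12 = 51.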

eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Timothy M Errington ◽  
Alexandria Denis ◽  
Nicole Perfito ◽  
Elizabeth Iorns ◽  
Brian A Nosek

We conducted the Reproducibility Project: Cancer Biology to investigate the replicability of preclinical research in cancer biology. The initial aim of the project was to repeat 193 experiments from 53 high-impact papers, using an approach in which the experimental protocols and plans for data analysis had to be peer reviewed and accepted for publication before experimental work could begin. However, the various barriers and challenges we encountered while designing and conducting the experiments meant that we were only able to repeat 50 experiments from 23 papers. Here we report these barriers and challenges. First, many original papers failed to report key descriptive and inferential statistics: the data needed to compute effect sizes and conduct power analyses was publicly accessible for just 4 of 193 experiments. Moreover, despite contacting the authors of the original papers, we were unable to obtain these data for 68% of the experiments. Second, none of the 193 experiments were described in sufficient detail in the original paper to enable us to design protocols to repeat the experiments, so we had to seek clarifications from the original authors. While authors were extremely or very helpful for 41% of experiments, they were minimally helpful for 9% of experiments, and not at all helpful (or did not respond to us) for 32% of experiments. Third, once experimental work started, 67% of the peer-reviewed protocols required modifications to complete the research and just 41% of those modifications could be implemented. Cumulatively, these three factors limited the number of experiments that could be repeated. This experience draws attention to a basic and fundamental concern about replication – it is hard to assess whether reported findings are credible.


Author(s):  
Ciro Conversano ◽  
Graziella Orrù ◽  
Andrea Pozza ◽  
Mario Miccoli ◽  
Rebecca Ciacchini ◽  
...  

Background: Hypertension is among the most important risk factors for cardiovascular diseases, which are considered high mortality risk medical conditions. To date, several studies have reported positive effects of mindfulness-based stress reduction (MBSR) interventions on physical and psychological well-being in other medical conditions, but no meta-analysis on MBSR programs for hypertension has been conducted. Objectives: The objective of this study was to determine the effectiveness of MBSR programs for hypertension. Methods: A systematic review and meta-analysis of randomized controlled trials examining the effects of MBSR on systolic and diastolic blood pressure (BP), anxiety, depression, and perceived stress in people with hypertension or pre-hypertension was conducted. The PubMed/MEDLINE and PsycINFO databases were searched in November 2020 to identify relevant studies. Results: Six studies were included. The comparison of MBSR versus control conditions on diastolic BP was associated with a statistically significant mean effect size favoring MBSR over control conditions (MD = −2.029; 95% confidence interval (CI): −3.676 to −0.383, p = 0.016, k = 6; 22 effect sizes overall), without evidence of heterogeneity (I² = 0.000%). The comparison of MBSR versus control conditions on systolic BP was associated with a mean effect size which was statistically significant only at a marginal level (MD = −3.894; 95% CI: −7.736 to −0.053, p = 0.047, k = 6; 22 effect sizes overall), without evidence of high heterogeneity (I² = 20.772%). The higher the proportion of participants on antihypertensive medications was, the larger the effects of MBSR were on systolic BP (B = −0.750, z = −2.73, p = 0.003). Conclusions: MBSR seems to be a promising intervention, particularly effective on the reduction of diastolic BP. More well-conducted trials are required.
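The reported p-values can be cross-checked against the confidence intervals. The sketch below assumes a normal (Wald-type) interval, which the abstract does not state, and reads the upper bound of the systolic interval as −0.053, the reading under which the interval is symmetric about the mean difference. The function name is hypothetical.

```python
import math


def p_from_ci(mean_diff, lower, upper, z_crit=1.959964):
    """Recover the standard error, z statistic and two-sided p-value
    from a point estimate and its 95% confidence interval.

    Assumes the interval was built as estimate +/- z_crit * SE.
    """
    se = (upper - lower) / (2 * z_crit)
    z = mean_diff / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Diastolic BP: MD = -2.029, 95% CI -3.676 to -0.383
print(p_from_ci(-2.029, -3.676, -0.383))

# Systolic BP: MD = -3.894, 95% CI -7.736 to -0.053 (assumed reading)
print(p_from_ci(-3.894, -7.736, -0.053))
```

Under these assumptions both recovered p-values agree with the reported ones (0.016 and 0.047) to three decimal places.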


Methodology ◽  
2019 ◽  
Vol 15 (3) ◽  
pp. 97-105
Author(s):  
Rodrigo Ferrer ◽  
Antonio Pardo

Abstract. In a recent paper, Ferrer and Pardo (2014) tested several distribution-based methods designed to assess when test scores obtained before and after an intervention reflect a statistically reliable change. However, we still do not know how these methods perform from the point of view of false negatives. For this purpose, we have simulated change scenarios (different effect sizes in a pre-post-test design) with distributions of different shapes and with different sample sizes. For each simulated scenario, we generated 1,000 samples. In each sample, we recorded the false-negative rate of the five distribution-based methods with the best performance from the point of view of the false positives. Our results have revealed unacceptable rates of false negatives even with effects of very large size, starting from 31.8% in an optimistic scenario (effect size of 2.0 and a normal distribution) to 99.9% in the worst scenario (effect size of 0.2 and a highly skewed distribution). Therefore, our results suggest that the widely used distribution-based methods must be applied with caution in a clinical context, because they need huge effect sizes to detect a true change. However, we made some considerations regarding the effect size and the cut-off points commonly used which allow us to be more precise in our estimates.
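To illustrate why a distribution-based criterion can miss a sizeable change, the sketch below implements one well-known method of this family, the Jacobson-Truax reliable change index (RCI). The abstract does not name the five methods that were tested, so the choice of RCI is an assumption, and the scores, standard deviation and reliability are hypothetical.

```python
import math


def reliable_change_index(pre, post, sd_pre, reliability):
    """Jacobson-Truax RCI: change divided by the standard error of the difference."""
    se_measurement = sd_pre * math.sqrt(1 - reliability)
    se_difference = math.sqrt(2) * se_measurement
    return (post - pre) / se_difference


def is_reliable_change(pre, post, sd_pre, reliability, cutoff=1.96):
    """Flag a change as reliable when |RCI| exceeds the cutoff."""
    return abs(reliable_change_index(pre, post, sd_pre, reliability)) > cutoff


# Hypothetical case: a true gain of 6 points (1.2 SD) on a test with
# SD = 5 and reliability 0.80 falls just short of the 1.96 cutoff.
print(reliable_change_index(10, 16, 5, 0.80))
print(is_reliable_change(10, 16, 5, 0.80))
print(is_reliable_change(10, 17, 5, 0.80))
```

In this hypothetical case a gain larger than one standard deviation is still classified as "no reliable change", which is the kind of false negative the abstract describes.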


2021 ◽  
pp. 016264342198997
Author(s):  
Sojung Jung ◽  
Ciara Ousley ◽  
David McNaughton ◽  
Pamela Wolfe

In this meta-analytic review, we investigated the effects of technology supports on the acquisition of shopping skills for students with intellectual and developmental disabilities (IDD) between the ages of 5 and 24. Nineteen single-case experimental research studies, presented in 15 research articles, met the current study’s inclusion criteria and the What Works Clearinghouse (WWC) standards. An analysis of potential moderators was conducted, and we calculated effect sizes using Tau-U to examine the impact of age, diagnosis, and type of technology on the reported outcomes for the 56 participants. The results from the included studies provide evidence that a wide range of technology interventions had a positive impact on shopping performance. These positive effects were seen for individuals across a wide range of ages and disability types, and for a wide variety of shopping skills. The strongest effect sizes were observed for technologies that provided visual supports rather than just auditory support. We provide an interpretation of the findings, implications of the results, and recommended areas for future research.
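For readers unfamiliar with nonoverlap effect sizes, the sketch below computes the basic Tau statistic that underlies Tau-U: every baseline observation is compared with every intervention observation. It is a simplified illustration that omits the baseline-trend correction of full Tau-U, and the data are hypothetical, not taken from the reviewed studies.

```python
def tau_nonoverlap(baseline, intervention):
    """Basic Tau between two phases of a single-case design.

    Counts pairs where the intervention value is higher (improving) or
    lower (deteriorating) than the baseline value; ties count as neither.
    No baseline-trend correction is applied.
    """
    improving = sum(1 for a in baseline for b in intervention if b > a)
    deteriorating = sum(1 for a in baseline for b in intervention if b < a)
    return (improving - deteriorating) / (len(baseline) * len(intervention))


# Hypothetical percent-correct scores on a shopping task.
print(tau_nonoverlap([1, 2, 3], [4, 5, 6]))   # complete nonoverlap
print(tau_nonoverlap([2, 3, 4], [3, 5, 6]))   # partial overlap
```

A value of 1 means every intervention point exceeds every baseline point; values near 0 indicate heavy overlap between phases.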


Author(s):  
Jeanne Gubbels ◽  
Claudia E. van der Put ◽  
Geert-Jan J. M. Stams ◽  
Mark Assink

Abstract. School-based programs seem promising for child abuse prevention. However, research has mainly focused on child sexual abuse, and knowledge is lacking on how individual program components contribute to the effectiveness of school-based prevention programs for any form of child abuse. This study aimed to examine the overall effect of these school-based programs on (a) children’s child abuse-related knowledge and (b) self-protection skills by conducting two three-level meta-analyses. Furthermore, moderator analyses were performed to identify how program components and delivery techniques were associated with effectiveness. A literature search yielded 34 studies (158 effect sizes; N = 11,798) examining knowledge of child abuse and 22 studies (99 effect sizes; N = 7804) examining self-protection skills. A significant overall effect was found of school-based programs on both knowledge (d = 0.572, 95% CI [0.408, 0.737], p < 0.001) and self-protection skills (d = 0.528, 95% CI [0.262, 0.794], p < 0.001). The results of the first meta-analysis on children’s child abuse knowledge suggest that program effects were larger in programs addressing social–emotional skills of children (d = 0.909 for programs with this component versus d = 0.489 for programs without this component) and self-blame (d = 0.776 versus d = 0.412), and when puppets (d = 1.096 versus d = 0.500) and games or quizzes (d = 0.966 versus d = 0.494) were used. The second meta-analysis on children’s self-protection skills revealed that no individual components or techniques were associated with increased effectiveness. Several other study and program characteristics did moderate the overall effects and are discussed. In general, school-based prevention programs show positive effects on both knowledge and self-protection skills, and the results imply that program effectiveness can be improved by implementing specific components and techniques.


2021 ◽  
pp. 174077452098487
Author(s):  
Brian Freed ◽  
Brian Williams ◽  
Xiaolu Situ ◽  
Victoria Landsman ◽  
Jeehyoung Kim ◽  
...  

Background: Blinding aims to minimize biases from what participants and investigators know or believe. Randomized controlled trials, despite being the gold standard to evaluate treatment effect, do not generally assess the success of blinding. We investigated the extent of blinding in back pain trials and the associations between participant guesses and treatment effects. Methods: We did a review with PubMed/OvidMedline, 2000–2019. Eligibility criteria were back pain trials with data available on treatment effect and participants’ guess of treatment. For blinding, blinding index was used as chance-corrected measure of excessive correct guess (0 for random guess). For treatment effects, within- or between-arm effect sizes were used. Analyses of investigators’ guess/blinding or by treatment modality were performed exploratorily. Results: Forty trials (3899 participants) were included. Active and sham treatment groups had mean blinding index of 0.26 (95% confidence interval: 0.12, 0.41) and 0.01 (−0.11, 0.14), respectively, meaning 26% of participants in active treatment believed they received active treatment, whereas only 1% in sham believed they received sham treatment, beyond chance, that is, random guess. A greater belief of receiving active treatment was associated with a larger within-arm effect size in both arms, and ideal blinding (namely, “random guess,” and “wishful thinking” that signifies both groups believing they received active treatment) showed smaller effect sizes, with correlation of effect size and summary blinding indexes of 0.35 (p = 0.028) for between-arm comparison. We observed uniformly large sham treatment effects for all modalities, and larger correlation for investigator’s (un)blinding, 0.53 (p = 0.046). Conclusion: Participants in active treatments in back pain trials guessed treatment identity more correctly, while those in sham treatments tended to display successful blinding. Excessive correct guesses (that could reflect weaker blinding and/or noticeable effects) by participants and investigators demonstrated larger effect sizes. Blinding and sham treatment effects on back pain need due consideration in individual trials and meta-analyses.
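A per-arm blinding index of the kind described above (0 for random guessing, positive when correct guesses exceed incorrect ones) can be sketched as follows. The sketch assumes a Bang-type index, computed as correct minus incorrect guesses over all respondents in the arm including "don't know" answers; the abstract does not give the exact formula, and the counts are hypothetical.

```python
def blinding_index(n_correct, n_incorrect, n_dont_know=0):
    """Bang-type blinding index for one trial arm.

    0  -> guesses are balanced, consistent with random guessing
    1  -> every participant guessed the allocation correctly
    -1 -> every participant guessed the opposite allocation
    """
    total = n_correct + n_incorrect + n_dont_know
    if total == 0:
        raise ValueError("arm has no respondents")
    return (n_correct - n_incorrect) / total


# Hypothetical active arm of 100 participants:
# 50 guess "active", 24 guess "sham", 26 answer "don't know".
print(blinding_index(50, 24, 26))

# Hypothetical sham arm with balanced guesses.
print(blinding_index(37, 37, 26))
```

With these made-up counts the active arm yields an index of 0.26, which reads as 26% of participants guessing correctly beyond what balanced guessing would produce.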


2013 ◽  
Vol 2013 ◽  
pp. 1-9 ◽  
Author(s):  
Liansheng Larry Tang ◽  
Michael Caudy ◽  
Faye Taxman

Multiple meta-analyses may use similar search criteria and focus on the same topic of interest, but they may yield different or sometimes discordant results. The lack of statistical methods for synthesizing these findings makes it challenging to properly interpret the results from multiple meta-analyses, especially when their results are conflicting. In this paper, we first introduce a method to synthesize the meta-analytic results when multiple meta-analyses use the same type of summary effect estimates. When meta-analyses use different types of effect sizes, the meta-analysis results cannot be directly combined. We propose a two-step frequentist procedure to first convert the effect size estimates to the same metric and then summarize them with a weighted mean estimate. Our proposed method offers several advantages over existing methods by Hemming et al. (2012). First, different types of summary effect sizes are considered. Second, our method provides the same overall effect size as conducting a meta-analysis on all individual studies from multiple meta-analyses. We illustrate the application of the proposed methods in two examples and discuss their implications for the field of meta-analysis.
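The two-step idea (convert to a common metric, then take a weighted mean) can be sketched as below. The abstract does not specify which conversions or weights the authors use, so this sketch assumes a common log odds ratio to standardized mean difference conversion (multiplying by √3/π) and fixed-effect inverse-variance weights, and it treats the summaries as independent. All function names and numbers are hypothetical.

```python
import math


def log_odds_ratio_to_d(log_or, var_log_or):
    """Step 1: convert a log odds ratio and its variance to the d metric."""
    factor = math.sqrt(3) / math.pi
    return log_or * factor, var_log_or * factor ** 2


def weighted_mean(estimates):
    """Step 2: inverse-variance weighted mean of (estimate, variance) pairs.

    Returns the pooled estimate and its variance.
    """
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    mean = sum(w * est for w, (est, _) in zip(weights, estimates)) / total
    return mean, 1.0 / total


# Hypothetical summaries: one meta-analysis reports d directly,
# the other reports a log odds ratio that is converted first.
summary_d = (0.50, 0.01)
summary_from_or = log_odds_ratio_to_d(0.60, 0.09)
print(weighted_mean([summary_d, summary_from_or]))
```

If the meta-analyses share primary studies, the independence assumption does not hold and the pooled variance from this sketch would be too small.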


Author(s):  
H. S. Styn ◽  
S. M. Ellis

The determination of significance of differences in means and of relationships between variables is of importance in many empirical studies. Usually only statistical significance is reported, which does not necessarily indicate an important (practically significant) difference or relationship. With studies based on probability samples, effect size indices should be reported in addition to statistical significance tests in order to comment on practical significance. Where complete populations or convenience samples are used, the determination of statistical significance is strictly speaking no longer relevant, while the effect size indices can be used as a basis to judge significance. In this article attention is paid to the use of effect size indices in order to establish practical significance. It is also shown how these indices are utilized in a few fields of statistical application and how they receive attention in statistical literature and computer packages. The use of effect sizes is illustrated by a few examples from the research literature.


2019 ◽  
Author(s):  
Adam Altmejd ◽  
Anna Dreber ◽  
Eskil Forsell ◽  
Teck Hua Ho ◽  
Juergen Huber ◽  
...  

We measure how accurately replication of experimental results can be predicted by a black-box statistical model. With data from four large-scale replication projects in experimental psychology and economics, and techniques from machine learning, we train a predictive model and study which variables drive predictable replication. The model predicts binary replication with a cross-validated accuracy rate of 70% (AUC of 0.79) and relative effect size with a Spearman ρ of 0.38. The accuracy level is similar to the market-aggregated beliefs of peer scientists (Camerer et al., 2016; Dreber et al., 2015). The predictive power is validated in a pre-registered out-of-sample test of the outcome of Camerer et al. (2018b), where 71% (AUC of 0.73) of replications are predicted correctly and effect size correlations amount to ρ = 0.25. Basic features such as the sample and effect sizes in original papers, and whether reported effects are single-variable main effects or two-variable interactions, are predictive of successful replication. The models presented in this paper are simple tools to produce cheap, prognostic replicability metrics. These models could be useful in institutionalizing the process of evaluation of new findings and guiding resources to those direct replications that are likely to be most informative.


2020 ◽  
pp. 1-9
Author(s):  
Devin S. Kielur ◽  
Cameron J. Powden

Context: Impaired dorsiflexion range of motion (DFROM) has been established as a predictor of lower-extremity injury. Compression tissue flossing (CTF) may address tissue restrictions associated with impaired DFROM; however, a consensus is yet to support these effects. Objectives: To summarize the available literature regarding CTF on DFROM in physically active individuals. Evidence Acquisition: PubMed and EBSCOhost (CINAHL, MEDLINE, and SPORTDiscus) were searched from 1965 to July 2019 for related articles using combination terms related to CTF and DFROM. Articles were included if they measured the immediate effects of CTF on DFROM. Methodological quality was assessed using the Physiotherapy Evidence Database scale. The level of evidence was assessed using the Strength of Recommendation Taxonomy. The magnitude of CTF effects from pre-CTF to post-CTF and compared with a control of range of motion activities only were examined using Hedges' g effect sizes and 95% confidence intervals. Random-effects meta-analysis was performed to synthesize DFROM changes. Evidence Synthesis: A total of 6 studies were included in the analysis. The average Physiotherapy Evidence Database score was 60% (range = 30%–80%) with 4 out of 6 studies considered high quality and 2 as low quality. Meta-analysis indicated no DFROM improvements for CTF compared with range of motion activities only (effect size = 0.124; 95% confidence interval, −0.137 to 0.384; P = .352) and moderate improvements from pre-CTF to post-CTF (effect size = 0.455; 95% confidence interval, 0.022 to 0.889; P = .040). Conclusions: There is grade B evidence to suggest CTF may have no effect on DFROM when compared with a control of range of motion activities only and results in moderate improvements from pre-CTF to post-CTF. This suggests that DFROM improvements were most likely due to exercises completed rather than the band application.
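As a reminder of how a Hedges' g effect size and its 95% confidence interval are obtained for two independent groups, a minimal sketch follows. It uses the standard small-sample correction factor and one common large-sample variance approximation; the review's exact computations (including how pre-post comparisons were handled) are not described in the abstract, and the input values are hypothetical.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Hedges' g for two independent groups with an approximate 95% CI."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / pooled_sd              # Cohen's d
    j = 1 - 3 / (4 * df - 1)               # small-sample correction factor
    g = j * d
    # Large-sample variance approximation for d, scaled by j**2.
    var_g = j ** 2 * ((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    half_width = 1.96 * math.sqrt(var_g)
    return g, (g - half_width, g + half_width)


# Hypothetical dorsiflexion scores (cm): CTF group vs. control group.
g, ci = hedges_g(m1=10.0, sd1=2.0, n1=10, m2=8.0, sd2=2.0, n2=10)
print(g, ci)
```

An interval that spans zero, as in the between-group result reported above, indicates that the data are compatible with no difference between conditions.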

