Is the Replicability Crisis Overblown? Three Arguments Examined

We discuss three arguments voiced by scientists who view the current outpouring of concern about replicability as overblown. The first idea is that the adoption of a low alpha level (e.g., 5%) puts reasonable bounds on the rate at which errors can enter the published literature, making false-positive effects rare enough to be considered a minor issue. This, we point out, rests on statistical misunderstanding: The alpha level imposes no limit on the rate at which errors may arise in the literature (Ioannidis, 2005b). Second, some argue that whereas direct replication attempts are uncommon, conceptual replication attempts are common—providing an even better test of the validity of a phenomenon. We contend that performing conceptual rather than direct replication attempts interacts insidiously with publication bias, opening the door to literatures that appear to confirm the reality of phenomena that in fact do not exist. Finally, we discuss the argument that errors will eventually be pruned out of the literature if the field would just show a bit of patience. We contend that there are no plausible concrete scenarios to back up such forecasts and that what is needed is not patience, but rather systematic reforms in scientific practice.
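The first argument can be checked with a little arithmetic. The sketch below is not taken from the article: it applies the standard positive-predictive-value calculation, and the function name, the power values, and the proportions of true hypotheses are illustrative assumptions. It shows that the share of false positives among significant results depends on power and on how many tested hypotheses are true, not on alpha alone.

```python
def false_positive_share(alpha: float, power: float, prior_true: float) -> float:
    """Expected share of significant results that are false positives.

    alpha      -- significance threshold (type I error rate per test)
    power      -- probability of detecting a true effect (assumed value)
    prior_true -- proportion of tested hypotheses that are true (assumed value)
    """
    true_hits = power * prior_true            # true effects reaching significance
    false_hits = alpha * (1.0 - prior_true)   # null effects reaching significance
    return false_hits / (true_hits + false_hits)


# Illustrative scenarios only; none of these numbers come from the article.
for power, prior in [(0.80, 0.50), (0.80, 0.10), (0.35, 0.10)]:
    share = false_positive_share(alpha=0.05, power=power, prior_true=prior)
    print(f"power={power:.2f}, prior={prior:.2f} -> false-positive share={share:.2f}")
```

With alpha held at 5%, these three scenarios give false-positive shares of roughly 6%, 36%, and 56%. The calculation ignores publication bias and flexible analysis, which would be expected to push the share higher.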

Identification of and Correction for Publication Bias: Comment

10.31222/osf.io/dh87m ◽

2019 ◽

Author(s):

Amanda Kvarven ◽

Eirik Strømland ◽

Magnus Johannesson

Keyword(s):

Publication Bias ◽

False Positive ◽

Large Scale ◽

Meta Analysis ◽

False Positive Rate ◽

Effect Sizes ◽

Replication Studies ◽

Moderate Reduction ◽

Positive Rate ◽

Meta Analyses

Andrews & Kasy (2019) propose an approach for adjusting effect sizes in meta-analysis for publication bias. We use the Andrews-Kasy estimator to adjust the results of 15 meta-analyses and compare the adjusted results to 15 large-scale multiple labs replication studies estimating the same effects. The pre-registered replications provide precisely estimated effect sizes, which do not suffer from publication bias. The Andrews-Kasy approach leads to a moderate reduction of the inflated effect sizes in the meta-analyses. However, the approach still overestimates effect sizes by a factor of about two or more and has an estimated false positive rate of between 57% and 100%.

Inflated false negative rates undermine reproducibility in task-based fMRI

10.1101/122788 ◽

2017 ◽

Cited By ~ 7

Author(s):

G. Lohmann ◽

J. Stelzer ◽

K. Müller ◽

E. Lacosse ◽

T. Buschmann ◽

...

Keyword(s):

False Positive ◽

False Negative ◽

Sample Sizes ◽

Test Statistic ◽

Scientific Validity ◽

Full Cohort ◽

Positive Effects ◽

Software Packages ◽

Human Connectome Project

Abstract Reproducibility is generally regarded as a hallmark of scientific validity. It can be undermined by two very different factors, namely inflated false positive rates or inflated false negative rates. Here we investigate the role of the second factor, i.e. the degree to which true effects are not detected reliably. The availability of large public databases and of supercomputing allows us to tackle this problem quantitatively. Specifically, we estimated the reproducibility in task-based fMRI data over different samples randomly drawn from a large cohort of subjects obtained from the Human Connectome Project. We use the full cohort as a standard of reference to approximate true positive effects, and compute the fraction of those effects that was detected reliably using standard software packages at various smaller sample sizes. We found that with standard sample sizes this fraction was less than 25 percent. We conclude that inflated false negative rates are a major factor that undermines reproducibility. We introduce a new statistical inference algorithm based on a novel test statistic and show that it improves reproducibility without inflating false positive rates.

A (Very) Few Concluding Thoughts

The Problem with Science ◽

10.1093/oso/9780197536537.003.0012 ◽

2021 ◽

pp. 261-270

Author(s):

R. Barker Bausell

Keyword(s):

False Positive ◽

Scientific Inquiry ◽

Scientific Practice ◽

Substantial Reduction ◽

The Future ◽

Scientific Results

In this chapter, educational recommendations for future scientists are suggested, followed by possible scenarios that may characterize the future of the reproducibility initiatives discussed in previous chapters. One such scenario, while quite pessimistic, is not without historical precedent: namely, that the entire movement may turn out to be little more than a publishing opportunity for methodologically oriented scientists, soon replaced by something else and forgotten by most, thereby allowing it to be reprised a few decades later under a different name by different academics. Alternatively, and more optimistically, the procedural and statistical behaviors discussed here will receive an increased emphasis in the scientific curricula accompanied by a sea change in actual scientific practice and its culture, thereby producing a substantial reduction in the prevalence of avoidable false-positive scientific results. And indeed, recent evidence does appear to suggest that the reproducibility initiatives instituted by the dedicated cadre of methodologically oriented scientists chronicled in this book have begun the process of making substantive improvements in the quality and veracity of scientific inquiry itself.

How reliable are scientific studies?

The British Journal of Psychiatry ◽

10.1192/bjp.bp.109.069849 ◽

2010 ◽

Vol 197 (4) ◽

pp. 257-258 ◽

Cited By ~ 27

Author(s):

Marcus R. Munafò ◽

Jonathan Flint

Keyword(s):

Empirical Evidence ◽

Publication Bias ◽

False Positive ◽

Scientific Research ◽

Substantial Proportion ◽

Number Of Factors ◽

Positive Results

Summary There is growing concern that a substantial proportion of scientific research may in fact be false. A number of factors have been proposed as contributing to the presence of a large number of false-positive results in the literature, one of which is publication bias. We discuss empirical evidence for these factors.

False-positive Effect in the Radin Double-slit Experiment: HARKing is used by Radin et al. to Misrepresent the Advanced Meta-experimental Protocol used in Walleczek and von Stillfried (2019)

10.31234/osf.io/a2vkn ◽

2020 ◽

Author(s):

Jan Walleczek ◽

Von Stillfried

Keyword(s):

Statistical Analysis ◽

Research Design ◽

False Positive ◽

True Positive ◽

Experimental Protocol ◽

Positive Effects ◽

Test Strategy ◽

Post Hoc ◽

Positive Effect ◽

Double Slit

A general commentary by Walleczek and von Stillfried (2020) was recently published in Frontiers in Psychology. The present work provides an account of (i) the detailed research record and (ii) the main arguments behind the commentary for the purpose of full transparency and disclosure. For historical overview, Walleczek and von Stillfried (2019) had previously reported (i) the absence of any true-positive effects and (ii) the presence of one false-positive effect in a commissioned replication study of the Radin double-slit (DS) experiment on observer consciousness. In their subsequent misrepresentations, Radin et al. (2019, 2020) regrettably used the malpractice of undisclosed HARKing, i.e., undisclosed hypothesizing after the results are known. HARKing can greatly increase the risk of false-negative or false-positive conclusions. Specifically, Radin et al. (2019, 2020) deviated in two major ways from the pre-specified protocol for this commissioned study, which (i) was agreed to by Radin before data collection was started (Radin, 2011) and (ii) included data encryption to prevent the use of p-hacking and HARKing. First, Radin et al. (2019) violate the original research design by reporting a so-called “true-positive outcome of a secondary planned hypothesis”. Contrary to the claim by Radin et al. (2019, 2020), that hypothesis was not part of the planned test strategy; instead, the associated statistical analysis, a chi-square test, was chosen by Radin sometime after the planned statistical analysis had been completed and the data unblinded. Second, Radin et al. (2019, 2020) violate the funder-approved research design in an additional way by falsely claiming that the newly developed protocol, i.e., the advanced meta-experimental protocol (AMP), implements a non-predictive test strategy when, in fact, the AMP-based test strategy is strictly predictive. Put simply, Radin et al. (2019, 2020) are mistaken that the funder-approved hypotheses posited the random occurrence of effects for the test categories in this replication experiment; instead, a different specific prediction was tested in each of the eight planned test categories, and true-positive effects were predicted to occur for only two (12.5%) of the 16 possible measurement outcomes of the eight planned single-test categories. Therefore, in the predictive single-testing regime, a statistical correction for non-predictive, i.e., random, multiple testing would not be appropriate and would thus violate the AMP-based strategy, which was implemented in the commissioned study based upon the planned outcome predictions as pre-specified in Radin (2011). Neither of these post-hoc changes by Radin et al. (on the basis of HARKing) was disclosed in Radin et al. (2019, 2020), and both changes violate the funder-approved, original methodology agreed upon in Radin (2011) and pre-specified in the research contract. In summary, the present work reconfirms that, exactly as reported in Walleczek and von Stillfried (2019), “the false-positive effect, which would be indistinguishable from the predicted true-positive effect, was significant at p = 0.021 (σ = −2.02; N = 1,250 test trials)” and “no statistically significant effects could be identified” in those two groups for which true-positives were predicted to occur. These observations are also consistent with an independent statistical reanalysis of the Radin DS-experiment by Tremblay (2019) and a replication attempt by Guerrer (2019). Tremblay reported significant false-positives in control groups, and Guerrer found significant effects only with post-hoc analyses and null results when using the planned confirmatory analysis. As a general recommendation, the authors call for the implementation of advanced control-test strategies, including novel approaches from the metascience reform movement, for empirically detecting and preventing uncontrolled false-positive effects in parapsychological research.

Questionable Research Practices (QRPs) and Their Devastating Scientific Effects

The Problem with Science ◽

10.1093/oso/9780197536537.003.0004 ◽

2021 ◽

pp. 56-90

Author(s):

R. Barker Bausell

Keyword(s):

Publication Bias ◽

False Positive ◽

Institutional Research ◽

Research Practices ◽

Questionable Research Practices ◽

Research Participants ◽

Positive Results ◽

Exhaustive List

The linchpin of both publication bias and irreproducibility involves an exhaustive list of more than a score of individually avoidable questionable research practices (QRPs) supplemented by 10 inane institutional research practices. While these untoward effects on the production of false-positive results are unsettling, a far more entertaining (in a masochistic sort of way) pair of now famous iconoclastic experiments conducted by Simmons, Nelson, and Simonsohn is presented in which, with the help of only a few well-chosen QRPs, research participants can actually become older after simply listening to a Beatles song. In addition, surveys designed to estimate the prevalence of these and other QRPs in the published literatures are also described.

Disgust as the Source of False Positive Effects in the Measurement of Ophidiophobia

The Journal of Psychology ◽

10.1080/00223989709603523 ◽

1997 ◽

Vol 131 (4) ◽

pp. 371-382 ◽

Cited By ~ 23

Author(s):

Douglas M. Klieger ◽

Kimberly K. Siejak

Keyword(s):

False Positive ◽

Positive Effects

Miocene Palynology of the Solimões Formation (Well 1-AS-105-AM), Western Brazilian Amazonia

10.5479/si.16803493 ◽

2021 ◽

Author(s):

Carlos D'Apolito ◽

Carlos Jaramillo ◽

Guy Harrington

Keyword(s):

Environmental Complexity ◽

Brazilian Amazonia ◽

Positive Effects ◽

The Family ◽

Different Types ◽

Pollen And Spores ◽

Palynological Study ◽

A Minor ◽

Early Late ◽

Minor Extent

During the Miocene, Andean tectonism caused the development of a vast wetland across western Amazonia. Palynological studies have been the main source of chronological and paleobotanical information for this region, including several boreholes in the Solimões Formation in western Brazilian Amazonia. Here, a palynological study of well core 1-AS-105-AM drilled in Tabatinga (Amazonas, Brazil) is presented: 91 new taxa are erected (25 spores and 66 pollen, including one new genus), 16 new combinations are proposed, and a list of botanical/ecological affinities is updated. We recorded 23,880 palynomorphs distributed in 401 different types. Among pollen and spores, 62 extant families and 99 extant genera were identified, which accounts for 39% and 30% of known botanical affinities to the family and genus level, respectively. Individual samples have pollen/spore counts with approximately 25% to 95% of known affinities to the family level. Pollen associations are sourced primarily from the wetland environments and to a minor extent from nonflooded forests. Palynological diversity analyses indicate an increase from the early to the middle/early late Miocene in core 1-AS-105-AM. Probable scenarios to explain this diversity increase include a higher degree of environmental complexity from the middle Miocene onwards, that is, a more heterogeneous riverscape, including broader extensions of nonflooded forests, as opposed to the swamp-dominated early Miocene. Additionally, the positive effects of the Miocene Climatic Optimum on plant richness could explain the increase in pollen richness. We posit hypotheses of forest diversification that can be tested as more botanical affinities are established along with a longer Miocene record.

Evaluation of Clinical Usefulness of Monocyte Gating Using CD14 and CD64 for Detecting PNH Clone By Flow Cytometry

Blood ◽

10.1182/blood-2018-99-111468 ◽

2018 ◽

Vol 132 (Supplement 1) ◽

pp. 4947-4947

Author(s):

Woo Jae Kwoun ◽

Jeong-Yeal Ahn ◽

Ja Young Seo ◽

Jae Hoon Lee ◽

Hawk Kim ◽

...

Keyword(s):

Flow Cytometry ◽

False Positive ◽

False Positive Rate ◽

Cell Lineages ◽

Flow Cytometric ◽

Significant Difference ◽

Positive Rate ◽

A Minor ◽

Gating Method ◽

Pnh Clone

Abstract Introduction: Flow cytometry is the gold standard for diagnosing paroxysmal nocturnal hemoglobinuria (PNH); it detects the absence of glycosylphosphatidylinositol (GPI)-linked protein expression on red blood cells, granulocytes, and monocytes. The current assays are 4-color analyses of GPI-linked markers such as fluorescein-labeled proaerolysin (FLAER), CD24, CD14, CD59, and CD235a, together with the lineage markers for granulocytes (CD15) and monocytes (CD64), to detect PNH clones. We investigated the utility of CD14/CD64 monocyte gating by comparing it with CD45/light scatter (LS) gating in PNH studies of patients with cytopenia, and analyzed the types and cell lineages of PNH clones by disease group. Method: A total of 138 cases were recruited from July 2017 to February 2018 at Gachon University Gil Medical Center in Korea. Flow cytometric analysis was performed on EDTA blood with a Beckman Coulter Cytomics FC500 cytometer, using gating antibodies (CD45, CD14, CD15, CD64, CD235a) and GPI-linked antibodies (CD59, CD14, CD24, FLAER). The proportion of monocytes was estimated by CD14/CD64 gating and compared with that obtained by CD45/LS gating. The type of PNH clone was defined by the size of the PNH population: a PNH clone was a PNH population exceeding 1% of the gated cells, a minor PNH clone was between 0.1% and 1%, and rare cells with GPI deficiency were a PNH population of less than 0.1%. The types and cell lineages of the PNH clones were analyzed by disease group. Statistical analysis was done using SPSS 17.0 and MedCalc 15.2, and P<0.05 was considered statistically significant. Results: Of the 138 cases, a PNH clone was detected in 27, comprising 15 cases with a PNH clone and 12 cases with a minor PNH clone. A PNH clone was observed in all 8 PNH cases (100%). Two PNH clones and 4 minor PNH clones were identified in 6 of 16 cases (38%) of acute myeloid leukemia (AML). Six of 21 cases (29%) of aplastic anemia (AA) showed 5 PNH clones and 1 minor PNH clone. In 5 of 78 cases (6%) of cytopenia(s), only minor PNH clones were observed. CD45 plus LS gating of monocytes showed a sensitivity of 100%, a specificity of 40.2%, and a false positive rate of 60% (73/89) in detecting PNH clones. The McNemar test indicated a significant difference between the CD14/CD64 and CD45/LS gating methods (P = 0.00). The Bland-Altman plot of monocyte proportion between the two gating methods revealed that CD45/LS gating tended to underestimate the monocyte proportion, and that the larger the number of monocytes, the greater the difference in monocyte count between the two methods. The trend in PNH clone size in each cell lineage was confirmed by follow-up in three patients with a PNH clone. Two patients showed more abrupt changes in PNH clone size in monocytes than in red blood cells or granulocytes. In the other patient, however, a significant trend was found only in the RBC PNH clone. Conclusion: The types of PNH clone observed in each disease group showed different characteristics. A PNH clone was identified in 5 of the 6 AA cases in which a PNH population was detected, whereas minor PNH clones were observed in all 5 cytopenia cases in which a PNH population was detected. Four minor PNH clones and two PNH clones were found in the 6 AML cases in which a PNH population was detected. However, all PNH clones observed in AML cases were in monocytes. Monocyte gating with CD45 and LS not only underestimated the proportion of monocytes among total WBCs but also showed a high false positive rate of 60% in detecting PNH clones. In contrast, the CD14/CD64 gating method can accurately measure the monocyte population and avoid false positive detection of PNH clones. In addition, in monitoring PNH patients, measurement of the PNH clone in monocytes tends to be more sensitive to changes in PNH clone size than measurement in RBCs or granulocytes. In conclusion, gating using CD14 and CD64 is valuable in flow cytometric detection of the PNH clone, both in diagnosing new patients and in monitoring PNH patients. Disclosures: No relevant conflicts of interest to declare.
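As a quick consistency check on the figures reported for CD45/LS gating, the false positive rate is the complement of specificity. The sketch below uses only the 40.2% specificity quoted in the abstract; the function name is a hypothetical helper, and the 73/89 count is left as reported rather than re-derived.

```python
def false_positive_rate(specificity: float) -> float:
    """False positive rate as the complement of specificity (both as fractions)."""
    if not 0.0 <= specificity <= 1.0:
        raise ValueError("specificity must be between 0 and 1")
    return 1.0 - specificity


# Specificity of CD45/LS gating as reported in the abstract.
fpr = false_positive_rate(0.402)
print(f"false positive rate = {fpr:.1%}")
```

The result, 59.8%, rounds to the 60% stated in the abstract.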

Drugs with anti-inflammatory effects to improve outcome of traumatic brain injury: a meta-analysis

Scientific Reports ◽

10.1038/s41598-020-73227-5 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Marieke Begemann ◽

Mikela Leon ◽

Harm Jan van der Horn ◽

Joukje van der Naalt ◽

Iris Sommer

Keyword(s):

Traumatic Brain Injury ◽

Brain Injury ◽

Publication Bias ◽

Meta Analysis ◽

Positive Effects ◽

Younger Age ◽

Negative Findings ◽

Anti Inflammatory ◽

Stratified Analysis ◽

Positive Effect

Abstract Outcome after traumatic brain injury (TBI) varies widely, and the degree of immune activation is an important determinant. This meta-analysis evaluates the efficacy of drugs with anti-inflammatory properties in improving neurological and functional outcome. The systematic search following PRISMA guidelines resulted in 15 randomized placebo-controlled trials (3734 patients), evaluating progesterone, erythropoietin and cyclosporine. The meta-analysis (15 studies) showed that TBI patients receiving a drug with anti-inflammatory effects had a higher chance of a favorable outcome compared to those receiving placebo (RR = 1.15; 95% CI 1.01–1.32, p = 0.041). However, publication bias was indicated together with heterogeneity (I² = 76.59%). Stratified analysis showed that positive effects were mainly observed in patients receiving this treatment within 8 h after injury. Subanalyses by drug type showed efficacy for progesterone (8 studies, RR 1.22; 95% CI 1.01–1.47, p = 0.040); again, heterogeneity was high (I² = 62.92%) and publication bias could not be ruled out. The positive effect of progesterone covaried with younger age and was mainly observed when administered intramuscularly and not intravenously. Erythropoietin (4 studies, RR 1.20; p = 0.110; I² = 76.59%) and cyclosporine (3 studies, RR 0.75; p = 0.189, I² = 0%) did not show significant favorable effects. While negative findings for erythropoietin may reflect insufficient power, cyclosporine did not show better outcome at all. Current results do not allow firm conclusions on the efficacy of drugs with anti-inflammatory properties in TBI patients. Included trials showed heterogeneity in methodological and sample parameters. At present, only progesterone showed positive results, and early intramuscular administration may be most effective, especially in young people. The anti-inflammatory component of progesterone is relatively weak, and mechanisms other than mitigating the overall immune response may be more important.
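The pooled estimate can be related to its p-value with a short calculation. The sketch below is an assumption-laden illustration, not the authors' method: it treats the 95% confidence interval as symmetric on the log scale under a normal approximation, and the function name is hypothetical.

```python
import math


def p_from_ratio_ci(ratio: float, lower: float, upper: float) -> float:
    """Approximate two-sided p-value for a ratio estimate and its 95% CI.

    Assumes the interval is symmetric on the log scale (normal approximation).
    """
    z_95 = 1.959964  # normal quantile for a two-sided 95% interval
    se_log = (math.log(upper) - math.log(lower)) / (2.0 * z_95)
    z = math.log(ratio) / se_log
    return math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area


# Pooled estimate quoted in the abstract: RR = 1.15, 95% CI 1.01-1.32.
p_pooled = p_from_ratio_ci(1.15, 1.01, 1.32)
print(f"pooled p ~ {p_pooled:.3f}")
```

Under these assumptions the pooled result comes out at about 0.041, in line with the reported value. Small discrepancies are to be expected for other subgroups because the published interval limits are rounded.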
