The Reproducibility Crisis
Attempts to replicate reported studies often fail because the research relied on data mining—searching through data for patterns without any pre-specified, coherent theories. The perils of data mining can be exacerbated by data torturing—slicing, dicing, and otherwise mangling data to create patterns. If there is no underlying reason for a pattern, it is likely to disappear when someone attempts to replicate the study. Big data and powerful computers are part of the problem, not the solution, in that they can easily identify an essentially unlimited number of phantom patterns and relationships, which vanish when confronted with fresh data. If a researcher will benefit from a claim, it is likely to be biased. If a claim sounds implausible, it is probably misleading. If the statistical evidence sounds too good to be true, it probably is.