A Reproducible MEEG Data Analysis Workflow with conda, Snakemake, and R Markdown
This tutorial is devoted to computational reproducibility, that is, the ability to recreate reported results using the original data and code. Previous studies show that this is impossible for a large percentage of studies that have published their data and code. We consider this a serious problem for science in general and for cognitive neuroscience in particular. In this tutorial, we focus on three sources of irreproducibility: differences in software environment, use of out-of-date derivative files, and human error during manual copying of figures, tables, and numbers into the manuscript. We describe three tools that address these issues: conda, Snakemake, and R Markdown, respectively. Together, they form an effective toolkit that can help researchers achieve reproducibility of their analyses. We demonstrate this toolkit by reimplementing a published data analysis pipeline applied to an open MEEG dataset. The main strengths and weaknesses of our approach and of alternative approaches are discussed.