Replication of Data Analyses
The metadata that describe how scientific data are created and analyzed are typically limited to a general description of data sources, software used, and statistical tests applied. They are usually presented in narrative form in the methods section of a scientific paper or in a data set description. Recognizing that such narratives are usually inadequate to support reproduction of the original analysis, a growing number of journals now require that authors also publish their data. However, finer-scale metadata that describe exactly how individual data items were created and transformed, and the processes that performed those steps, are rarely provided, even though such metadata have great potential to improve data set reliability. This chapter focuses on the detailed process metadata, called “data provenance,” required to ensure reproducibility of analyses and reliable re-use of the data.