Emerging Privacy Issues in Times of Open Science
The open science movement has taken up the important challenge of increasing the transparency of statistical analyses, facilitating the reproducibility of studies, and enhancing the reusability of data sets. To counter the replication crisis in psychology and related sciences, the movement also urges researchers to publish their primary data sets alongside their articles. While such data publications are a desirable improvement in terms of transparency and are helpful for future research (e.g., subsequent meta-analyses or replication studies), we argue that this practice can worsen existing privacy issues that have so far received insufficient attention in this context. Recent advances in de-anonymization and re-identification techniques make privacy protection increasingly difficult, as prevalent anonymization mechanisms for handling participants' data might no longer be adequate. By exploiting publicly shared primary data sets, an attacker can link data from multiple studies with contextual data and eventually de-anonymize participants. Such attacks can either re-identify specific individuals of interest or de-anonymize entire participant cohorts. The threat of de-anonymization attacks can undermine participants' perceived confidentiality of their responses and, ultimately, lower potential participants' overall trust in the research process due to privacy concerns.