Privacy Preserving Anonymity for Periodical Releases of Spontaneous ADE Reporting Data: Algorithm Development and Validation (Preprint)
BACKGROUND Increasingly, spontaneous reporting systems (SRS) have been established to collect adverse drug events to foster the research of ADR detection and analysis. SRS data contains personal information and so its publication requires data anonymization to prevent the disclosure of individual privacy. We previously have proposed a privacy model called MS(k, θ*)-bounding and the associated MS-Anonymization algorithm to fulfill the anonymization of SRS data. In the real world, the SRS data usually are released periodically, e.g., FAERS, to accommodate newly collected adverse drug events. Different anonymized releases of SRS data available to the attacker may thwart our single-release-focus method, i.e., MS(k, θ*)-bounding. OBJECTIVE We investigate the privacy threat caused by periodical releases of SRS data and propose anonymization methods to prevent the disclosure of personal privacy information while maintain the utility of published data. METHODS We identify some potential attacks on periodical releases of SRS data, namely BFL-attacks, that are mainly caused by follow-up cases. We present a new privacy model called PPMS(k, θ*)-bounding, and propose the associated PPMS-Anonymization algorithm along with two improvements, PPMS+-Anonymization and PPMS++-Anonymization. Empirical evaluations were performed using 32 selected FAERS quarter datasets, from 2004Q1 to 2011Q4. The performance of the proposed three versions of PPMS-Anonymization were inspected against MS-Anonymization from some aspects, including data distortion, measured by Normalized Information Loss (NIS); privacy risk of anonymized data, measured by Dangerous Identity Ratio (DIR) and Dangerous Sensitivity Ratio (DSR); and data utility, measured by bias of signal counting and strength (PRR). RESULTS The results show that our new method can prevent privacy disclosure for periodical releases of SRS data with reasonable sacrifice of data utility and acceptable deviation of the strength of ADR signals. The best version of PPMS-Anonymization, PPMS++-Anonymization, achieves nearly the same quality as MS-Anonymization both in privacy protection and data utility. CONCLUSIONS The proposed PPMS(k, θ*)-bounding model and PPMS-Anonymization algorithm are effective in anonymizing SRS datasets in the periodical data publishing scenario, preventing the series of releases from the disclosure of personal sensitive information caused by BFL-attacks while maintaining the data utility for ADR signal detection.