An Axiomatic View of Statistical Privacy and Utility

Author(s):  
Daniel Kifer ◽  
Bing-Rong Lin

"Privacy" and "utility" are words that frequently appear in the literature on statistical privacy. But what do these words really mean? In recent years, many problems with intuitive notions of privacy and utility have been uncovered. Thus more formal notions of privacy and utility, which are amenable to mathematical analysis, are needed. In this paper we present our initial work on an axiomatization of privacy and utility. We present two privacy axioms which describe how privacy is affected by post-processing data and by randomly selecting a privacy mechanism. We present three axioms for utility measures which also describe how measured utility is affected by post-processing. Our analysis of these axioms yields new insights into the construction of privacy definitions and utility measures. In particular, we characterize the class of relaxations of differential privacy that can be obtained by changing constraints on probabilities; we show that the resulting constraints must be formed from concave functions. We also present several classes of utility metrics satisfying our axioms and explicitly show that measures of utility borrowed from statistics can lead to utility paradoxes when applied to statistical privacy. Finally, we show that the outputs of differentially private algorithms are best interpreted in terms of graphs or likelihood functions rather than query answers or synthetic data.

2020 ◽  
Vol 223 (3) ◽  
pp. 1565-1583
Author(s):  
Hoël Seillé ◽  
Gerhard Visser

SUMMARY Bayesian inversion of magnetotelluric (MT) data is a powerful but computationally expensive approach to estimating the subsurface electrical conductivity distribution and its associated uncertainty. Approximating the Earth's subsurface with 1-D physics considerably speeds up calculation of the forward problem, making the Bayesian approach tractable, but it can lead to biased results when the 1-D assumption is violated. We propose a methodology to quantitatively compensate for the bias caused by the 1-D Earth assumption within a 1-D trans-dimensional Markov chain Monte Carlo sampler. Our approach determines site-specific likelihood functions, which are calculated using a dimensionality discrepancy error model derived by a machine learning algorithm trained on a set of synthetic 3-D conductivity images. This is achieved by exploiting known geometrical properties of the MT phase tensor related to dimensionality. A complex synthetic model that mimics a sedimentary basin environment is used to illustrate the ability of our workflow to reliably estimate uncertainty in the inversion results, even in the presence of strong 2-D and 3-D effects. Using this dimensionality discrepancy error model, we demonstrate that on this synthetic data set our workflow performs better in 80 per cent of cases than the existing practice of using constant errors. Finally, our workflow is benchmarked against real data acquired in Queensland, Australia, where it accurately detects the depth to basement.
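
The sketch below is a hedged illustration (hypothetical variable names, not the authors' code) of how a site-specific likelihood could combine measurement errors with a per-frequency dimensionality discrepancy error before comparing observed and 1-D forward-modelled responses.

```python
# Gaussian log-likelihood with errors inflated by a dimensionality
# discrepancy term, combined in quadrature with the measurement error.
import numpy as np

def log_likelihood(d_obs, d_pred, sigma_obs, sigma_dim):
    """d_obs     : observed MT responses (e.g. log apparent resistivity, phase)
    d_pred    : responses predicted by the 1-D forward model
    sigma_obs : measurement standard deviations
    sigma_dim : dimensionality discrepancy errors (e.g. from a trained model)
    """
    sigma = np.sqrt(sigma_obs**2 + sigma_dim**2)   # combine in quadrature
    resid = (d_obs - d_pred) / sigma
    return -0.5 * np.sum(resid**2 + np.log(2.0 * np.pi * sigma**2))
```

Within a trans-dimensional MCMC sampler, such a likelihood would replace a constant-error likelihood and down-weight frequencies where 2-D/3-D effects are suspected.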


2018 ◽  
Vol 8 (11) ◽  
pp. 2081 ◽  
Author(s):  
Hai Liu ◽  
Zhenqiang Wu ◽  
Yihui Zhou ◽  
Changgen Peng ◽  
Feng Tian ◽  
...  

Differential privacy mechanisms can offer a trade-off between privacy and utility, quantified by privacy metrics and utility metrics. This trade-off means that, in terms of these metrics, an increase in privacy comes at the cost of a decrease in utility, and vice versa. However, there is no unified measurement of this trade-off across differential privacy mechanisms. To this end, we propose the notion of privacy-preserving monotonicity of differential privacy, which measures the trade-off between privacy and utility. First, to formalize the trade-off, we present the definition of privacy-preserving monotonicity based on computational indistinguishability. Second, building on privacy metrics based on expected estimation error and entropy, we show, both theoretically and numerically, the privacy-preserving monotonicity of the Laplace mechanism, the Gaussian mechanism, the exponential mechanism, and the randomized response mechanism. In addition, we analyze, theoretically and numerically, the utility monotonicity of these mechanisms using utility metrics based on the modulus of the characteristic function and a variant of normalized entropy. Third, according to the privacy-preserving monotonicity of differential privacy, we present a method to seek a trade-off under a semi-honest model and analyze a unilateral trade-off under a rational model. Therefore, privacy-preserving monotonicity can be used as a criterion to evaluate the trade-off between privacy and utility of differential privacy mechanisms under the semi-honest model. Under the rational model, however, privacy-preserving monotonicity results in a unilateral trade-off, which can lead to severe consequences.
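
A minimal numerical sketch (not the authors' code) of the trade-off for the Laplace mechanism: as the privacy budget epsilon shrinks, the expected estimation error grows monotonically, so privacy increases while utility decreases.

```python
# Expected |error| of the Laplace mechanism as a function of epsilon.
import numpy as np

def laplace_release(value, epsilon, sensitivity=1.0, rng=None):
    rng = rng or np.random.default_rng()
    return value + rng.laplace(0.0, sensitivity / epsilon)

rng = np.random.default_rng(1)
true_value = 100.0
for epsilon in (0.1, 0.5, 1.0, 2.0):
    releases = np.array([laplace_release(true_value, epsilon, rng=rng)
                         for _ in range(10_000)])
    expected_error = np.mean(np.abs(releases - true_value))
    # The expected |Laplace| error is sensitivity/epsilon, so error falls
    # monotonically as epsilon grows: privacy down, utility up.
    print(f"epsilon={epsilon:4.1f}  mean |error| = {expected_error:6.2f}")
```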


Author(s):  
Joshua Snoke ◽  
Gillian M. Raab ◽  
Beata Nowok ◽  
Chris Dibben ◽  
Aleksandra Slavkovic

Author(s):  
Ziqi Zhang ◽  
Chao Yan ◽  
Thomas A Lasko ◽  
Jimeng Sun ◽  
Bradley A Malin

Abstract Objective Simulating electronic health record data offers an opportunity to resolve the tension between data sharing and patient privacy. Recent techniques based on generative adversarial networks have shown promise but neglect the temporal aspect of healthcare. We introduce a generative framework for simulating the trajectory of patients' diagnoses, together with measures to evaluate utility and privacy. Materials and Methods The framework simulates date-stamped diagnosis sequences through a 2-stage process that 1) sequentially extracts temporal patterns from clinical visits and 2) generates synthetic data conditioned on the learned patterns. We designed 3 utility measures to characterize the extent to which the framework maintains feature correlations and temporal patterns in clinical events. We evaluated the framework with billing codes, represented as phenome-wide association study codes (phecodes), from over 500 000 Vanderbilt University Medical Center electronic health records. We further assessed the privacy risks based on membership inference and attribute disclosure attacks. Results The simulated temporal sequences exhibited characteristics similar to real sequences on the utility measures. Notably, diagnosis prediction models based on real versus synthetic temporal data exhibited an average relative difference in area under the ROC curve of 1.6%, with a standard deviation of 3.8%, across 1276 phecodes. Additionally, the relative differences in mean occurrence age and time between visits were 4.9% and 4.2%, respectively. The privacy risks of the synthetic data with respect to membership and attribute inference were negligible. Conclusion This investigation indicates that temporal diagnosis code sequences can be simulated in a manner that provides utility and respects privacy.
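
The sketch below (toy data and model choices, not the authors' pipeline) illustrates the kind of utility check reported above: train the same diagnosis prediction model on real and on synthetic data, then compare the relative difference in area under the ROC curve on a held-out real test set.

```python
# Relative AUC difference between models trained on real vs synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def relative_auc_difference(X_real, y_real, X_synth, y_synth, X_test, y_test):
    model_real = LogisticRegression(max_iter=1000).fit(X_real, y_real)
    model_synth = LogisticRegression(max_iter=1000).fit(X_synth, y_synth)
    auc_real = roc_auc_score(y_test, model_real.predict_proba(X_test)[:, 1])
    auc_synth = roc_auc_score(y_test, model_synth.predict_proba(X_test)[:, 1])
    return (auc_real - auc_synth) / auc_real

# Toy usage with random features standing in for phecode histories.
rng = np.random.default_rng(0)
X_real, y_real = rng.normal(size=(500, 20)), rng.integers(0, 2, 500)
X_synth, y_synth = rng.normal(size=(500, 20)), rng.integers(0, 2, 500)
X_test, y_test = rng.normal(size=(200, 20)), rng.integers(0, 2, 200)
print(relative_auc_difference(X_real, y_real, X_synth, y_synth, X_test, y_test))
```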


Author(s):  
Sijie Jiang ◽  
Junguo Liao ◽  
Shaobo Zhang ◽  
Gengming Zhu ◽  
Su Wang ◽  
...  
