An Axiomatic View of Statistical Privacy and Utility

Author(s):  
Daniel Kifer ◽  
Bing-Rong Lin

"Privacy" and "utility" are words that frequently appear in the literature on statistical privacy. But what do these words really mean? In recent years, many problems with intuitive notions of privacy and utility have been uncovered. Thus more formal notions of privacy and utility, which are amenable to mathematical analysis, are needed. In this paper we present our initial work on an axiomatization of privacy and utility. We present two privacy axioms which describe how privacy is affected by post-processing data and by randomly selecting a privacy mechanism. We present three axioms for utility measures which also describe how measured utility is affected by post-processing. Our analysis of these axioms yields new insights into the construction of privacy definitions and utility measures. In particular, we characterize the class of relaxations of differential privacy that can be obtained by changing constraints on probabilities; we show that the resulting constraints must be formed from concave functions. We also present several classes of utility metrics satisfying our axioms and explicitly show that measures of utility borrowed from statistics can lead to utility paradoxes when applied to statistical privacy. Finally, we show that the outputs of differentially private algorithms are best interpreted in terms of graphs or likelihood functions rather than query answers or synthetic data.

2020 ◽  
Vol 223 (3) ◽  
pp. 1565-1583
Author(s):  
Hoël Seillé ◽  
Gerhard Visser

SUMMARY Bayesian inversion of magnetotelluric (MT) data is a powerful but computationally expensive approach to estimating the subsurface electrical conductivity distribution and its associated uncertainty. Approximating the Earth's subsurface with 1-D physics considerably speeds up calculation of the forward problem, making the Bayesian approach tractable, but it can lead to biased results when the 1-D assumption is violated. We propose a methodology to quantitatively compensate for the bias caused by the 1-D Earth assumption within a 1-D trans-dimensional Markov chain Monte Carlo sampler. Our approach determines site-specific likelihood functions, which are calculated using a dimensionality discrepancy error model derived by a machine learning algorithm trained on a set of synthetic 3-D conductivity images. This is achieved by exploiting known geometrical properties of the MT phase tensor related to dimensionality. A complex synthetic model that mimics a sedimentary basin environment is used to illustrate the ability of our workflow to reliably estimate uncertainty in the inversion results, even in the presence of strong 2-D and 3-D effects. Using this dimensionality discrepancy error model, we demonstrate that on this synthetic data set our workflow performs better in 80 per cent of cases than the existing practice of using constant errors. Finally, our workflow is benchmarked against real data acquired in Queensland, Australia, where it accurately detects the depth to basement.
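
The sketch below is a hedged illustration (hypothetical variable names, not the authors' code) of how a site-specific likelihood could combine measurement errors with a per-frequency dimensionality discrepancy error before comparing observed and 1-D forward-modelled responses.

```python
# Gaussian log-likelihood with errors inflated by a dimensionality
# discrepancy term, combined in quadrature with the measurement error.
import numpy as np

def log_likelihood(d_obs, d_pred, sigma_obs, sigma_dim):
    """d_obs     : observed MT responses (e.g. log apparent resistivity, phase)
    d_pred    : responses predicted by the 1-D forward model
    sigma_obs : measurement standard deviations
    sigma_dim : dimensionality discrepancy errors (e.g. from a trained model)
    """
    sigma = np.sqrt(sigma_obs**2 + sigma_dim**2)   # combine in quadrature
    resid = (d_obs - d_pred) / sigma
    return -0.5 * np.sum(resid**2 + np.log(2.0 * np.pi * sigma**2))
```

Within a trans-dimensional MCMC sampler, such a likelihood would replace a constant-error likelihood and down-weight frequencies where 2-D/3-D effects are suspected.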


2018 ◽  
Vol 8 (11) ◽  
pp. 2081 ◽  
Author(s):  
Hai Liu ◽  
Zhenqiang Wu ◽  
Yihui Zhou ◽  
Changgen Peng ◽  
Feng Tian ◽  
...  

Differential privacy mechanisms can offer a trade-off between privacy and utility, quantified by privacy metrics and utility metrics. This trade-off means that, in terms of these metrics, an increase in privacy comes at the cost of a decrease in utility, and vice versa. However, there is no unified measurement of this trade-off across differential privacy mechanisms. To this end, we propose the notion of privacy-preserving monotonicity of differential privacy, which measures the trade-off between privacy and utility. First, to formalize the trade-off, we present the definition of privacy-preserving monotonicity based on computational indistinguishability. Second, building on privacy metrics based on expected estimation error and entropy, we show, both theoretically and numerically, the privacy-preserving monotonicity of the Laplace mechanism, the Gaussian mechanism, the exponential mechanism, and the randomized response mechanism. In addition, we analyze, theoretically and numerically, the utility monotonicity of these mechanisms using utility metrics based on the modulus of the characteristic function and a variant of normalized entropy. Third, according to the privacy-preserving monotonicity of differential privacy, we present a method to seek a trade-off under a semi-honest model and analyze a unilateral trade-off under a rational model. Therefore, privacy-preserving monotonicity can be used as a criterion to evaluate the trade-off between privacy and utility of differential privacy mechanisms under the semi-honest model. Under the rational model, however, privacy-preserving monotonicity results in a unilateral trade-off, which can lead to severe consequences.
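
A minimal numerical sketch (not the authors' code) of the trade-off for the Laplace mechanism: as the privacy budget epsilon shrinks, the expected estimation error grows monotonically, so privacy increases while utility decreases.

```python
# Expected |error| of the Laplace mechanism as a function of epsilon.
import numpy as np

def laplace_release(value, epsilon, sensitivity=1.0, rng=None):
    rng = rng or np.random.default_rng()
    return value + rng.laplace(0.0, sensitivity / epsilon)

rng = np.random.default_rng(1)
true_value = 100.0
for epsilon in (0.1, 0.5, 1.0, 2.0):
    releases = np.array([laplace_release(true_value, epsilon, rng=rng)
                         for _ in range(10_000)])
    expected_error = np.mean(np.abs(releases - true_value))
    # The expected |Laplace| error is sensitivity/epsilon, so error falls
    # monotonically as epsilon grows: privacy down, utility up.
    print(f"epsilon={epsilon:4.1f}  mean |error| = {expected_error:6.2f}")
```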


Author(s):  
Joshua Snoke ◽  
Gillian M. Raab ◽  
Beata Nowok ◽  
Chris Dibben ◽  
Aleksandra Slavkovic

Author(s):  
Ziqi Zhang ◽  
Chao Yan ◽  
Thomas A Lasko ◽  
Jimeng Sun ◽  
Bradley A Malin

Abstract Objective Simulating electronic health record data offers an opportunity to resolve the tension between data sharing and patient privacy. Recent techniques based on generative adversarial networks have shown promise but neglect the temporal aspect of healthcare. We introduce a generative framework for simulating the trajectory of patients' diagnoses, together with measures to evaluate utility and privacy. Materials and Methods The framework simulates date-stamped diagnosis sequences through a 2-stage process that 1) sequentially extracts temporal patterns from clinical visits and 2) generates synthetic data conditioned on the learned patterns. We designed 3 utility measures to characterize the extent to which the framework maintains feature correlations and temporal patterns in clinical events. We evaluated the framework with billing codes, represented as phenome-wide association study codes (phecodes), from over 500 000 Vanderbilt University Medical Center electronic health records. We further assessed the privacy risks based on membership inference and attribute disclosure attacks. Results The simulated temporal sequences exhibited characteristics similar to real sequences on the utility measures. Notably, diagnosis prediction models based on real versus synthetic temporal data exhibited an average relative difference in area under the ROC curve of 1.6%, with a standard deviation of 3.8%, across 1276 phecodes. Additionally, the relative differences in mean occurrence age and time between visits were 4.9% and 4.2%, respectively. The privacy risks of the synthetic data with respect to membership and attribute inference were negligible. Conclusion This investigation indicates that temporal diagnosis code sequences can be simulated in a manner that provides utility and respects privacy.
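
The sketch below (toy data and model choices, not the authors' pipeline) illustrates the kind of utility check reported above: train the same diagnosis prediction model on real and on synthetic data, then compare the relative difference in area under the ROC curve on a held-out real test set.

```python
# Relative AUC difference between models trained on real vs synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def relative_auc_difference(X_real, y_real, X_synth, y_synth, X_test, y_test):
    model_real = LogisticRegression(max_iter=1000).fit(X_real, y_real)
    model_synth = LogisticRegression(max_iter=1000).fit(X_synth, y_synth)
    auc_real = roc_auc_score(y_test, model_real.predict_proba(X_test)[:, 1])
    auc_synth = roc_auc_score(y_test, model_synth.predict_proba(X_test)[:, 1])
    return (auc_real - auc_synth) / auc_real

# Toy usage with random features standing in for phecode histories.
rng = np.random.default_rng(0)
X_real, y_real = rng.normal(size=(500, 20)), rng.integers(0, 2, 500)
X_synth, y_synth = rng.normal(size=(500, 20)), rng.integers(0, 2, 500)
X_test, y_test = rng.normal(size=(200, 20)), rng.integers(0, 2, 200)
print(relative_auc_difference(X_real, y_real, X_synth, y_synth, X_test, y_test))
```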


Author(s):  
Sijie Jiang ◽  
Junguo Liao ◽  
Shaobo Zhang ◽  
Gengming Zhu ◽  
Su Wang ◽  
...  
