Investigating Statistical Privacy Frameworks from the Perspective of Hypothesis Testing

2019
Vol 2019 (3)
pp. 233-254
Author(s):
Changchang Liu
Xi He
Thee Chanyaswad
Shiqiang Wang
Prateek Mittal

Abstract Over the last decade, differential privacy (DP) has emerged as the gold standard of a rigorous and provable privacy framework. However, there are few practical guidelines on how to apply differential privacy in practice, and a key challenge is how to set an appropriate value for the privacy parameter ɛ. In this work, we employ the statistical tool of hypothesis testing to derive useful and interpretable guidelines for state-of-the-art privacy-preserving frameworks. We formalize and implement hypothesis testing in terms of an adversary's capability to infer mutually exclusive sensitive information about the input data (such as whether an individual has participated or not) from the output of the privacy-preserving mechanism. We quantify the success of the hypothesis testing using the precision-recall relation, which provides an interpretable and natural guideline for practitioners and researchers on selecting ɛ. Our key results include a quantitative analysis of how hypothesis testing can guide the choice of the privacy parameter ɛ in an interpretable manner for a differentially private mechanism and its variants. Importantly, our findings show that an adversary's auxiliary information, in the form of the prior distribution of the database and correlations across records and time, indeed influences the proper choice of ɛ. Finally, we also show how the perspective of hypothesis testing can provide useful insights into the relationships among a broad range of privacy frameworks, including differential privacy, Pufferfish privacy, Blowfish privacy, dependent differential privacy, inferential privacy, membership privacy, and mutual-information-based differential privacy.
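To make the hypothesis-testing view concrete, the following is a minimal simulation sketch (not the authors' implementation): an adversary observes the Laplace-perturbed output of a count query and runs a simple threshold test to decide whether an individual participated, and we measure the precision and recall of that test for several values of ɛ. The prior on participation, the count query, and the midpoint threshold are illustrative assumptions.

```python
# Sketch: precision/recall of an adversary's threshold test against a Laplace
# mechanism on a count query (sensitivity 1). H1: individual present (count c+1),
# H0: individual absent (count c). Prior p1 and threshold are assumptions.
import numpy as np

def precision_recall(eps, c=100, p1=0.5, trials=200_000, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    truth = rng.random(trials) < p1              # H1: individual is in the data
    true_count = np.where(truth, c + 1, c)       # count query, sensitivity 1
    noisy = true_count + rng.laplace(scale=1.0 / eps, size=trials)
    guess_h1 = noisy > c + 0.5                   # simple midpoint threshold test
    tp = np.sum(guess_h1 & truth)
    fp = np.sum(guess_h1 & ~truth)
    fn = np.sum(~guess_h1 & truth)
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    return precision, recall

for eps in (0.1, 0.5, 1.0, 2.0):
    p, r = precision_recall(eps)
    print(f"eps={eps:4.1f}  precision={p:.3f}  recall={r:.3f}")
```

Larger ɛ lets the adversary's test achieve higher precision and recall simultaneously, which is exactly the trade-off the paper uses to make the choice of ɛ interpretable.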

2020
Author(s):
Fatima Zahra Errounda
Yan Liu

Abstract Location and trajectory data are routinely collected to generate valuable knowledge about users' behavior patterns. However, releasing location data may jeopardize the privacy of the involved individuals. Differential privacy is a powerful technique that prevents an adversary from inferring the presence or absence of an individual in the original data solely based on the observed data. The first challenge in applying differential privacy to location data is that such data usually involve a single user. This shifts the adversary's target to the user's locations instead of the user's presence or absence in the original data. The second challenge is that the inherent correlation in location data, due to people's movement regularity and predictability, gives the adversary an advantage in inferring information about individuals. In this paper, we review the differentially private approaches that tackle these challenges. Our goal is to help newcomers to the field better understand the state of the art by providing a research map that highlights the different challenges in designing differentially private frameworks that address the characteristics of location data. We find that in protecting an individual's location privacy, the attention of differential privacy mechanisms shifts to preventing the adversary from inferring the original location based on the observed one. Moreover, we find that privacy-preserving mechanisms make use of the predictability and regularity of users' movements to design mechanisms that protect users' privacy in trajectory data. Finally, we explore how well the presented frameworks succeed in protecting users' locations and trajectories against well-known privacy attacks.
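As a concrete illustration of perturbing the observed location rather than mere presence, here is a hedged sketch of planar Laplace noise in the spirit of geo-indistinguishability; it is not taken from any specific surveyed framework. Treating coordinates as planar (e.g., meters) and the chosen ɛ are assumptions of the example.

```python
# Sketch: report a noisy location so the adversary observes a perturbed point
# rather than the original one. Planar Laplace noise: uniform angle, radius
# drawn from a Gamma(2, 1/eps) distribution.
import numpy as np

def planar_laplace(x, y, eps, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.uniform(0, 2 * np.pi)          # direction: uniform angle
    r = rng.gamma(shape=2.0, scale=1.0 / eps)  # radius of the 2-D Laplace noise
    return x + r * np.cos(theta), y + r * np.sin(theta)

# Example: with eps=0.01 per meter, the reported point is typically a few
# hundred meters away from the true position (1000, 2500).
print(planar_laplace(1000.0, 2500.0, eps=0.01))
```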


Author(s):  
Ferdinando Fioretto
Lesia Mitridati
Pascal Van Hentenryck

This paper introduces a differentially private (DP) mechanism to protect the information exchanged during the coordination of sequential and interdependent markets. This coordination represents a classic Stackelberg game and relies on the exchange of sensitive information between the system agents. The paper is motivated by the observation that the perturbation introduced by traditional DP mechanisms fundamentally changes the underlying optimization problem and can even lead to unsatisfiable instances. To remedy this limitation, the paper introduces the Privacy-Preserving Stackelberg Mechanism (PPSM), a framework that enforces the feasibility and fidelity (i.e., near-optimality) of the privacy-preserving information with respect to the original problem objective. PPSM complies with the notion of differential privacy and ensures that the outcomes of the privacy-preserving coordination mechanism are close to optimal for each agent. Experimental results on several gas and electricity market benchmarks based on a real case study demonstrate the effectiveness of the proposed approach. A full version of this paper [Fioretto et al., 2020b] contains complete proofs and additional discussion of the motivating application.
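For intuition only, the sketch below illustrates the generic "perturb, then restore feasibility" pattern: a shared quantity is released under the Laplace mechanism and then projected back onto known feasible bounds (a post-processing step, which preserves DP). This is a toy stand-in, not the PPSM algorithm, which enforces feasibility and fidelity through a bilevel optimization; the bounds and sensitivity below are made-up placeholders.

```python
# Toy sketch: release a shared market quantity with Laplace noise, then project
# it back into a known feasible interval so downstream coordination does not
# become unsatisfiable. Bounds and sensitivity are illustrative placeholders.
import numpy as np

def private_feasible_release(value, lower, upper, eps, sensitivity=1.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    noisy = value + rng.laplace(scale=sensitivity / eps)
    return float(np.clip(noisy, lower, upper))   # projection restores feasibility

print(private_feasible_release(value=75.0, lower=0.0, upper=100.0, eps=0.5))
```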


2021
Vol 2021
pp. 1-13
Author(s):
Yang Bai
Yu Li
Mingchuang Xie
Mingyu Fan

In recent years, machine learning approaches have been widely adopted for many applications, including classification. Machine learning models that deal with collected sensitive data are usually trained on a remote public cloud server, for instance in a machine learning as a service (MLaaS) system. In this setting, users upload their local data and utilize the server's computation capability to train models, or users directly access models trained by the MLaaS provider. Unfortunately, recent works reveal that both the curious server (which trains the model with users' sensitive local data and is curious to learn information about individuals) and the malicious MLaaS user (who abuses query access to the MLaaS system) cause privacy risks. The adversarial method, as one typical mitigation, has been studied in several recent works. However, most of them focus on privacy preservation against the malicious user; in other words, they commonly treat the data owner and the model provider as one role. Under this assumption, the privacy leakage risks from the curious server are neglected. Differential privacy methods can defend against privacy threats from both the curious server and the malicious MLaaS user by directly adding noise to the training data. Nonetheless, differential privacy heavily decreases the classification accuracy of the target model. In this work, we propose a generic privacy-preserving framework based on the adversarial method to defend against both the curious server and the malicious MLaaS user. The framework can be instantiated with several adversarial algorithms to generate adversarial examples directly from data owners' original data. By doing so, sensitive information about the original data is hidden. Then, we explore the constraint conditions of this framework, which help us find the balance between privacy protection and model utility. The experimental results show that our defense framework with the AdvGAN method is effective against MIA, and that our defense framework with the FGSM method can protect the sensitive data from direct content exposure attacks. In addition, our method achieves a better privacy-utility balance than the existing method.
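As one hedged illustration of generating adversarial examples directly from the owner's data, the sketch below applies an FGSM-style perturbation against a simple logistic-regression surrogate; the surrogate model and the step size are assumptions for the example, not the paper's exact framework (which also supports AdvGAN).

```python
# Sketch: FGSM-style perturbation of the owner's records against a linear
# logistic surrogate, so the perturbed (not the original) data leaves the owner.
import numpy as np

def fgsm_perturb(X, y, w, b, step=0.1):
    """Return X + step * sign(grad_x of the logistic loss) for a linear surrogate."""
    z = X @ w + b
    p = 1.0 / (1.0 + np.exp(-z))             # predicted probability of class 1
    grad_x = (p - y)[:, None] * w[None, :]   # d(loss)/dx for the logistic loss
    return X + step * np.sign(grad_x)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                  # five toy records, three features
y = rng.integers(0, 2, size=5).astype(float)
w, b = rng.normal(size=3), 0.0               # surrogate parameters (assumed known)
print(fgsm_perturb(X, y, w, b))
```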


2021
Vol 14 (10)
pp. 1886-1899
Author(s):
Chang Ge
Shubhankar Mohapatra
Xi He
Ihab F. Ilyas

Organizations are increasingly relying on data to support decisions. When data contains private and sensitive information, the data owner often desires to publish a synthetic database instance that is as useful as the true data while ensuring the privacy of individual data records. Existing differentially private data synthesis methods aim to generate useful data for specific applications, but they fail to preserve one of the most fundamental properties of structured data: the underlying correlations and dependencies among tuples and attributes (i.e., the structure of the data). This structure is often expressed as integrity and schema constraints, or with a probabilistic generative process. As a result, the synthesized data is not useful for any downstream tasks that require this structure to be preserved. This work presents KAMINO, a data synthesis system that ensures differential privacy and preserves the structure and correlations present in the original dataset. KAMINO takes as input a database instance, along with its schema (including integrity constraints), and produces a synthetic database instance with differential privacy and structure preservation guarantees. We empirically show that while preserving the structure of the data, KAMINO achieves comparable and even better usefulness than the state-of-the-art methods of differentially private data synthesis in applications such as training classification models and answering marginal queries.


2021
Vol 36
pp. 04005
Author(s):  
Kah Meng Chong

The Electronic Health Record (EHR) is the key to an efficient healthcare service delivery system. The publication of healthcare data is highly beneficial to healthcare industries and government institutions in supporting a variety of medical and census research. However, healthcare data contains sensitive information about patients, and the publication of such data could lead to unintended privacy disclosures. In this paper, we present a comprehensive survey of the state-of-the-art privacy-enhancing methods that ensure a secure healthcare data sharing environment. We focus on recently proposed schemes based on data anonymization and differential privacy approaches for the protection of healthcare data privacy. We highlight the strengths and limitations of the two approaches and discuss some promising future research directions in this area.


2018
Vol 14 (2)
pp. 1-17
Author(s):
Zhiqiang Gao
Yixiao Sun
Xiaolong Cui
Yutao Wang
Yanyu Duan
...  

This article describes how the most widely used clustering algorithm, k-means, is prone to falling into a local optimum. Notably, traditional clustering approaches are performed directly on private data and fail to cope with malicious attacks in massive data mining tasks against attackers' arbitrary background knowledge. This can result in violations of individuals' privacy, as well as leaks through system resources and clustering outputs. To address these issues, the authors propose an efficient privacy-preserving hybrid k-means under Spark. In the first stage, particle swarm optimization is executed on resilient distributed datasets to initialize the selection of clustering centroids for k-means on Spark. In the second stage, k-means is executed with a per-round privacy budget of ε/2t and Laplace noise added in each iteration. Extensive experimentation on public UCI data sets shows that, while guaranteeing the utility of private data and scalability, their approach outperforms state-of-the-art varieties of k-means by utilizing swarm intelligence and rigorous paradigms of differential privacy.
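The second stage can be pictured with the following hedged sketch of one differentially private Lloyd iteration (in the spirit of DPLloyd-style k-means, not the authors' Spark/PSO implementation): the per-round budget is split between a noisy cluster count and a noisy cluster sum, assuming the data has been scaled into [0, 1]^d so the L1 sensitivity of a cluster sum is at most d and of a count is 1.

```python
# Sketch: one DP Lloyd iteration. Per-round budget eps_round is split between
# the noisy count and the noisy sum of each cluster; data assumed in [0, 1]^d.
import numpy as np

def dp_kmeans_step(X, centroids, eps_round, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    k, d = centroids.shape
    labels = np.argmin(((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1)
    new_centroids = centroids.copy()
    for j in range(k):
        pts = X[labels == j]
        noisy_count = len(pts) + rng.laplace(scale=1.0 / (eps_round / 2))
        noisy_sum = pts.sum(axis=0) + rng.laplace(scale=d / (eps_round / 2), size=d)
        if noisy_count > 1.0:
            new_centroids[j] = np.clip(noisy_sum / noisy_count, 0.0, 1.0)
    return new_centroids

rng = np.random.default_rng(0)
X = rng.random((500, 2))                     # toy data already in [0, 1]^2
C = rng.random((3, 2))
for _ in range(5):                           # t = 5 rounds, budget ε/(2t) each
    C = dp_kmeans_step(X, C, eps_round=1.0 / (2 * 5), rng=rng)
print(C)
```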


2021
Vol 21 (1)
pp. 1-25
Author(s):
Elie Chicha
Bechara Al Bouna
Mohamed Nassar
Richard Chbeir
Ramzi A. Haraty
...  

In this article, we present a privacy-preserving technique for user-centric multi-release graphs. Our technique consists of sequentially releasing anonymized versions of these graphs under Blowfish Privacy. To do so, we introduce a graph model that is augmented with a time dimension and sampled at discrete time steps. We show that the direct application of state-of-the-art differentially private techniques is weak against background-knowledge attacker models. We present different scenarios in which randomizing separate releases independently is vulnerable to correlation attacks. Our method is inspired by Differential Privacy (DP) and its extension Blowfish Privacy (BP). To validate it, we show its effectiveness as well as its utility through experimental simulations.


2020
Vol 10 (1)
pp. 137-152
Author(s):
Tosin A. Adesuyi
Byeong Man Kim

Abstract Data is the key to information mining that unveils hidden knowledge. The ability to reveal knowledge relies on the extractable features of a dataset as well as the depth of the mining model. However, several of these datasets embed sensitive information that can engender privacy violations, yet they are subsequently used to build deep neural network (DNN) models. Recent approaches to enforcing privacy and protecting data sensitivity in DNN models degrade accuracy, giving rise to a significant accuracy disparity between a non-private DNN and a privacy-preserving DNN model. This accuracy gap is due to enormous uncalibrated noise flooding and the inability to quantify the right level of noise required to perturb distinct neurons in the DNN model. Consequently, this has hindered the use of privacy-protected DNN models in real-life applications. In this paper, we present a neuron noise-injection technique based on layer-wise buffered contribution-ratio forwarding and the ϵ-differential privacy technique to preserve privacy in a DNN model. We adapt a layer-wise relevance propagation technique to compute the contribution ratio of each neuron in our network at the pre-training phase. Based on each neuron's contribution ratio, we generate a noise tuple via the Laplace mechanism, which helps to eliminate unwanted noise flooding. The noise tuple is subsequently injected into the training network through its neurons to preserve the privacy of the training dataset in a differentially private manner. Hence, each neuron receives the right proportion of noise as estimated via its contribution ratio, and as a result, the unquantifiable noise that drops the accuracy of privacy-preserving DNN models is avoided. Extensive experiments were conducted on three real-world datasets, and the results show that our approach narrows the existing accuracy gap considerably and outperforms the state-of-the-art approaches in this context.
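A hedged sketch of the per-neuron noise idea follows: each neuron's share of the privacy budget is apportioned according to its pre-computed contribution ratio, and Laplace noise with the corresponding scale is added to its activation. The ratios, sensitivity bound, and the choice of perturbing activations are assumptions for illustration, not the authors' exact layer-wise relevance propagation pipeline.

```python
# Sketch: apportion the privacy budget across neurons by contribution ratio and
# add correspondingly scaled Laplace noise to each neuron's activation.
import numpy as np

def per_neuron_noise(activations, contribution_ratios, eps, sensitivity=1.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    ratios = np.asarray(contribution_ratios, dtype=float)
    ratios = ratios / ratios.sum()                 # normalize to a budget split
    eps_per_neuron = eps * ratios                  # each neuron gets a share of ε
    scales = sensitivity / np.maximum(eps_per_neuron, 1e-12)
    return activations + rng.laplace(scale=scales)

acts = np.array([0.8, 0.1, 0.5, 0.3])
ratios = np.array([0.4, 0.1, 0.3, 0.2])            # e.g., from relevance propagation
print(per_neuron_noise(acts, ratios, eps=1.0))
```

Neurons with larger contribution ratios receive a larger share of ε and therefore less noise, which is one way to read the paper's claim that each neuron receives the right proportion of noise.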


Author(s):  
Benjamin I. P. Rubinstein
Peter L. Bartlett
Ling Huang
Nina Taft

The ubiquitous need for analyzing privacy-sensitive information—including health records, personal communications, product ratings and social network data—is driving significant interest in privacy-preserving data analysis across several research communities. This paper explores the release of Support Vector Machine (SVM) classifiers while preserving the privacy of training data. The SVM is a popular machine learning method that maps data to a high-dimensional feature space before learning a linear decision boundary. We present efficient mechanisms for finite-dimensional feature mappings and for (potentially infinite-dimensional) mappings with translation-invariant kernels. In the latter case, our mechanism borrows a technique from large-scale learning to learn in a finite-dimensional feature space whose inner-product uniformly approximates the desired feature space inner-product (the desired kernel) with high probability. Differential privacy is established using algorithmic stability, a property used in learning theory to bound generalization error. Utility—when the private classifier is pointwise close to the non-private classifier with high probability—is proven using smoothness of regularized empirical risk minimization with respect to small perturbations to the feature mapping. Finally we conclude with lower bounds on the differential privacy of any mechanism approximating the SVM.
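For the finite-dimensional case, a minimal output-perturbation sketch is shown below: fit a regularized linear SVM and release its weights with Laplace noise. The noise scale `beta` stands in for the stability-based sensitivity bound derived in the paper (a function of the regularization parameter, data bound, and privacy budget) and is left as an unspecified placeholder here; the use of scikit-learn's LinearSVC is also an assumption.

```python
# Sketch: output perturbation for a linear SVM. The scale `beta` is a placeholder
# for the stability-based calibration in the paper, not its exact value.
import numpy as np
from sklearn.svm import LinearSVC

def private_svm(X, y, beta, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    clf = LinearSVC(C=1.0, loss="hinge", max_iter=10_000).fit(X, y)
    w = clf.coef_.ravel() + rng.laplace(scale=beta, size=clf.coef_.size)
    b = clf.intercept_[0] + rng.laplace(scale=beta)
    return w, b                                  # released, noise-perturbed classifier

X = np.random.default_rng(0).normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
print(private_svm(X, y, beta=0.5))
```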


2020
Vol 10 (18)
pp. 6396
Author(s):
Jong Wook Kim
Su-Mee Moon
Sang-ug Kang
Beakcheol Jang

The popularity of wearable devices equipped with a variety of sensors that can measure users' health status and monitor their lifestyle has been increasing. In fact, healthcare service providers have been utilizing these devices as a primary means to collect considerable health data from users. Although the health data collected via wearable devices are useful for providing healthcare services, the indiscriminate collection of an individual's health data raises serious privacy concerns. This is because the health data measured and monitored by wearable devices contain sensitive information related to the wearer's personal health and lifestyle. Therefore, we propose a method to aggregate health data obtained from users' wearable devices in a privacy-preserving manner. The proposed method leverages local differential privacy, which is a de facto standard for privacy-preserving data processing and aggregation, to collect sensitive health data. In particular, to mitigate the error incurred by the perturbation mechanism of local differential privacy, the proposed scheme first samples a small number of salient data points that best represent the original health data, and then collects the sampled salient data instead of the entire set of health data. Our experimental results show that the proposed sampling-based collection scheme achieves a significant improvement in estimation accuracy when compared with straightforward solutions. Furthermore, the experimental results verify that an effective tradeoff between the level of privacy protection and the accuracy of aggregate statistics can be achieved with the proposed approach.
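An illustrative sketch of the sampling-then-perturb idea (not the paper's exact scheme): each user keeps one salient reading, here simply the daily maximum heart rate, clips it to an assumed range, and reports it under the Laplace mechanism for local differential privacy; the server then averages the perturbed reports. The value range and the choice of salient statistic are assumptions of the example.

```python
# Sketch: each user reports one sampled salient value under the Laplace
# mechanism for local DP; the server averages the noisy reports.
import numpy as np

LOW, HIGH = 40.0, 200.0   # assumed clipping range for heart-rate readings

def ldp_report(readings, eps, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    salient = float(np.max(readings))            # one representative value per user
    clipped = min(max(salient, LOW), HIGH)
    return clipped + rng.laplace(scale=(HIGH - LOW) / eps)

rng = np.random.default_rng(0)
reports = [ldp_report(rng.normal(120, 15, size=24), eps=1.0, rng=rng) for _ in range(1000)]
print("estimated mean of salient values:", np.mean(reports))
```

Reporting a single sampled value per user keeps the per-user sensitivity small, which is the intuition behind the accuracy gain the paper reports over collecting the entire set of readings.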

