Investigating Statistical Privacy Frameworks from the Perspective of Hypothesis Testing

2019
Vol 2019 (3)
pp. 233-254
Author(s):
Changchang Liu
Xi He
Thee Chanyaswad
Shiqiang Wang
Prateek Mittal

Abstract Over the last decade, differential privacy (DP) has emerged as the gold standard of a rigorous and provable privacy framework. However, there are few practical guidelines on how to apply differential privacy in practice, and a key challenge is how to set an appropriate value for the privacy parameter ɛ. In this work, we employ the statistical tool of hypothesis testing to derive useful and interpretable guidelines for state-of-the-art privacy-preserving frameworks. We formalize and implement hypothesis testing in terms of an adversary's capability to infer mutually exclusive sensitive information about the input data (such as whether an individual has participated or not) from the output of the privacy-preserving mechanism. We quantify the success of the hypothesis testing using the precision-recall relation, which provides an interpretable and natural guideline for practitioners and researchers on selecting ɛ. Our key results include a quantitative analysis of how hypothesis testing can guide the choice of the privacy parameter ɛ in an interpretable manner for a differentially private mechanism and its variants. Importantly, our findings show that an adversary's auxiliary information, in the form of the prior distribution of the database and correlations across records and time, indeed influences the proper choice of ɛ. Finally, we also show how the perspective of hypothesis testing can provide useful insights into the relationships among a broad range of privacy frameworks, including differential privacy, Pufferfish privacy, Blowfish privacy, dependent differential privacy, inferential privacy, membership privacy, and mutual-information-based differential privacy.
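To make the hypothesis-testing view concrete, the following is a minimal simulation sketch (not the authors' implementation): an adversary observes the Laplace-perturbed output of a count query and runs a simple threshold test to decide whether an individual participated, and we measure the precision and recall of that test for several values of ɛ. The prior on participation, the count query, and the midpoint threshold are illustrative assumptions.

```python
# Sketch: precision/recall of an adversary's threshold test against a Laplace
# mechanism on a count query (sensitivity 1). H1: individual present (count c+1),
# H0: individual absent (count c). Prior p1 and threshold are assumptions.
import numpy as np

def precision_recall(eps, c=100, p1=0.5, trials=200_000, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    truth = rng.random(trials) < p1              # H1: individual is in the data
    true_count = np.where(truth, c + 1, c)       # count query, sensitivity 1
    noisy = true_count + rng.laplace(scale=1.0 / eps, size=trials)
    guess_h1 = noisy > c + 0.5                   # simple midpoint threshold test
    tp = np.sum(guess_h1 & truth)
    fp = np.sum(guess_h1 & ~truth)
    fn = np.sum(~guess_h1 & truth)
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    return precision, recall

for eps in (0.1, 0.5, 1.0, 2.0):
    p, r = precision_recall(eps)
    print(f"eps={eps:4.1f}  precision={p:.3f}  recall={r:.3f}")
```

Larger ɛ lets the adversary's test achieve higher precision and recall simultaneously, which is exactly the trade-off the paper uses to make the choice of ɛ interpretable.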

2020
Author(s):
Fatima Zahra Errounda
Yan Liu

Abstract Location and trajectory data are routinely collected to generate valuable knowledge about users' behavior patterns. However, releasing location data may jeopardize the privacy of the involved individuals. Differential privacy is a powerful technique that prevents an adversary from inferring the presence or absence of an individual in the original data solely based on the observed data. The first challenge in applying differential privacy to location data is that such data usually involve a single user. This shifts the adversary's target to the user's locations instead of the user's presence or absence in the original data. The second challenge is that the inherent correlation in location data, due to people's movement regularity and predictability, gives the adversary an advantage in inferring information about individuals. In this paper, we review the differentially private approaches that tackle these challenges. Our goal is to help newcomers to the field better understand the state of the art by providing a research map that highlights the different challenges in designing differentially private frameworks that address the characteristics of location data. We find that in protecting an individual's location privacy, the attention of differential privacy mechanisms shifts to preventing the adversary from inferring the original location based on the observed one. Moreover, we find that privacy-preserving mechanisms make use of the predictability and regularity of users' movements to design mechanisms that protect users' privacy in trajectory data. Finally, we explore how well the presented frameworks succeed in protecting users' locations and trajectories against well-known privacy attacks.
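As a concrete illustration of perturbing the observed location rather than mere presence, here is a hedged sketch of planar Laplace noise in the spirit of geo-indistinguishability; it is not taken from any specific surveyed framework. Treating coordinates as planar (e.g., meters) and the chosen ɛ are assumptions of the example.

```python
# Sketch: report a noisy location so the adversary observes a perturbed point
# rather than the original one. Planar Laplace noise: uniform angle, radius
# drawn from a Gamma(2, 1/eps) distribution.
import numpy as np

def planar_laplace(x, y, eps, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.uniform(0, 2 * np.pi)          # direction: uniform angle
    r = rng.gamma(shape=2.0, scale=1.0 / eps)  # radius of the 2-D Laplace noise
    return x + r * np.cos(theta), y + r * np.sin(theta)

# Example: with eps=0.01 per meter, the reported point is typically a few
# hundred meters away from the true position (1000, 2500).
print(planar_laplace(1000.0, 2500.0, eps=0.01))
```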


Author(s):  
Ferdinando Fioretto
Lesia Mitridati
Pascal Van Hentenryck

This paper introduces a differentially private (DP) mechanism to protect the information exchanged during the coordination of sequential and interdependent markets. This coordination represents a classic Stackelberg game and relies on the exchange of sensitive information between the system agents. The paper is motivated by the observation that the perturbation introduced by traditional DP mechanisms fundamentally changes the underlying optimization problem and can even lead to unsatisfiable instances. To remedy this limitation, the paper introduces the Privacy-Preserving Stackelberg Mechanism (PPSM), a framework that enforces the feasibility and fidelity (i.e., near-optimality) of the privacy-preserving information with respect to the original problem objective. PPSM complies with the notion of differential privacy and ensures that the outcomes of the privacy-preserving coordination mechanism are close to optimal for each agent. Experimental results on several gas and electricity market benchmarks based on a real case study demonstrate the effectiveness of the proposed approach. A full version of this paper [Fioretto et al., 2020b] contains complete proofs and additional discussion of the motivating application.
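For intuition only, the sketch below illustrates the generic "perturb, then restore feasibility" pattern: a shared quantity is released under the Laplace mechanism and then projected back onto known feasible bounds (a post-processing step, which preserves DP). This is a toy stand-in, not the PPSM algorithm, which enforces feasibility and fidelity through a bilevel optimization; the bounds and sensitivity below are made-up placeholders.

```python
# Toy sketch: release a shared market quantity with Laplace noise, then project
# it back into a known feasible interval so downstream coordination does not
# become unsatisfiable. Bounds and sensitivity are illustrative placeholders.
import numpy as np

def private_feasible_release(value, lower, upper, eps, sensitivity=1.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    noisy = value + rng.laplace(scale=sensitivity / eps)
    return float(np.clip(noisy, lower, upper))   # projection restores feasibility

print(private_feasible_release(value=75.0, lower=0.0, upper=100.0, eps=0.5))
```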


2021
Vol 2021
pp. 1-13
Author(s):
Yang Bai
Yu Li
Mingchuang Xie
Mingyu Fan

In recent years, machine learning approaches have been widely adopted for many applications, including classification. Machine learning models that deal with collected sensitive data are usually trained on a remote public cloud server, for instance in a machine learning as a service (MLaaS) system. In this setting, users upload their local data and utilize the server's computation capability to train models, or users directly access models trained by the MLaaS provider. Unfortunately, recent works reveal that both the curious server (which trains the model with users' sensitive local data and is curious to learn information about individuals) and the malicious MLaaS user (who abuses query access to the MLaaS system) cause privacy risks. The adversarial method, as one typical mitigation, has been studied in several recent works. However, most of them focus on privacy preservation against the malicious user; in other words, they commonly treat the data owner and the model provider as one role. Under this assumption, the privacy leakage risks from the curious server are neglected. Differential privacy methods can defend against privacy threats from both the curious server and the malicious MLaaS user by directly adding noise to the training data. Nonetheless, differential privacy heavily decreases the classification accuracy of the target model. In this work, we propose a generic privacy-preserving framework based on the adversarial method to defend against both the curious server and the malicious MLaaS user. The framework can be instantiated with several adversarial algorithms to generate adversarial examples directly from data owners' original data. By doing so, sensitive information about the original data is hidden. Then, we explore the constraint conditions of this framework, which help us find the balance between privacy protection and model utility. The experimental results show that our defense framework with the AdvGAN method is effective against MIA, and that our defense framework with the FGSM method can protect the sensitive data from direct content exposure attacks. In addition, our method achieves a better privacy-utility balance than the existing method.
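As one hedged illustration of generating adversarial examples directly from the owner's data, the sketch below applies an FGSM-style perturbation against a simple logistic-regression surrogate; the surrogate model and the step size are assumptions for the example, not the paper's exact framework (which also supports AdvGAN).

```python
# Sketch: FGSM-style perturbation of the owner's records against a linear
# logistic surrogate, so the perturbed (not the original) data leaves the owner.
import numpy as np

def fgsm_perturb(X, y, w, b, step=0.1):
    """Return X + step * sign(grad_x of the logistic loss) for a linear surrogate."""
    z = X @ w + b
    p = 1.0 / (1.0 + np.exp(-z))             # predicted probability of class 1
    grad_x = (p - y)[:, None] * w[None, :]   # d(loss)/dx for the logistic loss
    return X + step * np.sign(grad_x)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                  # five toy records, three features
y = rng.integers(0, 2, size=5).astype(float)
w, b = rng.normal(size=3), 0.0               # surrogate parameters (assumed known)
print(fgsm_perturb(X, y, w, b))
```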


2021
Vol 14 (10)
pp. 1886-1899
Author(s):
Chang Ge
Shubhankar Mohapatra
Xi He
Ihab F. Ilyas

Organizations are increasingly relying on data to support decisions. When data contains private and sensitive information, the data owner often desires to publish a synthetic database instance that is as useful as the true data while ensuring the privacy of individual data records. Existing differentially private data synthesis methods aim to generate useful data for specific applications, but they fail to preserve one of the most fundamental properties of structured data: the underlying correlations and dependencies among tuples and attributes (i.e., the structure of the data). This structure is often expressed as integrity and schema constraints, or with a probabilistic generative process. As a result, the synthesized data is not useful for any downstream tasks that require this structure to be preserved. This work presents KAMINO, a data synthesis system that ensures differential privacy and preserves the structure and correlations present in the original dataset. KAMINO takes as input a database instance, along with its schema (including integrity constraints), and produces a synthetic database instance with differential privacy and structure preservation guarantees. We empirically show that while preserving the structure of the data, KAMINO achieves comparable and even better usefulness than the state-of-the-art methods of differentially private data synthesis in applications such as training classification models and answering marginal queries.


2021
Vol 36
pp. 04005
Author(s):  
Kah Meng Chong

The Electronic Health Record (EHR) is the key to an efficient healthcare service delivery system. The publication of healthcare data is highly beneficial to healthcare industries and government institutions in supporting a variety of medical and census research. However, healthcare data contains sensitive information about patients, and the publication of such data could lead to unintended privacy disclosures. In this paper, we present a comprehensive survey of the state-of-the-art privacy-enhancing methods that ensure a secure healthcare data sharing environment. We focus on recently proposed schemes based on data anonymization and differential privacy approaches for the protection of healthcare data privacy. We highlight the strengths and limitations of the two approaches and discuss some promising future research directions in this area.


2018
Vol 14 (2)
pp. 1-17
Author(s):
Zhiqiang Gao
Yixiao Sun
Xiaolong Cui
Yutao Wang
Yanyu Duan
...  

This article describes how the most widely used clustering algorithm, k-means, is prone to falling into a local optimum. Notably, traditional clustering approaches are performed directly on private data and fail to cope with malicious attacks in massive data mining tasks against attackers' arbitrary background knowledge. This can result in violations of individuals' privacy, as well as leaks through system resources and clustering outputs. To address these issues, the authors propose an efficient privacy-preserving hybrid k-means under Spark. In the first stage, particle swarm optimization is executed on resilient distributed datasets to initialize the selection of clustering centroids for k-means on Spark. In the second stage, k-means is executed with a per-round privacy budget of ε/2t and Laplace noise added in each iteration. Extensive experimentation on public UCI data sets shows that, while guaranteeing the utility of private data and scalability, their approach outperforms state-of-the-art varieties of k-means by utilizing swarm intelligence and rigorous paradigms of differential privacy.
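The second stage can be pictured with the following hedged sketch of one differentially private Lloyd iteration (in the spirit of DPLloyd-style k-means, not the authors' Spark/PSO implementation): the per-round budget is split between a noisy cluster count and a noisy cluster sum, assuming the data has been scaled into [0, 1]^d so the L1 sensitivity of a cluster sum is at most d and of a count is 1.

```python
# Sketch: one DP Lloyd iteration. Per-round budget eps_round is split between
# the noisy count and the noisy sum of each cluster; data assumed in [0, 1]^d.
import numpy as np

def dp_kmeans_step(X, centroids, eps_round, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    k, d = centroids.shape
    labels = np.argmin(((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1)
    new_centroids = centroids.copy()
    for j in range(k):
        pts = X[labels == j]
        noisy_count = len(pts) + rng.laplace(scale=1.0 / (eps_round / 2))
        noisy_sum = pts.sum(axis=0) + rng.laplace(scale=d / (eps_round / 2), size=d)
        if noisy_count > 1.0:
            new_centroids[j] = np.clip(noisy_sum / noisy_count, 0.0, 1.0)
    return new_centroids

rng = np.random.default_rng(0)
X = rng.random((500, 2))                     # toy data already in [0, 1]^2
C = rng.random((3, 2))
for _ in range(5):                           # t = 5 rounds, budget ε/(2t) each
    C = dp_kmeans_step(X, C, eps_round=1.0 / (2 * 5), rng=rng)
print(C)
```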


2021
Vol 21 (1)
pp. 1-25
Author(s):
Elie Chicha
Bechara Al Bouna
Mohamed Nassar
Richard Chbeir
Ramzi A. Haraty
...  

In this article, we present a privacy-preserving technique for user-centric multi-release graphs. Our technique consists of sequentially releasing anonymized versions of these graphs under Blowfish Privacy. To do so, we introduce a graph model that is augmented with a time dimension and sampled at discrete time steps. We show that the direct application of state-of-the-art differentially private techniques is weak against background-knowledge attacker models. We present different scenarios in which randomizing separate releases independently is vulnerable to correlation attacks. Our method is inspired by Differential Privacy (DP) and its extension Blowfish Privacy (BP). To validate it, we show its effectiveness as well as its utility through experimental simulations.


2020
Vol 10 (1)
pp. 137-152
Author(s):
Tosin A. Adesuyi
Byeong Man Kim

Abstract Data is the key to information mining that unveils hidden knowledge. The ability to reveal knowledge relies on the extractable features of a dataset as well as the depth of the mining model. However, several of these datasets embed sensitive information that can engender privacy violations, yet they are subsequently used to build deep neural network (DNN) models. Recent approaches to enforcing privacy and protecting data sensitivity in DNN models degrade accuracy, giving rise to a significant accuracy disparity between a non-private DNN and a privacy-preserving DNN model. This accuracy gap is due to enormous uncalibrated noise flooding and the inability to quantify the right level of noise required to perturb distinct neurons in the DNN model. Consequently, this has hindered the use of privacy-protected DNN models in real-life applications. In this paper, we present a neuron noise-injection technique based on layer-wise buffered contribution-ratio forwarding and the ϵ-differential privacy technique to preserve privacy in a DNN model. We adapt a layer-wise relevance propagation technique to compute the contribution ratio of each neuron in our network at the pre-training phase. Based on each neuron's contribution ratio, we generate a noise tuple via the Laplace mechanism, which helps to eliminate unwanted noise flooding. The noise tuple is subsequently injected into the training network through its neurons to preserve the privacy of the training dataset in a differentially private manner. Hence, each neuron receives the right proportion of noise as estimated via its contribution ratio, and as a result, the unquantifiable noise that drops the accuracy of privacy-preserving DNN models is avoided. Extensive experiments were conducted on three real-world datasets, and the results show that our approach narrows the existing accuracy gap considerably and outperforms the state-of-the-art approaches in this context.
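A hedged sketch of the per-neuron noise idea follows: each neuron's share of the privacy budget is apportioned according to its pre-computed contribution ratio, and Laplace noise with the corresponding scale is added to its activation. The ratios, sensitivity bound, and the choice of perturbing activations are assumptions for illustration, not the authors' exact layer-wise relevance propagation pipeline.

```python
# Sketch: apportion the privacy budget across neurons by contribution ratio and
# add correspondingly scaled Laplace noise to each neuron's activation.
import numpy as np

def per_neuron_noise(activations, contribution_ratios, eps, sensitivity=1.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    ratios = np.asarray(contribution_ratios, dtype=float)
    ratios = ratios / ratios.sum()                 # normalize to a budget split
    eps_per_neuron = eps * ratios                  # each neuron gets a share of ε
    scales = sensitivity / np.maximum(eps_per_neuron, 1e-12)
    return activations + rng.laplace(scale=scales)

acts = np.array([0.8, 0.1, 0.5, 0.3])
ratios = np.array([0.4, 0.1, 0.3, 0.2])            # e.g., from relevance propagation
print(per_neuron_noise(acts, ratios, eps=1.0))
```

Neurons with larger contribution ratios receive a larger share of ε and therefore less noise, which is one way to read the paper's claim that each neuron receives the right proportion of noise.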


Author(s):  
Benjamin I. P. Rubinstein
Peter L. Bartlett
Ling Huang
Nina Taft

The ubiquitous need for analyzing privacy-sensitive information—including health records, personal communications, product ratings and social network data—is driving significant interest in privacy-preserving data analysis across several research communities. This paper explores the release of Support Vector Machine (SVM) classifiers while preserving the privacy of training data. The SVM is a popular machine learning method that maps data to a high-dimensional feature space before learning a linear decision boundary. We present efficient mechanisms for finite-dimensional feature mappings and for (potentially infinite-dimensional) mappings with translation-invariant kernels. In the latter case, our mechanism borrows a technique from large-scale learning to learn in a finite-dimensional feature space whose inner-product uniformly approximates the desired feature space inner-product (the desired kernel) with high probability. Differential privacy is established using algorithmic stability, a property used in learning theory to bound generalization error. Utility—when the private classifier is pointwise close to the non-private classifier with high probability—is proven using smoothness of regularized empirical risk minimization with respect to small perturbations to the feature mapping. Finally we conclude with lower bounds on the differential privacy of any mechanism approximating the SVM.
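For the finite-dimensional case, a minimal output-perturbation sketch is shown below: fit a regularized linear SVM and release its weights with Laplace noise. The noise scale `beta` stands in for the stability-based sensitivity bound derived in the paper (a function of the regularization parameter, data bound, and privacy budget) and is left as an unspecified placeholder here; the use of scikit-learn's LinearSVC is also an assumption.

```python
# Sketch: output perturbation for a linear SVM. The scale `beta` is a placeholder
# for the stability-based calibration in the paper, not its exact value.
import numpy as np
from sklearn.svm import LinearSVC

def private_svm(X, y, beta, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    clf = LinearSVC(C=1.0, loss="hinge", max_iter=10_000).fit(X, y)
    w = clf.coef_.ravel() + rng.laplace(scale=beta, size=clf.coef_.size)
    b = clf.intercept_[0] + rng.laplace(scale=beta)
    return w, b                                  # released, noise-perturbed classifier

X = np.random.default_rng(0).normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
print(private_svm(X, y, beta=0.5))
```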


2020
Vol 10 (18)
pp. 6396
Author(s):
Jong Wook Kim
Su-Mee Moon
Sang-ug Kang
Beakcheol Jang

The popularity of wearable devices equipped with a variety of sensors that can measure users' health status and monitor their lifestyle has been increasing. In fact, healthcare service providers have been utilizing these devices as a primary means to collect considerable health data from users. Although the health data collected via wearable devices are useful for providing healthcare services, the indiscriminate collection of an individual's health data raises serious privacy concerns. This is because the health data measured and monitored by wearable devices contain sensitive information related to the wearer's personal health and lifestyle. Therefore, we propose a method to aggregate health data obtained from users' wearable devices in a privacy-preserving manner. The proposed method leverages local differential privacy, which is a de facto standard for privacy-preserving data processing and aggregation, to collect sensitive health data. In particular, to mitigate the error incurred by the perturbation mechanism of local differential privacy, the proposed scheme first samples a small number of salient data points that best represent the original health data, and then collects the sampled salient data instead of the entire set of health data. Our experimental results show that the proposed sampling-based collection scheme achieves a significant improvement in estimation accuracy when compared with straightforward solutions. Furthermore, the experimental results verify that an effective tradeoff between the level of privacy protection and the accuracy of aggregate statistics can be achieved with the proposed approach.
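An illustrative sketch of the sampling-then-perturb idea (not the paper's exact scheme): each user keeps one salient reading, here simply the daily maximum heart rate, clips it to an assumed range, and reports it under the Laplace mechanism for local differential privacy; the server then averages the perturbed reports. The value range and the choice of salient statistic are assumptions of the example.

```python
# Sketch: each user reports one sampled salient value under the Laplace
# mechanism for local DP; the server averages the noisy reports.
import numpy as np

LOW, HIGH = 40.0, 200.0   # assumed clipping range for heart-rate readings

def ldp_report(readings, eps, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    salient = float(np.max(readings))            # one representative value per user
    clipped = min(max(salient, LOW), HIGH)
    return clipped + rng.laplace(scale=(HIGH - LOW) / eps)

rng = np.random.default_rng(0)
reports = [ldp_report(rng.normal(120, 15, size=24), eps=1.0, rng=rng) for _ in range(1000)]
print("estimated mean of salient values:", np.mean(reports))
```

Reporting a single sampled value per user keeps the per-user sensitivity small, which is the intuition behind the accuracy gain the paper reports over collecting the entire set of readings.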

