An Analysis of Differential Privacy Research in Location and Trajectory Data

2020 ◽  
Author(s):  
Fatima Zahra Errounda ◽  
Yan Liu

Abstract Location and trajectory data are routinely collected to generate valuable knowledge about users' behavior patterns. However, releasing location data may jeopardize the privacy of the involved individuals. Differential privacy is a powerful technique that prevents an adversary from inferring the presence or absence of an individual in the original data solely based on the observed data. The first challenge in applying differential privacy to location data is that it usually involves a single user. This shifts the adversary's target to the user's locations instead of their presence or absence in the original data. The second challenge is that the inherent correlation between location data points, due to people's movement regularity and predictability, gives the adversary an advantage in inferring information about individuals. In this paper, we review the differentially private approaches that tackle these challenges. Our goal is to help newcomers to the field better understand the state of the art by providing a research map that highlights the different challenges in designing differentially private frameworks that address the characteristics of location data. We find that in protecting an individual's location privacy, the attention of differential privacy mechanisms shifts to preventing the adversary from inferring the original location based on the observed one. Moreover, we find that privacy-preserving mechanisms exploit the predictability and regularity of users' movements to design mechanisms that protect users' privacy in trajectory data. Finally, we explore how well the presented frameworks succeed in protecting users' locations and trajectories against well-known privacy attacks.
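To make the shift from membership protection to location obfuscation concrete, below is a minimal sketch of one common single-location mechanism, planar Laplace noise in the style of geo-indistinguishability. The mechanism choice, the epsilon values, and the coordinates are illustrative assumptions, not details taken from the surveyed papers.

```python
import numpy as np

def planar_laplace_noise(epsilon, rng):
    """Draw a 2D offset whose density decays as exp(-epsilon * r)."""
    theta = rng.uniform(0.0, 2.0 * np.pi)            # random direction
    # The radial component of the planar Laplacian follows Gamma(2, 1/epsilon).
    r = rng.gamma(shape=2.0, scale=1.0 / epsilon)
    return r * np.cos(theta), r * np.sin(theta)

def obfuscate_location(x, y, epsilon=0.5, seed=0):
    """Report a perturbed location instead of the true one.

    Coordinates are treated as lying in a flat plane (e.g., projected
    meters), which is a simplification for illustration only.
    """
    rng = np.random.default_rng(seed)
    dx, dy = planar_laplace_noise(epsilon, rng)
    return x + dx, y + dy

# Example: protect a single position given in planar coordinates (made up).
print(obfuscate_location(1200.0, -340.0, epsilon=0.1))
```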

Author(s):  
Mohammad Reza Ebrahimi Dishabi ◽  
Mohammad Abdollahi Azgomi

Most of the existing privacy-preserving clustering (PPC) algorithms do not consider worst-case privacy guarantees and are based on heuristic notions. In addition, these algorithms do not run efficiently on high-dimensional data. In this paper, to alleviate these challenges, we propose a new PPC algorithm, which is based on the Daubechies-2 wavelet transform (D2WT) and preserves the differential privacy notion. Differential privacy is a strong notion of privacy, which provides worst-case privacy guarantees. On the other hand, most of the existing differential privacy-based PPC algorithms generate data with poor utility. If we apply differential privacy directly to the original raw data, the resulting data will offer lower quality of clustering (QOC) during the clustering analysis. Therefore, we use D2WT to preprocess the original data before adding noise. By applying D2WT to the original data, the resulting data not only has a lower dimensionality than the original data, but can also provide a differential privacy guarantee with high QOC, since less noise needs to be added. The proposed algorithm has been implemented and evaluated on several well-known datasets. We also compare the proposed algorithm with some recently introduced algorithms in terms of utility and privacy.
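The following is a minimal sketch of the recipe described above, assuming PyWavelets (pywt) for the Daubechies-2 transform and a placeholder Laplace noise scale; the paper's actual sensitivity analysis and clustering step are not reproduced here.

```python
import numpy as np
import pywt  # PyWavelets

def d2wt_then_laplace(records, epsilon=1.0, sensitivity=1.0, seed=0):
    """Reduce each record with a Daubechies-2 DWT, then add Laplace noise.

    Keeping only the approximation coefficients roughly halves the
    dimensionality, so less total noise is needed for the same budget.
    The sensitivity value is a placeholder, not a derived bound.
    """
    rng = np.random.default_rng(seed)
    reduced = []
    for row in records:
        approx, _detail = pywt.dwt(row, 'db2')   # Daubechies-2 transform
        noise = rng.laplace(0.0, sensitivity / epsilon, size=approx.shape)
        reduced.append(approx + noise)
    return np.asarray(reduced)

# Toy usage with a synthetic 8-dimensional dataset.
data = np.random.default_rng(1).random((5, 8))
print(d2wt_then_laplace(data).shape)   # dimensionality roughly halved
```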


Author(s):  
Quanming Yao ◽  
Xiawei Guo ◽  
James Kwok ◽  
Weiwei Tu ◽  
Yuqiang Chen ◽  
...  

To meet the standard of differential privacy, noise is usually added to the original data, which inevitably deteriorates the predictive performance of subsequent learning algorithms. In this paper, motivated by the success of ensemble learning in improving predictive performance, we propose to enhance privacy-preserving logistic regression by stacking. We show that this can be done with either sample-based or feature-based partitioning. However, we prove that when the privacy budgets are the same, feature-based partitioning requires fewer samples than the sample-based one, and thus likely has better empirical performance. As transfer learning is difficult to integrate with a differential privacy guarantee, we further combine the proposed method with hypothesis transfer learning to address the problem of learning across different organizations. Finally, we not only demonstrate the effectiveness of our method on two benchmark data sets, i.e., MNIST and NEWS20, but also apply it to a real application of cross-organizational diabetes prediction on the RUIJIN data set, where privacy is a significant concern.
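Below is a minimal sketch of feature-based partitioning with stacking, assuming scikit-learn base learners and a crude, uncalibrated output-perturbation step; the noise scale, the partitioning, and the helper names are placeholders rather than the paper's mechanism.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def dp_logreg_by_feature_stacking(X, y, n_parts=2, epsilon=1.0, seed=0):
    """Sketch of feature-partitioned stacking with noisy base learners.

    Each base model sees a disjoint slice of the features and its
    coefficients are perturbed with Laplace noise (a stand-in for a
    properly calibrated output-perturbation mechanism); a meta-model is
    then fit on the base models' scores.
    """
    rng = np.random.default_rng(seed)
    feature_slices = np.array_split(np.arange(X.shape[1]), n_parts)
    base_scores = []
    for cols in feature_slices:
        clf = LogisticRegression(max_iter=1000).fit(X[:, cols], y)
        # Placeholder noise scale; a real mechanism must derive it from the
        # regularization strength, data norm bound, and per-part budget.
        clf.coef_ += rng.laplace(0.0, 1.0 / epsilon, size=clf.coef_.shape)
        base_scores.append(clf.decision_function(X[:, cols]))
    meta = LogisticRegression(max_iter=1000).fit(np.column_stack(base_scores), y)
    return meta, feature_slices

# Toy usage with synthetic binary-labelled data.
rng = np.random.default_rng(1)
X_toy = rng.normal(size=(200, 6))
y_toy = (X_toy[:, 0] + X_toy[:, 3] > 0).astype(int)
meta_model, parts = dp_logreg_by_feature_stacking(X_toy, y_toy, epsilon=1.0)
```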


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Farhad Farokhi

Abstract Local differential privacy has become the gold standard in the privacy literature for gathering or releasing sensitive individual data points in a privacy-preserving manner. However, locally differentially private data can distort the probability density of the data because of the additive noise used to ensure privacy. In fact, the density of privacy-preserving data (no matter how many samples we gather) is always flatter than the density function of the original data points, due to convolution with the privacy-preserving noise density function. The effect is especially pronounced when using slow-decaying privacy-preserving noises, such as Laplace noise. This can result in under- or over-estimation of the heavy hitters. This is an important challenge facing social scientists due to the use of differential privacy in the 2020 Census in the United States. In this paper, we develop density estimation methods using smoothing kernels. We use the framework of deconvoluting kernel density estimators to remove the effect of privacy-preserving noise. This approach also allows us to adapt results from non-parametric regression with errors-in-variables to develop regression models based on locally differentially private data. We demonstrate the performance of the developed methods on financial and demographic datasets.
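Below is a minimal sketch of a deconvoluting kernel density estimator for Laplace-privatized samples, assuming a Gaussian smoothing kernel and direct numerical inversion of the Fourier integral; the bandwidth, grids, and toy data are illustrative choices rather than the paper's settings.

```python
import numpy as np

def deconv_kde_laplace(w, x_grid, bandwidth, noise_scale):
    """Deconvoluting KDE for samples W_j = X_j + Laplace(0, noise_scale).

    Uses a Gaussian kernel; dividing by the Laplace characteristic function
    1 / (1 + b^2 t^2) amounts to multiplying by (1 + b^2 t^2 / h^2) inside
    the inversion integral, which keeps the integrand well behaved.
    """
    t = np.linspace(-8.0, 8.0, 1601)                     # integration grid
    phi_kernel = np.exp(-0.5 * t ** 2)                   # Gaussian kernel FT
    inv_phi_noise = 1.0 + (noise_scale * t / bandwidth) ** 2
    estimate = np.empty(len(x_grid))
    for i, x in enumerate(x_grid):
        u = (x - w[:, None]) / bandwidth                 # shape (n, len(t))
        integrand = np.cos(t * u) * phi_kernel * inv_phi_noise
        estimate[i] = np.trapz(integrand, t, axis=1).mean() / (2 * np.pi * bandwidth)
    return np.clip(estimate, 0.0, None)                  # densities are nonnegative

# Toy usage: recover a standard normal from Laplace-privatized samples.
rng = np.random.default_rng(0)
x_true = rng.normal(size=2000)
w_obs = x_true + rng.laplace(scale=0.5, size=x_true.size)
grid = np.linspace(-4.0, 4.0, 81)
density = deconv_kde_laplace(w_obs, grid, bandwidth=0.4, noise_scale=0.5)
```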


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Yang Bai ◽  
Yu Li ◽  
Mingchuang Xie ◽  
Mingyu Fan

In recent years, machine learning approaches have been widely adopted for many applications, including classification. Machine learning models that deal with collected sensitive data are usually trained on a remote public cloud server, for instance, in a machine learning as a service (MLaaS) system. In this setting, users upload their local data and utilize the server's computation capability to train models, or users directly access models trained by MLaaS. Unfortunately, recent works reveal that the curious server (which trains the model with users' sensitive local data and is curious to learn information about individuals) and the malicious MLaaS user (who abuses query access to the MLaaS system) both pose privacy risks. The adversarial method, as one typical mitigation, has been studied in several recent works. However, most of them focus on privacy preservation against the malicious user; in other words, they commonly consider the data owner and the model provider as one role. Under this assumption, the privacy leakage risks from the curious server are neglected. Differential privacy methods can defend against privacy threats from both the curious server and the malicious MLaaS user by directly adding noise to the training data. Nonetheless, differential privacy heavily decreases the classification accuracy of the target model. In this work, we propose a generic privacy-preserving framework based on the adversarial method to defend against both the curious server and the malicious MLaaS user. The framework can be combined with several adversarial algorithms to generate adversarial examples directly from data owners' original data. By doing so, sensitive information about the original data is hidden. We then explore the constraint conditions of this framework, which help us to find the balance between privacy protection and model utility. The experimental results show that our defense framework with the AdvGAN method is effective against MIA, and that our defense framework with the FGSM method can protect sensitive data from direct content exposure attacks. In addition, our method achieves a better privacy-utility balance than the existing method.
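As a concrete illustration of how adversarial perturbation can stand in for noise addition, here is a minimal FGSM sketch on a toy logistic classifier; the model, its parameters, and the perturbation budget are illustrative assumptions, not the paper's defense framework.

```python
import numpy as np

def fgsm_perturb(x, y, weights, bias, eps=0.1):
    """Fast Gradient Sign Method on a logistic model (illustrative only).

    Returns x + eps * sign(grad_x loss), i.e. the perturbed record that a
    data owner would release instead of the raw one.
    """
    p = 1.0 / (1.0 + np.exp(-(weights @ x + bias)))   # predicted probability
    grad_x = (p - y) * weights                        # d(cross-entropy)/dx
    return x + eps * np.sign(grad_x)

# Toy usage with made-up parameters and a single record.
rng = np.random.default_rng(0)
w, b = rng.normal(size=5), 0.0
x, y = rng.normal(size=5), 1
x_released = fgsm_perturb(x, y, w, b, eps=0.05)
```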


2019 ◽  
Vol 2019 (3) ◽  
pp. 233-254 ◽  
Author(s):  
Changchang Liu ◽  
Xi He ◽  
Thee Chanyaswad ◽  
Shiqiang Wang ◽  
Prateek Mittal

Abstract Over the last decade, differential privacy (DP) has emerged as the gold standard for a rigorous and provable privacy framework. However, there are very few practical guidelines on how to apply differential privacy in practice, and a key challenge is how to set an appropriate value for the privacy parameter ɛ. In this work, we employ a statistical tool called hypothesis testing to discover useful and interpretable guidelines for the state-of-the-art privacy-preserving frameworks. We formalize and implement hypothesis testing in terms of an adversary's capability to infer mutually exclusive sensitive information about the input data (such as whether an individual has participated or not) from the output of the privacy-preserving mechanism. We quantify the success of the hypothesis testing using the precision-recall relation, which provides an interpretable and natural guideline for practitioners and researchers on selecting ɛ. Our key results include a quantitative analysis of how hypothesis testing can guide the choice of the privacy parameter ɛ in an interpretable manner for a differentially private mechanism and its variants. Importantly, our findings show that an adversary's auxiliary information - in the form of the prior distribution of the database and correlations across records and time - indeed influences the proper choice of ɛ. Finally, we also show how the perspective of hypothesis testing can provide useful insights into the relationships among a broad range of privacy frameworks, including differential privacy, Pufferfish privacy, Blowfish privacy, dependent differential privacy, inferential privacy, membership privacy and mutual-information based differential privacy.
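A minimal sketch of the hypothesis-testing viewpoint: assume a single counting query with sensitivity 1 released through the Laplace mechanism, and a simple threshold adversary deciding whether a target individual is in the data. The simulation below only illustrates how the adversary's precision and recall grow with ɛ; it is not the paper's analysis.

```python
import numpy as np

def adversary_precision_recall(epsilon, n_trials=100_000, threshold=0.5, seed=0):
    """Threshold attack on a sensitivity-1 counting query under Laplace noise.

    Two neighbouring databases yield true counts 0 (target absent) and 1
    (target present). The adversary guesses "present" when the noisy output
    exceeds `threshold`.
    """
    rng = np.random.default_rng(seed)
    scale = 1.0 / epsilon
    out_absent = 0.0 + rng.laplace(0.0, scale, n_trials)   # H0: not in data
    out_present = 1.0 + rng.laplace(0.0, scale, n_trials)  # H1: in data
    tp = np.sum(out_present > threshold)
    fp = np.sum(out_absent > threshold)
    fn = np.sum(out_present <= threshold)
    return tp / (tp + fp), tp / (tp + fn)   # precision, recall

for eps in (0.1, 1.0, 5.0):
    print(eps, adversary_precision_recall(eps))
```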


2021 ◽  
Vol 14 (11) ◽  
pp. 2190-2202
Author(s):  
Kuntai Cai ◽  
Xiaoyu Lei ◽  
Jianxin Wei ◽  
Xiaokui Xiao

This paper studies the synthesis of high-dimensional datasets with differential privacy (DP). The state-of-the-art solution addresses this problem by first generating a set M of noisy low-dimensional marginals of the input data D, and then using them to approximate the data distribution in D for synthetic data generation. However, it imposes several constraints on M that considerably limit the choice of marginals. This makes it difficult to capture all important correlations among attributes, which in turn degrades the quality of the resulting synthetic data. To address the above deficiency, we propose PrivMRF, a method that (i) also utilizes a set M of low-dimensional marginals for synthesizing high-dimensional data with DP, but (ii) provides a high degree of flexibility in the choice of marginals. The key idea of PrivMRF is to select an appropriate M to construct a Markov random field (MRF) that models the correlations among the attributes of the input data, and then use the MRF for data synthesis. Experimental results on four benchmark datasets show that PrivMRF consistently outperforms the state of the art in terms of the accuracy of counting queries and classification tasks conducted on the generated synthetic data.
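Below is a minimal sketch of the noisy-marginal building block that marginal-based synthesizers such as PrivMRF rely on, assuming pandas for the contingency table and per-cell Laplace noise; marginal selection and MRF inference, the paper's main contributions, are not shown.

```python
import numpy as np
import pandas as pd

def noisy_two_way_marginal(df, col_a, col_b, epsilon, seed=0):
    """Release a differentially private 2-way marginal (contingency table).

    Each record contributes to exactly one cell, so the L1 sensitivity of
    the histogram is 1 and Laplace(1/epsilon) noise per cell suffices.
    """
    rng = np.random.default_rng(seed)
    counts = pd.crosstab(df[col_a], df[col_b]).astype(float)
    counts += rng.laplace(0.0, 1.0 / epsilon, size=counts.shape)
    return counts.clip(lower=0.0)   # negative counts are not meaningful

# Toy usage on a synthetic categorical table.
df = pd.DataFrame({"age_group": ["<30", "30-60", "<30", ">60"],
                   "income": ["low", "high", "high", "low"]})
print(noisy_two_way_marginal(df, "age_group", "income", epsilon=1.0))
```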


2017 ◽  
Vol 2017 (4) ◽  
pp. 156-176 ◽  
Author(s):  
Apostolos Pyrgelis ◽  
Carmela Troncoso ◽  
Emiliano De Cristofaro

Abstract Information about people's movements and the locations they visit enables an increasing number of mobility analytics applications, e.g., in the context of urban and transportation planning. In this setting, rather than collecting or sharing raw data, entities often use aggregation as a privacy protection mechanism, aiming to hide individual users' location traces. Furthermore, to bound information leakage from the aggregates, they can perturb the input of the aggregation or its output to ensure that these are differentially private. In this paper, we set out to evaluate the impact of releasing aggregate location time-series on the privacy of individuals contributing to the aggregation. We introduce a framework allowing us to reason about privacy against an adversary attempting to predict users' locations or recover their mobility patterns. We formalize these attacks as inference problems, and discuss a few strategies to model the adversary's prior knowledge based on the information she may have access to. We then use the framework to quantify the privacy loss stemming from aggregate location data, with and without the protection of differential privacy, using two real-world mobility datasets. We find that aggregates do leak information about individuals' punctual locations and mobility profiles. The density of the observations, as well as their timing, plays an important role; e.g., regular patterns during peak hours are better protected than sporadic movements. Finally, our evaluation shows that both output and input perturbation offer little additional protection unless they introduce large amounts of noise, ultimately destroying the utility of the data.
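For reference, here is a minimal sketch of output perturbation for an aggregate location time-series, assuming each user affects at most `max_contrib` cells per time slot; the counts and budget are synthetic, and a user active across many time slots would require scaling the noise or splitting the budget accordingly.

```python
import numpy as np

def perturbed_aggregate(location_counts, epsilon, max_contrib=1, seed=0):
    """Output perturbation for an aggregate location time-series.

    location_counts : array of shape (regions, time_slots) holding the
                      number of users observed per region and time slot
    max_contrib     : assumed bound on the cells a user can affect per slot

    Adds Laplace noise of scale max_contrib / epsilon to every cell before
    release, corresponding to a per-time-slot budget of epsilon.
    """
    rng = np.random.default_rng(seed)
    noise = rng.laplace(0.0, max_contrib / epsilon, size=location_counts.shape)
    return np.clip(location_counts + noise, 0.0, None)

# Toy usage: 3 regions observed over 4 time slots (counts are synthetic).
counts = np.array([[12., 15., 9., 4.],
                   [3., 7., 11., 8.],
                   [0., 2., 5., 6.]])
print(perturbed_aggregate(counts, epsilon=1.0))
```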

