Differential Privacy at Risk: Bridging Randomness and Privacy Budget

2021 ◽  
Vol 2021 (1) ◽  
pp. 64-84
Author(s):  
Ashish Dandekar ◽  
Debabrota Basu ◽  
Stéphane Bressan

Abstract
The calibration of noise for a privacy-preserving mechanism depends on the sensitivity of the query and the prescribed privacy level. A data steward must make the non-trivial choice of a privacy level that balances the requirements of users and the monetary constraints of the business entity.

Firstly, we analyse the roles of the sources of randomness involved in the design of a privacy-preserving mechanism, namely the explicit randomness induced by the noise distribution and the implicit randomness induced by the data-generation distribution. This finer analysis enables us to provide stronger privacy guarantees with quantifiable risks. Thus, we propose privacy at risk, a probabilistic calibration of privacy-preserving mechanisms. We provide a composition theorem that leverages privacy at risk, and we instantiate the probabilistic calibration for the Laplace mechanism with analytical results.

Secondly, we propose a cost model that bridges the gap between the privacy level and the compensation budget estimated by a GDPR-compliant business entity. The convexity of the proposed cost model leads to a unique fine-tuning of the privacy level that minimises the compensation budget. We show its effectiveness with a realistic scenario that avoids overestimation of the compensation budget by using privacy at risk for the Laplace mechanism. We quantitatively show that composition using the cost-optimal privacy at risk provides a stronger privacy guarantee than the classical advanced composition. Although the illustration is specific to the chosen cost model, it naturally extends to any convex cost model. We also provide realistic illustrations of how a data steward uses privacy at risk to balance the trade-off between utility and privacy.
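The privacy-at-risk calibration builds on the standard Laplace mechanism. As a point of reference only, here is a minimal sketch of that classical calibration, where the noise scale is sensitivity/ε; the probabilistic re-calibration proposed in the paper is not reproduced, and function and parameter names are illustrative.

```python
# Standard Laplace mechanism (reference sketch, not the paper's calibration).
import numpy as np

def laplace_mechanism(query_result: float, sensitivity: float, epsilon: float,
                      rng: np.random.Generator = np.random.default_rng()) -> float:
    """Release query_result with epsilon-differential privacy."""
    scale = sensitivity / epsilon  # explicit randomness: the noise distribution
    return query_result + rng.laplace(loc=0.0, scale=scale)

# Example: a counting query (sensitivity 1) released at epsilon = 0.5.
noisy_count = laplace_mechanism(query_result=42.0, sensitivity=1.0, epsilon=0.5)
```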

2020 ◽  
Vol 34 (01) ◽  
pp. 784-791 ◽  
Author(s):  
Qinbin Li ◽  
Zhaomin Wu ◽  
Zeyi Wen ◽  
Bingsheng He

The Gradient Boosting Decision Tree (GBDT) has been a popular machine learning model for various tasks in recent years. In this paper, we study how to improve the model accuracy of GBDT while preserving the strong guarantee of differential privacy. Sensitivity and privacy budget are two key design aspects for the effectiveness of differentially private models. Existing solutions for GBDT with differential privacy suffer from significant accuracy loss due to overly loose sensitivity bounds and ineffective privacy budget allocations (especially across different trees in the GBDT model). Loose sensitivity bounds force more noise to be added to achieve a fixed privacy level. Ineffective privacy budget allocations worsen the accuracy loss, especially when the number of trees is large. Therefore, we propose a new GBDT training algorithm that achieves tighter sensitivity bounds and more effective noise allocations. Specifically, by investigating the properties of gradients and the contribution of each tree in GBDTs, we propose to adaptively control the gradients of training data in each iteration and to clip leaf nodes in order to tighten the sensitivity bounds. Furthermore, we design a novel boosting framework to allocate the privacy budget between trees so that the accuracy loss can be further reduced. Our experiments show that our approach can achieve much better model accuracy than other baselines.
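For orientation, the following is a hedged sketch of the two ingredients the abstract highlights: clipping per-example gradients to tighten the sensitivity bound, and splitting a total privacy budget across trees. It is a simplified illustration under stated assumptions (squared-error loss, uniform budget split, an illustrative sensitivity bound), not the authors' algorithm.

```python
# Sketch only: gradient clipping and per-tree budget split for a DP-GBDT leaf.
import numpy as np

def clip_gradients(gradients: np.ndarray, g_clip: float) -> np.ndarray:
    """Bound each gradient to [-g_clip, g_clip] so the leaf-value sensitivity is bounded."""
    return np.clip(gradients, -g_clip, g_clip)

def per_tree_budget(total_epsilon: float, num_trees: int) -> float:
    """Naive uniform allocation; the paper proposes a more effective scheme."""
    return total_epsilon / num_trees

def noisy_leaf_value(gradients: np.ndarray, g_clip: float, epsilon_tree: float,
                     lam: float = 1.0, rng=np.random.default_rng()) -> float:
    """Leaf value with Laplace noise calibrated to a clipped-gradient sensitivity bound."""
    clipped = clip_gradients(gradients, g_clip)
    n = len(clipped)
    # Assumption: squared-error loss with unit hessians; this bound is illustrative,
    # the paper derives tighter bounds.
    sensitivity = g_clip / (n + lam)
    value = -clipped.sum() / (n + lam)
    return value + rng.laplace(scale=sensitivity / epsilon_tree)
```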


2019 ◽  
Vol 17 (4) ◽  
pp. 450-460
Author(s):  
Hai Liu ◽  
Zhenqiang Wu ◽  
Changgen Peng ◽  
Feng Tian ◽  
Laifeng Lu

Considering the untrusted server, differential privacy and local differential privacy have been used for privacy preservation in data aggregation. Through our analysis, differential privacy and local differential privacy cannot achieve a Nash equilibrium between privacy and utility for mobile-service-based multiuser collaboration, in which multiple users negotiate a desired privacy budget in a collaborative manner for privacy preservation. To this end, we propose a Privacy-Preserving Data Aggregation Framework (PPDAF) that reaches a Nash equilibrium between privacy and utility. Firstly, we present an adaptive Gaussian mechanism satisfying a Nash equilibrium between privacy and utility by multiplying an expected utility factor with conditional filtering noise under an expected privacy budget. Secondly, we construct PPDAF using the adaptive Gaussian mechanism based on negotiating the privacy budget with heuristic obfuscation. Finally, our theoretical analysis and experimental evaluation show that PPDAF can achieve a Nash equilibrium between privacy and utility. Furthermore, this framework can be extended to engineering instances in a data aggregation setting.
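The adaptive Gaussian mechanism in the abstract builds on the standard Gaussian mechanism. The sketch below shows only that standard calibration, σ = √(2 ln(1.25/δ))·Δf/ε; the expected-utility scaling and conditional filtering noise of the paper are not reproduced, and names are illustrative.

```python
# Standard (epsilon, delta)-DP Gaussian mechanism, for reference only.
import math
import numpy as np

def gaussian_mechanism(value: float, sensitivity: float, epsilon: float,
                       delta: float, rng=np.random.default_rng()) -> float:
    """Release value with (epsilon, delta)-differential privacy."""
    sigma = math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon
    return value + rng.normal(0.0, sigma)

# Example: an average with L2-sensitivity 0.01 released at epsilon = 1, delta = 1e-5.
noisy_avg = gaussian_mechanism(value=3.7, sensitivity=0.01, epsilon=1.0, delta=1e-5)
```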


2018 ◽  
Vol 14 (2) ◽  
pp. 1-17 ◽  
Author(s):  
Zhiqiang Gao ◽  
Yixiao Sun ◽  
Xiaolong Cui ◽  
Yutao Wang ◽  
Yanyu Duan ◽  
...  

This article describes how k-means, the most widely used clustering algorithm, is prone to falling into a local optimum. Notably, traditional clustering approaches are performed directly on private data and fail to cope with malicious attacks in massive data mining tasks against attackers' arbitrary background knowledge. This can result in violations of individuals' privacy, as well as leaks through system resources and clustering outputs. To address these issues, the authors propose an efficient privacy-preserving hybrid k-means under Spark. In the first stage, particle swarm optimization is executed in resilient distributed datasets to initiate the selection of clustering centroids in the k-means on Spark. In the second stage, k-means is executed on the condition that the privacy budget is set as ε/2t, with Laplace noise added in each round of iterations. Extensive experimentation on public UCI data sets shows that, on the premise of guaranteeing the utility of private data and scalability, their approach outperforms state-of-the-art varieties of k-means by utilizing swarm intelligence and rigorous paradigms of differential privacy.
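As a rough illustration of the second stage described above, the sketch below runs one differentially private k-means iteration that spends a per-round budget on noisy cluster counts and noisy coordinate sums. It is plain NumPy under the assumption that points are normalised to [0, 1]^d, not the authors' Spark/PSO implementation, and the names are illustrative.

```python
# One iteration of differentially private k-means (sketch, not the paper's code).
import numpy as np

def dp_kmeans_step(points: np.ndarray, centroids: np.ndarray,
                   eps_round: float, rng=np.random.default_rng()) -> np.ndarray:
    """Recompute centroids from Laplace-noised counts and sums using budget eps_round."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = np.argmin(dists, axis=1)
    d = points.shape[1]
    new_centroids = centroids.copy()
    for k in range(len(centroids)):
        cluster = points[labels == k]
        # Assumes points lie in [0, 1]^d: count sensitivity 1, per-cluster sum sensitivity d.
        noisy_count = len(cluster) + rng.laplace(scale=1.0 / eps_round)
        noisy_sum = cluster.sum(axis=0) + rng.laplace(scale=d / eps_round, size=d)
        if noisy_count > 0:
            new_centroids[k] = noisy_sum / noisy_count
    return new_centroids
```

Running t such iterations, each with eps_round = ε/(2t) as in the abstract, keeps the total budget at ε by sequential composition.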


2020 ◽  
Vol 2020 (2) ◽  
pp. 379-396 ◽  
Author(s):  
Ricardo Mendes ◽  
Mariana Cunha ◽  
João P. Vilela

Abstract
Location privacy has become an emerging topic due to the pervasiveness of Location-Based Services (LBSs). When sharing location, a certain degree of privacy can be achieved through the use of Location Privacy-Preserving Mechanisms (LPPMs), in which an obfuscated version of the exact user location is reported instead. However, even obfuscated location reports disclose information, which poses a risk to privacy. Based on the formal notion of differential privacy, Geo-indistinguishability has been proposed to design LPPMs that limit the amount of information disclosed to a potential adversary observing the reports. While promising, this notion considers reports to be independent from each other, thus discarding the potential threat that arises from exploiting the correlation between reports. This assumption might hold for the sporadic release of data; however, there is still no formal or quantitative boundary between sporadic and continuous reports, and thus we argue that the validity of the independence assumption depends on the frequency of reports made by the user. This work intends to fill this research gap through a quantitative evaluation of the impact on the privacy level of Geo-indistinguishability under different frequencies of reports. Towards this end, state-of-the-art localization attacks and a tracking attack are implemented against a Geo-indistinguishable LPPM under several values of privacy budget, and the privacy level is measured along different frequencies of updates using real mobility data.
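Geo-indistinguishability is typically realised with the planar Laplace mechanism. The sketch below, assuming planar coordinates and SciPy's Lambert W function, draws one independent obfuscation per report; this per-report independence is exactly the assumption whose validity the paper evaluates under different report frequencies.

```python
# Planar Laplace obfuscation (sketch; treats lat/lon as planar coordinates).
import math
import numpy as np
from scipy.special import lambertw

def planar_laplace_noise(epsilon: float, rng=np.random.default_rng()) -> tuple[float, float]:
    """Draw a 2D noise vector from the planar Laplace distribution with parameter epsilon."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    p = rng.uniform(0.0, 1.0)
    # Radius via the inverse CDF, using the -1 branch of the Lambert W function.
    r = -(1.0 / epsilon) * (lambertw((p - 1.0) / math.e, k=-1).real + 1.0)
    return r * math.cos(theta), r * math.sin(theta)

def obfuscate(x: float, y: float, epsilon: float) -> tuple[float, float]:
    """Report an obfuscated location; each call is an independent draw."""
    dx, dy = planar_laplace_noise(epsilon)
    return x + dx, y + dy
```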


2020 ◽  
Author(s):  
Junjie Chen ◽  
Wendy Hui Wang ◽  
Xinghua Shi

Machine learning is a powerful tool for modeling massive genomic data, while genome privacy is a growing concern. Studies have shown that not only the raw data but also the trained model can potentially infringe genome privacy. An example is the membership inference attack (MIA), by which the adversary, who only queries a given target model without knowing its internal parameters, can determine whether a specific record was included in the training dataset of the target model. Differential privacy (DP) has been used to defend against MIA with a rigorous privacy guarantee. In this paper, we investigate the vulnerability of machine learning against MIA on genomic data, and evaluate the effectiveness of using DP as a defense mechanism. We consider two widely used machine learning models, namely Lasso and the convolutional neural network (CNN), as the target models. We study the trade-off between the defense power against MIA and the prediction accuracy of the target model under various privacy settings of DP. Our results show that the relationship between the privacy budget and target model accuracy can be modeled as a log-like curve; thus, a smaller privacy budget provides a stronger privacy guarantee at the cost of losing more model accuracy. We also investigate the effect of model sparsity on model vulnerability against MIA. Our results demonstrate that, in addition to preventing overfitting, model sparsity can work together with DP to significantly mitigate the risk of MIA.
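For context, a minimal sketch of the simplest confidence-threshold membership inference attack is given below; the target model's predict_proba callable and the threshold value are illustrative assumptions, and the paper's attack setup may differ.

```python
# Confidence-threshold membership inference (generic sketch, not the paper's attack).
import numpy as np

def confidence_mia(predict_proba, x: np.ndarray, y_true: int,
                   threshold: float = 0.9) -> bool:
    """Guess that (x, y_true) was a training record if the model is very confident on it."""
    probs = predict_proba(x.reshape(1, -1))[0]  # query the black-box target model
    return probs[y_true] >= threshold
```

Under DP training, the model's confidence gap between members and non-members shrinks, which is why a smaller privacy budget weakens this attack at the cost of accuracy.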


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Guoming Lu ◽  
Xu Zheng ◽  
Jingyuan Duan ◽  
Ling Tian ◽  
Xia Wang

Data publication from multiple contributors has long been considered a fundamental task for data processing in various domains. It has been treated as one prominent prerequisite for enabling AI techniques in wireless networks. With the emergence of diversified smart devices and applications, data held by individuals become more pervasive and nontrivial to publish. First, the data are more private and sensitive, as they cover every aspect of daily life, from income data to fitness data. Second, the publication of such data is also bandwidth-consuming, as they are likely to be stored on mobile devices. Local differential privacy has been considered a novel paradigm for such distributed data publication. However, existing works mostly require contents to be encoded into a vector space for publication, which is still costly in network resources. Therefore, this work proposes a novel framework for highly efficient privacy-preserving data publication. Specifically, two sampling-based algorithms are proposed for histogram publication, an important statistic for data analysis. The first algorithm applies a bit-level sampling strategy to both reduce the overall bandwidth and balance the cost among contributors. The second algorithm allows consumers to adjust their focus on different intervals and can properly allocate the sampling ratios to optimize the overall performance. Both the analysis and the validation on real-world data traces demonstrate the advancement of our work.
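To make the bandwidth argument concrete, here is a hedged sketch of a generic sampled randomized-response scheme for histograms: each contributor reports a single bucket index and one privatized bit rather than a full encoded vector. It illustrates the general idea only; the paper's two algorithms and their estimators are not reproduced, and the names are illustrative.

```python
# Sampled randomized response for LDP histograms (generic sketch).
import math
import numpy as np

def report_sampled_bit(value: int, num_buckets: int, epsilon: float,
                       rng=np.random.default_rng()) -> tuple[int, int]:
    """Each user sends one bucket index and one epsilon-LDP bit (low bandwidth)."""
    j = int(rng.integers(num_buckets))
    true_bit = 1 if value == j else 0
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    bit = true_bit if rng.random() < p_keep else 1 - true_bit
    return j, bit

def estimate_histogram(reports: list[tuple[int, int]], num_buckets: int,
                       epsilon: float) -> np.ndarray:
    """Debiased frequency estimate per bucket from the sampled, randomized bits."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    ones = np.zeros(num_buckets)
    seen = np.zeros(num_buckets)
    for j, bit in reports:
        ones[j] += bit
        seen[j] += 1
    mean_bit = ones / np.maximum(seen, 1)
    return (mean_bit - (1.0 - p)) / (2.0 * p - 1.0)
```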


A cloud server aggregates a large amount of genome data from multiple genome donors to facilitate scientific research. However, an untrusted cloud server is prone to violating the privacy of the aggregated genome data. Thus, each genome donor can randomly perturb her genome data using a differential privacy mechanism before aggregation. But this easily leads to a utility disaster for the aggregated genome data due to the different privacy preferences of each genome donor, and to privacy leakage of the aggregated genome data because of the kinship between genome donors. The key challenge here is to achieve an equilibrium between privacy preservation and data utility when aggregating multiparty genome data. To this end, we propose a federated aggregation protocol for multiparty genome data (MGD-FAP) with a privacy-utility equilibrium that guarantees both the desired privacy protection and the desired data utility. First, we regard the privacy budget and the accuracy as the desired privacy and utility metrics of genome data, respectively. Second, we construct the federated aggregation model of multiparty genome data by combining a random perturbation method for genome data that guarantees the desired data utility with a federated comparing-update method for the local privacy budget that achieves the desired privacy preservation. Third, we present the MGD-FAP, which maintains the privacy-utility equilibrium under the federated aggregation model of multiparty genome data. Finally, our theoretical and experimental analysis shows that MGD-FAP can maintain the privacy-utility equilibrium. MGD-FAP is practical and feasible for ensuring the privacy-utility equilibrium when a cloud server aggregates multiparty genome data.
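A minimal sketch, under simplifying assumptions, of the local-perturbation-then-aggregate pattern the protocol relies on: each donor adds Laplace noise to a numeric encoding of her record under her own privacy budget before the server aggregates. The federated comparing-update of local budgets described in the abstract is not reproduced, and all names are illustrative.

```python
# Local perturbation by each donor, followed by server-side aggregation (sketch).
import numpy as np

def perturb_record(record: np.ndarray, sensitivity: float, epsilon: float,
                   rng=np.random.default_rng()) -> np.ndarray:
    """Donor-side: add Laplace noise calibrated to her own privacy budget epsilon."""
    return record + rng.laplace(scale=sensitivity / epsilon, size=record.shape)

def aggregate(perturbed_records: list[np.ndarray]) -> np.ndarray:
    """Server-side: aggregate (here, a simple mean) over the perturbed records."""
    return np.mean(perturbed_records, axis=0)
```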


Author(s):  
Dan Wang ◽  
Ju Ren ◽  
Zhibo Wang ◽  
Xiaoyi Pang ◽  
Yaoxue Zhang ◽  
...  
