Differential Privacy at Risk: Bridging Randomness and Privacy Budget

2021 ◽  
Vol 2021 (1) ◽  
pp. 64-84
Author(s):  
Ashish Dandekar ◽  
Debabrota Basu ◽  
Stéphane Bressan

Abstract
The calibration of noise for a privacy-preserving mechanism depends on the sensitivity of the query and the prescribed privacy level. A data steward must make the non-trivial choice of a privacy level that balances the requirements of users and the monetary constraints of the business entity.

Firstly, we analyse the roles of the sources of randomness involved in the design of a privacy-preserving mechanism, namely the explicit randomness induced by the noise distribution and the implicit randomness induced by the data-generation distribution. This finer analysis enables us to provide stronger privacy guarantees with quantifiable risks. Thus, we propose privacy at risk, a probabilistic calibration of privacy-preserving mechanisms. We provide a composition theorem that leverages privacy at risk, and we instantiate the probabilistic calibration for the Laplace mechanism with analytical results.

Secondly, we propose a cost model that bridges the gap between the privacy level and the compensation budget estimated by a GDPR-compliant business entity. The convexity of the proposed cost model leads to a unique fine-tuning of the privacy level that minimises the compensation budget. We show its effectiveness with a realistic scenario that avoids overestimation of the compensation budget by using privacy at risk for the Laplace mechanism. We quantitatively show that composition using the cost-optimal privacy at risk provides a stronger privacy guarantee than the classical advanced composition. Although the illustration is specific to the chosen cost model, it naturally extends to any convex cost model. We also provide realistic illustrations of how a data steward uses privacy at risk to balance the trade-off between utility and privacy.
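The privacy-at-risk calibration builds on the standard Laplace mechanism. As a point of reference only, here is a minimal sketch of that classical calibration, where the noise scale is sensitivity/ε; the probabilistic re-calibration proposed in the paper is not reproduced, and function and parameter names are illustrative.

```python
# Standard Laplace mechanism (reference sketch, not the paper's calibration).
import numpy as np

def laplace_mechanism(query_result: float, sensitivity: float, epsilon: float,
                      rng: np.random.Generator = np.random.default_rng()) -> float:
    """Release query_result with epsilon-differential privacy."""
    scale = sensitivity / epsilon  # explicit randomness: the noise distribution
    return query_result + rng.laplace(loc=0.0, scale=scale)

# Example: a counting query (sensitivity 1) released at epsilon = 0.5.
noisy_count = laplace_mechanism(query_result=42.0, sensitivity=1.0, epsilon=0.5)
```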

2020 ◽  
Vol 34 (01) ◽  
pp. 784-791 ◽  
Author(s):  
Qinbin Li ◽  
Zhaomin Wu ◽  
Zeyi Wen ◽  
Bingsheng He

The Gradient Boosting Decision Tree (GBDT) has been a popular machine learning model for various tasks in recent years. In this paper, we study how to improve the model accuracy of GBDT while preserving the strong guarantee of differential privacy. Sensitivity and privacy budget are two key design aspects for the effectiveness of differentially private models. Existing solutions for GBDT with differential privacy suffer from significant accuracy loss due to overly loose sensitivity bounds and ineffective privacy budget allocations (especially across different trees in the GBDT model). Loose sensitivity bounds force more noise to be added to achieve a fixed privacy level. Ineffective privacy budget allocations worsen the accuracy loss, especially when the number of trees is large. Therefore, we propose a new GBDT training algorithm that achieves tighter sensitivity bounds and more effective noise allocations. Specifically, by investigating the properties of gradients and the contribution of each tree in GBDTs, we propose to adaptively control the gradients of training data in each iteration and to clip leaf nodes in order to tighten the sensitivity bounds. Furthermore, we design a novel boosting framework to allocate the privacy budget between trees so that the accuracy loss can be further reduced. Our experiments show that our approach can achieve much better model accuracy than other baselines.
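For orientation, the following is a hedged sketch of the two ingredients the abstract highlights: clipping per-example gradients to tighten the sensitivity bound, and splitting a total privacy budget across trees. It is a simplified illustration under stated assumptions (squared-error loss, uniform budget split, an illustrative sensitivity bound), not the authors' algorithm.

```python
# Sketch only: gradient clipping and per-tree budget split for a DP-GBDT leaf.
import numpy as np

def clip_gradients(gradients: np.ndarray, g_clip: float) -> np.ndarray:
    """Bound each gradient to [-g_clip, g_clip] so the leaf-value sensitivity is bounded."""
    return np.clip(gradients, -g_clip, g_clip)

def per_tree_budget(total_epsilon: float, num_trees: int) -> float:
    """Naive uniform allocation; the paper proposes a more effective scheme."""
    return total_epsilon / num_trees

def noisy_leaf_value(gradients: np.ndarray, g_clip: float, epsilon_tree: float,
                     lam: float = 1.0, rng=np.random.default_rng()) -> float:
    """Leaf value with Laplace noise calibrated to a clipped-gradient sensitivity bound."""
    clipped = clip_gradients(gradients, g_clip)
    n = len(clipped)
    # Assumption: squared-error loss with unit hessians; this bound is illustrative,
    # the paper derives tighter bounds.
    sensitivity = g_clip / (n + lam)
    value = -clipped.sum() / (n + lam)
    return value + rng.laplace(scale=sensitivity / epsilon_tree)
```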


2019 ◽  
Vol 17 (4) ◽  
pp. 450-460
Author(s):  
Hai Liu ◽  
Zhenqiang Wu ◽  
Changgen Peng ◽  
Feng Tian ◽  
Laifeng Lu

Considering the untrusted server, differential privacy and local differential privacy have been used for privacy preservation in data aggregation. Through our analysis, differential privacy and local differential privacy cannot achieve a Nash equilibrium between privacy and utility for mobile-service-based multiuser collaboration, in which multiple users negotiate a desired privacy budget in a collaborative manner for privacy preservation. To this end, we propose a Privacy-Preserving Data Aggregation Framework (PPDAF) that reaches a Nash equilibrium between privacy and utility. Firstly, we present an adaptive Gaussian mechanism satisfying a Nash equilibrium between privacy and utility by multiplying an expected utility factor with conditional filtering noise under an expected privacy budget. Secondly, we construct PPDAF using the adaptive Gaussian mechanism based on negotiating the privacy budget with heuristic obfuscation. Finally, our theoretical analysis and experimental evaluation show that PPDAF can achieve a Nash equilibrium between privacy and utility. Furthermore, this framework can be extended to engineering instances in a data aggregation setting.
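The adaptive Gaussian mechanism in the abstract builds on the standard Gaussian mechanism. The sketch below shows only that standard calibration, σ = √(2 ln(1.25/δ))·Δf/ε; the expected-utility scaling and conditional filtering noise of the paper are not reproduced, and names are illustrative.

```python
# Standard (epsilon, delta)-DP Gaussian mechanism, for reference only.
import math
import numpy as np

def gaussian_mechanism(value: float, sensitivity: float, epsilon: float,
                       delta: float, rng=np.random.default_rng()) -> float:
    """Release value with (epsilon, delta)-differential privacy."""
    sigma = math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon
    return value + rng.normal(0.0, sigma)

# Example: an average with L2-sensitivity 0.01 released at epsilon = 1, delta = 1e-5.
noisy_avg = gaussian_mechanism(value=3.7, sensitivity=0.01, epsilon=1.0, delta=1e-5)
```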


2018 ◽  
Vol 14 (2) ◽  
pp. 1-17 ◽  
Author(s):  
Zhiqiang Gao ◽  
Yixiao Sun ◽  
Xiaolong Cui ◽  
Yutao Wang ◽  
Yanyu Duan ◽  
...  

This article describes how k-means, the most widely used clustering algorithm, is prone to falling into a local optimum. Notably, traditional clustering approaches are performed directly on private data and fail to cope with malicious attacks in massive data mining tasks against attackers' arbitrary background knowledge. This can result in violations of individuals' privacy, as well as leaks through system resources and clustering outputs. To address these issues, the authors propose an efficient privacy-preserving hybrid k-means under Spark. In the first stage, particle swarm optimization is executed in resilient distributed datasets to initiate the selection of clustering centroids in the k-means on Spark. In the second stage, k-means is executed on the condition that the privacy budget is set as ε/2t, with Laplace noise added in each round of iterations. Extensive experimentation on public UCI data sets shows that, on the premise of guaranteeing the utility of private data and scalability, their approach outperforms state-of-the-art varieties of k-means by utilizing swarm intelligence and rigorous paradigms of differential privacy.
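As a rough illustration of the second stage described above, the sketch below runs one differentially private k-means iteration that spends a per-round budget on noisy cluster counts and noisy coordinate sums. It is plain NumPy under the assumption that points are normalised to [0, 1]^d, not the authors' Spark/PSO implementation, and the names are illustrative.

```python
# One iteration of differentially private k-means (sketch, not the paper's code).
import numpy as np

def dp_kmeans_step(points: np.ndarray, centroids: np.ndarray,
                   eps_round: float, rng=np.random.default_rng()) -> np.ndarray:
    """Recompute centroids from Laplace-noised counts and sums using budget eps_round."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = np.argmin(dists, axis=1)
    d = points.shape[1]
    new_centroids = centroids.copy()
    for k in range(len(centroids)):
        cluster = points[labels == k]
        # Assumes points lie in [0, 1]^d: count sensitivity 1, per-cluster sum sensitivity d.
        noisy_count = len(cluster) + rng.laplace(scale=1.0 / eps_round)
        noisy_sum = cluster.sum(axis=0) + rng.laplace(scale=d / eps_round, size=d)
        if noisy_count > 0:
            new_centroids[k] = noisy_sum / noisy_count
    return new_centroids
```

Running t such iterations, each with eps_round = ε/(2t) as in the abstract, keeps the total budget at ε by sequential composition.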


2020 ◽  
Vol 2020 (2) ◽  
pp. 379-396 ◽  
Author(s):  
Ricardo Mendes ◽  
Mariana Cunha ◽  
João P. Vilela

Abstract
Location privacy has become an emerging topic due to the pervasiveness of Location-Based Services (LBSs). When sharing location, a certain degree of privacy can be achieved through the use of Location Privacy-Preserving Mechanisms (LPPMs), in which an obfuscated version of the exact user location is reported instead. However, even obfuscated location reports disclose information, which poses a risk to privacy. Based on the formal notion of differential privacy, Geo-indistinguishability has been proposed to design LPPMs that limit the amount of information disclosed to a potential adversary observing the reports. While promising, this notion considers reports to be independent from each other, thus discarding the potential threat that arises from exploiting the correlation between reports. This assumption might hold for the sporadic release of data; however, there is still no formal or quantitative boundary between sporadic and continuous reports, and thus we argue that the validity of the independence assumption depends on the frequency of reports made by the user. This work intends to fill this research gap through a quantitative evaluation of the impact on the privacy level of Geo-indistinguishability under different frequencies of reports. Towards this end, state-of-the-art localization attacks and a tracking attack are implemented against a Geo-indistinguishable LPPM under several values of privacy budget, and the privacy level is measured along different frequencies of updates using real mobility data.
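Geo-indistinguishability is typically realised with the planar Laplace mechanism. The sketch below, assuming planar coordinates and SciPy's Lambert W function, draws one independent obfuscation per report; this per-report independence is exactly the assumption whose validity the paper evaluates under different report frequencies.

```python
# Planar Laplace obfuscation (sketch; treats lat/lon as planar coordinates).
import math
import numpy as np
from scipy.special import lambertw

def planar_laplace_noise(epsilon: float, rng=np.random.default_rng()) -> tuple[float, float]:
    """Draw a 2D noise vector from the planar Laplace distribution with parameter epsilon."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    p = rng.uniform(0.0, 1.0)
    # Radius via the inverse CDF, using the -1 branch of the Lambert W function.
    r = -(1.0 / epsilon) * (lambertw((p - 1.0) / math.e, k=-1).real + 1.0)
    return r * math.cos(theta), r * math.sin(theta)

def obfuscate(x: float, y: float, epsilon: float) -> tuple[float, float]:
    """Report an obfuscated location; each call is an independent draw."""
    dx, dy = planar_laplace_noise(epsilon)
    return x + dx, y + dy
```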


2020 ◽  
Author(s):  
Junjie Chen ◽  
Wendy Hui Wang ◽  
Xinghua Shi

Machine learning is a powerful tool for modeling massive genomic data, while genome privacy is a growing concern. Studies have shown that not only the raw data but also the trained model can potentially infringe genome privacy. An example is the membership inference attack (MIA), by which the adversary, who only queries a given target model without knowing its internal parameters, can determine whether a specific record was included in the training dataset of the target model. Differential privacy (DP) has been used to defend against MIA with a rigorous privacy guarantee. In this paper, we investigate the vulnerability of machine learning against MIA on genomic data, and evaluate the effectiveness of using DP as a defense mechanism. We consider two widely used machine learning models, namely Lasso and the convolutional neural network (CNN), as the target models. We study the trade-off between the defense power against MIA and the prediction accuracy of the target model under various privacy settings of DP. Our results show that the relationship between the privacy budget and target model accuracy can be modeled as a log-like curve; thus, a smaller privacy budget provides a stronger privacy guarantee at the cost of losing more model accuracy. We also investigate the effect of model sparsity on model vulnerability against MIA. Our results demonstrate that, in addition to preventing overfitting, model sparsity can work together with DP to significantly mitigate the risk of MIA.
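For context, a minimal sketch of the simplest confidence-threshold membership inference attack is given below; the target model's predict_proba callable and the threshold value are illustrative assumptions, and the paper's attack setup may differ.

```python
# Confidence-threshold membership inference (generic sketch, not the paper's attack).
import numpy as np

def confidence_mia(predict_proba, x: np.ndarray, y_true: int,
                   threshold: float = 0.9) -> bool:
    """Guess that (x, y_true) was a training record if the model is very confident on it."""
    probs = predict_proba(x.reshape(1, -1))[0]  # query the black-box target model
    return probs[y_true] >= threshold
```

Under DP training, the model's confidence gap between members and non-members shrinks, which is why a smaller privacy budget weakens this attack at the cost of accuracy.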


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Guoming Lu ◽  
Xu Zheng ◽  
Jingyuan Duan ◽  
Ling Tian ◽  
Xia Wang

Data publication from multiple contributors has long been considered a fundamental task for data processing in various domains. It has been treated as one prominent prerequisite for enabling AI techniques in wireless networks. With the emergence of diversified smart devices and applications, data held by individuals become more pervasive and nontrivial to publish. First, the data are more private and sensitive, as they cover every aspect of daily life, from income data to fitness data. Second, the publication of such data is also bandwidth-consuming, as they are likely to be stored on mobile devices. Local differential privacy has been considered a novel paradigm for such distributed data publication. However, existing works mostly require contents to be encoded into a vector space for publication, which is still costly in network resources. Therefore, this work proposes a novel framework for highly efficient privacy-preserving data publication. Specifically, two sampling-based algorithms are proposed for histogram publication, an important statistic for data analysis. The first algorithm applies a bit-level sampling strategy to both reduce the overall bandwidth and balance the cost among contributors. The second algorithm allows consumers to adjust their focus on different intervals and can properly allocate the sampling ratios to optimize the overall performance. Both the analysis and the validation on real-world data traces demonstrate the advancement of our work.
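To make the bandwidth argument concrete, here is a hedged sketch of a generic sampled randomized-response scheme for histograms: each contributor reports a single bucket index and one privatized bit rather than a full encoded vector. It illustrates the general idea only; the paper's two algorithms and their estimators are not reproduced, and the names are illustrative.

```python
# Sampled randomized response for LDP histograms (generic sketch).
import math
import numpy as np

def report_sampled_bit(value: int, num_buckets: int, epsilon: float,
                       rng=np.random.default_rng()) -> tuple[int, int]:
    """Each user sends one bucket index and one epsilon-LDP bit (low bandwidth)."""
    j = int(rng.integers(num_buckets))
    true_bit = 1 if value == j else 0
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    bit = true_bit if rng.random() < p_keep else 1 - true_bit
    return j, bit

def estimate_histogram(reports: list[tuple[int, int]], num_buckets: int,
                       epsilon: float) -> np.ndarray:
    """Debiased frequency estimate per bucket from the sampled, randomized bits."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    ones = np.zeros(num_buckets)
    seen = np.zeros(num_buckets)
    for j, bit in reports:
        ones[j] += bit
        seen[j] += 1
    mean_bit = ones / np.maximum(seen, 1)
    return (mean_bit - (1.0 - p)) / (2.0 * p - 1.0)
```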


A cloud server aggregates a large amount of genome data from multiple genome donors to facilitate scientific research. However, an untrusted cloud server is prone to violating the privacy of the aggregated genome data. Thus, each genome donor can randomly perturb her genome data using a differential privacy mechanism before aggregation. But this easily leads to a utility disaster for the aggregated genome data due to the different privacy preferences of each genome donor, and to privacy leakage of the aggregated genome data because of the kinship between genome donors. The key challenge here is to achieve an equilibrium between privacy preservation and data utility when aggregating multiparty genome data. To this end, we propose a federated aggregation protocol for multiparty genome data (MGD-FAP) with a privacy-utility equilibrium that guarantees both the desired privacy protection and the desired data utility. First, we regard the privacy budget and the accuracy as the desired privacy and utility metrics of genome data, respectively. Second, we construct the federated aggregation model of multiparty genome data by combining a random perturbation method for genome data that guarantees the desired data utility with a federated comparing-update method for the local privacy budget that achieves the desired privacy preservation. Third, we present the MGD-FAP, which maintains the privacy-utility equilibrium under the federated aggregation model of multiparty genome data. Finally, our theoretical and experimental analysis shows that MGD-FAP can maintain the privacy-utility equilibrium. MGD-FAP is practical and feasible for ensuring the privacy-utility equilibrium when a cloud server aggregates multiparty genome data.
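A minimal sketch, under simplifying assumptions, of the local-perturbation-then-aggregate pattern the protocol relies on: each donor adds Laplace noise to a numeric encoding of her record under her own privacy budget before the server aggregates. The federated comparing-update of local budgets described in the abstract is not reproduced, and all names are illustrative.

```python
# Local perturbation by each donor, followed by server-side aggregation (sketch).
import numpy as np

def perturb_record(record: np.ndarray, sensitivity: float, epsilon: float,
                   rng=np.random.default_rng()) -> np.ndarray:
    """Donor-side: add Laplace noise calibrated to her own privacy budget epsilon."""
    return record + rng.laplace(scale=sensitivity / epsilon, size=record.shape)

def aggregate(perturbed_records: list[np.ndarray]) -> np.ndarray:
    """Server-side: aggregate (here, a simple mean) over the perturbed records."""
    return np.mean(perturbed_records, axis=0)
```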


Author(s):  
Dan Wang ◽  
Ju Ren ◽  
Zhibo Wang ◽  
Xiaoyi Pang ◽  
Yaoxue Zhang ◽  
...  
