Optimal Distribution of Privacy Budget in Differential Privacy

The Gradient Boosting Decision Tree (GBDT) is a popular machine learning model for various tasks in recent years. In this paper, we study how to improve model accuracy of GBDT while preserving the strong guarantee of differential privacy. Sensitivity and privacy budget are two key design aspects for the effectiveness of differential private models. Existing solutions for GBDT with differential privacy suffer from the significant accuracy loss due to too loose sensitivity bounds and ineffective privacy budget allocations (especially across different trees in the GBDT model). Loose sensitivity bounds lead to more noise to obtain a fixed privacy level. Ineffective privacy budget allocations worsen the accuracy loss especially when the number of trees is large. Therefore, we propose a new GBDT training algorithm that achieves tighter sensitivity bounds and more effective noise allocations. Specifically, by investigating the property of gradient and the contribution of each tree in GBDTs, we propose to adaptively control the gradients of training data for each iteration and leaf node clipping in order to tighten the sensitivity bounds. Furthermore, we design a novel boosting framework to allocate the privacy budget between trees so that the accuracy loss can be further reduced. Our experiments show that our approach can achieve much better model accuracy than other baselines.

Download Full-text

Privacy-Preserving Data Aggregation Framework for Mobile Service Based Multiuser Collaboration

The International Arab Journal of Information Technology ◽

10.34028/iajit/17/4/3 ◽

2019 ◽

Vol 17 (4) ◽

pp. 450-460

Author(s):

Hai Liu ◽

Zhenqiang Wu ◽

Changgen Peng ◽

Feng Tian ◽

Laifeng Lu

Keyword(s):

Nash Equilibrium ◽

Theoretical Analysis ◽

Expected Utility ◽

Data Aggregation ◽

Experimental Evaluation ◽

Differential Privacy ◽

Privacy Preserving ◽

Mobile Service ◽

Utility Factor ◽

Privacy Budget

Considering the untrusted server, differential privacy and local differential privacy has been used for privacy-preserving in data aggregation. Through our analysis, differential privacy and local differential privacy cannot achieve Nash equilibrium between privacy and utility for mobile service based multiuser collaboration, which is multiuser negotiating a desired privacy budget in a collaborative manner for privacy-preserving. To this end, we proposed a Privacy-Preserving Data Aggregation Framework (PPDAF) that reached Nash equilibrium between privacy and utility. Firstly, we presented an adaptive Gaussian mechanism satisfying Nash equilibrium between privacy and utility by multiplying expected utility factor with conditional filtering noise under expected privacy budget. Secondly, we constructed PPDAF using adaptive Gaussian mechanism based on negotiating privacy budget with heuristic obfuscation. Finally, our theoretical analysis and experimental evaluation showed that the PPDAF could achieve Nash equilibrium between privacy and utility. Furthermore, this framework can be extended to engineering instances in a data aggregation setting

Download Full-text

Budget sharing for multi-analyst differential privacy

Proceedings of the VLDB Endowment ◽

10.14778/3467861.3467870 ◽

2021 ◽

Vol 14 (10) ◽

pp. 1805-1817

Author(s):

David Pujol ◽

Yikai Wu ◽

Brandon Fain ◽

Ashwin Machanavajjhala

Keyword(s):

Optimization Problem ◽

Differential Privacy ◽

Census Bureau ◽

Query Answering ◽

Multiple Stakeholders ◽

The Us ◽

Us Census ◽

Privacy Budget ◽

Summary Data ◽

Single Set

Large organizations that collect data about populations (like the US Census Bureau) release summary statistics that are used by multiple stakeholders for resource allocation and policy making problems. These organizations are also legally required to protect the privacy of individuals from whom they collect data. Differential Privacy (DP) provides a solution to release useful summary data while preserving privacy. Most DP mechanisms are designed to answer a single set of queries. In reality, there are often multiple stakeholders that use a given data release and have overlapping but not-identical queries. This introduces a novel joint optimization problem in DP where the privacy budget must be shared among different analysts. We initiate study into the problem of DP query answering across multiple analysts. To capture the competing goals and priorities of multiple analysts, we formulate three desiderata that any mechanism should satisfy in this setting - The Sharing Incentive, Non-interference, and Adaptivity - while still optimizing for overall error. We demonstrate how existing DP query answering mechanisms in the multi-analyst settings fail to satisfy at least one of the desiderata. We present novel DP algorithms that provably satisfy all our desiderata and empirically show that they incur low error on realistic tasks.

Download Full-text

Not All Attributes are Created Equal: dX -Private Mechanisms for Linear Queries

Proceedings on Privacy Enhancing Technologies ◽

10.2478/popets-2020-0007 ◽

2020 ◽

Vol 2020 (1) ◽

pp. 103-125

Author(s):

Parameswaran Kamalaruban ◽

Victor Perrier ◽

Hassan Jameel Asghar ◽

Mohamed Ali Kaafar

Keyword(s):

Machine Learning ◽

Data Analysis ◽

Differential Privacy ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Trade Off ◽

Systematic Procedure ◽

Privacy Budget ◽

Sensitivity Vector

AbstractDifferential privacy provides strong privacy guarantees simultaneously enabling useful insights from sensitive datasets. However, it provides the same level of protection for all elements (individuals and attributes) in the data. There are practical scenarios where some data attributes need more/less protection than others. In this paper, we consider dX -privacy, an instantiation of the privacy notion introduced in [6], which allows this flexibility by specifying a separate privacy budget for each pair of elements in the data domain. We describe a systematic procedure to tailor any existing differentially private mechanism that assumes a query set and a sensitivity vector as input into its dX -private variant, specifically focusing on linear queries. Our proposed meta procedure has broad applications as linear queries form the basis of a range of data analysis and machine learning algorithms, and the ability to define a more flexible privacy budget across the data domain results in improved privacy/utility tradeoff in these applications. We propose several dX -private mechanisms, and provide theoretical guarantees on the trade-off between utility and privacy. We also experimentally demonstrate the effectiveness of our procedure, by evaluating our proposed dX -private Laplace mechanism on both synthetic and real datasets using a set of randomly generated linear queries.

Download Full-text

BLENDER: Enabling Local Search with a Hybrid Differential Privacy Model

Journal of Privacy and Confidentiality ◽

10.29012/jpc.680 ◽

2019 ◽

Vol 9 (2) ◽

Author(s):

Brendan Avent ◽

Aleksandra Korolova ◽

David Zeber ◽

Torgeir Hovden ◽

Benjamin Livshits

Keyword(s):

Local Search ◽

Hybrid Model ◽

Differential Privacy ◽

Data Sets ◽

Privacy Model ◽

New Type ◽

Privacy Budget

We propose a hybrid model of differential privacy that considers a combination of regular and opt-in users who desire the differential privacy guarantees of the local privacy model and the trusted curator model, respectively. We demonstrate that within this model, it is possible to design a new type of blended algorithm that improves the utility of obtained data, while providing users with their desired privacy guarantees. We apply this algorithm to the task of privately computing the head of the search log and show that the blended approach provides significant improvements in the utility of the data compared to related work. Specifically, on two large search click data sets, comprising 1.75 and 16 GB, respectively, our approach attains NDCG values exceeding 95% across a range of privacy budget values.

Download Full-text

LinkedIn's Audience Engagements API

Journal of Privacy and Confidentiality ◽

10.29012/jpc.782 ◽

2021 ◽

Vol 11 (3) ◽

Author(s):

Ryan Rogers ◽

Subbu Subramaniam ◽

Sean Peng ◽

David Durfee ◽

Seunghyun Lee ◽

...

Keyword(s):

Real Time ◽

Data Analytics ◽

Differential Privacy ◽

Management Service ◽

Budget Management ◽

Time Data ◽

Audience Engagement ◽

Marketing Analytics ◽

Real Time Data ◽

Privacy Budget

We present a privacy system that leverages differential privacy to protect LinkedIn members' data while also providing audience engagement insights to enable marketing analytics related applications. We detail the differentially private algorithms and other privacy safeguards used to provide results that can be used with existing real-time data analytics platforms, specifically with the open sourced Pinot system. Our privacy system provides user-level privacy guarantees. As part of our privacy system, we include a budget management service that enforces a strict differential privacy budget on the returned results to the analyst. This budget management service brings together the latest research in differential privacy into a product to maintain utility given a fixed differential privacy budget.

Download Full-text

Generalized and Multiple-Queries-Oriented Privacy Budget Strategies in Differential Privacy via Convergent Series

Security and Communication Networks ◽

10.1155/2021/5564176 ◽

2021 ◽

Vol 2021 ◽

pp. 1-17

Author(s):

Yunlu Bai ◽

Geng Yang ◽

Yang Xiang ◽

Xuan Wang

Keyword(s):

High Performance ◽

Differential Privacy ◽

Budget Allocation ◽

Convergent Series ◽

Multiple Queries ◽

Analysis Task ◽

Different Characteristics ◽

High Degree ◽

Privacy Budget ◽

Allocation Strategies

For data analysis with differential privacy, an analysis task usually requires multiple queries to complete, and the total budget needs to be divided into different parts and allocated to each query. However, at present, the budget allocation in differential privacy lacks efficient and general allocation strategies, and most of the research tends to adopt an average or exclusive allocation method. In this paper, we propose two series strategies for budget allocation: the geometric series and the Taylor series. We show the different characteristics of the two series and provide a calculation method for selecting the key parameters. To better reflect a user’s preference of noise during the allocation, we explored the relationship between sensitivity and noise in detail, and, based on this, we propose an optimization for the series strategies. Finally, to prevent collusion attacks and improve security, we provide three ideas for protecting the budget sequence. Both the theoretical analysis and experimental results show that our methods can support more queries and achieve higher utility. This shows that our series allocation strategies have a high degree of flexibility which can meet the user’s need and allow them to be better applied to differentially private algorithms to achieve high performance while maintaining the security.

Download Full-text

Differential privacy in the 2020 US census: what will it do? Quantifying the accuracy/privacy tradeoff

Gates Open Research ◽

10.12688/gatesopenres.13089.1 ◽

2019 ◽

Vol 3 ◽

pp. 1722 ◽

Cited By ~ 1

Author(s):

Samantha Petti ◽

Abraham Flaxman

Keyword(s):

Census Data ◽

Differential Privacy ◽

Simple Random Sampling ◽

Computer Code ◽

Census Bureau ◽

Decennial Census ◽

Empirical Measure ◽

Simple Random Sample ◽

Us Census ◽

Privacy Budget

Background: The 2020 US Census will use a novel approach to disclosure avoidance to protect respondents’ data, called TopDown. This TopDown algorithm was applied to the 2018 end-to-end (E2E) test of the decennial census. The computer code used for this test as well as accompanying exposition has recently been released publicly by the Census Bureau. Methods: We used the available code and data to better understand the error introduced by the E2E disclosure avoidance system when Census Bureau applied it to 1940 census data and we developed an empirical measure of privacy loss to compare the error and privacy of the new approach to that of a simple-random-sampling approach to protecting privacy. Results: We found that the empirical privacy loss of TopDown is substantially smaller than the theoretical guarantee for all privacy loss budgets we examined. When run on the 1940 census data, TopDown with a privacy budget of 1.0 was similar in error and privacy loss to that of a simple random sample of 50% of the US population. When run with a privacy budget of 4.0, it was similar in error and privacy loss of a 90% sample. Conclusions: This work fits into the beginning of a discussion on how to best balance privacy and accuracy in decennial census data collection, and there is a need for continued discussion.

Download Full-text

Heterogeneous Gaussian Mechanism: Preserving Differential Privacy in Deep Learning with Provable Robustness

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/660 ◽

2019 ◽

Cited By ~ 1

Author(s):

NhatHai Phan ◽

Minh N. Vu ◽

Yang Liu ◽

Ruoming Jin ◽

Dejing Dou ◽

...

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Theoretical Analysis ◽

Gaussian Noise ◽

Deep Neural Networks ◽

Differential Privacy ◽

Trade Off ◽

Adversarial Examples ◽

Hidden Layer ◽

Privacy Budget

In this paper, we propose a novel Heterogeneous Gaussian Mechanism (HGM) to preserve differential privacy in deep neural networks, with provable robustness against adversarial examples. We first relax the constraint of the privacy budget in the traditional Gaussian Mechanism from (0, 1] to (0, infty), with a new bound of the noise scale to preserve differential privacy. The noise in our mechanism can be arbitrarily redistributed, offering a distinctive ability to address the trade-off between model utility and privacy loss. To derive provable robustness, our HGM is applied to inject Gaussian noise into the first hidden layer. Then, a tighter robustness bound is proposed. Theoretical analysis and thorough evaluations show that our mechanism notably improves the robustness of differentially private deep neural networks, compared with baseline approaches, under a variety of model attacks.

Download Full-text

Perosonalized Differentially Private Location Collection Method with Adaptive GPS Discretization

Communications in Computer and Information Science - Cyber Security ◽

10.1007/978-981-33-4922-3_13 ◽

2020 ◽

pp. 175-190

Author(s):

Huichuan Liu ◽

Yong Zeng ◽

Jiale Liu ◽

Zhihong Liu ◽

Jianfeng Ma ◽

...

Keyword(s):

Differential Privacy ◽

Geographic Location ◽

Real Data ◽

User Profile ◽

Data Sets ◽

User Privacy ◽

Mobile Terminals ◽

Data Collection Process ◽

Private Location ◽

Privacy Budget

AbstractIn recent years, with the development of mobile terminals, geographic location has attracted the attention of many researchers because of its convenience in collection and its ability to reflect user profile. To protect user privacy, researchers have adopted local differential privacy in data collection process. However, most existing methods assume that location has already been discretized, which we found, if not done carefully, may introduces huge noise, lowering collected result utility. Thus in this paper, we design a differentially private location division module that could automatically discretize locations according to access density of each region. However, as the size of discretized regions may be large, if directly applying existing local differential privacy based attribute method, the overall utility of collected results may be completely destroyed. Thus, we further improve the optimized binary local hash method, based on personalized differential privacy, to collect user visit frequency of each discretized region. This solution improve the accuracy of the collected results while satisfying the privacy of the user’s geographic location. Through experiments on synthetic and real data sets, this paper proves that the proposed method achieves higher accuracy than the best known method under the same privacy budget.

Download Full-text