On Sparse Linear Regression in the Local Differential Privacy Model

Author(s):  
Di Wang ◽  
Jinhui Xu
2019 ◽  
Vol 16 (3) ◽  
pp. 705-731
Author(s):  
Haoze Lv ◽  
Zhaobin Liu ◽  
Zhonglian Hu ◽  
Lihai Nie ◽  
Weijiang Liu ◽  
...  

With the advent of the big data era, data releasing has become a hot topic in the database community, and data privacy is drawing growing attention from users. Among the privacy protection models that have been proposed, the differential privacy model is widely used because of its many advantages over other models. However, for the private release of multi-dimensional data sets, existing algorithms usually publish data with low availability, because the noise in the released data grows rapidly as the number of dimensions increases. In view of this issue, we propose algorithms based on regular and irregular marginal tables of frequent item sets to protect privacy and improve availability. The main idea is to reduce the dimension of the data set and to achieve differential privacy protection with Laplace noise. First, we propose a marginal table cover algorithm based on frequent items that considers the effectiveness of query cover combinations, and thereby obtain a regular marginal table cover set of smaller size but higher data availability. Then, a differential privacy model with irregular marginal tables is proposed for application scenarios with low data availability and a high cover rate. Next, we derive an approximately optimal marginal table cover algorithm that yields a query cover set satisfying the multi-level query policy constraint. Thus, a balance between privacy protection and data availability is achieved. Finally, extensive experiments on synthetic and real databases demonstrate that the proposed method performs better than state-of-the-art methods in most cases.
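The core privacy step described above, perturbing the counts of a marginal table with Laplace noise, can be illustrated with a short sketch. The helper below is a hypothetical, simplified illustration of the Laplace mechanism on a single marginal; the paper's cover-set selection and multi-level query policies are not reproduced here, and all names are assumptions.

```python
import numpy as np

def laplace_noisy_marginal(marginal_counts, epsilon, num_marginals=1):
    """Release one marginal table under epsilon-differential privacy.

    A single record contributes a count of 1 to each released marginal,
    so the L1 sensitivity of the whole release is num_marginals; each
    table therefore receives Laplace noise of scale num_marginals / epsilon.
    """
    scale = num_marginals / epsilon
    noisy = marginal_counts + np.random.laplace(0.0, scale, size=marginal_counts.shape)
    # Counts cannot be negative; clamping is post-processing and does not
    # weaken the differential privacy guarantee.
    return np.clip(noisy, 0, None)

# Example: a 2-way marginal over attributes A (2 values) and B (3 values),
# assuming 4 marginal tables are released in total.
marginal = np.array([[120, 45, 30],
                     [ 60, 80, 15]], dtype=float)
print(laplace_noisy_marginal(marginal, epsilon=1.0, num_marginals=4))
```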


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Xiang Liu ◽  
Yuchun Guo ◽  
Xiaoying Tan ◽  
Yishuai Chen

Nowadays, many data mining applications, such as web traffic analysis and content popularity prediction, leverage users' web browsing trajectories to improve their performance. However, the disclosure of web browsing trajectories is a prominent privacy issue. Differential privacy is a rigorous model for protecting users' privacy, and some works have applied it to spatial-temporal streams. However, these works either protect users' activities in different places separately or protect their activities in all places jointly. The former cannot protect trajectories that traverse multiple places, while the latter ignores the differences among places and suffers from degraded data utility (i.e., data accuracy). In this paper, we propose (w, n)-differential privacy to protect any spatial-temporal sequence occurring in w successive timestamps and n-range places. To achieve better data utility, we propose two implementation algorithms, named Spatial-Temporal Budget Distribution (STBD) and Spatial-Temporal RescueDP (STR). Theoretical analysis and experimental results show that these two algorithms can achieve a balance between data utility and trajectory privacy guarantees.
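A minimal sketch of the budget-distribution idea: if a total budget epsilon must cover any sequence spanning w successive timestamps, one simple non-adaptive allocation gives each timestamp epsilon / w and perturbs the per-place counts with Laplace noise. The actual STBD and STR algorithms allocate the budget adaptively; the uniform split below is only an illustrative baseline, and all names are hypothetical.

```python
import numpy as np

def uniform_window_release(counts_per_place, epsilon_total, w):
    """Release one timestamp's per-place counts under a (w, n)-style budget.

    Splitting the total budget uniformly over the w timestamps of the
    sliding window gives each release epsilon_total / w.  A single user at
    one timestamp affects one place count by at most 1, so Laplace noise
    of scale w / epsilon_total suffices in this simplified setting.
    """
    eps_t = epsilon_total / w
    noise = np.random.laplace(0.0, 1.0 / eps_t, size=len(counts_per_place))
    return np.maximum(counts_per_place + noise, 0.0)

# Example: counts of users at 5 places at the current timestamp,
# protecting any trajectory within a window of w = 10 timestamps.
counts = np.array([230.0, 15.0, 87.0, 4.0, 51.0])
print(uniform_window_release(counts, epsilon_total=1.0, w=10))
```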


2019 ◽  
Vol 9 (2) ◽  
Author(s):  
Brendan Avent ◽  
Aleksandra Korolova ◽  
David Zeber ◽  
Torgeir Hovden ◽  
Benjamin Livshits

We propose a hybrid model of differential privacy that considers a combination of regular and opt-in users who desire the differential privacy guarantees of the local privacy model and the trusted curator model, respectively. We demonstrate that within this model, it is possible to design a new type of blended algorithm that improves the utility of obtained data, while providing users with their desired privacy guarantees. We apply this algorithm to the task of privately computing the head of the search log and show that the blended approach provides significant improvements in the utility of the data compared to related work. Specifically, on two large search click data sets, comprising 1.75 and 16 GB, respectively, our approach attains NDCG values exceeding 95% across a range of privacy budget values.
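The paper's blended algorithm for estimating the head of a search log is considerably more involved than can be shown here; the sketch below only illustrates, under assumed names, how a trusted-curator (Laplace) estimate from opt-in users and a randomized-response estimate from regular users might be combined for a single binary item, using a simple size-weighted average rather than the paper's blending rule.

```python
import numpy as np

def central_estimate(optin_clicks, epsilon):
    """Trusted-curator estimate of the click frequency for one query."""
    n = len(optin_clicks)
    noisy_count = optin_clicks.sum() + np.random.laplace(0.0, 1.0 / epsilon)
    return noisy_count / n

def local_estimate(regular_clicks, epsilon):
    """Local estimate via binary randomized response."""
    p = np.exp(epsilon) / (np.exp(epsilon) + 1.0)        # keep the true bit
    keep = np.random.rand(len(regular_clicks)) < p
    reports = np.where(keep, regular_clicks, 1 - regular_clicks)
    # Debias: E[report] = f*(2p - 1) + (1 - p)  =>  f = (mean - (1-p)) / (2p - 1)
    return (reports.mean() - (1.0 - p)) / (2.0 * p - 1.0)

def blended_estimate(optin_clicks, regular_clicks, epsilon):
    """Combine the two estimates, here simply weighted by group size."""
    n_o, n_r = len(optin_clicks), len(regular_clicks)
    f_o = central_estimate(optin_clicks, epsilon)
    f_r = local_estimate(regular_clicks, epsilon)
    return (n_o * f_o + n_r * f_r) / (n_o + n_r)

rng = np.random.default_rng(0)
optin = (rng.random(1_000) < 0.3).astype(int)       # opt-in users
regular = (rng.random(100_000) < 0.3).astype(int)   # regular (local DP) users
print(blended_estimate(optin, regular, epsilon=1.0))
```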


2018 ◽  
Vol 27 (03) ◽  
pp. 1850005 ◽  
Author(s):  
Hafiz Asif ◽  
Tanay Talukdar ◽  
Jaideep Vaidya ◽  
Basit Shafiq ◽  
Nabil Adam

Outlier detection is one of the most important data analytics tasks and is used in numerous applications and domains. The goal of outlier detection is to find abnormal entities that are significantly different from the remaining data. Often, the underlying data is distributed across different organizations. If outlier detection is done locally, the results obtained are not as accurate as when outlier detection is done collaboratively over the combined data. However, the data cannot be easily integrated into a single database due to privacy and legal concerns. In this paper, we address precisely this problem. We first define privacy in the context of collaborative outlier detection. We then develop a novel method to find outliers from both horizontally partitioned and vertically partitioned categorical data in a privacy-preserving manner. Our method is based on a scalable outlier detection technique that uses attribute value frequencies. We provide an end-to-end privacy guarantee by using the differential privacy model and secure multiparty computation techniques. Experiments on real data show that our proposed technique is both effective and efficient.
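The underlying non-private technique appears to be attribute-value-frequency (AVF) style scoring: a record whose attribute values are rare across the data set receives a low score and is flagged as an outlier. Below is a minimal single-site sketch of that scoring step, without the secure multiparty computation and differential privacy layers described above; all names are illustrative.

```python
from collections import Counter

def avf_scores(records):
    """Attribute-value-frequency outlier scores for categorical records.

    records: list of tuples, one tuple of categorical values per record.
    A record's score is the average frequency of its attribute values;
    lower scores indicate likelier outliers.
    """
    num_attrs = len(records[0])
    freq = [Counter(rec[j] for rec in records) for j in range(num_attrs)]
    return [sum(freq[j][rec[j]] for j in range(num_attrs)) / num_attrs
            for rec in records]

data = [("red", "small", "cat"),
        ("red", "small", "cat"),
        ("red", "large", "dog"),
        ("blue", "tiny", "ferret")]   # rare values -> low AVF score
scores = avf_scores(data)
print(sorted(zip(scores, data)))      # likeliest outliers sort first
```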


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Kok-Seng Wong ◽  
Myung Ho Kim

Advances in both sensor technologies and network infrastructures have encouraged the development of smart environments to enhance people's lives and lifestyles. However, collecting and storing users' data in smart environments poses severe privacy concerns because these data may contain sensitive information about the subject. Hence, privacy protection is an emerging issue that we need to consider, especially when data sharing is essential for analysis purposes. In this paper, we consider the case where two agents in a smart environment want to measure the similarity of their collected or stored data. We use the similarity coefficient function FSC as the measurement metric for the comparison under the differential privacy model. Unlike existing solutions, our protocol can accommodate more than one request to compute FSC without modifying the protocol. Our solution ensures privacy protection for both the inputs and the computed FSC results.
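The abstract does not spell out the exact form of FSC, so the sketch below uses a simple matching coefficient over two binary vectors as a stand-in and perturbs only the released value with Laplace noise. The paper's protocol additionally protects the inputs themselves and supports repeated requests, which this illustration does not; the function and variable names are assumptions.

```python
import numpy as np

def private_similarity(x, y, epsilon):
    """Release a simple matching similarity between two binary vectors
    with epsilon-differential privacy on the released value.

    Similarity = fraction of positions where x and y agree.  Changing a
    single position alters the value by at most 1/d, so Laplace noise of
    scale 1 / (d * epsilon) suffices for this simplified release.
    """
    x, y = np.asarray(x), np.asarray(y)
    d = len(x)
    sim = np.mean(x == y)
    return sim + np.random.laplace(0.0, 1.0 / (d * epsilon))

a = np.array([1, 0, 1, 1, 0, 1, 0, 0])
b = np.array([1, 0, 0, 1, 0, 1, 1, 0])
print(private_similarity(a, b, epsilon=0.5))
```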


2021 ◽  
Vol 14 (11) ◽  
pp. 2046-2058
Author(s):  
Graham Cormode ◽  
Samuel Maddock ◽  
Carsten Maple

Private collection of statistics from a large distributed population is an important problem, and has led to large scale deployments from several leading technology companies. The dominant approach requires each user to randomly perturb their input, leading to guarantees in the local differential privacy model. In this paper, we place the various approaches that have been suggested into a common framework, and perform an extensive series of experiments to understand the tradeoffs between different implementation choices. Our conclusion is that for the core problems of frequency estimation and heavy hitter identification, careful choice of algorithms can lead to very effective solutions that scale to millions of users.
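A representative building block behind these deployments is a local frequency oracle. The sketch below implements generalized (k-ary) randomized response with the standard debiasing step; it is one of the simplest mechanisms in the family such surveys compare, not a reproduction of any specific algorithm from the paper, and the names are illustrative.

```python
import numpy as np

def krr_perturb(values, k, epsilon, rng):
    """k-ary randomized response: keep the true value with probability
    e^eps / (e^eps + k - 1), otherwise report one of the other k - 1
    values uniformly at random."""
    n = len(values)
    p = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    keep = rng.random(n) < p
    # A random non-zero offset modulo k always yields a *different* value.
    others = (values + rng.integers(1, k, size=n)) % k
    return np.where(keep, values, others)

def krr_estimate(reports, k, epsilon):
    """Unbiased frequency estimates from the perturbed reports."""
    n = len(reports)
    p = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    q = 1.0 / (np.exp(epsilon) + k - 1)
    counts = np.bincount(reports, minlength=k)
    return (counts - n * q) / (p - q)

rng = np.random.default_rng(1)
k, eps = 8, 2.0
true_values = rng.integers(0, k, size=50_000)      # one item per user
reports = krr_perturb(true_values, k, eps, rng)
print(np.round(krr_estimate(reports, k, eps)))     # approximates the true histogram
```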


2018 ◽  
Vol 8 (1) ◽  
Author(s):  
Gilad Amitai ◽  
Jerome Reiter

In Bayesian regression modeling, analysts often summarize inferences using posterior probabilities and quantiles, such as the posterior probability that a coefficient exceeds zero or the posterior median of that coefficient. However, with potentially unbounded outcomes and explanatory variables, regression inferences based on typical prior distributions can be sensitive to values of individual data points. Thus, releasing posterior summaries of regression coefficients can result in disclosure risks. In this article, we propose some differentially private algorithms for reporting posterior probabilities and posterior quantiles of linear regression coefficients. The algorithms use the general strategy of subsample and aggregate, a technique that requires randomly partitioning the data into disjoint subsets, estimating the regression within each subset, and combining results in ways that satisfy differential privacy. We illustrate the performance of some of the algorithms using repeated sampling studies. The non-private versions can also be used for Bayesian inference with big data in non-private settings.
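The subsample-and-aggregate strategy mentioned above is easy to sketch for a bounded summary such as the posterior probability that a coefficient exceeds zero: each disjoint subset yields a value in [0, 1], so the mean over M subsets has sensitivity 1/M and can be released with Laplace noise. The sketch below uses an ordinary least-squares normal approximation of that probability rather than a full Bayesian fit, purely for brevity; all function names are illustrative.

```python
import numpy as np
from math import erf, sqrt

def approx_prob_positive(X, y, coef_index):
    """Normal approximation to Pr(beta_j > 0) from an OLS fit on one subset."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    dof = max(len(y) - X.shape[1], 1)
    sigma2 = resid @ resid / dof
    se = sqrt(sigma2 * XtX_inv[coef_index, coef_index])
    return 0.5 * (1.0 + erf((beta[coef_index] / se) / sqrt(2.0)))

def private_prob_positive(X, y, coef_index, num_subsets, epsilon, rng):
    """Subsample-and-aggregate release of Pr(beta_j > 0).

    Each subset's value lies in [0, 1], so the mean over num_subsets
    disjoint subsets changes by at most 1/num_subsets when one record
    changes; Laplace noise of that scale divided by epsilon suffices.
    """
    idx = rng.permutation(len(y))
    parts = np.array_split(idx, num_subsets)
    vals = [approx_prob_positive(X[p], y[p], coef_index) for p in parts]
    noisy = np.mean(vals) + rng.laplace(0.0, 1.0 / (num_subsets * epsilon))
    return float(np.clip(noisy, 0.0, 1.0))

rng = np.random.default_rng(2)
n = 5_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 0.4 * X[:, 1] + rng.normal(size=n)
print(private_prob_positive(X, y, coef_index=1, num_subsets=25,
                            epsilon=1.0, rng=rng))
```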


2019 ◽  
Vol 19 (5) ◽  
pp. 537-545
Author(s):  
Vicenç Torra

Social choice provides methods for collective decisions, including methods for voting and for aggregating rankings. These methods are used in multiagent systems for similar purposes when decisions are to be made by agents. Votes and rankings are sensitive information, so privacy mechanisms are needed to avoid their disclosure. Cryptographic techniques can be applied in centralized environments to avoid disclosing sensitive information; a trusted third party can then compute the outcome. In distributed environments, we can use a secure multiparty computation approach to implement a collective decision method. Other privacy models exist; differential privacy and k-anonymity are two of them. They provide privacy guarantees that are complementary to multiparty computation approaches and can be combined with cryptographic solutions, thus providing additional guarantees, e.g., a differentially private multiparty computation model. In this paper, we propose the use of probabilistic social choice methods to achieve differential privacy. We use the method called random dictatorship, prove that under some circumstances it satisfies differential privacy, and propose a variation that is always compliant with this privacy model. Our approach can be implemented in both a centralized and a decentralized way; we briefly discuss these implementations.
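The random dictatorship mechanism itself is simple to state: select a voter uniformly at random and return that voter's top-ranked alternative. The sketch below shows only this basic mechanism; the conditions under which it is differentially private, and the always-compliant variation, are established in the paper and are not reproduced here.

```python
import random

def random_dictatorship(rankings, rng=random):
    """Probabilistic social choice: pick one voter uniformly at random
    and output that voter's most-preferred alternative.

    rankings: list of ballots, each a list of alternatives ordered from
    most to least preferred.
    """
    dictator = rng.choice(rankings)
    return dictator[0]

ballots = [["a", "b", "c"],
           ["b", "a", "c"],
           ["a", "c", "b"],
           ["c", "b", "a"]]
print(random_dictatorship(ballots))   # "a" with prob 1/2, "b" or "c" with prob 1/4
```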

