On Sparse Linear Regression in the Local Differential Privacy Model

Author(s):  
Di Wang ◽  
Jinhui Xu
2019 ◽  
Vol 16 (3) ◽  
pp. 705-731
Author(s):  
Haoze Lv ◽  
Zhaobin Liu ◽  
Zhonglian Hu ◽  
Lihai Nie ◽  
Weijiang Liu ◽  
...  

With the advent of the big data era, data releasing has become a hot topic in the database community, and data privacy is drawing growing attention from users. Among the privacy protection models that have been proposed, the differential privacy model is widely used because of its many advantages over other models. However, for the private release of multi-dimensional data sets, existing algorithms usually publish data with low availability, because the noise in the released data grows rapidly as the number of dimensions increases. In view of this issue, we propose algorithms based on regular and irregular marginal tables of frequent item sets to protect privacy and improve availability. The main idea is to reduce the dimension of the data set and to achieve differential privacy protection with Laplace noise. First, we propose a marginal table cover algorithm based on frequent items that considers the effectiveness of query cover combinations, and thereby obtain a regular marginal table cover set of smaller size but higher data availability. Then, a differential privacy model with irregular marginal tables is proposed for application scenarios with low data availability and a high cover rate. Next, we derive an approximately optimal marginal table cover algorithm that yields a query cover set satisfying the multi-level query policy constraint. Thus, a balance between privacy protection and data availability is achieved. Finally, extensive experiments on synthetic and real databases demonstrate that the proposed method performs better than state-of-the-art methods in most cases.
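The core privacy step described above, perturbing the counts of a marginal table with Laplace noise, can be illustrated with a short sketch. The helper below is a hypothetical, simplified illustration of the Laplace mechanism on a single marginal; the paper's cover-set selection and multi-level query policies are not reproduced here, and all names are assumptions.

```python
import numpy as np

def laplace_noisy_marginal(marginal_counts, epsilon, num_marginals=1):
    """Release one marginal table under epsilon-differential privacy.

    A single record contributes a count of 1 to each released marginal,
    so the L1 sensitivity of the whole release is num_marginals; each
    table therefore receives Laplace noise of scale num_marginals / epsilon.
    """
    scale = num_marginals / epsilon
    noisy = marginal_counts + np.random.laplace(0.0, scale, size=marginal_counts.shape)
    # Counts cannot be negative; clamping is post-processing and does not
    # weaken the differential privacy guarantee.
    return np.clip(noisy, 0, None)

# Example: a 2-way marginal over attributes A (2 values) and B (3 values),
# assuming 4 marginal tables are released in total.
marginal = np.array([[120, 45, 30],
                     [ 60, 80, 15]], dtype=float)
print(laplace_noisy_marginal(marginal, epsilon=1.0, num_marginals=4))
```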


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Xiang Liu ◽  
Yuchun Guo ◽  
Xiaoying Tan ◽  
Yishuai Chen

Nowadays, many data mining applications, such as web traffic analysis and content popularity prediction, leverage users' web browsing trajectories to improve their performance. However, the disclosure of web browsing trajectories is a prominent privacy issue. Differential privacy is a rigorous model for protecting users' privacy, and some works have applied it to spatial-temporal streams. However, these works either protect users' activities in different places separately or protect their activities in all places jointly. The former cannot protect trajectories that traverse multiple places, while the latter ignores the differences among places and suffers from degraded data utility (i.e., data accuracy). In this paper, we propose (w, n)-differential privacy to protect any spatial-temporal sequence occurring in w successive timestamps and n-range places. To achieve better data utility, we propose two implementation algorithms, named Spatial-Temporal Budget Distribution (STBD) and Spatial-Temporal RescueDP (STR). Theoretical analysis and experimental results show that these two algorithms can achieve a balance between data utility and trajectory privacy guarantees.
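A minimal sketch of the budget-distribution idea: if a total budget epsilon must cover any sequence spanning w successive timestamps, one simple non-adaptive allocation gives each timestamp epsilon / w and perturbs the per-place counts with Laplace noise. The actual STBD and STR algorithms allocate the budget adaptively; the uniform split below is only an illustrative baseline, and all names are hypothetical.

```python
import numpy as np

def uniform_window_release(counts_per_place, epsilon_total, w):
    """Release one timestamp's per-place counts under a (w, n)-style budget.

    Splitting the total budget uniformly over the w timestamps of the
    sliding window gives each release epsilon_total / w.  A single user at
    one timestamp affects one place count by at most 1, so Laplace noise
    of scale w / epsilon_total suffices in this simplified setting.
    """
    eps_t = epsilon_total / w
    noise = np.random.laplace(0.0, 1.0 / eps_t, size=len(counts_per_place))
    return np.maximum(counts_per_place + noise, 0.0)

# Example: counts of users at 5 places at the current timestamp,
# protecting any trajectory within a window of w = 10 timestamps.
counts = np.array([230.0, 15.0, 87.0, 4.0, 51.0])
print(uniform_window_release(counts, epsilon_total=1.0, w=10))
```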


2019 ◽  
Vol 9 (2) ◽  
Author(s):  
Brendan Avent ◽  
Aleksandra Korolova ◽  
David Zeber ◽  
Torgeir Hovden ◽  
Benjamin Livshits

We propose a hybrid model of differential privacy that considers a combination of regular and opt-in users who desire the differential privacy guarantees of the local privacy model and the trusted curator model, respectively. We demonstrate that within this model, it is possible to design a new type of blended algorithm that improves the utility of obtained data, while providing users with their desired privacy guarantees. We apply this algorithm to the task of privately computing the head of the search log and show that the blended approach provides significant improvements in the utility of the data compared to related work. Specifically, on two large search click data sets, comprising 1.75 and 16 GB, respectively, our approach attains NDCG values exceeding 95% across a range of privacy budget values.
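The paper's blended algorithm for estimating the head of a search log is considerably more involved than can be shown here; the sketch below only illustrates, under assumed names, how a trusted-curator (Laplace) estimate from opt-in users and a randomized-response estimate from regular users might be combined for a single binary item, using a simple size-weighted average rather than the paper's blending rule.

```python
import numpy as np

def central_estimate(optin_clicks, epsilon):
    """Trusted-curator estimate of the click frequency for one query."""
    n = len(optin_clicks)
    noisy_count = optin_clicks.sum() + np.random.laplace(0.0, 1.0 / epsilon)
    return noisy_count / n

def local_estimate(regular_clicks, epsilon):
    """Local estimate via binary randomized response."""
    p = np.exp(epsilon) / (np.exp(epsilon) + 1.0)        # keep the true bit
    keep = np.random.rand(len(regular_clicks)) < p
    reports = np.where(keep, regular_clicks, 1 - regular_clicks)
    # Debias: E[report] = f*(2p - 1) + (1 - p)  =>  f = (mean - (1-p)) / (2p - 1)
    return (reports.mean() - (1.0 - p)) / (2.0 * p - 1.0)

def blended_estimate(optin_clicks, regular_clicks, epsilon):
    """Combine the two estimates, here simply weighted by group size."""
    n_o, n_r = len(optin_clicks), len(regular_clicks)
    f_o = central_estimate(optin_clicks, epsilon)
    f_r = local_estimate(regular_clicks, epsilon)
    return (n_o * f_o + n_r * f_r) / (n_o + n_r)

rng = np.random.default_rng(0)
optin = (rng.random(1_000) < 0.3).astype(int)       # opt-in users
regular = (rng.random(100_000) < 0.3).astype(int)   # regular (local DP) users
print(blended_estimate(optin, regular, epsilon=1.0))
```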


2018 ◽  
Vol 27 (03) ◽  
pp. 1850005 ◽  
Author(s):  
Hafiz Asif ◽  
Tanay Talukdar ◽  
Jaideep Vaidya ◽  
Basit Shafiq ◽  
Nabil Adam

Outlier detection is one of the most important data analytics tasks and is used in numerous applications and domains. The goal of outlier detection is to find abnormal entities that are significantly different from the remaining data. Often, the underlying data is distributed across different organizations. If outlier detection is done locally, the results obtained are not as accurate as when outlier detection is done collaboratively over the combined data. However, the data cannot be easily integrated into a single database due to privacy and legal concerns. In this paper, we address precisely this problem. We first define privacy in the context of collaborative outlier detection. We then develop a novel method to find outliers from both horizontally partitioned and vertically partitioned categorical data in a privacy-preserving manner. Our method is based on a scalable outlier detection technique that uses attribute value frequencies. We provide an end-to-end privacy guarantee by using the differential privacy model and secure multiparty computation techniques. Experiments on real data show that our proposed technique is both effective and efficient.
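The underlying non-private technique appears to be attribute-value-frequency (AVF) style scoring: a record whose attribute values are rare across the data set receives a low score and is flagged as an outlier. Below is a minimal single-site sketch of that scoring step, without the secure multiparty computation and differential privacy layers described above; all names are illustrative.

```python
from collections import Counter

def avf_scores(records):
    """Attribute-value-frequency outlier scores for categorical records.

    records: list of tuples, one tuple of categorical values per record.
    A record's score is the average frequency of its attribute values;
    lower scores indicate likelier outliers.
    """
    num_attrs = len(records[0])
    freq = [Counter(rec[j] for rec in records) for j in range(num_attrs)]
    return [sum(freq[j][rec[j]] for j in range(num_attrs)) / num_attrs
            for rec in records]

data = [("red", "small", "cat"),
        ("red", "small", "cat"),
        ("red", "large", "dog"),
        ("blue", "tiny", "ferret")]   # rare values -> low AVF score
scores = avf_scores(data)
print(sorted(zip(scores, data)))      # likeliest outliers sort first
```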


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Kok-Seng Wong ◽  
Myung Ho Kim

Advances in both sensor technologies and network infrastructures have encouraged the development of smart environments to enhance people's lives and lifestyles. However, collecting and storing users' data in smart environments poses severe privacy concerns because these data may contain sensitive information about the subject. Hence, privacy protection is an emerging issue that we need to consider, especially when data sharing is essential for analysis purposes. In this paper, we consider the case where two agents in a smart environment want to measure the similarity of their collected or stored data. We use the similarity coefficient function FSC as the measurement metric for the comparison under the differential privacy model. Unlike existing solutions, our protocol can accommodate more than one request to compute FSC without modifying the protocol. Our solution ensures privacy protection for both the inputs and the computed FSC results.
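The abstract does not spell out the exact form of FSC, so the sketch below uses a simple matching coefficient over two binary vectors as a stand-in and perturbs only the released value with Laplace noise. The paper's protocol additionally protects the inputs themselves and supports repeated requests, which this illustration does not; the function and variable names are assumptions.

```python
import numpy as np

def private_similarity(x, y, epsilon):
    """Release a simple matching similarity between two binary vectors
    with epsilon-differential privacy on the released value.

    Similarity = fraction of positions where x and y agree.  Changing a
    single position alters the value by at most 1/d, so Laplace noise of
    scale 1 / (d * epsilon) suffices for this simplified release.
    """
    x, y = np.asarray(x), np.asarray(y)
    d = len(x)
    sim = np.mean(x == y)
    return sim + np.random.laplace(0.0, 1.0 / (d * epsilon))

a = np.array([1, 0, 1, 1, 0, 1, 0, 0])
b = np.array([1, 0, 0, 1, 0, 1, 1, 0])
print(private_similarity(a, b, epsilon=0.5))
```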


2021 ◽  
Vol 14 (11) ◽  
pp. 2046-2058
Author(s):  
Graham Cormode ◽  
Samuel Maddock ◽  
Carsten Maple

Private collection of statistics from a large distributed population is an important problem, and has led to large scale deployments from several leading technology companies. The dominant approach requires each user to randomly perturb their input, leading to guarantees in the local differential privacy model. In this paper, we place the various approaches that have been suggested into a common framework, and perform an extensive series of experiments to understand the tradeoffs between different implementation choices. Our conclusion is that for the core problems of frequency estimation and heavy hitter identification, careful choice of algorithms can lead to very effective solutions that scale to millions of users.
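A representative building block behind these deployments is a local frequency oracle. The sketch below implements generalized (k-ary) randomized response with the standard debiasing step; it is one of the simplest mechanisms in the family such surveys compare, not a reproduction of any specific algorithm from the paper, and the names are illustrative.

```python
import numpy as np

def krr_perturb(values, k, epsilon, rng):
    """k-ary randomized response: keep the true value with probability
    e^eps / (e^eps + k - 1), otherwise report one of the other k - 1
    values uniformly at random."""
    n = len(values)
    p = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    keep = rng.random(n) < p
    # A random non-zero offset modulo k always yields a *different* value.
    others = (values + rng.integers(1, k, size=n)) % k
    return np.where(keep, values, others)

def krr_estimate(reports, k, epsilon):
    """Unbiased frequency estimates from the perturbed reports."""
    n = len(reports)
    p = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    q = 1.0 / (np.exp(epsilon) + k - 1)
    counts = np.bincount(reports, minlength=k)
    return (counts - n * q) / (p - q)

rng = np.random.default_rng(1)
k, eps = 8, 2.0
true_values = rng.integers(0, k, size=50_000)      # one item per user
reports = krr_perturb(true_values, k, eps, rng)
print(np.round(krr_estimate(reports, k, eps)))     # approximates the true histogram
```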


2018 ◽  
Vol 8 (1) ◽  
Author(s):  
Gilad Amitai ◽  
Jerome Reiter

In Bayesian regression modeling, analysts often summarize inferences using posterior probabilities and quantiles, such as the posterior probability that a coefficient exceeds zero or the posterior median of that coefficient. However, with potentially unbounded outcomes and explanatory variables, regression inferences based on typical prior distributions can be sensitive to values of individual data points. Thus, releasing posterior summaries of regression coefficients can result in disclosure risks. In this article, we propose some differentially private algorithms for reporting posterior probabilities and posterior quantiles of linear regression coefficients. The algorithms use the general strategy of subsample and aggregate, a technique that requires randomly partitioning the data into disjoint subsets, estimating the regression within each subset, and combining results in ways that satisfy differential privacy. We illustrate the performance of some of the algorithms using repeated sampling studies. The non-private versions can also be used for Bayesian inference with big data in non-private settings.
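The subsample-and-aggregate strategy mentioned above is easy to sketch for a bounded summary such as the posterior probability that a coefficient exceeds zero: each disjoint subset yields a value in [0, 1], so the mean over M subsets has sensitivity 1/M and can be released with Laplace noise. The sketch below uses an ordinary least-squares normal approximation of that probability rather than a full Bayesian fit, purely for brevity; all function names are illustrative.

```python
import numpy as np
from math import erf, sqrt

def approx_prob_positive(X, y, coef_index):
    """Normal approximation to Pr(beta_j > 0) from an OLS fit on one subset."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    dof = max(len(y) - X.shape[1], 1)
    sigma2 = resid @ resid / dof
    se = sqrt(sigma2 * XtX_inv[coef_index, coef_index])
    return 0.5 * (1.0 + erf((beta[coef_index] / se) / sqrt(2.0)))

def private_prob_positive(X, y, coef_index, num_subsets, epsilon, rng):
    """Subsample-and-aggregate release of Pr(beta_j > 0).

    Each subset's value lies in [0, 1], so the mean over num_subsets
    disjoint subsets changes by at most 1/num_subsets when one record
    changes; Laplace noise of that scale divided by epsilon suffices.
    """
    idx = rng.permutation(len(y))
    parts = np.array_split(idx, num_subsets)
    vals = [approx_prob_positive(X[p], y[p], coef_index) for p in parts]
    noisy = np.mean(vals) + rng.laplace(0.0, 1.0 / (num_subsets * epsilon))
    return float(np.clip(noisy, 0.0, 1.0))

rng = np.random.default_rng(2)
n = 5_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 0.4 * X[:, 1] + rng.normal(size=n)
print(private_prob_positive(X, y, coef_index=1, num_subsets=25,
                            epsilon=1.0, rng=rng))
```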


2019 ◽  
Vol 19 (5) ◽  
pp. 537-545
Author(s):  
Vicenç Torra

Social choice provides methods for collective decisions, including methods for voting and for aggregating rankings. These methods are used in multiagent systems for similar purposes when decisions are to be made by agents. Votes and rankings are sensitive information, so privacy mechanisms are needed to avoid their disclosure. Cryptographic techniques can be applied in centralized environments to avoid disclosing sensitive information; a trusted third party can then compute the outcome. In distributed environments, we can use a secure multiparty computation approach to implement a collective decision method. Other privacy models exist; differential privacy and k-anonymity are two of them. They provide privacy guarantees that are complementary to multiparty computation approaches and can be combined with cryptographic solutions, thus providing additional guarantees, e.g., a differentially private multiparty computation model. In this paper, we propose the use of probabilistic social choice methods to achieve differential privacy. We use the method called random dictatorship, prove that under some circumstances it satisfies differential privacy, and propose a variation that is always compliant with this privacy model. Our approach can be implemented in both a centralized and a decentralized way; we briefly discuss these implementations.
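The random dictatorship mechanism itself is simple to state: select a voter uniformly at random and return that voter's top-ranked alternative. The sketch below shows only this basic mechanism; the conditions under which it is differentially private, and the always-compliant variation, are established in the paper and are not reproduced here.

```python
import random

def random_dictatorship(rankings, rng=random):
    """Probabilistic social choice: pick one voter uniformly at random
    and output that voter's most-preferred alternative.

    rankings: list of ballots, each a list of alternatives ordered from
    most to least preferred.
    """
    dictator = rng.choice(rankings)
    return dictator[0]

ballots = [["a", "b", "c"],
           ["b", "a", "c"],
           ["a", "c", "b"],
           ["c", "b", "a"]]
print(random_dictatorship(ballots))   # "a" with prob 1/2, "b" or "c" with prob 1/4
```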

