Differentially private data aggregating with relative error constraint

Author(s):  
Hao Wang ◽  
Xiao Peng ◽  
Yihang Xiao ◽  
Zhengquan Xu ◽  
Xian Chen

Abstract: Privacy-preserving methods that support data aggregation have attracted the attention of researchers in multidisciplinary fields. Among the advanced methods, differential privacy (DP) has become an influential privacy mechanism owing to its rigorous privacy guarantee and high data utility. However, DP places no bound on the magnitude of the added noise, which can lead to low utility. Recently, researchers have investigated how to preserve a rigorous privacy guarantee while limiting the relative error to a fixed bound. However, these schemes destroy statistical properties, including the mean, variance, and MSE, which are foundational for data aggregation and analysis. In this paper, we explore an optimal privacy-preserving solution, including novel definitions and implementing mechanisms, that maintains these statistical properties while satisfying DP under a fixed relative-error bound. Experimental evaluation demonstrates that our mechanism outperforms current schemes in terms of security and utility for large numbers of queries.
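As a point of reference for the relative-error problem the abstract raises, the following is a minimal sketch (not the paper's mechanism) of the standard Laplace mechanism for a count query; it shows empirically that the noise is unbounded, so the relative error can exceed any fixed threshold. The function name and parameters are illustrative.

```python
import numpy as np

def laplace_count(true_count, epsilon, rng):
    """Standard Laplace mechanism for a count query (sensitivity 1).

    This is the baseline behaviour the paper improves on: the noise has
    unbounded support, so |noisy - true| / true is not guaranteed to stay
    below any fixed relative-error bound.
    """
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

rng = np.random.default_rng(0)
true_count, epsilon = 40, 0.5
answers = np.array([laplace_count(true_count, epsilon, rng) for _ in range(10_000)])
rel_err = np.abs(answers - true_count) / true_count
print(f"mean relative error: {rel_err.mean():.3f}")
print(f"share of answers with relative error > 10%: {(rel_err > 0.10).mean():.3f}")
```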

2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Yuye Wang ◽  
Jing Yang ◽  
Jianpei Zhan

Vertex attributes exert a strong influence on the analysis of social networks. Since the attributes are often sensitive, effective ways are needed to protect the privacy of graphs with correlated attributes. Prior work has mainly treated the graph topology and the attributes separately, combining them by defining the relevancy between them. These methods must add noise to each component separately, which requires a large amount of noise and reduces data utility. In this paper, we introduce an approach for releasing graphs with correlated attributes under differential privacy based on early fusion. We combine the graph topology and the attributes in a single private probability model and generate a synthetic network that satisfies differential privacy. Extensive experiments demonstrate that our approach meets the requirements of attributed networks and achieves high data utility.
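To make the early-fusion idea concrete, here is a hedged toy sketch (not the authors' model) of a single noisy probability table that fuses topology and attributes: edges are counted by the attribute labels of their endpoints, perturbed with Laplace noise, and normalised so the table can drive synthetic-graph sampling. All names and the privacy calibration are illustrative.

```python
import numpy as np

def noisy_joint_edge_model(edges, attrs, n_labels, epsilon, rng):
    """Fuse topology and vertex attributes into one noisy probability model.

    Counts edges by the (label_u, label_v) pair of their endpoints, perturbs
    the counts with Laplace noise (sensitivity 1 under edge-level privacy),
    and normalises into a probability table for synthetic-graph sampling.
    """
    counts = np.zeros((n_labels, n_labels))
    for u, v in edges:
        a, b = sorted((attrs[u], attrs[v]))
        counts[a, b] += 1
    noisy = np.clip(counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape), 0.0, None)
    total = noisy.sum()
    return noisy / total if total > 0 else noisy

# toy graph: 6 vertices with binary attribute labels
attrs = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 0}
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]
model = noisy_joint_edge_model(edges, attrs, n_labels=2, epsilon=1.0,
                               rng=np.random.default_rng(1))
print(model)  # noisy probability that an edge joins each pair of labels
```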


2013 ◽  
Vol 7 (9) ◽  
pp. 1399-1411 ◽  
Author(s):  
Jun Yang ◽  
Bin Wang ◽  
Xiaochun Yang ◽  
Hongyi Zhang ◽  
Guang Xiang

2014 ◽  
Vol 2014 ◽  
pp. 1-10 ◽  
Author(s):  
Haoran Li ◽  
Li Xiong ◽  
Lucila Ohno-Machado ◽  
Xiaoqian Jiang

Data sharing is challenging but important for healthcare research. Methods for privacy-preserving data dissemination based on the rigorous differential privacy standard have been developed, but they neither consider the characteristics of biomedical data nor make full use of the available information, which often results in too much noise in the final outputs. We hypothesized that this situation can be alleviated by leveraging a small portion of open-consented data to improve utility without sacrificing privacy. We developed a hybrid privacy-preserving differentially private support vector machine (SVM) model that uses public and private data together. Our model leverages the RBF kernel and can handle nonlinearly separable cases. Experiments showed that this approach outperforms two baselines: (1) SVMs that use only public data, and (2) differentially private SVMs built from private data alone. Our method achieved performance very close to that of nonprivate SVMs trained on the private data.
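One standard way to obtain a differentially private kernel SVM, which may differ from the authors' construction, is output perturbation over a random-Fourier-feature approximation of the RBF kernel. The sketch below illustrates this on toy data standing in for the private records; the noise calibration is illustrative rather than a verified ε-DP guarantee, and in a hybrid setting the public portion of the data would typically be used, without spending privacy budget, to choose hyperparameters such as the kernel width.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.kernel_approximation import RBFSampler
from sklearn.svm import LinearSVC

# Toy data standing in for private records, scaled to [0, 1] so a bounded-row
# sensitivity argument is at least plausible.
X_priv, y_priv = make_moons(n_samples=500, noise=0.2, random_state=0)
X_priv = (X_priv - X_priv.min(0)) / (X_priv.max(0) - X_priv.min(0))

rff = RBFSampler(gamma=2.0, n_components=100, random_state=0)
Z = rff.fit_transform(X_priv)            # approximates the RBF kernel with linear features

C, epsilon = 1.0, 1.0
clf = LinearSVC(C=C, loss="hinge", max_iter=10_000).fit(Z, y_priv)

# Output perturbation: noise grows with C (weaker regularisation) and shrinks
# with dataset size and budget epsilon. The constant here is illustrative only;
# a full analysis would also perturb the intercept.
rng = np.random.default_rng(0)
noise_scale = 2.0 * C / (len(y_priv) * epsilon)
w_private = clf.coef_.ravel() + rng.laplace(scale=noise_scale, size=clf.coef_.size)
b_private = clf.intercept_[0]

def dp_predict(X_new):
    return (rff.transform(X_new) @ w_private + b_private > 0).astype(int)

print("train accuracy of the perturbed model:", (dp_predict(X_priv) == y_priv).mean())
```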


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Farhad Farokhi

Abstract: Local differential privacy has become the gold standard of the privacy literature for gathering or releasing sensitive individual data points in a privacy-preserving manner. However, locally differentially private data can distort the probability density of the data because of the additive noise used to ensure privacy. In fact, the density of privacy-preserving data (no matter how many samples we gather) is always flatter than the density of the original data points, because it is the convolution of the original density with the privacy-preserving noise density. The effect is especially pronounced for slow-decaying privacy-preserving noise, such as Laplace noise, and can result in under- or over-estimation of heavy hitters. This is an important challenge facing social scientists due to the use of differential privacy in the 2020 Census in the United States. In this paper, we develop density estimation methods using smoothing kernels. We use the framework of deconvoluting kernel density estimators to remove the effect of privacy-preserving noise. This approach also allows us to adapt results from nonparametric regression with errors-in-variables to develop regression models based on locally differentially private data. We demonstrate the performance of the developed methods on financial and demographic datasets.
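The deconvolution step can be illustrated numerically. The sketch below is a minimal illustration rather than the paper's exact estimator: it uses the sinc kernel (whose Fourier transform is 1 on [-1, 1]) and divides the empirical characteristic function of the privatised samples by the Laplace characteristic function 1/(1 + b²t²) to undo the flattening; the bandwidth and grid sizes are illustrative.

```python
import numpy as np

def deconvolution_kde(noisy_samples, x_grid, bandwidth, laplace_scale):
    """Deconvoluting KDE for data privatised with additive Laplace noise."""
    t = np.linspace(-1.0 / bandwidth, 1.0 / bandwidth, 1001)     # sinc-kernel support
    ecf = np.exp(1j * np.outer(t, noisy_samples)).mean(axis=1)   # empirical char. function
    laplace_cf = 1.0 / (1.0 + (laplace_scale * t) ** 2)
    integrand = ecf / laplace_cf                                  # deconvolution step
    dens = np.trapz(np.exp(-1j * np.outer(x_grid, t)) * integrand, t, axis=1).real / (2 * np.pi)
    return np.clip(dens, 0.0, None)

rng = np.random.default_rng(0)
true_x = rng.normal(0.0, 1.0, size=2000)
b = 0.7                                           # Laplace scale, i.e. sensitivity / epsilon
noisy_x = true_x + rng.laplace(scale=b, size=true_x.size)

grid = np.linspace(-4, 4, 81)
estimate = deconvolution_kde(noisy_x, grid, bandwidth=0.4, laplace_scale=b)
print(estimate.round(3))                          # recovered density values on the grid
```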


Author(s):  
Artrim Kjamilji

Nowadays many different entities collect data of the same nature but in slightly different environments: different hospitals collect data about their patients' symptoms and corresponding disease diagnoses, different banks collect transactions on their customers' accounts, multiple cyber-security companies collect log files and the corresponding attacks, and so on. It has been shown that if these entities merged their privately collected data into a single dataset and used it to train a machine learning (ML) model, they would often obtain a model that outperforms human experts in the corresponding fields in terms of prediction accuracy. However, there is a drawback: owing to privacy concerns, backed by laws and ethical considerations, no entity is willing to share its privately collected data with others. The same problem appears at classification time over an already trained ML model. On one hand, a user with an unclassified query (record) does not want to share with the server that owns the trained model either the content of the query (which might contain private data such as a credit card number or IP address) or its final prediction (classification). On the other hand, the owner of the trained model does not want to leak any parameter of the model to the user. To overcome these shortcomings, several cryptographic and probabilistic techniques have been proposed in recent years to enable both privacy-preserving training and privacy-preserving classification. These include anonymization and k-anonymity, differential privacy, secure multiparty computation (MPC), federated learning, private information retrieval (PIR), oblivious transfer (OT), garbled circuits, and homomorphic encryption, among others. Theoretical analyses and experimental results show that current privacy-preserving schemes are suitable for real-world deployment, and the accuracy of most of them differs little, if at all, from that of schemes operating in a non-privacy-preserving fashion.
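As a concrete flavour of one of the simplest building blocks listed above, the following toy sketch shows a secure sum via additive secret sharing (an elementary MPC primitive): three parties learn the total of their private counts without any single party seeing another's input. The modulus and party count are illustrative.

```python
import secrets

PRIME = 2**61 - 1  # working modulus; all shares are elements of Z_PRIME

def share(value, n_parties):
    """Split an integer into n additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

# Three hospitals each hold a private count and want only the total revealed.
private_counts = [120, 45, 230]
all_shares = [share(c, 3) for c in private_counts]

# Party i receives the i-th share of every input and publishes only a partial sum.
partial_sums = [sum(all_shares[owner][party] for owner in range(3)) % PRIME
                for party in range(3)]
total = sum(partial_sums) % PRIME
print(total)   # 395, with no single party seeing another party's count
```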


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Jing Yang ◽  
Yuye Wang ◽  
Jianpei Zhang

Releasing evolving networks that contain sensitive information could compromise individual privacy. In this paper, we study the problem of releasing evolving networks under differential privacy and explore the design of a differentially private releasing algorithm for them. We found that most traditional methods provide a snapshot of the network under differential privacy over a brief period of time; because the network structure changes only in local parts, the total amount of required noise is large and leads to poor utility. To this end, we propose GHRG-DP, a novel differentially private algorithm for releasing evolving networks that reduces the noise scale and achieves high data utility. In GHRG-DP, we learn the online connection probabilities between vertices in the evolving network with a generalized hierarchical random graph (GHRG) model. To fit the dynamic environment, we propose a method for adjusting the dendrogram structure in local areas, which reduces the noise scale over the whole period of time. Moreover, to avoid unhelpful connection probabilities, we propose a Bayesian method for computing the noisy probabilities. Through formal privacy analysis, we show that the GHRG-DP algorithm is ε-differentially private. Experiments on real evolving network datasets illustrate that GHRG-DP can privately release evolving networks with high accuracy.
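As a rough illustration (not the authors' exact GHRG-DP procedure), the snippet below shows the kind of step a Bayesian noisy-probability computation addresses: a Laplace-perturbed edge count between the two subtrees under a dendrogram node can fall outside the feasible range, so it is clipped and folded into a Beta(1, 1)-style posterior mean to keep the connection probability valid.

```python
import numpy as np

def noisy_connection_probability(edge_count, n_left, n_right, epsilon, rng):
    """Toy noisy connection probability for an internal dendrogram node.

    The Laplace-perturbed edge count is clipped to the feasible range and
    smoothed with a Beta(1, 1)-style prior so the result stays in (0, 1).
    """
    possible_edges = n_left * n_right
    noisy_edges = edge_count + rng.laplace(scale=1.0 / epsilon)
    noisy_edges = float(np.clip(noisy_edges, 0.0, possible_edges))
    return (noisy_edges + 1.0) / (possible_edges + 2.0)

rng = np.random.default_rng(0)
print(noisy_connection_probability(edge_count=3, n_left=5, n_right=4, epsilon=0.5, rng=rng))
```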


2018 ◽  
Vol 14 (2) ◽  
pp. 1-17 ◽  
Author(s):  
Zhiqiang Gao ◽  
Yixiao Sun ◽  
Xiaolong Cui ◽  
Yutao Wang ◽  
Yanyu Duan ◽  
...  

This article describes how k-means, the most widely used clustering algorithm, is prone to falling into local optima. Notably, traditional clustering approaches operate directly on private data and fail to cope with malicious attacks in massive data-mining tasks by adversaries with arbitrary background knowledge. This can result in violations of individuals' privacy, as well as leaks through system resources and clustering outputs. To address these issues, the authors propose an efficient privacy-preserving hybrid k-means under Spark. In the first stage, particle swarm optimization is executed on resilient distributed datasets to initialize the selection of clustering centroids for k-means on Spark. In the second stage, k-means is executed with a privacy budget of ε/2t, with Laplace noise added in each of the t iterations. Extensive experiments on public UCI datasets show that, while guaranteeing the utility of private data and scalability, their approach outperforms state-of-the-art variants of k-means by exploiting swarm intelligence and the rigorous paradigm of differential privacy.
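A minimal sketch of the second, differentially private stage is given below. It assumes records scaled to [0, 1]^d and spends ε/(2t) per iteration on the noisy counts and sums used to recompute centroids; the exact budget split and the paper's PSO-based initialisation are not reproduced, and all names are illustrative.

```python
import numpy as np

def dp_kmeans(X, centroids, epsilon, n_iters, rng):
    """Differentially private Lloyd iterations with per-iteration budget epsilon / (2 * t).

    Assumes every row of X lies in [0, 1]^d, so a per-cluster count query has
    sensitivity 1 and a per-cluster coordinate-sum query has L1 sensitivity d.
    """
    n, d = X.shape
    k = len(centroids)
    eps_round = epsilon / (2.0 * n_iters)
    for _ in range(n_iters):
        labels = np.argmin(((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            members = X[labels == j]
            noisy_count = max(members.shape[0] + rng.laplace(scale=1.0 / eps_round), 1.0)
            noisy_sum = members.sum(axis=0) + rng.laplace(scale=d / eps_round, size=d)
            centroids[j] = np.clip(noisy_sum / noisy_count, 0.0, 1.0)
    return centroids

rng = np.random.default_rng(0)
X = rng.random((1000, 2))        # toy data already scaled to [0, 1]
init = rng.random((3, 2))        # in the paper this initialisation would come from PSO
print(dp_kmeans(X, init, epsilon=1.0, n_iters=5, rng=rng))
```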


2021 ◽  
Vol 2022 (1) ◽  
pp. 481-500
Author(s):  
Xue Jiang ◽  
Xuebing Zhou ◽  
Jens Grossklags

Abstract: Business intelligence and AI services often involve the collection of copious amounts of multidimensional personal data. Since these data usually contain sensitive information about individuals, direct collection can lead to privacy violations. Local differential privacy (LDP) is currently considered a state-of-the-art solution for privacy-preserving data collection. However, existing LDP algorithms are not applicable to high-dimensional data, not only because of the increase in computation and communication cost but also because of poor data utility. In this paper, we aim to address the curse-of-dimensionality problem in LDP-based high-dimensional data collection. Based on the idea of machine learning and data synthesis, we propose DP-Fed-Wae, an efficient privacy-preserving framework for collecting high-dimensional categorical data. By combining a generative autoencoder, federated learning, and differential privacy, our framework privately learns the statistical distributions of local data and generates high-utility synthetic data on the server side without revealing users' private information. We evaluated the framework in terms of data utility and privacy protection on a number of real-world datasets containing 68-124 classification attributes. We show that our framework outperforms LDP-based baseline algorithms in capturing joint distributions and correlations of attributes and in generating high-utility synthetic data. With a local privacy guarantee ε = 8, machine learning models trained on the synthetic data generated by the baseline algorithms suffer an accuracy loss of 10%-30%, whereas with our framework the accuracy loss is reduced to less than 3%, and at best to less than 1%. Extensive experimental results demonstrate the capability and efficiency of our framework in synthesizing high-dimensional data while striking a satisfactory utility-privacy balance.
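The interplay of federated learning and differential privacy in such a framework can be sketched generically. The snippet below is a standard DP federated-averaging step (clip each client's update, average, add Gaussian noise); it does not reproduce the paper's generative-autoencoder or data-synthesis components, and the parameters are illustrative.

```python
import numpy as np

def dp_federated_average(client_updates, clip_norm, noise_multiplier, rng):
    """Generic DP federated-averaging step: clip each client's model update to a
    fixed L2 norm, average, and add Gaussian noise calibrated to the clip norm."""
    clipped = []
    for update in client_updates:
        norm = np.linalg.norm(update)
        clipped.append(update * min(1.0, clip_norm / (norm + 1e-12)))
    mean_update = np.mean(clipped, axis=0)
    sigma = noise_multiplier * clip_norm / len(client_updates)
    return mean_update + rng.normal(scale=sigma, size=mean_update.shape)

rng = np.random.default_rng(0)
updates = [rng.normal(size=10) for _ in range(8)]   # stand-ins for autoencoder weight deltas
print(dp_federated_average(updates, clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```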


A cloud server aggregates large amounts of genome data from multiple genome donors to facilitate scientific research. However, an untrusted cloud server is prone to violating the privacy of the aggregated genome data. Each genome donor can therefore randomly perturb her genome data with a differential privacy mechanism before aggregation, but this easily leads to a utility disaster for the aggregated data, owing to the donors' different privacy preferences, and to privacy leakage because of kinship between donors. The key challenge is to achieve an equilibrium between privacy preservation and data utility when aggregating multiparty genome data. To this end, we propose a federated aggregation protocol for multiparty genome data (MGD-FAP) with a privacy-utility equilibrium that guarantees both the desired privacy protection and the desired data utility. First, we regard the privacy budget and the accuracy as the desired privacy and utility metrics of genome data, respectively. Second, we construct a federated aggregation model of multiparty genome data by combining a random perturbation method that guarantees the desired data utility with a federated comparing-update method for the local privacy budget that achieves the desired privacy preservation. Third, we present MGD-FAP, which maintains the privacy-utility equilibrium under this federated aggregation model. Finally, theoretical and experimental analysis shows that MGD-FAP maintains the privacy-utility equilibrium and is practical and feasible for cloud servers aggregating multiparty genome data.
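A hedged toy reading of the protocol's two ingredients is sketched below: each donor perturbs her local counts with Laplace noise, and the local budget is set just large enough that the expected per-count noise stays within the donor's accuracy target, which is a simplified stand-in for the paper's federated comparing-update of the local privacy budget. Functions and numbers are illustrative.

```python
import numpy as np

def donor_perturb(local_counts, epsilon, rng):
    """Each donor adds Laplace noise (sensitivity 1 per count) before upload."""
    return local_counts + rng.laplace(scale=1.0 / epsilon, size=local_counts.shape)

def required_epsilon(target_abs_error):
    """Smallest budget whose expected absolute Laplace noise (scale = 1/eps) stays
    within the donor's accuracy target; a simplified stand-in for the paper's
    federated comparing-update of the local privacy budget."""
    return 1.0 / target_abs_error

rng = np.random.default_rng(0)
donors = [np.array([12., 30., 7.]), np.array([20., 15., 9.]), np.array([5., 22., 18.])]
target_abs_error = 2.0                      # per-count accuracy each donor will accept
eps = required_epsilon(target_abs_error)    # every donor settles on this local budget
aggregate = sum(donor_perturb(counts, eps, rng) for counts in donors)
print(eps, aggregate.round(2))
```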

