Multi-Party High-Dimensional Data Publishing Under Differential Privacy

2020 ◽  
Vol 32 (8) ◽  
pp. 1557-1571 ◽  
Author(s):  
Xiang Cheng ◽  
Peng Tang ◽  
Sen Su ◽  
Rui Chen ◽  
Zequn Wu ◽  
...  
Sensors ◽  
2020 ◽  
Vol 20 (9) ◽  
pp. 2516
Author(s):  
Chunhua Ju ◽  
Qiuyang Gu ◽  
Gongxing Wu ◽  
Shuangzhu Zhang

Although the Crowd-Sensing perception system brings great data value to people through the release and analysis of high-dimensional perception data, it also poses a serious threat to the privacy of participants. Various privacy protection methods based on differential privacy have been proposed, but most of them cannot simultaneously address the complex attribute associations within high-dimensional perception data and the privacy threats posed by untrustworthy servers. To address this problem, we put forward a local privacy protection mechanism based on a Bayesian network for high-dimensional perceptual data. This mechanism protects the users' data locally from the very beginning, eliminates the possibility of other parties directly accessing the users' original data, and thus fundamentally protects the users' data privacy. After receiving the users' locally protected data, the perception server identifies the dimensional correlations of the high-dimensional data with a Bayesian network, divides the high-dimensional attribute set into multiple relatively independent low-dimensional attribute sets, and then sequentially synthesizes a new dataset. This effectively retains the attribute correlations of the original perception data and ensures that the synthetic dataset and the original dataset have statistical characteristics that are as similar as possible. To verify its effectiveness, we conduct extensive simulation experiments. The results show that the synthetic data produced by this mechanism under effective local privacy protection retains relatively high data utility.
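To make the server-side step concrete, the following is a minimal sketch (not the authors' code) of how a server could group correlated attributes of already locally-perturbed categorical data and then synthesize records cluster by cluster. The greedy mutual-information clustering is a simplified stand-in for the Bayesian-network structure learning described above, and the threshold, cluster-size limit, and omission of the local randomization step are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd
from itertools import combinations

def mutual_information(x, y):
    """Empirical mutual information between two categorical columns."""
    joint = pd.crosstab(x, y, normalize=True).values
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def cluster_attributes(df, threshold=0.1, max_size=3):
    """Greedily merge strongly correlated attributes into small clusters
    (a simplified stand-in for Bayesian-network structure learning)."""
    clusters = [[c] for c in df.columns]
    scores = {(a, b): mutual_information(df[a], df[b])
              for a, b in combinations(df.columns, 2)}
    for (a, b), mi in sorted(scores.items(), key=lambda kv: -kv[1]):
        ca = next(c for c in clusters if a in c)
        cb = next(c for c in clusters if b in c)
        if ca is not cb and mi >= threshold and len(ca) + len(cb) <= max_size:
            ca.extend(cb)
            clusters.remove(cb)
    return clusters

def synthesize(df, clusters, n_samples, rng):
    """Sample each attribute cluster independently from its empirical joint table."""
    out = {}
    for cols in clusters:
        counts = df.groupby(cols).size()
        probs = (counts / counts.sum()).reset_index(name="p")
        idx = rng.choice(len(probs), size=n_samples, p=probs["p"].values)
        picked = probs.iloc[idx].reset_index(drop=True)
        for c in cols:
            out[c] = picked[c]
    return pd.DataFrame(out)
```

A typical call would be `synthesize(df, cluster_attributes(df), len(df), np.random.default_rng(0))`, which keeps within-cluster correlations while treating clusters as independent.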


2021 ◽  
Author(s):  
Syed Usama Khalid Bukhari ◽  
Anum Qureshi ◽  
Adeel Anjum ◽  
Munam Ali Shah

Privacy preservation of high-dimensional healthcare data is an emerging problem. Privacy breaches are becoming more common and affect thousands of people. Every individual has sensitive and personal information that needs protection and security. Uploading and storing data directly to the cloud without any precautions can lead to serious privacy breaches. Publishing large amounts of sensitive data while minimizing privacy concerns is a serious challenge, and it forces crucial decisions about the privacy of outsourced high-dimensional healthcare data. Many privacy preservation techniques have been proposed to secure high-dimensional data while retaining its utility, but every technique has its pros and cons. In this paper, a novel privacy preservation NRPP model for high-dimensional data is proposed. The model uses a privacy-preserving generative technique for releasing sensitive data, which is differentially private. The contribution of this paper is twofold. First, a state-of-the-art anonymization model for high-dimensional healthcare data is proposed using a generative technique. Second, the achieved privacy is evaluated using the concept of differential privacy. The experiments show that the proposed model performs better in terms of utility.
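The abstract does not detail the generative training procedure, so the sketch below only illustrates the generic recipe most often used to make such training differentially private, namely per-example gradient clipping plus Gaussian noise (the DP-SGD pattern); it is not the NRPP model itself, and the clipping norm and noise multiplier are placeholder assumptions.

```python
import numpy as np

def dp_noisy_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient, average, and add calibrated Gaussian noise."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))  # L2 clipping
    mean_grad = np.mean(clipped, axis=0)
    sigma = noise_multiplier * clip_norm / len(per_example_grads)  # noise scale per averaged step
    return mean_grad + rng.normal(0.0, sigma, size=mean_grad.shape)
```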


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 176429-176437 ◽  
Author(s):  
Wanjie Li ◽  
Xing Zhang ◽  
Xiaohui Li ◽  
Guanghui Cao ◽  
Qingyun Zhang

2021 ◽  
Vol 17 (12) ◽  
pp. 155014772110599
Author(s):  
Lin Wang ◽  
Xingang Xu ◽  
Xuhui Zhao ◽  
Baozhu Li ◽  
Ruijuan Zheng ◽  
...  

Policy gradient methods are effective means to solve the problems of mobile multimedia data transmission in Content Centric Networks. Current policy gradient algorithms impose a high computational cost when processing high-dimensional data, and the issue of privacy disclosure has not been taken into account, even though privacy protection is important during data training. Therefore, we propose a randomized block policy gradient algorithm with differential privacy. To reduce the computational complexity of processing high-dimensional data, we randomly select a block coordinate to update the gradients at each round. To solve the privacy protection problem, we add a differential privacy mechanism to the algorithm and prove that it preserves the [Formula: see text]-privacy level. We conduct extensive simulations in four environments: CartPole, Walker, HalfCheetah, and Hopper. Compared with methods such as importance-sampling momentum-based policy gradient, Hessian-aided momentum-based policy gradient, and REINFORCE, our algorithm shows a faster convergence rate in the same environments.
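As a rough illustration of the block-coordinate idea (and not the paper's exact estimator, step sizes, or privacy accounting), one update might look like the following sketch, where `grad_fn` is an assumed policy-gradient estimator and only a randomly chosen block of parameters is clipped, noised, and updated.

```python
import numpy as np

def noisy_block_update(theta, grad_fn, block_size, lr, clip, sigma, rng):
    """Update one randomly chosen parameter block with a clipped, noised gradient."""
    start = rng.integers(0, len(theta) - block_size + 1)
    block = slice(start, start + block_size)
    g = grad_fn(theta)[block]                              # gradient restricted to the block
    g = g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))   # L2 norm clipping
    g = g + rng.normal(0.0, sigma * clip, size=g.shape)    # Gaussian mechanism noise
    theta = theta.copy()
    theta[block] += lr * g                                 # gradient ascent on the return
    return theta
```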


2020 ◽  
Vol 93 ◽  
pp. 101785
Author(s):  
Rong Wang ◽  
Yan Zhu ◽  
Chin-Chen Chang ◽  
Qiang Peng

2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Weisan Wu

In this paper, we give a modified gradient EM algorithm that protects the privacy of sensitive data by adding discrete Gaussian mechanism noise. Specifically, it makes high-dimensional data easier to process mainly through scaling, truncating, noise multiplication, and smoothing steps. Since the variance of the discrete Gaussian is smaller than that of the continuous Gaussian, adding noise from the discrete Gaussian mechanism guarantees the differential privacy of the data more effectively. Finally, the standard gradient EM algorithm, a clipped algorithm, and our algorithm (DG-EM) are compared on the GMM model. The experiments show that our algorithm can effectively protect high-dimensional sensitive data.
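The sketch below shows, in a hedged way, how discrete-Gaussian noise could be added to a clipped, integer-scaled statistic of the kind an EM update consumes. The truncated-support sampler and the clipping/scaling constants are illustrative assumptions, not the paper's sampler or its exact scaling, truncation, and smoothing pipeline.

```python
import numpy as np

def sample_discrete_gaussian(sigma, size, rng, tail=12):
    """Sample integers with probability proportional to exp(-k^2 / (2 sigma^2)),
    truncated to |k| <= tail * sigma (negligible mass lies outside this range)."""
    half = int(np.ceil(tail * sigma))
    support = np.arange(-half, half + 1)
    weights = np.exp(-support.astype(float) ** 2 / (2.0 * sigma ** 2))
    return rng.choice(support, size=size, p=weights / weights.sum())

def noisy_sufficient_statistic(values, clip, scale, sigma, rng):
    """Clip, round onto an integer grid, privatise the sum, then rescale."""
    clipped = np.clip(values, -clip, clip)                    # truncation step
    ints = np.round(clipped * scale).astype(int)              # scaling to integers
    noisy_sum = ints.sum() + sample_discrete_gaussian(sigma, 1, rng)[0]
    return noisy_sum / scale
```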


2021 ◽  
Vol 2022 (1) ◽  
pp. 481-500
Author(s):  
Xue Jiang ◽  
Xuebing Zhou ◽  
Jens Grossklags

Abstract Business intelligence and AI services often involve the collection of copious amounts of multidimensional personal data. Since these data usually contain sensitive information about individuals, direct collection can lead to privacy violations. Local differential privacy (LDP) is currently considered a state-of-the-art solution for privacy-preserving data collection. However, existing LDP algorithms are not applicable to high-dimensional data, not only because of the increase in computation and communication cost but also because of poor data utility. In this paper, we aim at addressing the curse-of-dimensionality problem in LDP-based high-dimensional data collection. Based on the idea of machine learning and data synthesis, we propose DP-Fed-Wae, an efficient privacy-preserving framework for collecting high-dimensional categorical data. Combining a generative autoencoder, federated learning, and differential privacy, our framework is capable of privately learning the statistical distributions of local data and generating high-utility synthetic data on the server side without revealing users' private information. We have evaluated the framework in terms of data utility and privacy protection on a number of real-world datasets containing 68–124 classification attributes. We show that our framework outperforms the LDP-based baseline algorithms in capturing joint distributions and correlations of attributes and in generating high-utility synthetic data. With a local privacy guarantee ε = 8, machine learning models trained with the synthetic data generated by the baseline algorithm suffer an accuracy loss of 10%–30%, whereas with our framework the accuracy loss is reduced to less than 3% and, at best, even less than 1%. Extensive experimental results demonstrate the capability and efficiency of our framework in synthesizing high-dimensional data while striking a satisfactory utility-privacy balance.
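The general pattern behind this kind of framework, federated averaging of clipped and noised client updates, can be sketched as follows. This is schematic only: the Wasserstein-autoencoder architecture, local training details, and privacy accounting from the paper are omitted, and `local_train_fn` is an assumed callable that returns locally trained parameters.

```python
import numpy as np

def private_client_update(global_params, local_train_fn, clip, sigma, rng):
    """Train locally, then clip and perturb the parameter delta before upload."""
    local_params = local_train_fn(global_params)
    delta = local_params - global_params
    delta = delta * min(1.0, clip / (np.linalg.norm(delta) + 1e-12))  # bound each client's influence
    return delta + rng.normal(0.0, sigma * clip, size=delta.shape)    # Gaussian noise for privacy

def federated_round(global_params, client_fns, clip, sigma, rng):
    """One federated round: average the privatised deltas from all clients."""
    deltas = [private_client_update(global_params, fn, clip, sigma, rng)
              for fn in client_fns]
    return global_params + np.mean(deltas, axis=0)
```

After enough rounds, the server-side decoder of the learned autoencoder can be sampled to produce the synthetic dataset, which is the release artifact described in the abstract.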


2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Yiyang Hong ◽  
Xingwen Zhao ◽  
Hui Zhu ◽  
Hui Li

With the rapid development of information technology, people benefit more and more from big data. At the same time, how to obtain optimal outputs from big data publishing and sharing management while protecting privacy has become a great concern. Many researchers seek to realize differential privacy protection in massive high-dimensional datasets using the method of principal component analysis. However, these algorithms are inefficient in processing and do not take into account the different privacy protection needs of each attribute in high-dimensional datasets. To address this problem, we design a Divided-block Sparse Matrix Transformation Differential Privacy Data Publishing Algorithm (DSMT-DP). In this algorithm, different privacy budget parameters are assigned to different attributes according to the required privacy protection level of each attribute, taking the privacy protection needs of different levels of attributes into account. Meanwhile, the divided-block scheme and the sparse matrix transformation scheme improve the computational efficiency of principal component analysis when handling large amounts of high-dimensional sensitive data, and we demonstrate that the proposed algorithm satisfies differential privacy. Our experimental results show that the mean square error of the proposed algorithm is smaller than that of the traditional differential privacy algorithm with the same privacy parameters, and that computational efficiency is improved. Further, we combine this algorithm with blockchain and propose an Efficient Privacy Data Publishing and Sharing Model based on the blockchain. Publishing and sharing private data with this model not only resists strong background-knowledge attacks from adversaries outside the system but also prevents stealing and tampering of data by not-completely-honest participants inside the system.
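To illustrate the combination of per-block privacy budgets with PCA (and not the DSMT-DP algorithm itself, whose divided-block and sparse-matrix-transformation details are not reproduced here), a minimal sketch might perturb each attribute block's covariance with Laplace noise scaled by that block's budget before extracting principal components. The sensitivity bound below assumes attributes rescaled to [0, 1].

```python
import numpy as np

def noisy_block_pca(X, blocks, budgets, n_components, rng):
    """Run PCA independently on each attribute block under its own privacy budget."""
    n = X.shape[0]
    projections = []
    for cols, eps in zip(blocks, budgets):
        Xb = X[:, cols]
        cov = Xb.T @ Xb / n
        sens = len(cols) ** 2 / n            # L1 sensitivity of the block covariance for data in [0, 1]
        cov = cov + rng.laplace(0.0, sens / eps, size=cov.shape)  # Laplace mechanism
        cov = (cov + cov.T) / 2.0            # re-symmetrise after adding noise
        _, vecs = np.linalg.eigh(cov)
        top = vecs[:, -n_components:]        # leading eigenvectors of the noisy covariance
        projections.append(Xb @ top)
    return np.hstack(projections)
```

Blocks holding more sensitive attributes would simply receive a smaller `eps`, which is the per-attribute budget assignment the abstract describes.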

