Privacy-aware Synthesizing for Crowdsourced Data

Author(s):  
Mengdi Huai ◽  
Di Wang ◽  
Chenglin Miao ◽  
Jinhui Xu ◽  
Aidong Zhang

Although releasing crowdsourced data brings many benefits to data analyzers conducting statistical analysis, it may violate crowd users' data privacy. A potential way to address this problem is to employ traditional differential privacy (DP) mechanisms and perturb the data with noise before releasing them. However, since crowdsourced data usually contain conflicts and are large in volume, directly applying these mechanisms cannot guarantee good utility in the setting of releasing crowdsourced data. To address this challenge, in this paper we propose a novel privacy-aware synthesizing method (i.e., PrisCrowd) for crowdsourced data, based on which the data collector can release users' data with strong protection for their private information, while the data analyzer can still achieve good utility from the released data. Both theoretical analysis and extensive experiments on real-world datasets demonstrate the desired performance of the proposed method.
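The "traditional DP mechanism" the abstract uses as its baseline — perturbing a released statistic with noise — can be sketched with the standard Laplace mechanism. This is illustrative background only, not PrisCrowd itself; the function name and parameter values are assumptions:

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Perturb a numeric query result with Laplace noise calibrated to
    sensitivity/epsilon -- the standard epsilon-DP building block."""
    rng = rng or np.random.default_rng()
    return value + rng.laplace(0.0, sensitivity / epsilon)

# Releasing a count over crowdsourced 0/1 answers (sensitivity 1):
answers = [1, 0, 1, 1, 0, 1]
noisy_count = laplace_mechanism(sum(answers), sensitivity=1.0, epsilon=0.5)
```

The utility problem the paper targets is visible even here: the noise scale grows as epsilon shrinks, and conflicting or high-volume crowd answers compound the error.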

Author(s):  
Zhiyu Xue ◽  
Shaoyang Yang ◽  
Mengdi Huai ◽  
Di Wang

Instead of learning with pointwise loss functions, learning with pairwise loss functions (pairwise learning) has received much attention recently, as it is more capable of modeling the relative relationships between pairs of samples. However, most existing algorithms for pairwise learning fail to take the privacy issue into consideration in their design. To address this issue, previous work studied pairwise learning in the Differential Privacy (DP) model; however, the resulting utilities (population errors) are far from optimal. To address this sub-optimal utility issue, in this paper we propose new pure and approximate DP algorithms for pairwise learning. Specifically, under the assumption that the loss functions are Lipschitz, our algorithms achieve the optimal expected population risk in both the strongly convex and general convex cases. We also conduct extensive experiments on real-world datasets to evaluate the proposed algorithms; the experimental results support our theoretical analysis and show the superiority of our algorithms.
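As a rough illustration of the setting (not the paper's algorithms), the following sketch runs noisy gradient descent on a pairwise hinge loss: the averaged pairwise gradient is clipped (a stand-in for the Lipschitz bound) and Gaussian noise is added. The function name, the naive composition-based noise calibration, and all hyperparameters are assumptions:

```python
import numpy as np

def dp_pairwise_gd(X, y, epsilon, delta, T=50, lr=0.1, clip=1.0, seed=0):
    """Gradient perturbation for a pairwise hinge loss: each step
    averages the gradient over all label-discordant pairs, clips it
    to norm `clip`, and adds Gaussian noise sized by naive sequential
    composition over T steps (illustrative, not optimal)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    sigma = clip * np.sqrt(2 * T * np.log(1.25 / delta)) / (n * epsilon)
    for _ in range(T):
        grad, pairs = np.zeros(d), 0
        for i in range(n):
            for j in range(i + 1, n):
                if y[i] == y[j]:
                    continue
                # orient each pair as (positive - negative)
                diff = X[i] - X[j] if y[i] > y[j] else X[j] - X[i]
                if 1 - w @ diff > 0:   # pairwise hinge: max(0, 1 - w.(x+ - x-))
                    grad -= diff
                pairs += 1
        if pairs:
            grad /= pairs
        norm = np.linalg.norm(grad)
        if norm > clip:                # clipping bounds per-step sensitivity
            grad *= clip / norm
        w -= lr * (grad + rng.normal(0.0, sigma, d))
    return w
```

The paper's contribution is precisely that smarter constructions beat this kind of naive composition and reach the optimal population risk.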


2019 ◽  
Vol 2019 (3) ◽  
pp. 170-190
Author(s):  
Archita Agarwal ◽  
Maurice Herlihy ◽  
Seny Kamara ◽  
Tarik Moataz

Abstract The problem of privatizing statistical databases is a well-studied topic that has culminated with the notion of differential privacy. The complementary problem of securing these differentially private databases, however, has—as far as we know—not been considered in the past. While the security of private databases is in theory orthogonal to the problem of private statistical analysis (e.g., in the central model of differential privacy the curator is trusted) the recent real-world deployments of differentially-private systems suggest that it will become a problem of increasing importance. In this work, we consider the problem of designing encrypted databases (EDB) that support differentially-private statistical queries. More precisely, these EDBs should support a set of encrypted operations with which a curator can securely query and manage its data, and a set of private operations with which an analyst can privately analyze the data. Using such an EDB, a curator can securely outsource its database to an untrusted server (e.g., on-premise or in the cloud) while still allowing an analyst to privately query it. We show how to design an EDB that supports private histogram queries. As a building block, we introduce a differentially-private encrypted counter based on the binary mechanism of Chan et al. (ICALP, 2010). We then carefully combine multiple instances of this counter with a standard encrypted database scheme to support differentially-private histogram queries.
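The binary mechanism of Chan et al. that the construction builds on can be sketched in plaintext as follows; the encryption layer, which is the paper's actual contribution, is omitted, and the class name is an assumption:

```python
import math
import random

class BinaryMechanismCounter:
    """Streaming epsilon-DP bit counter after the binary mechanism of
    Chan et al. (ICALP, 2010): the stream of up to T bits is covered
    by dyadic partial sums, each released once with Laplace noise, so
    any prefix count is assembled from only O(log T) noisy values."""

    def __init__(self, T, epsilon, seed=None):
        self.eps_level = epsilon / max(1, math.ceil(math.log2(T)))
        self.rng = random.Random(seed)
        self.t = 0
        self.alpha = {}   # exact dyadic partial sums, keyed by level
        self.noisy = {}   # their one-shot noisy releases

    def _laplace(self):
        # Laplace(1/eps_level) as a difference of two exponentials
        b = 1.0 / self.eps_level
        return b * (self.rng.expovariate(1.0) - self.rng.expovariate(1.0))

    def update(self, bit):
        self.t += 1
        t, level = self.t, 0
        while t % 2 == 0:             # lowest set bit of t = closing level
            t //= 2
            level += 1
        # the new block at `level` absorbs all lower finished blocks
        total = bit + sum(self.alpha.pop(j, 0) for j in range(level))
        self.alpha[level] = total
        self.noisy[level] = total + self._laplace()
        for j in range(level):
            self.noisy.pop(j, None)

    def query(self):
        """Noisy running count of ones seen so far."""
        return sum(self.noisy.values())
```

In the paper's design, the stored partial sums would live encrypted on the untrusted server; only the noisy aggregates are exposed to the analyst.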


2019 ◽  
Vol 2019 (1) ◽  
pp. 26-46 ◽  
Author(s):  
Thee Chanyaswad ◽  
Changchang Liu ◽  
Prateek Mittal

Abstract A key challenge facing the design of differential privacy in the non-interactive setting is to maintain the utility of the released data. To overcome this challenge, we utilize the Diaconis-Freedman-Meckes (DFM) effect, which states that most projections of high-dimensional data are nearly Gaussian. Hence, we propose the RON-Gauss model that leverages the novel combination of dimensionality reduction via random orthonormal (RON) projection and the Gaussian generative model for synthesizing differentially-private data. We analyze how RON-Gauss benefits from the DFM effect, and present multiple algorithms for a range of machine learning applications, including both unsupervised and supervised learning. Furthermore, we rigorously prove that (a) our algorithms satisfy the strong ɛ-differential privacy guarantee, and (b) RON projection can lower the level of perturbation required for differential privacy. Finally, we illustrate the effectiveness of RON-Gauss under three common machine learning applications – clustering, classification, and regression – on three large real-world datasets. Our empirical results show that (a) RON-Gauss outperforms previous approaches by up to an order of magnitude, and (b) loss in utility compared to the non-private real data is small. Thus, RON-Gauss can serve as a key enabler for real-world deployment of privacy-preserving data release.
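The RON + Gaussian pipeline described above can be sketched as follows. This is a simplified unsupervised variant; the function name and the exact Laplace noise scales are assumptions, standing in for the paper's calibrated sensitivities:

```python
import numpy as np

def ron_gauss_release(X, p, epsilon, seed=0):
    """Project rows onto a random orthonormal p-dim subspace (QR of a
    Gaussian matrix), fit a Gaussian to the projection, perturb its
    mean and covariance with Laplace noise, and sample synthetic rows.
    Rows are normalized to unit L2 norm so sensitivities stay bounded."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    W, _ = np.linalg.qr(rng.normal(size=(d, p)))   # orthonormal columns
    Z = X @ W                                       # projected rows, norm <= 1
    eps_mu = eps_cov = epsilon / 2                  # split the privacy budget
    mu = Z.mean(axis=0) + rng.laplace(0.0, 2 * np.sqrt(p) / (n * eps_mu), p)
    Zc = Z - mu
    cov = Zc.T @ Zc / n + rng.laplace(0.0, 2 * p / (n * eps_cov), (p, p))
    cov = (cov + cov.T) / 2                         # re-symmetrize after noise
    cov += p * np.eye(p) / (n * eps_cov)            # small ridge toward PSD
    synth = rng.multivariate_normal(mu, cov, size=n, check_valid="ignore")
    return synth, W
```

The DFM effect is what justifies the Gaussian model on `Z`: most low-dimensional projections of high-dimensional data look nearly Gaussian, so a noisy mean and covariance capture them well.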


2019 ◽  
Vol 16 (3) ◽  
pp. 705-731
Author(s):  
Haoze Lv ◽  
Zhaobin Liu ◽  
Zhonglian Hu ◽  
Lihai Nie ◽  
Weijiang Liu ◽  
...  

With the advent of the big data era, data releasing has become a hot topic in the database community, and data privacy has likewise drawn users' attention. Among the privacy protection models proposed so far, the differential privacy model is widely utilized because of its many advantages over other models. However, for the private release of multi-dimensional data sets, existing algorithms usually publish data with low availability, because the noise in the released data grows rapidly as the number of dimensions increases. In view of this issue, we propose algorithms based on regular and irregular marginal tables of frequent item sets to protect privacy and promote availability. The main idea is to reduce the dimension of the data set and to achieve differential privacy protection with Laplace noise. First, we propose a marginal table cover algorithm based on frequent items that considers the effectiveness of query cover combinations, obtaining a regular marginal table cover set of smaller size but higher data availability. Then, a differential privacy model with irregular marginal tables is proposed for application scenarios with low data availability and high cover rate. Next, we derive an approximately optimal marginal table cover algorithm to obtain a query cover set that satisfies the multi-level query policy constraint, achieving a balance between privacy protection and data availability. Finally, extensive experiments on synthetic and real databases demonstrate that the proposed method performs better than state-of-the-art methods in most cases.
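The core step — releasing low-dimensional marginal tables with Laplace noise instead of the full high-dimensional table — can be sketched as follows. The frequent-itemset cover selection, the paper's actual contribution, is omitted; the marginals are simply given by the caller, and the function name is an assumption:

```python
import itertools
import numpy as np

def noisy_marginals(records, attrs, marginals, epsilon, seed=0):
    """Release a set of low-dimensional marginal (contingency) tables
    under epsilon-DP: a record contributes one count per marginal, so
    Laplace noise of scale len(marginals)/epsilon per cell suffices
    by sequential composition."""
    rng = np.random.default_rng(seed)
    scale = len(marginals) / epsilon
    tables = {}
    for marg in marginals:                      # e.g. marg = ("age", "income")
        idx = [attrs.index(a) for a in marg]
        domains = [sorted({r[i] for r in records}) for i in idx]
        table = {}
        for cell in itertools.product(*domains):
            true_count = sum(
                all(r[i] == v for i, v in zip(idx, cell)) for r in records
            )
            table[cell] = true_count + rng.laplace(0.0, scale)
        tables[marg] = table
    return tables
```

The dimensionality benefit is visible in the cell count: a k-way marginal has exponentially fewer cells than the full table, so the same budget buys far less relative noise per cell.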


Sensors ◽  
2020 ◽  
Vol 20 (9) ◽  
pp. 2516
Author(s):  
Chunhua Ju ◽  
Qiuyang Gu ◽  
Gongxing Wu ◽  
Shuangzhu Zhang

Although the Crowd-Sensing perception system brings great data value to people through the release and analysis of high-dimensional perception data, it also poses a serious threat to participants' privacy. Various privacy protection methods based on differential privacy have been proposed, but most of them cannot simultaneously solve the complex attribute association problem among high-dimensional perception data and the privacy threats posed by untrustworthy servers. To address this problem, in this paper we put forward a local privacy protection mechanism based on a Bayesian network for high-dimensional perceptual data. This mechanism protects the users' data locally from the very beginning, eliminates the possibility of other parties directly accessing the users' original data, and thus fundamentally protects the users' data privacy. After receiving the users' locally protected data, the perception server recognizes the dimensional correlations of the high-dimensional data based on the Bayesian network, divides the high-dimensional attribute set into multiple relatively independent low-dimensional attribute sets, and then sequentially synthesizes the new dataset. This approach effectively retains the attribute correlations of the original perception data and ensures that the synthetic dataset has statistical characteristics as similar as possible to those of the original dataset. To verify its effectiveness, we conduct a multitude of simulation experiments. The results show that the synthetic data produced by this mechanism under effective local privacy protection has relatively high utility.
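The "local protection at the very beginning" idea — perturbing data on-device so the server never sees raw values — is commonly illustrated with randomized response, the classic local-DP primitive. This is background for the setting, not the paper's Bayesian-network mechanism; the function names are assumptions:

```python
import math
import random

def randomized_response(bit, epsilon, rng=random):
    """Classic epsilon-local-DP primitive: the user's bit is flipped
    on-device before it ever leaves, so the server never sees raw data."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return bit if rng.random() < p_truth else 1 - bit

def estimate_frequency(reports, epsilon):
    """Server-side debiasing: unbiased estimate of the true fraction
    of ones from the randomized reports."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)
```

The paper's challenge is the high-dimensional generalization of this: correlated attributes make naive per-attribute perturbation wasteful, which is why the server-side Bayesian network is used to split the attribute set before synthesis.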


2019 ◽  
Vol 34 (3) ◽  
Author(s):  
Fanny Priscyllia

The development of information and communication technology includes the internet (interconnection networking). Personal data has become central to internet-based applications such as e-commerce, e-health, and e-payment, as well as to the growth of cloud computing (data storage services such as Google Drive, iCloud, and YouTube). The privacy of personal data is important because it concerns each individual's dignity and freedom of expression. If the protection of personal data privacy is not regulated by legislation, the spread of personal information can cause harm to individuals. This study aims to discuss the concept of personal data privacy protection and its regulation from a comparative law perspective. It uses normative legal research, examining and analyzing legal sources. The results show that the right to privacy belongs fully to the individual and its fulfillment does not depend on the rights of others; however, the right can be waived when its holder chooses to publish private information, and a person is entitled not to share all information about themselves in social life. The absence of a comprehensive law governing the privacy protection of personal data in Indonesia may increase the potential for violations of citizens' constitutional right to such protection.


2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Yingbo Li ◽  
Yucong Duan ◽  
Zakaria Maamar ◽  
Haoyang Che ◽  
Anamaria-Beatrice Spulber ◽  
...  

Privacy protection has recently been in the spotlight of both academia and industry. Society protects individual data privacy through complex legal frameworks. The increasing number of applications of data science and artificial intelligence has resulted in higher demand for ubiquitous use of data, while privacy protection for the broad Data-Information-Knowledge-Wisdom (DIKW) landscape, the next generation of information organization, has taken a secondary role. In this paper, we explore the DIKW architecture through applications of swarm intelligence and differential privacy. As differential privacy has proved to be an effective approach to data privacy, we examine it from a DIKW domain perspective. Swarm intelligence can effectively optimize and reduce the number of DIKW items used in differential privacy, thus improving both the effectiveness and the efficiency of differential privacy across multiple modalities of conceptual DIKW. The proposed approach is demonstrated through an application to personalized data based on the open-source IRIS dataset. The experiment demonstrates the efficiency of swarm intelligence in reducing computing complexity.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Sreemoyee Biswas ◽  
Nilay Khare ◽  
Pragati Agrawal ◽  
Priyank Jain

Abstract With data becoming a salient asset worldwide, dependence among data keeps growing, so the real-world datasets one works on today are highly correlated. Over the past few years, researchers have paid attention to this aspect of data privacy and found correlations among data. The privacy guarantees provided by existing algorithms were sufficient when no relation existed between data in the datasets; once data correlation is taken into account, there is a dire need to reconsider these privacy algorithms. Some research has utilized a well-known machine learning concept, data correlation analysis, to better understand the relationships between data, with promising results. Though the body of work is still concise, researchers have done a considerable amount of research on correlated data privacy, providing solutions using probabilistic models, behavioral analysis, sensitivity analysis, information-theoretic models, statistical correlation analysis, exhaustive combination analysis, temporal privacy leakages, and weighted hierarchical graphs. Nevertheless, the real-world datasets researchers work on are often large (technically termed big data) and house a high degree of data correlation. The data correlation in big data must first be studied; researchers are exploring different analysis techniques to find the best suited, after which they may propose measures to guarantee privacy for correlated big data. This survey presents a detailed review of the methods proposed by different researchers to deal with the problems of correlated data privacy and correlated big data privacy, and highlights the future scope in this area. The quantitative analysis of the reviewed articles suggests that data correlation is a significant threat to data privacy,
and this threat is further magnified by big data. When data correlation is considered and analyzed, parameters such as maximum queries executed and mean average error show better results compared with other methods. Hence, there is a grave need to understand and propose solutions for correlated big data privacy.

