scholarly journals Stack and Deal: An Efficient Algorithm for Privacy Preserving Data Publishing

2021 ◽  
Author(s):  
Vikas Thammanna Gowda

Although k-Anonymity is a good way to publish microdata for research purposes, it still suffers from various attacks. Hence, many refinements of k-Anonymity have been proposed such as ldiversity and t-Closeness, with t-Closeness being one of the strictest privacy models. Satisfying t-Closeness for a lower value of t may yield equivalence classes with high number of records which results in a greater information loss. For a higher value of t, equivalence classes are still prone to homogeneity, skewness, and similarity attacks. This is because equivalence classes can be formed with fewer distinct sensitive attribute values and still satisfy the constraint t. In this paper, we introduce a new algorithm that overcomes the limitations of k-Anonymity and lDiversity and yields equivalence classes of size k with greater diversity and frequency of a SA value in all the equivalence classes differ by at-most one.

Information ◽  
2020 ◽  
Vol 11 (3) ◽  
pp. 166
Author(s):  
Yuelei Xiao ◽  
Haiqi Li

Privacy preserving data publishing has received considerable attention for publishing useful information while preserving data privacy. The existing privacy preserving data publishing methods for multiple sensitive attributes do not consider the situation that different values of a sensitive attribute may have different sensitivity requirements. To solve this problem, we defined three security levels for different sensitive attribute values that have different sensitivity requirements, and given an L s l -diversity model for multiple sensitive attributes. Following this, we proposed three specific greed algorithms based on the maximal-bucket first (MBF), maximal single-dimension-capacity first (MSDCF) and maximal multi-dimension-capacity first (MMDCF) algorithms and the maximal security-level first (MSLF) greed policy, named as MBF based on MSLF (MBF-MSLF), MSDCF based on MSLF (MSDCF-MSLF) and MMDCF based on MSLF (MMDCF-MSLF), to implement the L s l -diversity model for multiple sensitive attributes. The experimental results show that the three algorithms can greatly reduce the information loss of the published microdata, but their runtime is only a small increase, and their information loss tends to be stable with the increasing of data volume. And they can solve the problem that the information loss of MBF, MSDCF and MMDCF increases greatly with the increasing of sensitive attribute number.


Author(s):  
Udai Pratap Rao ◽  
Brijesh B. Mehta ◽  
Nikhil Kumar

Privacy preserving data publishing is one of the most demanding research areas in the recent few years. There are more than billions of devices capable to collect the data from various sources. To preserve the privacy while publishing data, algorithms for equivalence class generation and scalable anonymization with k-anonymity and l-diversity using MapReduce programming paradigm are proposed in this article. Equivalence class generation algorithms divide the datasets into equivalence classes for Scalable k-Anonymity (SKA) and Scalable l-Diversity (SLD) separately. These equivalence classes are finally fed to the anonymization algorithm that calculates the Gross Cost Penalty (GCP) for the complete dataset. The value of GCP gives information loss in input dataset after anonymization.


2021 ◽  
Author(s):  
Jayapradha J ◽  
Prakash M

Abstract Privacy of the individuals plays a vital role when a dataset is disclosed in public. Privacy-preserving data publishing is a process of releasing the anonymized dataset for various purposes of analysis and research. The data to be published contain several sensitive attributes such as diseases, salary, symptoms, etc. Earlier, researchers have dealt with datasets considering it would contain only one record for an individual [1:1 dataset], which is uncompromising in various applications. Later, many researchers concentrate on the dataset, where an individual has multiple records [1:M dataset]. In the paper, a model f-slip was proposed that can address the various attacks such as Background Knowledge (bk) attack, Multiple Sensitive attribute correlation attack (MSAcorr), Quasi-identifier correlation attack(QIcorr), Non-membership correlation attack(NMcorr) and Membership correlation attack(Mcorr) in 1:M dataset and the solutions for the attacks. In f -slip, the anatomization was performed to divide the table into two subtables consisting of i) quasi-identifier and ii) sensitive attributes. The correlation of sensitive attributes is computed to anonymize the sensitive attributes without breaking the linking relationship. Further, the quasi-identifier table was divided and k-anonymity was implemented on it. An efficient anonymization technique, frequency-slicing (f-slicing), was also developed to anonymize the sensitive attributes. The f -slip model is consistent as the number of records increases. Extensive experiments were performed on a real-world dataset Informs and proved that the f -slip model outstrips the state-of-the-art techniques in terms of utility loss, efficiency and also acquires an optimal balance between privacy and utility.


Information ◽  
2019 ◽  
Vol 10 (12) ◽  
pp. 362
Author(s):  
Widodo ◽  
Eko Kuswardono Budiardjo ◽  
Wahyu Catur Wibowo

Investigation into privacy preserving data publishing with multiple sensitive attributes is performed to reduce probability of adversaries to guess the sensitive values. Masking the sensitive values is usually performed by anonymizing data by using generalization and suppression techniques. A successful anonymization technique should reduce information loss due to the generalization and suppression. This research attempts to solve both problems in microdata with multiple sensitive attributes. We propose a novel overlapped slicing method for privacy preserving data publishing with multiple sensitive attributes. We used discernibility metrics to measure information loss. The experiment result shows that our method obtained a lower discernibility value than other methods.


2015 ◽  
Vol 12 (4) ◽  
pp. 1193-1216 ◽  
Author(s):  
Hong Zhu ◽  
Shengli Tian ◽  
Genyuan Du ◽  
Meiyi Xie

In privacy preserving data publishing, to reduce the correlation loss between sensitive attribute (SA) and non-sensitive attributes(NSAs) caused by anonymization methods (such as generalization, anatomy, slicing and randomization, etc.), the records with same NSAs values should be divided into same blocks to meet the anonymizing demands of ?-diversity. However, there are often many blocks (of the initial partition), in which there are more than ? records with different SA values, and the frequencies of different SA values are uneven. Therefore, anonymization on the initial partition causes more correlation loss. To reduce the correlation loss as far as possible, in this paper, an optimizing model is first proposed. Then according to the optimizing model, the refining partition of the initial partition is generated, and anonymization is applied on the refining partition. Although anonymization on refining partition can be used on top of any existing partitioning method to reduce the correlation loss, we demonstrate that a new partitioning method tailored for refining partition could further improve data utility. An experimental evaluation shows that our approach could efficiently reduce correlation loss.


2020 ◽  
Vol 4 (1) ◽  
pp. 43-48
Author(s):  
Shafa Sya’airillah ◽  
Widodo ◽  
Bambang Prasetya Adhi

Penelitian ini dilatar belakangi oleh teknik anonimitas data yang terdapat pada Privacy Preserving Data Publishing. Sehingga data yang ingin dipublikasikan bersifat anonim, tanpa mengungkap informasi yang sebenarnya. Metode penelitian yang digunakan pada penelitian ini adalah rekayasa teknik dengan cara menghitung nilai information loss yang dihasilkan pada masing-masing algoritma, kemudian membandingkannya. Model yang digunakan pada penelitian ini adalah l-Diversity. Algoritma yang digunakan adalah algoritma Systematic Clustering dan algoritma Datafly. Data yang digunakan adalah dataset ‘Adult’ yang diunduh dari repositori UCI Machine Learning. Sampel yang digunakan dari dataset ‘Adult’ ini adalah sebanyak 2000 tuple. Nilai information loss tertinggi yang dihasilkan algoritma Systematic Clustering adalah 475673.19, sedangkan nilai information loss tertinggi dari algoritma Datafly adalah 46298.00. Kemudian, untuk nilai information loss terendah yang dihasilkan algoritma Systematic Clustering adalah 22364.79, sedangkan nilai information loss terendah dari algoritma Datafly adalah 36659.00. Algoritma dengan tingkat information loss paling kecil dianggap sebagai algoritma yang paling baik dalam membangun model l-Diversity di antara kedua algoritma yang diuji. Hasil pengujian menyatakan bahwa algoritma Systematic Clustering adalah algoritma yang paling baik dalam membangun model l-Diversity di antara algoritma Systematic Clustering dan Datafly.


Author(s):  
Udai Pratap Rao ◽  
Brijesh B. Mehta ◽  
Nikhil Kumar

Privacy preserving data publishing is one of the most demanding research areas in the recent few years. There are more than billions of devices capable to collect the data from various sources. To preserve the privacy while publishing data, algorithms for equivalence class generation and scalable anonymization with k-anonymity and l-diversity using MapReduce programming paradigm are proposed in this article. Equivalence class generation algorithms divide the datasets into equivalence classes for Scalable k-Anonymity (SKA) and Scalable l-Diversity (SLD) separately. These equivalence classes are finally fed to the anonymization algorithm that calculates the Gross Cost Penalty (GCP) for the complete dataset. The value of GCP gives information loss in input dataset after anonymization.


Sign in / Sign up

Export Citation Format

Share Document