Stack and Deal: An Efficient Algorithm for Privacy Preserving Data Publishing

Privacy Preserving Data Publishing for Multiple Sensitive Attributes Based on Security Level

Information ◽

10.3390/info11030166 ◽

2020 ◽

Vol 11 (3) ◽

pp. 166

Author(s):

Yuelei Xiao ◽

Haiqi Li

Keyword(s):

Data Privacy ◽

Privacy Preserving ◽

Information Loss ◽

Experimental Results ◽

Data Publishing ◽

Security Level ◽

Sensitive Attribute ◽

Data Volume ◽

Security Levels ◽

Privacy Preserving Data Publishing

Privacy preserving data publishing has received considerable attention as a way to publish useful information while preserving data privacy. Existing privacy preserving data publishing methods for multiple sensitive attributes do not consider that different values of a sensitive attribute may have different sensitivity requirements. To solve this problem, we defined three security levels for sensitive attribute values that have different sensitivity requirements, and gave an $L_{sl}$-diversity model for multiple sensitive attributes. Following this, we proposed three specific greedy algorithms based on the maximal-bucket first (MBF), maximal single-dimension-capacity first (MSDCF) and maximal multi-dimension-capacity first (MMDCF) algorithms and the maximal security-level first (MSLF) greedy policy, named MBF based on MSLF (MBF-MSLF), MSDCF based on MSLF (MSDCF-MSLF) and MMDCF based on MSLF (MMDCF-MSLF), to implement the $L_{sl}$-diversity model for multiple sensitive attributes. The experimental results show that the three algorithms greatly reduce the information loss of the published microdata with only a small increase in runtime, and that their information loss tends to stabilize as the data volume increases. They also solve the problem that the information loss of MBF, MSDCF and MMDCF increases greatly as the number of sensitive attributes increases.
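As a rough illustration of what a diversity requirement over multiple sensitive attributes means, the sketch below checks plain distinct l-diversity: every published group must contain at least l distinct values of each sensitive attribute. It is not the security-level-aware model or any of the greedy algorithms from the abstract, whose definitions are not given here. The function name, the `group` field and the sample records are assumptions made for the example.

```python
from collections import defaultdict

def is_distinct_l_diverse(records, group_key, sensitive_attrs, l):
    """Return True if every group holds at least l distinct values
    of each sensitive attribute (plain distinct l-diversity)."""
    # seen[group][attribute] collects the distinct values observed so far.
    seen = defaultdict(lambda: defaultdict(set))
    for rec in records:
        for attr in sensitive_attrs:
            seen[rec[group_key]][attr].add(rec[attr])
    return all(
        len(values) >= l
        for per_attr in seen.values()
        for values in per_attr.values()
    )

# Hypothetical microdata: two groups, two sensitive attributes.
records = [
    {"group": 1, "disease": "flu", "salary": "low"},
    {"group": 1, "disease": "HIV", "salary": "high"},
    {"group": 2, "disease": "flu", "salary": "low"},
    {"group": 2, "disease": "flu", "salary": "mid"},
]

# Group 2 has a single disease value, so the whole table fails for l = 2.
print(is_distinct_l_diverse(records, "group", ["disease", "salary"], 2))
```

A model with security levels would presumably tighten this check for more sensitive values, but how it does so is specified in the paper rather than in this listing.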


Duplication with Trapdoor Sensitive Attribute Values: A New Approach for Privacy Preserving Data Publishing

Procedia Technology ◽

10.1016/j.protcy.2012.10.118 ◽

2012 ◽

Vol 6 ◽

pp. 970-977

Author(s):

B.R. Purushothama ◽

B.B. Amberker

Keyword(s):

Privacy Preserving ◽

Data Publishing ◽

New Approach ◽

Sensitive Attribute ◽

Privacy Preserving Data Publishing


Scalable l-Diversity

International Journal of Information Technology and Web Engineering ◽

10.4018/ijitwe.2019040102 ◽

2019 ◽

Vol 14 (2) ◽

pp. 27-40

Author(s):

Udai Pratap Rao ◽

Brijesh B. Mehta ◽

Nikhil Kumar

Keyword(s):

Equivalence Class ◽

Information Loss ◽

Equivalence Classes ◽

Data Publishing ◽

Programming Paradigm ◽

Research Areas ◽

Complete Dataset ◽

Input Dataset ◽

Privacy Preserving Data Publishing ◽

Cost Penalty

Privacy preserving data publishing has been one of the most in-demand research areas in recent years. Billions of devices are capable of collecting data from various sources. To preserve privacy while publishing data, this article proposes algorithms for equivalence class generation and scalable anonymization with k-anonymity and l-diversity using the MapReduce programming paradigm. The equivalence class generation algorithms divide the dataset into equivalence classes for Scalable k-Anonymity (SKA) and Scalable l-Diversity (SLD) separately. These equivalence classes are then fed to the anonymization algorithm, which calculates the Gross Cost Penalty (GCP) for the complete dataset. The GCP value gives the information loss in the input dataset after anonymization.
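To make the notion of an equivalence class concrete, here is a minimal single-machine sketch that groups records by their quasi-identifier values and checks k-anonymity. It is not the article's MapReduce implementation, although the grouping step loosely mirrors emitting a key per record and collecting records per key. The function names, column names and sample table are assumptions.

```python
from collections import defaultdict

def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for rec in records:
        key = tuple(rec[q] for q in quasi_identifiers)
        classes[key].append(rec)
    return dict(classes)

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every equivalence class has at least k records."""
    classes = equivalence_classes(records, quasi_identifiers)
    return all(len(members) >= k for members in classes.values())

# Hypothetical table whose quasi-identifiers are already generalized.
table = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "cancer"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
]

# The ("40-49", "148**") class holds a single record, so k = 2 fails.
print(is_k_anonymous(table, ["age", "zip"], 2))
```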


f-Slip: An Efficient Privacy-Preserving Data Publishing Framework for 1: M Microdata with Multiple Sensitive Attributes.

10.21203/rs.3.rs-660451/v1 ◽

2021 ◽

Author(s):

Jayapradha J ◽

Prakash M

Keyword(s):

Privacy Preserving ◽

Vital Role ◽

Data Publishing ◽

Slip Model ◽

Correlation Attack ◽

Sensitive Attribute ◽

Utility Loss ◽

Privacy Preserving Data Publishing ◽

Loss Efficiency ◽

Attribute Correlation

The privacy of individuals plays a vital role when a dataset is disclosed to the public. Privacy-preserving data publishing is the process of releasing an anonymized dataset for analysis and research. The data to be published contain several sensitive attributes such as diseases, salary and symptoms. Earlier work assumed that a dataset contains only one record per individual (a 1:1 dataset), an assumption that is too rigid for many applications. Later, many researchers concentrated on datasets in which an individual has multiple records (a 1:M dataset). In this paper, a model called f-slip was proposed that addresses several attacks on 1:M datasets, namely the Background Knowledge (bk) attack, the Multiple Sensitive attribute correlation attack (MSAcorr), the Quasi-identifier correlation attack (QIcorr), the Non-membership correlation attack (NMcorr) and the Membership correlation attack (Mcorr), together with solutions for these attacks. In f-slip, anatomization was performed to divide the table into two subtables consisting of (i) quasi-identifiers and (ii) sensitive attributes. The correlation of sensitive attributes was computed to anonymize the sensitive attributes without breaking the linking relationship. Further, the quasi-identifier table was divided and k-anonymity was implemented on it. An efficient anonymization technique, frequency-slicing (f-slicing), was also developed to anonymize the sensitive attributes. The f-slip model remains consistent as the number of records increases. Extensive experiments on the real-world Informs dataset showed that the f-slip model outperforms state-of-the-art techniques in terms of utility loss and efficiency, and achieves an optimal balance between privacy and utility.
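The anatomization step can be pictured with the following sketch, which splits one table into a quasi-identifier table and a sensitive-attribute table linked only by a group identifier. It is a simplified illustration, not the f-slip procedure: groups are formed from consecutive records purely as a placeholder, and the correlation analysis and f-slicing are not shown. All names and sample values are assumptions.

```python
def anatomize(records, quasi_identifiers, sensitive_attrs, group_size):
    """Split records into a quasi-identifier table and a sensitive table
    that are linked only through a shared group_id."""
    qi_table, sa_table = [], []
    for index, rec in enumerate(records):
        # Placeholder grouping: consecutive records share a group.
        group_id = index // group_size
        qi_table.append({"group_id": group_id,
                         **{q: rec[q] for q in quasi_identifiers}})
        sa_table.append({"group_id": group_id,
                         **{s: rec[s] for s in sensitive_attrs}})
    # Reorder sensitive rows within each group so that row position
    # no longer reveals which quasi-identifier row they came from.
    sa_table.sort(key=lambda row: (row["group_id"],
                                   tuple(str(row[s]) for s in sensitive_attrs)))
    return qi_table, sa_table

# Hypothetical patient records.
patients = [
    {"age": 34, "zip": "13053", "disease": "flu", "salary": 40},
    {"age": 36, "zip": "13068", "disease": "asthma", "salary": 55},
    {"age": 47, "zip": "14850", "disease": "flu", "salary": 60},
    {"age": 49, "zip": "14853", "disease": "cancer", "salary": 30},
]

qi_table, sa_table = anatomize(patients, ["age", "zip"], ["disease", "salary"], 2)
print(qi_table)
print(sa_table)
```

Within a group of size two, an observer who sees both tables can link a quasi-identifier row to each sensitive row only with probability one half, which is the intuition behind publishing the two tables separately.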


Privacy Preserving Data Publishing with Multiple Sensitive Attributes based on Overlapped Slicing

Information ◽

10.3390/info10120362 ◽

2019 ◽

Vol 10 (12) ◽

pp. 362

Author(s):

Widodo ◽

Eko Kuswardono Budiardjo ◽

Wahyu Catur Wibowo

Keyword(s):

Privacy Preserving ◽

Information Loss ◽

Data Publishing ◽

Slicing Method ◽

Privacy Preserving Data Publishing

Research into privacy preserving data publishing with multiple sensitive attributes aims to reduce the probability that adversaries can guess the sensitive values. Sensitive values are usually masked by anonymizing the data with generalization and suppression techniques. A successful anonymization technique should also reduce the information loss caused by generalization and suppression. This research attempts to solve both problems in microdata with multiple sensitive attributes. We propose a novel overlapped slicing method for privacy preserving data publishing with multiple sensitive attributes. We used the discernibility metric to measure information loss. The experimental results show that our method obtained a lower discernibility value than other methods.
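The abstract does not spell out its discernibility formula. The sketch below assumes the commonly used form without suppressed records, in which each record is charged the size of its equivalence class, so the total is the sum of squared class sizes. Lower values mean records are less interchangeable and less information is lost. The function name and sample table are assumptions.

```python
from collections import Counter

def discernibility(records, quasi_identifiers):
    """Sum of squared equivalence-class sizes (no suppressed records)."""
    sizes = Counter(tuple(rec[q] for q in quasi_identifiers) for rec in records)
    return sum(size * size for size in sizes.values())

# Hypothetical generalized table: one class of size 2, one of size 1.
released = [
    {"age": "30-39", "zip": "130**"},
    {"age": "30-39", "zip": "130**"},
    {"age": "40-49", "zip": "148**"},
]

print(discernibility(released, ["age", "zip"]))  # 2*2 + 1*1 = 5
```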


Anonymization on refining partition: Same privacy, more utility

Computer Science and Information Systems ◽

10.2298/csis141212052z ◽

2015 ◽

Vol 12 (4) ◽

pp. 1193-1216 ◽

Cited By ~ 1

Author(s):

Hong Zhu ◽

Shengli Tian ◽

Genyuan Du ◽

Meiyi Xie

Keyword(s):

Experimental Evaluation ◽

Privacy Preserving ◽

Data Publishing ◽

Data Utility ◽

Sensitive Attribute ◽

The Optimizing Model ◽

Privacy Preserving Data Publishing ◽

Initial Partition ◽

Optimizing Model

In privacy preserving data publishing, to reduce the correlation loss between the sensitive attribute (SA) and the non-sensitive attributes (NSAs) caused by anonymization methods (such as generalization, anatomy, slicing and randomization), records with the same NSA values should be placed in the same blocks to meet the anonymization demands of l-diversity. However, the initial partition often contains many blocks in which there are more than l records with different SA values and in which the frequencies of the different SA values are uneven. Anonymization on the initial partition therefore causes more correlation loss. To reduce the correlation loss as far as possible, this paper first proposes an optimizing model. Then, according to the optimizing model, a refining partition of the initial partition is generated, and anonymization is applied on the refining partition. Although anonymization on the refining partition can be used on top of any existing partitioning method to reduce the correlation loss, we demonstrate that a new partitioning method tailored for the refining partition can further improve data utility. An experimental evaluation shows that our approach can efficiently reduce correlation loss.


Analysis of the l-Diversity Model with the Systematic Clustering and Datafly Algorithms

PINTER Jurnal Pendidikan Teknik Informatika dan Komputer ◽

10.21009/pinter.4.1.10 ◽

2020 ◽

Vol 4 (1) ◽

pp. 43-48

Author(s):

Shafa Sya’airillah ◽

Widodo ◽

Bambang Prasetya Adhi

Keyword(s):

Machine Learning ◽

Privacy Preserving ◽

Information Loss ◽

Data Publishing ◽

Privacy Preserving Data Publishing

This research is motivated by the data anonymization techniques found in Privacy Preserving Data Publishing, which make the data to be published anonymous without revealing the actual information. The research method is an engineering approach: the information loss produced by each algorithm is calculated and the results are compared. The model used in this research is l-Diversity. The algorithms used are the Systematic Clustering algorithm and the Datafly algorithm. The data used is the 'Adult' dataset downloaded from the UCI Machine Learning repository. The sample taken from the 'Adult' dataset consists of 2000 tuples. The highest information loss produced by the Systematic Clustering algorithm is 475673.19, while the highest information loss of the Datafly algorithm is 46298.00. The lowest information loss produced by the Systematic Clustering algorithm is 22364.79, while the lowest information loss of the Datafly algorithm is 36659.00. The algorithm with the smallest information loss is considered the best at building the l-Diversity model of the two algorithms tested. The test results state that Systematic Clustering is the better of the two algorithms, Systematic Clustering and Datafly, at building the l-Diversity model.


Scalable l-Diversity

Research Anthology on Privatizing and Securing Data ◽

10.4018/978-1-7998-8954-0.ch048 ◽

2021 ◽

pp. 1051-1065

Author(s):

Udai Pratap Rao ◽

Brijesh B. Mehta ◽

Nikhil Kumar

Keyword(s):

Equivalence Class ◽

Information Loss ◽

Equivalence Classes ◽

Data Publishing ◽

Programming Paradigm ◽

Research Areas ◽

Complete Dataset ◽

Input Dataset ◽

Privacy Preserving Data Publishing ◽

Cost Penalty

Privacy preserving data publishing has been one of the most in-demand research areas in recent years. Billions of devices are capable of collecting data from various sources. To preserve privacy while publishing data, this article proposes algorithms for equivalence class generation and scalable anonymization with k-anonymity and l-diversity using the MapReduce programming paradigm. The equivalence class generation algorithms divide the dataset into equivalence classes for Scalable k-Anonymity (SKA) and Scalable l-Diversity (SLD) separately. These equivalence classes are then fed to the anonymization algorithm, which calculates the Gross Cost Penalty (GCP) for the complete dataset. The GCP value gives the information loss in the input dataset after anonymization.
