Scalable l-Diversity

Author(s):  
Udai Pratap Rao ◽  
Brijesh B. Mehta ◽  
Nikhil Kumar

Privacy preserving data publishing has been one of the most active research areas in recent years, with billions of devices capable of collecting data from various sources. To preserve privacy while publishing data, this article proposes algorithms for equivalence class generation and scalable anonymization with k-anonymity and l-diversity using the MapReduce programming paradigm. The equivalence class generation algorithms partition the dataset into equivalence classes for Scalable k-Anonymity (SKA) and Scalable l-Diversity (SLD) separately. These equivalence classes are then fed to the anonymization algorithm, which calculates the Gross Cost Penalty (GCP) for the complete dataset. The GCP value quantifies the information loss incurred by anonymizing the input dataset.
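
As a hedged illustration of the two steps described above, the sketch below groups records into equivalence classes by their quasi-identifier values and computes a GCP-style, normalized-certainty-penalty information-loss score. The column names, the numeric-only penalty, and the flat-dictionary record format are assumptions for illustration, not the authors' MapReduce implementation.

```python
# Hedged sketch: group records into equivalence classes by quasi-identifier
# values and score the result with a GCP-style normalized certainty penalty.
# QID names, numeric-only attributes, and the penalty form are illustrative
# assumptions, not the authors' MapReduce implementation.
from collections import defaultdict

QUASI_IDENTIFIERS = ["age", "zipcode"]  # assumed quasi-identifiers

def equivalence_classes(records):
    """Group records that share identical (generalized) QID values."""
    classes = defaultdict(list)
    for rec in records:
        classes[tuple(rec[q] for q in QUASI_IDENTIFIERS)].append(rec)
    return list(classes.values())

def gcp(classes, domain_ranges, total_records):
    """0 means no generalization; values grow toward 1 as QID values in each
    class spread over more of their attribute's full domain range."""
    penalty = 0.0
    for eq in classes:
        ncp = 0.0
        for q in QUASI_IDENTIFIERS:
            values = [rec[q] for rec in eq]
            ncp += (max(values) - min(values)) / domain_ranges[q]
        penalty += len(eq) * (ncp / len(QUASI_IDENTIFIERS))
    return penalty / total_records
```

In a MapReduce realization, the grouping step corresponds naturally to mapping each record to its QID tuple as the key and reducing per key.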


Author(s):  
Deepak Narula ◽  
Pardeep Kumar ◽  
Shuchita Upadhyaya

In the modern era, protecting individuals' privacy is a constant concern, as huge volumes of electronic data are gathered daily. Securing the gathered data is therefore not only a practical concern but also a notable topic of research. Privacy Preserving Data Publishing (PPDP) aims to make published data accessible without disclosing information about an individual that is not required. PPDP thus faces the problem of publishing useful data while keeping sensitive information about individuals private. A variety of anonymization techniques can be found in the literature, but they suffer from different kinds of problems in terms of information loss, discernibility, and average equivalence class size. This paper proposes an amalgamated approach and verifies it with respect to information loss, the discernibility value, and the average equivalence class size metric. The results are encouraging compared to existing k-anonymity based algorithms such as Datafly, Mondrian, and Incognito on various publicly available datasets.
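
The two utility metrics named here have standard definitions in the k-anonymity literature; the sketch below shows those common forms as a reference point, not the paper's own code.

```python
# Standard forms of the two utility metrics named in the abstract, as commonly
# defined in the k-anonymity literature (not taken from the paper itself).

def discernibility_metric(classes):
    """C_DM: every record is penalized by the size of the equivalence class
    hiding it, so the metric is the sum of |EC|^2 over all classes."""
    return sum(len(ec) ** 2 for ec in classes)

def average_class_size_metric(classes, k):
    """C_AVG: average equivalence class size relative to the minimum
    acceptable size k; 1.0 is the ideal value."""
    total_records = sum(len(ec) for ec in classes)
    return total_records / (len(classes) * k)

# Example: classes of sizes 4, 5 and 6 under k = 3
classes = [[None] * 4, [None] * 5, [None] * 6]
print(discernibility_metric(classes))         # 16 + 25 + 36 = 77
print(average_class_size_metric(classes, 3))  # 15 / (3 * 3) ≈ 1.67
```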


2021 ◽  
Author(s):  
Vikas Thammanna Gowda

Although k-Anonymity is a good way to publish microdata for research purposes, it still suffers from various attacks. Hence, many refinements of k-Anonymity have been proposed, such as l-Diversity and t-Closeness, with t-Closeness being one of the strictest privacy models. Satisfying t-Closeness for a lower value of t may yield equivalence classes with a high number of records, which results in greater information loss. For a higher value of t, equivalence classes are still prone to homogeneity, skewness, and similarity attacks, because equivalence classes can be formed with fewer distinct sensitive attribute values and still satisfy the constraint t. In this paper, we introduce a new algorithm that overcomes the limitations of k-Anonymity and l-Diversity and yields equivalence classes of size k with greater diversity, in which the frequency of each sensitive attribute (SA) value differs by at most one across the equivalence classes.
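
The balance property claimed here, each SA value's frequency differing by at most one across equivalence classes, can be illustrated with a simple round-robin distribution. The sketch below shows that idea only; it is not the algorithm proposed in the paper and does not enforce the fixed class size k.

```python
# Hedged sketch: distribute records so that, for every sensitive-attribute (SA)
# value, its count differs by at most one between equivalence classes.
# Illustration of the stated property, not the paper's algorithm.
from collections import defaultdict

def balanced_classes(records, sa_attr, num_classes):
    """Deal records of each SA value round-robin across num_classes buckets."""
    by_value = defaultdict(list)
    for rec in records:
        by_value[rec[sa_attr]].append(rec)

    classes = [[] for _ in range(num_classes)]
    for value_group in by_value.values():
        for i, rec in enumerate(value_group):
            classes[i % num_classes].append(rec)  # round-robin keeps counts within 1
    return classes

records = [{"disease": d} for d in ["flu"] * 5 + ["asthma"] * 4 + ["cancer"] * 3]
for ec in balanced_classes(records, "disease", 3):
    print(sorted(r["disease"] for r in ec))
```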


Information ◽  
2020 ◽  
Vol 11 (3) ◽  
pp. 166
Author(s):  
Yuelei Xiao ◽  
Haiqi Li

Privacy preserving data publishing has received considerable attention for publishing useful information while preserving data privacy. Existing privacy preserving data publishing methods for multiple sensitive attributes do not consider the situation in which different values of a sensitive attribute may have different sensitivity requirements. To solve this problem, we define three security levels for sensitive attribute values with different sensitivity requirements and give an Lsl-diversity model for multiple sensitive attributes. Following this, we propose three specific greedy algorithms based on the maximal-bucket first (MBF), maximal single-dimension-capacity first (MSDCF) and maximal multi-dimension-capacity first (MMDCF) algorithms and the maximal security-level first (MSLF) greedy policy, named MBF based on MSLF (MBF-MSLF), MSDCF based on MSLF (MSDCF-MSLF) and MMDCF based on MSLF (MMDCF-MSLF), to implement the Lsl-diversity model for multiple sensitive attributes. The experimental results show that the three algorithms greatly reduce the information loss of the published microdata with only a small increase in runtime, and that their information loss tends to be stable as the data volume increases. They also solve the problem that the information loss of MBF, MSDCF and MMDCF increases greatly as the number of sensitive attributes increases.
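
The sketch below illustrates the plain maximal-bucket-first (MBF) grouping that these algorithms build on, assuming dictionary records and omitting the security-level (MSLF) ordering that the paper adds; it is a simplified aid, not the proposed MBF-MSLF family.

```python
# Hedged sketch of the maximal-bucket-first (MBF) idea: records are bucketed by
# their sensitive-value combination, and each output group takes one record
# from each of the l currently largest buckets, so every group contains l
# distinct sensitive combinations. The MSLF-based variants in the paper
# additionally order buckets by security level, which is omitted here.
from collections import defaultdict

def mbf_groups(records, sensitive_attrs, l):
    buckets = defaultdict(list)
    for rec in records:
        buckets[tuple(rec[a] for a in sensitive_attrs)].append(rec)

    groups = []
    while sum(1 for b in buckets.values() if b) >= l:
        # pick the l largest non-empty buckets and take one record from each
        largest = sorted((b for b in buckets.values() if b), key=len, reverse=True)[:l]
        groups.append([b.pop() for b in largest])
    leftover = [rec for b in buckets.values() for rec in b]  # could not be grouped
    return groups, leftover
```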


Information ◽  
2019 ◽  
Vol 10 (12) ◽  
pp. 362
Author(s):  
Widodo ◽  
Eko Kuswardono Budiardjo ◽  
Wahyu Catur Wibowo

Privacy preserving data publishing with multiple sensitive attributes is investigated to reduce the probability that adversaries can guess the sensitive values. Masking the sensitive values is usually performed by anonymizing the data using generalization and suppression techniques, and a successful anonymization technique should reduce the information loss these techniques cause. This research attempts to solve both problems in microdata with multiple sensitive attributes. We propose a novel overlapped slicing method for privacy preserving data publishing with multiple sensitive attributes, and we use the discernibility metric to measure information loss. The experimental results show that our method obtains a lower discernibility value than other methods.
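
Overlapped slicing extends the general slicing idea of publishing attribute columns whose values are permuted within tuple buckets; the sketch below illustrates that baseline idea under an assumed column layout, not the authors' overlapped variant.

```python
# Hedged sketch of the general slicing idea that overlapped slicing extends:
# attributes are partitioned into columns, tuples into buckets, and the column
# values are independently permuted within each bucket to break the linkage
# between quasi-identifiers and sensitive values. In overlapped slicing an
# attribute may appear in more than one column; the layout here is assumed.
import random

def slice_table(records, columns, bucket_size):
    """columns: list of attribute-name lists, e.g. [["age", "zip"], ["disease"]]."""
    sliced = []
    for start in range(0, len(records), bucket_size):
        bucket = records[start:start + bucket_size]
        pieces = []
        for attrs in columns:
            vals = [tuple(rec[a] for a in attrs) for rec in bucket]
            random.shuffle(vals)          # permute this column within the bucket
            pieces.append(vals)
        sliced.extend(zip(*pieces))       # published rows pair up permuted columns
    return sliced
```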


2020 ◽  
Vol 4 (1) ◽  
pp. 43-48
Author(s):  
Shafa Sya’airillah ◽  
Widodo ◽  
Bambang Prasetya Adhi

This research is motivated by the data anonymization techniques used in Privacy Preserving Data Publishing, so that data intended for publication remains anonymous without revealing the actual information. The research method is an engineering approach in which the information loss produced by each algorithm is calculated and then compared. The model used in this research is l-Diversity, and the algorithms used are the Systematic Clustering algorithm and the Datafly algorithm. The data used is the 'Adult' dataset downloaded from the UCI Machine Learning repository, from which a sample of 2000 tuples was taken. The highest information loss value produced by the Systematic Clustering algorithm is 475673.19, while the highest information loss value of the Datafly algorithm is 46298.00. The lowest information loss value produced by the Systematic Clustering algorithm is 22364.79, while the lowest information loss value of the Datafly algorithm is 36659.00. The algorithm with the smallest information loss is considered the better algorithm for building the l-Diversity model among the two algorithms tested. The test results show that the Systematic Clustering algorithm is the better of the two algorithms, Systematic Clustering and Datafly, for building the l-Diversity model.
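
As context for the comparison, the sketch below shows the distinct l-diversity condition that each equivalence class produced by either algorithm is expected to satisfy before information loss is measured; the attribute name and sample clusters are illustrative assumptions.

```python
# Hedged sketch: the distinct l-diversity check that each equivalence class
# produced by an anonymization algorithm must satisfy. Attribute name and the
# example clusters are illustrative assumptions.

def is_l_diverse(cluster, sensitive_attr, l):
    """A cluster satisfies distinct l-diversity if it holds at least l
    distinct values of the sensitive attribute."""
    return len({rec[sensitive_attr] for rec in cluster}) >= l

clusters = [
    [{"occupation": "sales"}, {"occupation": "tech"}, {"occupation": "sales"}],
    [{"occupation": "sales"}, {"occupation": "sales"}],
]
print([is_l_diverse(c, "occupation", 2) for c in clusters])  # [True, False]
```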


Author(s):  
Yogesh R. Kulkarni ◽  
T. Senthil Murugan

Data publishing is an area of interest in present-day technology that has gained huge attention from researchers and experts. Data publishing faces many security issues: when a trusted organization provides data to a third party, personal information should not be disclosed. Therefore, to maintain the privacy of the data, this paper proposes an algorithm for privacy-preserved collaborative data publishing using the Genetic Grey Wolf Optimizer (Genetic GWO), for which a C-mixture parameter is used. The C-mixture parameter enhances the privacy of the data if the data does not satisfy privacy constraints such as k-anonymity, l-diversity and m-privacy. A minimum fitness value is maintained that depends on the minimum value of the generalized information loss and the minimum value of the average equivalence class size; minimizing the fitness ensures maximum utility and maximum privacy. Experimentation was carried out using the adult dataset, and the proposed Genetic GWO outperformed the existing methods in terms of the generalized information loss and the average equivalence class metric, achieving minimum values of 0.402 and 0.9, respectively.
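
A hedged sketch of the kind of fitness function described here, combining normalized generalized information loss with the average equivalence class size metric; the weights and normalizations are assumptions, not the paper's exact formulation.

```python
# Hedged sketch of a fitness function of the kind described in the abstract:
# a weighted combination of generalized information loss (GIL) and the average
# equivalence class size metric, both normalized so that lower is better.
# The 0.5/0.5 weighting and the normalizations are illustrative assumptions.

def fitness(gil, c_avg, max_gil, max_c_avg, w_gil=0.5, w_cavg=0.5):
    """Lower fitness = less information loss and smaller equivalence classes,
    i.e. higher utility for the anonymized release."""
    return w_gil * (gil / max_gil) + w_cavg * (c_avg / max_c_avg)

# A GWO-style optimizer would evaluate this fitness for each candidate
# anonymization (wolf position) and keep the candidate with the minimum value.
print(fitness(gil=0.402, c_avg=0.9, max_gil=1.0, max_c_avg=2.0))  # 0.426
```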

