A novel utility metric to measure information loss for generalization and suppression techniques in Privacy Preserving Data publishing

Privacy preserving data publishing has received considerable attention as a way to publish useful information while preserving data privacy. Existing privacy preserving data publishing methods for multiple sensitive attributes do not consider that different values of a sensitive attribute may have different sensitivity requirements. To solve this problem, we defined three security levels for sensitive attribute values with different sensitivity requirements and proposed an L_sl-diversity model for multiple sensitive attributes. We then proposed three greedy algorithms that combine the maximal-bucket first (MBF), maximal single-dimension-capacity first (MSDCF) and maximal multi-dimension-capacity first (MMDCF) algorithms with the maximal security-level first (MSLF) greedy policy, named MBF based on MSLF (MBF-MSLF), MSDCF based on MSLF (MSDCF-MSLF) and MMDCF based on MSLF (MMDCF-MSLF), to implement the L_sl-diversity model for multiple sensitive attributes. The experimental results show that the three algorithms greatly reduce the information loss of the published microdata with only a small increase in runtime, and that their information loss tends to stabilize as the data volume increases. They also solve the problem that the information loss of MBF, MSDCF and MMDCF increases greatly as the number of sensitive attributes increases.

Privacy Preserving Data Publishing with Multiple Sensitive Attributes based on Overlapped Slicing

Information ◽

10.3390/info10120362 ◽

2019 ◽

Vol 10 (12) ◽

pp. 362

Author(s):

Widodo ◽

Eko Kuswardono Budiardjo ◽

Wahyu Catur Wibowo

Keyword(s):

Privacy Preserving ◽

Information Loss ◽

Data Publishing ◽

Slicing Method ◽

Privacy Preserving Data Publishing

Research into privacy preserving data publishing with multiple sensitive attributes aims to reduce the probability that adversaries can guess the sensitive values. Sensitive values are usually masked by anonymizing the data with generalization and suppression techniques. A successful anonymization technique should also reduce the information loss caused by generalization and suppression. This research attempts to solve both problems in microdata with multiple sensitive attributes. We propose a novel overlapped slicing method for privacy preserving data publishing with multiple sensitive attributes. We used the discernibility metric to measure information loss. The experimental results show that our method obtained a lower discernibility value than other methods.
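The abstract does not state how the discernibility value is computed. As a minimal sketch, assuming the commonly used form in which each record is charged the size of its equivalence class (so a class of size n contributes n squared) and each suppressed record is charged the size of the whole table, the metric could be computed as follows. The function name `discernibility`, the column layout and the sample rows are hypothetical and are not taken from the paper.

```python
from collections import Counter

def discernibility(records, qi_indices, suppressed=0):
    """Sum of squared equivalence-class sizes, plus a penalty of the
    full table size for every suppressed record (assumed form)."""
    # Records sharing the same quasi-identifier values form one class.
    classes = Counter(tuple(r[i] for i in qi_indices) for r in records)
    total = len(records) + suppressed
    return sum(size * size for size in classes.values()) + suppressed * total

# Hypothetical generalized table: (age range, zip prefix, disease)
table = [
    ("20-29", "130**", "flu"),
    ("20-29", "130**", "cancer"),
    ("30-39", "148**", "flu"),
    ("30-39", "148**", "asthma"),
    ("30-39", "148**", "flu"),
]

dm = discernibility(table, qi_indices=(0, 1))
print(dm)  # 2*2 + 3*3 = 13
```

Lower values mean smaller equivalence classes and therefore less information loss, which is the sense in which the abstract reports a "lower discernibility value".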

ANALISIS MODEL L-DIVERSITY DENGAN ALGORITMA SYSTEMATIC CLUSTERING DAN DATAFLY (Analysis of the l-Diversity Model with the Systematic Clustering and Datafly Algorithms)

PINTER Jurnal Pendidikan Teknik Informatika dan Komputer ◽

10.21009/pinter.4.1.10 ◽

2020 ◽

Vol 4 (1) ◽

pp. 43-48

Author(s):

Shafa Sya’airillah ◽

Widodo ◽

Bambang Prasetya Adhi

Keyword(s):

Machine Learning ◽

Privacy Preserving ◽

Information Loss ◽

Data Publishing ◽

Privacy Preserving Data Publishing

This research is motivated by the data anonymization techniques found in Privacy Preserving Data Publishing, which allow data to be published anonymously without revealing the actual information. The research method is an engineering approach: the information loss produced by each algorithm is computed and the results are then compared. The model used in this research is l-Diversity. The algorithms used are Systematic Clustering and Datafly. The data used is the 'Adult' dataset downloaded from the UCI Machine Learning repository, from which a sample of 2000 tuples was taken. The highest information loss produced by the Systematic Clustering algorithm is 475673.19, whereas the highest information loss of the Datafly algorithm is 46298.00. The lowest information loss produced by the Systematic Clustering algorithm is 22364.79, whereas the lowest information loss of the Datafly algorithm is 36659.00. The algorithm with the smallest information loss is considered the better of the two tested algorithms for building the l-Diversity model. The test results indicate that Systematic Clustering is the better of the two algorithms, Systematic Clustering and Datafly, for building the l-Diversity model.
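For readers unfamiliar with the model, the following is a minimal sketch of a distinct l-diversity check, assuming the simplest variant of the model: every equivalence class must contain at least l distinct sensitive values. It is not the Systematic Clustering or Datafly algorithm from the study. The function name `is_distinct_l_diverse` and the sample rows, loosely styled after 'Adult' attributes, are hypothetical.

```python
from collections import defaultdict

def is_distinct_l_diverse(records, qi_indices, sa_index, l):
    """Return True if every equivalence class (records sharing the same
    quasi-identifier values) has at least l distinct sensitive values."""
    groups = defaultdict(set)
    for r in records:
        key = tuple(r[i] for i in qi_indices)
        groups[key].add(r[sa_index])
    return all(len(values) >= l for values in groups.values())

# Hypothetical generalized rows: (age range, education, occupation)
# Occupation plays the role of the sensitive attribute here.
table = [
    ("30-39", "Bachelors", "Sales"),
    ("30-39", "Bachelors", "Tech-support"),
    ("40-49", "HS-grad", "Sales"),
    ("40-49", "HS-grad", "Craft-repair"),
    ("40-49", "HS-grad", "Sales"),
]

print(is_distinct_l_diverse(table, (0, 1), 2, l=2))  # True
print(is_distinct_l_diverse(table, (0, 1), 2, l=3))  # False
```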

Stack and Deal: An Efficient Algorithm for Privacy Preserving Data Publishing

10.5121/csit.2021.111111 ◽

2021 ◽

Author(s):

Vikas Thammanna Gowda

Keyword(s):

Efficient Algorithm ◽

Privacy Preserving ◽

Information Loss ◽

Equivalence Classes ◽

Data Publishing ◽

Sensitive Attribute ◽

Privacy Preserving Data Publishing ◽

Privacy Models

Although k-Anonymity is a good way to publish microdata for research purposes, it still suffers from various attacks. Hence, many refinements of k-Anonymity have been proposed, such as l-Diversity and t-Closeness, with t-Closeness being one of the strictest privacy models. Satisfying t-Closeness for a lower value of t may yield equivalence classes with a high number of records, which results in greater information loss. For a higher value of t, equivalence classes are still prone to homogeneity, skewness, and similarity attacks. This is because equivalence classes can be formed with fewer distinct sensitive attribute values and still satisfy the constraint t. In this paper, we introduce a new algorithm that overcomes the limitations of k-Anonymity and l-Diversity and yields equivalence classes of size k with greater diversity, in which the frequency of a sensitive attribute (SA) value differs by at most one across all equivalence classes.
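The abstract does not describe the algorithm's steps. The sketch below is only an illustration of the stated property, not the paper's method: it assumes records are sorted by sensitive value and then dealt round-robin into classes, which guarantees that the count of any SA value differs by at most one between classes. The function name `deal_into_classes` and the sample records are hypothetical.

```python
from collections import Counter

def deal_into_classes(records, sa_index, k):
    """Sort records by sensitive value, then deal them round-robin into
    len(records) // k classes (illustrative assumption, not the paper's
    algorithm). Each run of equal SA values is spread evenly."""
    m = max(1, len(records) // k)          # number of classes
    ordered = sorted(records, key=lambda r: r[sa_index])
    classes = [[] for _ in range(m)]
    for pos, rec in enumerate(ordered):
        classes[pos % m].append(rec)
    return classes

# Hypothetical records: (record id, disease)
records = [
    (1, "flu"), (2, "cancer"), (3, "flu"),
    (4, "asthma"), (5, "flu"), (6, "cancer"),
]

classes = deal_into_classes(records, sa_index=1, k=3)
for c in classes:
    # Show the size of each class and its SA value counts.
    print(len(c), dict(Counter(r[1] for r in c)))
```

When the number of records is a multiple of k, every class has exactly k records. Otherwise the leftover records are spread over the classes, so some classes are larger than k.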

A Unified Metric Method of Information Loss in Privacy Preserving Data Publishing

2010 Second International Conference on Networks Security, Wireless Communications and Trusted Computing ◽

10.1109/nswctc.2010.258 ◽

2010 ◽

Cited By ~ 1

Author(s):

Lv Pin ◽

Yu Wen-bing ◽

Chen Nian-sheng

Keyword(s):

Privacy Preserving ◽

Information Loss ◽

Data Publishing ◽

Privacy Preserving Data Publishing

An optimal dynamic KCi-slice model for privacy preserving data publishing of multiple sensitive attributes adopting various sensitivity thresholds

International Journal of Data Science ◽

10.1504/ijds.2019.105264 ◽

2019 ◽

Vol 4 (4) ◽

pp. 320

Author(s):

N.V.S. Lakshmipathi Raju ◽

M.N. Seetaramanath ◽

P. Srinivasa Rao

Keyword(s):

Privacy Preserving ◽

Data Publishing ◽

Optimal Dynamic ◽

Privacy Preserving Data Publishing

Privacy preserving data publishing of categorical data through k-anonymity and feature selection

Healthcare Technology Letters ◽

10.1049/htl.2015.0050 ◽

2016 ◽

Vol 3 (1) ◽

pp. 16-21 ◽

Cited By ~ 10

Author(s):

Aristos Aristodimou ◽

Athos Antoniades ◽

Constantinos S. Pattichis

Keyword(s):

Feature Selection ◽

Categorical Data ◽

Privacy Preserving ◽

Data Publishing ◽

Privacy Preserving Data Publishing

Anonymization Based on Improved Bucketization (AIB): A Privacy-Preserving Data Publishing Technique for Improving Data Utility in Healthcare Data

Journal of Medical Imaging and Health Informatics ◽

10.1166/jmihi.2021.3901 ◽

2021 ◽

Vol 11 (12) ◽

pp. 3164-3173

Author(s):

R. Indhumathi ◽

S. Sathiya Devi

Keyword(s):

Medical Information ◽

Threshold Value ◽

Privacy Preserving ◽

Data Publishing ◽

Published Data ◽

Sensitive Information ◽

Data Utility ◽

Healthcare Data ◽

Privacy Preserving Data Publishing ◽

Horizontal Partitioning

Data sharing is essential in present-day biomedical research. A large quantity of medical information is gathered for different objectives of analysis and study. Because so much data is collected, anonymity is essential, and it is important to preserve privacy and prevent leakage of patients' sensitive information. Most anonymization methods, such as generalisation, suppression and perturbation, are proposed to prevent information leakage, but they degrade the utility of the collected data. During data sanitization, the utility is automatically diminished. The main difficulty in Privacy Preserving Data Publishing is maintaining the tradeoff between privacy and data utility. To address this issue, an efficient algorithm called Anonymization based on Improved Bucketization (AIB) is proposed, which increases the utility of published data while maintaining privacy. The Bucketization technique is used in this paper in combination with a clustering method. The proposed work is divided into the following stages: (i) vertical and horizontal partitioning, (ii) assigning a sensitive index to attributes in the cluster, (iii) verifying each cluster against the privacy threshold, and (iv) examining for privacy breaches in the Quasi Identifier (QI). To increase the utility of published data, the threshold value is determined based on the distribution of elements in each attribute, and the anonymization method is applied only to the specific QI element. As a result, the data utility has been improved. Finally, the evaluation results validated the design and demonstrated that it is effective in improving data utility.
