Post-Masking Optimization of the Tradeoff between Information Loss and Disclosure Risk in Masked Microdata Sets

Author(s):  
Francesc Sebé ◽  
Josep Domingo-Ferrer ◽  
Josep Maria Mateo-Sanz ◽  
Vicenç Torra


Data ◽
2021 ◽  
Vol 6 (5) ◽  
pp. 53
Author(s):  
Ebaa Fayyoumi ◽  
Omar Alhuniti

This research investigates the micro-aggregation problem in secure statistical databases by integrating the divide-and-conquer concept with a genetic algorithm. This is achieved by recursively dividing a micro-data set into two subsets based on proximity-distance similarity. On each subset, the genetic "crossover" operation is performed until the convergence condition is satisfied. The recursion terminates once the generated subsets reach the required size. Finally, the genetic "mutation" operation is performed over all generated subsets that satisfy the variable group-size constraint, in order to maximize the objective function. Experimentally, the proposed micro-aggregation technique was applied to recommended real-life data sets. The results demonstrate a remarkable reduction in computational time, in some cases exceeding 70% compared with the state of the art. Furthermore, a good equilibrium value of the Scoring Index (SI), a linear combination of the General Information Loss (GIL) and the General Disclosure Risk (GDR), was achieved.
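
The abstract outlines the algorithm but gives no code. Below is a minimal Python sketch, not the authors' implementation, of two of its ingredients: the proximity-based recursive bisection and the Scoring Index as a linear combination of GIL and GDR. The subset-size cut-off, the SSE-style loss proxy, and the equal weighting of GIL and GDR are illustrative assumptions; the genetic crossover and mutation steps are omitted.

```python
import numpy as np

MIN_SUBSET_SIZE = 50  # assumed recursion cut-off, not the paper's value


def split_by_proximity(records: np.ndarray):
    """Split records into two subsets by distance to the two most distant points."""
    d = np.linalg.norm(records[:, None, :] - records[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmax(d), d.shape)   # farthest pair of records
    closer_to_i = d[:, i] <= d[:, j]
    return records[closer_to_i], records[~closer_to_i]


def recursive_divide(records: np.ndarray):
    """Divide-and-conquer step: recurse until subsets are small enough."""
    if len(records) <= MIN_SUBSET_SIZE:
        return [records]
    left, right = split_by_proximity(records)
    if len(left) == 0 or len(right) == 0:            # degenerate split, stop here
        return [records]
    return recursive_divide(left) + recursive_divide(right)


def general_information_loss(records: np.ndarray, groups) -> float:
    """SSE-style proxy for GIL: within-group variability over total variability."""
    total = ((records - records.mean(axis=0)) ** 2).sum()
    within = sum(((g - g.mean(axis=0)) ** 2).sum() for g in groups)
    return within / total if total > 0 else 0.0


def scoring_index(gil: float, gdr: float, weight: float = 0.5) -> float:
    """Scoring Index as a linear combination of GIL and GDR (equal weights assumed)."""
    return weight * gil + (1.0 - weight) * gdr
```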


Author(s):  
Avinash C. Singh

Consider a complete rectangular database at the micro (or unit) level from a survey (sample or census) or nonsurvey (administrative) source in which potential identifying variables (IVs) are suitably categorized (so that analytic utility is essentially maintained) in order to reduce the pretreatment disclosure risk as far as possible. The pretreatment risk is due to the presence of unique records (with respect to the IVs) or nonuniques (i.e., more than one record sharing a common IV profile) with similar values of at least one sensitive variable (SV). This setup also covers macro (or aggregate) level data, including tabular data, because a common mean value (of 1 in the case of count data) can be assigned to all units in the aggregation or cell. Our goal is to create a public use file with simultaneous control of disclosure risk and information loss after disclosure treatment by perturbation (i.e., substitution of IVs, not SVs) and suppression (i.e., subsampling-out of records). In this paper, an alternative framework for measuring information loss and disclosure risk under a nonsynthetic approach, as proposed by Singh (2002, 2006), is considered; in contrast to the commonly used deterministic treatment, it is based on a stochastic selection of records for disclosure treatment, in the sense that all records are subject to treatment (with possibly different probabilities) but only a small proportion of them are actually treated. We also propose an extension of this framework that generalizes the risk measures to allow partial risk scores for unique and nonunique records. Survey sampling techniques of sample allocation are used to assign substitution and subsampling rates to risk strata defined by unique and nonunique records, such that the bias due to substitution and the variance due to subsampling for the main study variables (functions of SVs and IVs) are minimized. This is followed by calibration to controls based on original estimates of the main study variables, so that these estimates are preserved and the bias and variance for other study variables may also be reduced. This alternative framework leads to the disclosure treatment method known as MASSC (signifying micro-agglomeration, substitution, subsampling, and calibration) and to an enhanced method (denoted GenMASSC) which uses generalized risk measures. The GenMASSC method is illustrated through a simple example, followed by a discussion of the relative merits and demerits of nonsynthetic and synthetic methods of disclosure treatment.
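
As a rough illustration of the stochastic-selection idea, here is a minimal Python sketch, under assumptions, rather than the MASSC implementation: every record carries a treatment probability determined by its risk stratum (unique vs. nonunique on the IVs), but only a randomly drawn subset is actually substituted or subsampled out. The rates and the helper name select_for_treatment are hypothetical, and the micro-agglomeration and calibration steps are omitted.

```python
import numpy as np

rng = np.random.default_rng(42)


def select_for_treatment(is_unique: np.ndarray,
                         substitution_rates=(0.30, 0.10),
                         subsampling_rates=(0.10, 0.02)):
    """Return boolean masks of records chosen for IV substitution and for removal.

    is_unique -- True for records unique on their IV profile, False otherwise.
    Rates are (unique, nonunique) pairs; every record is *eligible* for
    treatment, but only the randomly selected fraction is actually treated.
    """
    n = len(is_unique)
    p_substitute = np.where(is_unique, *substitution_rates)
    p_subsample = np.where(is_unique, *subsampling_rates)
    substitute = rng.random(n) < p_substitute   # IVs of these records get swapped
    remove = rng.random(n) < p_subsample        # these records are subsampled out
    return substitute, remove
```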


Author(s):  
Nicolas Ruiz

Over the years, the literature on individual data anonymization has burgeoned in many directions. While such diversity should be praised, it does not come without difficulties. Currently, the task of selecting the most suitable anonymization method is complicated by the multitude of available choices and by the fact that the performance of any method generally depends on the properties of the data. In light of these issues, the contribution of this paper is twofold. First, building on recent insights from the literature and inspired by cryptography, it proposes a new anonymization method showing that the task of anonymization can ultimately rely only on rank permutations. As a result, the method offers a new way to practice data anonymization: it is performed ex-ante and independently of the distributional features of the data, instead of engaging, as is currently the case in the literature, in several ex-post evaluations and iterations to reach the desired protection and information properties. Second, the method establishes a conceptual connection across the field, as it can mimic all currently existing tools. To make the method operational, this paper also proposes the introduction of permutation menus in data anonymization, where recently developed universal measures of disclosure risk and information loss are used ex-ante for the calibration of permutation keys. To justify the relevance of their use, a theoretical characterization of these measures is also proposed.
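
Since the paper's central claim is that anonymization can ultimately rely on rank permutations alone, a short Python sketch of the general idea may help; it illustrates the permutation paradigm, not the paper's method or its permutation-menu calibration. The bound d plays the role of a permutation key and is an assumed parameter.

```python
import numpy as np

rng = np.random.default_rng(0)


def permute_ranks(column: np.ndarray, d: int) -> np.ndarray:
    """Anonymize one attribute purely by a bounded permutation of its ranks."""
    n = len(column)
    order = np.argsort(column)                     # order[r] = index of the record holding rank r
    jittered = np.arange(n) + rng.integers(-d, d + 1, size=n)  # jitter each rank label by at most d
    new_order = order[np.argsort(jittered, kind="stable")]     # re-sort the jittered rank labels
    out = np.empty_like(column)
    out[new_order] = np.sort(column)               # reassign the original values along the new rank order
    return out


# Example: anonymize each column of a numeric data set with permutation key d=3.
# X = np.random.default_rng(1).normal(size=(100, 4))
# X_anon = np.column_stack([permute_ranks(X[:, j], d=3) for j in range(4)])
```

Because only ranks are permuted and the original values are reassigned, the marginal distribution of each attribute is preserved exactly; the permutation key d controls how far any record's rank can move.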


2020 ◽  
Vol 39 (5) ◽  
pp. 5999-6008
Author(s):  
Vicenç Torra

Microaggregation is an effective data-driven protection method that permits us to achieve a good trade-off between disclosure risk and information loss. In this work we propose a microaggregation method based on fuzzy c-means that is appropriate when there are linear constraints on the variables that describe the data. Our method produces results that satisfy these constraints even when the data to be masked do not.
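
A minimal Python sketch of the approach, under assumptions rather than the paper's algorithm: records are clustered with a small hand-rolled fuzzy c-means, each cluster centroid is projected onto the affine set {x : Ax = b} defined by the linear constraints, and every record is replaced by its dominant centroid, so the masked data satisfy the constraints by construction. The minimum group-size requirement of microaggregation and the paper's specific handling of memberships are not reproduced here.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, iters=100, seed=0):
    """Tiny fuzzy c-means: returns centroids and the membership matrix U."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))           # random initial memberships
    for _ in range(iters):
        W = U ** m
        centroids = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1) + 1e-12
        U = 1.0 / (dist ** (2.0 / (m - 1.0)))            # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)
    return centroids, U


def project_onto_constraints(x, A, b):
    """Orthogonal projection of x onto {y : A y = b} (assumes A has full row rank)."""
    correction = A.T @ np.linalg.solve(A @ A.T, A @ x - b)
    return x - correction


def mask(X, A, b, c=5):
    """Replace each record by its dominant, constraint-satisfying centroid."""
    centroids, U = fuzzy_c_means(X, c)
    centroids = np.array([project_onto_constraints(v, A, b) for v in centroids])
    labels = U.argmax(axis=1)                            # hardened cluster assignment
    return centroids[labels]                             # masked data set
```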


2019 ◽  
Vol 2 (1) ◽  
pp. 1-17
Author(s):  
Muhammad Ibnu Pamungkas ◽  
Izzuddin Musthafa ◽  
Muhammad Nurhasan

Ta’lim Muta’alim is Syaikh al-Zarnūjī’s work on the norms, ethics, and rules for acquiring knowledge according to Islamic teachings, so that seekers of knowledge can reach their goal of obtaining it. The book was translated into Indonesian by Achmad Sunarto and published by Husaini Publisher in Bandung. After reading the translation in full, the researcher found errors, particularly in word choice (diction). The analysis classifies these errors into four categories: (1) translations that are direct transliterations from the source language (SL) without considering their compatibility with the target language (TL); (2) information loss and gain that affects the translation and makes it unsuitable; (3) word choices that do not match the referential meaning of the source text; and (4) translations that are unacceptable in the TL because they are rendered literally.


2020 ◽  
Vol 2020 (12) ◽  
Author(s):  
Hsu-Wen Chiang ◽  
Yu-Hsien Kung ◽  
Pisin Chen

Abstract One interesting proposal to solve the black hole information loss paradox without modifying either general relativity or quantum field theory is the soft hair: a diffeomorphism charge that records the anisotropic radiation in the asymptotic region. This proposal, however, has been challenged on the grounds that, away from the source, the soft hair behaves as a coordinate transformation that forms an Abelian group and is thus unable to store any information. To maintain the spirit of the soft hair while circumventing these obstacles, we consider Hawking radiation as a probe sensitive to the entire history of the black hole evaporation, where the soft hairs on the horizon are induced by the absorption of a null anisotropic flow, generalizing the shock wave considered in [1, 2]. To do so we introduce two different time-dependent extensions of the diffeomorphism associated with the soft hair: one is the backreaction of the anisotropic null flow, and the other is a coordinate transformation that produces the Unruh effect and a Doppler shift in the Hawking spectrum. Together, they form an exact BMS charge generator on the entire manifold that allows a nonperturbative analysis of the black hole horizon, whose surface gravity, i.e. the Hawking temperature, is found to be modified. The modification depends on an exponential average of the anisotropy of the null flow with a decay rate of 4M, suggesting the emergence of a new 2-D degree of freedom on the horizon, which could be a way out of the information loss paradox.
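
For context, the quantities said to be modified are the Schwarzschild surface gravity and the associated Hawking temperature. The standard relations (in units G = c = ħ = k_B = 1) are shown below, together with a purely schematic exponential-average term intended only to illustrate the stated 4M decay scale; the second expression is an assumption, not the paper's formula.

```latex
% Standard Schwarzschild relations (units $G=c=\hbar=k_B=1$):
\[
  \kappa = \frac{1}{4M}, \qquad T_H = \frac{\kappa}{2\pi} = \frac{1}{8\pi M}.
\]
% Purely schematic (not the paper's expression): a correction driven by an
% exponential average of the null-flow anisotropy $\mathcal{A}(u')$ over the
% evaporation history, with decay time of order $4M$:
\[
  \delta\kappa(u) \;\propto\; \frac{1}{4M}\int_{-\infty}^{u} e^{-(u-u')/(4M)}\,\mathcal{A}(u')\,\mathrm{d}u'.
\]
```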

