ON THE SECURITY OF MICROAGGREGATION WITH INDIVIDUAL RANKING: ANALYTICAL ATTACKS

Author(s):  
JOSEP DOMINGO-FERRER ◽  
ANNA OGANIAN ◽  
ÀNGEL TORRES ◽  
JOSEP M. MATEO-SANZ

Microaggregation is a statistical disclosure control technique in which raw microdata (i.e. individual records) are grouped into small aggregates prior to publication. With fixed-size groups, each aggregate contains k records, which prevents the disclosure of individual information. Individual ranking is a common criterion for reducing multivariate microaggregation to the univariate case: the idea is to perform microaggregation independently for each variable in the record. Using distributional assumptions, we show in this paper how to derive interval estimates for the original data from the microaggregated data. Such intervals can be considerably narrower than the intervals obtained by subtraction of means, and can be useful for detecting a lack of security in a microaggregated data set. The analytical arguments given in this paper confirm recent empirical results on the lack of safety of individual-ranking microaggregation.
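As a concrete illustration of the technique discussed above (not a reconstruction of the paper's analytical attack), the following Python sketch performs individual-ranking microaggregation with fixed group size k and then derives the naïve ordering-based intervals that follow from the published group means alone; the distribution-based intervals studied in the paper can be considerably narrower. Function names and parameters are illustrative.

```python
import numpy as np

def individual_ranking_microaggregation(X, k=3):
    """Microaggregate each column independently (individual ranking):
    sort the values, form consecutive groups of at least k records
    (the last group absorbs the remainder), and replace every value
    in a group by the group mean."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    assert n >= k, "need at least k records"
    Y = np.empty_like(X)
    cuts = [i * k for i in range(n // k)] + [n]
    for j in range(p):
        order = np.argsort(X[:, j])
        for a, b in zip(cuts[:-1], cuts[1:]):
            idx = order[a:b]
            Y[idx, j] = X[idx, j].mean()
    return Y

def ordering_intervals(sorted_group_means):
    """Naive bounds an intruder gets for free from the published means:
    a value known to lie in group i (in sorted order) is at least the
    mean of group i-1 and at most the mean of group i+1."""
    m = np.asarray(sorted_group_means, dtype=float)
    lower = np.concatenate(([-np.inf], m[:-1]))
    upper = np.concatenate((m[1:], [np.inf]))
    return lower, upper
```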

Author(s):  
Amanda M. Y. Chu ◽  
Benson S. Y. Lam ◽  
Agnes Tiwari ◽  
Mike K. P. So

Patient data and information collected from public health and health care surveys are of great research value. Usually, such data contain sensitive personal information. Doctors, nurses, or researchers in the public health and health care sector often do not analyze the available datasets or survey data themselves and may outsource these tasks to third parties. Even when all identifiers such as names and ID card numbers are removed, there are still occasions in which an individual can be re-identified through the demographic or other specific information provided in the datasets. Such data privacy issues can become an obstacle in health-related research. Statistical disclosure control (SDC) is a useful technique for resolving this problem by masking the original data and designing the released data accordingly. While ensuring that the released data satisfy researchers' needs for data analysis, the original data remain well protected from disclosure. In this research, we discuss the statistical properties of two SDC methods: the General Additive Data Perturbation (GADP) method and the Gaussian Copula General Additive Data Perturbation (CGADP) method. An empirical study demonstrates how these two SDC methods can be applied in public health research.
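To make the idea of masking data while preserving what analysts need concrete, here is a minimal Python sketch of a covariance-preserving additive perturbation in the spirit of GADP. It is not the exact GADP model (which perturbs confidential attributes conditionally on non-confidential ones), nor the copula-based CGADP; the parameter c and the function name are assumptions for illustration.

```python
import numpy as np

def additive_perturbation(X, c=0.8, rng=None):
    """Release Y = c*X + (1-c)*mean + noise, with the noise covariance set to
    (1 - c**2) * Cov(X). In expectation the released data keep the original
    mean vector and covariance matrix, while individual values are masked;
    c in (0, 1) controls how much of each original record survives."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)
    noise = rng.multivariate_normal(
        np.zeros(X.shape[1]), (1.0 - c**2) * sigma, size=X.shape[0])
    return c * X + (1.0 - c) * mu + noise
```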


2020 ◽  
Vol 65 (9) ◽  
pp. 7-27
Author(s):  
Andrzej Młodak

The paper presents the most important methods of assessing the information loss caused by statistical disclosure control (SDC). The aim of SDC is to protect individuals against identification, or against any sensitive information relating to them being obtained by anyone unauthorised. The application of methods based either on the concealment of specific data or on their perturbation results in information loss, which affects the quality of the output data, including the distributions of variables, the forms of the relationships between them, and any estimates. The aim of this paper is to perform a critical analysis of the strengths and weaknesses of the particular types of methods of assessing the information loss resulting from SDC. Moreover, some novel ideas on how to obtain effective and well-interpretable measures are proposed, including an innovative way of using a cyclometric function (the arctangent) to determine the deviation of values from the original ones as a result of SDC. Additionally, the inverse correlation matrix was applied in order to assess the influence of SDC on the strength of relationships between variables. The first method yields effective and well-interpretable measures, while the second makes it possible to fully exploit the mutual relationships between variables (including those that are difficult to detect by means of classical statistical methods) for a better analysis of the consequences of SDC. Among other findings, the empirical verification of the utility of the suggested methods confirmed the superiority of the cyclometric function in measuring the deviation of perturbed values from the original data, and also highlighted the need for a skilful correction of its flattening when large arguments occur.
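The two ideas sketched below are only illustrations of the general approach described above, not reconstructions of the paper's exact formulas: a bounded arctangent-based deviation measure, and a comparison of inverse correlation matrices before and after SDC.

```python
import numpy as np

def arctan_deviation(original, masked, eps=1e-12):
    """Bounded deviation measure: the arctangent maps relative deviations of any
    magnitude into [0, pi/2), so a few extreme distortions cannot dominate the
    average; the result is normalised to [0, 1). Note the flattening of arctan
    for large arguments, which the paper suggests correcting."""
    original = np.asarray(original, dtype=float)
    masked = np.asarray(masked, dtype=float)
    rel = np.abs(masked - original) / (np.abs(original) + eps)
    return float(np.mean(np.arctan(rel)) / (np.pi / 2))

def inverse_correlation_shift(original, masked):
    """Average absolute change of the inverse correlation matrix, whose entries
    reflect partial (conditional) relationships between variables that plain
    correlations can hide."""
    P0 = np.linalg.inv(np.corrcoef(original, rowvar=False))
    P1 = np.linalg.inv(np.corrcoef(masked, rowvar=False))
    return float(np.abs(P0 - P1).mean())
```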


2010 ◽  
Vol 37 (4) ◽  
pp. 3256-3263 ◽  
Author(s):  
Jun-Lin Lin ◽  
Tsung-Hsien Wen ◽  
Jui-Chien Hsieh ◽  
Pei-Chann Chang

2020 ◽  
Vol 3 (348) ◽  
pp. 7-24
Author(s):  
Michał Pietrzak

The aim of this article is to analyse the possibility of applying selected perturbative masking methods of Statistical Disclosure Control to microdata, i.e. unit-level data from the Labour Force Survey. In the first step, the author assessed to what extent the confidentiality of information was protected in the original dataset. In the second step, after applying selected methods implemented in the sdcMicro package in R, the impact of those methods on the disclosure risk, the loss of information and the quality of estimates of population quantities was assessed. The conclusion highlights some problematic aspects of the use of Statistical Disclosure Control methods observed during the analysis.
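sdcMicro itself is an R package; the sketch below is a simplified Python analogue of the two-step workflow described above (assess disclosure risk, mask, then re-assess risk and information loss). The column arguments, the k-anonymity threshold and the noise level are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

def k_anonymity_violations(df, key_vars, k=3):
    """Disclosure-risk proxy: number of records whose combination of
    quasi-identifier values occurs fewer than k times in the dataset."""
    sizes = df.groupby(key_vars)[key_vars[0]].transform("size")
    return int((sizes < k).sum())

def add_noise(df, numeric_vars, noise=0.1, rng=None):
    """Perturbative masking: add centred Gaussian noise scaled to each
    variable's standard deviation."""
    rng = np.random.default_rng(rng)
    out = df.copy()
    for v in numeric_vars:
        out[v] = df[v] + rng.normal(0.0, noise * df[v].std(), size=len(df))
    return out

def information_loss(original, masked, numeric_vars):
    """Crude information-loss measure: mean absolute shift of variable means,
    expressed in units of the original standard deviation."""
    return float(np.mean([abs(masked[v].mean() - original[v].mean()) / original[v].std()
                          for v in numeric_vars]))
```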

