ON THE SECURITY OF MICROAGGREGATION WITH INDIVIDUAL RANKING: ANALYTICAL ATTACKS

Author(s):  
JOSEP DOMINGO-FERRER ◽  
ANNA OGANIAN ◽  
ÀNGEL TORRES ◽  
JOSEP M. MATEO-SANZ

Microaggregation is a statistical disclosure control technique in which raw microdata (i.e. individual records) are grouped into small aggregates prior to publication. With fixed-size groups, each aggregate contains k records, which prevents the disclosure of individual information. Individual ranking is a common criterion for reducing multivariate microaggregation to the univariate case: the idea is to perform microaggregation independently for each variable in the record. Using distributional assumptions, we show in this paper how to derive interval estimates for the original data from the microaggregated data. Such intervals can be considerably narrower than the intervals obtained by subtraction of means, and can be useful for detecting a lack of security in a microaggregated data set. The analytical arguments given in this paper confirm recent empirical results on the lack of safety of individual-ranking microaggregation.
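As a concrete illustration of the technique discussed above (not a reconstruction of the paper's analytical attack), the following Python sketch performs individual-ranking microaggregation with fixed group size k and then derives the naïve ordering-based intervals that follow from the published group means alone; the distribution-based intervals studied in the paper can be considerably narrower. Function names and parameters are illustrative.

```python
import numpy as np

def individual_ranking_microaggregation(X, k=3):
    """Microaggregate each column independently (individual ranking):
    sort the values, form consecutive groups of at least k records
    (the last group absorbs the remainder), and replace every value
    in a group by the group mean."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    assert n >= k, "need at least k records"
    Y = np.empty_like(X)
    cuts = [i * k for i in range(n // k)] + [n]
    for j in range(p):
        order = np.argsort(X[:, j])
        for a, b in zip(cuts[:-1], cuts[1:]):
            idx = order[a:b]
            Y[idx, j] = X[idx, j].mean()
    return Y

def ordering_intervals(sorted_group_means):
    """Naive bounds an intruder gets for free from the published means:
    a value known to lie in group i (in sorted order) is at least the
    mean of group i-1 and at most the mean of group i+1."""
    m = np.asarray(sorted_group_means, dtype=float)
    lower = np.concatenate(([-np.inf], m[:-1]))
    upper = np.concatenate((m[1:], [np.inf]))
    return lower, upper
```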

Author(s):  
Amanda M. Y. Chu ◽  
Benson S. Y. Lam ◽  
Agnes Tiwari ◽  
Mike K. P. So

Patient data and information collected from public health and health care surveys are of great research value. Usually, such data contain sensitive personal information. Doctors, nurses, or researchers in the public health and health care sector often do not analyze the available datasets or survey data themselves and may outsource these tasks to third parties. Even when all identifiers such as names and ID card numbers are removed, there are still occasions in which an individual can be re-identified through the demographic or other specific information provided in the datasets. Such data privacy issues can become an obstacle in health-related research. Statistical disclosure control (SDC) is a useful technique for resolving this problem by masking the original data and designing the released data accordingly. While ensuring that the released data satisfy researchers' needs for data analysis, the original data remain well protected from disclosure. In this research, we discuss the statistical properties of two SDC methods: the General Additive Data Perturbation (GADP) method and the Gaussian Copula General Additive Data Perturbation (CGADP) method. An empirical study demonstrates how these two SDC methods can be applied in public health research.
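To make the idea of masking data while preserving what analysts need concrete, here is a minimal Python sketch of a covariance-preserving additive perturbation in the spirit of GADP. It is not the exact GADP model (which perturbs confidential attributes conditionally on non-confidential ones), nor the copula-based CGADP; the parameter c and the function name are assumptions for illustration.

```python
import numpy as np

def additive_perturbation(X, c=0.8, rng=None):
    """Release Y = c*X + (1-c)*mean + noise, with the noise covariance set to
    (1 - c**2) * Cov(X). In expectation the released data keep the original
    mean vector and covariance matrix, while individual values are masked;
    c in (0, 1) controls how much of each original record survives."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)
    noise = rng.multivariate_normal(
        np.zeros(X.shape[1]), (1.0 - c**2) * sigma, size=X.shape[0])
    return c * X + (1.0 - c) * mu + noise
```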


2020 ◽  
Vol 65 (9) ◽  
pp. 7-27
Author(s):  
Andrzej Młodak

The paper presents the most important methods of assessing the information loss caused by statistical disclosure control (SDC). The aim of SDC is to protect individuals against identification, or against any sensitive information relating to them being obtained by anyone unauthorised. The application of methods based either on the concealment of specific data or on their perturbation results in information loss, which affects the quality of the output data, including the distributions of variables, the forms of the relationships between them, and any estimates. The aim of this paper is to perform a critical analysis of the strengths and weaknesses of the particular types of methods of assessing the information loss resulting from SDC. Moreover, some novel ideas on how to obtain effective and well-interpretable measures are proposed, including an innovative way of using a cyclometric function (the arctangent) to determine the deviation of values from the original ones as a result of SDC. Additionally, the inverse correlation matrix was applied in order to assess the influence of SDC on the strength of relationships between variables. The first method yields effective and well-interpretable measures, while the second makes it possible to fully exploit the mutual relationships between variables (including those that are difficult to detect by means of classical statistical methods) for a better analysis of the consequences of SDC. Among other findings, the empirical verification of the utility of the suggested methods confirmed the superiority of the cyclometric function in measuring the deviation of perturbed values from the original data, and also highlighted the need for a skilful correction of its flattening when large arguments occur.
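The two ideas sketched below are only illustrations of the general approach described above, not reconstructions of the paper's exact formulas: a bounded arctangent-based deviation measure, and a comparison of inverse correlation matrices before and after SDC.

```python
import numpy as np

def arctan_deviation(original, masked, eps=1e-12):
    """Bounded deviation measure: the arctangent maps relative deviations of any
    magnitude into [0, pi/2), so a few extreme distortions cannot dominate the
    average; the result is normalised to [0, 1). Note the flattening of arctan
    for large arguments, which the paper suggests correcting."""
    original = np.asarray(original, dtype=float)
    masked = np.asarray(masked, dtype=float)
    rel = np.abs(masked - original) / (np.abs(original) + eps)
    return float(np.mean(np.arctan(rel)) / (np.pi / 2))

def inverse_correlation_shift(original, masked):
    """Average absolute change of the inverse correlation matrix, whose entries
    reflect partial (conditional) relationships between variables that plain
    correlations can hide."""
    P0 = np.linalg.inv(np.corrcoef(original, rowvar=False))
    P1 = np.linalg.inv(np.corrcoef(masked, rowvar=False))
    return float(np.abs(P0 - P1).mean())
```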


2010 ◽  
Vol 37 (4) ◽  
pp. 3256-3263 ◽  
Author(s):  
Jun-Lin Lin ◽  
Tsung-Hsien Wen ◽  
Jui-Chien Hsieh ◽  
Pei-Chann Chang

2020 ◽  
Vol 3 (348) ◽  
pp. 7-24
Author(s):  
Michał Pietrzak

The aim of this article is to analyse the possibility of applying selected perturbative masking methods of Statistical Disclosure Control to microdata, i.e. unit-level data from the Labour Force Survey. In the first step, the author assessed to what extent the confidentiality of information was protected in the original dataset. In the second step, after applying selected methods implemented in the sdcMicro package in R, the impact of those methods on the disclosure risk, the loss of information and the quality of estimates of population quantities was assessed. The conclusion highlights some problematic aspects of the use of Statistical Disclosure Control methods observed during the analysis.
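sdcMicro itself is an R package; the sketch below is a simplified Python analogue of the two-step workflow described above (assess disclosure risk, mask, then re-assess risk and information loss). The column arguments, the k-anonymity threshold and the noise level are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

def k_anonymity_violations(df, key_vars, k=3):
    """Disclosure-risk proxy: number of records whose combination of
    quasi-identifier values occurs fewer than k times in the dataset."""
    sizes = df.groupby(key_vars)[key_vars[0]].transform("size")
    return int((sizes < k).sum())

def add_noise(df, numeric_vars, noise=0.1, rng=None):
    """Perturbative masking: add centred Gaussian noise scaled to each
    variable's standard deviation."""
    rng = np.random.default_rng(rng)
    out = df.copy()
    for v in numeric_vars:
        out[v] = df[v] + rng.normal(0.0, noise * df[v].std(), size=len(df))
    return out

def information_loss(original, masked, numeric_vars):
    """Crude information-loss measure: mean absolute shift of variable means,
    expressed in units of the original standard deviation."""
    return float(np.mean([abs(masked[v].mean() - original[v].mean()) / original[v].std()
                          for v in numeric_vars]))
```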

