Statistical Disclosure Control Methods for Harmonised Protection of Census Data: a Grid Case

The 2011 Population and Housing Census in the Czech Republic was accompanied by a significant change in the technology used to prepare the course of the fieldwork, along with changes in how the data are processed and how the outputs are disseminated. Grids are regular polygon networks that divide the territory of a country in a grid-like pattern into equally sized territorial units, to which aggregate statistical data are assigned. The disadvantage of grids is that they are territorially small units that are often minimally populated. This mainly has implications for the protection of individual data, which is the concern of statistical disclosure control (SDC). The research question addressed in this paper is whether data protection (perturbation methods) changes the characteristics of the file, either in terms of statistics of the whole file (i.e. for all grids) or in terms of spatial statistics, which indicate the spatial distribution of the analysed phenomenon. Two possible solutions to the issue of grid data protection are discussed. One comes from the Statistical Office of the European Communities (Eurostat) and the other from Cantabular, which is a product of the Sensible Code Company (SCC) based in Belfast. One variant was processed according to the Cantabular methodology, while two variants were calculated according to the Eurostat methodology; these differ in the parameter settings for the maximum noise D and the variance of noise V. The descriptive statistics show a difference in absolute differences when the Cantabular and Eurostat solutions are compared. For the other statistics, the results are fully comparable. This paper is devoted to one specific type of census output. The question is to what extent these results are relevant for other types of census outputs, which differ fundamentally in the number of dimensions (grids have only two dimensions). It would therefore be appropriate to use SDC procedures that allow greater flexibility in defining SDC parameters.
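The roles of the two parameters named in the abstract, maximum noise D and noise variance V, can be illustrated with a minimal Python sketch. It is not the Eurostat or the Cantabular method: the function names, the rounded and clipped Gaussian noise, the rule that empty grid cells stay empty, and the sample counts are all assumptions made for illustration only.

```python
import random


def perturb_counts(counts, max_noise_d, noise_variance_v, seed=0):
    """Add bounded integer noise to grid-cell counts (illustrative only).

    Assumptions, not taken from the paper: noise is a rounded Gaussian
    clipped to [-D, D] (clipping makes the realised variance somewhat
    smaller than V), zero cells are left at zero, and results are
    floored at zero.
    """
    rng = random.Random(seed)
    sigma = noise_variance_v ** 0.5
    perturbed = []
    for count in counts:
        if count == 0:
            perturbed.append(0)  # assumed rule: empty cells stay empty
            continue
        noise = round(rng.gauss(0.0, sigma))
        noise = max(-max_noise_d, min(max_noise_d, noise))  # enforce |noise| <= D
        perturbed.append(max(0, count + noise))
    return perturbed


def mean_absolute_difference(original, perturbed):
    """Average absolute change per grid cell, a simple whole-file statistic."""
    return sum(abs(o - p) for o, p in zip(original, perturbed)) / len(original)


# Hypothetical population counts for eight grid cells.
grid_counts = [0, 1, 3, 12, 250, 2, 0, 47]
protected = perturb_counts(grid_counts, max_noise_d=3, noise_variance_v=1.5, seed=42)
print(protected, mean_absolute_difference(grid_counts, protected))
```

Running the function with two different (D, V) settings and comparing the mean absolute difference mirrors, in a very reduced form, the kind of variant comparison the abstract describes.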

Rounding based continuous data discretization for statistical disclosure control

Journal of Ambient Intelligence and Humanized Computing ◽

10.1007/s12652-019-01489-7 ◽

2019 ◽

Author(s):

Navoda Senavirathne ◽

Vicenç Torra

Keyword(s):

Machine Learning ◽

Data Protection ◽

Data Privacy ◽

Alternative Methods ◽

Continuous Data ◽

Statistical Disclosure Control ◽

Disclosure Control ◽

Discretization Methods ◽

Statistical Disclosure ◽

The Impact

“Rounding” can be understood as a way to coarsen continuous data. That is, low-level and infrequent values are replaced by high-level and more frequent representative values. This concept is explored as a method for data privacy in the statistical disclosure control literature with perturbative techniques like rounding and microaggregation, and non-perturbative methods like generalisation. Even though “rounding” is well known as a numerical data protection method, to the best of our knowledge it has not been studied in depth or evaluated empirically. This work is motivated by three objectives: (1) to study the alternative methods of obtaining the rounding values to represent a given continuous variable, (2) to empirically evaluate rounding as a data protection technique based on information loss (IL) and disclosure risk (DR), and (3) to analyse the impact of data rounding on machine learning based models. To obtain the rounding values, we consider discretization methods introduced in the unsupervised machine learning literature along with microaggregation and re-sampling based approaches. The results indicate that microaggregation based techniques are preferred over unsupervised discretization methods due to their fair trade-off between IL and DR.
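As a rough illustration of microaggregation used to obtain rounding values, the Python sketch below implements a generic fixed-size univariate variant: values are sorted, split into groups of at least k consecutive records, and each value is replaced by its group mean. The abstract does not say which microaggregation algorithm the authors used, so the function names, the grouping rule, and the sum-of-squared-errors measure of information loss are assumptions.

```python
def microaggregate(values, k):
    """Replace each value by the mean of its group of at least k neighbours.

    Generic fixed-size univariate microaggregation (an assumed variant):
    sort the values, cut them into consecutive groups of k, and let the
    last group absorb any remainder smaller than k.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    n = len(order)
    start = 0
    while start < n:
        end = start + k
        if n - end < k:  # too few records left for another full group
            end = n
        group = order[start:end]
        group_mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = group_mean  # the group mean is the "rounding value"
        start = end
    return result


def sum_squared_error(original, masked):
    """A simple information-loss measure: total squared distortion."""
    return sum((o - m) ** 2 for o, m in zip(original, masked))


values = [1.0, 9.0, 2.0, 10.0, 3.0, 11.0, 12.0]
masked = microaggregate(values, k=3)
print(masked)                             # [2.0, 10.5, 2.0, 10.5, 2.0, 10.5, 10.5]
print(sum_squared_error(values, masked))  # 7.0
```

Larger k means each published value is shared by more records, which tends to lower disclosure risk while raising information loss; this is the trade-off the abstract refers to.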

Density-based microaggregation for statistical disclosure control

Expert Systems with Applications ◽

10.1016/j.eswa.2009.09.054 ◽

2010 ◽

Vol 37 (4) ◽

pp. 3256-3263 ◽

Cited By ~ 42

Author(s):

Jun-Lin Lin ◽

Tsung-Hsien Wen ◽

Jui-Chien Hsieh ◽

Pei-Chann Chang

Keyword(s):

Statistical Disclosure Control ◽

Disclosure Control ◽

Statistical Disclosure

Understanding Microaggregation- A technique of Statistical Disclosure Control for Privacy Preserving and Data Publishing in Inter-Cloud

2018 Second International Conference on Advances in Electronics, Computers and Communications (ICAECC) ◽

10.1109/icaecc.2018.8479452 ◽

2018 ◽

Author(s):

Veena Gadad ◽

Sowmyarani C.N.

Keyword(s):

Privacy Preserving ◽

Data Publishing ◽

Statistical Disclosure Control ◽

Disclosure Control ◽

Statistical Disclosure

Integrating Differential Privacy in the Statistical Disclosure Control Tool-Kit for Synthetic Data Production

Privacy in Statistical Databases - Lecture Notes in Computer Science ◽

10.1007/978-3-030-57521-2_19 ◽

2020 ◽

pp. 271-280

Author(s):

Natalie Shlomo

Keyword(s):

Differential Privacy ◽

Synthetic Data ◽

Statistical Disclosure Control ◽

Disclosure Control ◽

Data Production ◽

Statistical Disclosure ◽

Control Tool

Statistical Disclosure Control Methods for Microdata from the Labour Force Survey

Acta Universitatis Lodziensis Folia oeconomica ◽

10.18778/0208-6018.348.01 ◽

2020 ◽

Vol 3 (348) ◽

pp. 7-24

Author(s):

Michał Pietrzak

Keyword(s):

Labour Force ◽

Control Methods ◽

Labour Force Survey ◽

Statistical Disclosure Control ◽

Disclosure Control ◽

Original Dataset ◽

Level Data ◽

Statistical Disclosure ◽

The Impact

The aim of this article is to analyse the possibility of applying selected perturbative masking methods of Statistical Disclosure Control to microdata, i.e. unit‑level data from the Labour Force Survey. In the first step, the author assessed to what extent the confidentiality of information was protected in the original dataset. In the second step, selected methods implemented in the sdcMicro package for R were applied, and their impact on the disclosure risk, the loss of information and the quality of estimation of population quantities was assessed. The conclusion highlights some problematic aspects of the use of Statistical Disclosure Control methods that were observed during the analysis.