Towards Machine Learning-Assisted Output Checking for Statistical Disclosure Control

2021 ◽  
pp. 335-345
Author(s):  
Josep Domingo-Ferrer ◽  
Alberto Blanco-Justicia

Author(s):  
Navoda Senavirathne ◽  
Vicenç Torra

Abstract: “Rounding” can be understood as a way to coarsen continuous data. That is, low-level and infrequent values are replaced by high-level and more frequent representative values. This concept is explored as a data privacy method in the statistical disclosure control literature, both through perturbative techniques such as rounding and microaggregation and through non-perturbative methods such as generalisation. Even though rounding is well known as a numerical data protection method, to the best of our knowledge it has not been studied in depth or evaluated empirically. This work is motivated by three objectives: (1) to study alternative methods of obtaining the rounding values used to represent a given continuous variable, (2) to empirically evaluate rounding as a data protection technique in terms of information loss (IL) and disclosure risk (DR), and (3) to analyse the impact of data rounding on machine learning-based models. To obtain the rounding values, we consider discretization methods introduced in the unsupervised machine learning literature along with microaggregation- and re-sampling-based approaches. The results indicate that microaggregation-based techniques are preferred over unsupervised discretization methods due to their favourable trade-off between IL and DR.
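As a rough illustration of the approaches compared above, the sketch below (Python; not the authors' code) derives rounding values for a continuous variable in two of the ways mentioned: unsupervised equal-width discretization and univariate fixed-size microaggregation. The synthetic lognormal variable, the bin count, the group size k and the IL/DR proxies are all illustrative assumptions, not the paper's actual measures.

import numpy as np

def equal_width_rounding(x, n_bins=10):
    # Unsupervised equal-width discretization: bin centres serve as rounding values.
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    centres = (edges[:-1] + edges[1:]) / 2
    idx = np.digitize(x, edges[1:-1])          # indices 0 .. n_bins-1
    return centres[idx]

def microaggregation_rounding(x, k=5):
    # Univariate microaggregation: sort, form consecutive groups of at least k
    # values (the last group absorbs the remainder), and use each group mean
    # as the rounding value.
    order = np.argsort(x)
    out = np.empty_like(x, dtype=float)
    n, start = len(x), 0
    while start < n:
        end = n if n - start < 2 * k else start + k
        grp = order[start:end]
        out[grp] = x[grp].mean()
        start = end
    return out

def small_class_risk(x_rounded, threshold=3):
    # Naive DR proxy: fraction of records whose rounding value is shared by
    # fewer than `threshold` records (small equivalence classes are riskier).
    vals, counts = np.unique(x_rounded, return_counts=True)
    sizes = counts[np.searchsorted(vals, x_rounded)]
    return np.mean(sizes < threshold)

rng = np.random.default_rng(0)
x = rng.lognormal(mean=3.0, sigma=0.8, size=1000)   # synthetic continuous variable

for name, xr in [("equal-width", equal_width_rounding(x)),
                 ("microaggregation", microaggregation_rounding(x))]:
    il = np.mean((x - xr) ** 2) / np.var(x)         # crude IL proxy: normalised MSE
    dr = small_class_risk(xr)
    print(f"{name:>17}: IL ~ {il:.3f}, small-class DR ~ {dr:.3f}")

In this toy setup the microaggregation variant guarantees every rounding value is shared by at least k records, which is the kind of IL/DR trade-off the abstract refers to.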


2010 ◽  
Vol 37 (4) ◽  
pp. 3256-3263 ◽  
Author(s):  
Jun-Lin Lin ◽  
Tsung-Hsien Wen ◽  
Jui-Chien Hsieh ◽  
Pei-Chann Chang

2020 ◽  
Vol 3 (348) ◽  
pp. 7-24
Author(s):  
Michał Pietrzak

The aim of this article is to analyse the possibility of applying selected perturbative masking methods of Statistical Disclosure Control to microdata, i.e. unit-level data from the Labour Force Survey. In the first step, the author assessed to what extent the confidentiality of information was protected in the original dataset. In the second step, after applying selected methods implemented in the sdcMicro package for R, the impact of those methods on disclosure risk, information loss and the quality of estimates of population quantities was assessed. The conclusion highlights some problematic aspects of the use of Statistical Disclosure Control methods that were observed during the analysis.
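For concreteness, the sketch below (Python; not the article's sdcMicro workflow) mimics the evaluation loop described here: mask two numeric variables with a perturbative method (additive noise), then report crude proxies for information loss, estimation quality and distance-based record-linkage disclosure risk. The variable names, the noise level and all three measures are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)
n = 1000
income = rng.lognormal(mean=8.0, sigma=0.6, size=n)   # synthetic survey variables
hours = rng.normal(loc=40.0, scale=8.0, size=n)
X = np.column_stack([income, hours])

# Perturbative masking: additive noise with variance proportional to each
# column's variance (noise_factor is an illustrative choice).
noise_factor = 0.15
X_masked = X + rng.normal(scale=np.sqrt(noise_factor) * X.std(axis=0), size=X.shape)

# Information loss / estimation quality: relative change in the means and
# distortion of the covariance matrix.
il_means = np.abs(X_masked.mean(axis=0) - X.mean(axis=0)) / np.abs(X.mean(axis=0))
il_cov = np.linalg.norm(np.cov(X_masked.T) - np.cov(X.T)) / np.linalg.norm(np.cov(X.T))

# Disclosure risk: distance-based record linkage -- how often is the nearest
# original record (in standardised space) the one a masked record came from?
Z = (X - X.mean(axis=0)) / X.std(axis=0)
Zm = (X_masked - X.mean(axis=0)) / X.std(axis=0)
nearest = np.argmin(((Zm[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2), axis=1)
relink_rate = np.mean(nearest == np.arange(n))

print(f"relative change in means: {np.round(il_means, 4)}")
print(f"covariance distortion:    {il_cov:.3f}")
print(f"re-linkage rate (DR):     {relink_rate:.3f}")

Raising noise_factor lowers the re-linkage rate but increases the distortion of the estimated means and covariances, which is the disclosure-risk versus information-loss tension the article examines.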

