Rounding based continuous data discretization for statistical disclosure control

Journal of Ambient Intelligence and Humanized Computing ◽

10.1007/s12652-019-01489-7 ◽

2019 ◽

Author(s):

Navoda Senavirathne ◽

Vicenç Torra

Keyword(s):

Machine Learning ◽

Data Protection ◽

Data Privacy ◽

Alternative Methods ◽

Continuous Data ◽

Statistical Disclosure Control ◽

Disclosure Control ◽

Discretization Methods ◽

Statistical Disclosure ◽

The Impact

“Rounding” can be understood as a way to coarsen continuous data. That is, low-level and infrequent values are replaced by high-level and more frequent representative values. This concept is explored as a method for data privacy in the statistical disclosure control literature, with perturbative techniques like rounding and microaggregation, and non-perturbative methods like generalisation. Even though “rounding” is well known as a numerical data protection method, to the best of our knowledge it has not been studied in depth or evaluated empirically. This work is motivated by three objectives: (1) to study the alternative methods of obtaining the rounding values to represent a given continuous variable, (2) to empirically evaluate rounding as a data protection technique based on information loss (IL) and disclosure risk (DR), and (3) to analyse the impact of data rounding on machine learning based models. To obtain the rounding values, we consider discretization methods introduced in the unsupervised machine learning literature along with microaggregation and re-sampling based approaches. The results indicate that microaggregation based techniques are preferred over unsupervised discretization methods due to their fair trade-off between IL and DR.
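
As a rough illustration of the idea in this abstract, the sketch below obtains rounding values for one continuous variable in two ways: a simple fixed-size univariate microaggregation and an equal-width discretization. It is not the authors' implementation. The function names, the group size `k`, the handling of the last group, and the use of bin midpoints as representatives are all assumptions made for this example.

```python
from statistics import mean


def microaggregate(values, k):
    """Hypothetical helper: replace each value by the mean of its group.

    Values are sorted and split into consecutive groups of size k; the last
    group absorbs any remainder so that every group has at least k members
    (assumes len(values) >= k).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(values)
    if n == 0:
        return []
    # Indices of the values in ascending order of value.
    order = sorted(range(n), key=lambda i: values[i])
    n_groups = max(n // k, 1)
    result = [0.0] * n
    for g in range(n_groups):
        start = g * k
        end = n if g == n_groups - 1 else start + k
        group = order[start:end]
        representative = mean(values[i] for i in group)
        for i in group:
            result[i] = representative
    return result


def equal_width_round(values, n_bins):
    """Hypothetical helper: round each value to the midpoint of its bin.

    The range [min, max] is split into n_bins bins of equal width.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    width = (hi - lo) / n_bins
    rounded = []
    for v in values:
        # The maximum value falls into the last bin.
        b = min(int((v - lo) / width), n_bins - 1)
        rounded.append(lo + (b + 0.5) * width)
    return rounded


incomes = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0]
protected = microaggregate(incomes, k=3)
coarse = equal_width_round([0.0, 1.0, 9.0, 10.0], n_bins=2)
```

In both cases each original value is replaced by a representative shared with other records. The difference is that microaggregation guarantees a minimum number of records per representative, whereas equal-width bins may hold very few records.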

Statistical Disclosure Control Methods for Microdata from the Labour Force Survey

Acta Universitatis Lodziensis Folia oeconomica ◽

10.18778/0208-6018.348.01 ◽

2020 ◽

Vol 3 (348) ◽

pp. 7-24

Author(s):

Michał Pietrzak

Keyword(s):

Labour Force ◽

Control Methods ◽

Labour Force Survey ◽

Statistical Disclosure Control ◽

Disclosure Control ◽

Original Dataset ◽

Level Data ◽

Statistical Disclosure ◽

The Impact

The aim of this article is to analyse the possibility of applying selected perturbative masking methods of Statistical Disclosure Control to microdata, i.e. unit‑level data from the Labour Force Survey. In the first step, the author assessed to what extent the confidentiality of information was protected in the original dataset. In the second step, selected methods implemented in the sdcMicro package for R were applied, and their impact on the disclosure risk, the loss of information and the quality of estimation of population quantities was assessed. The conclusion highlights some problematic aspects of the use of Statistical Disclosure Control methods that were observed during the analysis.

Combining Machine Learning and Statistical Disclosure Control to Promote Open Data

Communications in Computer and Information Science - Data Mining ◽

10.1007/978-981-13-6661-1_7 ◽

2019 ◽

pp. 83-93

Author(s):

Nasca Peng

Keyword(s):

Machine Learning ◽

Open Data ◽

Statistical Disclosure Control ◽

Disclosure Control ◽

Statistical Disclosure

Feedback-Based Integration of the Whole Process of Data Anonymization in a Graphical Interface

Algorithms ◽

10.3390/a12090191 ◽

2019 ◽

Vol 12 (9) ◽

pp. 191

Author(s):

Bernhard Meindl ◽

Matthias Templ

Keyword(s):

Statistical Disclosure Control ◽

Software Environment ◽

Web Based ◽

Disclosure Control ◽

Data Anonymization ◽

Know How ◽

Whole Process ◽

Statistical Disclosure ◽

Anonymized Data ◽

The Impact

The interactive, web-based point-and-click application presented in this article allows anonymizing data without any knowledge of a programming language. Anonymization is used in data mining, but creating safe, anonymized data is by no means a trivial task. Both the methodological issues and the know-how of subject matter specialists should be taken into account when anonymizing data. Even though specialized software such as sdcMicro exists, it is often difficult for users without programming skills or expertise in a particular software to anonymize datasets without an appropriate app. The presented app is not restricted to applying disclosure limitation techniques but rather facilitates the entire anonymization process. The interface allows uploading data to the system, modifying them, and creating an object that defines the disclosure scenario. Once such a statistical disclosure control (SDC) problem has been defined, users can apply anonymization techniques to this object and get instant feedback on the impact on risk and data utility after SDC methods have been applied. Additional features, such as an Undo button and the possibility to export the anonymized dataset or the code required for reproducibility, as well as its interactive features, make it convenient for both experts and nonexperts in R (the free software environment for statistical computing and graphics) to protect a dataset using this app.

A Case Study of the Impact of Statistical Disclosure Control on Data Quality in the Individual UK Samples of Anonymised Records

Environment and Planning A Economy and Space ◽

10.1068/a38335 ◽

2007 ◽

Vol 39 (5) ◽

pp. 1101-1118 ◽

Cited By ~ 15

Author(s):

Kingsley Purdam ◽

Mark Elliot

Keyword(s):

Data Quality ◽

Statistical Disclosure Control ◽

Disclosure Control ◽

Statistical Disclosure ◽

The Individual ◽

The Impact

Statistical Disclosure Control Methods for Harmonised Protection of Census Data: a Grid Case

Demografie ◽

10.54694/dem.0285 ◽

2021 ◽

Vol 63 (4) ◽

pp. 199-215

Author(s):

Jaroslav Kraus

Keyword(s):

Data Protection ◽

Perturbation Methods ◽

Census Data ◽

Research Question ◽

Two Dimensions ◽

Statistical Disclosure Control ◽

Grid Data ◽

The Czech Republic ◽

Disclosure Control ◽

Statistical Disclosure

The 2011 Population and Housing Census in the Czech Republic was accompanied by a significant change in the technology used to prepare the course of the fieldwork, along with changes in how the data are processed and how the outputs are disseminated. Grids are regular polygon networks that divide the territory of a country in a grid-like pattern into equally large territorial units, to which aggregate statistical data are assigned. The disadvantage of grids is that they are territorially small units that are often minimally populated. This mainly has implications for the protection of individual data, which is associated with statistical disclosure control (SDC). The research question addressed in this paper is whether data protection (perturbation methods) leads to a change in the characteristics of the file, either in terms of statistics of the whole file (i.e. for all grids) or in terms of spatial statistics, which indicate the spatial distribution of the analysed phenomenon. Two possible solutions to the issue of grid data protection are discussed. One comes from the Statistical Office of the European Communities (Eurostat) and the other from Cantabular, which is a product of the Sensible Code Company (SCC) based in Belfast. According to the Cantabular methodology, one variant was processed, while according to the Eurostat methodology, two variants were calculated, which differ in the parameter settings for maximum noise D and the variance of noise V. The results of the descriptive statistics show a difference in absolute differences when the Cantabular and Eurostat solutions are compared. In the case of other statistics, the results are fully comparable. This paper is devoted to one specific type of census output. The question is to what extent these results are relevant for other types of census outputs, which differ fundamentally in the number of dimensions (grids have only two dimensions). It would therefore be appropriate to use SDC procedures that allow greater flexibility in defining SDC parameters.

Towards Machine Learning-Assisted Output Checking for Statistical Disclosure Control

10.1007/978-3-030-85529-1_27 ◽

2021 ◽

pp. 335-345

Author(s):

Josep Domingo-Ferrer ◽

Alberto Blanco-Justicia

Keyword(s):

Machine Learning ◽

Statistical Disclosure Control ◽

Disclosure Control ◽

Statistical Disclosure

Statistical Disclosure Control for Data Privacy Using Sequence of Generalised Linear Models

Information Security and Privacy - Lecture Notes in Computer Science ◽

10.1007/978-3-319-40253-6_5 ◽

2016 ◽

pp. 77-93 ◽

Cited By ~ 1

Author(s):

Min Cherng Lee ◽

Robin Mitra ◽

Emmanuel Lazaridis ◽

An Chow Lai ◽

Yong Kheng Goh ◽

...

Keyword(s):

Data Privacy ◽

Linear Models ◽

Statistical Disclosure Control ◽

Generalised Linear Models ◽

Disclosure Control ◽

Statistical Disclosure

Statistical Disclosure Control for Data Privacy Preservation

International Journal of Computer Applications ◽

10.5120/13899-1880 ◽

2013 ◽

Vol 80 (10) ◽

pp. 38-43 ◽

Cited By ~ 1

Author(s):

Sarat Kumar Chettri ◽

Bonani Paul ◽

Ajoy Krishna Dutta

Keyword(s):

Data Privacy ◽

Privacy Preservation ◽

Statistical Disclosure Control ◽

Disclosure Control ◽

Statistical Disclosure

Density-based microaggregation for statistical disclosure control

Expert Systems with Applications ◽

10.1016/j.eswa.2009.09.054 ◽

2010 ◽

Vol 37 (4) ◽

pp. 3256-3263 ◽

Cited By ~ 42

Author(s):

Jun-Lin Lin ◽

Tsung-Hsien Wen ◽

Jui-Chien Hsieh ◽

Pei-Chann Chang

Keyword(s):

Statistical Disclosure Control ◽

Disclosure Control ◽

Statistical Disclosure

Understanding Microaggregation- A technique of Statistical Disclosure Control for Privacy Preserving and Data Publishing in Inter-Cloud

2018 Second International Conference on Advances in Electronics, Computers and Communications (ICAECC) ◽

10.1109/icaecc.2018.8479452 ◽

2018 ◽

Author(s):

Veena Gadad ◽

Sowmyarani C.N.

Keyword(s):

Privacy Preserving ◽

Data Publishing ◽

Statistical Disclosure Control ◽

Disclosure Control ◽

Statistical Disclosure
