disclosure control
Recently Published Documents

TOTAL DOCUMENTS: 163 (five years: 27)
H-INDEX: 19 (five years: 3)

2021, Vol. 548, pp. 37-55
Author(s): Augusto César Fadel, Luiz Satoru Ochi, José André de Moura Brito, Gustavo Silva Semaan

Demografie, 2021, Vol. 63 (4), pp. 199-215
Author(s): Jaroslav Kraus

The 2011 Population and Housing Census in the Czech Republic was accompanied by a significant change in the technology used to prepare and conduct the fieldwork, along with changes in how the data are processed and how the outputs are disseminated. Grids are regular polygon networks that divide the territory of a country into equally sized territorial units, to which aggregate statistical data are assigned. The disadvantage of grids is that they are territorially small units that are often minimally populated. This mainly has implications for the protection of individual data, which is associated with statistical disclosure control (SDC).

The research question addressed in this paper is whether data protection (perturbation methods) leads to a change in the characteristics of the file, either in terms of statistics of the whole file (i.e. for all grids) or in terms of spatial statistics, which indicate the spatial distribution of the analysed phenomenon. Two possible solutions to the issue of grid data protection are discussed. One comes from the Statistical Office of the European Communities (Eurostat) and the other from Cantabular, a product of the Sensible Code Company (SCC) based in Belfast. One variant was processed according to the Cantabular methodology, while two variants were calculated according to the Eurostat methodology, differing in the parameter settings for the maximum noise D and the noise variance V. The results of the descriptive statistics show a difference in absolute differences when the Cantabular and Eurostat solutions are compared; for the other statistics, the results are fully comparable. This paper is devoted to one specific type of census output, and the question remains to what extent these results are relevant for other types of census outputs, which differ fundamentally in the number of dimensions (grids have only two). It would therefore be appropriate to use SDC procedures that allow greater flexibility in defining SDC parameters.
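The roles of the two noise parameters mentioned in the abstract (maximum noise D and noise variance V) can be illustrated with a minimal sketch of bounded additive noise applied to grid cell counts. This is not the actual Eurostat or Cantabular algorithm (both use more sophisticated, reproducible cell-key mechanisms); the function name and the truncated-Gaussian draw are illustrative assumptions.

```python
import random

def perturb_counts(counts, D=5, V=2.0, seed=42):
    """Add bounded random noise to grid cell counts (simplified sketch).

    D: maximum absolute noise added to any cell.
    V: noise variance, used here as the variance of a Gaussian draw
       that is then truncated; a calibrated method would draw from a
       distribution constructed to have variance V after truncation.
    Perturbed counts are clamped so they never fall below zero.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    perturbed = []
    for c in counts:
        noise = int(round(rng.gauss(0, V ** 0.5)))
        noise = max(-D, min(D, noise))       # truncate noise to [-D, D]
        perturbed.append(max(0, c + noise))  # counts stay non-negative
    return perturbed
```

Because the noise is truncated to [-D, D] and counts are clamped at zero, no published cell can differ from its true value by more than D, which is exactly the guarantee the parameter is meant to encode.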


2020, pp. 1-13
Author(s): Rainer Lenz, Tim Hochgürtel

As part of statistical disclosure control, National Statistical Offices may only deliver confidential data that are sufficiently protected to meet national legislation. When releasing confidential microdata to users, data holders usually apply so-called anonymisation methods to the data. To verify that the privacy requirements are fulfilled, the level of privacy of a confidential data file can be measured by simulating potential data-intrusion scenarios: matching publicly or commercially available data against the entire set of confidential data, where both share a non-empty set of variables (quasi-identifiers). In real-world microdata, incompatibilities between data sets and non-unique combinations of quasi-identifiers are very likely. In this situation it is nearly impossible to decide whether or not two records refer to the same underlying statistical unit. Even a successful assignment of records may be a fruitless disclosure attempt if a rational data intruder would keep their distance from that match. The paper argues that disclosure risks estimated thus far are overrated, in the sense that revealed information is always a combination of systematically derived results and non-negligible random assignment.
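The intrusion scenario described above, matching an external file against confidential microdata on shared quasi-identifiers, can be sketched as a simple frequency count of unique matches. The function and the toy records are hypothetical illustrations, not the authors' method; a realistic assessment would also model data incompatibilities and the intruder's decision to abstain from ambiguous matches, which is precisely the paper's point.

```python
from collections import Counter

def linkage_risk(confidential, public, quasi_ids):
    """Count public records that match exactly one confidential record
    on the quasi-identifiers -- a crude upper bound on the number of
    potential re-identifications (illustrative sketch only)."""
    key = lambda rec: tuple(rec[q] for q in quasi_ids)
    freq = Counter(key(rec) for rec in confidential)
    return sum(1 for rec in public if freq.get(key(rec)) == 1)

conf = [{"age": 34, "zip": "101", "income": 52000},
        {"age": 34, "zip": "101", "income": 48000},
        {"age": 61, "zip": "202", "income": 39000}]
pub = [{"age": 34, "zip": "101"},
       {"age": 61, "zip": "202"}]
# The first public record matches two confidential records (ambiguous),
# so only the second counts as a unique, potentially risky, match:
# linkage_risk(conf, pub, ["age", "zip"]) → 1
```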


2020, Vol. 36 (4), pp. 1281-1293
Author(s): Kyle Alves, Felix Ritchie

Statistical agencies and other government bodies increasingly use secure remote research facilities to provide access to sensitive data for research and analysis by internal staff and third parties. Such facilities depend on human intervention to ensure that the research outputs do not breach statistical disclosure control (SDC) rules. Output SDC can be principles-based, rules-based, or something in between. Principles-based is often seen as the gold standard statistically, as it improves both confidentiality protection and utility of outputs. However, some agencies are concerned that the operational requirements are too onerous for practical implementation, despite these statistical advantages. This paper argues that the choice of output checking procedure should be seen through an operational lens, rather than a statistical one. We take a popular conceptualisation of customer demand from the operations management literature and apply it to the problem of output checking. We demonstrate that principles-based output SDC addresses user and agency requirements more effectively than other approaches, and in a way which encourages user buy-in to the process. We also demonstrate how the principles-based approach aligns better with the statistical and staffing needs of the agency.
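For contrast with the principles-based approach discussed above, a purely rules-based output check can be reduced to a few mechanical thresholds. The thresholds and rule names below are illustrative assumptions, not any agency's actual policy; the point is that such rules are easy to automate but blind to context, which is why the paper argues the choice between approaches is an operational question.

```python
def rules_based_check(table, min_count=10, dominance=0.5):
    """Flag cells that breach two common rules-of-thumb in rules-based
    output SDC (thresholds are illustrative): a minimum number of
    contributors per cell, and a dominance rule where one contributor
    supplies more than the given share of the cell total."""
    problems = []
    for cell, contributions in table.items():
        total = sum(contributions)
        if len(contributions) < min_count:
            problems.append((cell, "frequency"))
        elif total and max(contributions) / total > dominance:
            problems.append((cell, "dominance"))
    return problems
```

A principles-based checker would instead ask whether a flagged cell is actually disclosive in context; a rule like this can only say that a threshold was crossed.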


2020, Vol. 65 (9), pp. 7-27
Author(s): Andrzej Młodak

The most important methods of assessing the information loss caused by statistical disclosure control (SDC) are presented in the paper. The aim of SDC is to protect an individual against identification, or against any sensitive information relating to them being obtained by anyone unauthorised. The application of methods based either on the concealment of specific data or on their perturbation results in information loss, which affects the quality of output data, including the distributions of variables, the forms of relationships between them, and any estimations. The aim of this paper is to perform a critical analysis of the strengths and weaknesses of the particular types of methods for assessing the information loss resulting from SDC. Moreover, some novel ideas on how to obtain effective and well-interpretable measures are proposed, including an innovative way of using an inverse trigonometric function (the arctangent) to determine the deviation of values from the original ones as a result of SDC. Additionally, the inverse correlation matrix was applied in order to assess the influence of SDC on the strength of relationships between variables. The first method allows effective and well-interpretable measures to be obtained, while the second makes it possible to fully exploit the mutual relationships between variables (including those difficult to detect by means of classical statistical methods) for a better analysis of the consequences of SDC. Among other findings, the empirical verification of the utility of the suggested methods confirmed the superiority of the arctangent function in measuring the deviation of perturbed values from the original data, and also highlighted the need for a skilful correction of its flattening when large-valued arguments occur.
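The arctangent idea can be sketched as follows: passing relative deviations through arctan yields a measure bounded on [0, 1), so a single extreme deviation cannot dominate the average, though the function flattens for large arguments, which is the drawback the abstract notes. The exact scaling and normalisation used in the paper may differ; this function is an assumption for illustration.

```python
import math

def arctan_deviation(original, perturbed):
    """Average bounded deviation between original and perturbed values.

    Each relative deviation is mapped through arctan and divided by
    pi/2, so every per-value score lies in [0, 1); zero-valued
    originals are handled with a crude indicator (an assumption)."""
    devs = []
    for o, p in zip(original, perturbed):
        rel = (p - o) / o if o != 0 else float(p != 0)
        devs.append(math.atan(abs(rel)) / (math.pi / 2))
    return sum(devs) / len(devs)
```

Unlike a raw mean relative error, this score stays below 1 even when one value is perturbed by several orders of magnitude, at the cost of compressing (flattening) differences among already-large deviations.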

