disclosure control
Recently Published Documents

TOTAL DOCUMENTS: 163 (five years: 27)
H-INDEX: 19 (five years: 3)

2021, Vol. 548, pp. 37-55
Author(s): Augusto César Fadel, Luiz Satoru Ochi, José André de Moura Brito, Gustavo Silva Semaan

Demografie, 2021, Vol. 63 (4), pp. 199-215
Author(s): Jaroslav Kraus

The 2011 Population and Housing Census in the Czech Republic was accompanied by a significant change in the technology used to prepare and conduct the fieldwork, along with changes in how the data are processed and how the outputs are disseminated. Grids are regular polygon networks that divide the territory of a country into equally sized territorial units, to which aggregate statistical data are assigned. The disadvantage of grids is that they are territorially small units that are often minimally populated. This mainly has implications for the protection of individual data, which is associated with statistical disclosure control (SDC).

The research question addressed in this paper is whether data protection (perturbation methods) leads to a change in the characteristics of the file, either in terms of statistics of the whole file (i.e. for all grids) or in terms of spatial statistics, which indicate the spatial distribution of the analysed phenomenon. Two possible solutions to the issue of grid data protection are discussed. One comes from the Statistical Office of the European Communities (Eurostat) and the other from Cantabular, a product of the Sensible Code Company (SCC) based in Belfast. One variant was processed according to the Cantabular methodology, while two variants were calculated according to the Eurostat methodology, differing in the parameter settings for the maximum noise D and the noise variance V. The results of the descriptive statistics show a difference in absolute differences when the Cantabular and Eurostat solutions are compared; for the other statistics, the results are fully comparable. This paper is devoted to one specific type of census output, and the question remains to what extent these results are relevant for other types of census outputs, which differ fundamentally in the number of dimensions (grids have only two). It would therefore be appropriate to use SDC procedures that allow greater flexibility in defining SDC parameters.
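The roles of the two noise parameters mentioned in the abstract (maximum noise D and noise variance V) can be illustrated with a minimal sketch of bounded additive noise applied to grid cell counts. This is not the actual Eurostat or Cantabular algorithm (both use more sophisticated, reproducible cell-key mechanisms); the function name and the truncated-Gaussian draw are illustrative assumptions.

```python
import random

def perturb_counts(counts, D=5, V=2.0, seed=42):
    """Add bounded random noise to grid cell counts (simplified sketch).

    D: maximum absolute noise added to any cell.
    V: noise variance, used here as the variance of a Gaussian draw
       that is then truncated; a calibrated method would draw from a
       distribution constructed to have variance V after truncation.
    Perturbed counts are clamped so they never fall below zero.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    perturbed = []
    for c in counts:
        noise = int(round(rng.gauss(0, V ** 0.5)))
        noise = max(-D, min(D, noise))       # truncate noise to [-D, D]
        perturbed.append(max(0, c + noise))  # counts stay non-negative
    return perturbed
```

Because the noise is truncated to [-D, D] and counts are clamped at zero, no published cell can differ from its true value by more than D, which is exactly the guarantee the parameter is meant to encode.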


2020, pp. 1-13
Author(s): Rainer Lenz, Tim Hochgürtel

As part of statistical disclosure control, National Statistical Offices may only deliver confidential data that are sufficiently protected to meet national legislation. When releasing confidential microdata to users, data holders usually apply so-called anonymisation methods to the data. To verify that the privacy requirements are fulfilled, the level of privacy of a confidential data file can be measured by simulating potential data-intrusion scenarios: matching publicly or commercially available data against the entire set of confidential data, where both share a non-empty set of variables (quasi-identifiers). In real-world microdata, incompatibilities between data sets and non-unique combinations of quasi-identifiers are very likely. In this situation it is nearly impossible to decide whether or not two records refer to the same underlying statistical unit. Even a successful assignment of records may be a fruitless disclosure attempt if a rational data intruder would keep their distance from that match. The paper argues that disclosure risks estimated thus far are overrated, in the sense that revealed information is always a combination of systematically derived results and non-negligible random assignment.
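The intrusion scenario described above, matching an external file against confidential microdata on shared quasi-identifiers, can be sketched as a simple frequency count of unique matches. The function and the toy records are hypothetical illustrations, not the authors' method; a realistic assessment would also model data incompatibilities and the intruder's decision to abstain from ambiguous matches, which is precisely the paper's point.

```python
from collections import Counter

def linkage_risk(confidential, public, quasi_ids):
    """Count public records that match exactly one confidential record
    on the quasi-identifiers -- a crude upper bound on the number of
    potential re-identifications (illustrative sketch only)."""
    key = lambda rec: tuple(rec[q] for q in quasi_ids)
    freq = Counter(key(rec) for rec in confidential)
    return sum(1 for rec in public if freq.get(key(rec)) == 1)

conf = [{"age": 34, "zip": "101", "income": 52000},
        {"age": 34, "zip": "101", "income": 48000},
        {"age": 61, "zip": "202", "income": 39000}]
pub = [{"age": 34, "zip": "101"},
       {"age": 61, "zip": "202"}]
# The first public record matches two confidential records (ambiguous),
# so only the second counts as a unique, potentially risky, match:
# linkage_risk(conf, pub, ["age", "zip"]) → 1
```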


2020, Vol. 36 (4), pp. 1281-1293
Author(s): Kyle Alves, Felix Ritchie

Statistical agencies and other government bodies increasingly use secure remote research facilities to provide access to sensitive data for research and analysis by internal staff and third parties. Such facilities depend on human intervention to ensure that the research outputs do not breach statistical disclosure control (SDC) rules. Output SDC can be principles-based, rules-based, or something in between. Principles-based is often seen as the gold standard statistically, as it improves both confidentiality protection and utility of outputs. However, some agencies are concerned that the operational requirements are too onerous for practical implementation, despite these statistical advantages. This paper argues that the choice of output checking procedure should be seen through an operational lens, rather than a statistical one. We take a popular conceptualisation of customer demand from the operations management literature and apply it to the problem of output checking. We demonstrate that principles-based output SDC addresses user and agency requirements more effectively than other approaches, and in a way which encourages user buy-in to the process. We also demonstrate how the principles-based approach aligns better with the statistical and staffing needs of the agency.
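For contrast with the principles-based approach discussed above, a purely rules-based output check can be reduced to a few mechanical thresholds. The thresholds and rule names below are illustrative assumptions, not any agency's actual policy; the point is that such rules are easy to automate but blind to context, which is why the paper argues the choice between approaches is an operational question.

```python
def rules_based_check(table, min_count=10, dominance=0.5):
    """Flag cells that breach two common rules-of-thumb in rules-based
    output SDC (thresholds are illustrative): a minimum number of
    contributors per cell, and a dominance rule where one contributor
    supplies more than the given share of the cell total."""
    problems = []
    for cell, contributions in table.items():
        total = sum(contributions)
        if len(contributions) < min_count:
            problems.append((cell, "frequency"))
        elif total and max(contributions) / total > dominance:
            problems.append((cell, "dominance"))
    return problems
```

A principles-based checker would instead ask whether a flagged cell is actually disclosive in context; a rule like this can only say that a threshold was crossed.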


2020, Vol. 65 (9), pp. 7-27
Author(s): Andrzej Młodak

The most important methods of assessing the information loss caused by statistical disclosure control (SDC) are presented in the paper. The aim of SDC is to protect an individual against identification, or against any sensitive information relating to them being obtained by anyone unauthorised. The application of methods based either on the concealment of specific data or on their perturbation results in information loss, which affects the quality of output data, including the distributions of variables, the forms of relationships between them, and any estimations. The aim of this paper is to perform a critical analysis of the strengths and weaknesses of the particular types of methods for assessing the information loss resulting from SDC. Moreover, some novel ideas on how to obtain effective and well-interpretable measures are proposed, including an innovative way of using an inverse trigonometric function (the arctangent) to determine the deviation of values from the original ones as a result of SDC. Additionally, the inverse correlation matrix was applied in order to assess the influence of SDC on the strength of relationships between variables. The first method allows effective and well-interpretable measures to be obtained, while the second makes it possible to fully exploit the mutual relationships between variables (including those difficult to detect by means of classical statistical methods) for a better analysis of the consequences of SDC. Among other findings, the empirical verification of the utility of the suggested methods confirmed the superiority of the arctangent function in measuring the deviation of perturbed values from the original data, and also highlighted the need for a skilful correction of its flattening when large-valued arguments occur.
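The arctangent idea can be sketched as follows: passing relative deviations through arctan yields a measure bounded on [0, 1), so a single extreme deviation cannot dominate the average, though the function flattens for large arguments, which is the drawback the abstract notes. The exact scaling and normalisation used in the paper may differ; this function is an assumption for illustration.

```python
import math

def arctan_deviation(original, perturbed):
    """Average bounded deviation between original and perturbed values.

    Each relative deviation is mapped through arctan and divided by
    pi/2, so every per-value score lies in [0, 1); zero-valued
    originals are handled with a crude indicator (an assumption)."""
    devs = []
    for o, p in zip(original, perturbed):
        rel = (p - o) / o if o != 0 else float(p != 0)
        devs.append(math.atan(abs(rel)) / (math.pi / 2))
    return sum(devs) / len(devs)
```

Unlike a raw mean relative error, this score stays below 1 even when one value is perturbed by several orders of magnitude, at the cost of compressing (flattening) differences among already-large deviations.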

