Examining Disclosure Risk and Data Utility: An Administrative Data Case Study

2014 ◽  
Vol 9 (1) ◽  
pp. 12-24
Author(s):  
Michael Comerford

The plethora of new data sources, combined with a growing interest in increased access to previously unpublished data, poses a set of ethical challenges regarding individual privacy. This paper sets out one aspect of those challenges: the need to anonymise data in a form that protects the privacy of individuals while providing sufficient data utility for data users. This issue is discussed using a case study of the Scottish Government’s administrative data, in which disclosure risk is examined and data utility is assessed using a potential ‘real-world’ analysis.

2020 ◽  
Vol 23 (6) ◽  
pp. 743-750
Author(s):  
Praveen Thokala ◽  
Peter Dodd ◽  
Hassan Baalbaki ◽  
Alan Brennan ◽  
Simon Dixon ◽  
...  

2015 ◽  
Vol 18 (3) ◽  
pp. A20
Author(s):  
M. Gavaghan ◽  
S. Armstrong ◽  
C. Taggart ◽  
S. Garfield

2015 ◽  
Vol 31 (4) ◽  
pp. 737-761 ◽  
Author(s):  
Matthias Templ

Scientific- or public-use files are typically produced by applying anonymisation methods to the original data. Anonymised data should have both low disclosure risk and high data utility. Data utility is often measured by comparing well-known estimates from the original and the anonymised data, such as their means, covariances or eigenvalues. However, not every estimate can be preserved. The aim is therefore to preserve the most important estimates: instead of calculating generally defined utility measures, evaluation on context- and data-dependent indicators is proposed. In this article we define such indicators and utility measures for the Structure of Earnings Survey (SES) microdata, and we give guidelines for selecting indicators and models and for evaluating the resulting estimates. For this purpose, hundreds of publications in journals and from national statistical agencies were reviewed to gain insight into how the SES data are used for research and which indicators are relevant for policy making. Besides the mathematical description of the indicators and a brief description of the most common models applied to SES, four different anonymisation procedures are applied and the resulting indicators and models are compared to those obtained from the unmodified data. The disclosure risk is reported and the data utility is evaluated for each of the anonymised data sets based on the most important indicators and a model which is often used in practice.
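The comparison of estimates before and after anonymisation described in this abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and none of it is taken from the article: the toy wage and hours values, the pairwise averaging used as a stand-in anonymisation step, and the helper names `mean`, `covariance` and `relative_change`. Eigenvalues are omitted to keep the sketch free of third-party libraries.

```python
def mean(xs):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Sample covariance of two equal-length lists (n - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def relative_change(original, anonymised):
    """Absolute relative difference between an estimate before and after anonymisation."""
    return abs(anonymised - original) / abs(original)


# Hypothetical toy microdata (not from the SES): hourly wage and weekly hours.
wage_orig = [10.0, 12.0, 20.0, 22.0, 40.0, 46.0]
hours = [20.0, 25.0, 30.0, 35.0, 40.0, 45.0]

# Assumed stand-in anonymisation: replace each consecutive pair of wages by its average.
wage_anon = [11.0, 11.0, 21.0, 21.0, 43.0, 43.0]

# Utility loss per estimate: 0.0 means the estimate is fully preserved.
mean_loss = relative_change(mean(wage_orig), mean(wage_anon))
cov_loss = relative_change(covariance(wage_orig, hours), covariance(wage_anon, hours))

print(f"relative change in mean wage:        {mean_loss:.4f}")
print(f"relative change in cov(wage, hours): {cov_loss:.4f}")
```

In this toy case the mean is preserved exactly while the covariance shifts, which mirrors the abstract's point that not every estimate can be preserved at once.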


2018 ◽  
Vol 34 (4) ◽  
pp. 863-888 ◽  
Author(s):  
Arnout van Delden ◽  
Jan van der Laan ◽  
Annemarie Prins

Administrative data sources are increasingly used by National Statistical Institutes to compile statistics. These sources may be based on decentralised autonomous administrations, for instance municipalities that deliver data on their inhabitants. One issue that may arise when using these decentralised administrative data is that categorical variables are underreported by some of the data suppliers, for instance to avoid administrative burden. Under certain conditions, overreporting may also occur. When statistical output on changes is estimated from decentralised administrative data, the question may arise whether those changes are affected by shifts in reporting frequencies. For instance, in a case study on hospital data, the values from certain data suppliers may have been affected by changes in reporting frequencies. We present an automatic procedure to detect suspicious data suppliers in decentralised administrative data in which shifts in reporting behaviour are likely to have affected the estimated output. The procedure is based on a predictive mean matching approach, where part of the original data values is replaced by imputed values obtained from a selected reference group. The method is successfully applied to a case study with administrative hospital data.
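As a rough illustration of the predictive mean matching idea mentioned in this abstract, the sketch below imputes values for "recipient" records from a reference group of "donor" records. It is a generic, simplified variant and not the authors' procedure: it assumes a single numeric predictor, a least-squares line as the prediction model, and a deterministic choice of the single nearest donor (common implementations draw at random among several near donors). The function names and the toy numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def pmm_impute(donor_x, donor_y, recipient_x):
    """Impute a value for each recipient by copying the observed value of the
    donor whose predicted mean is closest to the recipient's predicted mean."""
    intercept, slope = fit_line(donor_x, donor_y)
    donor_pred = [intercept + slope * x for x in donor_x]
    imputed = []
    for x in recipient_x:
        target = intercept + slope * x
        nearest = min(range(len(donor_pred)),
                      key=lambda i: abs(donor_pred[i] - target))
        imputed.append(donor_y[nearest])  # always a value actually observed among donors
    return imputed


# Hypothetical reference group (donors) and records to be imputed (recipients).
donor_x = [1.0, 2.0, 3.0, 4.0]
donor_y = [2.1, 3.9, 6.2, 7.8]
recipient_x = [2.2, 3.9]

imputed = pmm_impute(donor_x, donor_y, recipient_x)
print(imputed)
```

Because imputed values are copied from donors rather than taken from the fitted line, they stay within the range of values actually reported by the reference group.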


2015 ◽  
Vol 25 (1) ◽  
pp. 39-45 ◽  
Author(s):  
Jennifer Tetnowski

Qualitative case study research can be a valuable tool for answering complex, real-world questions. This method is often misunderstood or neglected due to a lack of understanding by researchers and reviewers. This tutorial defines the characteristics of qualitative case study research and its application to a broader understanding of stuttering that cannot be defined through other methodologies. This article will describe ways that data can be collected and analyzed.


2021 ◽  
pp. 1-22
Author(s):  
Emily Berg ◽  
Jongho Im ◽  
Zhengyuan Zhu ◽  
Colin Lewis-Beck ◽  
Jie Li

Statistical and administrative agencies often collect information on related parameters. Discrepancies between estimates from distinct data sources can arise due to differences in definitions, reference periods, and data collection protocols. Integrating statistical data with administrative data is appealing for saving data collection costs, reducing respondent burden, and improving the coherence of estimates produced by statistical and administrative agencies. Model-based techniques for combining multiple data sources, such as small area estimation and measurement error models, offer transparency, reproducibility, and the ability to provide an estimated uncertainty. Issues associated with integrating statistical data with administrative data are discussed in the context of data from Namibia. The national statistical agency in Namibia produces estimates of crop area using data from probability samples. Simultaneously, the Namibia Ministry of Agriculture, Water, and Forestry obtains crop area estimates through extension programs. We illustrate the use of a structural measurement error model for the purpose of synthesizing the administrative and survey data to form a unified estimate of crop area. Limitations on the available data preclude us from conducting a genuine, thorough application. Nonetheless, our illustration of methodology holds potential use for a general practitioner.


Author(s):  
Jonathan M Snowden ◽  
Audrey Lyndon ◽  
Peiyi Kan ◽  
Alison El Ayadi ◽  
Elliott Main ◽  
...  

Severe maternal morbidity (SMM) is a composite outcome measure that indicates serious, potentially life-threatening maternal health problems. There is great interest in defining SMM using administrative data for surveillance and research. In the US, one common way of defining SMM at the population level is an index developed by the Centers for Disease Control and Prevention. Modifications have been proposed to this index (e.g., excluding maternal transfusion); some research defines SMM using an index introduced by Bateman et al. Birth certificate data are also increasingly being used to define SMM. We compared commonly used US definitions of SMM to each other among all California births, 2007-2012, using the Kappa statistic and other measures. We also evaluated agreement between maternal morbidity fields on the birth certificate and claims data. Concordance was generally low between the 7 definitions of SMM analyzed (i.e., κ < 0.4 for 13 of 21 two-way comparisons). Low concordance was particularly driven by presence/absence of transfusion and by claims data versus birth certificate definitions. Low agreement between administrative data-based definitions of SMM highlights that results can be expected to differ between them. Further research is needed on validity of SMM definitions, using more fine-grained data sources.
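For readers unfamiliar with the Kappa statistic, the following Python sketch computes Cohen's kappa for two binary case definitions and confirms that 7 definitions yield 21 two-way comparisons. The two label vectors are invented toy data, not the California birth records, and the function name `cohen_kappa` is a hypothetical helper; the abstract does not say which kappa variant or software the authors used.

```python
import math
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of binary (0/1) labels."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n   # raw agreement
    p_a, p_b = sum(a) / n, sum(b) / n                  # share flagged by each definition
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)       # agreement expected by chance
    if expected == 1.0:
        return math.nan  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1 - expected)


# Number of unordered pairs among 7 definitions: C(7, 2).
n_pairs = len(list(combinations(range(7), 2)))

# Hypothetical flags (1 = birth classified as SMM) under two definitions.
definition_a = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
definition_b = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]

kappa = cohen_kappa(definition_a, definition_b)
print(n_pairs, round(kappa, 3))
```

In the toy data the two definitions agree on 8 of 10 births, yet kappa is only about 0.52, because much of that agreement is expected by chance when the outcome is rare.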


Water ◽  
2021 ◽  
Vol 13 (6) ◽  
pp. 818
Author(s):  
Markus Reisenbüchler ◽  
Minh Duc Bui ◽  
Peter Rutschmann

Reservoir sedimentation is a critical issue worldwide, resulting in reduced storage volumes and, thus, reservoir efficiency. Moreover, sedimentation can also increase the flood risk at related facilities. In some cases, drawdown flushing of the reservoir is an appropriate management tool. However, there are various options as to how and when to perform such flushing, which should be optimized in order to maximize its efficiency and effectiveness. This paper proposes an innovative concept, based on an artificial neural network (ANN), to predict the volume of sediment flushed from the reservoir given distinct input parameters. The results obtained from a real-world study area indicate that there is a close correlation between the inputs—including peak discharge and duration of flushing—and the output (i.e., the volume of sediment). The developed ANN can readily be applied at the real-world study site, as a decision-support system for hydropower operators.


2021 ◽  
pp. 263145412098771
Author(s):  
Biju Dominic ◽  
Reshmi

This case study is about the mis-selling of insurance policies and the associated ethical challenges in a leading insurance company. Pro-organisational ethical violations mostly remain unnoticed and are often protected by implausible explanations. In the long run, persistent rationalisation makes malpractices a norm. The present work describes the interventions applied by a consulting firm to bring about behavioural integrity. The consulting firm found that socialisation, rationalisation and institutionalisation considerably influenced people’s behaviour at the workplace and normalised unethical behaviour of insurance agents. It shaped the behaviour of salespeople through specifically designed interventions based on self-control mechanisms and nudges. These interventions developed integrity in employees and reduced the number of cautions, warnings and terminations.

