statistical disclosure limitation
Recently Published Documents


TOTAL DOCUMENTS

37
(FIVE YEARS 5)

H-INDEX

7
(FIVE YEARS 1)

2021 ◽  
Vol 9 (2) ◽  
pp. 250-267
Author(s):  
Lesaja Goran ◽  
G.Q. Wang ◽  
A. Oganian

In this paper, an improved Interior-Point Method (IPM) for solving symmetric optimization problems is presented. Symmetric optimization (SO) problems are linear optimization problems over symmetric cones. In particular, the method can be efficiently applied to an important instance of SO, the Controlled Tabular Adjustment (CTA) problem, a method used for Statistical Disclosure Limitation (SDL) of tabular data. The presented method is a full Nesterov-Todd step infeasible IPM for SO. The algorithm converges to an ε-approximate solution from any starting point, whether feasible or infeasible. Each iteration consists of a feasibility step and several centering steps; however, the iterates are obtained in a wider neighborhood of the central path than in similar algorithms of this type, which is the main improvement of the method. Nevertheless, the method still achieves the best iteration bound currently known for infeasible short-step methods.
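
As a rough illustration of the infeasible primal-dual IPM loop that the paper refines, the following sketch solves the simplest symmetric-cone case, a linear program over the nonnegative orthant. The fixed centering parameter, the plain x/s scaling standing in for the Nesterov-Todd scaling, and the damped step rule are simplifying assumptions of this sketch, not the paper's feasibility-step/centering-step scheme or its wide-neighborhood analysis.

import numpy as np

def _max_step(v, dv, tau=0.9):
    # Largest damped step keeping v + alpha * dv strictly positive.
    neg = dv < 0
    if not neg.any():
        return 1.0
    return min(1.0, tau * float(np.min(-v[neg] / dv[neg])))

def infeasible_ipm_lp(A, b, c, eps=1e-8, max_iter=200):
    # Minimal infeasible primal-dual IPM for: min c'x  s.t.  Ax = b, x >= 0.
    m, n = A.shape
    x, s, y = np.ones(n), np.ones(n), np.zeros(m)
    for _ in range(max_iter):
        rp = b - A @ x                 # primal infeasibility
        rd = c - A.T @ y - s           # dual infeasibility
        mu = x @ s / n                 # duality measure
        if max(np.linalg.norm(rp), np.linalg.norm(rd), mu) < eps:
            break
        sigma = 0.5                    # fixed centering parameter (short-step flavor)
        d = x / s                      # diagonal scaling (LP analogue of NT scaling)
        target = sigma * mu - x * s    # complementarity target: S dx + X ds = target
        # Newton direction via the normal equations (A D A') dy = rhs.
        M = (A * d) @ A.T
        rhs = rp + A @ (d * rd) - A @ (target / s)
        dy = np.linalg.solve(M, rhs)
        dx = d * (A.T @ dy - rd) + target / s
        ds = rd - A.T @ dy
        alpha = min(_max_step(x, dx), _max_step(s, ds))
        x, y, s = x + alpha * dx, y + alpha * dy, s + alpha * ds
    return x, y, s

# Tiny example: min x1 + 2*x2 + 3*x3  s.t.  x1 + x2 + x3 = 1, x >= 0.
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0, 3.0])
x, _, _ = infeasible_ipm_lp(A, b, c)
print(np.round(x, 6))  # approximately [1, 0, 0]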


Author(s):  
Kevin L McKinney ◽  
Andrew S Green ◽  
Lars Vilhuber ◽  
John M Abowd

Abstract We report results from the first comprehensive total quality evaluation of five major indicators in the U.S. Census Bureau’s Longitudinal Employer-Household Dynamics (LEHD) Program Quarterly Workforce Indicators (QWI): total flow-employment, beginning-of-quarter employment, full-quarter employment, average monthly earnings of full-quarter employees, and total quarterly payroll. Beginning-of-quarter employment is also the main tabulation variable in the LEHD Origin-Destination Employment Statistics (LODES) workplace reports as displayed in OnTheMap (OTM), including OTM for Emergency Management. We account for errors due to coverage; record-level non-response; edit and imputation of item missing data; and statistical disclosure limitation. The analysis reveals that the five publication variables under study are estimated very accurately for tabulations involving at least 10 jobs. Tabulations involving three to nine jobs are a transition zone, where cells may be fit for use with caution. Tabulations involving one or two jobs, which are generally suppressed on fitness-for-use criteria in the QWI and synthesized in LODES, have substantial total variability but can still be used to estimate statistics for untabulated aggregates as long as the job count in the aggregate is more than 10.
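
The fitness-for-use thresholds above translate naturally into a cell-screening rule. The function below is a hypothetical illustration of such a rule based on the reported job-count ranges; it is not the QWI's actual suppression or synthesis logic.

def fitness_for_use(job_count: int) -> str:
    # Thresholds follow the paper's findings: cells built on at least 10 jobs
    # are estimated very accurately; 3 to 9 jobs is a transition zone; 1 or 2
    # jobs carry substantial total variability.
    if job_count >= 10:
        return "fit for use"
    if job_count >= 3:
        return "use with caution"
    return "high variability: aggregate to more than 10 jobs before use"

print([fitness_for_use(n) for n in (1, 5, 25)])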


2020 ◽  
Vol 117 (15) ◽  
pp. 8344-8352 ◽  
Author(s):  
Aloni Cohen ◽  
Kobbi Nissim

There is a significant conceptual gap between legal and mathematical thinking around data privacy. The effect is uncertainty as to which technical offerings meet legal standards. This uncertainty is exacerbated by a litany of successful privacy attacks demonstrating that traditional statistical disclosure limitation techniques often fall short of the privacy envisioned by regulators. We define “predicate singling out,” a type of privacy attack intended to capture the concept of singling out appearing in the General Data Protection Regulation (GDPR). An adversary predicate singles out a dataset x using the output of a data-release mechanism M(x) if it finds a predicate p matching exactly one row in x with probability much better than a statistical baseline. A data-release mechanism that precludes such attacks is “secure against predicate singling out” (PSO secure). We argue that PSO security is a mathematical concept with legal consequences. Any data-release mechanism that purports to “render anonymous” personal data under the GDPR must prevent singling out and, hence, must be PSO secure. We analyze the properties of PSO security, showing that it fails to compose. Namely, a combination of more than logarithmically many exact counts, each individually PSO secure, facilitates predicate singling out. Finally, we ask whether differential privacy and k-anonymity are PSO secure. Leveraging a connection to statistical generalization, we show that differential privacy implies PSO security. However, and in contrast with current legal guidance, k-anonymity does not: There exists a simple predicate singling out attack under mild assumptions on the k-anonymizer and the data distribution.
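
A toy sketch of the isolation event at the heart of the definition may help: a predicate p "singles out" a dataset x when it matches exactly one row. The dataset, predicate, and match check below are illustrative assumptions only; the paper's formal definition additionally requires the adversary, given the output M(x), to succeed with probability much better than a baseline determined by the data distribution.

import random

def isolates(predicate, rows):
    # True when the predicate matches exactly one row of the dataset.
    return sum(1 for row in rows if predicate(row)) == 1

# Hypothetical dataset: (age, zip code) records drawn from a toy distribution.
rows = [(random.randint(20, 70), random.choice(["10001", "10002"]))
        for _ in range(100)]

# An adversary guessing this specific predicate blindly (without seeing any
# mechanism output) succeeds only with small baseline probability.
p = lambda row: row == (37, "10001")
print(isolates(p, rows))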


Author(s):  
Amy O'Hara ◽  
Quentin Brummet

An expanding body of data privacy research reveals that computational advances and ever-growing amounts of publicly retrievable data increase re-identification risks. Because of this, data publishers are realizing that traditional statistical disclosure limitation methods may not protect privacy. This paper discusses the use of differential privacy at the US Census Bureau to protect the published results of the 2020 census. We first discuss the legal framework under which the Census Bureau intends to use differential privacy. The Census Act in the US states that the agency must keep information confidential, avoiding “any publication whereby the data furnished by any particular establishment or individual under this title can be identified.” The fact that the Census Bureau may release fewer statistics in 2020 than in 2010 is leading scholars to parse the meaning of identification and reevaluate the agency’s responsibility to balance data utility with privacy protection. We then describe technical aspects of the application of differential privacy in the US Census. This data collection is enormously complex and serves a wide variety of users and uses: 7.8 billion statistics were released from the 2010 US Census. This complexity strains the application of differential privacy, which must preserve appropriate geographic relationships, respect legal requirements for certain statistics to be free of noise infusion, and provide information for detailed demographic groups. We end by discussing the prospects of applying formal mathematical privacy to other information products at the Census Bureau. At present, techniques exist for applying differential privacy to descriptive statistics, histograms, and counts, but are less developed for more complex data releases, including panel data, linked data, and vast person-level datasets. We expect the continued development of formally private methods to occur alongside discussions of what privacy means and the policy issues involved in trading off protection for accuracy.
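
For the descriptive counts mentioned above, the canonical differentially private building block is the Laplace mechanism. The sketch below is a generic illustration with placeholder values for epsilon and the count; it is not the TopDown Algorithm the Census Bureau developed for the 2020 census.

import numpy as np

def laplace_count(true_count: int, epsilon: float) -> float:
    # A count query has sensitivity 1: adding or removing one person changes
    # it by at most 1, so Laplace noise with scale 1/epsilon satisfies
    # epsilon-differential privacy for this single query.
    return true_count + np.random.laplace(scale=1.0 / epsilon)

print(laplace_count(1234, epsilon=0.5))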


2018 ◽  
Vol 8 (1) ◽  
Author(s):  
Natalie Shlomo

An overview of traditional types of data dissemination at statistical agencies is provided, including definitions of disclosure risk, the quantification of disclosure risk and data utility, and common statistical disclosure limitation (SDL) methods. However, with technological advancements and the increasing push by governments for open and accessible data, new forms of data dissemination are currently being explored. We focus on web-based applications such as flexible table builders and remote analysis servers, synthetic data, and remote access. Many of these applications introduce new challenges for statistical agencies as they gradually relinquish some of their control over what data are released. There is now more recognition of the need for perturbative methods to protect the confidentiality of data subjects. These new forms of data dissemination are changing the landscape of how disclosure risks are conceptualized and the types of SDL methods that need to be applied to protect the data. In particular, inferential disclosure is the main disclosure risk of concern and encompasses the traditional types of disclosure risk based on identity and attribute disclosure. These challenges have led statisticians to explore the computer science definition of differential privacy and privacy-by-design applications. We explore how differential privacy can be a useful addition to the current SDL framework within statistical agencies.
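
One classical perturbative method used in flexible table builders is unbiased random rounding of cell counts to a fixed base. The sketch below is a generic illustration with an assumed base of 3; agencies' actual implementations differ in details such as keeping overlapping tables consistent.

import random

def random_round(count: int, base: int = 3) -> int:
    # Round each cell count to a multiple of `base`, rounding up with
    # probability remainder/base so the expected value equals the true count.
    lower = (count // base) * base
    remainder = count - lower
    if remainder == 0:
        return count
    return lower + base if random.random() < remainder / base else lower

print([random_round(c) for c in (0, 1, 7, 52)])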


2018 ◽  
Vol 8 (1) ◽  
Author(s):  
Natalie Shlomo

During this period of preparing the special issue of the Journal of Privacy and Confidentiality in honour of Steve Fienberg, we received news of the tragic events that occurred at the Tree of Life Synagogue in Pittsburgh on October 27th, 2018, and the sudden, senseless death of Joyce Fienberg. While Steve was a great support and mentor to me as I embarked on my PhD research at the Hebrew University and the University of Southampton in 2004, he was also married to an extraordinary woman who showed endless kindness to me and to all of Steve's students and mentees. I had a wonderful visit to CMU during my sabbatical in November 2011, spending much quality time with both Steve and Joyce. As my mentor, Steve marked my PhD dissertation in 2007, provided me with advice and support as I embarked on an academic career, and wrote many recommendation and promotion letters over the years. I can honestly credit Steve with where I am today in my academic career. Steve was instrumental in bringing differential privacy to the forefront of research in statistical disclosure limitation and provided many opportunities to bring statisticians and computer scientists together for collaborations. Our most recent initiative was the Data Linkage and Anonymisation Programme at the Isaac Newton Institute of Mathematical Sciences at the University of Cambridge from July through December 2016. Steve was to participate in the programme, but alas his illness got the better of him during that time. In fact, Steve was to participate in all three programmes running at the Institute - Data Linkage and Anonymisation, Theoretical Foundations for Statistical Network Analysis, and Probability and Statistics in Forensic Science - which only goes to show the breadth and depth of his research activities and achievements. He was sorely missed. I can only hope that these words of devotion and appreciation will provide some comfort to Steve and Joyce's family. I end with a Hebrew blessing - Zichronam livracha - may their memories be a blessing.


Author(s):  
John M Abowd

The dual problems of respecting citizen privacy and protecting the confidentiality of their data have become hopelessly conflated in the “Big Data” era. There are orders of magnitude more data outside an agency’s firewall than inside it—compromising the integrity of traditional statistical disclosure limitation methods. And increasingly the information processed by the agency was “asked” in a context wholly outside the agency’s operations—blurring the distinction between what was asked and what is published. Already, private businesses like Microsoft, Google and Apple recognize that cybersecurity (safeguarding the integrity and access controls for internal data) and privacy protection (ensuring that what is published does not reveal too much about any person or business) are two sides of the same coin. This is a paradigm-shifting moment for statistical agencies.

