Synthetic Data Generation with Differential Privacy via Bayesian Networks

2021 ◽  
Vol 11 (3) ◽  
Author(s):  
Ergute Bao ◽  
Xiaokui Xiao ◽  
Jun Zhao ◽  
Dongping Zhang ◽  
Bolin Ding

This paper describes PrivBayes, a differentially private method for generating synthetic datasets that was used in the 2018 Differential Privacy Synthetic Data Challenge organized by NIST.

Author(s):  
Anne-Sophie Charest

Synthetic datasets generated within the multiple imputation framework are now commonly used by statistical agencies to protect the confidentiality of their respondents. More recently, researchers have also proposed techniques to generate synthetic datasets that offer the formal guarantee of differential privacy. While combining rules were derived for the first type of synthetic datasets, little has been said on the analysis of differentially private synthetic datasets generated with multiple imputations. In this paper, we show that the usual combining rules cannot be used to analyze synthetic datasets that have been generated to achieve differential privacy. We consider specifically the case of generating synthetic count data with the beta-binomial synthesizer, and illustrate our discussion with simulation results. As a simple alternative, we also propose a Bayesian model that explicitly models the mechanism for synthetic data generation.
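As a rough illustration of the kind of synthesizer this abstract refers to, the sketch below draws multiple synthetic counts from a beta-binomial model: a proportion is sampled from the Beta posterior given the observed count, and a synthetic count is then sampled from a binomial with that proportion. The function name `beta_binomial_synthesize` and the prior values are hypothetical, and the prior here is not calibrated to any particular privacy budget; the abstract does not give those details.

```python
import random

def beta_binomial_synthesize(x, n, a1, a2, m, rng):
    """Draw m synthetic counts for an observed count x out of n trials.

    Assumes a Beta(a1, a2) prior on the proportion (illustrative values only,
    not calibrated to a differential privacy guarantee).
    """
    synthetic = []
    for _ in range(m):
        # Sample a proportion from the Beta posterior given the observed count.
        p = rng.betavariate(a1 + x, a2 + n - x)
        # Sample a synthetic count as n Bernoulli(p) trials.
        synthetic.append(sum(rng.random() < p for _ in range(n)))
    return synthetic

rng = random.Random(1)
counts = beta_binomial_synthesize(x=30, n=100, a1=5.0, a2=5.0, m=5, rng=rng)
print(counts)
```

Each of the `m` counts plays the role of one synthetic dataset in the multiple imputation setting; the abstract's point is that the usual combining rules do not carry over to such draws when the synthesizer is tuned for differential privacy.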


2021 ◽  
Vol 11 (3) ◽  
Author(s):  
Ryan McKenna ◽  
Gerome Miklau ◽  
Daniel Sheldon

We propose a general approach for differentially private synthetic data generation that consists of three steps: (1) select a collection of low-dimensional marginals, (2) measure those marginals with a noise addition mechanism, and (3) generate synthetic data that preserves the measured marginals well. Central to this approach is Private-PGM, a post-processing method that is used to estimate a high-dimensional data distribution from noisy measurements of its marginals. We present two mechanisms, NIST-MST and MST, that are instances of this general approach. NIST-MST was the winning mechanism in the 2018 NIST differential privacy synthetic data competition, and MST is a new mechanism that can work in more general settings while still performing comparably to NIST-MST. We believe our general approach should be of broad interest and can be adopted in future mechanisms for synthetic data generation.
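The three steps can be sketched on a toy dataset as follows. This is a minimal illustration under stated assumptions, not the authors' mechanism: the marginal is chosen by hand rather than selected privately, the noise is Laplace with sensitivity 1 (assuming neighboring datasets differ by adding or removing one record), and step 3 uses naive clip-and-normalize sampling in place of Private-PGM. All function names and data are hypothetical.

```python
import random
from collections import Counter
from itertools import product

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def measure_marginal(records, attrs, domains, epsilon, rng):
    """Step 2: noisy contingency table over the attributes in `attrs`."""
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    cells = list(product(*(domains[a] for a in attrs)))
    # Assumed sensitivity 1, so the Laplace scale is 1 / epsilon.
    return {c: counts.get(c, 0) + laplace_noise(1.0 / epsilon, rng) for c in cells}

def synthesize(noisy, attrs, n, rng):
    """Step 3 (naive stand-in for Private-PGM): clip, normalize, sample."""
    cells = list(noisy)
    weights = [max(noisy[c], 0.0) for c in cells]
    if sum(weights) == 0:
        weights = [1.0] * len(cells)  # fall back to uniform if everything was clipped
    draws = rng.choices(cells, weights=weights, k=n)
    return [dict(zip(attrs, c)) for c in draws]

rng = random.Random(0)
domains = {"age": ["young", "old"], "smoker": ["yes", "no"]}
records = ([{"age": "young", "smoker": "no"}] * 60
           + [{"age": "old", "smoker": "yes"}] * 40)

attrs = ("age", "smoker")  # Step 1: marginal fixed by hand (data-independent)
noisy = measure_marginal(records, attrs, domains, epsilon=1.0, rng=rng)
synthetic = synthesize(noisy, attrs, n=100, rng=rng)
print(noisy)
print(synthetic[:3])
```

Because sampling in step 3 only touches the noisy table, it is post-processing and consumes no additional privacy budget; a method such as Private-PGM fills the same role when several overlapping marginals must be reconciled.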


2021 ◽  
Author(s):  
Bangzhou Xin ◽  
Yangyang Geng ◽  
Teng Hu ◽  
Sheng Chen ◽  
Wei Yang ◽  
...  

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Claire McKay Bowen ◽  
Joshua Snoke

Differentially private synthetic data generation offers a recent solution for releasing analytically useful data while preserving the privacy of individuals in the data. To use these algorithms for public policy decisions, policymakers need an accurate understanding of the algorithms' comparative performance. Correspondingly, data practitioners require standard metrics for evaluating the analytic qualities of the synthetic data. In this paper, we present an in-depth evaluation of several differentially private synthetic data algorithms using actual differentially private synthetic data sets created by contestants in the recent National Institute of Standards and Technology Public Safety Communications Research (NIST PSCR) Division's "Differential Privacy Synthetic Data Challenge." We offer analyses of these algorithms based on both the accuracy of the data they create and their usability by potential data providers. We frame the methods used in the NIST PSCR data challenge within the broader differentially private synthetic data literature. We implement additional utility metrics, including two of our own, on the differentially private synthetic data and compare mechanism utility on three categories. Our comparative assessment of the differentially private data synthesis methods and the quality metrics shows the relative usefulness, general strengths and weaknesses, and preferred choices of algorithms and metrics. Finally, we describe the implications of our evaluation for policymakers seeking to implement differentially private synthetic data algorithms on future data products.


2019 ◽  
Vol 77 (4) ◽  
pp. 1308-1317 ◽  
Author(s):  
Ammar Mahmood ◽  
Mohammed Bennamoun ◽  
Senjian An ◽  
Ferdous Sohel ◽  
Farid Boussaid ◽  
...  

Underwater imaging is used extensively to monitor the abundance of lobster species and their biodiversity in their local habitats. However, manual assessment of these images requires a huge amount of human effort. In this article, we propose to automate the process of lobster detection using a deep learning technique. A major obstacle in deploying such an automatic framework for the localization of lobsters in diverse environments is the lack of large annotated training datasets. Generating synthetic datasets to train these object detection models has become a popular approach. However, current synthetic data generation frameworks rely on automatic segmentation of objects of interest, which becomes difficult when the objects have a complex shape, such as a lobster. To overcome this limitation, we propose an approach to synthetically generate parts of the lobster. To handle the variability of real-world images, these parts were inserted into a set of diverse background marine images to generate a large synthetic dataset. A state-of-the-art object detector was trained using this synthetic parts dataset and tested on the challenging task of Western rock lobster detection in West Australian seas. To the best of our knowledge, this is the first automatic lobster detection technique for partially visible and occluded lobsters.


2007 ◽  
Author(s):  
Marek K. Jakubowski ◽  
David Pogorzala ◽  
Timothy J. Hattenberger ◽  
Scott D. Brown ◽  
John R. Schott

2004 ◽  
pp. 211-234 ◽  
Author(s):  
Lewis Girod ◽  
Ramesh Govindan ◽  
Deepak Ganesan ◽  
Deborah Estrin ◽  
Yan Yu

2021 ◽  
Author(s):  
Maria Lyssenko ◽  
Christoph Gladisch ◽  
Christian Heinzemann ◽  
Matthias Woehrle ◽  
Rudolph Triebel
