Synthetic Data Generation with Differential Privacy via Bayesian Networks

This paper describes PrivBayes, a differentially private method for generating synthetic datasets that was used in the 2018 Differential Privacy Synthetic Data Challenge organized by NIST.

How Can We Analyze Differentially-Private Synthetic Datasets?

Journal of Privacy and Confidentiality ◽

10.29012/jpc.v2i2.589 ◽

2011 ◽

Vol 2 (2) ◽

Cited By ~ 9

Author(s):

Anne-Sophie Charest

Keyword(s):

Count Data ◽

Differential Privacy ◽

Synthetic Data ◽

Data Generation ◽

Multiple Imputations ◽

Synthetic Data Generation ◽

Simple Alternative ◽

Simulation Results ◽

Statistical Agencies ◽

Synthetic Datasets

Synthetic datasets generated within the multiple imputation framework are now commonly used by statistical agencies to protect the confidentiality of their respondents. More recently, researchers have also proposed techniques to generate synthetic datasets that offer the formal guarantee of differential privacy. While combining rules were derived for the first type of synthetic datasets, little has been said on the analysis of differentially private synthetic datasets generated with multiple imputations. In this paper, we show that we cannot use the usual combining rules to analyze synthetic datasets that have been generated to achieve differential privacy. We consider specifically the case of generating synthetic count data with the beta-binomial synthesizer, and illustrate our discussion with simulation results. We also propose, as a simple alternative, a Bayesian model that explicitly models the mechanism for synthetic data generation.
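
For readers unfamiliar with the beta-binomial approach, the following is a minimal sketch of how such a synthesizer is commonly described, not the paper's own code. It assumes an observed count `x` out of `n` trials and a Beta(`alpha1`, `alpha2`) prior on the underlying proportion. Each synthetic count is obtained by drawing a proportion from the posterior and then drawing a binomial count. The function name and parameters are illustrative, and the link between the prior parameters and the privacy level is not derived here.

```python
import random

def beta_binomial_synthesize(x, n, alpha1, alpha2, m, seed=0):
    """Draw m synthetic counts for an observed count x out of n trials.

    Assumed construction: p ~ Beta(alpha1 + x, alpha2 + n - x),
    then synthetic count ~ Binomial(n, p).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(m):
        # Posterior draw of the proportion given the confidential count.
        p = rng.betavariate(alpha1 + x, alpha2 + n - x)
        # Binomial(n, p) draw as a sum of n Bernoulli trials.
        synthetic.append(sum(rng.random() < p for _ in range(n)))
    return synthetic

# Five synthetic versions of a confidential count of 30 out of 100.
synthetic_counts = beta_binomial_synthesize(30, 100, 1.0, 1.0, m=5)
```

Because the prior contributes to every draw, the synthetic counts are shrunk toward the prior mean, which is one intuition for why rules derived for ordinary multiply imputed data may not carry over unchanged.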

Private FL-GAN: Differential Privacy Synthetic Data Generation Based on Federated Learning

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp40776.2020.9054559 ◽

2020 ◽

Cited By ~ 1

Author(s):

Bangzhou Xin ◽

Wei Yang ◽

Yangyang Geng ◽

Sheng Chen ◽

Shaowei Wang ◽

...

Keyword(s):

Differential Privacy ◽

Synthetic Data ◽

Data Generation ◽

Synthetic Data Generation

Winning the NIST Contest: A scalable and general approach to differentially private synthetic data

Journal of Privacy and Confidentiality ◽

10.29012/jpc.778 ◽

2021 ◽

Vol 11 (3) ◽

Author(s):

Ryan McKenna ◽

Gerome Miklau ◽

Daniel Sheldon

Keyword(s):

Differential Privacy ◽

Data Distribution ◽

Synthetic Data ◽

Processing Method ◽

High Dimensional ◽

Data Generation ◽

Synthetic Data Generation ◽

Low Dimensional ◽

Noisy Measurements ◽

Broad Interest

We propose a general approach for differentially private synthetic data generation that consists of three steps: (1) select a collection of low-dimensional marginals, (2) measure those marginals with a noise addition mechanism, and (3) generate synthetic data that preserves the measured marginals well. Central to this approach is Private-PGM, a post-processing method that is used to estimate a high-dimensional data distribution from noisy measurements of its marginals. We present two mechanisms, NIST-MST and MST, that are instances of this general approach. NIST-MST was the winning mechanism in the 2018 NIST differential privacy synthetic data competition, and MST is a new mechanism that can work in more general settings while still performing comparably to NIST-MST. We believe our general approach should be of broad interest and can be adopted in future mechanisms for synthetic data generation.
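
The select, measure, generate pattern can be illustrated with a deliberately simplified sketch. It is not NIST-MST, MST, or Private-PGM. It assumes that the selected marginals are just the one-way marginals of each column, that the noise is Laplace with add-or-remove-one-record neighbouring (sensitivity 1 per histogram), and that the budget is split evenly across columns. Sampling each column independently stands in for the generation step. All function names are hypothetical.

```python
import random
from collections import Counter

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def measure_marginal(values, domain, epsilon, rng):
    # Step 2: noisy histogram of one column (L1 sensitivity assumed to be 1).
    counts = Counter(values)
    return [counts.get(v, 0) + laplace_noise(1.0 / epsilon, rng) for v in domain]

def synthesize(records, domains, epsilon, n_synth, seed=0):
    rng = random.Random(seed)
    # Step 1: the "selection" here is fixed in advance (all one-way marginals),
    # so it consumes no privacy budget. The budget is split across columns.
    eps_each = epsilon / len(domains)
    columns = []
    for j, domain in enumerate(domains):
        noisy = measure_marginal([r[j] for r in records], domain, eps_each, rng)
        # Step 3: post-process the noisy counts into sampling weights.
        weights = [max(w, 0.0) for w in noisy]
        if sum(weights) == 0:
            weights = [1.0] * len(domain)
        columns.append(rng.choices(domain, weights=weights, k=n_synth))
    return [tuple(row) for row in zip(*columns)]

records = [(0, "a"), (1, "b"), (1, "a")] * 50
synthetic = synthesize(records, [[0, 1], ["a", "b"]], epsilon=1.0, n_synth=20)
```

Independent sampling discards all correlations between columns. Recovering a joint distribution from noisy higher-order marginals is the part that the abstract attributes to Private-PGM.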

Federated Synthetic Data Generation with Differential Privacy

Neurocomputing ◽

10.1016/j.neucom.2021.10.027 ◽

2021 ◽

Author(s):

Bangzhou Xin ◽

Yangyang Geng ◽

Teng Hu ◽

Sheng Chen ◽

Wei Yang ◽

...

Keyword(s):

Differential Privacy ◽

Synthetic Data ◽

Data Generation ◽

Synthetic Data Generation

Comparative Study of Differentially Private Synthetic Data Algorithms from the NIST PSCR Differential Privacy Synthetic Data Challenge

Journal of Privacy and Confidentiality ◽

10.29012/jpc.748 ◽

2021 ◽

Vol 11 (1) ◽

Cited By ~ 1

Author(s):

Claire McKay Bowen ◽

Joshua Snoke

Keyword(s):

Differential Privacy ◽

Synthetic Data ◽

Synthesis Methods ◽

Data Sets ◽

Data Generation ◽

Comparative Performance ◽

Synthetic Data Generation ◽

Private Data ◽

Future Data ◽

Public Policy Decisions

Differentially private synthetic data generation offers a recent solution to release analytically useful data while preserving the privacy of individuals in the data. In order to utilize these algorithms for public policy decisions, policymakers need an accurate understanding of these algorithms' comparative performance. Correspondingly, data practitioners also require standard metrics for evaluating the analytic qualities of the synthetic data. In this paper, we present an in-depth evaluation of several differentially private synthetic data algorithms using actual differentially private synthetic data sets created by contestants in the recent National Institute of Standards and Technology Public Safety Communications Research (NIST PSCR) Division's "Differential Privacy Synthetic Data Challenge." We offer analyses of these algorithms based on both the accuracy of the data they create and their usability by potential data providers. We frame the methods used in the NIST PSCR data challenge within the broader differentially private synthetic data literature. We implement additional utility metrics, including two of our own, on the differentially private synthetic data and compare mechanism utility on three categories. Our comparative assessment of the differentially private data synthesis methods and the quality metrics shows the relative usefulness, general strengths and weaknesses, and preferred choices of algorithms and metrics. Finally, we describe the implications of our evaluation for policymakers seeking to implement differentially private synthetic data algorithms on future data products.

Automatic detection of Western rock lobster using synthetic data

ICES Journal of Marine Science ◽

10.1093/icesjms/fsz223 ◽

2019 ◽

Vol 77 (4) ◽

pp. 1308-1317 ◽

Cited By ~ 2

Author(s):

Ammar Mahmood ◽

Mohammed Bennamoun ◽

Senjian An ◽

Ferdous Sohel ◽

Farid Boussaid ◽

...

Keyword(s):

Automatic Segmentation ◽

Synthetic Data ◽

Data Generation ◽

Underwater Imaging ◽

Rock Lobster ◽

Art Object ◽

Synthetic Data Generation ◽

Learning Technique ◽

Human Effort ◽

Synthetic Datasets

Underwater imaging is being extensively used for monitoring the abundance of lobster species and their biodiversity in their local habitats. However, manual assessment of these images requires a huge amount of human effort. In this article, we propose to automate the process of lobster detection using a deep learning technique. A major obstacle in deploying such an automatic framework for the localization of lobsters in diverse environments is the lack of large annotated training datasets. Generating synthetic datasets to train these object detection models has become a popular approach. However, the current synthetic data generation frameworks rely on automatic segmentation of objects of interest, which becomes difficult when the objects have a complex shape, such as that of a lobster. To overcome this limitation, we propose an approach to synthetically generate parts of the lobster. To handle the variability of real-world images, these parts were inserted into a set of diverse background marine images to generate a large synthetic dataset. A state-of-the-art object detector was trained using this synthetic parts dataset and tested on the challenging task of Western rock lobster detection in West Australian seas. To the best of our knowledge, this is the first automatic lobster detection technique for partially visible and occluded lobsters.
