Private FL-GAN: Differential Privacy Synthetic Data Generation Based on Federated Learning

Author(s):  
Bangzhou Xin ◽  
Wei Yang ◽  
Yangyang Geng ◽  
Sheng Chen ◽  
Shaowei Wang ◽  
...


2021 ◽  
Vol 11 (3) ◽  
Author(s):  
Ryan McKenna ◽  
Gerome Miklau ◽  
Daniel Sheldon

We propose a general approach for differentially private synthetic data generation that consists of three steps: (1) select a collection of low-dimensional marginals, (2) measure those marginals with a noise-addition mechanism, and (3) generate synthetic data that preserves the measured marginals well. Central to this approach is Private-PGM, a post-processing method used to estimate a high-dimensional data distribution from noisy measurements of its marginals. We present two mechanisms, NIST-MST and MST, that are instances of this general approach. NIST-MST was the winning mechanism in the 2018 NIST differential privacy synthetic data competition, and MST is a new mechanism that works in more general settings while still performing comparably to NIST-MST. We believe our general approach should be of broad interest and can be adopted in future mechanisms for synthetic data generation.
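The three-step recipe in this abstract is concrete enough to sketch in code. The following is a minimal Python illustration of the select-measure-generate paradigm, not the authors' Private-PGM implementation: the choice of marginal, the Laplace noise scale, and the direct-sampling generation step are all simplifying assumptions (Private-PGM instead fits a graphical model consistent with all measured marginals).

```python
import numpy as np

rng = np.random.default_rng(0)

def measure_marginal(data, cols, sizes, epsilon):
    """Step (2): measure one low-dimensional marginal with Laplace noise."""
    # Contingency table (histogram) of the selected columns.
    counts, _ = np.histogramdd(data[:, cols],
                               bins=[np.arange(s + 1) for s in sizes])
    # Laplace mechanism: adding/removing one record changes each count by <= 1.
    return counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)

def generate(noisy_marginal, n):
    """Step (3), heavily simplified: sample records in proportion to the
    clipped noisy counts. Private-PGM instead estimates a distribution
    consistent with *all* measured marginals before sampling."""
    probs = np.clip(noisy_marginal, 0, None).ravel()
    probs /= probs.sum()
    flat = rng.choice(probs.size, size=n, p=probs)
    return np.stack(np.unravel_index(flat, noisy_marginal.shape), axis=1)

# Toy categorical data: 1000 records, attributes with 2, 3, and 4 levels.
data = rng.integers(0, [2, 3, 4], size=(1000, 3))
# Step (1), by assumption: select the (attr0, attr1) marginal.
noisy = measure_marginal(data, cols=[0, 1], sizes=[2, 3], epsilon=1.0)
synthetic = generate(noisy, n=1000)
```

Because noise is added only to the aggregated marginal counts, the privacy cost is paid once per measured marginal rather than per record, which is what makes the paradigm scale.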


2021 ◽  
Author(s):  
Bangzhou Xin ◽  
Yangyang Geng ◽  
Teng Hu ◽  
Sheng Chen ◽  
Wei Yang ◽  
...  

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Claire McKay Bowen ◽  
Joshua Snoke

Differentially private synthetic data generation offers a recent solution for releasing analytically useful data while preserving the privacy of individuals in the data. In order to utilize these algorithms for public policy decisions, policymakers need an accurate understanding of the algorithms' comparative performance. Correspondingly, data practitioners require standard metrics for evaluating the analytic quality of the synthetic data. In this paper, we present an in-depth evaluation of several differentially private synthetic data algorithms using the actual differentially private synthetic data sets created by contestants in the recent National Institute of Standards and Technology Public Safety Communications Research (NIST PSCR) Division's "Differential Privacy Synthetic Data Challenge." We offer analyses of these algorithms based on both the accuracy of the data they create and their usability by potential data providers. We frame the methods used in the NIST PSCR data challenge within the broader differentially private synthetic data literature. We implement additional utility metrics, including two of our own, on the differentially private synthetic data and compare mechanism utility across three categories. Our comparative assessment of the differentially private data synthesis methods and the quality metrics shows their relative usefulness, general strengths and weaknesses, and preferred choices of algorithms and metrics. Finally, we describe the implications of our evaluation for policymakers seeking to implement differentially private synthetic data algorithms on future data products.
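The abstract does not specify the paper's own metrics, so the sketch below shows only one common family of utility metrics used in evaluations like this: the total variation distance (TVD) between the one-way marginals of the confidential and synthetic data. The function names and integer-coded categorical representation are assumptions for illustration.

```python
import numpy as np

def marginal_tvd(real, synth, col, n_levels):
    """Total variation distance between one attribute's empirical marginals:
    0 means identical distributions, 1 means disjoint support."""
    p = np.bincount(real[:, col], minlength=n_levels) / len(real)
    q = np.bincount(synth[:, col], minlength=n_levels) / len(synth)
    return 0.5 * np.abs(p - q).sum()

def mean_marginal_tvd(real, synth, levels):
    """Average TVD over all attributes; lower indicates higher marginal utility."""
    return float(np.mean([marginal_tvd(real, synth, c, k)
                          for c, k in enumerate(levels)]))
```

Marginal-based scores like this capture only univariate fidelity; challenge evaluations typically pair them with higher-order or task-specific measures.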


Author(s):  
Anne-Sophie Charest

Synthetic datasets generated within the multiple imputation framework are now commonly used by statistical agencies to protect the confidentiality of their respondents. More recently, researchers have also proposed techniques to generate synthetic datasets that offer the formal guarantee of differential privacy. While combining rules were derived for the first type of synthetic dataset, little has been said about the analysis of differentially private synthetic datasets generated with multiple imputations. In this paper, we show that the usual combining rules cannot be used to analyze synthetic datasets which have been generated to achieve differential privacy. We consider specifically the case of generating synthetic count data with the beta-binomial synthesizer and illustrate our discussion with simulation results. As a simple alternative, we also propose a Bayesian model that explicitly models the mechanism for synthetic data generation.
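As a reading aid for the discussion above, here is a minimal sketch of a beta-binomial synthesizer for count data: draw a rate from the Beta posterior of each observed count, then draw a synthetic count from the corresponding binomial. The prior parameters alpha and beta, and the number of imputations m, are assumptions; the paper's exact parameterization, and how it is calibrated to satisfy differential privacy, should be taken from the paper itself.

```python
import numpy as np

rng = np.random.default_rng(1)

def beta_binomial_synthesize(x, n, alpha=1.0, beta=1.0, m=5):
    """Draw m synthetic versions of counts x (out of totals n).
    For each cell: p ~ Beta(x + alpha, n - x + beta), then x* ~ Binomial(n, p)."""
    x, n = np.asarray(x), np.asarray(n)
    return [rng.binomial(n, rng.beta(x + alpha, n - x + beta)) for _ in range(m)]

# Example: five cells, each a count out of 100 trials, synthesized m = 5 times.
synthetic_sets = beta_binomial_synthesize(x=[12, 40, 7, 55, 23], n=[100] * 5)
```

The paper's point is precisely that treating these m draws with the standard multiple-imputation combining rules gives invalid inferences once the synthesizer is tuned for differential privacy, which motivates modeling the generation mechanism explicitly instead.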


2021 ◽  
Vol 11 (3) ◽  
Author(s):  
Ergute Bao ◽  
Xiaokui Xiao ◽  
Jun Zhao ◽  
Dongping Zhang ◽  
Bolin Ding

This paper describes PrivBayes, a differentially private method for generating synthetic datasets, which was used in the 2018 Differential Privacy Synthetic Data Challenge organized by NIST.
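Since the abstract only names the method, the sketch below illustrates the PrivBayes-style pipeline as it is generally described in the literature: fit a low-degree Bayesian network over the attributes, perturb its conditional distributions with Laplace noise, and sample synthetic records by ancestral sampling. Everything here is an illustrative assumption; in particular, the chain-shaped network is fixed rather than selected privately (PrivBayes chooses the structure with the exponential mechanism), and no claim is made about the authors' actual implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def noisy_conditional(data, child, parent, sizes, epsilon):
    """Noisy P(child | parent) from Laplace-perturbed joint counts."""
    counts, _ = np.histogramdd(
        data[:, [parent, child]],
        bins=[np.arange(sizes[parent] + 1), np.arange(sizes[child] + 1)],
    )
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)
    noisy = np.clip(noisy, 0, None) + 1e-9  # tiny pseudocount avoids all-zero rows
    return noisy / noisy.sum(axis=1, keepdims=True)  # rows indexed by parent value

def sample_chain(data, sizes, epsilon, n):
    """Sample n records from the assumed chain network attr0 -> attr1 -> attr2."""
    root = np.bincount(data[:, 0], minlength=sizes[0]).astype(float)
    root = np.clip(root + rng.laplace(scale=1.0 / epsilon, size=sizes[0]), 0, None) + 1e-9
    root /= root.sum()
    cond01 = noisy_conditional(data, child=1, parent=0, sizes=sizes, epsilon=epsilon)
    cond12 = noisy_conditional(data, child=2, parent=1, sizes=sizes, epsilon=epsilon)
    out = np.empty((n, 3), dtype=int)
    out[:, 0] = rng.choice(sizes[0], size=n, p=root)
    for i in range(n):  # ancestral sampling along the chain
        out[i, 1] = rng.choice(sizes[1], p=cond01[out[i, 0]])
        out[i, 2] = rng.choice(sizes[2], p=cond12[out[i, 1]])
    return out

data = rng.integers(0, [2, 3, 4], size=(500, 3))
synthetic = sample_chain(data, sizes=[2, 3, 4], epsilon=1.0, n=500)
```

Restricting each node to few parents keeps every noised table low-dimensional, which is the same intuition behind the marginal-based approach of McKenna et al. above.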


2007 ◽  
Author(s):  
Marek K. Jakubowski ◽  
David Pogorzala ◽  
Timothy J. Hattenberger ◽  
Scott D. Brown ◽  
John R. Schott

2004 ◽  
pp. 211-234 ◽  
Author(s):  
Lewis Girod ◽  
Ramesh Govindan ◽  
Deepak Ganesan ◽  
Deborah Estrin ◽  
Yan Yu

2021 ◽  
Author(s):  
Maria Lyssenko ◽  
Christoph Gladisch ◽  
Christian Heinzemann ◽  
Matthias Woehrle ◽  
Rudolph Triebel

Author(s):  
Daniel Jeske ◽  
Pengyue Lin ◽  
Carlos Rendon ◽  
Rui Xiao ◽  
Behrokh Samadi
