Bayesian non‐parametric generation of fully synthetic multivariate categorical data in the presence of structural zeros

Abstract

Multivariate categorical data nested within households often include reported values that fail edit constraints (for example, a participating household reports a child’s age as older than his biological parent’s age) and also contain missing values. Agencies generally prefer datasets to be free of erroneous or missing values before analyzing them or disseminating them to secondary data users. We present a model-based engine for editing and imputation of household data based on a Bayesian hierarchical model that includes (i) a nested data Dirichlet process mixture of products of multinomial distributions as the model for the true latent values of the data, truncated to allow only households that satisfy all edit constraints, (ii) a model for the location of errors, and (iii) a reporting model for the observed responses in error. The approach propagates uncertainty due to unknown locations of errors and missing values, generates plausible datasets that satisfy all edit constraints, and can preserve multivariate relationships within and across individuals in the same household. We illustrate the approach using data from the 2012 American Community Survey.
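The idea of truncating the latent-data model so that only households satisfying all edit constraints receive probability can be illustrated with a deliberately simplified sketch. It makes several assumptions that are not part of the paper: a finite two-class mixture stands in for the nested Dirichlet process mixture, individual-level classes are omitted, households have exactly two members (a parent and a child) described only by a four-level ordered age category, and there is a single edit constraint (the child must be younger than the parent). All parameter values and function names are hypothetical. Rejection sampling is used only as the simplest way to draw from a truncated distribution and is not necessarily the sampler used by the authors.

```python
import numpy as np

# Hypothetical parameters (not from the paper): 2 household-level latent classes,
# 4 ordered age categories per person.
PI = np.array([0.6, 0.4])  # mixture weights of the latent classes

# AGE_PROBS[c, r] = multinomial probabilities over age categories for role r
# (0 = parent, 1 = child) in latent class c.
AGE_PROBS = np.array([
    [[0.0, 0.1, 0.5, 0.4],   # class 0, parent
     [0.5, 0.3, 0.2, 0.0]],  # class 0, child
    [[0.0, 0.2, 0.4, 0.4],   # class 1, parent
     [0.4, 0.4, 0.1, 0.1]],  # class 1, child
])


def satisfies_edits(parent_age, child_age):
    """Hypothetical edit constraint: the child must be strictly younger than the parent."""
    return child_age < parent_age


def draw_household(rng):
    """One draw from the untruncated mixture of products of multinomials."""
    c = rng.choice(len(PI), p=PI)                  # household-level latent class
    parent = rng.choice(4, p=AGE_PROBS[c, 0])      # members are independent given c
    child = rng.choice(4, p=AGE_PROBS[c, 1])
    return int(parent), int(child)


def draw_valid_household(rng, max_tries=10_000):
    """Rejection sampling: keep only draws that pass every edit constraint,
    which yields draws from the model truncated to the valid households."""
    for _ in range(max_tries):
        parent, child = draw_household(rng)
        if satisfies_edits(parent, child):
            return parent, child
    raise RuntimeError("edit constraints rejected every draw")


rng = np.random.default_rng(0)
households = [draw_valid_household(rng) for _ in range(1000)]
print(all(satisfies_edits(p, c) for p, c in households))  # True
```

Because every accepted draw has passed the edit check, the generated households respect the constraint by construction. The trade-off is wasted draws whenever the constraints exclude a large share of the untruncated model's probability mass.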


Bayesian Simultaneous Edit and Imputation for Multivariate Categorical Data

Journal of the American Statistical Association, Vol 112 (520), pp. 1708-1719, 2017

DOI: 10.1080/01621459.2016.1231612

Cited by ~9

Author(s):

Daniel Manrique-Vallier, Jerome P. Reiter

Keyword(s):

Categorical Data, Multivariate Categorical


Multivariate Categorical Data — Mosaic Plots

Statistics and Computing - Graphics of Large Datasets, pp. 105-124, 2007

DOI: 10.1007/0-387-37977-0_5

Cited by ~3

Author(s):

Heike Hofmann

Keyword(s):

Categorical Data, Multivariate Categorical


Univariate and multivariate categorical data analysis for block designs

Communications in Statistics - Theory and Methods, Vol 11 (11), pp. 1209-1231, 1982

DOI: 10.1080/03610928208828306

Author(s):

R.P. Bhargava

Keyword(s):

Data Analysis, Categorical Data, Block Designs, Categorical Data Analysis, Multivariate Categorical


Quantification of Multivariate Categorical Data Considering Typicality of Item

Journal of Advanced Computational Intelligence and Intelligent Informatics, Vol 11 (1), pp. 35-39, 2007

DOI: 10.20965/jaciii.2007.p0035

Cited by ~1

Author(s):

Chi-Hyon Oh, Katsuhiro Honda, Hidetomo Ichihashi

Keyword(s):

Numerical Experiment, Objective Function, Fuzzy Clustering, Categorical Data, The Other, Additional Parameter, Homogeneity Analysis, Degree Of Membership, Multivariate Categorical

We propose a method that applies homogeneity analysis and fuzzy clustering simultaneously, partitioning both individuals and items in multivariate categorical datasets. The proposed objective function includes two types of membership. The first is the conventional membership, which represents the degree to which each individual belongs to each cluster. The second is an additional parameter that represents the typicality of each item. A numerical experiment demonstrates that the proposed method is useful for quantifying categorical data while taking the typicality of each item into account.
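As a point of reference for the "conventional membership" mentioned above, the sketch below shows the standard fuzzy c-means membership update, in which an individual's membership in a cluster shrinks as its distance to that cluster's center grows relative to the other centers. This is not the authors' combined objective. It assumes that each individual already has numeric quantification scores (for instance, produced by homogeneity analysis), it uses Euclidean distance with the common fuzzifier `m = 2`, and it omits the item-typicality parameter entirely. The function name and the example data are hypothetical.

```python
import numpy as np


def fcm_memberships(X, centers, m=2.0, eps=1e-12):
    """Standard fuzzy c-means membership update (not the paper's objective).

    u[i, k] = 1 / sum_j (d[i, k] / d[i, j]) ** (2 / (m - 1)),
    where d[i, k] is the Euclidean distance from row i of X to center k.
    Each row of the result sums to 1.
    """
    # d has shape (n_individuals, n_clusters)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, eps)  # avoid division by zero when a point sits on a center
    # ratio[i, k, j] = (d[i, k] / d[i, j]) ** (2 / (m - 1))
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)


# Hypothetical quantification scores for three individuals and two cluster centers.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
U = fcm_memberships(X, centers)
print(U.round(3))
```

The third individual lies exactly halfway between the two centers, so its membership is split evenly, while the first two individuals belong almost entirely to the cluster whose center they coincide with.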
