Doctoral Dissertation Defense: Abhishek Guin
Advisors: Drs. Bimal Sinha and Anindya Roy
Wednesday, July 21, 2021 · 11 AM - 1 PM
Title: Bayesian Analysis of Synthetic Data under Multiple Linear
Regression, Multivariate Normal and Multivariate Regression Models
Abstract
Statistical Disclosure Control (SDC) methods are used to preserve the confidentiality of publicly released microdata without compromising its fundamental structure, so that the released data still support adequate and accurate statistical analysis. The synthetic data approach is a popular form of SDC in which all or part of the real data are withheld and are instead used to create synthetic data, which are then released.
In this dissertation we develop Bayesian inference based on singly or multiply imputed synthetic data when the original data are derived from one of the following models: multiple linear regression, multivariate normal, and multivariate regression. We assume that the synthetic data are generated by one of two methods: (i) plug-in sampling, where the unknown parameters in the data model are set equal to the observed values of their point estimators based on the original data, and synthetic data are drawn from this estimated version of the model; and (ii) posterior predictive sampling, where an imputed posterior distribution of the unknown parameters is used to generate a posterior draw, which in turn is plugged into the original model to produce synthetic data. In the single imputation case, the procedures developed here fill a gap in the existing literature, where inferential methods are available only for multiple imputation. Because the procedures are based on exact distributions, they can be applied even when the sample size is small. Simulation results are presented to demonstrate how the proposed methodology performs compared with the theoretical predictions. We also outline some ways to extend the proposed methodology to certain scenarios where the required set of conditions does not hold.
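As an illustration of the two generation mechanisms described above, the following is a minimal sketch for the multiple linear regression case only. It is not taken from the dissertation: the function names, the use of NumPy, and in particular the noninformative prior proportional to 1/sigma^2 used for the posterior draw are assumptions made for this example.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit; returns beta_hat, residual sum of squares, (X'X)^{-1}."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    rss = float(np.sum((y - X @ beta_hat) ** 2))
    return beta_hat, rss, XtX_inv

def plug_in_sample(X, y, rng):
    """Plug-in sampling: draw synthetic y from the model with the
    parameters fixed at their point estimates from the original data."""
    n, p = X.shape
    beta_hat, rss, _ = fit_ols(X, y)
    sigma2_hat = rss / (n - p)  # unbiased estimator of the error variance
    return X @ beta_hat + rng.normal(0.0, np.sqrt(sigma2_hat), size=n)

def posterior_predictive_sample(X, y, rng):
    """Posterior predictive sampling: draw (beta, sigma^2) from a posterior,
    then draw synthetic y from the model at that draw.
    Assumption for this sketch: prior proportional to 1/sigma^2, which gives
    sigma^2 | y ~ RSS / chi2(n - p) and
    beta | sigma^2, y ~ N(beta_hat, sigma^2 (X'X)^{-1})."""
    n, p = X.shape
    beta_hat, rss, XtX_inv = fit_ols(X, y)
    sigma2_star = rss / rng.chisquare(n - p)
    beta_star = rng.multivariate_normal(beta_hat, sigma2_star * XtX_inv)
    return X @ beta_star + rng.normal(0.0, np.sqrt(sigma2_star), size=n)

# Usage with simulated "original" data (hypothetical values)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=50)
y_plug_in = plug_in_sample(X, y, rng)
y_post_pred = posterior_predictive_sample(X, y, rng)
```

Calling either function once gives a singly imputed synthetic response vector; calling it several times independently would give multiply imputed synthetic data sets.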