5. Inference from multiple nonparametric synthetic populations
Qi Dong, Michael R. Elliott and Trivellore E. Raghunathan
Assume we generate $L$ synthetic populations, $S_1, \ldots, S_L$, using the nonparametric method described in
Section 4, and that our inferential target is $Q = Q(Y)$, a function of the population data $Y$ (e.g., population mean, correlation,
population maximum likelihood estimator of a regression parameter, etc.). We can compute $Q_l$, $l = 1, \ldots, L$, as the estimate of $Q$ obtained from pooling the synthetic populations that impute the
unobserved units of $Y$. Since these are direct draws from the
posterior predictive distribution of the population, we can compute posterior
means, quantiles, and credible intervals from the corresponding empirical
estimates from the draws, if $L$ is sufficiently large.
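As a minimal sketch of the empirical summaries described above, the following Python code takes a list of synthetic-population estimates and returns their mean, median, and an equal-tailed interval. The function names `quantile` and `empirical_posterior_summary`, and the choice of linear interpolation between order statistics, are assumptions made for illustration; they are not taken from the paper.

```python
def quantile(sorted_values, p):
    """Linear-interpolation quantile of already sorted values, 0 <= p <= 1."""
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def empirical_posterior_summary(draws, level=0.95):
    """Summarize the draws by their empirical mean, median, and
    an equal-tailed interval at the requested level.

    Hypothetical helper: `draws` holds one estimate per synthetic population.
    """
    if not draws:
        raise ValueError("at least one draw is required")
    ordered = sorted(draws)
    tail = (1.0 - level) / 2.0
    return {
        "mean": sum(ordered) / len(ordered),
        "median": quantile(ordered, 0.5),
        "interval": (quantile(ordered, tail), quantile(ordered, 1.0 - tail)),
    }
```

The interval endpoints are only as stable as the tails of the empirical distribution, which is why a sufficiently large number of draws is needed for this approach.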
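However, in
many settings, the computational effort required to impute the population may
be very large, even if the full population is not required to be synthesized. Hence
an alternative approach for inference is to approximate the posterior
predictive distribution of a scalar population statistic $Q$ via a $t$ distribution:
$$Q \mid S_1, \ldots, S_L \sim t_{L-1}\left(\bar{Q}_L, \left(1 + L^{-1}\right) B_L\right),$$
where $\bar{Q}_L = L^{-1} \sum_{l=1}^{L} Q_l$ is the mean of the estimates $Q_l$ computed from the $L$ synthetic populations, and $B_L = (L-1)^{-1} \sum_{l=1}^{L} \left(Q_l - \bar{Q}_L\right)^2$ is the between-imputation variance.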
The result follows immediately from Section 4.1
of Raghunathan et al. (2003), and
is based on the standard Rubin (1987) multiple imputation combining rules,
treating the unobserved units of the population as missing data and the sampled units as
observed data. The average "within" imputation variance is zero, since the
entire population is being synthesized; hence the posterior variance of
the population statistic $Q$ is entirely a function of the between-imputation
variance, and the degrees of freedom is simply given by the number of FPBB
samples. (When the population is extremely large, we need only synthesize a
draw sufficiently large for the average "within" imputation variance to be trivial
relative to the between-imputation variance.) The result assumes that
each synthetic-population estimate is unbiased for the population statistic,
a result guaranteed by our weighted FPBB estimator, as
well as a sufficiently large sample size for Bayesian asymptotics to apply.
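The combining rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the standard form of Rubin's rules with zero within-imputation variance, so the centre is the mean of the $L$ estimates, the scale is the between-imputation variance multiplied by $(1 + 1/L)$, and the reference $t$ distribution has $L - 1$ degrees of freedom. The function names `combine_synthetic_estimates` and `t_interval` are hypothetical, and the critical value is supplied by the caller (for example, from a $t$ table) so that no external library is required.

```python
import math
import statistics


def combine_synthetic_estimates(estimates):
    """Return (mean, variance, df) for the t approximation.

    Assumes zero within-imputation variance, so the variance is
    (1 + 1/L) times the between-imputation variance and df = L - 1.
    """
    L = len(estimates)
    if L < 2:
        raise ValueError("need at least two synthetic populations")
    q_bar = statistics.fmean(estimates)
    b = statistics.variance(estimates)  # sample variance, divisor L - 1
    total_var = (1.0 + 1.0 / L) * b
    return q_bar, total_var, L - 1


def t_interval(q_bar, total_var, t_crit):
    """Symmetric interval q_bar +/- t_crit * sqrt(total_var).

    t_crit is the caller-supplied t critical value for the chosen
    level and degrees of freedom.
    """
    half_width = t_crit * math.sqrt(total_var)
    return q_bar - half_width, q_bar + half_width


# Illustrative numbers only (not results from the paper).
q_bar, total_var, df = combine_synthetic_estimates([10.0, 12.0, 11.0, 13.0, 9.0])
# 2.776 is the 0.975 quantile of a t distribution with 4 degrees of freedom.
lower, upper = t_interval(q_bar, total_var, t_crit=2.776)
```

With only five estimates the interval is wide, since the $t$ critical value at 4 degrees of freedom is considerably larger than the normal value of 1.96.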