Inference for partially synthetic, public use microdata sets - ARCHIVED

Articles and reports: 12-001-X20030026785

Description:

To avoid disclosures, one approach is to release partially synthetic, public use microdata sets. These comprise the units originally surveyed, but some collected values, for example sensitive values at high risk of disclosure or values of key identifiers, are replaced with multiple imputations. Although partially synthetic approaches are currently used to protect public use data, valid methods of inference have not been developed for them. This article presents such methods. They are based on the concepts of multiple imputation for missing data but use different rules for combining point and variance estimates. The combining rules also differ from those for fully synthetic data sets developed by Raghunathan, Reiter and Rubin (2003). The validity of these new rules is illustrated in simulation studies.
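
The abstract does not state the combining rules themselves. As a rough illustration only, the sketch below uses the forms usually quoted in the synthetic-data literature, which should be checked against the article before use: for m synthetic data sets with point estimates q_i and variance estimates u_i, the combined point estimate is the mean of the q_i, and the variance is taken to be u_bar + b/m for partially synthetic data, against u_bar + (1 + 1/m) b for multiply imputed missing data. The function name and the dictionary keys are hypothetical and chosen for this example.

```python
from statistics import mean, variance


def combine_estimates(point_estimates, variance_estimates):
    """Combine estimates from m synthetic (or imputed) data sets.

    Assumed rules, not taken from the abstract itself:
      partially synthetic:  T_p = u_bar + b / m
      missing data:         T_m = u_bar + (1 + 1/m) * b
    """
    m = len(point_estimates)
    if m < 2 or m != len(variance_estimates):
        raise ValueError("need at least two data sets and matching lengths")

    q_bar = mean(point_estimates)     # combined point estimate
    b = variance(point_estimates)     # between-data-set variance (divisor m - 1)
    u_bar = mean(variance_estimates)  # average within-data-set variance

    return {
        "point_estimate": q_bar,
        "variance_partially_synthetic": u_bar + b / m,
        "variance_missing_data": u_bar + (1 + 1 / m) * b,
    }


# Made-up numbers for three synthetic data sets, for illustration only.
result = combine_estimates([10.0, 12.0, 11.0], [4.0, 5.0, 3.0])
```

Under these assumed rules the point estimate is the same in both settings; only the variance formula changes, which is the sense in which the combining rules differ.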

Issue Number: 2003002
Author(s): Reiter, J.P.

Main Product: Survey Methodology

Format: PDF
Release date: January 27, 2004