Combining synthetic data with subsampling to create public use microdata files for large scale surveys

Articles and reports: 12-001-X201200111687

Description:

To create public use files from large scale surveys, statistical agencies sometimes release random subsamples of the original records. Random subsampling reduces file sizes for secondary data analysts and reduces risks of unintended disclosures of survey participants' confidential information. However, subsampling does not eliminate these risks, so the data must still be altered before dissemination. We propose to create disclosure-protected subsamples from large scale surveys based on multiple imputation. The idea is to replace identifying or sensitive values in the original sample with draws from statistical models, and release subsamples of the disclosure-protected data. We present methods for making inferences with the multiple synthetic subsamples.
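The general idea can be illustrated with a minimal sketch. It is not the article's procedure: the toy records, the field names (`id`, `age`, `income`), the linear synthesis model, and the helper functions `fit_line` and `synthetic_subsamples` are all assumptions made for illustration. The sketch also holds the model parameters fixed at their estimates rather than drawing them from a posterior distribution, draws each subsample by simple random sampling, and combines results only as an average of the per-subsample estimates. The article's variance estimators are not reproduced here.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, (sse / (len(xs) - 2)) ** 0.5


def synthetic_subsamples(records, m, n_sub, rng):
    """Return m subsamples in which 'income' is replaced by model draws.

    Assumption: 'income' is the sensitive variable and 'age' is a
    non-sensitive predictor that is released unchanged.
    """
    ages = [r["age"] for r in records]
    incomes = [r["income"] for r in records]
    a, b, sd = fit_line(ages, incomes)
    released = []
    for _ in range(m):
        # Replace every sensitive value with a draw from the fitted model.
        protected = [
            {**r, "income": rng.gauss(a + b * r["age"], sd)} for r in records
        ]
        # Release only a simple random subsample of the protected file.
        released.append(rng.sample(protected, n_sub))
    return released


# Toy "original survey" (entirely hypothetical data).
rng = random.Random(2012)
records = [{"id": i, "age": rng.randint(20, 65)} for i in range(2000)]
for r in records:
    r["income"] = 20000 + 500 * r["age"] + rng.gauss(0, 5000)

subsamples = synthetic_subsamples(records, m=5, n_sub=200, rng=rng)

# Point estimate of mean income: average of the m subsample means.
estimate = statistics.fmean(
    [statistics.fmean([r["income"] for r in s]) for s in subsamples]
)
print(round(estimate, 2))
```

Each released file here is both smaller than the original and free of the original income values, while the non-sensitive `age` values are unchanged. A real application would need a synthesis model suited to the survey variables and a variance estimator that accounts for both the synthesis and the subsampling.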

Issue Number: 2012001

Author(s): Reiter, J.P.

Main Product: Survey Methodology

Format	Release date	More information
PDF	June 27, 2012

Related information

Subjects and keywords

Subjects

Statistical methods
- Disclosure control and data dissemination
- Editing and imputation

Date modified: 2024-10-06
