Survey Methodology
Combining data from surveys and related sources

by Dexter Cahoy and Joseph Sedransk

  • Release date: June 30, 2023

Abstract

To improve the precision of inferences and reduce costs, there is considerable interest in combining data from several sources, such as sample surveys and administrative data. Appropriate methodology is required to ensure satisfactory inferences, since the target populations and the methods for acquiring data may differ considerably across sources. To provide improved inferences, we use methodology that has a more general structure than the methods in current practice. We start with the case where the analyst has only summary statistics from each of the sources. In our primary method, uncertain pooling, it is assumed that the analyst can regard one source, survey r, as the single best choice for inference. This method starts with the data from survey r and adds data from those other sources that are shown to form clusters that include survey r. We also consider Dirichlet process mixtures, one of the most popular nonparametric Bayesian methods. We use analytical expressions and the results of numerical studies to show properties of the methodology.

Key Words: Administrative data; Bayesian methods; Clustering; Dirichlet process mixture; Pooling data; Survey sampling.
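
The pooling idea in the abstract can be illustrated with a small sketch. It is not the authors' model: it assumes that each source supplies only a sample mean with a known sampling variance, that the means are normally distributed, that every cluster mean has the same normal prior, and that all partitions of the sources are equally likely a priori. All function and parameter names (set_partitions, block_posterior, uncertain_pooling_sketch, prior_mean, prior_var) are hypothetical. The sketch averages the posterior mean for survey r over all partitions, weighting each partition by its marginal likelihood, and reports how probable it is that each source falls in the same cluster as survey r.

```python
import math


def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (lists of indices)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Either `first` forms its own block ...
        yield [[first]] + partition
        # ... or it joins one of the existing blocks.
        for i, block in enumerate(partition):
            yield partition[:i] + [[first] + block] + partition[i + 1:]


def block_posterior(means, variances, prior_mean, prior_var):
    """Log marginal likelihood and posterior (mean, variance) for one cluster.

    Assumed model: y_i ~ N(mu, v_i) independently, mu ~ N(prior_mean, prior_var).
    The marginal is built up as a product of one-step-ahead predictive densities.
    """
    log_marginal, m, t = 0.0, prior_mean, prior_var
    for y, v in zip(means, variances):
        pred_var = t + v
        log_marginal += -0.5 * (math.log(2 * math.pi * pred_var)
                                + (y - m) ** 2 / pred_var)
        precision = 1.0 / t + 1.0 / v
        m = (m / t + y / v) / precision
        t = 1.0 / precision
    return log_marginal, m, t


def uncertain_pooling_sketch(means, variances, r, prior_mean, prior_var):
    """Average over all partitions; return the posterior mean for source r
    and, for each source, the probability that it is clustered with r."""
    n = len(means)
    log_weights, r_means, r_blocks = [], [], []
    for partition in set_partitions(list(range(n))):
        log_w = 0.0
        for block in partition:
            lm, m, _ = block_posterior([means[i] for i in block],
                                       [variances[i] for i in block],
                                       prior_mean, prior_var)
            log_w += lm
            if r in block:
                r_means.append(m)
                r_blocks.append(set(block))
        log_weights.append(log_w)  # uniform prior over partitions (assumption)
    top = max(log_weights)
    weights = [math.exp(lw - top) for lw in log_weights]
    total = sum(weights)
    weights = [w / total for w in weights]
    post_mean = sum(w * m for w, m in zip(weights, r_means))
    pool_prob = [sum(w for w, b in zip(weights, r_blocks) if i in b)
                 for i in range(n)]
    return post_mean, pool_prob


# Illustrative inputs only: three sources, with source 0 playing the role of survey r.
post_mean, pool_prob = uncertain_pooling_sketch(
    [10.0, 10.1, 25.0], [1.0, 1.0, 1.0], r=0, prior_mean=15.0, prior_var=100.0)
print(post_mean, pool_prob)
```

Enumerating all partitions is feasible only for a handful of sources, since the number of partitions grows very quickly; the sketch is meant to show the structure of the averaging, not a practical implementation.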

How to cite

Cahoy, D., and Sedransk, J. (2023). Combining data from surveys and related sources. Survey Methodology, Statistics Canada, Catalogue No. 12-001-X, Vol. 49, No. 1. Paper available at http://www.statcan.gc.ca/pub/12-001-x/2023001/article/00003-eng.htm.
