Multiple-frame surveys for a multiple-data-source world
Section 1. Introduction

Throughout his 33-year career at the Census Bureau and subsequent 32-year career at Westat, Joe Waksberg repeatedly relied on multiple data sources to improve the quality of estimates while reducing costs. He used external data sources to evaluate coverage in the U.S. decennial census (Marks and Waksberg, 1966; Waksberg and Pritzker, 1969), to calibrate survey weights, and to improve efficiency or oversample rare populations when designing surveys (Hendricks, Igra and Waksberg, 1980; Cohen, DiGaetano and Waksberg, 1988; DiGaetano, Judkins and Waksberg, 1995; Waksberg, 1995; Waksberg, Judkins and Massey, 1997b).

On several occasions, Waksberg integrated data from two or more surveys directly in order to improve coverage or to obtain larger sample sizes for subpopulations (Waksberg, 1986; Burke, Mohadjer, Green, Waksberg, Kirsch and Kolstad, 1994; Waksberg, Brick, Shapiro, Flores-Cervantes and Bell, 1997a). In these multiple-frame surveys, independent samples were selected from sampling frames that together were thought to cover all, or almost all, of the target population. The data from the samples were combined to obtain estimates for the population as a whole and for subpopulations of interest. Waksberg approached the design of these multiple-frame surveys from the perspective of controlling both sampling and nonsampling errors, and found that using multiple frames met the challenges of producing reliable estimates in the face of increased data collection costs (with higher nonresponse for less expensive collection methods) and incomplete frame coverage.

Statistical agencies and survey organizations today face the same types of challenges that Waksberg addressed (declining response rates and increasing costs of survey data collection), but at an intensified level. At the same time, the emergence of new data sources provides opportunities for obtaining information about parts of populations of interest, sometimes with amazing rapidity. Many organizations are now using or researching methods for integrating data from multiple sources to improve the accuracy or timeliness of population estimates.

I feel tremendously honored to be asked to give the Waksberg lecture, and in this paper I want to build on Waksberg’s insights about multiple-frame surveys by discussing their use as an organizing principle for combining information from multiple sources. Traditionally, multiple-frame surveys have integrated data from $Q$ probability samples $S_{1}, \dots, S_{Q}$ that are selected independently from $Q$ frames. But the general structure can be expanded to include frames that consist of administrative records or nonprobability samples. The structure can also be expanded to situations in which some data sources do not measure the variables of interest $y$ but do measure covariates $x$ that can be used to predict $y$.
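To make the traditional structure concrete, the following is a minimal sketch for the simplest case, $Q = 2$. This section does not specify an estimator, so the sketch assumes the classical compositing approach usually attributed to Hartley (1962): units that belong to both frames are counted with factor `theta` when they appear in the first sample and `1 - theta` when they appear in the second. The names `SampledUnit` and `dual_frame_total`, the fixed value of `theta`, and the toy data are all hypothetical and are used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SampledUnit:
    y: float           # variable of interest
    weight: float      # design weight within the unit's own sample
    in_frame_a: bool   # True if the unit is listed in frame A
    in_frame_b: bool   # True if the unit is listed in frame B


def dual_frame_total(sample_a, sample_b, theta=0.5):
    """Composite estimate of a population total from two independent samples.

    Assumes every sampled unit's frame membership is known, so that units in
    the overlap of the two frames can be identified in both samples.
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    total = 0.0
    for unit in sample_a:
        # Overlap units from sample A are down-weighted by theta.
        factor = theta if unit.in_frame_b else 1.0
        total += factor * unit.weight * unit.y
    for unit in sample_b:
        # Overlap units from sample B are down-weighted by 1 - theta.
        factor = (1.0 - theta) if unit.in_frame_a else 1.0
        total += factor * unit.weight * unit.y
    return total


# Hypothetical toy data: one frame-only unit and one overlap unit per sample.
sample_a = [
    SampledUnit(y=2.0, weight=10.0, in_frame_a=True, in_frame_b=False),
    SampledUnit(y=4.0, weight=10.0, in_frame_a=True, in_frame_b=True),
]
sample_b = [
    SampledUnit(y=6.0, weight=5.0, in_frame_a=True, in_frame_b=True),
    SampledUnit(y=1.0, weight=5.0, in_frame_a=False, in_frame_b=True),
]

estimate = dual_frame_total(sample_a, sample_b, theta=0.5)
```

Setting `theta` to 1 or 0 takes the overlap estimate entirely from one sample or the other. In practice, the compositing factor is typically chosen with the relative precision of the two samples in mind rather than fixed in advance.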

A number of authors have reviewed methods for combining data from multiple sources; see, for example, Citro (2014), Lohr and Raghunathan (2017), National Academies of Sciences, Engineering, and Medicine (2017, 2018), Thompson (2019), Zhang and Chambers (2019), Beaumont (2020), Yang and Kim (2020), and Rao (2021). The sources include traditional probability samples, administrative data sets, sensor data, social network data, and general convenience samples.

Although the types of data (and the speed with which some types of data can be collected) have changed in recent years, the basic structure of the problem for combining data sources is unchanged from the earliest dual-frame surveys. Section 2 discusses the structure and assumptions for traditional multiple-frame surveys through the example of the National Survey of America’s Families, a dual-frame survey that Waksberg worked on during the 1990s. Section 3 reviews methods for calculating estimates of population characteristics from traditional multiple-frame surveys where all assumptions are met, including the special case in which one sample is a census of a subset of the population. Section 4 then discusses how the multiple-frame structure incorporates many of the methods currently used for combining data, sometimes with relaxed assumptions. Section 5 addresses issues for designing data collection systems that control sampling and nonsampling errors, with a discussion of possible future directions for research.

ISSN: 1492-0921

Editorial policy

Survey Methodology publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves. All papers will be refereed. However, the authors retain full responsibility for the contents of their papers and opinions expressed are not necessarily those of the Editorial Board or of Statistics Canada.

Submission of Manuscripts

Survey Methodology is published twice a year in electronic format. Authors are invited to submit their articles in English or French in electronic form, preferably in Word, to the Editor (statcan.smj-rte.statcan@canada.ca, Statistics Canada, 150 Tunney’s Pasture Driveway, Ottawa, Ontario, Canada, K1A 0T6). For formatting instructions, please see the guidelines provided in the journal and on the web site (www.statcan.gc.ca/SurveyMethodology).

Note of appreciation

Canada owes the success of its statistical system to a long-standing partnership between Statistics Canada, the citizens of Canada, its businesses, governments and other institutions. Accurate and timely statistical information could not be produced without their continued co-operation and goodwill.

Standards of service to the public

Statistics Canada is committed to serving its clients in a prompt, reliable and courteous manner. To this end, the Agency has developed standards of service which its employees observe in serving its clients.

Copyright

Published by authority of the Minister responsible for Statistics Canada.

Use of this publication is governed by the Statistics Canada Open Licence Agreement.

Catalogue No. 12-001-X

Frequency: Semi-annual

Ottawa

Date modified: 2022-01-06
