Multiple-frame surveys for a multiple-data-source world
Section 1. Introduction
Throughout his 33-year career at the Census Bureau and
subsequent 32-year career at Westat, Joe Waksberg repeatedly relied on multiple
data sources to improve the quality of estimates while reducing costs. He used
external data sources to evaluate coverage in the U.S. decennial census (Marks and Waksberg, 1966; Waksberg and
Pritzker, 1969), to calibrate survey
weights, and to improve efficiency or oversample rare populations when
designing surveys (Hendricks, Igra and Waksberg, 1980; Cohen,
DiGaetano and Waksberg, 1988; DiGaetano, Judkins and Waksberg, 1995; Waksberg, 1995; Waksberg, Judkins and Massey, 1997b).
On several occasions, Waksberg integrated data from
two or more surveys directly in order to improve coverage or to obtain larger
sample sizes for subpopulations (Waksberg,
1986; Burke, Mohadjer, Green, Waksberg,
Kirsch and Kolstad, 1994; Waksberg, Brick, Shapiro, Flores-Cervantes and Bell, 1997a). In these multiple-frame surveys, independent samples were selected from
sampling frames that together were thought to cover all, or almost all, of the
target population. The data from the samples were combined to obtain estimates
for the population as a whole and for subpopulations of interest. Waksberg
approached the design of these multiple-frame surveys from the perspective of
controlling both sampling and nonsampling errors, and found that using multiple
frames met the challenges of producing reliable estimates in the face of
increased data collection costs (with higher nonresponse for less expensive
collection methods) and incomplete frame coverage.
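As a rough illustration of how data from two independently selected samples can be combined, the following Python sketch computes a simple compositing estimate of a population total for a dual-frame design. It is a minimal sketch under stated assumptions, not the estimator used in any of the surveys cited above: the names `Unit`, `dual_frame_total`, and `theta`, the fixed compositing factor of 0.5, and all data values are hypothetical, and each sampled unit is assumed to be correctly classified as belonging to one frame only or to the overlap of both frames.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    y: float          # variable of interest (hypothetical values)
    weight: float     # design weight, the inverse of the inclusion probability
    in_overlap: bool  # True if the unit belongs to both frames


def dual_frame_total(sample_a, sample_b, theta=0.5):
    """Estimate a population total from two independent samples.

    Units that belong to only one frame keep their full design weight.
    Units in the overlap are represented by both samples, so their
    weighted values are scaled by theta (sample A) and 1 - theta (sample B)
    to avoid counting the overlap twice.
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    total = 0.0
    for u in sample_a:
        total += u.weight * u.y * (theta if u.in_overlap else 1.0)
    for u in sample_b:
        total += u.weight * u.y * ((1.0 - theta) if u.in_overlap else 1.0)
    return total


# Made-up samples: one unit in each sample lies in the overlap of the frames.
sample_a = [Unit(y=2.0, weight=100.0, in_overlap=False),
            Unit(y=4.0, weight=100.0, in_overlap=True)]
sample_b = [Unit(y=3.0, weight=50.0, in_overlap=True),
            Unit(y=5.0, weight=50.0, in_overlap=False)]

estimate = dual_frame_total(sample_a, sample_b, theta=0.5)
print(estimate)
```

Setting `theta` to 1 or 0 corresponds to estimating the overlap from one sample alone; intermediate values average the two overlap estimates.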
Statistical agencies and survey organizations today
face the same types of challenges that Waksberg addressed (declining response
rates and increasing costs of survey data collection) but at an intensified
level. At the same time, the emergence of new data sources provides
opportunities for obtaining information about parts of populations of interest,
sometimes with amazing rapidity. Many
organizations are now using or researching methods for integrating data from
multiple sources to improve the accuracy or timeliness of population estimates.
I feel tremendously honored to be asked to give the
Waksberg lecture, and in this paper I want to build on Waksberg’s insights
about multiple-frame surveys by discussing their use as an organizing principle
for combining information from multiple sources. Traditionally, multiple-frame
surveys have integrated data from probability samples that are selected
independently from multiple frames. But the
general structure can be expanded to include frames that consist of
administrative records or nonprobability samples. The structure can also be
expanded to situations in which some data sources do not measure the variables
of interest but do measure covariates that can be used to predict them.
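The last situation, in which one source supplies only covariates, can be pictured with a small Python sketch. It assumes a hypothetical survey sample that measures both a covariate and the variable of interest, and a hypothetical set of records that measures the covariate alone; a straight line fitted to the survey data is used to predict the missing values for the records. The function name `fit_line`, the unweighted least-squares fit, and all data values are assumptions made for illustration, and a real application would need to account for survey weights and model uncertainty.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical survey sample: both the covariate x and the variable y are observed.
survey_x = [1.0, 2.0, 3.0, 4.0]
survey_y = [3.0, 5.0, 7.0, 9.0]

# Hypothetical second source: only the covariate x is observed.
records_x = [2.5, 5.0, 0.0]

intercept, slope = fit_line(survey_x, survey_y)

# Predict the unobserved variable of interest for each record, then sum.
predicted_y = [intercept + slope * x for x in records_x]
predicted_total = sum(predicted_y)
print(intercept, slope, predicted_total)
```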
A number of authors have reviewed methods for
combining data from multiple sources; see, for example, Citro (2014), Lohr and Raghunathan (2017),
National Academies of Sciences, Engineering, and Medicine (2017, 2018),
Thompson (2019), Zhang and Chambers (2019), Beaumont (2020), Yang and Kim
(2020), and Rao (2021). The sources
include traditional probability samples, administrative data sets, sensor data,
social network data, and general convenience samples.
Although the types of data (and the speed with which
some types of data can be collected) have changed in recent years, the basic
structure of the problem for combining data sources is unchanged from the
earliest dual-frame surveys. Section 2 discusses the structure and assumptions
for traditional multiple-frame surveys through the example of the National
Survey of America’s Families, a dual-frame survey that Waksberg worked on
during the 1990s. Section 3 reviews methods for calculating estimates of
population characteristics from traditional multiple-frame surveys where all
assumptions are met, including the special case in which one sample is a census
of a subset of the population. Section 4 then discusses how the
multiple-frame structure incorporates many of the methods currently used for
combining data, sometimes with relaxed assumptions. Section 5 addresses
issues for designing data collection systems that control sampling and
nonsampling errors, with a discussion of possible future directions for
research.
ISSN : 1492-0921
Copyright
Published by authority of the Minister responsible for Statistics Canada.
© His Majesty the King in Right of Canada as represented by the Minister of Industry, 2022
Use of this publication is governed by the Statistics Canada Open Licence Agreement.
Catalogue No. 12-001-X
Frequency: Semi-annual
Ottawa