Multiple-frame surveys for a multiple-data-source world
Section 5. Design of data collection systems

Section 4 discussed how estimators for integrated data can be thought of within a multiple-frame survey structure. This structure can also be used when designing data collection systems that make use of multiple sources. Hartley (1962) derived the values of $n^{(1)}$, $n^{(2)}$, and $\theta$ that minimize the variance of $\hat{Y}(\theta)$ in (3.2) for a fixed cost when $S_{1}$ and $S_{2}$ are both simple random samples. His basic method can be extended to explore the effects of sample design choices in other situations by considering mean squared errors under a range of potential bias assumptions.
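A minimal Python sketch of that extension follows, under stated assumptions. It assumes that (3.2) has the form $\hat{Y}(\theta) = \hat{Y}^{(1)}_{\{1\}} + \theta \hat{Y}^{(1)}_{\{1, 2\}} + (1 - \theta)\hat{Y}^{(2)}_{\{1, 2\}} + \hat{Y}^{(2)}_{\{2\}}$ (consistent with the description of $\theta = 0$ and $\theta = 1$ given for Figure 5.1 below), that $S_{1}$ and $S_{2}$ are selected independently, and that only $\hat{Y}^{(2)}_{\{1, 2\}}$ carries a fixed bias $b$. For fixed sample sizes, setting the derivative of the mean squared error with respect to $\theta$ to zero gives

$$\theta^{*} = \frac{V(\hat{Y}^{(2)}_{\{1,2\}}) + b^{2} + \mathrm{Cov}(\hat{Y}^{(2)}_{\{2\}}, \hat{Y}^{(2)}_{\{1,2\}}) - \mathrm{Cov}(\hat{Y}^{(1)}_{\{1\}}, \hat{Y}^{(1)}_{\{1,2\}})}{V(\hat{Y}^{(1)}_{\{1,2\}}) + V(\hat{Y}^{(2)}_{\{1,2\}}) + b^{2}},$$

which reduces to the variance-minimizing value when $b = 0$. The names `DomainEstimatorMoments`, `mse`, and `optimal_theta` are hypothetical, the variances and covariances are inputs that would have to come from the design, and the sketch does not optimize $n^{(1)}$ and $n^{(2)}$.

```python
from dataclasses import dataclass


@dataclass
class DomainEstimatorMoments:
    """Hypothetical container for design moments of the domain-total estimators."""
    var_1: float      # V(Yhat^(1)_{1}): domain {1} total from S1
    var_12_s1: float  # V(Yhat^(1)_{1,2}): overlap total from S1
    cov_s1: float     # Cov(Yhat^(1)_{1}, Yhat^(1)_{1,2}), both from S1
    var_2: float      # V(Yhat^(2)_{2}): domain {2} total from S2
    var_12_s2: float  # V(Yhat^(2)_{1,2}): overlap total from S2
    cov_s2: float     # Cov(Yhat^(2)_{2}, Yhat^(2)_{1,2}), both from S2
    bias_12_s2: float = 0.0  # assumed fixed bias of Yhat^(2)_{1,2}


def mse(theta: float, m: DomainEstimatorMoments) -> float:
    """MSE of Yhat(theta), assuming S1 and S2 are selected independently."""
    s1_part = m.var_1 + 2 * theta * m.cov_s1 + theta**2 * m.var_12_s1
    s2_part = m.var_2 + 2 * (1 - theta) * m.cov_s2 + (1 - theta) ** 2 * m.var_12_s2
    return s1_part + s2_part + ((1 - theta) * m.bias_12_s2) ** 2


def optimal_theta(m: DomainEstimatorMoments) -> float:
    """Value of theta in [0, 1] that minimizes mse(theta, m)."""
    b2 = m.bias_12_s2**2
    numerator = m.var_12_s2 + b2 + m.cov_s2 - m.cov_s1
    denominator = m.var_12_s1 + m.var_12_s2 + b2  # assumed > 0
    # mse() is a convex quadratic in theta, so clipping the stationary point
    # to [0, 1] gives the constrained minimizer.
    return min(1.0, max(0.0, numerator / denominator))


moments = DomainEstimatorMoments(var_1=4.0, var_12_s1=1.0, cov_s1=-0.5,
                                 var_2=2.0, var_12_s2=1.0, cov_s2=-0.3)
print(optimal_theta(moments))  # about 0.6
```

Adding a bias of 1.0 (in the same units as the totals) to the same moments raises $\theta^{*}$ from 0.6 to about 0.73, shifting weight away from the potentially biased source.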

There has been a substantial amount of work on optimal design and on the effects of nonresponse in dual-frame cellular/landline telephone surveys. Brick, Dipko, Presser, Tucker and Yuan (2006) and Brick et al. (2011) investigated nonsampling errors; Lu, Sahr, Iachan, Denker, Duffy and Weston (2013) performed a simulation study to calculate the anticipated mean squared error under various cost models and potential biases. Lohr and Brick (2014), studying the allocation of resources in dual-frame telephone surveys with nonresponse, found that for some cost structures a screening survey, in which respondents with landlines are screened out of the cell phone sample, was more cost-efficient than an overlap survey. Levine and Harter (2015) presented graphical results to provide allocation guidance, taking into account the variance inflation from weight variation. Chen, Stubblefield and Stoner (2021) considered the design problem of oversampling minority populations in dual-frame telephone surveys, using optimal allocation methods from stratified sampling. Most of these articles focus on minimizing the variance of estimates for a fixed cost and do not consider the effects of potential bias.

A number of papers in the 1980s studied error structures and designs for dual-frame surveys, typically designs that supplemented a sample from an RDD frame with a sample from an area frame assumed to have full coverage. Biemer (1984) and Choudhry (1989) explored optimal designs theoretically and through simulation studies. Groves and Lepkowski (1985, 1986) and Traugott, Groves and Lepkowski (1987) investigated dual-frame designs with a view to minimizing mean squared error when estimates from the RDD frame may be biased. Lepkowski and Groves (1986) found that as the bias in the RDD sample increased, its optimal allocation decreased, reaching zero when the bias was 9 percent of the anticipated estimated percentage.

A small amount of bias can have a similar effect in the situation considered in Section 3.4, where a census is taken from the incomplete Frame 2 and a high-quality probability sample is taken from the complete Frame 1. The plots in Figure 5.1 show the root mean squared error (RMSE) of an estimated proportion when $S_{1}$ is a simple random sample of size $n$ and $S_{2}$ is a census of domain $\{1, 2\}$, for combinations of overlap size $N_{\{1, 2\}}/N \in \{0.25, 0.5, 0.9\}$ and bias in $\{0, 0.01, 0.03\}$. The population proportion is 0.2 in domain $\{1\}$ and 0.3 in domain $\{1, 2\}$, and the overall population proportion is estimated by $\hat{Y}(\theta)/N$ for $\hat{Y}(\theta)$ in (3.2). The lines show the RMSE as a function of $n$ for $\theta = 1$ ($S_{2}$ is not used at all), $\theta = 0$ (the estimated proportion in domain $\{1, 2\}$ comes from $S_{2}$, and $S_{1}$ contributes only to estimating the proportion in domain $\{1\}$), and $\theta = 1/2$. In the bottom row of plots, the bias from $S_{2}$ begins to dominate the RMSE even for relatively small sample sizes from $S_{1}$. A small amount of measurement bias can cancel the supposed advantage of data integration. This example assumes that the error in $S_{2}$ comes from measurement bias, but it is similar in spirit to the example in Meng (2018), which shows that even when the selection bias from a convenience sample is small, a simple random sample of size 400 may contain more useful information than a convenience sample of size 500 million.
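The RMSE curves described above can be approximated with a few lines of Python. This is a sketch under assumptions that may differ from those used to draw Figure 5.1: the finite population correction is ignored, domain membership is known without error, $\hat{Y}(\theta)$ is assumed to weight the $S_{1}$ estimate for domain $\{1, 2\}$ by $\theta$ and the $S_{2}$ value by $1 - \theta$ (as the description of $\theta = 0$ and $\theta = 1$ indicates), and the bias is an additive measurement bias $b$ in the proportion that $S_{2}$ records for domain $\{1, 2\}$. The $S_{1}$ part of $\hat{Y}(\theta)/N$ is then the sample mean of $z_i = y_i(\mathbb{1}[i \in \{1\}] + \theta\,\mathbb{1}[i \in \{1, 2\}])$, and the census contributes only the bias $(1 - \theta)(N_{\{1,2\}}/N)\,b$. The function names are hypothetical.

```python
import math


def rmse_proportion(n, theta, overlap, bias, p1=0.2, p12=0.3):
    """Approximate RMSE of the estimated population proportion Yhat(theta)/N.

    Sketch assumptions: S1 is an SRS of size n (fpc ignored), S2 is a census of
    domain {1,2} whose recorded proportion is p12 + bias, overlap = N_{1,2}/N.
    """
    w = overlap
    # S1 contributes the sample mean of z_i = y_i * (1[i in {1}] + theta * 1[i in {1,2}]).
    mean_z = (1 - w) * p1 + theta * w * p12
    # y is 0/1 and the domains are disjoint, so z_i**2 = y_i * (1[i in {1}] + theta**2 * 1[i in {1,2}]).
    mean_z_sq = (1 - w) * p1 + theta**2 * w * p12
    variance = (mean_z_sq - mean_z**2) / n
    # The census adds no sampling variance; its bias enters with weight (1 - theta) * w.
    bias_term = (1 - theta) * w * bias
    return math.sqrt(variance + bias_term**2)


def crossover_n(overlap, bias, n_max=10_000):
    """Smallest n for which theta = 1 (S2 ignored) has lower RMSE than theta = 0, else None."""
    for n in range(1, n_max + 1):
        if rmse_proportion(n, 1.0, overlap, bias) < rmse_proportion(n, 0.0, overlap, bias):
            return n
    return None


for overlap in (0.25, 0.5, 0.9):
    for bias in (0.0, 0.01, 0.03):
        curve = [round(rmse_proportion(n, 0.5, overlap, bias), 4) for n in (100, 1000, 10_000)]
        print(overlap, bias, crossover_n(overlap, bias), curve)
```

Under these assumptions, with $N_{\{1,2\}}/N = 0.9$ and bias 0.03, ignoring $S_{2}$ entirely ($\theta = 1$) already gives the smaller RMSE once $n$ reaches 256, whereas with no bias $\theta = 0$ has the smaller RMSE for every $n$.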

As Thompson (2019) noted, many of the methods that have been developed for combining data from multiple sources are situation-specific, with solutions tailored to the particular circumstances of each problem. One would not expect these methods to perform as well, on average, in other situations because of regression-to-the-mean effects. Before adopting a data combination method, it may be desirable to perform additional simulation studies that examine outcomes when the model assumptions are not met.

Lohr and Raghunathan (2017) discussed issues in designing data collection systems that leverage multiple data sources, focusing on the situation in which a probability survey is used in conjunction with administrative data sources that cover parts of the population. They considered using administrative data sources for (1) improving the frame for the probability sample, (2) providing contextual information for interpreting the survey data, (3) providing information for nonresponse follow-up and bias assessment, and (4) designing the entire data collection system to take advantage of the inexpensive data collection afforded by some of the frames while obtaining complete coverage from the probability survey. Thinking of the design problem in the multiple-frame paradigm can be helpful for the last point. Lohr and Raghunathan (2017) suggested that when Frame 1 is complete but expensive to sample, while Frame 2 is incomplete but less expensive to sample (this includes the situation considered in Section 3.4 of this paper), it may be desirable to use a two-phase screening survey for the sample from Frame 1 and to rely on the sample from Frame 2 to supply information for domain $\{1, 2\}$. That is the strategy that Waksberg and colleagues followed in designing the NSAF.

Figure 5.1 Root mean squared error of estimated population proportion under differing amounts of overlap and bias

Description for Figure 5.1

Figure illustrating the root mean squared error of the estimated population proportion under differing amounts of overlap (0.25, 0.5 and 0.9 from left to right, respectively) and bias (0, 0.01 and 0.03 from top to bottom, respectively) when the first sample is a simple random sample and the second sample is a census of domain {1, 2}. In each graph, the lines show the RMSE for theta equal to 0, 0.5 and 1. In the bottom row of plots, the bias from the second sample begins to dominate the RMSE even for relatively small sample sizes from the first sample.

When there may be measurement error or domain misclassification, however, a more robust design may be preferred. Optimal designs for dual-frame surveys allocate resources so as to minimize the variance of estimated population totals of interest for a fixed cost. Designs that are optimal under Assumptions (A1) to (A6) are not necessarily optimal when some of those assumptions are violated. The multiple-frame structure allows one to examine how a design would perform when the assumptions are relaxed.

Hartley (1962) showed that a dual-frame survey can yield substantial improvements in efficiency for the situations in Figure 2.2(a, b) when data can be obtained inexpensively from Frame 2 and $N_{\{1, 2\}}/N$ is large. But when Frame 1 is complete and either the costs are comparable or $N_{\{1, 2\}}/N$ is small, the extra complexity of a dual-frame survey may outweigh its advantages. If, in addition, domain misclassification is likely or $y$ is measured differently across the surveys, a dual-frame survey will be more complicated than a single sample from Frame 1 and may produce biased estimates.
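To see how cost and bias interact in this comparison, the following Python sketch runs a brute-force grid search (not Hartley's closed-form solution) over $n^{(1)}$, $n^{(2)}$, and $\theta$ under a linear cost constraint $c_1 n^{(1)} + c_2 n^{(2)} \le C$. Everything specific here is an assumption chosen for illustration: Frame 1 is complete, Frame 2 covers domain $\{1, 2\}$ with population share $w$, both samples are simple random samples with the finite population correction ignored, $S_{2}$ measures the domain $\{1, 2\}$ proportion with an additive bias, and the budget, unit costs, and proportions (reused from the Figure 5.1 setting) are hypothetical, as are the function names.

```python
import math


def mse_two_samples(n1, n2, theta, w, p1, p12, bias=0.0):
    """MSE of Yhat(theta)/N: S1 an SRS of n1 from complete Frame 1, S2 an SRS of n2
    from Frame 2 (domain {1,2}, population share w); fpc ignored."""
    if theta < 1.0 and n2 == 0:
        return math.inf  # S2 is needed for the overlap but has no sample
    mean_z = (1 - w) * p1 + theta * w * p12
    var_s1 = ((1 - w) * p1 + theta**2 * w * p12 - mean_z**2) / n1
    # S2 estimates the domain {1,2} proportion, which enters with weight (1 - theta) * w.
    var_s2 = 0.0 if theta == 1.0 else ((1 - theta) * w) ** 2 * p12 * (1 - p12) / n2
    return var_s1 + var_s2 + ((1 - theta) * w * bias) ** 2


def best_design(budget, c1, c2, w, p1, p12, bias=0.0, steps=20):
    """Grid search for (mse, n1, n2, theta) subject to c1*n1 + c2*n2 <= budget."""
    best = None
    for n1 in range(1, int(budget // c1) + 1):
        n2 = int((budget - c1 * n1) // c2)  # spend what is left on S2
        for k in range(steps + 1):
            theta = k / steps
            value = mse_two_samples(n1, n2, theta, w, p1, p12, bias)
            if best is None or value < best[0]:
                best = (value, n1, n2, theta)
    return best


# Hypothetical inputs: budget 10,000; Frame 1 units cost 10, Frame 2 units cost 1.
BUDGET, C1, C2 = 10_000, 10, 1
W, P1, P12 = 0.9, 0.2, 0.3
single = mse_two_samples(BUDGET // C1, 0, 1.0, W, P1, P12)  # Frame 1 only
for bias in (0.0, 0.03):
    value, n1, n2, theta = best_design(BUDGET, C1, C2, W, P1, P12, bias)
    print(f"bias={bias}: n1={n1}, n2={n2}, theta={theta}, RMSE ratio={math.sqrt(value / single):.3f}")
```

With these hypothetical inputs, the best dual-frame design has an RMSE roughly 40 percent below that of the single-frame design when there is no bias; with a bias of 0.03, the search moves $\theta$ well toward 1 and the gain shrinks to about 5 percent.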

On the other hand, using multiple data sources can also help assess nonsampling errors. Hartley (1974) wrote that when he presented his work on multiple-frame surveys at a conference, a discussant suggested that a “fairer” comparison would be to compare the variance from a dual-frame sample with that from a single sample of the same cost from the incomplete but cheap frame. Hartley responded (page 107): “The difficulty about this is, of course, that the bias through incompleteness may be of a magnitude which would make the single frame survey useless. If no a priori information on this bias is available, the two frame survey can in fact be regarded as an economical method of measuring this bias and eliminating it.”
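Hartley's point can be illustrated with a small Python sketch. Assuming, as in Section 3.4, that Frame 1 is complete (so Frame 2 coincides with domain $\{1, 2\}$) and that each unit in $S_{1}$ has its Frame 2 membership recorded without error, the $S_{1}$ data alone estimate the bias of a survey that uses only Frame 2: the domain $\{1, 2\}$ mean minus the overall mean. Where $S_{2}$ also observes domain $\{1, 2\}$, comparing the two estimates for that domain gives a rough check on measurement differences. The function names, data layout, and toy data are hypothetical, and the standard error treats the two samples as independent simple random samples, conditioning on the number of $S_{1}$ units that fall in Frame 2.

```python
import math
from statistics import fmean, variance


def coverage_bias(s1):
    """Estimated bias of a Frame-2-only estimate of the population mean.

    s1 is a list of (y, in_frame2) pairs from an SRS of the complete Frame 1,
    with Frame 2 membership assumed to be recorded without error.
    """
    overall = fmean(y for y, _ in s1)
    covered = fmean(y for y, in_f2 in s1 if in_f2)
    return covered - overall


def overlap_difference(s1, s2_values):
    """S2 mean minus the S1 mean for domain {1,2}, with a rough standard error
    that treats the two samples as independent SRSs."""
    s1_overlap = [y for y, in_f2 in s1 if in_f2]
    diff = fmean(s2_values) - fmean(s1_overlap)
    se = math.sqrt(variance(s1_overlap) / len(s1_overlap)
                   + variance(s2_values) / len(s2_values))
    return diff, se


# Toy data (hypothetical): y is a 0/1 outcome.
s1 = [(1, True), (0, True), (1, True), (0, False), (0, False), (1, False), (0, False), (0, False)]
s2 = [1, 1, 0, 1]
print(coverage_bias(s1))           # 2/3 - 3/8, about 0.29
print(overlap_difference(s1, s2))  # difference about 0.083, standard error about 0.42
```

In practice the standard error would come from the actual designs of $S_{1}$ and $S_{2}$; the point of the sketch is only that overlapping information lets the bias be estimated rather than assumed away.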

Thus, it may be desirable to design the data collection system with multiple goals of (1) obtaining estimates of key population quantities with small mean squared error, (2) assessing nonsampling errors from data sources, and (3) providing information to improve future survey designs. Some of the issues to consider include:

Quality and stability of data sources. Classical multiple-frame survey design theory assumes that the frames are fixed. But it may be desirable to use alternative data sources whose frames change over time (for example, web-scraped prices) to help provide more timely information in coordination with a probability survey. Theory is needed on how to do this. If relying on data supplied by an external source, will those data continue to be available, and in the same form?
Measurement of domain membership. If possible, information should be collected from each source to allow accurate determination of domain membership. If the information items collected in administrative sources cannot be altered, sometimes items can be added to probability samples that allow domain determination.
Redundancy. For the situation in Section 3.4, where a census of part of the population is supplemented by a probability sample, a screening design might be optimal for $S_{1}$. But a screening design does not allow assessment of potential differences in measurement between the two samples. Some degree of overlap among the data sources may be desired in order to assess differences among the domain estimates from different sources.
When an imputation model is developed for $y$ based on relationships between $y$ and $x$ from a data source with incomplete coverage, there is a danger that this model will not apply to the other parts of the population. It may be desired to take a small sample from the uncovered part of the population for purposes of evaluating the model.
Relative amounts of information for different domains. When data sources include administrative records or large convenience samples, there may be much more information about some parts of the population than others. The issue becomes how to obtain reliable information on the missing parts of the population. When that information comes from a sample, there may be high weight variation. Levine and Harter (2015) studied the issue of weight variation in dual-frame telephone surveys. Some of the weight variation may be reduced by obtaining additional administrative data sources on underrepresented subpopulations, but there is a danger that, as organizations move away from expensive probability samples, some subpopulations will be omitted from all sources.
Robustness to design assumptions. Designs that are optimal in theory often turn out to be less so in practice. Exploring the anticipated design performance under violations of the assumptions can be helpful for modifying a theoretically optimal design. In some cases, combining information across sources may result in worse estimates than using a single source, or it may be decided that the gains from combining data are not worth the extra trouble.
Waksberg (1998) advised: “Do not treat statistical procedures as mechanical operations; be prepared for the unexpected.” Having a design with some robustness to the assumptions gives flexibility for unexpected problems.
Auxiliary information. Many of the methods for integrating data rely on auxiliary information to perform imputations or predict domain membership. Mercer, Lau and Kennedy (2018) argued that for calibration, the richness of the auxiliary information is far more important than the particular method used to calibrate, and the same is true for other data combination methods. Having rich auxiliary information (beyond demographic variables) allows for better data integration models, and for better assessment of their performance.
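The weight variation mentioned in the list above is often summarized by Kish's approximate design effect from unequal weighting, $n\sum_i w_i^{2}/(\sum_i w_i)^{2} = 1 + \mathrm{CV}^{2}$, where CV is the coefficient of variation of the weights computed with divisor $n$. The Python sketch below computes it and the corresponding effective sample size. It is a generic illustration, not necessarily the measure used by Levine and Harter (2015); the function names are hypothetical, and so are the example weights: 90 respondents from a well-covered domain with weight 1 and 10 respondents from a sparsely covered domain with weight 10.

```python
def kish_deff(weights):
    """Kish's approximate design effect from unequal weights: n * sum(w^2) / (sum w)^2."""
    total = sum(weights)
    return len(weights) * sum(w * w for w in weights) / total**2


def effective_sample_size(weights):
    """Number of equally weighted cases the weighted sample is roughly worth: (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    return total**2 / sum(w * w for w in weights)


# Hypothetical weights: 90 respondents from a well-covered domain, 10 from a sparse one.
weights = [1.0] * 90 + [10.0] * 10
print(round(kish_deff(weights), 2))              # 3.02
print(round(effective_sample_size(weights), 1))  # 33.1
```

For these weights, the 100 respondents carry roughly the information of 33 equally weighted ones, which is why obtaining more information on underrepresented domains can matter more than adding cases where coverage is already good.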

Waksberg argued that a survey statistician needs to look at the entirety of the problem, not just the optimal design for measuring a single variable. He said that a sampling statistician should “think not only about the specific questions that are asked, but the broader aspects of these questions: whether the questions make sense and can be solved, or whether they should be modified or changed. This is how I’ve tried to have people with whom I work think about the issues: Here’s a question, how do you respond to this specific question? Is it the right question? What statistics will you get by a narrow interpretation of the question, and is there a better way to proceed?” (Morganstein and Marker, 2000, page 304).

In this paper, I have suggested that multiple-frame surveys can serve as an organizing structure for designing and evaluating data-integration systems. This can help clarify the strengths and weaknesses of each source and, perhaps, result in a better way to proceed.

Acknowledgements

I am grateful to the Waksberg Award Committee for selecting me for this honor, to Mike Brick for helpful discussions, and to the associate editor and two referees whose constructive suggestions improved this paper.

References

Aidara, C.A.T. (2019). Quasi random resampling designs for multiple frame surveys. Statistica, 79, 321-338.

Alleva, G., Arbia, G., Falorsi, P.D., Nardelli, V. and Zuliani, A. (2020). A sampling approach for the estimation of the critical parameters of the SARS-CoV-2 epidemic: An operational design. https://arxiv.org/ftp/arxiv/papers/2004/2004.06068.pdf, last accessed March 28, 2021.

Arcos, A., Martínez, S., Rueda, M. and Martínez, H. (2017). Distribution function estimates from dual frame context. Journal of Computational and Applied Mathematics, 318, 242-252.

Arcos, A., Rueda, M., Trujillo, M. and Molina, D. (2015). Review of estimation methods for landline and cell phone surveys. Sociological Methods & Research, 44, 458-485.

Baffour, B., Haynes, M., Western, M., Pennay, D., Misson, S. and Martinez, A. (2016). Weighting strategies for combining data from dual-frame telephone surveys: Emerging evidence from Australia. Journal of Official Statistics, 32, 549-578.

Bankier, M.D. (1986). Estimators based on several stratified samples with applications to multiple frame surveys. Journal of the American Statistical Association, 81, 1074-1079.

Beaumont, J.-F. (2020). Are probability surveys bound to disappear for the production of official statistics? Survey Methodology, 46, 1, 1-28. Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2020001/article/00001-eng.pdf.

Beaumont, J.-F., and Rao, J.N.K. (2021). Pitfalls of making inferences from non-probability samples: Can data integration through probability samples provide remedies? The Survey Statistician, 83, 11-22.

Biemer, P.P. (1984). Methodology for optimal dual frame sample design. Bureau of the Census SRD Research Report CENSUS/SRD/RR-84/07.

Brick, J.M., Flores-Cervantes, I.F., Lee, S. and Norman, G. (2011). Nonsampling errors in dual frame telephone surveys. Survey Methodology, 37, 1, 1-12. Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2011001/article/11443-eng.pdf.

Brick, J.M., Dipko, S., Presser, S., Tucker, C. and Yuan, Y. (2006). Nonresponse bias in a dual frame sample of cell and landline numbers. Public Opinion Quarterly, 70, 780-793.

Brick, J.M., Shapiro, G., Flores-Cervantes, I., Ferraro, D. and Strickler, T. (1999). 1997 NSAF Snapshot Survey Weights. Washington, DC: Urban Institute.

Burke, J., Mohadjer, L., Green, J., Waksberg, J., Kirsch, I.S. and Kolstad, A. (1994). Composite estimation in national and state surveys. In Proceedings of the Survey Research Methods Section, 873-878. Alexandria, VA: American Statistical Association.

Chauvet, G. (2016). Variance estimation for the 2006 French housing survey. Mathematical Population Studies, 23, 147-163.

Chauvet, G., and de Marsac, G.T. (2014). Estimation methods on multiple sampling frames in two-stage sampling designs. Survey Methodology, 40, 2, 335-346. Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2014002/article/14090-eng.pdf.

Chen, S., Stubblefield, A. and Stoner, J.A. (2021). Oversampling of minority populations through dual-frame surveys. Journal of Survey Statistics and Methodology, 9, 626-649.

Chen, Y., Li, P. and Wu, C. (2020). Doubly robust inference with nonprobability survey samples. Journal of the American Statistical Association, 115, 2011-2021.

Chipperfield, J., Chessman, J. and Lim, R. (2012). Combining household surveys using mass imputation to estimate population totals. Australian & New Zealand Journal of Statistics, 54, 223-238.

Choudhry, G.H. (1989). Cost-variable optimization of dual frame design for estimating proportions. In Proceedings of the Survey Research Methods Section, 566-571. Alexandria, VA: American Statistical Association.

Chu, A., Brick, J.M. and Kalton, G. (1999). Weights for combining surveys across time or space. Bulletin of the International Statistical Institute, 2, 103-104.

Citro, C.F. (2014). From multiple modes for surveys to multiple data sources for estimates. Survey Methodology, 40, 2, 137-161. Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2014002/article/14128-eng.pdf.

Cohen, S.B., DiGaetano, R. and Waksberg, J. (1988). Sample design of the NMES survey of American Indians and Alaska Natives. In Proceedings of the Survey Research Methods Section, 740-745. Alexandria, VA: American Statistical Association.

Cunningham, P., Shapiro, G. and Brick, J.M. (1999). 1997 NSAF In-Person Survey Methods. Washington, DC: Urban Institute.

Dever, J.A. (2018). Combining probability and nonprobability samples to form efficient hybrid estimates: An evaluation of the common support assumption. In Proceedings of the 2018 Federal Committee on Statistical Methodology (FCSM) Research Conference. https://nces.ed.gov/FCSM/pdf/A4_Dever_2018FCSM.pdf, last accessed July 7, 2021.

Deville, J.-C., and Lavallée, P. (2006). Indirect sampling: The foundations of the generalized weight share method. Survey Methodology, 32, 2, 165-176. Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2006002/article/9551-eng.pdf.

DiGaetano, R., Judkins, D. and Waksberg, J. (1995). Oversampling minority school children. In Proceedings of the Survey Research Methods Section, 503-508. Alexandria, VA: American Statistical Association.

Fay, R.E., and Herriot, R.A. (1979). Estimates of income for small places: An application of James-Stein procedures to census data. Journal of the American Statistical Association, 74, 269-277.

Ferraz, C., and Vogel, F. (2015). Multiple frame sampling. In Handbook on Master Sampling Frames for Agricultural Statistics: Frame Development, Sample Design and Estimation, 89-106. Rome: Food and Agriculture Organization of the United Nations.

Fuller, W.A., and Burmeister, L.F. (1972). Estimators for samples selected from two overlapping frames. In Proceedings of the Social Statistics Section, 245-249. Alexandria, VA: American Statistical Association.

Groves, R.M., and Lepkowski, J.M. (1985). Dual frame, mixed mode survey designs. Journal of Official Statistics, 1, 263-286.

Groves, R.M., and Lepkowski, J.M. (1986). An experimental implementation of a dual frame telephone sample design. In Proceedings of the Survey Research Methods Section, 340-345. Alexandria, VA: American Statistical Association.

Groves, R.M., and Wissoker, D. (1999). 1997 NSAF Early Nonresponse Studies. Washington, DC: Urban Institute.

Hartley, H.O. (1962). Multiple frame surveys. In Proceedings of the Social Statistics Section, 203-206. Alexandria, VA: American Statistical Association.

Hartley, H.O. (1974). Multiple frame methodology and selected applications. Sankhyā, Series C, 36, 99-118.

Haziza, D., and Lesage, É. (2016). A discussion of weighting procedures for unit nonresponse. Journal of Official Statistics, 32, 129-145.

Hendricks, S., Igra, A. and Waksberg, J. (1980). Ethnic stratification in the California Hypertension Survey. In Proceedings of the Survey Research Methods Section, 680-685. Alexandria, VA: American Statistical Association.

Kalton, G., and Anderson, D.W. (1986). Sampling rare populations. Journal of the Royal Statistical Society. Series A (General), 149, 65-82.

Kim, J.K., and Rao, J.N.K. (2012). Combining data from two independent surveys: A model-assisted approach. Biometrika, 99, 85-100.

Kim, J.K., and Tam, S.-M. (2021). Data integration by combining big data and survey sample data for finite population inference. International Statistical Review, 89, 382-401.

Kott, P.S., and Vogel, F.A. (1995). Multiple-frame business surveys. In Business Survey Methods, (Eds., B.G. Cox, D.A. Binder, B.N. Chinnappa, A. Christianson, M.J. Colledge and P.S. Kott), 185-203. New York: John Wiley & Sons, Inc.

Lavallée, P. (2007). Indirect Sampling. New York: Springer.

Lavallée, P., and Rivest, L.-P. (2012). Capture-recapture sampling and indirect sampling. Journal of Official Statistics, 28, 1-27.

Lee, S. (2006). Propensity score adjustment as a weighting scheme for volunteer panel web surveys. Journal of Official Statistics, 22, 329-349.

Lee, S., and Valliant, R. (2009). Estimation for volunteer panel web surveys using propensity score adjustment and calibration adjustment. Sociological Methods & Research, 37, 319-343.

Lepkowski, J.M., and Groves, R.M. (1986). A mean squared error model for dual frame, mixed mode survey design. Journal of the American Statistical Association, 81, 930-937.

Levine, B., and Harter, R. (2015). Optimal allocation of cell-phone and landline respondents in dual-frame surveys. Public Opinion Quarterly, 79, 91-104.

Lin, D., Liu, Z. and Stokes, L. (2019). A method to correct for frame membership error in dual frame estimators. Survey Methodology, 45, 3, 543-565. Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2019003/article/00008-eng.pdf.

Lohr, S.L. (2007). Recent developments in multiple frame surveys. In Proceedings of the Survey Research Methods Section, 3257-3264. Alexandria, VA: American Statistical Association.

Lohr, S.L. (2011). Alternative survey sample designs: Sampling with multiple overlapping frames. Survey Methodology, 37, 2, 197-213. Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2011002/article/11608-eng.pdf.

Lohr, S.L. (2014). When should a multiple frame survey be used? The Survey Statistician, 69 (January), 17-21.

Lohr, S.L. (2022). Sampling: Design and Analysis, Third Edition. Boca Raton, FL: CRC Press.

Lohr, S.L., and Brick, J.M. (2014). Allocation for dual frame telephone surveys with nonresponse. Journal of Survey Statistics and Methodology, 2, 388-409.

Lohr, S.L., and Raghunathan, T.E. (2017). Combining survey data with other data sources. Statistical Science, 32, 293-312.

Lohr, S.L., and Rao, J.N.K. (2000). Inference from dual frame surveys. Journal of the American Statistical Association, 95, 271-280.

Lohr, S.L., and Rao, J.N.K. (2006). Estimation in multiple-frame surveys. Journal of the American Statistical Association, 101, 1019-1030.

Lu, B., Peng, J. and Sahr, T. (2013). Estimation bias of different design and analytical strategies in dual-frame telephone surveys: An empirical evaluation. Journal of Statistical Computation and Simulation, 83, 2352-2368.

Lu, B., Sahr, T., Iachan, R., Denker, M., Duffy, T. and Weston, D. (2013). Design and analysis of dual-frame telephone surveys for health policy research. World Medical & Health Policy, 5, 217-232.

Lu, Y. (2014a). Chi-squared tests in dual frame surveys. Survey Methodology, 40, 2, 323-334. Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2014002/article/14096-eng.pdf.

Lu, Y. (2014b). Regression coefficient estimation in dual frame surveys. Communications in Statistics – Simulation and Computation, 43, 1675-1684.

Lu, Y., and Lohr, S. (2010). Gross flow estimation in dual frame surveys. Survey Methodology, 36, 1, 13-22. Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2010001/article/11248-eng.pdf.

Lu, Y., Fu, Y. and Zhang, G. (2021). Nonparametric regression estimators in dual frame surveys. Communications in Statistics – Simulation and Computation, 50, 854-864.

Marks, E., and Waksberg, J. (1966). Evaluation of coverage in the 1960 Census of Population through case-by-case checking. In Proceedings of the Social Statistics Section, 62-70. Alexandria, VA: American Statistical Association.

Mecatti, F. (2007). A single frame multiplicity estimator for multiple frame surveys. Survey Methodology, 33, 2, 151-157. Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2007002/article/10492-eng.pdf.

Meng, X.-L. (2018). Statistical paradises and paradoxes in big data (I): Law of large populations, big data paradox, and the 2016 US presidential election. The Annals of Applied Statistics, 12, 685-726.

Mercer, A., Lau, A. and Kennedy, C. (2018). For Weighting Online Opt-In Samples, What Matters Most? Washington, DC: Pew Research.

Metcalf, P., and Scott, A. (2009). Using multiple frames in health surveys. Statistics in Medicine, 28, 1512-1523.

Montanari, G.E. (1987). Post-sampling efficient QR-prediction in large-sample surveys. International Statistical Review, 55, 191-202.

Montanari, G.E. (1998). On regression estimation of finite population means. Survey Methodology, 24, 1, 69-77. Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/1998001/article/3911-eng.pdf.

Morganstein, D., and Marker, D. (2000). A conversation with Joseph Waksberg. Statistical Science, 15, 299-312.

National Academies of Sciences, Engineering, and Medicine (2017). Innovations in Federal Statistics: Combining Data Sources While Protecting Privacy. Washington, DC: National Academies Press.

National Academies of Sciences, Engineering, and Medicine (2018). Federal Statistics, Multiple Data Sources, and Privacy Protection: Next Steps. Washington, DC: National Academies Press.

O’Muircheartaigh, C., and Pedlow, S. (2002). Combining samples vs. cumulating cases: A comparison of two weighting strategies in NLSY97. In Proceedings of the Survey Research Methods Section, 2557-2562. Alexandria, VA: American Statistical Association.

Ranalli, M.G., Arcos, A., Rueda, M.d.M. and Teodoro, A. (2016). Calibration estimation in dual-frame surveys. Statistical Methods & Applications, 25, 321-349.

Rao, J.N.K. (1994). Estimating totals and distribution functions using auxiliary information at the estimation stage. Journal of Official Statistics, 10, 153-165.

Rao, J.N.K. (2021). On making valid inferences by combining data from surveys and other sources. Sankhyā, Series B, 83-B, 242-272.

Rao, J.N.K., and Molina, I. (2015). Small Area Estimation, 2nd Ed. Hoboken, NJ: Wiley.

Rivers, D. (2007). Sampling for web surveys. Paper presented at the Joint Statistical Meetings.

Rueda, M.d.M., Arcos, A., Molina, D. and Ranalli, M.G. (2018). Estimation techniques for ordinal data in multiple frame surveys with complex sampling designs. International Statistical Review, 86, 51-67.

Saegusa, T. (2019). Large sample theory for merged data from multiple sources. The Annals of Statistics, 47, 1585-1615.

Särndal, C.-E., and Lundström, S. (2005). Estimation in Surveys with Nonresponse. Hoboken, NJ: Wiley.

Skinner, C.J. (1991). On the efficiency of raking ratio estimation for multiple frame surveys. Journal of the American Statistical Association, 86, 779-784.

Skinner, C.J., and Rao, J.N.K. (1996). Estimation in dual frame surveys with complex designs. Journal of the American Statistical Association, 91, 349-356.

Thompson, M.E. (2019). Combining data from new and traditional sources in population surveys. International Statistical Review, 87, S79-S89.

Traugott, M.W., Groves, R.M. and Lepkowski, J.M. (1987). Using dual frame designs to reduce nonresponse in telephone surveys. Public Opinion Quarterly, 51, 522-539.

Urban Institute and Child Trends (2007). National Survey of America’s Families (NSAF), 1997. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor].

Valliant, R., and Dever, J.A. (2011). Estimating propensity adjustments for volunteer web surveys. Sociological Methods & Research, 40, 105-137.

Waksberg, J. (1986). Discussion of papers on new approaches in telephone sample design. In Proceedings of the Survey Research Methods Section, 367-369. Alexandria, VA: American Statistical Association.

Waksberg, J. (1995). Distribution of poverty in census block groups (BGs) and implications for sample design. In Proceedings of the Survey Research Methods Section, 497-502. Alexandria, VA: American Statistical Association.

Waksberg, J. (1998). The Hansen era: Statistical research and its implementation at the US Census Bureau, 1940-1970. Journal of Official Statistics, 14, 119-135.

Waksberg, J., and Pritzker, L. (1969). Changes in census methods. Journal of the American Statistical Association, 64, 1141-1149.

Waksberg, J., Judkins, D. and Massey, J.T. (1997b). Geographic-based oversampling in demographic surveys of the United States. Survey Methodology, 23, 1, 61-71. Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/1997001/article/3107-eng.pdf.

Waksberg, J., Brick, J.M., Shapiro, G., Flores-Cervantes, I. and Bell, B. (1997a). Dual-frame RDD and area sample with particular focus on low-income population. In Proceedings of the Survey Research Methods Section, 713-718. Alexandria, VA: American Statistical Association.

Waksberg, J., Brick, J.M., Shapiro, G., Flores-Cervantes, I., Bell, B. and Ferraro, D. (1998). Nonresponse and coverage adjustment for a dual-frame survey. Proceedings: Symposium 97, New Directions in Surveys and Censuses, 193-198. Ottawa: Statistics Canada.

Wolter, K.M., Ganesh, N., Copeland, K.R., Singleton, J.A. and Khare, M. (2019). Estimation tools for reducing the impact of sampling and nonresponse errors in dual-frame RDD telephone surveys. Statistics in Medicine, 38, 4718-4732.

Yang, S., and Kim, J.K. (2020). Statistical data integration in survey sampling: A review. Japanese Journal of Statistics and Data Science, 3, 625-650.

Yang, S., Kim, J.K. and Hwang, Y. (2021). Integration of data from probability surveys and big found data for finite population inference using mass imputation. Survey Methodology, 47, 1, 29-58. Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2021001/article/00004-eng.pdf.

Zhang, L.-C. (2019). Log-linear models of erroneous list data. In Analysis of Integrated Data (Eds., L.-C. Zhang and R.L. Chambers), 197-218. Boca Raton, FL: CRC Press.

Zhang, L.-C., and Chambers, R.L. (Eds.) (2019). Analysis of Integrated Data. Boca Raton, FL: CRC Press.

ISSN : 1492-0921

Editorial policy

Survey Methodology publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves. All papers will be refereed. However, the authors retain full responsibility for the contents of their papers and opinions expressed are not necessarily those of the Editorial Board or of Statistics Canada.

Submission of Manuscripts

Survey Methodology is published twice a year in electronic format. Authors are invited to submit their articles in English or French in electronic form, preferably in Word, to the Editor (statcan.smj-rte.statcan@canada.ca, Statistics Canada, 150 Tunney’s Pasture Driveway, Ottawa, Ontario, Canada, K1A 0T6). For formatting instructions, please see the guidelines provided in the journal and on the web site (www.statcan.gc.ca/SurveyMethodology).

Note of appreciation

Canada owes the success of its statistical system to a long-standing partnership between Statistics Canada, the citizens of Canada, its businesses, governments and other institutions. Accurate and timely statistical information could not be produced without their continued co-operation and goodwill.

Standards of service to the public

Statistics Canada is committed to serving its clients in a prompt, reliable and courteous manner. To this end, the Agency has developed standards of service which its employees observe in serving its clients.

Copyright

Published by authority of the Minister responsible for Statistics Canada.

Use of this publication is governed by the Statistics Canada Open Licence Agreement.

Catalogue No. 12-001-X

Frequency: Semi-annual

Ottawa

Date modified: 2022-01-06
