Survey Methodology
Integration of data from probability surveys and big found data for finite population inference using mass imputation
by Shu Yang, Jae Kwang Kim and Youngdeok Hwang
- Release date: June 24, 2021
Abstract
Multiple data sources are becoming increasingly available for statistical analyses in the era of big data. As an important example in finite-population inference, we consider an imputation approach to combining data from a probability survey and big found data. We focus on the case where the study variable is observed in the big data only, while the auxiliary variables are observed in both data sources. Unlike the usual imputation for missing data analysis, we create imputed values for all units in the probability sample. Such mass imputation is attractive in the context of survey data integration (Kim and Rao, 2012). We extend mass imputation as a tool for integrating survey data and big non-survey data. The mass imputation methods and their statistical properties are presented. The matching estimator of Rivers (2007) is also covered as a special case. Variance estimation with mass-imputed data is discussed. The simulation results demonstrate that the proposed estimators outperform existing competitors in terms of robustness and efficiency.
Key Words: Calibration weighting; Data fusion; Generalized additive model; Matching; Nearest neighbor imputation; Post stratification.
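To make the idea in the abstract concrete, the following is a minimal sketch of mass imputation by nearest neighbor matching, one of the techniques named in the key words. It is an illustration only, not the authors' implementation: the function names, the single auxiliary variable, the toy data, and the weighted-mean (Hajek-type) form of the estimator are all assumptions made here. The big data supply pairs (x, y); the probability sample supplies x and design weights only, and every sample unit receives an imputed y.

```python
def nearest_neighbor_impute(sample_x, big_x, big_y):
    """For each sample unit, borrow y from the big-data unit with the closest x."""
    imputed = []
    for x in sample_x:
        # Index of the big-data unit whose auxiliary value is nearest to x.
        j = min(range(len(big_x)), key=lambda k: abs(big_x[k] - x))
        imputed.append(big_y[j])
    return imputed


def mass_imputation_mean(sample_x, weights, big_x, big_y):
    """Design-weighted mean of the imputed values over the probability sample."""
    y_star = nearest_neighbor_impute(sample_x, big_x, big_y)
    return sum(w * y for w, y in zip(weights, y_star)) / sum(weights)


# Toy data, made up for illustration (hypothetical values).
big_x = [1.0, 2.0, 3.0, 4.0]        # auxiliary variable in the big data
big_y = [10.0, 20.0, 30.0, 40.0]    # study variable, observed in the big data only
sample_x = [1.2, 3.9, 2.6]          # auxiliary variable in the probability sample
weights = [50.0, 30.0, 20.0]        # design weights of the sample units

estimate = mass_imputation_mean(sample_x, weights, big_x, big_y)
print(estimate)  # 23.0
```

Here the sample units are matched to the big-data units with x equal to 1.0, 4.0 and 3.0, so the imputed values are 10, 40 and 30, and the weighted mean is (50·10 + 30·40 + 20·30) / 100 = 23.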
Table of contents
- Section 1. Introduction
- Section 2. Basic setup
- Section 3. Methodology
- Section 4. Other techniques for mass imputation
- Section 5. Regression calibration
- Section 6. Empirical experiments
- Section 7. Real-data application
- Section 8. Discussion
- Acknowledgements
- Appendix
- References
How to cite
Yang, S., Kim, J.K. and Hwang, Y. (2021). Integration of data from probability surveys and big found data for finite population inference using mass imputation. Survey Methodology, Statistics Canada, Catalogue No. 12-001-X, Vol. 47, No. 1. Paper available at http://www.statcan.gc.ca/pub/12-001-x/2021001/article/00004-eng.htm.