Survey Methodology
Integrating probability and non-probability samples through deep learning-based mass imputation
by Sixia Chen, Chao Xu and James Cutler
- Release date: December 23, 2025
Abstract
Although probability samples are regarded as the gold standard for collecting information in population-based studies, non-probability samples are frequently used in practice because of their low cost, their convenience, and the lack of a sampling frame for the survey. Naïve estimates based on non-probability samples without any adjustment may be misleading because of selection bias. Recently, valid data integration approaches, including mass imputation, propensity score weighting, and calibration, have been used to improve the representativeness of non-probability samples. The effectiveness of the mass imputation approach depends on the underlying model assumptions. In this paper, we propose using deep learning for mass imputation when combining probability and non-probability samples, and we compare it with several modern machine learning-based mass imputation approaches, including generalized additive models, regression trees, random forests, and XGBoost. In the simulation study, the deep learning-based approaches are more robust and effective than the other mass imputation approaches when the underlying model assumptions fail under non-linear scenarios.
Key Words: Data integration; Machine learning; Nonprobability sample; Selection bias; Variance estimation.
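To make the idea of mass imputation concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. An outcome model is trained on the non-probability sample, used to impute the outcome for every unit of the probability sample, and the imputed values are then combined with the design weights. The population, the selection mechanism, the sample sizes, and the helper name `knn_predict` are all hypothetical. A simple k-nearest-neighbour regressor stands in for the deep learning model so that the example needs only the Python standard library.

```python
import heapq
import random

random.seed(0)

# Hypothetical finite population: y is non-linear in the covariate x.
N = 5000
x_pop = [random.uniform(0.0, 2.0) for _ in range(N)]
y_pop = [x * x + random.gauss(0.0, 0.3) for x in x_pop]
true_mean = sum(y_pop) / N

# Non-probability sample B: inclusion probability grows with x (selection bias).
# Both x and y are observed in B. The selection rule is an assumption.
idx_b = [i for i in range(N) if random.random() < 0.05 + 0.10 * x_pop[i]]
xb = [x_pop[i] for i in idx_b]
yb = [y_pop[i] for i in idx_b]

# Probability sample A: simple random sample with known design weights N / n.
# Only x is used from A; y is treated as unobserved.
n_a = 500
idx_a = random.sample(range(N), n_a)
xa = [x_pop[i] for i in idx_a]
wa = [N / n_a] * n_a


def knn_predict(x0, xs, ys, k=15):
    """Average y over the k training points whose x is closest to x0.

    Stand-in for the outcome model; the paper studies deep learning instead.
    """
    nearest = heapq.nsmallest(k, zip(xs, ys), key=lambda p: abs(p[0] - x0))
    return sum(y for _, y in nearest) / len(nearest)


# Naive estimator: unweighted mean of y in the non-probability sample.
naive = sum(yb) / len(yb)

# Mass imputation: predict y for every unit in A, then take the weighted mean.
y_imp = [knn_predict(x, xb, yb) for x in xa]
mass_imputed = sum(w * y for w, y in zip(wa, y_imp)) / sum(wa)

print(f"true mean:     {true_mean:.3f}")
print(f"naive (B):     {naive:.3f}")
print(f"mass imputed:  {mass_imputed:.3f}")
```

In this toy setup the naive mean is pulled upward because units with large x are over-represented in B, while the mass imputation estimate relies on the probability sample for the distribution of x. Any regression method can replace `knn_predict`; the quality of the estimate then depends on how well that model captures the relationship between x and y.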
Table of contents
- Section 1. Introduction
- Section 2. Basic setups
- Section 3. Proposed method
- Section 4. Simulation studies
- Section 5. Discussion
- Acknowledgements
- References
How to cite
Chen, S., Xu, C. and Cutler, J. (2025). Integrating probability and non-probability samples through deep learning-based mass imputation. Survey Methodology, 51(2), 493-508. Paper available at http://www.statcan.gc.ca/pub/12-001-x/2025002/article/00007-eng.pdf.