Survey Methodology
Handling non‑probability samples through inverse probability weighting with an application to Statistics Canada’s crowdsourcing data
by Jean-François Beaumont, Keven Bosa, Andrew Brennan, Joanne Charlebois and Kenneth Chu
- Release date: June 25, 2024
Abstract
Non-probability samples are being increasingly explored in National Statistical Offices as an alternative to probability samples. However, it is well known that the use of a non-probability sample alone may produce estimates with significant bias due to the unknown nature of the underlying selection mechanism. Bias reduction can be achieved by integrating data from the non-probability sample with data from a probability sample provided that both samples contain auxiliary variables in common. We focus on inverse probability weighting methods, which involve modelling the probability of participation in the non-probability sample. First, we consider the logistic model along with pseudo maximum likelihood estimation. We propose a variable selection procedure based on a modified Akaike Information Criterion (AIC) that properly accounts for the data structure and the probability sampling design. We also propose a simple rank-based method of forming homogeneous post-strata. Then, we extend the Classification and Regression Trees (CART) algorithm to this data integration scenario, while again properly accounting for the probability sampling design. A bootstrap variance estimator is proposed that reflects two sources of variability: the probability sampling design and the participation model. Our methods are illustrated using Statistics Canada’s crowdsourcing and survey data.
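As an illustration of the inverse probability weighting idea summarized above, the following is a minimal sketch of pseudo maximum likelihood estimation for a logistic participation model. It assumes one common form of the pseudo-likelihood estimating equation from the data integration literature, in which the non-probability sample total of the auxiliary variables is matched to a design-weighted total from the probability sample; this page does not state the authors' exact equations, so the form used here is an assumption. The function names, the simulated population, and all parameter values are hypothetical, and the sketch does not cover the modified AIC, the rank-based post-strata, nppCART or the bootstrap variance estimator.

```python
import numpy as np


def fit_participation_model(x_np, x_p, w_p, max_iter=100, tol=1e-10):
    """Estimate logistic participation coefficients by pseudo maximum likelihood.

    Assumed estimating equation (not taken from the paper's text):
        sum_{i in NP} x_i - sum_{i in P} w_i * p_i(beta) * x_i = 0,
    with p_i(beta) = 1 / (1 + exp(-x_i' beta)), solved by Newton-Raphson.
    """
    x_np = np.asarray(x_np, dtype=float)   # auxiliary variables, non-probability sample
    x_p = np.asarray(x_p, dtype=float)     # auxiliary variables, probability sample
    w_p = np.asarray(w_p, dtype=float)     # survey weights of the probability sample
    total_np = x_np.sum(axis=0)
    beta = np.zeros(x_np.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(x_p @ beta)))
        score = total_np - x_p.T @ (w_p * p)
        # Derivative of the score with respect to beta (negative definite).
        hessian = -(x_p * (w_p * p * (1.0 - p))[:, None]).T @ x_p
        step = np.linalg.solve(hessian, score)
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def participation_weights(x_np, beta):
    """Inverse of the estimated participation probabilities."""
    p = 1.0 / (1.0 + np.exp(-(np.asarray(x_np, dtype=float) @ beta)))
    return 1.0 / p


# Hypothetical simulated population, for illustration only.
rng = np.random.default_rng(2024)
N = 20_000
x_pop = np.column_stack([np.ones(N), rng.normal(size=N)])
true_beta = np.array([-1.0, 0.5])
p_true = 1.0 / (1.0 + np.exp(-(x_pop @ true_beta)))
y_pop = 2.0 + x_pop[:, 1] + rng.normal(size=N)

# Non-probability sample: units participate with unknown probability p_true.
in_np = rng.random(N) < p_true

# Probability sample: simple random sample without replacement, weight N / n.
n = 2_000
idx_p = rng.choice(N, size=n, replace=False)
w_p = np.full(n, N / n)

beta_hat = fit_participation_model(x_pop[in_np], x_pop[idx_p], w_p)
weights = participation_weights(x_pop[in_np], beta_hat)

# Unweighted mean versus inverse-probability-weighted (Hajek-type) mean.
naive_mean = y_pop[in_np].mean()
ipw_mean = np.sum(weights * y_pop[in_np]) / np.sum(weights)
```

In this simulated setting, participation depends on the auxiliary variable, which is also related to the variable of interest, so the unweighted mean of the non-probability sample is biased and the weighted mean is expected to be closer to the population mean. Note that the variable of interest is needed only in the non-probability sample; the probability sample contributes the auxiliary variables and the survey weights.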
Key Words: Akaike Information Criterion; Classification and Regression Trees; Logistic model; Participation probability; Statistical data integration; Variable selection.
Table of contents
- Section 1. Introduction
- Section 2. Data integration scenario
- Section 3. Estimation of the participation probability using a logistic model
- Section 4. Estimation of the participation probability using nppCART
- Section 5. Bootstrap variance estimation
- Section 6. Empirical evaluation of methods using real data
- Section 7. Conclusion
- Appendix 1
- Appendix 2
- Appendix 3
- References
How to cite
Beaumont, J.-F., Bosa, K., Brennan, A., Charlebois, J. and Chu, K. (2024). Handling non-probability samples through inverse probability weighting with an application to Statistics Canada’s crowdsourcing data. Survey Methodology, Statistics Canada, Catalogue No. 12‑001‑X, Vol. 50, No. 1. Paper available at http://www.statcan.gc.ca/pub/12-001-x/2024001/article/00004-eng.htm.