Survey Methodology
Handling non‑probability samples through inverse probability weighting with an application to Statistics Canada’s crowdsourcing data

by Jean-François Beaumont, Keven Bosa, Andrew Brennan, Joanne Charlebois and Kenneth Chu

  • Release date: June 25, 2024

Abstract

Non-probability samples are being increasingly explored in National Statistical Offices as an alternative to probability samples. However, it is well known that the use of a non-probability sample alone may produce estimates with significant bias due to the unknown nature of the underlying selection mechanism. Bias reduction can be achieved by integrating data from the non-probability sample with data from a probability sample provided that both samples contain auxiliary variables in common. We focus on inverse probability weighting methods, which involve modelling the probability of participation in the non-probability sample. First, we consider the logistic model along with pseudo maximum likelihood estimation. We propose a variable selection procedure based on a modified Akaike Information Criterion (AIC) that properly accounts for the data structure and the probability sampling design. We also propose a simple rank-based method of forming homogeneous post-strata. Then, we extend the Classification and Regression Trees (CART) algorithm to this data integration scenario, while again properly accounting for the probability sampling design. A bootstrap variance estimator is proposed that reflects two sources of variability: the probability sampling design and the participation model. Our methods are illustrated using Statistics Canada’s crowdsourcing and survey data.

Key Words: Akaike Information Criterion; Classification and Regression Trees; Logistic model; Participation probability; Statistical data integration; Variable selection.

How to cite

Beaumont, J.-F., Bosa, K., Brennan, A., Charlebois, J. and Chu, K. (2024). Handling non-probability samples through inverse probability weighting with an application to Statistics Canada’s crowdsourcing data. Survey Methodology, Statistics Canada, Catalogue No. 12‑001‑X, Vol. 50, No. 1. Paper available at http://www.statcan.gc.ca/pub/12-001-x/2024001/article/00004-eng.htm.
