Survey Methodology

Release date: June 25, 2024

The journal Survey Methodology Volume 50, Number 1 (June 2024) contains the following fourteen papers:

Special issue for papers presented at the 29th Morris Hansen Lecture

Preface to the special issue for papers presented at the 29th Morris Hansen Lecture on the use of nonprobability samples

by Partha Lahiri

Abstract

This paper introduces the special issue on the use of nonprobability samples, featuring three papers presented at the 29th Morris Hansen Lecture by Courtney Kennedy, Yan Li and Jean-François Beaumont.

Exploring the assumption that commercial online nonprobability survey respondents are answering in good faith

by Courtney Kennedy, Andrew Mercer and Arnold Lau

Abstract

Statistical approaches developed for nonprobability samples generally focus on nonrandom selection as the primary reason survey respondents might differ systematically from the target population. Well-established theory states that in these instances, by conditioning on the necessary auxiliary variables, selection can be rendered ignorable and survey estimates will be free of bias. But this logic rests on the assumption that measurement error is nonexistent or small. In this study we test this assumption in two ways. First, we use a large benchmarking study to identify subgroups for which errors in commercial, online nonprobability samples are especially large in ways that are unlikely to be due to selection effects. Then we present a follow-up study examining one cause of the large errors: bogus responding (i.e., survey answers that are fraudulent, mischievous or otherwise insincere). We find that bogus responding, particularly among respondents identifying as young or Hispanic, is a significant and widespread problem in commercial, online nonprobability samples, at least in the United States. This research highlights the need for statisticians working with commercial nonprobability samples to address bogus responding and issues of representativeness – not just the latter.
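
As a rough illustration of the benchmarking step described above, the sketch below computes the signed error of a weighted survey estimate against an external population benchmark within each subgroup. This is a generic Python illustration, not the authors' procedure; the variable names and benchmark values are hypothetical.

    import numpy as np

    def benchmark_errors(y, w, group, benchmarks):
        """Signed error (weighted estimate minus benchmark), in percentage
        points, for the proportion of y == 1 within each subgroup."""
        errors = {}
        for g, bench in benchmarks.items():
            mask = (group == g)
            est = np.average(y[mask], weights=w[mask])  # weighted subgroup proportion
            errors[g] = 100.0 * (est - bench)
        return errors

    # Hypothetical usage: subgroup errors that remain large after weighting
    # point to problems (such as bogus responding) beyond nonrandom selection.
    # benchmark_errors(has_drivers_license, weights, age_group,
    #                  {"18-29": 0.81, "30-49": 0.91})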

Comments on “Exploring the assumption that commercial online nonprobability survey respondents are answering in good faith”

by J. Michael Brick

Abstract

Nonprobability samples are quick and low-cost, and they have become popular for some types of survey research. Kennedy, Mercer and Lau examine data quality issues associated with opt-in nonprobability samples frequently used in the United States. They show that the estimates from these samples have serious problems that go beyond representativeness. A total survey error perspective is important for evaluating all types of surveys.

Comments on “Exploring the assumption that commercial online nonprobability survey respondents are answering in good faith”

by Michael R. Elliott

Abstract

Kennedy, Mercer and Lau explore misreporting by respondents in non-probability samples and discover a new feature: deliberate misreporting of demographic characteristics. This finding suggests that the “arms race” between researchers and those determined to disrupt the practice of social science is not over, and that researchers need to account for such respondents when using high-quality probability surveys to help reduce error in non-probability samples.

Comments on “Exploring the assumption that commercial online nonprobability survey respondents are answering in good faith”

by Aditi Sen

Abstract

This discussion summarizes the interesting new findings on measurement errors in opt-in surveys by Kennedy, Mercer and Lau (KML). While KML enlighten readers about “bogus responding” and possible patterns within it, this discussion suggests combining these newfound results with other avenues of research in nonprobability sampling, such as the improvement of representativeness.

Authors’ response to comments on “Exploring the assumption that commercial online nonprobability survey respondents are answering in good faith”

by Courtney Kennedy, Andrew Mercer and Arnold Lau

Abstract

Our comments respond to discussion from Sen, Brick, and Elliott. We weigh the potential upside and downside of Sen’s suggestion of using machine learning to identify bogus respondents through interactions and improbable combinations of variables. We join Brick in reflecting on bogus respondents’ impact on the state of commercial nonprobability surveys. Finally, we consider Elliott’s discussion of solutions to the challenge raised in our study.

Exchangeability assumption in propensity-score based adjustment methods for population mean estimation using non-probability samples

by Yan Li

Abstract

Nonprobability samples are emerging rapidly to address time-sensitive priority topics in different areas. These data are timely but subject to selection bias. To reduce selection bias, a wide literature in survey research has investigated the use of propensity-score (PS) adjustment methods to improve the population representativeness of nonprobability samples, using probability-based survey samples as external references. The conditional exchangeability (CE) assumption is one of the key assumptions required by PS-based adjustment methods. In this paper, I first explore the validity of the CE assumption conditional on various balancing score estimates that are used in existing PS-based adjustment methods. An adaptive balancing score is then proposed for unbiased estimation of population means. The population mean estimators under the three CE assumptions are evaluated via Monte Carlo simulation studies and illustrated using the NIH SARS-CoV-2 seroprevalence study to estimate the proportion of U.S. adults with COVID-19 antibodies from April 1 to August 4, 2020.
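
Under the CE assumption, the study outcome is independent of participation in the nonprobability sample given the auxiliary variables, so weighting nonprobability units by the inverse of an estimated participation propensity can remove selection bias. As background for the methods discussed in this paper, the sketch below shows one common pseudo-weighting construction in Python: pool the nonprobability sample with a design-weighted probability reference sample, fit a logistic membership model, and invert the fitted odds. It is a minimal sketch, not the adaptive balancing score proposed here, and it assumes negligible overlap between the two samples; all names are hypothetical.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def pseudo_weights(X_np, X_ref, d_ref):
        """Pseudo-weights for nonprobability units from a pooled logistic model:
        nonprob sample (label 1) versus design-weighted probability reference
        sample (label 0), the latter standing in for the population."""
        X = np.vstack([X_np, X_ref])
        z = np.r_[np.ones(len(X_np)), np.zeros(len(X_ref))]
        w = np.r_[np.ones(len(X_np)), d_ref]   # design weights on reference units
        p = LogisticRegression().fit(X, z, sample_weight=w).predict_proba(X_np)[:, 1]
        return (1.0 - p) / p                   # fitted odds invert to ~1/participation rate

    # Estimated population mean of a variable observed only in the nonprob sample:
    # y_hat = np.average(y_np, weights=pseudo_weights(X_np, X_ref, d_ref))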

Comments on “Exchangeability assumption in propensity-score based adjustment methods for population mean estimation using non-probability samples”

by Jae Kwang Kim and Yonghyun Kwon

Abstract

Pseudo weight construction for data integration can be understood in a two-phase sampling framework. Within this framework, we discuss two approaches to the estimation of propensity scores and develop a new way to construct the propensity score function for data integration using the conditional maximum likelihood method. Results from a limited simulation study are also presented.

Comments on “Exchangeability assumption in propensity-score based adjustment methods for population mean estimation using non-probability samples”:

Causal inference, non-probability sample, and finite population

by Takumi Saegusa

Abstract

In some of the non-probability sample literature, the conditional exchangeability assumption is considered necessary for valid statistical inference. This assumption is rooted in causal inference, though its potential-outcome framework differs greatly from that of non-probability samples. We describe the similarities and differences of the two frameworks and discuss issues to consider when adopting the conditional exchangeability assumption in non-probability sample setups. We also discuss the role of finite population inference in propensity-score and outcome-regression-modeling approaches to non-probability samples.

Author’s response to comments on “Exchangeability assumption in propensity-score based adjustment methods for population mean estimation using non-probability samples”

by Yan Li

Abstract

In this rejoinder, I address the comments from the discussants, Dr. Takumi Saegusa, Dr. Jae Kwang Kim and Ms. Yonghyun Kwon. Dr. Saegusa’s comments about the differences between the conditional exchangeability (CE) assumption for causal inference versus the CE assumption for finite population inference using nonprobability samples, and about the distinction between design-based and model-based approaches to finite population inference using nonprobability samples, are elaborated and clarified in the context of my paper. Subsequently, I respond to Dr. Kim and Ms. Kwon’s comprehensive framework for categorizing existing approaches to estimating propensity scores (PS) into conditional and unconditional approaches. I expand their simulation studies to vary the sampling weights, allow for misspecified PS models, and include an additional estimator, the scaled adjusted logistic propensity estimator of Wang, Valliant and Li (2021), denoted sWBS. In my simulations, the sWBS estimator consistently outperforms or is comparable to the other estimators under the misspecified PS model. The sWBS estimator, like the WBS and ABS estimators described in my paper, does not assume that the units overlapping the nonprobability and probability reference samples are negligible, nor does it require the identification of overlap units, as the estimators proposed by Dr. Kim and Ms. Kwon do.

Handling non-probability samples through inverse probability weighting with an application to Statistics Canada’s crowdsourcing data

by Jean-François Beaumont, Keven Bosa, Andrew Brennan, Joanne Charlebois and Kenneth Chu

Abstract

Non-probability samples are being increasingly explored in National Statistical Offices as an alternative to probability samples. However, it is well known that the use of a non-probability sample alone may produce estimates with significant bias due to the unknown nature of the underlying selection mechanism. Bias reduction can be achieved by integrating data from the non-probability sample with data from a probability sample provided that both samples contain auxiliary variables in common. We focus on inverse probability weighting methods, which involve modelling the probability of participation in the non-probability sample. First, we consider the logistic model along with pseudo maximum likelihood estimation. We propose a variable selection procedure based on a modified Akaike Information Criterion (AIC) that properly accounts for the data structure and the probability sampling design. We also propose a simple rank-based method of forming homogeneous post-strata. Then, we extend the Classification and Regression Trees (CART) algorithm to this data integration scenario, while again properly accounting for the probability sampling design. A bootstrap variance estimator is proposed that reflects two sources of variability: the probability sampling design and the participation model. Our methods are illustrated using Statistics Canada’s crowdsourcing and survey data.
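
For concreteness, the pseudo maximum likelihood step can be sketched as follows. The pseudo-log-likelihood shown is the standard one of Chen, Li and Wu (2020); whether it matches the paper's exact estimation step is an assumption, and the modified AIC, post-stratification and CART extensions are not shown. Here A is the non-probability sample, B the probability sample with design weights d, and all names are illustrative.

    import numpy as np
    from scipy.optimize import minimize

    def fit_participation_model(X_A, X_B, d_B):
        """Logistic participation model fitted by pseudo maximum likelihood:
        l*(theta) = sum_A x'theta - sum_B d * log(1 + exp(x'theta))."""
        def neg_pll(theta):
            return -(X_A @ theta).sum() + (d_B * np.logaddexp(0.0, X_B @ theta)).sum()
        theta = minimize(neg_pll, np.zeros(X_A.shape[1]), method="BFGS").x
        pi_A = 1.0 / (1.0 + np.exp(-(X_A @ theta)))  # fitted participation probabilities
        return theta, pi_A

    # Inverse probability weighted (Hajek-type) estimate of the population mean:
    # theta, pi_A = fit_participation_model(X_A, X_B, d_B)
    # y_hat = np.sum(y_A / pi_A) / np.sum(1.0 / pi_A)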

Comments on “Handling non-probability samples through inverse probability weighting with an application to Statistics Canada’s crowdsourcing data”

by Julie Gershunskaya and Vladislav Beresovsky

Abstract

Beaumont, Bosa, Brennan, Charlebois and Chu (2024) propose innovative model selection approaches for the estimation of participation probabilities for non-probability sample units. We focus our discussion on the choice of likelihood and parameterization of the model, which are key to the effectiveness of the techniques developed in the paper. We consider alternative likelihood- and pseudo-likelihood-based methods for the estimation of participation probabilities and present simulations implementing and comparing AIC-based variable selection. We demonstrate that, under important practical scenarios, the approach based on a likelihood formulated over the observed pooled non-probability and probability samples performed better than the pseudo-likelihood-based alternatives. The contrast in sensitivity of the AIC criteria is especially large for small probability sample sizes and low overlap in covariate domains.

Comments on “Handling non-probability samples through inverse probability weighting with an application to Statistics Canada’s crowdsourcing data”

by Changbao Wu

Abstract

We provide comparisons among three parametric methods for the estimation of participation probabilities and some brief comments on homogeneous groups and post-stratification.

Authors’ response to comments on “Handling non-probability samples through inverse probability weighting with an application to Statistics Canada’s crowdsourcing data”:

Some new developments on likelihood approaches to estimation of participation probabilities for non-probability samples

by Jean-François Beaumont, Keven Bosa, Andrew Brennan, Joanne Charlebois and Kenneth Chu

Abstract

Inspired by the two excellent discussions of our paper, we offer some new insights into, and developments on, the problem of estimating participation probabilities for non-probability samples. First, we propose an improvement of the method of Chen, Li and Wu (2020), based on best linear unbiased estimation theory, that more efficiently leverages the available probability and non-probability sample data. We also develop a sample likelihood approach, similar in spirit to the method of Elliott (2009), that properly accounts for the overlap between the two samples when it can be identified in at least one of them. We use best linear unbiased prediction theory to handle the scenario where the overlap is unknown. Interestingly, our two proposed approaches coincide in the case of unknown overlap. We then show that many existing methods can be obtained as special cases of a general unbiased estimating function. Finally, we conclude with some comments on nonparametric estimation of participation probabilities.
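
For readers unfamiliar with the estimating-function view, one standard form such a function takes in this literature is, for a participation model pi(x; theta) and a user-chosen function h (whether this is the exact general form the authors intend is an assumption):

    U(\theta) = \sum_{i \in A} h(x_i;\theta) - \sum_{i \in B} d_i \, \pi(x_i;\theta) \, h(x_i;\theta)

where A is the non-probability sample and B the probability sample with design weights d_i. Solving U(theta) = 0 with h(x; theta) = x recovers the pseudo maximum likelihood score of Chen, Li and Wu (2020), while other choices of h yield calibration-type variants.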
