This response contains additional remarks on a few selected issues raised by the discussants.
Key Words: Data defect correlation; Double robustness; Inverse probability weighting; Model assumptions; Model-based prediction; Validation sample.
Let me start by thanking the Editor of Survey
Methodology, Jean-François Beaumont, for organizing the discussions and
putting together a glamorous array of discussants. Each discussant looked at the topic of non-probability survey samples, and more generally topics on data integration and combining data from multiple sources, from a unique perspective. I have enjoyed reading the discussions and I believe they are
significant contributions to dealing with non-probability and other types of
samples with selection bias. In what follows, I will make some additional
remarks on a few selected issues raised by the discussants.
Michael A. Bailey
Dr. Bailey focused on the limitations of the estimation
methods I presented under the assumptions A1-A4, and called for further development when these assumptions,
and the so-called “MAR assumption” A1
in particular, are violated. Bailey used non-probabilistic polling as an
example to argue that “non-response (can indeed) depend on the study variable” and that the danger of A1 being violated is real.
While the criticism of the limitations of the methods reviewed in my paper is fair and square, the statements “(Wu) is fishing in one fairly specific corner of the pond” and “shying away from MNAR models” seem to show a significant underappreciation of the importance of methodological development under the standard assumptions A1-A4, which have been used by several authors working on non-probability survey
samples. First of all, the assumption A1
is on the participation (or inclusion/selection) mechanism for non-probability
samples, which is not the same as “non-response”. There are many scenarios
where these assumptions can indeed be justified, especially for surveys using
web- or phone-panels where the initial participation in those panels depends
largely on certain demographic variables. Second, participation behaviour in
non-probability surveys can be confounded by certain study variables during data collection, in the same way that non-response confounds probability surveys,
which is exactly how the current literature on non-probability surveys has been
evolving in dealing with those issues. Third, any methodological advances in
addressing the so-called “MNAR models” for non-probability surveys would
require the foundation and thorough understanding established under the
assumptions A1-A4.
Bailey also stated that “while MAR violations are a
problem in probability sampling (arising due to non-response among randomly
contacted individuals), MAR violations are more serious in a non-probability
world”. I heartily concur. As a matter of fact, violations of the positivity
assumption A2 are as serious as violations of the “MAR assumption” A1, and the two are intercorrelated. Violations of A2 imply that the participation probability $\pi_i = P(R_i = 1 \mid \boldsymbol{x}_i) = 0$ for some units in the target population, leading to the undercoverage problem that is as notorious as non-response. When A2 is violated but A1 holds, it is often believed that model-based prediction estimators can mitigate the biases due to undercoverage. Under the assumption A1, the sample inclusion indicator variable $R$ and the study variable $y$ are conditionally independent given $\boldsymbol{x}$, which implies that
$$E(y \mid \boldsymbol{x}, R = 1) = E(y \mid \boldsymbol{x}). \qquad (1)$$
It follows that a valid prediction model can be built using the observed data (i.e., units with $R_i = 1$). Unfortunately, the equation (1) implicitly requires $P(R = 1 \mid \boldsymbol{x}) > 0$, and prediction-based estimators are not immune
to potential undercoverage biases. Bailey’s call for “a framework that
encompasses the possibility of MAR violations” is in line with some of the
current research effort on dealing with undercoverage and “non-ignorable”
participation mechanisms for non-probability survey samples. See, for instance,
Chen, Li and Wu (2023), Cho, Kim and Qiu (2022) and Yuan, Li and Wu (2022),
among others. In a nutshell, valid statistical inferences under those scenarios
require either external data such as a validation sample or additional
assumptions such as the existence of instrumental variables.
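To make the undercoverage concern concrete, the following is a minimal simulation sketch (my own illustration, with artificial models and numbers, not taken from any of the papers discussed): participation depends on $\boldsymbol{x}$ only, so A1 holds, but units with small $x$ can never participate, so A2 fails; a prediction model fitted to the participants and mass-imputed over the full population remains biased because of extrapolation into the uncovered region.

```python
# Minimal simulation sketch (illustrative assumptions only): A1 holds but A2 fails,
# and a mass imputation (prediction) estimator is not immune to undercoverage bias.
import numpy as np

rng = np.random.default_rng(2023)
N = 100_000
x = rng.uniform(0, 4, N)
y = np.exp(x / 2) + rng.normal(0, 0.5, N)        # nonlinear superpopulation model
true_mean = y.mean()

# Participation depends on x only (A1 holds), but units with x < 1 can never
# participate, so the positivity assumption A2 is violated for that subpopulation.
pi = np.where(x < 1, 0.0, 0.05 + 0.10 * (x - 1) / 3)
R = rng.binomial(1, pi)

# Fit a (misspecified) linear prediction model on the participants only, then
# mass-impute over all population x-values (standing in for a reference sample).
slope, intercept = np.polyfit(x[R == 1], y[R == 1], 1)
mass_imputation_mean = (intercept + slope * x).mean()

print(f"true population mean:     {true_mean:.3f}")
print(f"mass imputation estimate: {mass_imputation_mean:.3f}")
# The gap is driven by extrapolation into the uncovered region x < 1.
```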
I am on the exact same page of discontent as Bailey with
the “missing at random” label, since the term might be confused with “randomly
missing” (Wu and Thompson, 2020, page 195). The term “ignorable” is also
an unfortunate choice of terminology in the missing data and causal inference
literature, since it certainly cannot be ignored by the data analyst (Rivers,
2007). I use the standard term “propensity scores” for non-probability samples,
while several other authors are in favour of “participation probabilities”,
including Beaumont (2020) and Rao (2021).
Michael R. Elliott
Dr. Elliott discussed several issues with augmented
materials and an expanded list of references. They are important additions to
the current topic, especially the reviews on “additional approaches to
combining data from probability and non-probability surveys” and sensitivity
analysis on “unverifiable assumptions”.
Elliott’s discussions on distinctions between
descriptive parameters and analytic parameters and weighting versus modelling
raised the critical issue of efficiency of the IPW estimators in practice. It
has been known for probability survey samples that the inverse probability weighted Horvitz-Thompson estimator $\hat{T}_y = \sum_{i \in S} y_i / \pi_i$ of the population total $T_y$ is extremely inefficient (in terms of large variance) when the sample selection probabilities $\pi_i$ are unequal but only weakly correlated with the study variable $y$, although the estimator remains unbiased under
such scenarios. Basu’s elephant example (Basu, 1971) described a “convincing
case” where the inverse probability weighted and unbiased Horvitz-Thompson
estimator failed miserably, leading to the dismissal of the circus
statistician. Discussions on weighting versus modelling, i.e., the IPW
estimators versus model-based prediction estimators for descriptive population
parameters, are highly relevant for both theoretical developments and practical
applications. Our job as statisticians in dealing with non-probability survey samples could be very much in limbo unless we develop solid guidelines and diagnostic tools for choosing suitable approaches for the given dataset and inferential problem.
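As a small numerical companion to this point (a sketch under assumptions of my own choosing, not an analysis from the discussion), the following compares Horvitz-Thompson estimates under Poisson sampling with unequal inclusion probabilities that are unrelated to $y$ against an equal-probability design of the same expected size: both are design-unbiased, but the former has a dramatically larger variance.

```python
# Sketch (illustrative assumptions only): the Horvitz-Thompson estimator stays
# unbiased with unequal inclusion probabilities, but its variance is huge when
# those probabilities are essentially unrelated to the study variable y.
import numpy as np

rng = np.random.default_rng(1971)
N, n, reps = 20_000, 500, 2000
y = rng.normal(50, 10, N)                        # study variable
z = rng.lognormal(0, 1, N)                       # "size" measure unrelated to y
pi_uneq = np.clip(n * z / z.sum(), None, 1.0)    # unequal inclusion probabilities
pi_eq = np.full(N, n / N)                        # equal-probability benchmark

def ht_totals(pi):
    est = []
    for _ in range(reps):                        # repeated Poisson sampling
        s = rng.uniform(size=N) < pi
        est.append(np.sum(y[s] / pi[s]))
    return np.array(est)

true_total = y.sum()
for label, pi in [("unequal pi (unrelated to y)", pi_uneq), ("equal pi", pi_eq)]:
    est = ht_totals(pi)
    print(f"{label}: rel. bias {est.mean() / true_total - 1:+.3%}, "
          f"cv {est.std() / true_total:.2%}")
```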
Elliott echoed my call for a few large scale probability
surveys with rich information on auxiliary variables with the statement “it is
increasingly critical for an organized and ideally government funded stable of high-quality
probability surveys to be put into place for routine data collection”. His
comments on new areas of research on issues with privacy and confidentiality, arising from the need for microdata in the context of analyzing non-probability survey samples, are a visionary call and deserve increased attention from the research community.
Zhonglei Wang and Jae Kwang Kim
Dr. Wang and Dr. Kim presented two new approaches to propensity score based estimation: one uses the so-called information projection through a density ratio model, and the other employs calibration uniformly over functions in a reproducing kernel Hilbert space. These are new adventures in the field, and
Kim and his collaborators have the experience and the analytic power to move
the research forward.
The starting point for both approaches is the following equation which connects the propensity scores to the density ratios,
$$\pi(\boldsymbol{x}) = P(R = 1 \mid \boldsymbol{x}) = P(R = 1)\,\frac{f(\boldsymbol{x} \mid R = 1)}{f(\boldsymbol{x})},$$
where $f(\boldsymbol{x})$ and $f(\boldsymbol{x} \mid R = 1)$ are the densities of $\boldsymbol{x}$ over the target population and over the participants of the nonprobability sample. The propensity scores only require the model on $R$ given $\boldsymbol{x}$ and $y$. Justification of the equation given above, however, requires a joint randomization framework involving both the model for the propensity scores and the superpopulation model on $\boldsymbol{x}$. From a consistency viewpoint regarding the final estimator of the finite population mean of $y$, the joint framework imposes very few restrictions if the density ratios are modelled nonparametrically. The
consequential impact of the approach is on variance and variance estimation.
Variance of an estimator under a joint randomization framework involves more
than one component, and variance estimation has further complications if
nonparametric procedures are involved. Efficiency comparisons between the
proposed methods and some of the existing methods need to be carried out under
suitable settings. I am eager to see further developments on the proposed
methods.
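The density ratio connection can be illustrated with a simple sketch (my own construction, not the estimators proposed by the discussants): a classifier fitted to the pooled participants and a reference sample of population $\boldsymbol{x}$-values estimates the density ratio $f(\boldsymbol{x} \mid R = 1)/f(\boldsymbol{x})$, which converts directly into estimated propensity scores.

```python
# Sketch (my own illustration, not the discussants' method): recover propensity
# scores from an estimated density ratio, using pi(x) = P(R=1) f(x | R=1) / f(x).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
N = 50_000
x = rng.normal(0, 1, (N, 1))
pi_true = 1 / (1 + np.exp(-(-2.5 + 1.0 * x[:, 0])))   # true participation model
R = rng.binomial(1, pi_true)

# Pool participants (label 1) with a reference sample of population x-values
# (label 0); the fitted odds estimate f(x|R=1)/f(x) up to the sample sizes.
x_ref = x[rng.choice(N, 10_000, replace=False)]
X = np.vstack([x[R == 1], x_ref])
labels = np.r_[np.ones((R == 1).sum()), np.zeros(len(x_ref))]
clf = LogisticRegression().fit(X, labels)

density_ratio = np.exp(clf.decision_function(x)) * len(x_ref) / (R == 1).sum()
pi_hat = np.clip(R.mean() * density_ratio, 1e-6, 1.0)  # pi(x) = P(R=1) * ratio

print("corr(pi_hat, pi_true):", round(float(np.corrcoef(pi_hat, pi_true)[0, 1]), 3))
```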
Sharon L. Lohr
Dr. Lohr’s extended discussions on diagnostic tools for
assessing model assumptions are highly valuable to the topic. Her explorations
of existing ideas and methods and the adaptations to the current setting
highlight the seemingly different but deeply connected issues faced by both
nonprobability and probability survey samples. One such issue is the
undercoverage problem (i.e., violations of assumption A2) and the interweaving of assumptions A1 and A2. Lohr was rightfully concerned with prediction-based estimators where the prediction model of $y$ given $\boldsymbol{x}$ is built based on the nonprobability sample $S_A$ and the mass imputation estimator is computed using the $\boldsymbol{x}_i$ observed in the reference probability sample $S_B$, a scenario where each of the two assumptions A1 and A2 does not stand alone. The undercoverage problem is an example
where “space-age procedures will not rescue stone-age data”. Lohr advocated that we “take a small probability sample to investigate assumptions”, which is a theoretical necessity since rigorously defensible methods under certain scenarios require validation samples. Developing compromise strategies that make use of existing data sources, however, is more appealing but also more challenging in practice.
Lohr’s observation “nonprobability samples have the
potential to improve data equity” is an important one, since inclusion of units
from groups which may be invisible in probability samples can be boosted
relatively easily for nonprobability samples. Lohr also observed that
“historically disadvantaged groups may be underrepresented in all data sources,
including (nonprobability samples)”. Addressing the issue of data equity with
nonprobability survey samples presents both opportunities and challenges.
Lohr’s question “when should one use nonprobability
samples” is a tough one. The same question can be asked for any other
statistical methods. In many other scenarios, we do not always question the validity of the methods and the usefulness of the results, owing to an unchecked confidence that the required assumptions are reasonable. For
nonprobability samples, we have a more vulnerable situation regarding
assumptions, and assessments and diagnostics of these assumptions are more
difficult than cases with controlled experiments and/or more structured data.
From this viewpoint, Lohr’s extended discussion on assessing assumptions should be read with deep appreciation. In practice, an important confidence booster for the assumptions is a thorough investigation at the “design stage”, if such a stage can be conceived prior to data collection, of variables which might be related to participation behaviour, the inclusion of these variables as part of the collected sample data, and further exploration of existing data sources containing these variables.
Xiao-Li Meng
Dr. Meng’s discussion, with the formal title
“Miniaturizing data defect correlation: A versatile strategy for handling non-probability
samples”, should be a standalone discussion paper itself. Meng weaved through a
number of issues in estimating a finite population mean with a nonprobability
sample, and explored strategies and directions for constructing an
approximately unbiased estimator using the central concept of the so-called data
defect correlation (ddc). The discussions are fascinating and thought-provoking,
and will surely generate more discussions and research endeavours on
implications of the ddc. I would like to use this opportunity to comment
briefly on the ddc in relation to three basic concepts in probability
sampling: sampling strategy, undercoverage, and model-assisted
estimation. This is not nostalgia for the good old days when probability sampling was the gold standard, but rather an appreciation of how research in
survey sampling has been evolving and the potential usefulness of the ddc
in dealing with nonprobability survey samples.
The term sampling strategy refers to the pair of
sampling design and estimation method (Thompson, 1997, Section 2.4; Rao,
2005, Section 3.1). The two components go hand in hand and are the
backbone of conventional probability survey sampling theory. For the estimation of the population total $T_y = \sum_{i=1}^{N} y_i$ of the study variable $y$ using a probability sample $S$ with first order inclusion probabilities $\pi_i$, the Horvitz-Thompson estimator $\hat{T}_{\mathrm{HT}} = \sum_{i \in S} w_i y_i$ with the weight $w_i = 1/\pi_i$ is the unique unbiased estimator among a class of linear estimators (Wu and Thompson, 2020). The theoretical argument for the result is straightforward due to the known inclusion probabilities $\pi_i$ under the given sampling design. Using the notation of Meng, the ddc involves three variables: the study variable $Y$, the weight variable $W$ and the sample inclusion indicator $R$, and is defined as the finite population correlation coefficient between $RW$ and $Y$. The ddc implicitly puts $R$ and $W$ as an inseparable pair for any inference strategy, with $R$ corresponding to the unknown “design” and $W$ for the “estimation method”. With the unknown “design” characterized by the unknown “divine probabilities” $\pi_i$ for the nonprobability sample, Meng showed through his equation (3.3) that $w_i \propto 1/\pi_i$ is essentially a required condition for unbiased estimation of the population mean $\bar{Y}_N$ if nothing is assumed on the outcome regression model. The result provides a justification of the use of the inverse probability weighted (IPW) estimator for nonprobability samples as the only sensible choice if a superpopulation model on the study variable is not involved.
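A quick numerical check (with artificial assumptions of my own, not Meng's example) makes the point tangible: with weights proportional to the inverse of the true participation probabilities, the finite population correlation between $RW$ and $Y$ is essentially zero and the weighted mean is nearly unbiased, whereas weights that ignore the “divine probabilities” leave a sizeable ddc and a sizeable bias.

```python
# Sketch (illustrative assumptions only): the ddc corr_N(RW, Y) is near zero when
# w_i is proportional to 1/pi_i, and clearly nonzero when the design is ignored.
import numpy as np

rng = np.random.default_rng(42)
N = 200_000
x = rng.normal(0, 1, N)
y = 2 + x + rng.normal(0, 1, N)
pi = 1 / (1 + np.exp(-(-3 + 1.2 * x)))            # "divine" participation probabilities
R = rng.binomial(1, pi)

def weighted_mean_and_ddc(w):
    rw = R * w
    est = np.sum(rw * y) / np.sum(rw)             # weighted (Hajek-type) estimate
    ddc = np.corrcoef(rw, y)[0, 1]                # finite population corr(RW, Y)
    return est, ddc

est_ipw, ddc_ipw = weighted_mean_and_ddc(1 / pi)         # w_i proportional to 1/pi_i
est_unw, ddc_unw = weighted_mean_and_ddc(np.ones(N))     # weights ignore the design

print(f"true mean:  {y.mean():.3f}")
print(f"IPW:        estimate {est_ipw:.3f}, ddc {ddc_ipw:+.4f}")
print(f"unweighted: estimate {est_unw:.3f}, ddc {ddc_unw:+.4f}")
```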
The problem of undercoverage has been discussed
extensively in the existing literature on probability sampling. For
nonprobability samples the issue is closely related to the violation of the
positivity assumption A2 as
discussed in Section 7.2 of my paper and my comments to the discussions of
Bailey, Elliott and Lohr, with additional details given in Chen et al.
Let $U = U_1 \cup U_0$, where $U_0$ is the uncovered subpopulation with $\pi_i = P(R_i = 1 \mid \boldsymbol{x}_i) = 0$ for $i \in U_0$. Let $N = N_1 + N_0$, where $N_1$ and $N_0$ are respectively the sizes of the two subpopulations $U_1$ and $U_0$. Let $\mathrm{Cov}_U(\cdot,\cdot)$ and $\mathrm{Cov}_{U_1}(\cdot,\cdot)$ denote respectively the covariance with respect to the discrete uniform distribution over $U$ and $U_1$. It can be shown that
$$\hat{\bar{Y}}_w - \bar{Y}_N = \frac{\mathrm{Cov}_{U_1}(RW, Y)}{N_1^{-1}\sum_{i \in U_1} R_i w_i} + \frac{N_0}{N}\left(\bar{Y}_{U_1} - \bar{Y}_{U_0}\right), \qquad (2)$$
where $\hat{\bar{Y}}_w = \sum_{i \in S_A} w_i y_i / \sum_{i \in S_A} w_i$, $R_i = 1$ for $i \in S_A$ and $R_i = 0$ otherwise, $S_A$ is the set of units for the nonprobability sample, and $\bar{Y}_{U_1}$ and $\bar{Y}_{U_0}$ are respectively the population means of $U_1$ and $U_0$ for the study variable $y$. Equation (2) has two immediate implications. First, if the estimation method is valid in the sense that the value of $\mathrm{Cov}_{U_1}(RW, Y)$ is small, then the bias of the estimator due to undercoverage depends on $N_0/N$ (i.e., the relative size of the uncovered subpopulation $U_0$) and $\bar{Y}_{U_1} - \bar{Y}_{U_0}$ (i.e., the difference between $\bar{Y}_{U_1}$ and $\bar{Y}_{U_0}$), a statement which has previously been established under probability sampling. Second, the equation reveals a scenario for potential counterbalancing: A biased estimator for the “sampled population mean” $\bar{Y}_{U_1}$ can be less biased for the target population mean $\bar{Y}_N$ if $\mathrm{Cov}_{U_1}(RW, Y)$ and $\bar{Y}_{U_0} - \bar{Y}_{U_1}$ have the same plus or minus sign.
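The decomposition in equation (2) can be verified numerically; the following sketch (with an artificial population of my own making) confirms that the estimation error splits exactly into the $\mathrm{Cov}_{U_1}(RW, Y)$ term over the covered subpopulation and the undercoverage term $(N_0/N)(\bar{Y}_{U_1} - \bar{Y}_{U_0})$.

```python
# Numerical check of equation (2) on an artificial population (my own sketch).
import numpy as np

rng = np.random.default_rng(11)
N = 100_000
x = rng.normal(0, 1, N)
y = 1 + 0.8 * x + rng.normal(0, 1, N)

covered = x > -1.0                                    # U1; U0 = {x <= -1} has pi = 0
pi = np.where(covered, 1 / (1 + np.exp(-(-2.5 + x))), 0.0)
R = rng.binomial(1, pi)
w = np.where(covered, 1 / np.clip(pi, 1e-12, None), 0.0)    # weights on covered units

est = np.sum(R * w * y) / np.sum(R * w)               # weighted (Hajek-type) estimator

U1, U0 = covered, ~covered
rw1 = (R * w)[U1]
term1 = np.mean(rw1 * (y[U1] - y[U1].mean())) / rw1.mean()  # Cov_U1(RW,Y) / mean_U1(RW)
term2 = (U0.sum() / N) * (y[U1].mean() - y[U0].mean())      # (N0/N)(Ybar_U1 - Ybar_U0)

print(f"estimation error:  {est - y.mean():+.5f}")
print(f"term1 + term2:     {term1 + term2:+.5f}")     # matches the error, as in (2)
```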
Meng’s discussions on quasi-randomization and/or
super-population using the ddc provided a much deeper understanding of doubly robust estimation. Historically, model-assisted estimation started to emerge in survey sampling in the early 1970s, and the approach has the same spirit as double robustness. The generalized difference estimator of
the population mean $\bar{Y}_N$ as discussed in Cassel, Särndal and Wretman (1976) is given by
$$\hat{\bar{Y}}_{\mathrm{GD}} = \frac{1}{N}\left\{\sum_{i \in S}\frac{y_i - a_i}{\pi_i} + \sum_{i=1}^{N} a_i\right\},$$
where $S$ is a probability sample, the $\pi_i$ are the first order inclusion probabilities, and $(a_1, \ldots, a_N)$ is an arbitrary sequence of known numbers. The estimator is exactly unbiased for $\bar{Y}_N$ under the probability sampling design $p$ for any sequence $(a_1, \ldots, a_N)$, and is also model-unbiased if we choose $a_i = E_{\xi}(y_i \mid \boldsymbol{x}_i) = \mu(\boldsymbol{x}_i)$, where $\xi$ denotes the superpopulation model. Cassel et al. (1976) showed as a main theoretical result that the choice $a_i = \mu(\boldsymbol{x}_i)$ is optimal, leading to minimum model-based expectation of the design-based variance when the model has a certain structure in variance. The first part of the results on unbiasedness is under $p$ or $\xi$; the second part on optimality is under $p$ and $\xi$. Note that the estimator with the choice $a_i = \mu(\boldsymbol{x}_i)$ has exactly the same structure as the doubly robust estimator discussed extensively in the missing data and causal inference literature since the 1990s, with the “divine probabilities” $\pi_i$ being unknown and estimated in the latter cases.
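The following brief sketch (my own illustration under simple assumptions) computes the generalized difference estimator for a Poisson sample: with $a_i = 0$ it reduces to the Horvitz-Thompson estimator, while with $a_i = \mu(\boldsymbol{x}_i)$ it retains design-unbiasedness and gains a large reduction in variance, mirroring the structure of the doubly robust estimator.

```python
# Sketch (illustrative assumptions only): the generalized difference estimator
# of the population mean, design-unbiased for any fixed a_i and most precise
# when a_i equals a good model prediction mu(x_i).
import numpy as np

rng = np.random.default_rng(1976)
N, n = 50_000, 1_000
x = rng.uniform(0, 2, N)
y = 1 + 2 * x + rng.normal(0, 0.5, N)
mu = 1 + 2 * x                              # a_i = mu(x_i), with x known for all of U

pi = np.full(N, n / N)                      # equal-probability Poisson design
s = rng.uniform(size=N) < pi

def gd_mean(a):
    # (1/N) { sum_{i in S} (y_i - a_i)/pi_i + sum_{i in U} a_i }
    return (np.sum((y[s] - a[s]) / pi[s]) + a.sum()) / N

print(f"true mean:                  {y.mean():.4f}")
print(f"GD with a_i = 0 (plain HT): {gd_mean(np.zeros(N)):.4f}")
print(f"GD with a_i = mu(x_i):      {gd_mean(mu):.4f}")
```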
The use of ddc in practice requires additional
information from the population. Meng’s proposal of creating a representative
miniature out of a biased sample echoes the call for a validation sample with a
small size, since such a sample “can (also) eliminate many practitioners’
anxiety and potential mistakes for not knowing how to properly use the
weights”.
“There is no such thing as a probability sample in real life” is probably a defensible statement for human populations. Probability samples, however, do exist in other fields such as business and establishment surveys, agricultural surveys, and natural resource inventory surveys; see Wu and Thompson (2020) for further detail. For humans, any rigorous rules and precise procedures are almost surely an aspiration, not a prescription.
References
Basu, D. (1971). An essay
on the logical foundations of survey sampling. Part One. In Foundations of
Statistical Inference, (Eds., V.P. Godambe and D.A. Sprott),
Toronto, 203-242.
Cassel, C.M., Särndal, C.-E.
and Wretman, J.H. (1976). Some results on generalized difference estimation and
generalized regression estimation for finite populations. Biometrika, 63, 615-620.
Chen, Y., Li, P. and Wu,
C. (2023). Dealing with undercoverage for non-probability survey samples. Survey
Methodology, under review.
Cho, S., Kim, J.K. and Qiu, Y. (2022). Multiple bias calibration for valid statistical inference with selection bias. Working paper.
Rao, J.N.K. (2021). On
making valid inferences by integrating data from surveys and other sources. Sankhyā B, 83, 242-272.
Rivers, D. (2007).
Sampling for web surveys. In Proceedings of the Survey Research Methods
Section, Joint Statistical Meetings, American Statistical Association,
Alexandria, VA, 1-26.
Thompson, M.E. (1997). Theory
of Sample Surveys. Chapman & Hall, London.
Wu, C., and Thompson,
M.E. (2020). Sampling Theory and Practice. Springer, Cham.
Yuan, M., Li, P. and Wu, C. (2022). Inference with non-ignorable sample inclusion for non-probability survey samples. Working paper.