Comments on “Statistical inference with non-probability survey samples”
Section 4. Unverifiable assumptions: Recent developments in sensitivity analysis
Wu provides four key assumptions required to correct for selection bias in
non-probability surveys using data from probability surveys: they can be
roughly summarized as “selection at random” or SAR (covariates in the
non-probability sample explain the probability of selection in the
non-probability sample); “positivity” (all elements in the population have a
non-zero probability of selection into the non-probability sample);
“independence” (elements are selected independently into the non-probability
sample); and “common covariates” (there exists a probability survey with
covariates, a subset of which matches the covariates required for the SAR
assumption to hold). It might be worth noting that the first two assumptions
basically require the non-probability survey to be a probability survey “in
disguise”; that is, there really are non-zero probabilities of selection into
the non-probability survey for all elements in the population, but we as
analysts just do not know what they are.
In practice, neither of these assumptions is likely to hold exactly. Some
recent work has focused on the failure of the first, the SAR assumption. Some
existing measures borrowed from the non-response literature have been
repurposed here: for example, the R-indicator (Schouten, Cobben and Bethlehem,
2009), which in this context measures the variability in the probabilities of
selection into the non-probability sample. The R-indicator ranges between 0
and 1: a value of 1 is achieved when the probabilities of selection are
constant, suggesting something akin to a simple random sample with less chance
of selection bias, while a value of 0 suggests that all elements are included
with probability either 1 or 0, maximizing the risk of selection bias.
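To make the range of the R-indicator concrete, the following minimal sketch assumes the usual definition from the non-response literature, R = 1 - 2S, where S is the standard deviation of the selection probabilities. The function name `r_indicator` and the use of a divisor of N rather than N - 1 are illustrative assumptions, and in a real non-probability sample the probabilities would themselves have to be estimated.

```python
import math


def r_indicator(probs):
    """Return R = 1 - 2 * S for a list of selection probabilities.

    S is the standard deviation of the probabilities, computed here with
    a divisor of N (an assumption made for simplicity).
    """
    n = len(probs)
    if n == 0:
        raise ValueError("need at least one selection probability")
    if any(p < 0.0 or p > 1.0 for p in probs):
        raise ValueError("selection probabilities must lie in [0, 1]")
    mean = sum(probs) / n
    variance = sum((p - mean) ** 2 for p in probs) / n
    return 1.0 - 2.0 * math.sqrt(variance)


# Constant probabilities: no variability, so R is at its maximum of 1.
constant = r_indicator([0.3] * 10)

# Half the elements certain to be included, half certain to be excluded:
# S reaches its maximum of 0.5, so R is at its minimum of 0.
all_or_nothing = r_indicator([0.0, 1.0] * 5)
```

Under these assumptions `constant` is 1 and `all_or_nothing` is 0, matching the two extremes described above.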
Of course, in the absence of the outcome in the probability sample, there is
no way to directly assess selection bias. Hence recent work has extended
Andridge and Little (2011), which develops a sensitivity analysis using a
pattern-mixture model wherein selection into the non-probability sample is
allowed to depend entirely on a scalar reduction of the covariates, entirely
on the outcome, or on some convex combination thereof. Little, West, Boonstra
and Hu (2020), Andridge, West, Little, Boonstra and Alvarado-Leiton (2019),
and West, Little, Andridge, Boonstra, Ware, Pandit and Alvarado-Leiton (2021)
consider sensitivity to this assumption in the estimation of the mean of a
normally distributed variable, the mean of a binary outcome, and the
regression parameters in a linear regression model, respectively, in
non-probability samples. By varying the convex mixing parameter, sensitivity
to the SAR assumption can be assessed. Boonstra, Little, West, Andridge and
Alvarado-Leiton (2021) find that these “standard measures of bias” (SMB)
compare favorably with alternative measures in a simulation study. An
important point to note is that the methods that extend Andridge and Little
(2011) do not depend on the assumption of common covariates in a probability
sample. This suggests that methods that use information available in the
probability sample to assess SAR are an open area for development.
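The idea of tracing out a range of bias values by varying the mixing parameter can be sketched as follows. The functional form below is an assumption supplied for illustration and is not stated in this discussion: it is the standardized bias index as I understand it from the cited pattern-mixture work, with `phi` the mixing parameter, `rho_xy` the correlation between the covariate proxy and the outcome among selected cases, and the remaining arguments the proxy's mean in the selected sample, its population mean, and its standard deviation in the selected sample. All names are hypothetical, and the cited papers should be consulted for the exact estimators.

```python
def standardized_bias(phi, rho_xy, xbar_selected, xbar_population, sd_x_selected):
    """Illustrative standardized bias of the outcome mean at mixing value phi.

    phi = 0: selection depends only on the covariate proxy.
    phi = 1: selection depends only on the outcome.
    Assumed form: g(phi) * (xbar_selected - xbar_population) / sd_x_selected,
    with g(phi) = (phi + (1 - phi) * rho) / (phi * rho + (1 - phi)).
    """
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie in [0, 1]")
    if sd_x_selected <= 0.0:
        raise ValueError("sd_x_selected must be positive")
    denominator = phi * rho_xy + (1.0 - phi)
    if denominator == 0.0:
        raise ValueError("index is undefined when phi = 1 and rho_xy = 0")
    adjustment = (phi + (1.0 - phi) * rho_xy) / denominator
    return adjustment * (xbar_selected - xbar_population) / sd_x_selected


# Sensitivity analysis: evaluate the index over a grid of mixing values.
# The inputs are made-up numbers, not results from any study.
grid = [0.0, 0.5, 1.0]
sensitivity = {
    phi: standardized_bias(phi, rho_xy=0.5, xbar_selected=1.2,
                           xbar_population=1.0, sd_x_selected=2.0)
    for phi in grid
}
```

With these made-up inputs the index grows from about 0.05 at `phi = 0` to about 0.2 at `phi = 1`; the spread across the grid is what a sensitivity analysis of this kind would report.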
The second assumption, positivity, is also unlikely to hold exactly in many
practical settings. My own work in this area has focused on naturalistic
driving studies, which typically involve convenience samples in a limited
geographical area: for example, the Second Strategic Highways Research Program
(SHRP2) recruited drivers in six specific geographic regions across the United
States (Transportation Research Board (TRB) of the National Academy of
Sciences, 2013). This corresponds to the second scenario given by Wu in
Section 7.2, where only a subpopulation has any chance of being selected into
the non-probability sample, which as he notes has “no simple fix”. Following
his notation, which provides an indicator of membership in the subpopulation,
it would seem that if the outcome is conditionally independent of this
indicator given the covariates, that is, if the distribution of the outcome is
the same for members and non-members of the subpopulation after weighting for
the covariates within the stratum, then lack of positivity would have no
impact on inference. This is likely a tall order in the most general settings
but might be reasonably well approximated if the analysis of interest involves
a subset of outcomes that is only weakly associated with subpopulation
membership even before adjustment.
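A small deterministic example may help illustrate the argument about positivity. The sketch below is my own construction with made-up numbers: a population is cross-classified by a covariate cell (`"a"` or `"b"`) and by membership in the subpopulation that can be sampled (1) or cannot (0). Weighting the subpopulation's cell means to the full population's covariate distribution recovers the population mean only when the outcome means do not depend on membership within a cell.

```python
def population_mean(counts, means):
    """Mean of the outcome over the whole population."""
    total = sum(counts.values())
    return sum(counts[cell] * means[cell] for cell in counts) / total


def weighted_subpopulation_mean(counts, means):
    """Weight the subpopulation's (member = 1) cell means to the
    population's covariate distribution."""
    total = sum(counts.values())
    covariate_levels = {x for (x, _member) in counts}
    estimate = 0.0
    for x in covariate_levels:
        share = sum(n for (xx, _m), n in counts.items() if xx == x) / total
        estimate += share * means[(x, 1)]
    return estimate


# Hypothetical population counts by (covariate cell, membership indicator).
counts = {("a", 1): 300, ("a", 0): 100, ("b", 1): 100, ("b", 0): 500}

# Case 1: within each covariate cell the outcome mean is the same for
# members and non-members of the subpopulation.
means_independent = {("a", 1): 10.0, ("a", 0): 10.0,
                     ("b", 1): 20.0, ("b", 0): 20.0}

# Case 2: non-members in cell "b" differ, so the condition fails.
means_dependent = {("a", 1): 10.0, ("a", 0): 10.0,
                   ("b", 1): 20.0, ("b", 0): 30.0}

truth_1 = population_mean(counts, means_independent)
estimate_1 = weighted_subpopulation_mean(counts, means_independent)
truth_2 = population_mean(counts, means_dependent)
estimate_2 = weighted_subpopulation_mean(counts, means_dependent)
```

In the first case the weighted estimate equals the population mean (both 16), whereas in the second the estimate stays at 16 while the population mean is 21, so the lack of positivity translates directly into bias.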
Finally, regarding the fourth assumption, the existence of a probability
sample with the common covariates available, I very much second Wu’s
observation that methods to take advantage of multiple probability surveys
need more development. However, it remains more likely that a researcher will
struggle to find a single probability sample with sufficient covariates than
struggle with a surfeit of options (Wu’s “rich person’s problem”). To this
end, I will conclude with a call to action by the survey community.
ISSN : 1492-0921
Editorial policy
Survey Methodology publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves. All papers will be refereed. However, the authors retain full responsibility for the contents of their papers and opinions expressed are not necessarily those of the Editorial Board or of Statistics Canada.
Copyright
Published by authority of the Minister responsible for Statistics Canada.
© His Majesty the King in Right of Canada as represented by the Minister of Industry, 2022
Use of this publication is governed by the Statistics Canada Open Licence Agreement.
Catalogue No. 12-001-X
Frequency: Semi-annual
Ottawa