Statistical inference with non-probability survey samples
Section 2. Assumptions and inferential frameworks
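Suppose that the target population $\mathcal{U} = \{1, 2, \ldots, N\}$ consists of $N$ labelled units. Associated with unit $i$ are values $\mathbf{x}_i$ and $y_i$ for the auxiliary variables $\mathbf{x}$ and the study variable $y$. The discussion focuses on a single $y$, although the dataset most likely contains multiple study variables. Let $\mu_y = N^{-1} \sum_{i=1}^{N} y_i$ be the population mean, which is the parameter of interest. Let $\{(y_i, \mathbf{x}_i), i \in \mathcal{S}_A\}$ be the dataset for the non-probability survey sample $\mathcal{S}_A$ with $n_A$ participating units. For most practical scenarios, the simple sample mean $\bar{y}_A = n_A^{-1} \sum_{i \in \mathcal{S}_A} y_i$ is a biased estimator of $\mu_y$ and hence is invalid.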
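The bias of the simple sample mean can be illustrated with a small simulation. The sketch below is not taken from the paper: the population model, the logistic participation mechanism and all parameter values are hypothetical choices made only to show how participation that depends on a covariate related to the study variable pulls the naive mean away from the population mean.

```python
import math
import random


def simulate_naive_mean(N=20000, seed=1):
    """Draw a non-probability sample whose inclusion depends on x and
    compare the naive sample mean with the true population mean."""
    rng = random.Random(seed)

    # Hypothetical finite population: y is positively related to x.
    x = [rng.gauss(0.0, 1.0) for _ in range(N)]
    y = [2.0 + xi + rng.gauss(0.0, 1.0) for xi in x]
    mu_y = sum(y) / N

    # Hypothetical participation mechanism: logistic in x, so units with
    # larger x (and hence larger y on average) are more likely to take part.
    sample_y = []
    for xi, yi in zip(x, y):
        pi_i = 1.0 / (1.0 + math.exp(-(-2.0 + xi)))
        if rng.random() < pi_i:
            sample_y.append(yi)

    ybar_A = sum(sample_y) / len(sample_y)
    return mu_y, ybar_A, len(sample_y)


mu_y, ybar_A, n_A = simulate_naive_mean()
print(f"population mean = {mu_y:.3f}, naive mean = {ybar_A:.3f}, n_A = {n_A}")
```

Under these assumed settings the naive mean overshoots the population mean, because participants over-represent units with large values of the covariate.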
2.1 Assumptions
Let $R_i = I(i \in \mathcal{S}_A)$ be the indicator variable for unit $i$ being included in the non-probability sample $\mathcal{S}_A$. Note that the variable $R_i$ is defined for all $i$ in the target population. Let $\pi_i^A = P(i \in \mathcal{S}_A \mid \mathbf{x}_i, y_i) = P(R_i = 1 \mid \mathbf{x}_i, y_i)$, $i = 1, 2, \ldots, N$. We call the $\pi_i^A$ the propensity scores, a term borrowed from the missing data literature (Rosenbaum and Rubin, 1983). Some authors use the term participation probabilities; see, for instance, Beaumont (2020) and Rao (2021), among others. The propensity scores characterize the sample inclusion and participation mechanisms. They are unknown and require suitable model assumptions for the development of valid estimation methods. The following three basic assumptions, adapted from the missing data literature, were used by Chen, Li, and Wu (2020).
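A1 The sample inclusion and participation indicator $R_i$ and the study variable $y_i$ are independent given the set of covariates $\mathbf{x}_i$, i.e., $(R_i \perp y_i) \mid \mathbf{x}_i$.
A2 All the units in the target population have non-zero propensity scores, i.e., $\pi_i^A > 0$, $i = 1, 2, \ldots, N$.
A3 The indicator variables $R_1, R_2, \ldots, R_N$ are independent given the set of auxiliary variables $(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N)$.
Assumption A1 is similar to the missing at random (MAR) assumption for missing data analysis. Under A1, we have $\pi_i^A = P(R_i = 1 \mid \mathbf{x}_i, y_i) = P(R_i = 1 \mid \mathbf{x}_i)$, $i = 1, 2, \ldots, N$.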
Assumption A2 can be problematic in practice; see
Section 7 for further discussion. Assumption A3 typically holds when participants are approached one at a time
but can be questionable when clustered selections are used. It is shown in Section 4 that estimation of the propensity scores $\pi_i^A$ under assumption A1 requires auxiliary information from the target population. The ideal scenario is that the complete auxiliary information $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N\}$ is available.
The more practical scenario is that auxiliary information can be obtained from
an existing probability survey.
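One consequence of assumption A1 may help fix ideas. The short derivation below uses notation that is assumed here and not introduced in the text: $f(\cdot \mid \cdot)$ denotes a conditional density or probability function, and $R_i$ denotes the indicator that unit $i$ participates in the non-probability sample. It applies Bayes' rule, then the conditional independence in A1, and relies on A2 for a non-zero denominator.

```latex
\begin{align*}
f(y_i \mid \mathbf{x}_i, R_i = 1)
  &= \frac{P(R_i = 1 \mid \mathbf{x}_i, y_i)\, f(y_i \mid \mathbf{x}_i)}
          {P(R_i = 1 \mid \mathbf{x}_i)} \\
  &= \frac{P(R_i = 1 \mid \mathbf{x}_i)\, f(y_i \mid \mathbf{x}_i)}
          {P(R_i = 1 \mid \mathbf{x}_i)}
   = f(y_i \mid \mathbf{x}_i),
\end{align*}
% and therefore
\[
E(y_i \mid \mathbf{x}_i, R_i = 1) = E(y_i \mid \mathbf{x}_i).
\]
```

In words, under A1 the relationship between the study variable and the covariates among participants is the same as in the target population, which is what allows a model fitted to the non-probability sample to be carried over to the population.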
A4 There exists a probability survey sample $\mathcal{S}_B$ of size $n_B$ with information on the auxiliary variables $\mathbf{x}$ (but not on $y$) available in the dataset $\{(\mathbf{x}_i, d_i^B), i \in \mathcal{S}_B\}$, where the $d_i^B$ are the design weights for the probability sample $\mathcal{S}_B$.
The sample $\mathcal{S}_B$ is called the reference probability survey sample. The most crucial part of assumption A4 is that the set of auxiliary variables $\mathbf{x}$ is observed in both the non-probability sample $\mathcal{S}_A$ and the probability sample $\mathcal{S}_B$. A reference probability survey sample is often available in practice, but the common set of auxiliary variables may not contain all the components needed to satisfy assumption A1.
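To make the role of the design weights concrete, the following sketch shows how a reference probability sample can stand in for complete auxiliary information from the population. It is an illustration under stated assumptions only: the population values, the single auxiliary variable and the simple random sampling design (equal weights $N / n_B$) are hypothetical, and the function name is invented for this example.

```python
import random


def reference_sample_estimates(N=10000, n_B=500, seed=7):
    """Estimate the population size and the population mean of x from a
    reference probability sample that carries design weights."""
    rng = random.Random(seed)

    # Hypothetical population values of a single auxiliary variable x.
    x = [rng.gauss(50.0, 10.0) for _ in range(N)]

    # Assumed design: simple random sampling without replacement, so every
    # unit has inclusion probability n_B / N and design weight N / n_B.
    S_B = rng.sample(range(N), n_B)
    d_B = {i: N / n_B for i in S_B}

    N_hat = sum(d_B[i] for i in S_B)               # weighted estimate of N
    total_x_hat = sum(d_B[i] * x[i] for i in S_B)  # Horvitz-Thompson total of x
    x_mean_hat = total_x_hat / N_hat               # Hajek-type weighted mean
    return sum(x) / N, x_mean_hat, N_hat


x_mean, x_mean_hat, N_hat = reference_sample_estimates()
print(f"true mean of x = {x_mean:.2f}, weighted estimate = {x_mean_hat:.2f}")
```

Only the auxiliary values and the design weights of the sampled units enter the estimates, which matches the information that assumption A4 says is available; no values of the study variable are used.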
2.2 Inferential
frameworks
There are three possible sources of variation under the general setting of two samples $\mathcal{S}_A$ and $\mathcal{S}_B$: (i) the model $q$ for the propensity scores on the sample inclusion and participation in the non-probability survey sample $\mathcal{S}_A$; (ii) the model $\xi$ for the outcome regression $y \mid \mathbf{x}$ or imputation; and (iii) the probability sampling design $p$ for the reference probability survey sample $\mathcal{S}_B$. For the three approaches to inference to be discussed in Sections 3 and 4, the reference probability sample $\mathcal{S}_B$ is always involved. Each of the three approaches requires a joint randomization framework involving $p$ and one of $q$ and $\xi$.
- Model-based prediction approach: The $\xi p$ framework under the joint randomization of the outcome regression model $\xi$ and the probability sampling design $p$.
- Inverse probability weighting using estimated propensity scores: The $qp$ framework under the joint randomization of the propensity score model $q$ and the probability sampling design $p$.
- Doubly robust inference: The $qp$ framework or the $\xi p$ framework, with no specification of which one.
The inferential framework
is the foundation for theoretical development. Consistency of point estimators
needs to be established under the suitable joint randomization. Theoretical variances
typically involve two components, one from each source of variation, and
correct derivations of the two components are the key to the construction of
consistent variance estimators under the designated inferential framework.
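The two-component structure of the variance can be written generically through the law of total variance. The display below is a sketch and not a formula from the paper: $\hat{\mu}$ stands for any estimator of the population mean, $q$ for the propensity score model, $p$ for the probability sampling design, $\mathbf{R}$ for the vector of participation indicators of the non-probability sample, and the subscripted $E$ and $V$ operators are notation assumed here. It also assumes that the reference probability sample is selected independently of the participation mechanism.

```latex
\[
V_{qp}(\hat{\mu})
  = V_{q}\bigl\{ E_{p}(\hat{\mu} \mid \mathbf{R}) \bigr\}
  + E_{q}\bigl\{ V_{p}(\hat{\mu} \mid \mathbf{R}) \bigr\}.
\]
```

The first term reflects variation due to the participation mechanism and the second reflects variation due to the probability sampling design. An analogous decomposition can be written with the outcome regression model in place of the propensity score model.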
ISSN : 1492-0921
Copyright
Published by authority of the Minister responsible for Statistics Canada.
© His Majesty the King in Right of Canada as represented by the Minister of Industry, 2022
Use of this publication is governed by the Statistics Canada Open Licence Agreement.
Catalogue No. 12-001-X
Frequency: Semi-annual
Ottawa