Comments on “Statistical inference with non-probability survey samples” – Miniaturizing data defect correlation: A versatile strategy for handling non-probability samples
Section 4. Quasi-randomization or super-population implementations
In a nutshell, the quasi-randomization approach focuses
on making a constant variable (induced by FPI When our sample is genuinely selected by a
probabilistic scheme by design, then for is a design probability, free of but it can depend on for example when includes a stratifying variable. When the
design probability is unavailable, we first need to invoke a divine
probability. This could be a natural one given by the finite population, such
as the propensity induced by FPI, where or an imagined super-population one such as
the being generated independently from where This positivity assumption is necessary if the
finite population is pre-specified, or its imposition defines the finite
population that can be studied. (This consideration has real practical relevance,
such as in election polling, where the finite population may not always be
pre-specified, even in theory.) Since these divine probabilities are unknown
and serve as our estimand, we need to assume some device probabilities, such as
via a generalized linear model to proceed, even though we don’t really
believe in any particular choice of
For our current discussion, suppose our divine
probability is given by the super-population Bernoulli model. Let and where Because the here is controlled by a divine probability,
the sample size is no longer a design variable to be
conditioned upon in our replication scheme; it is generally no longer an
ancillary statistic. Nevertheless, we should condition on a universal requirement for constructing
data-driven estimates for Fortunately this conditioning does not create
mathematical complications to the simplicity granted by the independence among as functions of This is because but the normalizing constant ‒ which
depends on the entire ‒ is not relevant for the developments in this
article, such as assigning weights that are proportional to the inverse of the recording probabilities.
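To see why this conditioning is harmless, here is a small numerical check. It is only a sketch, and it assumes that the conditioning event is a non-empty sample and that the recording indicators are independent Bernoulli draws. The names `pi`, `cond` and `p_nonempty`, and the four propensity values, are hypothetical choices made for illustration. Because a recorded unit implies a non-empty sample, each conditional recording probability should equal the unconditional one divided by the chance of a non-empty sample, so all the ratios should coincide.

```python
from itertools import product
import math


def conditional_inclusion_probs(pi):
    """Exact Pr(R_i = 1 | at least one unit recorded), by enumerating all outcomes."""
    n_units = len(pi)
    p_nonempty = 0.0
    joint = [0.0] * n_units  # Pr(R_i = 1 and the sample is non-empty)
    for r in product((0, 1), repeat=n_units):
        # Probability of this pattern under independent Bernoulli recording.
        prob = math.prod(p if ri else 1.0 - p for p, ri in zip(pi, r))
        if sum(r) > 0:
            p_nonempty += prob
            for i, ri in enumerate(r):
                if ri:
                    joint[i] += prob
    return [j / p_nonempty for j in joint], p_nonempty


# Hypothetical propensities for a toy population of four units.
pi = [0.1, 0.3, 0.5, 0.2]
cond, p_nonempty = conditional_inclusion_probs(pi)

# Each ratio equals 1 / p_nonempty: the same constant for every unit.
ratios = [c / p for c, p in zip(cond, pi)]
print(ratios, 1.0 / p_nonempty)
```

Since the ratio is the same constant for every unit, weights that are only defined up to proportionality are unaffected by the conditioning.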
Consequently, under this divine probability, which
corresponds to (the true model for) the -model setting in Wu (2022), we have for any
chosen by (3.1)
where is with respect to the (unknown) divine
probability over (for fixed It follows then that, regardless of whether we
want to ensure zero expectation in (3.2) or in (4.1), we will impose weights inversely proportional to the recording probabilities, that is, the well-known inverse probability weighting.
Therefore, if our postulated model permits us to reliably capture in reality, then because it has mean zero (with respect to the
divine probability), and it is a weighted average of essentially independent Bernoulli variables,
as seen in (3.1).
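The role of inverse probability weighting can be illustrated with a short simulation. This is a sketch under stated assumptions, not the development in the article. The finite-population covariance between the weighted recording indicator and the attribute is used as a stand-in for the quantity whose expectation is set to zero. The attribute values `y`, the propensities `pi` and all function names are hypothetical. The attribute values are held fixed, and only the recording indicators are redrawn.

```python
import random


def fp_cov(a, b):
    """Finite-population covariance of two equal-length lists (divisor N)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (z - mean_b) for x, z in zip(a, b)) / n


def expected_cov(pi, weights, y):
    """Expectation of fp_cov(R * W, y) over R_i ~ Bernoulli(pi_i).

    The covariance is linear in R, so the expectation replaces R_i by pi_i.
    """
    return fp_cov([p * w for p, w in zip(pi, weights)], y)


def simulated_cov(pi, weights, y, reps, seed=0):
    """Monte Carlo average of fp_cov(R * W, y) over redraws of R, with y fixed."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        rw = [w if rng.random() < p else 0.0 for p, w in zip(pi, weights)]
        total += fp_cov(rw, y)
    return total / reps


# Hypothetical finite population in which the propensity rises with y.
y = [float(i) for i in range(1, 21)]
pi = [0.1 + 0.04 * i for i in range(20)]
ipw = [1.0 / p for p in pi]      # inverse probability weights
equal = [1.0] * len(y)           # unweighted analysis

print(expected_cov(pi, ipw, y))    # zero up to rounding
print(expected_cov(pi, equal, y))  # positive: larger y is over-represented
print(simulated_cov(pi, ipw, y, reps=20000))
```

With the inverse probability weights, the product of propensity and weight is constant across units, so its covariance with any fixed attribute vanishes. With equal weights, the covariance inherits the dependence of the propensity on the attribute.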
This is a randomization-oriented approach because it
treats the attribute values of the entire finite population as fixed, and the hypothetical replications
are generated only by repeated realizations of the recording indicator. Of course, in general, the values of the inclusion propensities are unknown, and worse, they are inestimable
from a non-probability sample without further assumptions. To proceed, we pose
assumptions such as missing at random, i.e., and the requirement of an auxiliary sample so
that we have some values of with We also have choices on how to estimate the
inclusion propensity parametrically or non-parametrically. These
assumptions, requirements, and estimation methods are all essential for
practical implementation, as carefully reviewed and discussed by Wu (2022);
also see Tan (2010) for a detailed comparison of various estimation strategies.
Nevertheless, the overarching idea of quasi-randomization methods is to choose to free from in expectation over the posited hypothetical
replications, to regain the freedom guaranteed by probability sampling.
Complementarily, the super-population approaches aim to
miniaturize via making the other variable in that is, free of in expectation, but over a different
hypothetical replication scheme. Here the idea is to choose an that is a good approximation to such that the residual will be zero in expectation conditioning on Typically, this is done by considering a joint
model for given and with a specific regression model using the notation in Wu (2022). It is
important to recognize that, although we only specify the regression model given we must include in the replications in order to capture the
possible dependence of on the entire which is the key concern for non-probability
samples. Indeed, it is this joint specification that permits the adoption of
the missing at random assumption to reduce which in turn permits us to focus on
specifying a single regression model for both observed and unobserved individuals.
Therefore, when we write we mean the expectation with respect to
where is left unspecified, unlike with the
quasi-randomization approach.
It follows then that, conditioning on and which does not alter because and are independent given we have
Clearly, (4.3) becomes zero when we choose and when the model is (first-order) correctly specified,
that is, This summarizes the super-population approach,
and it renders for similar reasons as given for the
quasi-randomization framework.
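The super-population idea can be sketched numerically as well. The example below is an illustration under assumptions, not the estimator analyzed in the article. The covariates `x` are fixed, recording depends on `x` only (so missing at random holds by construction), and both the attribute values and the recording indicators are redrawn in each replication. The linear regression mean `m`, the propensities `pi` and the prediction-type estimator are hypothetical choices.

```python
import random


def average_errors(reps=2000, n_units=200, seed=1):
    """Average error of a model-based estimator and of the naive sample mean."""
    rng = random.Random(seed)
    x = [i / n_units for i in range(n_units)]  # fixed covariates
    pi = [0.1 + 0.8 * xi for xi in x]          # recording depends on x only
    m = [2.0 + 3.0 * xi for xi in x]           # correctly specified regression mean
    err_model, err_naive, done = 0.0, 0.0, 0
    while done < reps:
        recorded = [rng.random() < p for p in pi]
        idx = [i for i, r in enumerate(recorded) if r]
        if not idx:  # skip an empty sample
            continue
        # Redraw the attribute values given x: regression mean plus noise.
        y = [mi + rng.gauss(0.0, 1.0) for mi in m]
        mu = sum(y) / n_units
        # Population mean of m plus the mean recorded residual.
        resid_mean = sum(y[i] - m[i] for i in idx) / len(idx)
        err_model += sum(m) / n_units + resid_mean - mu
        # The naive sample mean ignores the dependence of recording on x.
        err_naive += sum(y[i] for i in idx) / len(idx) - mu
        done += 1
    return err_model / reps, err_naive / reps


model_err, naive_err = average_errors()
print(model_err)  # close to zero
print(naive_err)  # clearly positive
```

When the regression mean is correctly specified, the recorded residuals average to zero given the covariates and the recording pattern, so the error of the model-based estimator vanishes in expectation. The naive mean remains biased because units with larger `x` are recorded more often.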
Survey Methodology, ISSN 1492-0921
Copyright
Published by authority of the Minister responsible for Statistics Canada.
© His Majesty the King in Right of Canada as represented by the Minister of Industry, 2022
Use of this publication is governed by the Statistics Canada Open Licence Agreement.
Catalogue No. 12-001-X
Frequency: Semi-annual
Ottawa