Comments on “Statistical inference with non-probability survey samples” – Miniaturizing data defect correlation: A versatile strategy for handling non-probability samples
Section 5. Quasi-randomization and super-population implementations
Once a joint model for is set up, of course we can use it for
estimating both and the regression function each of which is made possible by the
availability of the auxiliary probability sample, and the assumption of missing
at random. But as shown before, correctly specifying and estimating one of them
is sufficient for miniaturizing However, from (4.3), in order for the
covariance/correlation to be zero, neither the multiplicative correction to via nor the additive adjustment for via needs to be correct. All we need is that, after
the correction and the adjustment, the parts that are left are uncorrelated with each
other. The aforementioned framework of Collaborative TMLE was built essentially
on this insight (e.g., see Section 3.1 of van der Laan and
Gruber, 2009), though the heavy mathematical treatment in its literature might
have discouraged readers from seeking such an intuitive understanding.
To provide a simple illustration, consider a finite
population that is an i.i.d. sample from a super-population model:
The non-probability sample is generated by a mechanism such that that is, it is determined by the magnitude of only. Suppose we mis-specify the functional form
for (e.g., the divine model may not be monotone in
but the device model, such as the conventional
logistic link, is), as well as the regression model by choosing Since is uncorrelated with or under we know that our least-squares estimator for would still be valid for even under the mis-specified regression model.
This turns out to be sufficient to ensure the asymptotic unbiasedness (as of the following “doubly robust” estimator for
the finite-population mean,
where indicates the auxiliary sample (of only). Or equivalently,
which makes it clearer that any bias in is controlled by the covariance (or
correlation) involving since the covariance involving is already miniaturized by the assumption that
the auxiliary sample is probabilistic (which, for simplicity, is assumed to be
a simple random sample).
Here is any weight function such that where the expectation is with respect to and with being the least-squares estimator for from the biased sample, and and can be chosen arbitrarily. Because the
finite-population covariance/correlation between and is for and the misfitted parts for or do not contribute to the ddc
(asymptotically) since they are uncorrelated with each other under the
super-population model, leading to further robustness going beyond “double
robustness”. This of course does not mean that we can misfit a model
arbitrarily and still obtain valid estimators, but it does imply that having at
least one correct model is a sufficient, but not necessary, condition for
the validity of the doubly robust estimators.
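The point that neither model needs to be correct can be checked numerically with a small simulation. The sketch below is only an illustration under assumptions of its own: the super-population model (a cubic in a standard normal x), the inclusion probability (a logistic function of |x|), the working regression (a quadratic that omits the cubic term) and the working weights (a wrong function of x squared) are all hypothetical choices, not the specification used in the illustration above. They are chosen so that the leftover regression misfit is an odd function of x while the weighting error is an even function of x, which makes the two uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(2022)

# Hypothetical finite population drawn from an assumed super-population model.
N = 200_000
x = rng.standard_normal(N)
y = 1.0 + 2.0 * x + 3.0 * x**2 + 0.5 * x**3 + rng.standard_normal(N)
true_mean = y.mean()

# Non-probability sample: inclusion depends on the magnitude of x only.
pi_true = 1.0 / (1.0 + np.exp(2.0 - 1.5 * np.abs(x)))
R = rng.random(N) < pi_true

# Auxiliary simple random sample, for which only x is used.
aux = rng.choice(N, size=40_000, replace=False)


def design(v):
    """Working regression design: quadratic in x (the cubic term is omitted)."""
    return np.column_stack([np.ones_like(v), v, v**2])


# Least-squares fit of the misspecified regression on the biased sample.
beta_hat, *_ = np.linalg.lstsq(design(x[R]), y[R], rcond=None)


def m_hat(v):
    return design(v) @ beta_hat


# Misspecified weights: inverse of a wrong propensity that is even in x.
w = 1.0 + np.exp(1.0 - 0.3 * x[R] ** 2)

# Doubly robust form: model prediction averaged over the auxiliary sample,
# plus a weighted (Hajek-normalized) mean of residuals from the biased sample.
resid = y[R] - m_hat(x[R])
dr = m_hat(x[aux]).mean() + np.sum(w * resid) / np.sum(w)
naive = y[R].mean()

print(f"population mean : {true_mean:.3f}")
print(f"naive mean      : {naive:.3f}")
print(f"doubly robust   : {dr:.3f}")
```

In this setup the naive sample mean is pulled away from the population mean because units with large |x| are over-represented, whereas the doubly robust form stays close to it even though both working models are wrong. Replacing the weights by a function that is not even in x (for example, a monotone logistic form with a nonzero slope) breaks the orthogonality and, in general, the protection along with it.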
It is also worth stressing that, in formulating the
regression model, we do not necessarily need to invoke a device probability,
e.g., a super-population regression model, because the FPI variable provides a
finite-population regression via applying the least-squares method to regress on This regression fitting itself says little
about whether the resulting regression line is a good fit to or not. However, the example above indicates
that, for the purpose of estimating the population average of the lack of fit may not matter that much, as
long as the “residual” has little correlation with as two functions of the FPI variable Indeed, as discussed in Section 3, we can
consider including in the regression model How effective this strategy is in general is a
topic of further research.
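The role of the residual's correlation with the recording indicator can be made concrete with the standard data defect identity: the error of an unweighted sample mean equals the ddc times the square root of (1 - f)/f times the finite-population standard deviation, where f is the sampling fraction. The following is a minimal sketch under hypothetical choices (a quadratic population model, inclusion depending on |x|, and a deliberately poor working regression on x squared only); it also assumes, for simplicity, that x is available for the whole finite population.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical finite population and selection mechanism (assumptions).
N = 100_000
x = rng.standard_normal(N)
y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.standard_normal(N)
R = rng.random(N) < 1.0 / (1.0 + np.exp(2.0 - 1.5 * np.abs(x)))


def ddc_decomposition(R, g):
    """Return (actual error of the sample mean of g, error implied by the
    ddc identity, ddc), all computed over the finite population."""
    f = R.mean()                                 # sampling fraction n / N
    rho = np.corrcoef(R.astype(float), g)[0, 1]  # data defect correlation
    sigma = g.std()                              # finite-population sd (ddof=0)
    actual = g[R].mean() - g.mean()
    implied = rho * np.sqrt((1.0 - f) / f) * sigma
    return actual, implied, rho


# Raw outcome: a sizable ddc, hence a sizable error in the naive mean.
err_y, implied_y, rho_y = ddc_decomposition(R, y)

# Working regression on (1, x^2) only, fitted by least squares on the
# biased sample; the linear term in x is deliberately left out.
Z = np.column_stack([np.ones(N), x**2])
coef, *_ = np.linalg.lstsq(Z[R], y[R], rcond=None)
e = y - Z @ coef                                 # residual for every unit

# Residual: the fit is poor, yet its ddc is close to zero.
err_e, implied_e, rho_e = ddc_decomposition(R, e)

print(f"ddc of y        : {rho_y:.4f}  (error {err_y:.3f})")
print(f"ddc of residual : {rho_e:.4f}  (error {err_e:.3f})")
```

Because the least-squares fit includes an intercept, the residuals average to zero over the biased sample, so the error reported for the residual is also the error of the regression-based estimator of the population mean. The fit itself is poor here, since the residual still contains the whole linear term, but that leftover part is nearly uncorrelated with the recording indicator, which is what keeps the error small.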
ISSN : 1492-0921
Copyright
Published by authority of the Minister responsible for Statistics Canada.
© His Majesty the King in Right of Canada as represented by the Minister of Industry, 2022
Use of this publication is governed by the Statistics Canada Open Licence Agreement.
Catalogue No. 12-001-X
Frequency: Semi-annual
Ottawa