“Optimal” calibration weights under unit nonresponse in survey sampling
Section 1. Introduction
In a survey, the response (nonresponse) mechanism for units is in reality unknown. To avoid defining a formal probability measure that might be neither meaningful nor realistic, one usually discusses nonresponse in terms of a unit's propensity to participate. To take the possible effect of nonresponse on estimators into account, however, it is common practice to treat these propensities as probabilities to be estimated (e.g., propensity scores). This can be done for individual units, for groups of units, or as an “average” over the whole response set.
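The three levels of propensity estimation can be made concrete with a minimal Python/NumPy sketch. It assumes a response indicator for every sampled unit, hypothetical group labels (for example response homogeneity groups) and, for the unit level, a logistic model fitted by Newton-Raphson. The logistic form is only an assumption for illustration, not a model prescribed here, and all function names and data are hypothetical.

```python
from typing import Optional

import numpy as np


def overall_propensity(responded: np.ndarray, d: Optional[np.ndarray] = None) -> float:
    """One "average" propensity for the whole sample.

    Unweighted: m / n. If design weights d_k = 1 / pi_k are supplied, the
    design-weighted rate sum_r d_k / sum_s d_k is returned instead.
    """
    responded = np.asarray(responded, dtype=bool)
    if d is None:
        return float(responded.mean())
    d = np.asarray(d, dtype=float)
    return float(d[responded].sum() / d.sum())


def group_propensities(responded: np.ndarray, groups: np.ndarray) -> np.ndarray:
    """Response rate within each group, assigned back to every unit of that group."""
    responded = np.asarray(responded, dtype=bool)
    groups = np.asarray(groups)
    p = np.empty(groups.shape[0], dtype=float)
    for g in np.unique(groups):
        in_g = groups == g
        p[in_g] = responded[in_g].mean()
    return p


def logistic_propensities(responded: np.ndarray, Z: np.ndarray, n_iter: int = 25) -> np.ndarray:
    """Unit-level propensities from a logistic model fitted by Newton-Raphson.

    Z is an (n, p) model matrix that should contain an intercept column.
    No safeguards against separation or non-convergence are included.
    """
    y = np.asarray(responded, dtype=float)
    Z = np.asarray(Z, dtype=float)
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ beta)))
        score = Z.T @ (y - p)                                 # gradient of log-likelihood
        info = Z.T @ (Z * (p * (1.0 - p))[:, None])           # Fisher information
        beta = beta + np.linalg.solve(info, score)
    return 1.0 / (1.0 + np.exp(-(Z @ beta)))


# Hypothetical sample of n = 10 units; responded[k] is the response indicator.
responded = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1], dtype=bool)
groups = np.array(["a"] * 5 + ["b"] * 5)
x = np.arange(1.0, 11.0)
Z = np.column_stack([np.ones_like(x), x])

p_avg = overall_propensity(responded)           # m / n
p_grp = group_propensities(responded, groups)   # one rate per group
p_ind = logistic_propensities(responded, Z)     # one value per unit
```

Propensity-adjusted weights for the respondents would then take the form `d[responded] / p_grp[responded]` (or with `p_avg` or `p_ind`). Which level to use, and which model to fit at the unit level, remains a modelling choice.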
For example, Haziza and Lesage (2016) discuss two main approaches: calibration weighting with and without preceding propensity score weighting, the former involving model-based estimation of the propensities. The authors warn of potential negative effects on the bias and variance of the resulting estimators when the propensities are not taken into account. (The authors refer to these two weighting options as two-step and one-step procedures, respectively; they should not be confused with the two-step and single-step calibrations defined by Särndal and Lundström (2005).) However, in the simulation study by Haziza and Lesage (2016) the sampling design plays no role, since the focus there is solely on how the auxiliary information relates to the study variable and to the nonresponse mechanism.
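The one-step and two-step procedures can be contrasted with a rough sketch. It applies linear calibration to the respondents only. Linear calibration is the chi-square distance case of Deville and Särndal (1992) and yields GREG-type weights. The sketch runs it once with the design weights $d_k = 1/\pi_k$ as starting weights (one-step) and once with propensity-adjusted starting weights $d_k/\hat{p}_k$ (two-step). The data, the group propensities and the known totals are hypothetical, and this is not the “optimal” estimator proposed in this paper.

```python
import numpy as np


def linear_calibration(d: np.ndarray, X: np.ndarray, t_x: np.ndarray) -> np.ndarray:
    """Linear calibration of starting weights d over the response set.

    Minimises the chi-square distance sum_k (w_k - d_k)^2 / d_k subject to
    sum_k w_k x_k = t_x, which gives w_k = d_k * (1 + x_k' lam).
    """
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    T = X.T @ (d[:, None] * X)                  # sum_r d_k x_k x_k'
    lam = np.linalg.solve(T, t_x - X.T @ d)     # multiplier fixed by the constraint
    return d * (1.0 + X @ lam)


# Hypothetical respondents (m = 7) from an SRS of n = 10 out of N = 50.
x_r = np.array([1.0, 2.0, 4.0, 6.0, 7.0, 9.0, 10.0])
y_r = np.array([3.0, 5.0, 8.0, 13.0, 14.0, 19.0, 21.0])
X_r = np.column_stack([np.ones_like(x_r), x_r])    # intercept + x
d_r = np.full(x_r.shape[0], 50 / 10)               # d_k = 1 / pi_k
t_x = np.array([50.0, 300.0])                      # assumed known: (N, sum_U x_k)

# Hypothetical group-level response propensities for the respondents.
p_r = np.array([0.6, 0.6, 0.6, 0.8, 0.8, 0.8, 0.8])

w_one = linear_calibration(d_r, X_r, t_x)          # one-step
w_two = linear_calibration(d_r / p_r, X_r, t_x)    # two-step, group propensities
w_avg = linear_calibration(d_r / 0.7, X_r, t_x)    # two-step, constant propensity

t_y_one = w_one @ y_r                              # estimates of t_y
t_y_two = w_two @ y_r
```

All three weight sets reproduce `t_x` exactly. Because `X_r` contains an intercept, the constant propensity adjustment is absorbed by the calibration and `w_avg` coincides with `w_one`. The group-wise propensities, however, lead to different weights and hence a different estimate of $t_y$.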
In this paper we propose a nonresponse version of what, in the full response case, is called the (design-based) optimal regression estimator. The underlying distance measure is a quadratic form with a more complex structure (see Andersson and Thorburn (2015)) than the one leading to the GREG estimator (see Deville and Särndal (1992)). As it turns out, there is also room for refinement in terms of the average response propensity (probability) when constructing the distance measure under nonresponse, which leads to a modified “optimal” estimator.
1.1 Outline of the paper
Section 2 starts with an introduction to the calibration idea under full response before turning to the nonresponse situation. Three estimators of a population total are mainly considered: the GREG-related estimator and two versions of the “optimal” estimator. Some theoretical results on the resulting bias follow. Section 3 contains a simulation study in which simple random sampling and Poisson sampling are used for illustration. The Poisson design enables us to construct and investigate a situation where the auxiliary information is involved in the design as well as in the nonresponse mechanism. We also illustrate the risks of using an incorrect model when estimating individual propensities. We end with concluding remarks in Section 4.
1.2 Notation and setup
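We will start with a population $U$ of size $N$ from which we take a probability sample $s$ of size $n$ with inclusion probabilities $\pi_k$, $k \in U$. Nonresponse means that we only observe the response set $r \subseteq s$ of size $m$. Our aim is to estimate the study variable total $t_y = \sum_{k \in U} y_k$. We assume access to an auxiliary variable vector $\mathbf{x}_k$ of dimension $J$, where either $\mathbf{x}_k$, $k \in r$, and $\mathbf{t}_x = \sum_{k \in U} \mathbf{x}_k$ are known (the population level) or $\mathbf{x}_k$, $k \in r$, and $\hat{\mathbf{t}}_x = \sum_{k \in s} \mathbf{x}_k / \pi_k$ are known (the sample level) or possibly a mixture of these cases: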