4. Two-step calibration weighting
Phillip S. Kott and Dan Liao
4.1 Calibration weighting in two steps
In practice, the components of $\mathbf{z}_k$ are often 0/1 group-membership identifiers, and the groups are mutually exclusive and exhaustive. In that situation, $\mathbf{z}_k^T \mathbf{g}$ can only take on as many distinct values as $\mathbf{z}_k$ has components. Almost any weight-adjustment function $\alpha(\cdot)$ will yield equivalent results. An example is the linear function, $\alpha(\mathbf{z}_k^T \mathbf{g}) = 1 + \mathbf{z}_k^T \mathbf{g},$ of Lundström and Särndal (1999).
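With a linear adjustment of this kind, the calibration equations are linear in the unknown parameter vector, so the calibrated weights can be computed in closed form. The following is a minimal numpy sketch; all data are invented toy values, and taking the model variables equal to the calibration vector is an assumption made here for simplicity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy respondent data (invented): design weights d_k and a calibration
# vector x_k holding an intercept and one covariate; the model variables
# z_k are taken equal to x_k for this sketch.
n = 200
d = rng.uniform(1.0, 3.0, n)
x = np.column_stack([np.ones(n), rng.normal(1.0, 0.3, n)])

T_x = np.array([450.0, 460.0])        # assumed control totals

# The linear adjustment a_k = 1 + x_k'g makes the calibration equation
#   sum_k d_k (1 + x_k'g) x_k = T_x
# a linear system in g.
A = (d[:, None] * x).T @ x            # sum_k d_k x_k x_k'
g = np.linalg.solve(A, T_x - x.T @ d)
w = d * (1.0 + x @ g)                 # calibrated weights

# The calibrated weighted totals reproduce the controls exactly.
print(np.allclose(w @ x, T_x))        # True
```

Note that nothing here prevents an adjustment factor $1 + \mathbf{x}_k^T\mathbf{g}$ from falling below 1, which is exactly the flexibility the logistic-type function discussed next lacks.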
One popular weight-adjustment function that sometimes cannot be used (note the italicized "almost" in the previous paragraph) is $\alpha(\mathbf{z}_k^T \mathbf{g}) = 1 + \exp(-\mathbf{z}_k^T \mathbf{g}),$ which assumes response is a logistic function of $\mathbf{z}_k^T \mathbf{g}.$ The problem is that this weight-adjustment function cannot return values less than unity. We noted in the previous section that sometimes one may need the adjustment factor $a_k$ to be less than 1. A routine that tries to use $\alpha(\mathbf{z}_k^T \mathbf{g}) = 1 + \exp(-\mathbf{z}_k^T \mathbf{g})$ and fit the calibration equations will fail.
This can be a particular problem when assuming a logistic response model and trying to calibrate to the population in a single step. There may be a component of $\mathbf{x}_k,$ say $x_{1k},$ that is always nonnegative, but the original sample and response set are such that $\sum_{k \in R} d_k x_{1k} > \sum_{k \in U} x_{1k},$ even though $\sum_{k \in R} d_k x_{1k}$ cannot exceed $\sum_{k \in S} d_k x_{1k}.$ Thus, calibrating to the population will always fail because no $a_k$ can be less than 1.
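A tiny numeric illustration of why this failure is unavoidable (toy numbers invented here): if a nonnegative calibration variable already has a respondent-weighted total above its population control, no adjustment function bounded below by 1 can bring that total back down.

```python
import numpy as np

# Invented toy values: three respondents with design weights d_k and a
# nonnegative calibration variable x1.
d = np.array([2.0, 2.0, 2.0])
x1 = np.array([4.0, 5.0, 6.0])

T = 28.0                     # hypothetical population control total
resp_total = float(d @ x1)   # 30.0, already above the control

# With a logistic-type adjustment, every factor satisfies a_k >= 1, so the
# calibrated total sum_k d_k a_k x1_k can never drop below resp_total.
print(resp_total > T)        # True: the calibration equation has no solution
```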
Calibrating to the original sample, by contrast, need not fail, since $\sum_{k \in R} d_k x_{1k} \leq \sum_{k \in S} d_k x_{1k}.$ This suggests that one calibrate first to the original sample, which removes the response bias if the assumed response model holds, and then to the population, which removes the remaining bias if the prediction model holds. Estevao and Särndal (2002) discuss a variety of ways to calibrate in steps, but we focus on a single method here.
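The suggested procedure can be sketched end to end. The simulation below (population, sample, and response mechanism all invented for illustration) uses the linear adjustment in both steps: first calibrating the respondents to the full-sample estimated totals, then calibrating the resulting weights to the population controls.

```python
import numpy as np

rng = np.random.default_rng(1)

def linear_calibrate(w_in, x, controls):
    """Solve sum_k w_in_k (1 + x_k'g) x_k = controls for g; return new weights."""
    A = (w_in[:, None] * x).T @ x
    g = np.linalg.solve(A, controls - x.T @ w_in)
    return w_in * (1.0 + x @ g)

# Invented population of N units with an intercept and one covariate.
N, n = 5000, 500
xU = np.column_stack([np.ones(N), rng.normal(1.0, 0.3, N)])
T_pop = xU.sum(axis=0)                          # population controls

s = rng.choice(N, n, replace=False)             # original sample
d = np.full(n, N / n)                           # design weights
resp = rng.random(n) < 0.6                      # response indicators
xs, xr = xU[s], xU[s][resp]

# Step 1: calibrate the respondents to the full-sample estimated totals.
w1 = linear_calibrate(d[resp], xr, xs.T @ d)

# Step 2: calibrate the step-1 weights to the population controls.
w2 = linear_calibrate(w1, xr, T_pop)

print(np.allclose(xr.T @ w2, T_pop))            # True: population controls met
```

Each step is exact by construction: the first reproduces the full-sample weighted totals, the second the population totals.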
A second advantage of calibration weighting in two steps can be realized even when the calibration variables used in both steps are the same as, or a subset of, those used in the single step. This happens when the response model holds, and the linear prediction model is only roughly true. Some version of "optimal" estimation can then be used in the second calibration-weighting step to increase efficiency. Rao (1994) introduced the notion of the optimal regression estimator. It was put into calibration-weighting form and discussed further in Bankier (2002) and Kott (2009, Section 4.2). Details on how this can be done are provided in Sections 4.2 and 5.
4.2 Estimation and variance estimation when calibrating in two steps
In this subsection, we start with a fairly general two-step calibration estimator for a total and then address estimating its variance. The first calibration-weighting step, which is to the original sample, employs $\mathbf{z}_{1k}$ as the vector of response-model variables and $\mathbf{x}_{1k}$ as the calibration vector. Each has $P_1$ components. The weight-adjustment function has the form described in equation (2.4), with $\mathbf{z}_{1k}$ now replacing $\mathbf{z}_k;$ call the resulting adjustment factor $a_{1k}.$ The calibration equation is $\sum_{k \in R} d_k a_{1k} \mathbf{x}_{1k} = \sum_{k \in S} d_k \mathbf{x}_{1k}.$
The second calibration-weighting step, which is to the population, employs $\mathbf{z}_{2k}$ and $\mathbf{x}_{2k},$ each with $P_2$ components. The nonresponse bias under the response model is removed in the first step. For the weight-adjustment function for the second step, we propose using

$1 + c_k\, \mathbf{z}_{2k}^T \mathbf{g}_2, \qquad (4.1)$

where the $c_k$ may be set almost at whim (but see below). The right-hand side of equation (4.1) can vary across the $k$ (and so can depend on $\mathbf{x}_{2k}$ and $y_k$), yet $\mathbf{g}_2$ converges to $\mathbf{0}$ as the sample size grows, making it asymptotically indistinguishable from the linear function: $1 + \mathbf{z}_{2k}^T \mathbf{g}_2.$ For simplicity, we will call $1 + c_k\, \mathbf{z}_{2k}^T \mathbf{g}_2$ and $1 + \mathbf{z}_{2k}^T \mathbf{g}_2,$ $a_{2k}$ and $\tilde{a}_{2k},$ respectively. From a quasi-sampling-design viewpoint, both are asymptotically identical to unity. The second calibration equation is $\sum_{k \in R} d_k a_{1k} a_{2k} \mathbf{x}_{2k} = \sum_{k \in U} \mathbf{x}_{2k}.$ Because this equation must hold, there are limits on the available choices for the $c_k$ and $\mathbf{z}_{2k}$ in equation (4.1).
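Because the proposed second-step adjustment is linear in the parameter vector for any fixed choice of the per-unit constants, the second calibration equation remains a linear system. The sketch below uses invented first-step weights and an invented set of per-unit constants (called c here); it checks that calibration is attained and that the second-step adjustment factors sit near unity when the controls are close to the incoming weighted totals.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 300
w1 = rng.uniform(1.5, 4.0, n)            # first-step weights (assumed given)
x = np.column_stack([np.ones(n), rng.normal(2.0, 0.5, n)])
c = rng.uniform(0.5, 2.0, n)             # per-unit constants, chosen "almost at whim"

T = 1.02 * (x.T @ w1)                    # hypothetical population controls (2% off)

# The adjustment 1 + c_k x_k'g is linear in g, so the calibration equation
#   sum_k w1_k (1 + c_k x_k'g) x_k = T  is a linear system.
A = ((w1 * c)[:, None] * x).T @ x        # sum_k w1_k c_k x_k x_k'
g = np.linalg.solve(A, T - x.T @ w1)

a2 = 1.0 + c * (x @ g)                   # second-step adjustment factors
w2 = w1 * a2

print(np.allclose(x.T @ w2, T))          # True: controls reproduced
print(np.abs(a2 - 1.0).max() < 0.25)     # factors stay near unity
```

Varying the constants changes the individual factors but not the calibrated totals, which is the sense in which they can be set almost at whim.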
A good simultaneous variance estimator for the resulting two-step calibration estimator, $\hat{t}_y = \sum_{k \in R} d_k a_{1k} a_{2k} y_k,$ is (as we shall see)

$v = \sum_{k \in R} d_k a_{1k} (d_k a_{1k} - 1)\, e_{1k}^2\, a_{2k}, \qquad (4.2)$

where

$\mathbf{b}_2 = \Big( \sum_{k \in R} d_k a_{1k} a_{2k}\, \mathbf{x}_{2k} \mathbf{x}_{2k}^T \Big)^{-1} \sum_{k \in R} d_k a_{1k} a_{2k}\, \mathbf{x}_{2k}\, y_k, \quad e_{2k} = y_k - \mathbf{x}_{2k}^T \mathbf{b}_2, \qquad (4.3)$

and

$\mathbf{b}_1 = \Big( \sum_{k \in R} d_k a_{1k} a_{2k}\, \mathbf{x}_{1k} \mathbf{x}_{1k}^T \Big)^{-1} \sum_{k \in R} d_k a_{1k} a_{2k}\, \mathbf{x}_{1k}\, (a_{2k} e_{2k}), \qquad (4.4)$

with

$e_{1k} = a_{2k} e_{2k} - \mathbf{x}_{1k}^T \mathbf{b}_1. \qquad (4.5)$

Let $\mathbf{z}_k^*$ now be the vector composed of the non-duplicated components of $\mathbf{z}_{1k}$ and $\mathbf{z}_{2k},$ and define $\mathbf{x}_k^*$ analogously. Sufficient conditions for (4.2) to be a simultaneous variance estimator include either the response model in equation (2.4) holding with $\mathbf{z}_k^*$ replacing $\mathbf{z}_k,$ or the prediction model being $y_k = \mathbf{x}_k^{*T} \boldsymbol{\beta} + \epsilon_k$ whether or not element $k$ is sampled or responds if sampled, where the $\epsilon_k$ are uncorrelated random variables with variances equal to $\sigma_k^2 = \mathbf{x}_k^{*T} \boldsymbol{\gamma},$ and $\boldsymbol{\gamma}$ need not be specified other than having finite components. Now, both $\sum_{k \in U} \mathbf{x}_k^* \mathbf{z}_k^{*T}/N$ and $\sum_{k \in U} \mathbf{x}_k^* \mathbf{x}_k^{*T}/N$ are assumed to be of full rank and bounded as the sample size grows arbitrarily large.
The variance estimator in equation (4.2) is almost the same as the estimator in (3.1): $\mathbf{z}_k$ has been replaced with $\mathbf{z}_k^*$ and $\mathbf{x}_k$ with $\mathbf{x}_k^*,$ while $a_{1k} a_{2k}$ substitutes for $a_k$ (we will get to a small difference shortly). Observe that $e_{2k}$ is effectively an expression of the "residual" from the second calibration-weighting step. This residual is multiplied by the weight-adjustment factor $a_{2k},$ which is asymptotically unity from the quasi-sampling-design-based perspective and a constant from the prediction-model viewpoint. The product is then used to create the first-step "regression-coefficient" $\mathbf{b}_1$ in equation (4.4) and its accompanying "residual" $e_{1k}$ in equation (4.5). We do the second-step regression first because the second calibration-weighting step was the last one applied to the weights.
It is for estimating the prediction-model variance of $\hat{t}_y = \sum_{k \in R} d_k a_{1k} a_{2k} y_k$ as an estimator of $t_y = \sum_{k \in U} y_k$ that the last appearance of $a_{2k}$ on the right-hand side of equation (4.2) is not squared, as it would be if $a_{1k} a_{2k}$ substituted for $a_k$ everywhere. From a quasi-design viewpoint, $a_{2k}$ is asymptotically identical to unity, so whether or not it is squared makes no asymptotic difference.
Observe that the $a_{2k}$ have been inserted in equation (4.3) for the same reason as $a_k$ was inserted into the regression coefficient in equation (3.1). Since the $a_{2k}$ are asymptotically unity, however, they are not really needed (and serve no function whatever from a prediction-model viewpoint). A similar argument applies to the $a_{2k}$ in equation (4.4): they are asymptotically unity from the quasi-sampling-design viewpoint (and part of an estimate of 0 from a prediction-model viewpoint).