5. Robust estimation of domain totals
Cyril Favre Martinoz, David Haziza and Jean-François Beaumont
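In practice, we usually want to produce estimates for population domains as well as an estimate at the global level. Let $t_{yg}$ be the total of the $y$ variable in domain $g$. We assume that the domains form a partition of the population such that $t_y = \sum_{g=1}^{G} t_{yg}$, where $G$ is the number of domains. Let $S_g$ be the set of sampled units in domain $g$. The expansion estimator of $t_{yg}$ is given by
$$\hat{t}_{yg} = \sum_{i \in S_g} d_i y_i,$$
where $d_i$ is the design weight of unit $i$. We have the consistency relation
$$\hat{t}_y = \sum_{g=1}^{G} \hat{t}_{yg}.$$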
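As a minimal sketch of the domain-level expansion estimators and their consistency with the overall estimate, the following Python snippet may help. The function name `domain_expansion_estimates` and the sample data are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

def domain_expansion_estimates(weights, values, domains):
    """Return {domain: sum of d_i * y_i over the sampled units in that domain}."""
    totals = defaultdict(float)
    for d_i, y_i, g in zip(weights, values, domains):
        totals[g] += d_i * y_i
    return dict(totals)

# Hypothetical sample: design weights d_i, values y_i and domain labels.
weights = [10.0, 10.0, 5.0, 5.0, 20.0]
values = [3.0, 7.0, 12.0, 1.0, 2.0]
domains = ["A", "A", "B", "B", "C"]

by_domain = domain_expansion_estimates(weights, values, domains)

# Expansion estimator of the population total, computed directly.
overall = sum(d * y for d, y in zip(weights, values))
```

Because the domains partition the sample, the domain estimates in `by_domain` add up to `overall`.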
In the presence of influential values, we can apply a robust procedure separately for each domain using the method described in Section 3, which leads to $G$ robust estimators, $\hat{t}^{R}_{yg}$, $g = 1, \ldots, G$. A robust estimator of the total at the population level, $t_y$, is easily obtained by aggregating the robust estimators $\hat{t}^{R}_{yg}$. Thus, we have
$$\hat{t}^{R(agg)}_{y} = \sum_{g=1}^{G} \hat{t}^{R}_{yg}.$$
The consistency relation between the domain-level estimates and the population-level estimate is therefore satisfied. However, aggregating $G$ robust estimators, each suffering from a potential bias, may produce a highly biased aggregate robust estimator, $\hat{t}^{R(agg)}_{y}$. In most cases, the bias of $\hat{t}^{R(agg)}_{y}$ will be negative, since each of the $G$ estimators has a negative bias.
To avoid having an estimator with an unacceptable bias, we first compute the robust estimator (4.8), $\hat{t}^{R}_{yg}$, for each domain. Then, we independently compute a robust estimator of the total $t_y$ in the population, $\hat{t}^{R}_{y0}$, given by (4.8); the index $g = 0$ denotes the population level. In this case, however, the consistency relation is no longer necessarily satisfied. In other words, we have $\hat{t}^{R}_{y0} \neq \sum_{g=1}^{G} \hat{t}^{R}_{yg}$ in general. It is therefore necessary to force consistency between the robust domain estimates and the aggregate robust estimate using a method similar to calibration. To do so, we compute final robust estimates $\hat{t}^{F}_{yg}$, $g = 0, 1, \ldots, G$, that are as close as possible to the initial robust estimates $\hat{t}^{R}_{yg}$ based on a particular distance function, and that satisfy the calibration equation
$$\hat{t}^{F}_{y0} = \sum_{g=1}^{G} \hat{t}^{F}_{yg}. \qquad (5.1)$$
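In the case of the generalized chi-square distance function, we are seeking final robust estimates, $\hat{t}^{F}_{yg}$, $g = 0, 1, \ldots, G$ (with $g = 0$ denoting the population level), such that
$$\sum_{g=0}^{G} \frac{\left(\hat{t}^{F}_{yg} - \hat{t}^{R}_{yg}\right)^2}{2 q_g \hat{t}^{R}_{yg}} \qquad (5.2)$$
is minimized subject to (5.1). The coefficient $q_g$ in the above expression is a weight assigned to the initial estimate in domain $g$ and is interpreted as its importance in the minimization problem. Using the Lagrange multipliers method, we can easily obtain a solution to this minimization problem. The solution is given by
$$\hat{t}^{F}_{yg} = \hat{t}^{R}_{yg} - \delta_g q_g \hat{t}^{R}_{yg} \, \frac{\sum_{h=0}^{G} \delta_h \hat{t}^{R}_{yh}}{\sum_{h=0}^{G} q_h \hat{t}^{R}_{yh}}, \qquad g = 0, 1, \ldots, G, \qquad (5.3)$$
where $\delta_0 = -1$ and $\delta_g = 1$ for $g = 1, \ldots, G$.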
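The closed-form solution of the chi-square calibration problem can be sketched in a few lines of Python. The sketch assumes strictly positive initial estimates, non-negative weights $q_g$ with at least one of them positive, and the sign convention $\delta_0 = -1$, $\delta_g = 1$ for the domains. The function name `calibrate_totals` and the numbers are hypothetical.

```python
def calibrate_totals(t_pop, t_dom, q_pop, q_dom):
    """Linear-method (chi-square distance) calibration of robust totals.

    t_pop : initial robust estimate at the population level (index 0)
    t_dom : initial robust domain estimates (indices 1..G)
    q_pop, q_dom : non-negative weights q_0 and q_1..q_G
    Returns (final_pop, final_dom) such that final_pop == sum(final_dom).
    """
    t = [t_pop] + list(t_dom)
    q = [q_pop] + list(q_dom)
    delta = [-1.0] + [1.0] * len(t_dom)

    # Inconsistency to remove: sum of domain estimates minus population estimate.
    gap = sum(d * x for d, x in zip(delta, t))
    scale = sum(w * x for w, x in zip(q, t))
    if scale == 0:
        raise ValueError("at least one q_g must be positive")

    lam = gap / scale
    final = [x - d * w * x * lam for x, d, w in zip(t, delta, q)]
    return final[0], final[1:]

# Hypothetical initial robust estimates: the domains add up to 950, not 1000.
final_pop, final_dom = calibrate_totals(1000.0, [300.0, 400.0, 250.0],
                                        1.0, [1.0, 1.0, 1.0])
```

With equal weights, the population-level estimate is pulled down and the domain estimates are pulled up until the two sides agree.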
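We make the following remarks:

(i) If $q_g = 0$, then the final robust estimate $\hat{t}^{F}_{yg}$ is identical to the initial robust estimate $\hat{t}^{R}_{yg}$. Thus, if we want to ensure that the initial estimate in domain $g$ is not modified excessively, we simply associate it with a small value of $q_g$. This point is also illustrated empirically in Section 6.2.

(ii) Note that, like the initial robust estimates at the domain level, $\hat{t}^{R}_{yg}$ for $g = 1, \ldots, G$, the initial robust estimate at the population level, $\hat{t}^{R}_{y0}$, can also be modified.

(iii) If $q_0 = 0$ (in other words, the initial robust estimate for the population level is not modified) and $q_g = q$ for $g = 1, \ldots, G$, where $q$ is a strictly positive constant, expression (5.3) simplifies to
$$\hat{t}^{F}_{yg} = \hat{t}^{R}_{yg} \, \frac{\hat{t}^{R}_{y0}}{\sum_{h=1}^{G} \hat{t}^{R}_{yh}}, \qquad g = 1, \ldots, G.$$
In this case, the initial estimates $\hat{t}^{R}_{yg}$ are all modified by the same factor, $\hat{t}^{R}_{y0} / \sum_{h=1}^{G} \hat{t}^{R}_{yh}$.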
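Remarks (i) and (iii) can be checked numerically. The sketch below assumes the chi-square (linear) solution with index 0 for the population level. The helper `linear_calibration` and the figures are hypothetical.

```python
def linear_calibration(t, q):
    """t[0], q[0]: population level; t[1:], q[1:]: domains. Returns final estimates."""
    delta = [-1.0] + [1.0] * (len(t) - 1)
    lam = (sum(d * x for d, x in zip(delta, t))
           / sum(w * x for w, x in zip(q, t)))
    return [x - d * w * x * lam for x, d, w in zip(t, delta, q)]

# Hypothetical initial robust estimates: population first, then three domains.
t_init = [1000.0, 300.0, 400.0, 250.0]

# Remark (iii): q_0 = 0 and a common q > 0 give a pure ratio adjustment.
final_ratio = linear_calibration(t_init, [0.0, 2.5, 2.5, 2.5])
factor = t_init[0] / sum(t_init[1:])

# Remark (i): a zero weight leaves the corresponding estimate untouched
# (here the second domain, whose initial estimate is 400).
final_fixed = linear_calibration(t_init, [1.0, 1.0, 0.0, 1.0])
```

In the first call every domain estimate is multiplied by `factor` regardless of the value chosen for the common constant; in the second, the whole adjustment is borne by the population level and the two remaining domains.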
(iv) How can we set the values of $q_g$ in practice? It seems natural to adopt the following choice: $q_g = \widehat{\mathrm{CV}}_g$, where $\widehat{\mathrm{CV}}_g$ is the estimated coefficient of variation (CV) associated with domain $g$. For example, in a repeated survey, the estimated CV observed in a previous iteration can be used. This choice of $q_g$ is based on the fact that we will not want to make a large change in the initial estimate associated with a domain that has a small estimated CV. In such a domain, the problem of influential values is clearly less serious, and the initial robust estimate $\hat{t}^{R}_{yg}$ is expected to be relatively close to the actual total $t_{yg}$. In other words, the robust estimator $\hat{t}^{R}_{yg}$ should have low bias and be relatively stable. It therefore makes sense not to attempt to change the initial robust estimate substantially.

(v) In (5.2), we used the generalized
chi-square distance, which leads to the linear method. The literature on calibration (e.g., Deville and Särndal 1992) offers a number of other calibration methods. In particular, the Kullback-Leibler distance leads to the exponential method, and there are also the logit and truncated linear methods. Using the last two methods, we can specify positive bounds $L$ and $U$ such that $L \le \hat{t}^{F}_{yg} / \hat{t}^{R}_{yg} \le U$. In other words, we ensure that the ratio $\hat{t}^{F}_{yg} / \hat{t}^{R}_{yg}$ falls within the interval between $L$ and $U$. Note that the calibration procedure may lead to $\hat{t}^{F}_{yg} > \hat{t}_{yg}$ for a certain $g$, which is counterintuitive. In this case, we simply include the constraint $\hat{t}^{F}_{yg} \le \hat{t}_{yg}$ for $g = 0, 1, \ldots, G$ in the calibration procedure.
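A bounded calibration in the spirit of the truncated linear method can be sketched as follows. It is an illustration under stated assumptions, not the authors' implementation: all initial estimates are positive, the weights are taken to be hypothetical estimated CVs as suggested in remark (iv), the bounds satisfy lower < 1 < upper, and the Lagrange multiplier is found by simple bisection. The function name `truncated_linear_calibration` is hypothetical.

```python
def truncated_linear_calibration(t, q, lower, upper, tol=1e-12, max_iter=200):
    """Bounded calibration: every ratio final/initial is clipped to [lower, upper].

    t[0], q[0] refer to the population level; t[1:], q[1:] to the domains.
    """
    delta = [-1.0] + [1.0] * (len(t) - 1)

    def ratios(lam):
        return [min(max(1.0 + w * d * lam, lower), upper)
                for w, d in zip(q, delta)]

    def gap(lam):
        # Sum of final domain estimates minus final population estimate.
        return sum(d * x * r for d, x, r in zip(delta, t, ratios(lam)))

    # gap() is non-decreasing in lam: widen a bracket around zero, then bisect.
    lo, hi = -1.0, 1.0
    for _ in range(60):
        if gap(lo) <= 0.0 <= gap(hi):
            break
        lo, hi = 2.0 * lo, 2.0 * hi
    else:
        raise ValueError("bounds too tight: the calibration equation cannot be met")

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break

    lam = 0.5 * (lo + hi)
    return [x * r for x, r in zip(t, ratios(lam))]

# Hypothetical initial robust estimates and estimated CVs used as weights.
t_init = [1000.0, 300.0, 400.0, 250.0]
cv = [0.02, 0.05, 0.10, 0.30]
final = truncated_linear_calibration(t_init, cv, lower=0.95, upper=1.05)
```

In this example the unbounded linear method would raise the last domain by 10%. With the bounds in place its change is capped at 5%, and the remaining adjustment is redistributed to the other estimates.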
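(vi) An alternative is to express $\hat{t}^{F}_{yg}$ as a weighted sum of the initial values using modified weights:
$$\hat{t}^{F}_{yg} = \sum_{i \in S_g} \tilde{d}^{F}_{i} y_i,$$
where $\tilde{d}^{F}_{i} = \tilde{d}_i \, \hat{t}^{F}_{yg} / \hat{t}^{R}_{yg}$ and $\tilde{d}_i$ is given by either (4.3) or (4.6). We can also write the estimator $\hat{t}^{F}_{yg}$ as a weighted sum with the initial weights using modified values:
$$\hat{t}^{F}_{yg} = \sum_{i \in S_g} d_i \tilde{y}^{F}_{i},$$
where $\tilde{y}^{F}_{i} = \tilde{y}_i \, \hat{t}^{F}_{yg} / \hat{t}^{R}_{yg}$ and $\tilde{y}_i$ is given by either (4.1) or (4.4).

(vii) We may want to find the winsorization thresholds $K_g$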
such that the standard winsorized estimator or the Dalén-Tambay winsorized estimator is equal to $\hat{t}^{F}_{yg}$. We can follow a procedure similar to the one in Section 4, and we can use an algorithm similar to the one in the Appendix. A necessary condition for the existence of a solution is that $\hat{t}^{F}_{yg} \le \hat{t}_{yg}$.
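To illustrate remark (vii), the sketch below searches for a threshold by bisection. It is not the algorithm from the Appendix. It assumes non-negative values and a standard winsorized estimator in which each weighted value $d_i y_i$ is capped at the threshold; the function names and data are hypothetical.

```python
def winsorized_total(weights, values, threshold):
    """Standard winsorized estimator: each weighted value d_i*y_i is capped."""
    return sum(min(d * y, threshold) for d, y in zip(weights, values))

def find_threshold(weights, values, target, tol=1e-10):
    """Bisection for the threshold at which the winsorized total equals target.

    Assumes non-negative values; a solution requires
    0 <= target <= expansion estimate.
    """
    expansion = sum(d * y for d, y in zip(weights, values))
    if not 0.0 <= target <= expansion:
        raise ValueError("no threshold exists: target exceeds the expansion estimate")
    lo, hi = 0.0, max(d * y for d, y in zip(weights, values))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # The winsorized total is non-decreasing in the threshold.
        if winsorized_total(weights, values, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical domain sample with expansion estimate 205 and a final
# robust estimate of 190 to be reproduced by winsorization.
weights = [10.0, 10.0, 5.0, 5.0, 20.0]
values = [3.0, 7.0, 12.0, 1.0, 2.0]
threshold = find_threshold(weights, values, target=190.0)
```

The explicit range check mirrors the existence condition: winsorization can only lower the expansion estimate, so a target above it cannot be reached.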
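(viii) With the proposed calibration procedure, more than one partition of the population can be dealt with jointly. For example, we may be interested in publishing both provincial estimates and industry estimates. If so, we simply insert the following calibration equations into the calibration procedure:
$$\sum_{g=1}^{G_1} \hat{t}^{F}_{yg} = \hat{t}^{F}_{y0} \qquad \text{and} \qquad \sum_{g=G_1+1}^{G_1+G_2} \hat{t}^{F}_{yg} = \hat{t}^{F}_{y0},$$
where $G_1$ and $G_2$ denote the number of provinces and the number of industries respectively, with the provinces indexed by $g = 1, \ldots, G_1$ and the industries by $g = G_1 + 1, \ldots, G_1 + G_2$. The method can also be applied to more than two partitions of the population.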
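A joint calibration over two partitions can be sketched under the chi-square distance. There are now two calibration equations and therefore two Lagrange multipliers, obtained here from a 2 x 2 linear system. The sketch assumes positive initial estimates and non-negative weights; the function name `calibrate_two_partitions` and the figures are hypothetical.

```python
def calibrate_two_partitions(t_pop, q_pop, t_prov, q_prov, t_ind, q_ind):
    """Chi-square calibration so that provinces and industries both sum to the
    final population-level estimate.

    Returns (final_pop, final_prov, final_ind).
    """
    w0 = q_pop * t_pop
    wp = sum(q * t for q, t in zip(q_prov, t_prov))
    wi = sum(q * t for q, t in zip(q_ind, t_ind))

    # Initial inconsistencies of the two partitions.
    r1 = sum(t_prov) - t_pop
    r2 = sum(t_ind) - t_pop

    # 2 x 2 system for the two Lagrange multipliers.
    m11, m22, m12 = w0 + wp, w0 + wi, w0
    det = m11 * m22 - m12 * m12
    if det <= 0.0:
        raise ValueError("weights do not allow both constraints to be met")
    lam1 = (m22 * r1 - m12 * r2) / det
    lam2 = (m11 * r2 - m12 * r1) / det

    final_pop = t_pop + w0 * (lam1 + lam2)
    final_prov = [t * (1.0 - q * lam1) for q, t in zip(q_prov, t_prov)]
    final_ind = [t * (1.0 - q * lam2) for q, t in zip(q_ind, t_ind)]
    return final_pop, final_prov, final_ind

# Hypothetical initial robust estimates: both partitions add up to 950,
# while the population-level estimate is 1000.
final_pop, final_prov, final_ind = calibrate_two_partitions(
    1000.0, 1.0,
    [600.0, 350.0], [1.0, 1.0],
    [300.0, 400.0, 250.0], [1.0, 1.0, 1.0],
)
```

The same construction extends to more partitions by adding one multiplier and one row of the linear system per partition.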