Small area benchmarked estimation under the basic unit level model when the sampling rates are non‑negligible
Section 3. Benchmarked estimators
We now proceed to develop benchmarked estimators of the small area means $\bar{Y}_i$, $i = 1, \dots, m$, using the unit level model (2.2) or augmented versions of it. We assume that a reliable direct estimator $\hat{Y}_{GR}$ of the population total $Y$ is available, where $Y = \sum_{i=1}^{m} Y_i$ and $Y_i = N_i \bar{Y}_i$ is the total of small area $i$. Let $\hat{\bar{Y}}_i$ be the model-based small area estimator of $\bar{Y}_i$. It is desirable to ensure that the aggregated values of the $\hat{\bar{Y}}_i$ agree with the reliable estimator $\hat{Y}_{GR}$. The small area mean estimators $\hat{\bar{Y}}_i$ are said to be benchmarked to $\hat{Y}_{GR}$ if

$\sum_{i=1}^{m} N_i \hat{\bar{Y}}_i = \hat{Y}_{GR}$.  (3.1)
Let $\hat{Y}_{GR}$ be a GREG estimator with weights calibrated at the population level on a vector of auxiliary variables $\mathbf{z}$. This estimator is analogous to the combined regression estimator if one views the small areas as strata. The vector of auxiliary variables $\mathbf{z}$ may or may not be the same as the vector $\mathbf{x}$ of model (2.2). We distinguish two cases in this context: $\mathbf{x} \subseteq \mathbf{z}$ and $\mathbf{x} \not\subseteq \mathbf{z}$. The first case, $\mathbf{x} \subseteq \mathbf{z}$, implies that all the components of $\mathbf{x}$ also belong to $\mathbf{z}$, and that $\mathbf{z}$ may or may not have additional components that are different from those contained in $\mathbf{x}$. The second case, $\mathbf{x} \not\subseteq \mathbf{z}$, implies that some of the components of $\mathbf{x}$ do not appear in $\mathbf{z}$. We assume that the first component of both vectors $\mathbf{x}$ and $\mathbf{z}$ is equal to one, as it represents an intercept term.
For a given sample $s = \cup_{i=1}^{m} s_i$, auxiliary data $\mathbf{z}_{ij}$ and basic design weights $d_{ij}$, the GREG estimator of the population total $Y$ is given by $\hat{Y}_{GR} = \sum_{i=1}^{m} \sum_{j \in s_i} w_{ij} y_{ij}$, where the GREG weights $w_{ij}$ are given by

$w_{ij} = d_{ij} \{ 1 + \mathbf{z}_{ij}^{T} \hat{\mathbf{T}}^{-1} (\mathbf{Z} - \hat{\mathbf{Z}}) \}$, with $\hat{\mathbf{T}} = \sum_{i=1}^{m} \sum_{j \in s_i} d_{ij} \mathbf{z}_{ij} \mathbf{z}_{ij}^{T}$.  (3.2)

In equation (3.2), $\mathbf{Z} = \sum_{i=1}^{m} \mathbf{Z}_i$, where $\mathbf{Z}_i$ represents the known small area total of the auxiliary variables, whereas $\hat{\mathbf{Z}}$ and $\hat{Y}$ represent respectively the direct design-based Horvitz-Thompson estimators of $\mathbf{Z}$ and $Y$. Note that

$\sum_{i=1}^{m} \sum_{j \in s_i} w_{ij} \mathbf{z}_{ij} = \mathbf{Z}$.  (3.3)
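For illustration, the calibration property (3.3) can be checked numerically. The sketch below uses the textbook GREG weight form $w_{ij} = d_{ij}\{1 + \mathbf{z}_{ij}^{T} \hat{\mathbf{T}}^{-1}(\mathbf{Z} - \hat{\mathbf{Z}})\}$, which may differ in detail from the paper's equation (3.2); the sample size, design weights and population totals are simulated stand-ins.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated stand-ins: z has an intercept column, d are basic design weights,
# and Z holds the (assumed known) population totals of the z-variables.
n = 50
z = np.column_stack([np.ones(n), rng.uniform(1.0, 10.0, n)])
d = np.full(n, 20.0)
y = 3.0 + 2.0 * z[:, 1] + rng.normal(0.0, 1.0, n)
Z = np.array([1000.0, 5600.0])

Z_ht = z.T @ d                         # Horvitz-Thompson estimate of Z
T = (z * d[:, None]).T @ z             # sum over j of d_j z_j z_j'
w = d * (1.0 + z @ np.linalg.solve(T, Z - Z_ht))   # GREG weights

# Calibration: the GREG weights reproduce the known totals exactly
print(w @ z)                           # equals Z up to rounding
y_greg = w @ y                         # GREG estimator of the total of y
```

Whatever the sampled values, the weights reproduce the known totals of the calibration variables exactly, which is the property used repeatedly in the benchmarking results below.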
Using the GREG weights $w_{ij}$, estimators of the small area means $\bar{Y}_i$ and of the auxiliary means $\bar{\mathbf{X}}_i$ are given by (3.4).
The small area estimates $\hat{\bar{Y}}_i^{\mathrm{EBLUP}}$ and $\hat{\bar{Y}}_i^{\mathrm{YR}}$, given respectively by (2.11) and (2.13), do not satisfy the benchmarking equation (3.1); that is, the total estimates $\sum_{i=1}^{m} N_i \hat{\bar{Y}}_i^{\mathrm{EBLUP}}$ and $\sum_{i=1}^{m} N_i \hat{\bar{Y}}_i^{\mathrm{YR}}$ do not match the GREG estimator $\hat{Y}_{GR}$. We need to adjust $\hat{\bar{Y}}_i^{\mathrm{EBLUP}}$ and $\hat{\bar{Y}}_i^{\mathrm{YR}}$ so that the modified small area estimators add up to $\hat{Y}_{GR}$ when they are summed over all the small areas.
A very simple modification of the $\hat{\bar{Y}}_i^{\mathrm{EBLUP}}$ and $\hat{\bar{Y}}_i^{\mathrm{YR}}$ is called ratio benchmarking. It consists of multiplying each $\hat{\bar{Y}}_i^{\mathrm{EBLUP}}$ and $\hat{\bar{Y}}_i^{\mathrm{YR}}$ by the common adjustment factors $\hat{Y}_{GR} / \sum_{i=1}^{m} N_i \hat{\bar{Y}}_i^{\mathrm{EBLUP}}$ and $\hat{Y}_{GR} / \sum_{i=1}^{m} N_i \hat{\bar{Y}}_i^{\mathrm{YR}}$ respectively, leading to the ratio benchmarked estimators $\hat{\bar{Y}}_{i,b}^{\mathrm{EBLUP}}$ and $\hat{\bar{Y}}_{i,b}^{\mathrm{YR}}$ of (3.5). It readily follows that both $\hat{\bar{Y}}_{i,b}^{\mathrm{EBLUP}}$ and $\hat{\bar{Y}}_{i,b}^{\mathrm{YR}}$ satisfy equation (3.1). In equation (3.5) and hereafter, the subscript $b$ denotes that the estimators are benchmarked to $\hat{Y}_{GR}$. Note that the $\hat{\bar{Y}}_i^{\mathrm{EBLUP}}$ and $\hat{\bar{Y}}_i^{\mathrm{YR}}$ in equation (3.5) are multiplied by the same factor regardless of their precision, ignoring particular small area characteristics such as the variability of the units within a small area or the small area sample size. Consequently, the resulting benchmarked estimators $\hat{\bar{Y}}_{i,b}^{\mathrm{EBLUP}}$ and $\hat{\bar{Y}}_{i,b}^{\mathrm{YR}}$ based on this simple procedure are just proportional modifications of the estimators $\hat{\bar{Y}}_i^{\mathrm{EBLUP}}$ and $\hat{\bar{Y}}_i^{\mathrm{YR}}$, respectively, made to obtain the desired concordance. This limitation can be avoided by using the small area model (2.2) to construct the benchmarked estimators.
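Ratio benchmarking amounts to a single rescaling of all area means. A minimal sketch, with all area sizes, mean estimates and the benchmark total simulated for illustration:

```python
import numpy as np

# Hypothetical inputs: model-based area mean estimates, area population
# sizes, and a reliable GREG estimate of the overall total.
ybar_hat = np.array([4.2, 5.1, 3.8, 6.0])   # model-based small area means
N = np.array([120, 80, 150, 100])           # small area population sizes
y_greg = 2300.0                             # benchmark total (GREG)

# Common ratio adjustment factor applied to every area mean
factor = y_greg / np.sum(N * ybar_hat)
ybar_bench = factor * ybar_hat

# The benchmarked estimates now satisfy equation (3.1)
print(np.sum(N * ybar_bench))               # equals y_greg
```

Because every area receives the same factor, the adjustment ignores area-specific precision, which is exactly the limitation noted above.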
We now proceed to show how model (2.2) can be used to obtain estimators benchmarked to $\hat{Y}_{GR}$. In Sections 3.1 and 3.2 we adapt the procedures in Stefan and Hidiroglou (2020) for obtaining benchmarked estimators to the case of non-negligible sampling rates. In Sections 3.3 and 3.4 we introduce two restricted benchmarked estimators based on the procedure proposed by Ugarte et al. (2009). The benchmarked estimators of Sections 3.1 and 3.2 rely on the assumption that $\mathbf{x} \subseteq \mathbf{z}$, whereas the estimators of Sections 3.3 and 3.4 can be computed for any vectors $\mathbf{x}$ or $\mathbf{z}$.
3.1 Augmented EBLUP benchmarked estimators
The GREG weights $w_{ij}$ should be used in the estimation to achieve benchmarking to $\hat{Y}_{GR}$. A possible way that $w_{ij}$ can be incorporated in the estimation is by augmenting the small area model (2.2) with a suitable auxiliary variable that is a function of $w_{ij}$. This procedure is based on the augmented model approach used by Wang et al. (2008), whereby estimates obtained using the FH area-level model could be forced to add up to specified totals. Stefan and Hidiroglou (2020) adapted the Wang et al. (2008) approach under the basic unit-level model and for negligible sampling rates. They showed that benchmarking to $\hat{Y}_{GR}$ could be obtained by augmenting model (2.2) with the GREG weights $w_{ij}$. We extend Stefan and Hidiroglou (2020) to the case when the sampling rates are non-negligible. For this case, benchmarking to $\hat{Y}_{GR}$ is achieved by augmenting model (2.2) with an auxiliary variable built from the GREG weights $w_{ij}$ that accounts for the non-negligible sampling fractions. This leads to the augmented model given by (3.6).
The random effects $v_i$ are assumed to be i.i.d. $N(0, \sigma_v^2)$ and independent of the unit errors $e_{ij}$, and the $e_{ij}$ are assumed to be i.i.d. $N(0, \sigma_e^2)$. The EBLUP estimators of $\boldsymbol{\beta}$ and $v_i$ in (3.6) are respectively denoted by $\hat{\boldsymbol{\beta}}_a$ and $\hat{v}_{i,a}$. We can now state Result 1 for $\hat{\boldsymbol{\beta}}_a$ and $\hat{v}_{i,a}$.
Result 1. The EBLUP estimators $\hat{\boldsymbol{\beta}}_a$ and $\hat{v}_{i,a}$ based on model (3.6) obey equation (3.7).

Proof: See Appendix A.
It follows from equation (3.7) that small area estimators benchmarked to $\hat{Y}_{GR}$ are given by (3.8). The subscript $a$ indicates that the estimator is based on an augmented small area model.
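The mechanics of fitting a random-intercept model whose fixed part has been augmented with an extra covariate can be sketched with Henderson's mixed model equations. Everything below (the design, the augmenting column `w_aug`, and the variance components, taken as known) is a simulated stand-in, not the paper's specification of model (3.6) or of equations (3.7) and (3.8).

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: m areas with n_i units each, one covariate plus a
# hypothetical weight-based augmenting variable in the fixed part.
m, n_i = 5, 8
n = m * n_i
groups = np.repeat(np.arange(m), n_i)
x1 = rng.uniform(0.0, 5.0, n)
w_aug = rng.uniform(0.5, 2.0, n)                  # stand-in augmenting variable
X = np.column_stack([np.ones(n), x1, w_aug])      # augmented fixed-effects design
Zmat = (groups[:, None] == np.arange(m)).astype(float)
sigma_v2, sigma_e2 = 1.0, 0.5                     # known variance components

v_true = rng.normal(0.0, np.sqrt(sigma_v2), m)
y = X @ np.array([2.0, 1.5, 0.3]) + Zmat @ v_true + rng.normal(0.0, np.sqrt(sigma_e2), n)

# Henderson's mixed model equations give the BLUE of beta and BLUP of v
k = sigma_e2 / sigma_v2
top = np.hstack([X.T @ X, X.T @ Zmat])
bot = np.hstack([Zmat.T @ X, Zmat.T @ Zmat + k * np.eye(m)])
rhs = np.concatenate([X.T @ y, Zmat.T @ y])
sol = np.linalg.solve(np.vstack([top, bot]), rhs)
beta_hat, v_hat = sol[:3], sol[3:]                # estimates of beta and v_i
```

In practice the variance components are unknown and must be estimated, which is where the EBLUP (and the positivity issue discussed in Sections 3.3 and 3.4) enters.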
3.2 You-Rao benchmarked estimators
The procedure proposed by You and Rao (2002) can be used with any survey weights. However, there is no guarantee that the resulting YR estimator will be benchmarked to $\hat{Y}_{GR}$. When the sampling rates are negligible, Stefan and Hidiroglou (2020) obtained benchmarked estimators with the You and Rao (2002) procedure based on the weights $w_{ij}$ of the GREG estimator. When the sampling rates are non-negligible, we now show that the weights $w_{ij}$ lead to YR benchmarked estimators.

Let $\hat{\boldsymbol{\beta}}_w$ and $\hat{v}_{i,w}$ be YR estimators of $\boldsymbol{\beta}$ and $v_i$, respectively, with the design weights $d_{ij}$ replaced by $w_{ij}$. Using $\hat{\boldsymbol{\beta}}_w$ and the estimates $\hat{v}_{i,w}$ for $v_i$, a YR estimator, denoted $\hat{\bar{Y}}_{i,w}^{\mathrm{YR}}$, can be computed with equation (2.13). However, $\hat{\bar{Y}}_{i,w}^{\mathrm{YR}}$ is not benchmarked to $\hat{Y}_{GR}$ even though it uses the weights $w_{ij}$: the original YR procedure leads to a self-benchmarked estimator in a limited number of cases only. To achieve the benchmark to $\hat{Y}_{GR}$, a modified YR estimator, denoted $\hat{\bar{Y}}_{i,bw}^{\mathrm{YR}}$, is defined by (3.9).
The following result proves that the estimator $\hat{\bar{Y}}_{i,bw}^{\mathrm{YR}}$ defined by (3.9) benchmarks to $\hat{Y}_{GR}$.

Result 2. Let $\hat{\boldsymbol{\beta}}_w$ and $\hat{v}_{i,w}$ be respectively the YR estimators of $\boldsymbol{\beta}$ and $v_i$ constructed with the weights $w_{ij}$. Then, the estimators $\hat{\bar{Y}}_{i,bw}^{\mathrm{YR}}$ satisfy the benchmarking equation (3.1).

Proof: See Appendix A.
The weights $w_{ij}$ are calibrated on $\mathbf{z}$ at the small area level if they satisfy the following equations:

$\sum_{j \in s_i} w_{ij} \mathbf{z}_{ij} = \mathbf{Z}_i$, $i = 1, \dots, m$.  (3.10)

Equation (3.10) implies equation (3.3); however, the reverse is not true. If the weights $w_{ij}$ satisfy (3.10), and since $\mathbf{x} \subseteq \mathbf{z}$, it follows that the weights $w_{ij}$ are also calibrated on $\mathbf{x}$ at the small area level. In turn, this implies that $\sum_{j \in s_i} w_{ij} = N_i$, as we assume that the vector $\mathbf{z}$ contains the constant regressor equal to 1. It follows that the YR estimator constructed with the weights $w_{ij}$ is self-benchmarked to $\hat{Y}_{GR}$ in the special case when the GREG weights are calibrated at the small area level (see You and Rao, 2002).
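The implication from area-level calibration (3.10) to the population-level property (3.3), and the recovery of the area counts from the intercept component, can be illustrated numerically. The sketch calibrates within each of a few simulated areas using the textbook GREG weight form; all sizes and totals are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(11)

m, n_i = 4, 30
N = np.array([300.0, 320.0, 340.0, 360.0])   # hypothetical area sizes
pop_Z = np.zeros(2)                          # accumulates the known totals Z
weighted_sums = np.zeros(2)                  # accumulates sum_j w_ij z_ij
ok_area, ok_count = [], []

for i in range(m):
    z = np.column_stack([np.ones(n_i), rng.uniform(1.0, 10.0, n_i)])
    d = np.full(n_i, N[i] / n_i)             # equal design weights in area i
    Z_i = np.array([N[i], 5.5 * N[i]])       # known area totals; first entry = N_i
    T = (z * d[:, None]).T @ z
    w = d * (1.0 + z @ np.linalg.solve(T, Z_i - z.T @ d))
    ok_area.append(np.allclose(w @ z, Z_i))  # equation (3.10) holds in area i
    ok_count.append(np.isclose(w.sum(), N[i]))  # intercept column gives N_i
    weighted_sums += w @ z
    pop_Z += Z_i

# Summing (3.10) over areas gives the population-level calibration (3.3)
print(all(ok_area), all(ok_count), np.allclose(weighted_sums, pop_Z))
```

The converse fails: weights calibrated only at the population level need not satisfy (3.10) in any individual area, which is why self-benchmarking of the YR estimator is a special case.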
3.3 Restricted EBLUP benchmarked estimator
In Section 2 we showed that the EBLUP estimators of $\boldsymbol{\beta}$ and $v_i$ can be obtained if the function $Q$ defined in (2.5) is minimized with respect to $\boldsymbol{\beta}$ and $v_1, \dots, v_m$. It therefore follows that an EBLUP estimator can be viewed as the solution to an unrestricted minimization problem. The idea of restricted EBLUP estimators is to obtain new estimators of $\boldsymbol{\beta}$ and $v_i$ by minimizing $Q$ subject to the restriction given by the benchmark condition. The procedure was used by Pfeffermann and Barnard (1991) under the FH area-level model. More recently, Ugarte et al. (2009) applied the procedure under the BHF unit-level model to obtain benchmarking to a synthetic estimator. Ugarte et al. (2009) described the restricted estimator as a generalized least squares estimator subject to a restriction, noting that the minimization can be conducted as in the econometric theory of regression estimation under linear constraints. We now describe the procedure of Ugarte et al. (2009).
We denote by $\tilde{\boldsymbol{\beta}}$ and $\tilde{v}_i$ the new restricted EBLUP estimators of $\boldsymbol{\beta}$ and $v_i$. Then, the restricted EBLUP estimator of $\bar{Y}_i$, denoted $\hat{\bar{Y}}_{i,r}^{\mathrm{EBLUP}}$, is given by equation (2.4), where $\hat{\boldsymbol{\beta}}$ and $\hat{v}_i$ are replaced by $\tilde{\boldsymbol{\beta}}$ and $\tilde{v}_i$, for $i = 1, \dots, m$. We impose that the estimators $\hat{\bar{Y}}_{i,r}^{\mathrm{EBLUP}}$ be benchmarked to $\hat{Y}_{GR}$; that is, they must satisfy equation (3.1). After carrying out some algebra, it can be shown that the benchmark to $\hat{Y}_{GR}$ of the estimators $\hat{\bar{Y}}_{i,r}^{\mathrm{EBLUP}}$ is equivalent to the linear constraint equation (3.11), where $Y_{i(r)}$ is the total of the non-observed values $y_{ij}$, $j \notin s_i$, and $\hat{Y}_{i(r)}$ is an estimator of $Y_{i(r)}$ based on $\tilde{\boldsymbol{\beta}}$ and $\tilde{v}_i$. The restricted EBLUP estimators $\tilde{\boldsymbol{\beta}}$ and $\tilde{v}_i$ are therefore obtained as the solution to the minimization of the function $Q$ given by (2.5) subject to the linear constraint (3.11).
The Lagrange multiplier method can be used to solve the constrained minimization of $Q$. After straightforward algebra, it can be shown that the estimators $\tilde{\boldsymbol{\beta}}$ and $\tilde{v}_i$ are given by (3.12), where $\hat{\boldsymbol{\beta}}$ and $\hat{v}_i$ are the (unconstrained) EBLUP estimators of $\boldsymbol{\beta}$ and $v_i$, and the covariance matrix appearing in (3.12) is the empirical version of the matrix defined in (2.7). Then, using $\tilde{\boldsymbol{\beta}}$ and $\tilde{v}_i$ in (2.4), the estimator $\hat{\bar{Y}}_{i,r}^{\mathrm{EBLUP}}$ can be rewritten as (3.13).
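The structure of such a Lagrange solution can be sketched in the standard constrained generalized least squares form used by Pfeffermann and Barnard (1991): the restricted estimator equals the unconstrained one plus a correction proportional to the constraint discrepancy. The names, dimensions and matrices below are hypothetical illustrations, not the paper's equation (3.12).

```python
import numpy as np

rng = np.random.default_rng(3)

p = 5
theta_hat = rng.normal(size=p)          # unconstrained estimates (hypothetical)
A = rng.normal(size=(p, p))
Omega = A @ A.T + np.eye(p)             # SPD matrix playing the role of a covariance
a = rng.normal(size=p)                  # constraint vector
c = 2.0                                 # benchmark value

# Lagrange-multiplier solution of:
#   min (theta - theta_hat)' Omega^{-1} (theta - theta_hat)  s.t.  a' theta = c
adj = Omega @ a / (a @ Omega @ a)
theta_tilde = theta_hat + adj * (c - a @ theta_hat)

print(a @ theta_tilde)                  # equals c: the constraint is satisfied
```

If the unconstrained estimates already satisfy the benchmark, the correction term vanishes and the restricted and unrestricted estimators coincide.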
Remark 2. The matrix inverse required in (3.12) does not exist for samples for which $\hat{\sigma}_v^2 = 0$. In such cases, we noticed that equation (2.8) cannot be used to compute the unconstrained estimators $\hat{\boldsymbol{\beta}}$ and $\hat{v}_i$. However, they can still be computed when $\hat{\sigma}_v^2 = 0$ because the alternative equation (2.9) can be used in that case. Equation (3.12) clearly shows that the constrained estimators $\tilde{\boldsymbol{\beta}}$ and $\tilde{v}_i$ cannot be computed for samples for which the estimator $\hat{\sigma}_v^2$ is truncated to zero, and no alternative equation exists in these cases.

It therefore follows that the methods of estimation for the variance components commonly used in SAE cannot be used to compute the restricted EBLUP estimator. In Section 3.4 and Appendix B we describe an alternative method that produces a strictly positive estimate of $\sigma_v^2$ and that can be applied in conjunction with the restricted procedure, such that a restricted benchmarked estimator of $\bar{Y}_i$ always exists.
3.4 Restricted You-Rao benchmarked estimator
We showed in Section 2.2 that YR estimators of $\boldsymbol{\beta}$ and $v_i$ can be obtained as the solution to mixed model equations derived by minimizing the sample weighted function $Q_w$ given by (2.14). That is, we showed that, by defining a function $Q_w$ with suitable weights and then minimizing it, we obtain the same estimators as those given by the You and Rao (2002) procedure. We now minimize the function $Q_w$ under the benchmark constraint given by (3.11). The result is a restricted YR estimator that is benchmarked to $\hat{Y}_{GR}$. Minimization of $Q_w$ given the benchmark restriction (3.11) results in estimators of $\boldsymbol{\beta}$ and $v_i$ that are guaranteed to be benchmarked for any weights that define the function $Q_w$. Thus, one may choose any set of weights in $Q_w$. In a limited design-based simulation study, we compared three restricted YR estimators based on three different choices of weights. We found no significant difference between these three estimators in terms of design mean squared error. Given this last point, and since the unrestricted benchmarked YR estimators described in Section 3.2 were based on the GREG weights $w_{ij}$, we chose to define the restricted YR estimator based on these weights.
Let $Q_w$ be defined in terms of the GREG weights $w_{ij}$. Minimization of $Q_w$ with respect to $\boldsymbol{\beta}$ and $v_i$, subject to the benchmark constraint (3.11), results in the restricted YR estimators of $\boldsymbol{\beta}$ and $v_i$, denoted $\tilde{\boldsymbol{\beta}}_w$ and $\tilde{v}_{i,w}$. They are given by (3.14), where the estimators $\hat{\boldsymbol{\beta}}_w$ and $\hat{v}_{i,w}$ are given by (2.15), and the covariance matrix appearing in (3.14) is the empirical version of the matrix given by (2.16). Using $\tilde{\boldsymbol{\beta}}_w$ and $\tilde{v}_{i,w}$, restricted YR estimates of the unobserved values $y_{ij}$, $j \notin s_i$, are then used to compute a benchmarked restricted YR estimator, denoted $\hat{\bar{Y}}_{i,rw}^{\mathrm{YR}}$.
As in the case of the restricted EBLUP estimator, the estimators given by (3.14) do not exist if FC, ML or REML estimation results in a truncated estimate of $\sigma_v^2$. Consequently, $\bar{Y}_i$ can only be estimated by the restricted estimators when the method of estimation for the variance components always leads to strictly positive estimates of $\sigma_v^2$. A null estimate of $\sigma_v^2$ poses no problem in computing EBLUP and YR estimators. However, we noticed that the restricted EBLUP and the restricted YR estimators cannot be computed if $\hat{\sigma}_v^2 = 0$. In order to get around this problem, we use a method proposed by Moghtased-Azar, Tehranchi and Amiri-Simkooei (2014) that guarantees that the estimator of $\sigma_v^2$ will be strictly positive. This method is based on the concept of re-parameterized restricted maximum likelihood estimation (reREML). Their idea is to use functions whose range is the set of all positive real numbers, namely positive-valued functions (PVFs), for the unknown variance components in the stochastic model, instead of using the variance components themselves. Their numerical results showed successful estimation of variance components as strictly positive values, as well as of covariance components (as negative or positive values).
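The PVF idea can be seen on a one-parameter toy problem: writing $\sigma^2 = \exp(\lambda)$ and Fisher-scoring on $\lambda$ yields an estimate that is strictly positive by construction. This is only an illustration of the re-parameterization, not the Appendix B algorithm for model (2.2).

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy problem: estimate sigma^2 for x_i ~ N(0, sigma^2) via lambda = log(sigma^2),
# so exp(lambda) is strictly positive whatever value the iterations reach.
x = rng.normal(0.0, 1.5, size=200)
n, S = x.size, float(np.sum(x**2))

lam = 0.0                                       # starting value
for it in range(50):
    score = -n / 2 + 0.5 * np.exp(-lam) * S     # d log-likelihood / d lambda
    fisher = n / 2                              # expected information in lambda
    step = score / fisher
    lam += step                                 # Fisher-scoring update
    if abs(step) < 1e-10:
        break

sigma2 = np.exp(lam)                            # strictly positive estimate
print(sigma2, S / n)                            # scoring recovers the ML solution S/n
```

The estimate coincides with the usual ML solution whenever the latter is interior, but the exponential map can never return a null or negative variance, which is the property needed for the restricted estimators above.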
We used a Fisher-scoring algorithm to obtain iteratively the reREML estimates of the variance components of the basic unit-level model given by (2.2) (see Appendix B for details). We also carried out a small simulation and found that, for area sample sizes equal to or larger than 3, the Fisher-scoring algorithm converged in fewer than 15 iterations. When we considered only the samples that produced a null estimate $\hat{\sigma}_v^2 = 0$, we observed that the algorithm converged even faster (see Figure 4.1 in Section 4).