Small area benchmarked estimation under the basic unit level model when the sampling rates are non‑negligible
Section 2. EBLUP and pseudo-EBLUP estimation
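Consider the one-fold nested error regression model
$$y_{ij} = \mathbf{x}_{ij}^T\boldsymbol{\beta} + v_i + e_{ij}, \quad j = 1,\ldots,N_i,\; i = 1,\ldots,m, \tag{2.1}$$
where $y_{ij}$ is the variable of interest for the $j$-th population unit in the $i$-th small area, $\mathbf{x}_{ij}$ is a $p \times 1$ vector of auxiliary variables with components $x_{ij1},\ldots,x_{ijp}$, $\boldsymbol{\beta}$ is a $p \times 1$ vector of regression parameters and $N_i$ is the number of population units in the $i$-th small area, $i = 1,\ldots,m$. The random small area effects $v_i$ are assumed to be i.i.d. with mean zero and variance $\sigma_v^2$, and independent of the unit errors $e_{ij}$, which are assumed i.i.d. with mean zero and variance $\sigma_e^2$.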
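We draw samples $s_i$ of size $n_i$ independently within each small area according to a specified sampling design with first-order inclusion probabilities denoted by $\pi_{ij}$ for unit $j$ of area $i$. The total sample size is $n = \sum_{i=1}^{m} n_i$. The resulting basic design weights are given by $d_{ij} = 1/\pi_{ij}$. We assume that the sample design is ignorable, and that selection bias is absent. This implies that model (2.1) also holds for the sample data:
$$y_{ij} = \mathbf{x}_{ij}^T\boldsymbol{\beta} + v_i + e_{ij}, \quad j = 1,\ldots,n_i,\; i = 1,\ldots,m. \tag{2.2}$$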
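Model (2.2) is a special case of the general linear mixed model. Defining $\mathbf{y}_i = (y_{i1},\ldots,y_{in_i})^T$ and $\mathbf{X}_i = (\mathbf{x}_{i1},\ldots,\mathbf{x}_{in_i})^T$, it follows that model (2.2) can be expressed in matrix form by stacking the observations. The resulting equation is
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{v} + \mathbf{e},$$
where $\mathbf{y} = (\mathbf{y}_1^T,\ldots,\mathbf{y}_m^T)^T$, $\mathbf{X} = (\mathbf{X}_1^T,\ldots,\mathbf{X}_m^T)^T$, $\mathbf{v} = (v_1,\ldots,v_m)^T$, $\mathbf{e}$ is the stacked vector of the unit errors $e_{ij}$, and $\mathbf{Z} = \mathrm{diag}(\mathbf{1}_{n_1},\ldots,\mathbf{1}_{n_m})$ with $\mathbf{1}_{n_i}$ a vector of dimension $n_i$ composed of ones. We denote by $\mathbf{G}$ and $\mathbf{R}$ the variance matrices of the random vectors $\mathbf{v}$ and $\mathbf{e}$ respectively. Then $\mathbf{G} = \sigma_v^2\mathbf{I}_m$ and $\mathbf{R} = \sigma_e^2\mathbf{I}_n$. It follows that the variance matrix of vector $\mathbf{y}$, denoted as $\mathbf{V}$, is given by
$$\mathbf{V} = \mathbf{R} + \mathbf{Z}\mathbf{G}\mathbf{Z}^T.$$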
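As a small numerical illustration of the stacked form, the sketch below assembles $\mathbf{Z}$, $\mathbf{G}$, $\mathbf{R}$ and $\mathbf{V} = \mathbf{R} + \mathbf{Z}\mathbf{G}\mathbf{Z}^T$ for a toy sample. The function name `build_variance_matrices`, the area sizes and the variance values are assumptions chosen for illustration only.

```python
import numpy as np

def build_variance_matrices(area_sizes, sigma_v2, sigma_e2):
    """Return Z, G, R and V = R + Z G Z^T for the stacked sample model."""
    m = len(area_sizes)
    n = int(np.sum(area_sizes))
    # Area index of each stacked observation: 0,...,0,1,...,1,...
    area = np.repeat(np.arange(m), area_sizes)
    # Row k of Z has a single 1 in the column of the area containing unit k,
    # so Z = diag(1_{n_1}, ..., 1_{n_m}).
    Z = np.eye(m)[area]
    G = sigma_v2 * np.eye(m)   # variance matrix of the area effects v
    R = sigma_e2 * np.eye(n)   # variance matrix of the unit errors e
    V = R + Z @ G @ Z.T        # variance matrix of y
    return Z, G, R, V

# Toy example (hypothetical values): m = 2 areas with n_1 = 2 and n_2 = 3.
Z, G, R, V = build_variance_matrices([2, 3], sigma_v2=0.5, sigma_e2=1.0)
print(V)
```

The resulting $\mathbf{V}$ is block diagonal: units in the same area share the covariance $\sigma_v^2$, and units in different areas are uncorrelated.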
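The parameters of interest are the small area means $\bar{Y}_i = N_i^{-1}\sum_{j=1}^{N_i} y_{ij}$, where $i = 1,\ldots,m$. If $N_i$ is large, the sampling fraction $f_i = n_i/N_i$ of the $i$-th small area is negligible. This set-up corresponds to the case of an infinite population or negligible sampling rates. It follows that the small area means $\bar{Y}_i$ can be approximated by $\mu_i = \bar{\mathbf{X}}_i^T\boldsymbol{\beta} + v_i$ (see Rao and Molina, 2015, page 174), where $\bar{\mathbf{X}}_i = N_i^{-1}\sum_{j=1}^{N_i}\mathbf{x}_{ij}$ is the vector of population means of the $\mathbf{x}_{ij}$ for the $i$-th area. An estimator of $\mu_i$ is given by $\hat{\mu}_i = \bar{\mathbf{X}}_i^T\hat{\boldsymbol{\beta}} + \hat{v}_i$ (Rao and Molina, 2015, page 175), where $\hat{\boldsymbol{\beta}}$ and $\hat{v}_i$ are estimators of $\boldsymbol{\beta}$ and $v_i$ respectively. If $N_i$ is not large enough or if the sampling rates $f_i$ are not negligible, parameters $\bar{Y}_i$ cannot be approximated by linear combinations of $\boldsymbol{\beta}$ and $v_i$. This corresponds to the case of a finite population. Let $\bar{s}_i$ be the set of the $N_i - n_i$ units with unobserved $y$-values in small area $i$. If we assume that we know the $\mathbf{x}_{ij}$ for each individual in the population, an estimator $\hat{\bar{Y}}_i$ of $\bar{Y}_i$ is based on the observed values $y_{ij}$, $j \in s_i$, and predicted values $\hat{y}_{ij}$ for $j \in \bar{s}_i$. That is, estimator $\hat{\bar{Y}}_i$ is given by
$$\hat{\bar{Y}}_i = N_i^{-1}\Big(\sum_{j \in s_i} y_{ij} + \sum_{j \in \bar{s}_i} \hat{y}_{ij}\Big). \tag{2.4}$$
Much of the SAE theory deals with the infinite population case, whereas the literature on the finite population case is more limited. In this paper we focus on the finite population (or non‑negligible sampling rates) case, thereby constructing estimators based on (2.4).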
2.1 EBLUP estimation
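We denote by $\tilde{\boldsymbol{\beta}}$ and $\tilde{\mathbf{v}}$ the BLUP predictors of $\boldsymbol{\beta}$ and $\mathbf{v}$ respectively. These estimators are given by
$$\tilde{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{V}^{-1}\mathbf{y}$$
and
$$\tilde{\mathbf{v}} = \mathbf{G}\mathbf{Z}^T\mathbf{V}^{-1}(\mathbf{y} - \mathbf{X}\tilde{\boldsymbol{\beta}}).$$
Under the normality assumption of $\mathbf{v}$ and $\mathbf{e}$, it can be shown that $\tilde{\boldsymbol{\beta}}$ and $\tilde{\mathbf{v}}$ can be obtained by maximizing the joint density of $\mathbf{y}$ and $\mathbf{v}$ with respect to $\boldsymbol{\beta}$ and $\mathbf{v}$. This is equivalent to minimizing the function
$$\phi(\boldsymbol{\beta}, \mathbf{v}) = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\mathbf{v})^T\mathbf{R}^{-1}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\mathbf{v}) + \mathbf{v}^T\mathbf{G}^{-1}\mathbf{v}.$$
This leads to the following mixed model equations
$$\begin{pmatrix} \mathbf{X}^T\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}^T\mathbf{R}^{-1}\mathbf{Z} \\ \mathbf{Z}^T\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}^T\mathbf{R}^{-1}\mathbf{Z} + \mathbf{G}^{-1} \end{pmatrix} \begin{pmatrix} \tilde{\boldsymbol{\beta}} \\ \tilde{\mathbf{v}} \end{pmatrix} = \begin{pmatrix} \mathbf{X}^T\mathbf{R}^{-1}\mathbf{y} \\ \mathbf{Z}^T\mathbf{R}^{-1}\mathbf{y} \end{pmatrix} \tag{2.6}$$
(see Rao and Molina, 2015, page 99 for details). The variance components $(\sigma_v^2, \sigma_e^2)$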
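in equations (2.6) and (2.7) are generally unknown. Three methods of estimation, fitting-of-constants (FC), maximum likelihood (ML) and restricted maximum likelihood (REML), are commonly used in SAE to estimate the variance components $(\sigma_v^2, \sigma_e^2)$. A well-known difficulty with these methods is that the estimate of $\sigma_v^2$ can take on negative values. This estimate is truncated to zero when this occurs, that is, $\hat{\sigma}_v^2$ is set to 0. Empirical versions of $\mathbf{G}$ and $\mathbf{R}$, denoted as $\hat{\mathbf{G}}$ and $\hat{\mathbf{R}}$, are obtained if the unknown variance components $(\sigma_v^2, \sigma_e^2)$ are replaced by estimators $(\hat{\sigma}_v^2, \hat{\sigma}_e^2)$. It follows from equation (2.6) that EBLUP estimators of model parameters $\boldsymbol{\beta}$ and $\mathbf{v}$, denoted as $\hat{\boldsymbol{\beta}}$ and $\hat{\mathbf{v}}$, are given by
$$\begin{pmatrix} \mathbf{X}^T\hat{\mathbf{R}}^{-1}\mathbf{X} & \mathbf{X}^T\hat{\mathbf{R}}^{-1}\mathbf{Z} \\ \mathbf{Z}^T\hat{\mathbf{R}}^{-1}\mathbf{X} & \mathbf{Z}^T\hat{\mathbf{R}}^{-1}\mathbf{Z} + \hat{\mathbf{G}}^{-1} \end{pmatrix} \begin{pmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\mathbf{v}} \end{pmatrix} = \begin{pmatrix} \mathbf{X}^T\hat{\mathbf{R}}^{-1}\mathbf{y} \\ \mathbf{Z}^T\hat{\mathbf{R}}^{-1}\mathbf{y} \end{pmatrix}. \tag{2.8}$$
Using (2.8), it can be proved that $\hat{\boldsymbol{\beta}}$ and $\hat{\mathbf{v}}$ are
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\hat{\mathbf{V}}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\hat{\mathbf{V}}^{-1}\mathbf{y}, \qquad \hat{\mathbf{v}} = \hat{\mathbf{G}}\mathbf{Z}^T\hat{\mathbf{V}}^{-1}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}), \tag{2.9}$$
where $\hat{\mathbf{V}} = \hat{\mathbf{R}} + \mathbf{Z}\hat{\mathbf{G}}\mathbf{Z}^T$, with $\hat{\mathbf{G}} = \hat{\sigma}_v^2\mathbf{I}_m$ and $\hat{\mathbf{R}} = \hat{\sigma}_e^2\mathbf{I}_n$.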
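Remark 1. It is easier to invert matrices $\hat{\mathbf{R}}$ and $\hat{\mathbf{G}}$ than $\hat{\mathbf{V}}$. Consequently, it is simpler to use the mixed model equations (2.8) than equations (2.9) for computing $\hat{\boldsymbol{\beta}}$ and $\hat{\mathbf{v}}$. However, when $\hat{\sigma}_v^2$ is equal to zero, equations (2.8) cannot be used because the $\hat{\mathbf{G}}^{-1}$ term in the coefficient matrix of (2.8) does not exist. In such cases, $\hat{\boldsymbol{\beta}}$ and $\hat{\mathbf{v}}$ can only be computed using (2.9).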
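The point of Remark 1 can be checked numerically. The sketch below is an illustration under assumed toy data, with hypothetical function names `blup_explicit` and `blup_mme`. It computes the predictors once from the explicit formulas based on the inverse of the variance matrix of $\mathbf{y}$, and once from the mixed model equations, with the variance components treated as given. The two routes agree when the area-effect variance is positive, and only the explicit route remains usable when it is zero.

```python
import numpy as np

def blup_explicit(y, X, Z, sigma_v2, sigma_e2):
    """Explicit formulas: need V^{-1} but never G^{-1}."""
    n, m = Z.shape
    G = sigma_v2 * np.eye(m)
    V = sigma_e2 * np.eye(n) + Z @ G @ Z.T
    V_inv = np.linalg.inv(V)
    beta = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
    v = G @ Z.T @ V_inv @ (y - X @ beta)
    return beta, v

def blup_mme(y, X, Z, sigma_v2, sigma_e2):
    """Mixed model equations: need G^{-1}, hence sigma_v2 > 0."""
    if sigma_v2 <= 0:
        raise ValueError("G is singular when sigma_v2 = 0; use blup_explicit")
    p, m = X.shape[1], Z.shape[1]
    # With R = sigma_e2 * I, multiplying both sides by sigma_e2 leaves
    # G^{-1} replaced by (sigma_e2 / sigma_v2) * I.
    lhs = np.block([
        [X.T @ X, X.T @ Z],
        [Z.T @ X, Z.T @ Z + (sigma_e2 / sigma_v2) * np.eye(m)],
    ])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:p], sol[p:]

# Toy data (hypothetical): 3 areas with 3, 4 and 5 sampled units.
rng = np.random.default_rng(0)
area = np.repeat(np.arange(3), [3, 4, 5])
Z = np.eye(3)[area]
X = np.column_stack([np.ones(12), rng.normal(size=12)])
y = X @ np.array([1.0, 2.0]) + Z @ rng.normal(size=3) + rng.normal(size=12)

beta_a, v_a = blup_explicit(y, X, Z, sigma_v2=0.8, sigma_e2=1.2)
beta_b, v_b = blup_mme(y, X, Z, sigma_v2=0.8, sigma_e2=1.2)
print(np.allclose(beta_a, beta_b), np.allclose(v_a, v_b))

# Truncated case sigma_v2 = 0: only the explicit route works, and v is zero.
beta_0, v_0 = blup_explicit(y, X, Z, sigma_v2=0.0, sigma_e2=1.2)
```

In practice the variance components would be replaced by their FC, ML or REML estimates; they are fixed here only to keep the sketch short.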
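Under model (2.2), it can be shown that $\hat{\boldsymbol{\beta}}$ and $\hat{\mathbf{v}} = (\hat{v}_1,\ldots,\hat{v}_m)^T$ in (2.9) satisfy
$$\sum_{i=1}^{m}\sum_{j \in s_i}\mathbf{x}_{ij}\,(y_{ij} - \mathbf{x}_{ij}^T\hat{\boldsymbol{\beta}} - \hat{v}_i) = \mathbf{0}, \qquad \hat{v}_i = \hat{\gamma}_i\,(\bar{y}_i - \bar{\mathbf{x}}_i^T\hat{\boldsymbol{\beta}}), \quad i = 1,\ldots,m, \tag{2.10}$$
where $\hat{\gamma}_i = \hat{\sigma}_v^2/(\hat{\sigma}_v^2 + \hat{\sigma}_e^2/n_i)$, and $\bar{y}_i = n_i^{-1}\sum_{j \in s_i} y_{ij}$ and $\bar{\mathbf{x}}_i = n_i^{-1}\sum_{j \in s_i}\mathbf{x}_{ij}$ are the sample means in area $i$. Estimators $\hat{\boldsymbol{\beta}}$ and $\hat{v}_i$ are used to compute EBLUP predictions $\hat{y}_{ij}^{\mathrm{EBLUP}}$ for the unobserved units in small area $i$: $\hat{y}_{ij}^{\mathrm{EBLUP}} = \mathbf{x}_{ij}^T\hat{\boldsymbol{\beta}} + \hat{v}_i$ for $j \in \bar{s}_i$, the set of non-sampled units of area $i$. An EBLUP estimator of $\bar{Y}_i$, denoted as $\hat{\bar{Y}}_i^{\mathrm{EBLUP}}$, is obtained by replacing in (2.4) $\hat{y}_{ij}$ by $\hat{y}_{ij}^{\mathrm{EBLUP}}$. It follows that $\hat{\bar{Y}}_i^{\mathrm{EBLUP}}$ is
$$\hat{\bar{Y}}_i^{\mathrm{EBLUP}} = N_i^{-1}\Big\{\sum_{j \in s_i} y_{ij} + \mathbf{x}_{i\bar{s}}^T\hat{\boldsymbol{\beta}} + (N_i - n_i)\,\hat{v}_i\Big\},$$
where $\mathbf{x}_{i\bar{s}} = \sum_{j \in \bar{s}_i}\mathbf{x}_{ij}$ represents the sum of non-sampled values $\mathbf{x}_{ij}$.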
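To make the finite-population estimator concrete, the following sketch computes the EBLUP area mean in two equivalent ways: by predicting each non-sampled unit and averaging as in (2.4), and through the closed form that uses the sum of the non-sampled auxiliary vectors. The function name `eblup_area_mean` and all numerical values are hypothetical; the regression and area-effect estimates are taken as given.

```python
import numpy as np

def eblup_area_mean(y_sample, x_nonsampled, beta_hat, v_hat_i):
    """Finite-population EBLUP mean of one area.

    y_sample     : observed y-values of the n_i sampled units
    x_nonsampled : (N_i - n_i) x p matrix of auxiliary values, non-sampled units
    beta_hat     : estimated regression coefficients (length p)
    v_hat_i      : estimated area effect (scalar)
    """
    y_sample = np.asarray(y_sample, dtype=float)
    x_nonsampled = np.asarray(x_nonsampled, dtype=float)
    n_nonsampled = x_nonsampled.shape[0]
    N_i = y_sample.size + n_nonsampled

    # Route 1: predict every non-sampled unit, then average as in (2.4).
    y_pred = x_nonsampled @ beta_hat + v_hat_i
    unit_level = (y_sample.sum() + y_pred.sum()) / N_i

    # Route 2: closed form based on the sum of non-sampled x-values.
    x_sum = x_nonsampled.sum(axis=0)
    closed_form = (y_sample.sum() + x_sum @ beta_hat + n_nonsampled * v_hat_i) / N_i
    return unit_level, closed_form

# Hypothetical area with n_i = 2 sampled and N_i - n_i = 3 non-sampled units.
unit_level, closed_form = eblup_area_mean(
    y_sample=[2.0, 4.0],
    x_nonsampled=[[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]],
    beta_hat=np.array([1.0, 0.5]),
    v_hat_i=0.2,
)
print(unit_level, closed_form)
```

The closed form only requires the area total of the auxiliary variables over non-sampled units, not the unit-level values themselves.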
2.2 You-Rao estimation
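You and Rao (2002) proposed a pseudo-EBLUP small area mean estimator (YR estimator) that incorporates the design weights $d_{ij}$ into the formula of the EBLUP estimator. A property of the pseudo-EBLUP estimator is that design consistency is preserved as the area sample size increases. Furthermore, the YR predictor offers protection against model failure or an informative sampling design (see, among others, Hidiroglou and Estevao, 2016 and Verret, Rao and Hidiroglou, 2015 for details). Pseudo-EBLUP estimators can be constructed using the procedure in You and Rao (2002) with survey weights $w_{ij}$ that may be calibrated on some vector of auxiliary variables. Let $\hat{\boldsymbol{\beta}}^{\mathrm{YR}}$ and $\hat{\mathbf{v}}^{\mathrm{YR}} = (\hat{v}_1^{\mathrm{YR}},\ldots,\hat{v}_m^{\mathrm{YR}})^T$ be the YR estimators of $\boldsymbol{\beta}$ and $\mathbf{v}$ respectively, based on weights $w_{ij}$ (see You and Rao, 2002 for details). The estimators $\hat{\boldsymbol{\beta}}^{\mathrm{YR}}$ and $\hat{\mathbf{v}}^{\mathrm{YR}}$ satisfy the unit-level estimating equations
$$\sum_{i=1}^{m}\sum_{j \in s_i} w_{ij}\,\mathbf{x}_{ij}\,(y_{ij} - \mathbf{x}_{ij}^T\hat{\boldsymbol{\beta}}^{\mathrm{YR}} - \hat{v}_i^{\mathrm{YR}}) = \mathbf{0}, \qquad \hat{v}_i^{\mathrm{YR}} = \hat{\gamma}_{iw}\,(\bar{y}_{iw} - \bar{\mathbf{x}}_{iw}^T\hat{\boldsymbol{\beta}}^{\mathrm{YR}}), \quad i = 1,\ldots,m, \tag{2.12}$$
where $\tilde{w}_{ij} = w_{ij}/\sum_{k \in s_i} w_{ik}$, $\bar{y}_{iw} = \sum_{j \in s_i}\tilde{w}_{ij}\,y_{ij}$, $\bar{\mathbf{x}}_{iw} = \sum_{j \in s_i}\tilde{w}_{ij}\,\mathbf{x}_{ij}$, and $\hat{\gamma}_{iw} = \hat{\sigma}_v^2/(\hat{\sigma}_v^2 + \hat{\sigma}_e^2\,\delta_i^2)$ with $\delta_i^2 = \sum_{j \in s_i}\tilde{w}_{ij}^2$. Equations (2.12) represent the survey-weighted version of equations (2.10). You-Rao predictions of $y_{ij}$ are computed as $\hat{y}_{ij}^{\mathrm{YR}} = \mathbf{x}_{ij}^T\hat{\boldsymbol{\beta}}^{\mathrm{YR}} + \hat{v}_i^{\mathrm{YR}}$ for $j \in \bar{s}_i$, the set of non-sampled units of area $i$. Replacing $\hat{y}_{ij}$ by $\hat{y}_{ij}^{\mathrm{YR}}$ in (2.4) leads to the YR estimator of $\bar{Y}_i$ in the case of non-negligible sampling rates:
$$\hat{\bar{Y}}_i^{\mathrm{YR}} = N_i^{-1}\Big(\sum_{j \in s_i} y_{ij} + \sum_{j \in \bar{s}_i}\hat{y}_{ij}^{\mathrm{YR}}\Big).$$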
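A minimal sketch of the You-Rao computation is given below, assuming the variance components are already estimated and using the shrinkage factor of You and Rao (2002), in which the within-area sum of squared normalized weights plays the role of $1/n_i$. The function name `you_rao_fit`, the simulated data and the weights are assumptions made for illustration.

```python
import numpy as np

def you_rao_fit(y, X, w, area, sigma_v2, sigma_e2):
    """Solve the survey-weighted estimating equations for beta and v."""
    m = int(area.max()) + 1
    p = X.shape[1]
    A = np.zeros((p, p))
    b = np.zeros(p)
    gamma = np.zeros(m)
    y_bar = np.zeros(m)
    x_bar = np.zeros((m, p))
    for i in range(m):
        idx = area == i
        w_i = w[idx]
        w_tilde = w_i / w_i.sum()              # normalized weights
        delta2 = np.sum(w_tilde ** 2)          # reduces to 1/n_i for equal weights
        gamma[i] = sigma_v2 / (sigma_v2 + sigma_e2 * delta2)
        y_bar[i] = w_tilde @ y[idx]            # weighted area mean of y
        x_bar[i] = w_tilde @ X[idx]            # weighted area mean of x
        wx = w_i[:, None] * X[idx]
        # Substituting v_i = gamma_i * (y_bar_i - x_bar_i' beta) into the
        # first estimating equation gives the linear system A beta = b.
        A += wx.T @ (X[idx] - gamma[i] * x_bar[i])
        b += wx.T @ (y[idx] - gamma[i] * y_bar[i])
    beta = np.linalg.solve(A, b)
    v = gamma * (y_bar - x_bar @ beta)
    return beta, v

# Toy data (hypothetical): 3 areas, unequal survey weights.
rng = np.random.default_rng(1)
area = np.repeat(np.arange(3), [3, 4, 5])
X = np.column_stack([np.ones(12), rng.normal(size=12)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=3)[area] + rng.normal(size=12)
w = rng.uniform(1.0, 5.0, size=12)

beta_yr, v_yr = you_rao_fit(y, X, w, area, sigma_v2=0.8, sigma_e2=1.2)

# YR mean for area 0, given auxiliary values of its (hypothetical) 2 non-sampled units.
x_non = np.array([[1.0, 0.3], [1.0, -0.7]])
y_s = y[area == 0]
mean_yr_area0 = (y_s.sum() + (x_non @ beta_yr + v_yr[0]).sum()) / (y_s.size + len(x_non))
print(beta_yr, v_yr, mean_yr_area0)
```

With all weights equal, the shrinkage factor reduces to the unweighted one and the fit coincides with the model-based predictor for the same variance components.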
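Estimators $\hat{\boldsymbol{\beta}}^{\mathrm{YR}}$ and $\hat{\mathbf{v}}^{\mathrm{YR}}$ can alternatively be obtained as solutions to weighted mixed model equations similar to (2.6) (see Huang and Hidiroglou, 2003 for details). To this end, we define matrices $\mathbf{W} = \mathrm{diag}(w_{ij})$, of dimension $n \times n$ and ordered as $\mathbf{y}$, and $\mathbf{A} = \mathrm{diag}(a_1,\ldots,a_m)$, where $a_i = \sum_{j \in s_i} w_{ij}^2 \big/ \sum_{j \in s_i} w_{ij}$ for $i = 1,\ldots,m$. Let $\phi_w$ be the sample weighted version of $\phi$, where
$$\phi_w(\boldsymbol{\beta}, \mathbf{v}) = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\mathbf{v})^T\mathbf{R}^{-1/2}\mathbf{W}\mathbf{R}^{-1/2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\mathbf{v}) + \mathbf{v}^T\mathbf{G}^{-1/2}\mathbf{A}\mathbf{G}^{-1/2}\mathbf{v},$$
with $\mathbf{R}^{-1/2}$ and $\mathbf{G}^{-1/2}$ representing the square root of matrices $\mathbf{R}^{-1}$ and $\mathbf{G}^{-1}$ respectively. In the first term of $\phi_w$, the model error associated with the observation $y_{ij}$ is weighted by the corresponding survey weight $w_{ij}$, whereas in the second term of $\phi_w$, the factor $a_i$ in the diagonal matrix $\mathbf{A}$ represents the weight attached to the small area effect $v_i$. It can be shown that the minimization of $\phi_w$ with respect to $\boldsymbol{\beta}$ and $\mathbf{v}$ leads to
$$\begin{pmatrix} \mathbf{X}^T\mathbf{R}^{-1/2}\mathbf{W}\mathbf{R}^{-1/2}\mathbf{X} & \mathbf{X}^T\mathbf{R}^{-1/2}\mathbf{W}\mathbf{R}^{-1/2}\mathbf{Z} \\ \mathbf{Z}^T\mathbf{R}^{-1/2}\mathbf{W}\mathbf{R}^{-1/2}\mathbf{X} & \mathbf{Z}^T\mathbf{R}^{-1/2}\mathbf{W}\mathbf{R}^{-1/2}\mathbf{Z} + \mathbf{G}^{-1/2}\mathbf{A}\mathbf{G}^{-1/2} \end{pmatrix} \begin{pmatrix} \boldsymbol{\beta} \\ \mathbf{v} \end{pmatrix} = \begin{pmatrix} \mathbf{X}^T\mathbf{R}^{-1/2}\mathbf{W}\mathbf{R}^{-1/2}\mathbf{y} \\ \mathbf{Z}^T\mathbf{R}^{-1/2}\mathbf{W}\mathbf{R}^{-1/2}\mathbf{y} \end{pmatrix}.$$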
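The equivalence between the weighted mixed model equations and the You-Rao estimating equations can be verified numerically. The sketch below assumes that the area-level factor in the second term is the ratio of the within-area sum of squared weights to the within-area sum of weights; under that assumption the solved area effects reproduce the You and Rao (2002) shrinkage form. The function name `weighted_mme` and the simulated inputs are hypothetical, and the variance components are treated as given and positive.

```python
import numpy as np

def weighted_mme(y, X, w, area, sigma_v2, sigma_e2):
    """Solve the survey-weighted mixed model equations for (beta, v)."""
    m = int(area.max()) + 1
    p = X.shape[1]
    Z = np.eye(m)[area]
    W = np.diag(w)
    # Assumed area-level factor: sum of squared weights over sum of weights.
    a = (Z.T @ (w ** 2)) / (Z.T @ w)
    # R^{-1/2} W R^{-1/2} = W / sigma_e2 and G^{-1/2} A G^{-1/2} = A / sigma_v2;
    # both sides are multiplied by sigma_e2 to keep the system simple.
    lam = sigma_e2 / sigma_v2
    lhs = np.block([
        [X.T @ W @ X, X.T @ W @ Z],
        [Z.T @ W @ X, Z.T @ W @ Z + lam * np.diag(a)],
    ])
    rhs = np.concatenate([X.T @ W @ y, Z.T @ W @ y])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:p], sol[p:]

# Toy data (hypothetical): 3 areas, unequal survey weights.
rng = np.random.default_rng(2)
area = np.repeat(np.arange(3), [3, 4, 5])
X = np.column_stack([np.ones(12), rng.normal(size=12)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=3)[area] + rng.normal(size=12)
w = rng.uniform(1.0, 5.0, size=12)
sigma_v2, sigma_e2 = 0.8, 1.2

beta_w, v_w = weighted_mme(y, X, w, area, sigma_v2, sigma_e2)

# Check against the You-Rao shrinkage form of the area effects.
Z = np.eye(3)[area]
w_sum = Z.T @ w
y_bar_w = (Z.T @ (w * y)) / w_sum
x_bar_w = (Z.T @ (w[:, None] * X)) / w_sum[:, None]
delta2 = (Z.T @ (w ** 2)) / w_sum ** 2
gamma_w = sigma_v2 / (sigma_v2 + sigma_e2 * delta2)
v_check = gamma_w * (y_bar_w - x_bar_w @ beta_w)
print(np.allclose(v_w, v_check))
```

As with the unweighted equations, this route requires a positive estimate of the area-effect variance, since the second term involves the inverse of $\mathbf{G}$.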
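It follows that $\hat{\boldsymbol{\beta}}^{\mathrm{YR}}$ and $\hat{\mathbf{v}}^{\mathrm{YR}}$ are given by
$$\begin{pmatrix} \hat{\boldsymbol{\beta}}^{\mathrm{YR}} \\ \hat{\mathbf{v}}^{\mathrm{YR}} \end{pmatrix} = \begin{pmatrix} \mathbf{X}^T\hat{\mathbf{R}}^{-1/2}\mathbf{W}\hat{\mathbf{R}}^{-1/2}\mathbf{X} & \mathbf{X}^T\hat{\mathbf{R}}^{-1/2}\mathbf{W}\hat{\mathbf{R}}^{-1/2}\mathbf{Z} \\ \mathbf{Z}^T\hat{\mathbf{R}}^{-1/2}\mathbf{W}\hat{\mathbf{R}}^{-1/2}\mathbf{X} & \mathbf{Z}^T\hat{\mathbf{R}}^{-1/2}\mathbf{W}\hat{\mathbf{R}}^{-1/2}\mathbf{Z} + \hat{\mathbf{G}}^{-1/2}\mathbf{A}\hat{\mathbf{G}}^{-1/2} \end{pmatrix}^{-1} \begin{pmatrix} \mathbf{X}^T\hat{\mathbf{R}}^{-1/2}\mathbf{W}\hat{\mathbf{R}}^{-1/2}\mathbf{y} \\ \mathbf{Z}^T\hat{\mathbf{R}}^{-1/2}\mathbf{W}\hat{\mathbf{R}}^{-1/2}\mathbf{y} \end{pmatrix}, \tag{2.15}$$
where $\mathbf{W}$ and $\mathbf{A}$ are the weight matrices defined above, $\mathbf{R}^{-1/2} = \sigma_e^{-1}\mathbf{I}_n$ and $\mathbf{G}^{-1/2} = \sigma_v^{-1}\mathbf{I}_m$, and $\hat{\mathbf{R}}^{-1/2}$ and $\hat{\mathbf{G}}^{-1/2}$ are empirical versions of $\mathbf{R}^{-1/2}$ and $\mathbf{G}^{-1/2}$ obtained by estimating $\sigma_v^2$ and $\sigma_e^2$ by $\hat{\sigma}_v^2$ and $\hat{\sigma}_e^2$ respectively. Equation (2.15) can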
alternatively be written as
where
and