Development of a small area estimation system at Statistics Canada
Section 4. Unit level model
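The original unit level model was proposed by Battese et al. (1988). They assumed the following nested error model

$$y_{ij} = \mathbf{x}_{ij}^{T} \boldsymbol{\beta} + v_i + e_{ij}, \qquad (4.1)$$

where $v_i \overset{\text{iid}}{\sim} N(0, \sigma_v^{2})$ are the random effects and are independent of the random errors, $e_{ij}$, with $e_{ij} \overset{\text{iid}}{\sim} N(0, \sigma_e^{2})$.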
The production system includes a slight modification to the error structure of the random errors. That is, $e_{ij} \overset{\text{ind}}{\sim} N(0, k_{ij}^{2} \sigma_e^{2})$, where $k_{ij}$ are positive constants that account for heteroscedasticity.
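To make the error structure concrete, the following is a minimal simulation sketch, not the production system's code. It assumes the heteroscedastic errors have variance `k_ij**2 * sigma_e**2` for known positive constants `k_ij` (one common parameterization). The function name, the single uniform covariate and the default seed are hypothetical choices made only for illustration.

```python
import numpy as np

def simulate_nested_error(n_per_area, beta, sigma_v, sigma_e, k=None, seed=0):
    """Draw y_ij = x_ij' beta + v_i + e_ij with Var(e_ij) = k_ij^2 * sigma_e^2.

    Assumption: the design matrix is an intercept plus one uniform covariate.
    """
    rng = np.random.default_rng(seed)
    n_per_area = np.asarray(n_per_area)
    m = len(n_per_area)
    area = np.repeat(np.arange(m), n_per_area)        # area label of each unit
    n = area.size
    X = np.column_stack([np.ones(n), rng.uniform(0.0, 10.0, size=n)])
    k = np.ones(n) if k is None else np.asarray(k, dtype=float)
    v = rng.normal(0.0, sigma_v, size=m)              # one random effect per area
    e = k * rng.normal(0.0, sigma_e, size=n)          # heteroscedastic unit errors
    y = X @ np.asarray(beta, dtype=float) + v[area] + e
    return area, X, y, v
```

Setting `k=None` gives the homoscedastic model of Battese et al. (1988); passing, say, the square root of a size measure as `k` makes the error variance grow with unit size.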
The production system computes small area
estimates for means
and totals
The
values are fixed positive constants known for
all population units. The addition of
was necessary to allow the use of the system
by some business surveys conducted at Statistics Canada (see Rubin-Bleuer, Jang
and Godbout, 2016). The available auxiliary data are either totals
or means
In what follows, we provide the estimators
of the population means
say
where
Estimates of the corresponding totals
are obtained by multiplying
by
The design weighted sample means of the
and
are respectively
and
The model
based weighted means are
and
Battese et al. (1988) did not include survey design weights in their procedure, thereby forsaking design consistency unless the design was self-weighting. We refer to this estimator as EBLUP. However, EBLUP is the most efficient estimator
under model (4.1), with error structure $e_{ij} \overset{\text{ind}}{\sim} N(0, k_{ij}^{2} \sigma_e^{2})$, and this is the reason that it is included in the production system.
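As a rough illustration of how an EBLUP of this kind is computed, the sketch below implements the standard textbook form for the nested error model with error variances `k_ij**2 * sigma_e2`. It is not the production system's implementation: it treats the variance components as given, assumes every area has at least one sampled unit, and ignores any finite-population adjustment. The function and argument names are hypothetical.

```python
import numpy as np

def area_means(Z, area, m, weights):
    """Weighted mean of the rows of Z within each area (one row per area)."""
    sums = np.zeros((m, Z.shape[1]))
    np.add.at(sums, area, weights[:, None] * Z)
    return sums / np.bincount(area, weights=weights, minlength=m)[:, None]

def eblup_area_means(area, X, y, Xbar_pop, sigma_v2, sigma_e2, k=None):
    """Textbook EBLUP of area means; Xbar_pop holds known population means of x."""
    m, p = Xbar_pop.shape
    k = np.ones(len(y)) if k is None else np.asarray(k, dtype=float)
    a = 1.0 / k**2                                    # a_ij = 1 / k_ij^2
    a_dot = np.bincount(area, weights=a, minlength=m)
    gamma = sigma_v2 / (sigma_v2 + sigma_e2 / a_dot)  # shrinkage factor per area
    xbar = area_means(X, area, m, a)                  # a-weighted sample means of x
    ybar = area_means(y[:, None], area, m, a)[:, 0]   # a-weighted sample means of y
    # Closed-form generalized least squares estimate of beta.
    Z = X - gamma[area, None] * xbar[area]
    A = (a[:, None] * X).T @ Z
    b = (a[:, None] * Z).T @ y
    beta = np.linalg.solve(A, b)
    # Regression-synthetic part plus a shrunken area-level residual.
    theta = Xbar_pop @ beta + gamma * (ybar - xbar @ beta)
    return theta, beta, gamma
```

The closed form for `beta` avoids building the full covariance matrix; it gives the same result as generalized least squares with the block-diagonal covariance implied by the model.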
Kott (1989), Prasad and Rao (1999), and You and Rao (2002) proposed design-consistent model based estimators of the area means that incorporate the survey weights. The You and Rao (2002) procedure was suitably modified to reflect the
heteroscedastic residuals and the
The resulting Pseudo-EBLUP estimator, denoted as PEBLUP, was included in the production system because it is design consistent.
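The survey-weighted idea can be sketched in the same way. The code below follows the homoscedastic pseudo-EBLUP of You and Rao (2002) as it is usually presented in textbooks; the modifications made for the production system (heteroscedastic residuals and the additional constants) are not reproduced. Variance components are treated as given, and all names are hypothetical.

```python
import numpy as np

def pseudo_eblup_area_means(area, X, y, w, Xbar_pop, sigma_v2, sigma_e2):
    """Survey-weighted (pseudo) EBLUP of area means, homoscedastic case."""
    m, p = Xbar_pop.shape
    w = np.asarray(w, dtype=float)
    w_dot = np.bincount(area, weights=w, minlength=m)
    w_tilde = w / w_dot[area]                         # weights normalized within area
    sum_w2 = np.bincount(area, weights=w_tilde**2, minlength=m)
    gamma_w = sigma_v2 / (sigma_v2 + sigma_e2 * sum_w2)
    # Design-weighted sample means of y and x in each area.
    ybar_w = np.bincount(area, weights=w_tilde * y, minlength=m)
    xbar_w = np.zeros((m, p))
    np.add.at(xbar_w, area, w_tilde[:, None] * X)
    # Survey-weighted estimating equation for beta.
    Z = X - gamma_w[area, None] * xbar_w[area]
    A = (w[:, None] * X).T @ Z
    b = (w[:, None] * Z).T @ y
    beta_w = np.linalg.solve(A, b)
    theta = Xbar_pop @ beta_w + gamma_w * (ybar_w - xbar_w @ beta_w)
    return theta, beta_w, gamma_w
```

With equal weights inside every area, `gamma_w` reduces to the unweighted shrinkage factor `sigma_v2 / (sigma_v2 + sigma_e2 / n_i)`. With `sigma_v2 = 0`, the estimator collapses to a weighted regression-synthetic estimator.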
The EBLUP estimator is defined as
where
The
terms
and
are the
previously defined model based weighted means for
and
respectively. The regression vector
is
estimated as
The PEBLUP estimator,
is given by
where
and
The
terms
and
are the
previously defined design based weighted means for
and
respectively. The regression vector
is
estimated as
where
and with
computed
as
The components of variance, $\sigma_e^{2}$ and $\sigma_v^{2}$, are estimated using the fitting-of-constants method (not weighted by the survey weights), as given by Battese et al. (1988) or Rao (2003). The resulting estimator of $\sigma_e^{2}$ is always greater than or equal to zero, but the estimator of $\sigma_v^{2}$ may be negative. If $\hat{\sigma}_v^{2} < 0$, it is set to zero, implying that there are no area effects. The associated estimated MSEs are obtained by extending You and Rao (2002) and Stukel and Rao (1997).
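The fitting-of-constants step, including the truncation at zero, can be sketched as follows. This is a minimal version for the homoscedastic case only (all heteroscedasticity constants equal to one), following the usual textbook presentation rather than the production system's code. It assumes the design matrix `X` contains an intercept column and that areas are labelled `0, ..., m-1`. The function names are hypothetical.

```python
import numpy as np

def _area_means(Z, area, m):
    """Unweighted mean of the rows of Z within each area."""
    sums = np.zeros((m, Z.shape[1]))
    np.add.at(sums, area, Z)
    return sums / np.bincount(area, minlength=m)[:, None]

def fit_constants(area, X, y):
    """Fitting-of-constants estimates of (sigma_e2, sigma_v2), homoscedastic case."""
    m = int(area.max()) + 1
    n, p = X.shape
    n_i = np.bincount(area, minlength=m)
    xbar = _area_means(X, area, m)
    ybar = _area_means(y[:, None], area, m)[:, 0]
    # Step 1: regress within-area deviations; this removes the area effects.
    Xc, yc = X - xbar[area], y - ybar[area]
    keep = ~np.all(np.isclose(Xc, 0.0), axis=0)       # drop columns that vanish (intercept)
    Xc = Xc[:, keep]
    p1 = Xc.shape[1]
    res1 = yc - Xc @ np.linalg.lstsq(Xc, yc, rcond=None)[0]
    sigma_e2 = res1 @ res1 / (n - m - p1)
    # Step 2: ordinary least squares on the raw data; area effects stay in the residuals.
    res2 = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    nx = n_i[:, None] * xbar                          # rows are n_i * xbar_i
    eta = n - np.trace(np.linalg.solve(X.T @ X, nx.T @ nx))
    sigma_v2_raw = (res2 @ res2 - (n - p) * sigma_e2) / eta
    # A negative raw estimate is truncated to zero (no area effects).
    return sigma_e2, max(sigma_v2_raw, 0.0), sigma_v2_raw
```

Returning the raw value alongside the truncated one makes it easy to see how often truncation occurs, which is typically when there are few areas or the area effects are small relative to the unit-level noise.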
Note that if the sample $s_i$ of size $n_i$ is selected from the universe $U_i$ of size $N_i$, the realized sampling fraction, $f_i = n_i / N_i$, could be non-negligible. For estimating a population mean, $\bar{Y}_i$, Rao and Molina (2015) accounted for non-negligible sampling fractions by expressing it as

$$\bar{Y}_i = f_i \bar{y}_i + (1 - f_i) \bar{y}_{ir},$$

where $\bar{y}_i$ is the mean of the sampled units in the area and $\bar{y}_{ir}$ is the mean of the non-sampled units within that area. They predicted $\bar{y}_{ir}$ using the unit level model given by equation (4.1). Their expressions correspond
to the case when
This estimator was extended by Rubin-Bleuer (2014) to include the EBLUP and PEBLUP estimators for the
case that
is
arbitrary. Specific
details that also account for MSE estimation can be found in Estevao et al. (2015).
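The decomposition of an area mean into a sampled part and a non-sampled part can be sketched in a few lines. This is only an illustration of the arithmetic; the function and argument names are hypothetical, and the model-based prediction for the non-sampled units is assumed to have been computed elsewhere.

```python
import numpy as np

def mean_with_sampling_fraction(ybar_sample, pred_nonsampled_mean, n_i, N_i):
    """Combine the observed sample mean with a predicted non-sampled mean.

    f = n_i / N_i is the realized sampling fraction in the area.
    """
    f = np.asarray(n_i, dtype=float) / np.asarray(N_i, dtype=float)
    return f * ybar_sample + (1.0 - f) * pred_nonsampled_mean
```

For example, with 20 sampled units out of 100, a sample mean of 10 and a predicted non-sampled mean of 15, the result is 0.2 × 10 + 0.8 × 15 = 14. When the area is fully enumerated (`n_i == N_i`), the sample mean is returned unchanged.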
4.1 Benchmarking
The current production system does not have a procedure
to benchmark the estimates obtained via the unit level model. However, the
difference adjustment approach can be suitably modified to allow this. The
EBLUP and PEBLUP estimators are of the form
where
and
correspond to the terms defined previously:
is equal
to
for
EBLUP, and to
for
PEBLUP;
is equal
to
for
EBLUP, and to
for
PEBLUP;
is equal
to
for
EBLUP, and to
for
PEBLUP; and,
is equal
to
for
EBLUP, and to
for
PEBLUP.
Suppose that
needs to be benchmarked to
The corresponding benchmarked estimator is
where
The
term is
defined as follows:
if the
benchmarking is to a total and
if the
benchmarking is for the mean. Possible choices of the
are
for EBLUP, and
for
PEBLUP.
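A generic difference adjustment can be sketched as below. It does not reproduce the specific adjustment factors referred to above: the positive factors `phi` are a hypothetical stand-in, and `omega` is assumed to hold the coefficients that aggregate the area estimates to the benchmark (for example, area population shares when benchmarking a mean, or population counts when the estimates are means and the benchmark is a total).

```python
import numpy as np

def difference_benchmark(theta, omega, benchmark, phi=None):
    """Shift area estimates so that sum(omega * adjusted) equals the benchmark.

    phi (hypothetical) controls how the overall discrepancy is shared across areas.
    """
    theta = np.asarray(theta, dtype=float)
    omega = np.asarray(omega, dtype=float)
    phi = np.ones_like(theta) if phi is None else np.asarray(phi, dtype=float)
    alpha = phi * omega / np.sum(phi * omega**2)      # satisfies sum(omega * alpha) == 1
    discrepancy = benchmark - np.sum(omega * theta)
    return theta + alpha * discrepancy
```

Because `sum(omega * alpha)` equals one, the adjusted estimates aggregate exactly to the benchmark whatever positive `phi` is chosen; larger `phi` values simply assign a larger share of the discrepancy to those areas.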
4.2 Mean squared error estimation
The mean squared error (MSE) estimates of the unit level estimators are obtained by estimating their MSEs under model (4.1)
and error structure $e_{ij} \overset{\text{ind}}{\sim} N(0, k_{ij}^{2} \sigma_e^{2})$. Table 4.1 displays these estimated MSEs.
Table 4.1
MSE estimates for the unit level estimators
Estimator | mse |
EBLUP | |
PEBLUP | |
The various
terms in Table 4.1 can be interpreted in
a similar way to those associated with the area level MSEs. The
denoted as
for EBLUP, and
for PEBLUP, account for most of the MSE if the number of areas is large. The
account for the estimation of
and the
account for the estimation of
and
The estimated variances of
and
are respectively given by
and
where
The specific form of the
terms and the estimated variances can be found
in Estevao et al. (2015).