Development of a small area estimation system at Statistics Canada
Section 3. Area level model
The area level small area estimator first appeared in the seminal paper of Fay and Herriot (1979). Following that paper, let the parameter of interest for area $i$ be $\theta_i$, $i = 1, \ldots, m$; common examples are totals, $\theta_i = Y_i$, or means, $\theta_i = \bar{Y}_i = Y_i / N_i$. As noted above, the vector of auxiliary variables may differ from the one used in direct estimation and is denoted as $\mathbf{z}_i$.
The area level model can be expressed as two equations. The first equation, commonly known as the sampling model, is given by

$\hat{\theta}_i = \theta_i + e_i, \quad (3.1)$

and expresses the direct estimate $\hat{\theta}_i$ in terms of the unknown parameter $\theta_i$ plus a random error $e_i$ due to sampling. The sampling errors $e_i$ are independently distributed with mean 0 and variance $\psi_i$; that is, $E_p(e_i \mid \theta_i) = 0$ and $V_p(e_i \mid \theta_i) = \psi_i$, where $E_p$ denotes expectation in terms of the sample design. Note that $\psi_i$ is also the design variance of $\hat{\theta}_i$ and is typically unknown.
The second equation, known as the linking model, is given by

$\theta_i = \mathbf{z}_i^{\top} \boldsymbol{\beta} + b_i v_i, \quad (3.2)$

and expresses the parameter $\theta_i$ as a fixed effect $\mathbf{z}_i^{\top} \boldsymbol{\beta}$ plus a random effect $v_i$ multiplied by $b_i$. In the production system, the $b_i$ term has a default value of one but can be specified by the user to control heteroscedastic errors or the impact of influential observations. The random effects $v_i$ are independently and identically distributed with mean 0 and unknown model variance $\sigma_v^2$; that is, $E_m(v_i) = 0$ and $V_m(v_i) = \sigma_v^2$, where $E_m$ denotes the model expectation and $V_m$ the model variance. The random errors $e_i$ are independent of the random effects $v_i$. The combination of the sampling model and linking model results in a single generalized linear mixed model (GLMM) given by

$\hat{\theta}_i = \mathbf{z}_i^{\top} \boldsymbol{\beta} + b_i v_i + e_i. \quad (3.3)$
From the Fay-Herriot model (3.3), we observe that $E_m(\hat{\theta}_i) = \mathbf{z}_i^{\top} \boldsymbol{\beta}$ and $V_m(\hat{\theta}_i) = \sigma_v^2 b_i^2 + \psi_i$, where $\psi_i$ is the smoothed design variance of $\hat{\theta}_i$. In general, we cannot treat $\psi_i$ as fixed, as it is not strictly a function of auxiliary data. If the $\psi_i$ and $\sigma_v^2$ are known, the solution to the GLMM yields the Best Linear Unbiased Predictor (BLUP),

$\tilde{\theta}_i = \gamma_i \hat{\theta}_i + (1 - \gamma_i)\, \mathbf{z}_i^{\top} \tilde{\boldsymbol{\beta}}, \quad (3.4)$

where $\gamma_i = \sigma_v^2 b_i^2 / (\sigma_v^2 b_i^2 + \psi_i)$ and

$\tilde{\boldsymbol{\beta}} = \left( \sum_{i=1}^{m} \frac{\mathbf{z}_i \mathbf{z}_i^{\top}}{\sigma_v^2 b_i^2 + \psi_i} \right)^{-1} \sum_{i=1}^{m} \frac{\mathbf{z}_i \hat{\theta}_i}{\sigma_v^2 b_i^2 + \psi_i}.$
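For illustration, the following Python sketch computes the shrinkage weights $\gamma_i$, the generalized least squares estimate $\tilde{\boldsymbol{\beta}}$ and the predictors $\tilde{\theta}_i$ of equation (3.4) for given values of $\sigma_v^2$ and of the $\psi_i$. The function and array names are illustrative only and are not part of the production system.

    import numpy as np

    def area_level_blup(theta_hat, Z, psi, sigma2_v, b=None):
        """BLUP of equation (3.4) for known model variance sigma2_v and variances psi."""
        m = len(theta_hat)
        b = np.ones(m) if b is None else b
        var_tot = sigma2_v * b**2 + psi              # total variance sigma_v^2 b_i^2 + psi_i
        gamma = sigma2_v * b**2 / var_tot            # shrinkage weights gamma_i
        w = 1.0 / var_tot                            # GLS weights
        A = (Z * w[:, None]).T @ Z                   # sum_i w_i z_i z_i'
        c = (Z * w[:, None]).T @ theta_hat           # sum_i w_i z_i theta_hat_i
        beta_tilde = np.linalg.solve(A, c)           # GLS estimate of beta
        synthetic = Z @ beta_tilde                   # synthetic part z_i' beta_tilde
        theta_blup = gamma * theta_hat + (1.0 - gamma) * synthetic
        return theta_blup, beta_tilde, gamma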
There are four recursive procedures for estimating $\sigma_v^2$ and $\boldsymbol{\beta}$ in the production system. The first three assume that $\psi_i$ is known, or that a smoothed version of it is available (see the following section for details). Under this assumption, the variance components can be computed via the Fay-Herriot procedure (FH) as outlined in Fay and Herriot (1979), the restricted maximum likelihood (REML), or the Adjusted Density Maximization (ADM) due to Li and Lahiri (2010). The fourth procedure, WF, due to Wang and Fuller (2003), assumes that $\psi_i$ is estimated by the direct variance estimator $\hat{\psi}_i$, provided that the sample in each area is large enough to compute it. The WF procedure does not require any smoothing of the estimated $\hat{\psi}_i$ values before estimating $\sigma_v^2$. Wang and Fuller (2003) carried out simulations with area sample sizes ranging from 9 to 36 and found that their procedure yielded reasonable estimates of $\theta_i$ as well as of its estimated mean squared error.
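As a concrete illustration of one such procedure, the sketch below solves the Fay-Herriot moment equation, $\sum_{i=1}^{m} (\hat{\theta}_i - \mathbf{z}_i^{\top} \tilde{\boldsymbol{\beta}})^2 / (\hat{\sigma}_v^2 b_i^2 + \psi_i) = m - p$, with $p$ the dimension of $\mathbf{z}_i$, using a simple Newton-type update. It is a minimal sketch of the FH moment method, not the scoring algorithm of the production system, and it truncates negative solutions to zero as discussed below.

    import numpy as np

    def fh_variance_estimate(theta_hat, Z, psi, b=None, tol=1e-8, max_iter=200):
        """Fay-Herriot moment estimator of the model variance sigma_v^2 (sketch)."""
        m, p = Z.shape
        b = np.ones(m) if b is None else b
        sigma2 = max(np.var(theta_hat) - np.mean(psi), 0.0)   # crude starting value
        for _ in range(max_iter):
            var_tot = sigma2 * b**2 + psi
            w = 1.0 / var_tot
            A = (Z * w[:, None]).T @ Z
            c = (Z * w[:, None]).T @ theta_hat
            beta = np.linalg.solve(A, c)
            resid = theta_hat - Z @ beta
            # moment equation: weighted residual sum of squares should equal m - p
            score = np.sum(resid**2 / var_tot) - (m - p)
            deriv = -np.sum(b**2 * resid**2 / var_tot**2)     # derivative of score in sigma2
            sigma2_new = max(sigma2 - score / deriv, 0.0)     # Newton step, truncated at zero
            if abs(sigma2_new - sigma2) < tol:
                return sigma2_new
            sigma2 = sigma2_new
        return sigma2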
The main difference between these four procedures is how the $\hat{\gamma}_i$ are computed. They are all based on an iterative scoring algorithm that obtains $\hat{\sigma}_v^2$ as an estimate of the model variance $\sigma_v^2$. The FH, REML, and WF procedures may yield values of $\hat{\sigma}_v^2$ that are smaller than zero. If this occurs, the $\hat{\sigma}_v^2$ are set to zero for both the FH and REML procedures. A drawback of truncating the estimated $\hat{\sigma}_v^2$ to zero is that the resulting small area estimator will be synthetic for all areas. Li and Lahiri (2010) suggested the ADM as a way to address the problem of obtaining negative $\hat{\sigma}_v^2$ by maximizing an adjusted likelihood defined as a product of the model variance and a standard likelihood. Although the ADM method always gives a positive solution for $\hat{\sigma}_v^2$, it should be used cautiously because it overestimates the model variance. The REML, FH and ADM procedures use the smoothed values $\tilde{\psi}_i$ of the estimated design variances $\hat{\psi}_i$ obtained from the sample, or some estimate provided by the user. For the WF procedure, if the estimate $\hat{\sigma}_v^2$ is negative or small relative to its estimated standard error, Wang and Fuller (2003) suggested replacing $\hat{\gamma}_i$ by an adjusted value.
Plugging $\hat{\sigma}_v^2$ and an estimate of $\boldsymbol{\beta}$ into the BLUP defined by equation (3.4) yields the Empirical Best Linear Unbiased Predictor (EBLUP), $\hat{\theta}_i^{\mathrm{EBLUP}}$. It is given by

$\hat{\theta}_i^{\mathrm{EBLUP}} = \hat{\gamma}_i \hat{\theta}_i + (1 - \hat{\gamma}_i)\, \mathbf{z}_i^{\top} \hat{\boldsymbol{\beta}},$

where $\hat{\gamma}_i = \hat{\sigma}_v^2 b_i^2 / (\hat{\sigma}_v^2 b_i^2 + \psi_i)$ and the value used for $\psi_i$ is chosen according to the procedure used. For the REML, FH and ADM procedures, the $\psi_i$ are the smoothed values $\tilde{\psi}_i$ of the estimated design variances obtained from the sample, or some estimate provided by the user. For the WF procedure, we have that $\psi_i = \hat{\psi}_i$. If the estimated model variance $\hat{\sigma}_v^2$ is relatively small compared with $\psi_i$, then $\hat{\gamma}_i$ will be small and more weight will be attached to the synthetic estimator $\mathbf{z}_i^{\top} \hat{\boldsymbol{\beta}}$. Similarly, more weight is attached to the direct estimator, $\hat{\theta}_i$, if the design variance $\psi_i$ is relatively small. Details of the required computations can be found in the methodology specifications for the production system in Estevao et al. (2015).
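For illustration, the two sketches above can be combined on simulated data; areas with a large $\psi_i$ receive a small $\hat{\gamma}_i$ and are pulled toward the synthetic estimator, as described in the previous paragraph. The data, seed and parameter values below are arbitrary.

    import numpy as np

    rng = np.random.default_rng(2015)
    m = 30
    Z = np.column_stack([np.ones(m), rng.normal(size=m)])    # intercept plus one covariate
    beta_true = np.array([10.0, 2.0])
    sigma2_v_true = 1.0
    psi = rng.uniform(0.5, 4.0, size=m)                      # smoothed design variances
    theta = Z @ beta_true + rng.normal(scale=np.sqrt(sigma2_v_true), size=m)
    theta_hat = theta + rng.normal(scale=np.sqrt(psi), size=m)

    sigma2_hat = fh_variance_estimate(theta_hat, Z, psi)
    eblup, beta_hat, gamma_hat = area_level_blup(theta_hat, Z, psi, sigma2_hat)
    # Areas with large psi_i get small gamma_i: the EBLUP leans on the synthetic part.
    print(np.round(np.column_stack([psi, gamma_hat])[:5], 2))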
3.1 Estimation of the smooth design variance
The design variance, $V_p(\hat{\theta}_i)$, could be used as an estimator of the smooth design variance $\psi_i$ if it were known. In most cases, it is unknown. To get around this difficulty, a design-unbiased variance estimator $\hat{\psi}_i$ of $V_p(\hat{\theta}_i)$ is assumed to be available; i.e., $E_p(\hat{\psi}_i) = V_p(\hat{\theta}_i)$. Under this assumption, we have that $E_m E_p(\hat{\psi}_i) = E_m\{V_p(\hat{\theta}_i)\} = \psi_i$. A simple unbiased estimator of the smooth design variance $\psi_i$ is therefore $\hat{\psi}_i$ itself. However, $\hat{\psi}_i$ may be quite unstable when the sample size in domain $i$ is small. A more efficient estimator is obtained by modelling $\hat{\psi}_i$ given $\hat{\theta}_i$.
Dick (1995) and Rivest and Belmonte (2000) considered smoothing models given by

$\log \hat{\psi}_i = \mathbf{q}_i^{\top} \boldsymbol{\alpha} + \varepsilon_i,$

where $\mathbf{q}_i$ is a vector of explanatory variables that are functions of $\hat{\theta}_i$, $\boldsymbol{\alpha}$ is a vector of unknown model parameters to be estimated, and $\varepsilon_i$ is a random error with $E_m(\varepsilon_i \mid \mathbf{q}_i) = 0$ and constant variance $\sigma_\varepsilon^2$. We also assume that the errors $\varepsilon_i$ are identically distributed conditionally on $\mathbf{q}_i$. From the above model, we observe that

$\psi_i = E_m(\hat{\psi}_i \mid \mathbf{q}_i) = K \exp(\mathbf{q}_i^{\top} \boldsymbol{\alpha}),$

where $K = E_m\{\exp(\varepsilon_i)\}$.
Dick (1995) estimated $\psi_i$ by $\exp(\mathbf{q}_i^{\top} \hat{\boldsymbol{\alpha}})$, omitting the factor $K$. Rivest and Belmonte (2000) estimated $K$ by $\exp(\hat{\sigma}_\varepsilon^2 / 2)$, assuming that the errors $\varepsilon_i$ are normally distributed. However, we observed empirically that the resulting estimator of $\psi_i$ is sensitive to deviations from the normality assumption. This assumption is avoided by using a method of moments (see Beaumont and Bocci, 2016). This leads to the estimator of $K$ given by

$\hat{K} = \frac{\sum_{i=1}^{m} \hat{\psi}_i}{\sum_{i=1}^{m} \exp(\mathbf{q}_i^{\top} \hat{\boldsymbol{\alpha}})}.$
An estimator $\hat{\boldsymbol{\alpha}}$ of the vector of unknown model parameters $\boldsymbol{\alpha}$ is necessary to estimate $\psi_i$. It is obtained using the ordinary least squares method as

$\hat{\boldsymbol{\alpha}} = \left( \sum_{i=1}^{m} \mathbf{q}_i \mathbf{q}_i^{\top} \right)^{-1} \sum_{i=1}^{m} \mathbf{q}_i \log \hat{\psi}_i.$

The estimator $\tilde{\psi}_i$ of $\psi_i$ is then given by

$\tilde{\psi}_i = \hat{K} \exp(\mathbf{q}_i^{\top} \hat{\boldsymbol{\alpha}}).$

A nice property of $\tilde{\psi}_i$ is that the average of the smooth design variance estimators, $m^{-1} \sum_{i=1}^{m} \tilde{\psi}_i$, is equal to the average of the direct variance estimators, $m^{-1} \sum_{i=1}^{m} \hat{\psi}_i$; i.e.,

$\frac{1}{m} \sum_{i=1}^{m} \tilde{\psi}_i = \frac{1}{m} \sum_{i=1}^{m} \hat{\psi}_i.$

This ensures that $\tilde{\psi}_i$ does not systematically overestimate or underestimate $\psi_i$.
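A minimal Python sketch of this smoothing step is given below, under the assumption (for illustration only) that the explanatory vector is $\mathbf{q}_i = (1, \log \hat{\theta}_i)^{\top}$; any other choice of $\mathbf{q}_i$ is handled the same way, and the function name is not part of the production system.

    import numpy as np

    def smooth_design_variance(theta_hat, psi_hat):
        """Log-linear smoothing of direct variance estimates (illustrative sketch)."""
        m = len(psi_hat)
        # assumed explanatory vector q_i = (1, log theta_hat_i); requires positive estimates
        Q = np.column_stack([np.ones(m), np.log(theta_hat)])
        # ordinary least squares fit of log psi_hat_i on q_i
        alpha_hat, *_ = np.linalg.lstsq(Q, np.log(psi_hat), rcond=None)
        fitted = np.exp(Q @ alpha_hat)                # exp(q_i' alpha_hat)
        K_hat = psi_hat.sum() / fitted.sum()          # method of moments factor K_hat
        psi_smooth = K_hat * fitted                   # smoothed variances psi_tilde_i
        # by construction, psi_smooth.mean() equals psi_hat.mean() up to rounding error
        return psi_smooth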
3.2 Benchmarking
If the parameter of interest $\theta_i$ is a total, the user may wish to have the sum of the small area estimates, $\sum_{i=1}^{m} \hat{\theta}_i^{\mathrm{EBLUP}}$, agree with the estimated total at the overall sample level, $\sum_{i=1}^{m} \hat{\theta}_i$; i.e., $\sum_{i=1}^{m} \hat{\theta}_i^{\mathrm{EBLUP}} = \sum_{i=1}^{m} \hat{\theta}_i$. In the case of a mean, this benchmarking condition becomes $\sum_{i=1}^{m} W_i \hat{\theta}_i^{\mathrm{EBLUP}} = \sum_{i=1}^{m} W_i \hat{\theta}_i$, where $W_i = N_i / N$ and $N = \sum_{i=1}^{m} N_i$. Two methods are available in the production system to ensure benchmarking for area level small area estimates. The first one is based on a difference adjustment and the second one is based on an augmented vector. They are valid for any method used to compute $\hat{\sigma}_v^2$ and regardless of whether the variance estimates $\hat{\psi}_i$ have been smoothed or not. The benchmarking based on a difference adjustment is an adaptation of the benchmarking given in Battese et al. (1988). The benchmarking based on an augmented vector is due to Wang, Fuller and Qu (2008).
Difference adjustment: For this method, the EBLUP estimator is adjusted only for those areas where the realized sample size $n_i > 0$; the synthetic estimates $\mathbf{z}_i^{\top} \hat{\boldsymbol{\beta}}$ for the areas with $n_i = 0$ are left as is. The resulting benchmarked estimator, denoted $\hat{\theta}_i^{\mathrm{bench}}$, adds a difference adjustment to the EBLUP estimates of the sampled areas, computed with weights $W_i$, where $W_i = 1$ for $i = 1, \ldots, m$ if the benchmarking is to a total, and $W_i = N_i / N$ if the benchmarking is for the mean. The benchmark itself is a value provided by the user that represents the total or mean of the $y$-values of population $U$. The benchmarking ensures that $\sum_{i=1}^{m} W_i \hat{\theta}_i^{\mathrm{bench}}$ is equal to this user-provided value.
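The sketch below shows one simple difference-type adjustment for the case of a total ($W_i = 1$), in which the discrepancy between a user-provided benchmark and the sum of the small area estimates is spread equally over the sampled areas. It only illustrates the idea; it is not the exact allocation used by the production system.

    import numpy as np

    def difference_benchmark_total(theta_eblup, n, benchmark):
        """Spread the benchmarking discrepancy equally over sampled areas (illustration)."""
        theta_bench = np.array(theta_eblup, dtype=float)   # copy of the small area estimates
        sampled = n > 0                                    # areas with a realized sample
        gap = benchmark - theta_bench.sum()                # discrepancy to be allocated
        theta_bench[sampled] += gap / sampled.sum()        # equal share to each sampled area
        return theta_bench                                 # theta_bench.sum() equals benchmark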
Augmented vector: The vector $\mathbf{z}_i$ is augmented with $W_i \psi_i$ to form $\mathbf{z}_i^{a} = (\mathbf{z}_i^{\top}, W_i \psi_i)^{\top}$, with $W_i$ and $\psi_i$ as previously defined. The resulting augmented generalized linear mixed model (GLMM) equation is given by

$\hat{\theta}_i = \mathbf{z}_i^{a\top} \boldsymbol{\beta}^{a} + b_i v_i + e_i, \quad (3.5)$

where $\mathbf{z}_i^{a}$ is the augmented vector of auxiliary variables and $\boldsymbol{\beta}^{a}$ is the corresponding vector of regression coefficients. The estimates for $\boldsymbol{\beta}^{a}$ and $\sigma_v^2$ are once more obtained recursively for each of the four EBLUP procedures applied to the augmented model. The resulting benchmarked estimator is given by

$\hat{\theta}_i^{\mathrm{bench},a} = \hat{\gamma}_i^{a} \hat{\theta}_i + (1 - \hat{\gamma}_i^{a})\, \mathbf{z}_i^{a\top} \hat{\boldsymbol{\beta}}^{a},$

where $\hat{\gamma}_i^{a}$ and $\hat{\boldsymbol{\beta}}^{a}$ are the analogues of $\hat{\gamma}_i$ and $\hat{\boldsymbol{\beta}}$. All the components of $\hat{\theta}_i^{\mathrm{bench},a}$ are computed using the augmented model given by (3.5). It can be shown that $\sum_{i=1}^{m} W_i \hat{\theta}_i^{\mathrm{bench},a} = \sum_{i=1}^{m} W_i \hat{\theta}_i$, and hence the benchmarking holds.
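Continuing the illustrative code above, the augmented vector approach amounts to appending the column $W_i \psi_i$ to the matrix of auxiliary variables and recomputing the EBLUP. The sketch assumes that the same variances $\psi_i$ are used in the augmentation, in the GLS weights and in $\hat{\gamma}_i$.

    import numpy as np

    def augmented_benchmark_eblup(theta_hat, Z, psi, W):
        """EBLUP under the augmented model (3.5): append W_i * psi_i as a covariate (sketch)."""
        Z_aug = np.column_stack([Z, W * psi])                  # augmented vector z_i^a
        sigma2_hat = fh_variance_estimate(theta_hat, Z_aug, psi)
        theta_bench, beta_a, gamma_a = area_level_blup(theta_hat, Z_aug, psi, sigma2_hat)
        # The GLS normal equation for the appended column forces
        # sum_i W_i * theta_bench_i == sum_i W_i * theta_hat_i (self-benchmarking).
        return theta_bench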
The difference adjustment and augmented vector methods
are two ways that benchmarking can be satisfied. Wang et al. (2008)
suggested other procedures that can be used. Specifically, they adapted to the area level model the self-calibrated estimator that You and Rao (2002) developed in the context of the unit level model. You, Rao and Hidiroglou (2013)
obtained an estimator of the mean
squared prediction error and its bias
under a misspecified model.
3.3 Mean squared error estimation
The reliability of the EBLUP estimators is obtained as the mean squared error, $\mathrm{MSE}(\hat{\theta}_i^{\mathrm{EBLUP}}) = E(\hat{\theta}_i^{\mathrm{EBLUP}} - \theta_i)^2$. The expectation is with respect to model (3.3) for the non-benchmarked estimator and model (3.5) for the benchmarked estimator.
The estimated Mean Squared Errors (MSEs) of the area level estimators are given in Table 3.1. The specific form of the $g$ terms and the estimated variances can be found in Rao and Molina (2015) or in Estevao et al. (2015). For the benchmarked estimators, the estimated MSE for the difference adjustment approach uses the non-benchmarked MSE formulas. For the case of the augmented vector approach, the MSE is based on augmenting the vector $\mathbf{z}_i$ with $W_i \psi_i$.
Table 3.1
MSE estimates (mse) for the area level estimators
Estimator | mse
Fay-Herriot | $g_{1i}(\hat{\sigma}_v^2) + g_{2i}(\hat{\sigma}_v^2) + 2 g_{3i}(\hat{\sigma}_v^2) - B(\hat{\sigma}_v^2)\, \nabla g_{1i}(\hat{\sigma}_v^2)$
ADM | $g_{1i}(\hat{\sigma}_v^2) + g_{2i}(\hat{\sigma}_v^2) + 2 g_{3i}(\hat{\sigma}_v^2) - B(\hat{\sigma}_v^2)\, \nabla g_{1i}(\hat{\sigma}_v^2)$
REML | $g_{1i}(\hat{\sigma}_v^2) + g_{2i}(\hat{\sigma}_v^2) + 2 g_{3i}(\hat{\sigma}_v^2)$
WF | $g_{1i}(\hat{\sigma}_v^2) + g_{2i}(\hat{\sigma}_v^2) + 2 g_{3i}(\hat{\sigma}_v^2) + g_{4i}(\hat{\sigma}_v^2)$
The various terms in Table 3.1 can be interpreted as follows. The term $B(\hat{\sigma}_v^2)\, \nabla g_{1i}(\hat{\sigma}_v^2)$, in which $B(\hat{\sigma}_v^2)$ denotes the bias of the corresponding estimator of $\sigma_v^2$ and $\nabla g_{1i}$ the derivative of $g_{1i}$ with respect to $\sigma_v^2$, is a bias correction term for FH and ADM. The $g_{1i}$ term, given by $g_{1i}(\hat{\sigma}_v^2) = \hat{\gamma}_i \psi_i$, accounts for most of the MSE if the number of areas is large. The $g_{2i}$ term accounts for the estimation of $\boldsymbol{\beta}$, and $g_{3i}$ accounts for the estimation of $\sigma_v^2$. The $g_{4i}$ term in the WF procedure reflects that the estimated value of $\psi_i$ has been used. The estimated variance of $\hat{\sigma}_v^2$, given by $\hat{V}(\hat{\sigma}_v^2)$, is dependent on the particular procedure used to estimate $\sigma_v^2$.
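For reference, under the basic area level model with $b_i = 1$, the standard forms of the first three terms given in Rao and Molina (2015) are

$g_{1i}(\sigma_v^2) = \gamma_i \psi_i,$

$g_{2i}(\sigma_v^2) = (1 - \gamma_i)^2\, \mathbf{z}_i^{\top} \left( \sum_{j=1}^{m} \frac{\mathbf{z}_j \mathbf{z}_j^{\top}}{\sigma_v^2 + \psi_j} \right)^{-1} \mathbf{z}_i,$

$g_{3i}(\sigma_v^2) = \frac{\psi_i^2}{(\sigma_v^2 + \psi_i)^3}\, \overline{V}(\hat{\sigma}_v^2),$

where $\overline{V}(\hat{\sigma}_v^2)$ denotes the asymptotic variance of the estimator of $\sigma_v^2$ and depends on the estimation procedure.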