# Development of a small area estimation system at Statistics Canada

## Section 5. Hierarchical Bayes (HB) method

The basic Fay-Herriot area level model includes a linear sampling model for direct survey estimates and a linear linking model for the parameters of interest. Such models are *matched* because ${\theta}_{i}$ appears as a linear function in both the sampling and linking models. There are instances when these equations are not matched, such as when a function, $h\left({\theta}_{i}\right),$ is modelled as a linear function of explanatory variables instead of ${\theta}_{i}.$ The *sampling model* and *linking model* pair is

$${\widehat{\theta}}_{i}={\theta}_{i}+{e}_{i}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}(5.1)$$

and

$$h\left({\theta}_{i}\right)={z}_{i}^{T}\beta +{b}_{i}{v}_{i}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}(5.2)$$

where ${e}_{i}\text{\hspace{0.17em}}\stackrel{\text{ind}}{{\displaystyle \sim}}\text{\hspace{0.17em}}N\left(0,\text{\hspace{0.17em}}{\psi}_{i}\right)$ and ${v}_{i}\text{\hspace{0.17em}}\stackrel{\text{ind}}{{\displaystyle \sim}}\text{\hspace{0.17em}}N\left(0,\text{\hspace{0.17em}}{\sigma}_{v}^{2}\right).$

The model pair given by (5.1) and (5.2) is referred to as an *unmatched* model. Nonlinear linking models are often needed in practice to provide a better model fit to the data. For example, if the parameter of interest is a probability or a rate within the range of 0 to 1, a nonlinear linking function is more appropriate than a linear one.

The production system includes the following choices of $h\left({\theta}_{i}\right)$:

$$h\left({\theta}_{i}\right)=\{\begin{array}{ll}{\theta}_{i}\hfill & :\text{Matched Fay-Herriot (FH) model}\hfill \\ \mathrm{log}\left({\theta}_{i}\right)\hfill & :\text{Unmatched log-linear model}\hfill \\ \mathrm{log}\left({\theta}_{i}/\left({\theta}_{i}+{C}_{i}\right)\right)\hfill & :\text{Unmatched log census undercount model}\text{.}\hfill \end{array}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}(5.3)$$
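As an illustration only, the three choices of $h$ in (5.3) and their inverses can be sketched as follows. The function names are hypothetical, `C` stands for the known constant ${C}_{i}$ in the undercount model, and the inverses are what map a draw on the linear scale ${z}_{i}^{T}\beta +{b}_{i}{v}_{i}$ back to ${\theta}_{i}:$

```python
import numpy as np

# The three linking functions h(theta_i) of (5.3). Names are illustrative.
def h_matched(theta):            # matched Fay-Herriot (FH) model
    return theta

def h_log(theta):                # unmatched log-linear model
    return np.log(theta)

def h_undercount(theta, C):      # unmatched log census undercount model
    return np.log(theta / (theta + C))

# Inverses, needed to recover theta_i from a value on the linear scale.
def h_log_inv(eta):
    return np.exp(eta)

def h_undercount_inv(eta, C):
    # Solve log(theta / (theta + C)) = eta for theta:
    # p = theta / (theta + C) lies in (0, 1), so theta = C * p / (1 - p).
    p = np.exp(eta)
    return C * p / (1 - p)
```

Each pair is a round trip: for example `h_undercount_inv(h_undercount(theta, C), C)` returns `theta` for any positive `theta` and `C`.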

The inclusion of $h\left({\theta}_{i}\right)={\theta}_{i}$ corresponds to the matched model represented by equations (3.1) and (3.2). An advantage of choosing the hierarchical Bayes method is that the estimated ${\sigma}_{v}^{2}$ cannot be negative. The function $\mathrm{log}\left({\theta}_{i}\right),$ where ${\theta}_{i}$ is equal to the population mean ${\overline{Y}}_{i},$ was used in Fay and Herriot (1979). Their context was to estimate per capita income (PCI) for small places in the United States.

The sampling variances, ${\psi}_{i},$ may be assumed known for all the linking models represented by (5.2). Alternatively, they may be treated as estimated for the first two functions (the matched Fay-Herriot and unmatched log-linear models) given in (5.3). If the sampling variances, ${\psi}_{i},$ are assumed known, then the unknown parameters in the sampling model (5.1) and the linking model (5.2) can be presented in a hierarchical Bayes (HB) framework as follows

$$\left[{\widehat{\theta}}_{i}\text{\hspace{0.17em}}|\text{\hspace{0.17em}}{\theta}_{i}\right]\sim N\left({\theta}_{i},\text{\hspace{0.17em}}{\psi}_{i}\right),\text{\hspace{0.17em}}\text{\hspace{0.17em}}i=1,\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}m$$

and

$$\left[h\left({\theta}_{i}\right)\text{\hspace{0.17em}}|\text{\hspace{0.17em}}\beta ,\text{\hspace{0.17em}}{\sigma}_{v}^{2}\right]\sim N\left({z}_{i}^{T}\beta ,\text{\hspace{0.17em}}{b}_{i}^{2}{\sigma}_{v}^{2}\right).$$

If the sampling variances are unknown, they are estimated by adding

$$\left[{d}_{i}{\widehat{\psi}}_{i}\text{\hspace{0.17em}}|\text{\hspace{0.17em}}{\psi}_{i}\right]\sim {\psi}_{i}{\chi}_{{d}_{i}}^{2}$$

where ${\chi}_{{d}_{i}}^{2}$ follows a chi-square distribution with ${d}_{i}=\left({n}_{i}-1\right)$ degrees of freedom.
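This distributional assumption implies that ${\widehat{\psi}}_{i}={\psi}_{i}{\chi}_{{d}_{i}}^{2}/{d}_{i},$ which is unbiased for ${\psi}_{i}.$ A brief simulation sketch with hypothetical values illustrates this:

```python
import numpy as np

rng = np.random.default_rng(0)

psi_i = 2.0          # true sampling variance (hypothetical value)
n_i = 30             # area sample size (hypothetical value)
d_i = n_i - 1        # degrees of freedom

# Under the model, d_i * psi_hat_i | psi_i ~ psi_i * chi^2_{d_i},
# so psi_hat_i = psi_i * chi^2_{d_i} / d_i has mean psi_i.
psi_hat = psi_i * rng.chisquare(d_i, size=200_000) / d_i
print(psi_hat.mean())    # close to psi_i
```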

The model parameters $\beta ,$ ${\sigma}_{v}^{2}$ and ${\psi}_{i}$ (when it is unknown) are assumed to obey prior distributions. The distributions used in the production system for $\beta $ and ${\sigma}_{v}^{2}$ are the flat prior, $\pi \left(\beta \right)\propto 1,$ and $\pi \left({\sigma}_{v}^{2}\right)\propto {\left({\sigma}_{v}^{2}\right)}^{-1/2}.$ If ${\psi}_{i}$ is estimated, the prior $\pi \left({\psi}_{i}\right)\propto {\left({\psi}_{i}\right)}^{-1/2}$ is added to the Bayesian model. These prior distributions are multiplied by the density functions of the distributions associated with the sampling and linking models. This yields a joint likelihood function in terms of the model parameters. This function is used to obtain a full conditional (posterior) distribution for each of the unknown parameters. For some of these, the resulting distribution has a tractable or well-known form. For others, the resulting distribution is a product of density functions with no known form. All HB methods involve estimation of the model parameters through repeated sampling of their respective full conditional distributions.

Markov chain Monte Carlo (MCMC) methods are used to obtain estimates from the full conditional distribution of each parameter. Specifically, Gibbs sampling (Gelfand and Smith, 1990), combined with the Metropolis-Hastings algorithm (Chib and Greenberg, 1995) for full conditionals with no known form, is used to find the posterior means and posterior variances; see Estevao et al. (2015) for details. The various estimators of ${\theta}_{i}$ resulting from (5.3) are denoted as ${\widehat{\theta}}_{i}^{\text{HB}}.$
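To make the sampling scheme concrete, the following is a minimal sketch (not the production implementation) of a Gibbs sampler for the simplest case in (5.3): the matched FH model with ${b}_{i}=1$ and known sampling variances, under the flat prior on $\beta $ and the prior $\pi \left({\sigma}_{v}^{2}\right)\propto {\left({\sigma}_{v}^{2}\right)}^{-1/2}$ given above. All inputs are simulated and the variable names are illustrative; the full conditionals used are standard results for this model.

```python
import numpy as np

rng = np.random.default_rng(42)

# --- simulated inputs (hypothetical; in production these come from the survey) ---
m = 50                                                  # number of small areas
Z = np.column_stack([np.ones(m), rng.normal(size=m)])   # covariates z_i
beta_true = np.array([2.0, 1.0])
sigma2_v_true = 0.5
psi = rng.uniform(0.2, 1.0, size=m)                     # known sampling variances
theta_true = Z @ beta_true + rng.normal(0, np.sqrt(sigma2_v_true), m)
theta_hat = theta_true + rng.normal(0, np.sqrt(psi))    # direct estimates, model (5.1)

# --- Gibbs sampler for the matched FH model ---
n_iter, burn = 4000, 1000
beta = np.zeros(2)
sigma2_v = 1.0
draws = np.empty((n_iter, m))
ZtZ_inv = np.linalg.inv(Z.T @ Z)

for t in range(n_iter):
    # [theta_i | rest]: normal, shrinking the direct estimate toward z_i' beta
    gamma = sigma2_v / (sigma2_v + psi)
    mean = gamma * theta_hat + (1 - gamma) * (Z @ beta)
    theta = rng.normal(mean, np.sqrt(gamma * psi))
    # [beta | rest]: normal, under the flat prior pi(beta) ∝ 1
    beta = rng.multivariate_normal(ZtZ_inv @ Z.T @ theta, sigma2_v * ZtZ_inv)
    # [sigma2_v | rest]: inverse-gamma((m-1)/2, S/2) under pi(sigma2_v) ∝ sigma2_v^{-1/2}
    resid = theta - Z @ beta
    sigma2_v = (resid @ resid / 2) / rng.gamma((m - 1) / 2)
    draws[t] = theta

theta_hb = draws[burn:].mean(axis=0)    # posterior means: the HB estimates
```

For the unmatched links in (5.3), the full conditional of ${\theta}_{i}$ no longer has a known form, which is where the Metropolis-Hastings step enters.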

## 5.1 Benchmarked HB estimator

Benchmarking of the estimators uses the *difference adjustment method* described in Section 3.2. That is, the benchmarked estimators ${\widehat{\theta}}_{i}^{\text{HB},\text{\hspace{0.17em}}b}$ are computed as

$${\widehat{\theta}}_{i}^{\text{HB},\text{\hspace{0.17em}}b}=\{\begin{array}{ll}{\widehat{\theta}}_{i}^{\text{HB}}+{\alpha}_{i}\left({\widehat{\theta}}^{*}-{\displaystyle {\sum}_{d\in A}{\omega}_{d}{\widehat{\theta}}_{d}^{\text{HB}}}\right)\hfill & \text{for}\text{\hspace{0.17em}}\text{\hspace{0.17em}}i\in A\hfill \\ {z}_{i}^{T}\widehat{\beta}\hfill & \text{for}\text{\hspace{0.17em}}\text{\hspace{0.17em}}i\in \overline{A}\hfill \end{array}$$

where ${\alpha}_{i}={\left({\displaystyle {\sum}_{i\in A}{\omega}_{i}^{2}\left({\ddot{\psi}}_{i}+{b}_{i}^{2}{\widehat{\sigma}}_{v}^{2\text{HB}}\right)}\right)}^{-1}{\omega}_{i}\left({\ddot{\psi}}_{i}+{b}_{i}^{2}{\widehat{\sigma}}_{v}^{2\text{HB}}\right)$ for $i\in A,$ and ${\widehat{\theta}}^{*}$ is the benchmark value. The terms ${\omega}_{i}$ are defined as follows: ${\omega}_{i}=1$ if the benchmarking is to a total, and ${\omega}_{i}={N}_{i}/N$ if the benchmarking is to a mean. The ${\ddot{\psi}}_{i}$'s are either known or unknown. The ${\widehat{\theta}}^{*}$ can be a value provided by the user that represents the total or mean of the $y$-values of population $U.$ The benchmarking ensures that ${\sum}_{i\in A\cup \overline{A}}{\omega}_{i}{\widehat{\theta}}_{i}^{\text{HB},\text{\hspace{0.17em}}b}={\widehat{\theta}}^{*}.$
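A short numerical sketch of the difference adjustment, with hypothetical inputs and all areas assumed sampled (so that $\overline{A}$ is empty), shows that the weighted benchmarked estimates reproduce the benchmark value exactly:

```python
import numpy as np

rng = np.random.default_rng(7)

# --- hypothetical inputs; names follow the notation of Section 5.1 ---
m = 20
theta_hb = rng.normal(10.0, 2.0, size=m)   # HB estimates for the sampled areas A
psi = rng.uniform(0.5, 1.5, size=m)        # the psi-double-dot_i terms
b = np.ones(m)                             # b_i from the linking model (5.2)
sigma2_v_hb = 0.8                          # estimated sigma_v^2 from the HB fit
N_i = rng.integers(100, 1000, size=m)      # area population sizes
omega = N_i / N_i.sum()                    # benchmarking to a mean: omega_i = N_i / N
theta_star = 10.5                          # user-supplied benchmark value

# alpha_i is proportional to omega_i * (psi_i + b_i^2 * sigma_v^2),
# normalized so that sum over A of omega_i * alpha_i = 1.
c = psi + b**2 * sigma2_v_hb
alpha = omega * c / np.sum(omega**2 * c)

# difference adjustment: spread the discrepancy across areas via alpha_i
discrepancy = theta_star - np.sum(omega * theta_hb)
theta_hb_bench = theta_hb + alpha * discrepancy

# The weighted benchmarked estimates now sum exactly to theta_star.
print(np.sum(omega * theta_hb_bench))
```

Because ${\sum}_{i\in A}{\omega}_{i}{\alpha}_{i}=1$ by construction, adding ${\alpha}_{i}$ times the discrepancy restores the benchmark without re-estimating the model.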
