# Development of a small area estimation system at Statistics Canada

## Section 5. Hierarchical Bayes (HB) method

The basic Fay-Herriot area level model includes a linear sampling model for direct survey estimates and a linear linking model for the parameters of interest. Such models are matched because ${\theta }_{i}$ appears as a linear function in both the sampling and linking models. There are instances when these equations are not matched such as when a function, $h\left({\theta }_{i}\right),$ is modelled as a linear function of explanatory variables instead of ${\theta }_{i}.$ The sampling model and linking model pair is

${\stackrel{^}{\theta }}_{i}={\theta }_{i}+{e}_{i}\text{ }\text{ }\text{ }\text{ }\text{ }\left(5.1\right)$

and

$h\left({\theta }_{i}\right)={z}_{i}^{T}\beta +{b}_{i}{v}_{i}\text{ }\text{ }\text{ }\text{ }\text{ }\left(5.2\right)$

where ${e}_{i}\text{\hspace{0.17em}}\stackrel{\text{ind}}{\sim }\text{\hspace{0.17em}}N\left(0,\text{\hspace{0.17em}}{\psi }_{i}\right)$ and ${v}_{i}\text{\hspace{0.17em}}\stackrel{\text{ind}}{\sim }\text{\hspace{0.17em}}N\left(0,\text{\hspace{0.17em}}{\sigma }_{v}^{2}\right).$

The model pair given by (5.1) and (5.2) is referred to as an unmatched model. Nonlinear linking models are often needed in practice to provide a better model fit to the data. For example, if the parameter of interest is a probability or a rate within the range of 0 and 1, a linear linking model with normal random effects may not be appropriate. A linking model, in this case, could be a logistic or log-linear model. Such a model was used to adjust counts for detailed levels for the 2011 Census of Canada. A good description of what is involved to carry out such an adjustment can be found in Dick (1995) and You, Rao and Dick (2004).

The production system includes the following choices of $h\left({\theta }_{i}\right)$:

$h\left({\theta }_{i}\right)\in \left\{{\theta }_{i},\text{\hspace{0.17em}}\mathrm{log}\left({\theta }_{i}\right),\text{\hspace{0.17em}}\mathrm{log}\left({\theta }_{i}/\left({\theta }_{i}+{C}_{i}\right)\right)\right\}.\text{ }\text{ }\text{ }\text{ }\text{ }\left(5.3\right)$

The inclusion of $h\left({\theta }_{i}\right)={\theta }_{i}$ corresponds to the matched model represented by equations (3.1) and (3.2). An advantage of choosing the Hierarchical Bayes method is that the estimated ${\sigma }_{v}^{2}$ cannot be negative. The function $\mathrm{log}\left({\theta }_{i}\right)\text{,}$ where ${\theta }_{i}$ is equal to the population mean ${\overline{Y}}_{i},$ was used in Fay and Herriot (1979). Their context was to estimate per capita income (PCI) for small places in the United States with a population less than 1,000. The function $h\left({\theta }_{i}\right),$ $\mathrm{log}\left({\theta }_{i}/\left({\theta }_{i}+{C}_{i}\right)\right),$ was included to support the methodology to estimate the net undercoverage in Canadian Censuses. In this model, ${\theta }_{i}$ represents the number of individuals not counted in the census, while ${C}_{i}$ is the known census count. As a result, ${\theta }_{i}/\left({\theta }_{i}+{C}_{i}\right)$ is the proportion of individuals undercounted by the Census.
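The three link functions in (5.3), and the inverse transformations needed to map draws of $h\left({\theta }_{i}\right)$ back to the $\theta$ scale, can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, and `C` stands in for the known census count ${C}_{i}$ of the undercoverage model.

```python
import numpy as np

# The three choices of h(theta) in (5.3), plus the inverse maps used to
# transform the linking-model scale back to theta. Function names are
# illustrative, not from the production system.

def h_identity(theta):                 # matched Fay-Herriot model
    return theta

def h_log(theta):                      # unmatched log-linear model (Fay and Herriot, 1979)
    return np.log(theta)

def h_undercoverage(theta, C):         # census undercoverage model, C = known census count
    return np.log(theta / (theta + C))

def h_log_inv(eta):                    # inverse of h_log
    return np.exp(eta)

def h_undercoverage_inv(eta, C):
    # solve log(theta / (theta + C)) = eta for theta:
    # p = theta / (theta + C) = exp(eta), so theta = C * p / (1 - p)
    p = np.exp(eta)
    return C * p / (1.0 - p)
```

For example, with an undercount of 50 persons against a census count of 1,000, the modelled quantity is $\mathrm{log}\left(50/1050\right),$ and the inverse map recovers 50.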

The sampling variances, ${\psi }_{i},$ can be assumed known for all the linking models represented by (5.2). For the first two functions given in (5.3) (the matched Fay-Herriot model and the unmatched log-linear model), the variances can alternatively be treated as estimated. If the sampling variances, ${\psi }_{i},$ are assumed known, then the unknown parameters in the sampling model (5.1) and the linking model (5.2) can be presented in a hierarchical Bayes (HB) framework as follows

$\left[{\stackrel{^}{\theta }}_{i}\text{\hspace{0.17em}}|\text{\hspace{0.17em}}{\theta }_{i}\right]\sim N\left({\theta }_{i},\text{\hspace{0.17em}}{\psi }_{i}\right),\text{\hspace{0.17em}}\text{\hspace{0.17em}}i=1,\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}m$

and

$\left[h\left({\theta }_{i}\right)\text{\hspace{0.17em}}|\text{\hspace{0.17em}}\beta ,\text{\hspace{0.17em}}{\sigma }_{v}^{2}\right]\sim N\left({z}_{i}^{T}\beta ,\text{\hspace{0.17em}}{b}_{i}^{2}{\sigma }_{v}^{2}\right).$

If the sampling variances are unknown, they are estimated by adding

$\left[{d}_{i}{\stackrel{^}{\psi }}_{i}\text{\hspace{0.17em}}|\text{\hspace{0.17em}}{\psi }_{i}\right]\sim {\psi }_{i}{\chi }_{{d}_{i}}^{2}$

where ${\chi }_{{d}_{i}}^{2}$ follows a chi-square distribution with ${d}_{i}=\left({n}_{i}-1\right)$ degrees of freedom.

The model parameters $\beta ,$ ${\sigma }_{v}^{2}$ and ${\psi }_{i}$ (when it is unknown) are assumed to obey prior distributions. The distributions used in the production system for $\beta$ and ${\sigma }_{v}^{2}$ are the flat prior, $\pi \left(\beta \right)\propto 1,$ and $\pi \left({\sigma }_{v}^{2}\right)\propto {\left({\sigma }_{v}^{2}\right)}^{-1/2}.$ If ${\psi }_{i}$ is estimated, the prior $\pi \left({\psi }_{i}\right)\propto {\left({\psi }_{i}\right)}^{-1/2}$ is added to the Bayesian model. These prior distributions are multiplied by the density functions of the distributions associated with the sampling and linking models. This yields a joint likelihood function in terms of the model parameters. This function is used to obtain a full conditional (posterior) distribution for each of the unknown parameters. For some of these, the resulting distribution has a tractable or well-known form. For others, the resulting distribution is a product of density functions with no known form. All HB methods involve estimation of the model parameters through repeated sampling of their respective full conditional distributions.

Markov chain Monte Carlo (MCMC) methods are used to obtain estimates from the full conditional distribution of each parameter. The Gibbs sampling method (Gelfand and Smith, 1990), combined with the Metropolis-Hastings algorithm (Chib and Greenberg, 1995) for the full conditionals with no known form, is used to find the posterior means and posterior variances; see Estevao et al. (2015) for details. The various estimators of ${\theta }_{i}$ resulting from (5.3) are denoted as ${\stackrel{^}{\theta }}_{i}^{\text{HB}}.$
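To make the sampling scheme concrete, the sketch below runs a Gibbs sampler for the simplest case: the matched model $h\left({\theta }_{i}\right)={\theta }_{i}$ with ${b}_{i}=1,$ known ${\psi }_{i},$ and the flat priors stated above. In this case every full conditional is tractable (normal or inverse gamma), so no Metropolis-Hastings step is needed. The data are simulated, and all variable names and tuning choices (chain length, burn-in) are illustrative assumptions, not the production system's implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

# --- Simulated data for a matched Fay-Herriot model (b_i = 1) ---
m = 30                                                   # number of small areas
z = np.column_stack([np.ones(m), rng.normal(size=m)])    # covariates z_i
beta_true = np.array([2.0, 1.0])
sigma2_true = 0.25
psi = rng.uniform(0.2, 0.6, size=m)                      # known sampling variances
theta = z @ beta_true + rng.normal(scale=np.sqrt(sigma2_true), size=m)
theta_hat = theta + rng.normal(scale=np.sqrt(psi), size=m)   # direct estimates

# --- Gibbs sampler: flat prior on beta, pi(sigma_v^2) ∝ (sigma_v^2)^(-1/2) ---
n_iter, burn = 4000, 1000
beta = np.zeros(2)
sigma2 = 1.0
theta_draws = np.zeros((n_iter - burn, m))

ZtZ_inv = np.linalg.inv(z.T @ z)
for it in range(n_iter):
    # [theta_i | rest]: normal, shrinking theta_hat_i toward z_i' beta
    gamma = sigma2 / (sigma2 + psi)
    mean_theta = gamma * theta_hat + (1 - gamma) * (z @ beta)
    theta_cur = rng.normal(mean_theta, np.sqrt(gamma * psi))

    # [beta | rest]: normal around the least-squares fit of theta_cur on z
    beta = rng.multivariate_normal(ZtZ_inv @ z.T @ theta_cur, sigma2 * ZtZ_inv)

    # [sigma2 | rest]: inverse gamma with shape (m-1)/2, scale sum(res^2)/2
    res = theta_cur - z @ beta
    sigma2 = 1.0 / rng.gamma((m - 1) / 2, 2.0 / np.sum(res**2))

    if it >= burn:
        theta_draws[it - burn] = theta_cur

theta_hb = theta_draws.mean(axis=0)       # posterior means = HB point estimates
theta_hb_var = theta_draws.var(axis=0)    # posterior variances
```

For the unmatched models in (5.3), the full conditional of ${\theta }_{i}$ loses its closed form and the draw of `theta_cur` would be replaced by a Metropolis-Hastings step, as described in the references above.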

## 5.1  Benchmarked HB estimator

Benchmarking of the estimators uses the difference adjustment method described in Section 3.2. That is, the benchmarked estimators ${\stackrel{^}{\theta }}_{i}^{\text{HB},\text{\hspace{0.17em}}b}$ are computed as

${\stackrel{^}{\theta }}_{i}^{\text{HB},\text{\hspace{0.17em}}b}=\left\{\begin{array}{ll}{\stackrel{^}{\theta }}_{i}^{\text{HB}}+{\alpha }_{i}\left({\stackrel{^}{\theta }}^{*}-{\sum }_{d\in A}{\omega }_{d}{\stackrel{^}{\theta }}_{d}^{\text{HB}}\right)\hfill & \text{for}\text{\hspace{0.17em}}\text{\hspace{0.17em}}i\in A\hfill \\ {z}_{i}^{T}\stackrel{^}{\beta }\hfill & \text{for}\text{\hspace{0.17em}}\text{\hspace{0.17em}}i\in \overline{A}\hfill \end{array}$

where ${\alpha }_{i}={\left({\sum }_{i\in A}{\omega }_{i}^{2}\left({\stackrel{¨}{\psi }}_{i}+{b}_{i}^{2}{\stackrel{^}{\sigma }}_{v}^{2\text{HB}}\right)\right)}^{-1}{\omega }_{i}\left({\stackrel{¨}{\psi }}_{i}+{b}_{i}^{2}{\stackrel{^}{\sigma }}_{v}^{2\text{HB}}\right)$ for $i\in A,$ and ${\stackrel{^}{\theta }}^{*}$ is the benchmark value. The weights ${\omega }_{i}$ are defined as follows: ${\omega }_{i}=1$ if the benchmarking is to a total, and ${\omega }_{i}={N}_{i}/N$ if the benchmarking is to a mean. Here ${\stackrel{¨}{\psi }}_{i}$ denotes the sampling variance of area $i,$ whether known or estimated. The benchmark ${\stackrel{^}{\theta }}^{*}$ can be a value provided by the user that represents the total or mean of the $y$-values of the population $U.$ The benchmarking ensures that ${\sum }_{i\in A\cup \overline{A}}{\omega }_{i}{\stackrel{^}{\theta }}_{i}^{\text{HB},\text{\hspace{0.17em}}b}={\stackrel{^}{\theta }}^{*}.$
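The adjustment can be verified numerically. The sketch below applies the difference adjustment to a set of placeholder HB estimates (all values simulated, names hypothetical), taking $A$ to be all areas and benchmarking to a mean with ${\omega }_{i}={N}_{i}/N.$ Because ${\sum }_{i\in A}{\omega }_{i}{\alpha }_{i}=1$ by construction, the weighted sum of the benchmarked estimates reproduces ${\stackrel{^}{\theta }}^{*}$ exactly.

```python
import numpy as np

rng = np.random.default_rng(7)

m = 10
theta_hb = rng.normal(5.0, 1.0, size=m)   # HB point estimates (placeholder values)
psi = rng.uniform(0.2, 0.5, size=m)       # sampling variances (psi-double-dot)
b = np.ones(m)                            # b_i from the linking model
sigma2_hb = 0.3                           # HB estimate of sigma_v^2 (assumed value)
N_i = rng.integers(1000, 5000, size=m)    # area population sizes
omega = N_i / N_i.sum()                   # weights for benchmarking a mean
theta_star = 5.2                          # user-supplied benchmark value

# alpha_i is proportional to omega_i * (psi_i + b_i^2 * sigma2),
# normalised so that sum_i omega_i * alpha_i = 1
w_var = omega * (psi + b**2 * sigma2_hb)
alpha = w_var / np.sum(omega * w_var)

# difference-adjusted benchmarked estimates
adjustment = theta_star - np.sum(omega * theta_hb)
theta_hb_b = theta_hb + alpha * adjustment

# np.sum(omega * theta_hb_b) now equals theta_star
```

Areas with larger variance ${\stackrel{¨}{\psi }}_{i}+{b}_{i}^{2}{\stackrel{^}{\sigma }}_{v}^{2\text{HB}}$ receive a larger share of the adjustment, which is the rationale for this choice of ${\alpha }_{i}.$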

