Development of a small area estimation system at Statistics Canada
Section 3. Area level model
The area level small area estimator first appeared in the seminal paper of Fay and Herriot (1979). Following that paper, let the parameter of interest for area $i$ be $\theta_i$, $i = 1, \ldots, m$; common examples are totals, $\theta_i = Y_i$, or means, $\theta_i = \bar{Y}_i = Y_i / N_i$. As noted above, the vector of auxiliary variables may differ from the one used in direct estimation and is denoted as $\mathbf{z}_i$.
The area level model can be expressed as two equations. The first equation, commonly known as the sampling model, is given by

$\hat{\theta}_i = \theta_i + e_i, \quad (3.1)$

and expresses the direct estimate $\hat{\theta}_i$ in terms of the unknown parameter $\theta_i$ plus a random error $e_i$ due to sampling. The sampling errors $e_i$ are independently distributed with mean 0 and variance $\psi_i$; that is, $E_p(e_i \mid \theta_i) = 0$ and $V_p(e_i \mid \theta_i) = \psi_i$, where $E_p$ denotes expectation in terms of the sample design. Note that $\psi_i$ is also the design variance of $\hat{\theta}_i$ and is typically unknown.
The second equation, known as the linking model, is given by

$\theta_i = \mathbf{z}_i^{\top} \boldsymbol{\beta} + b_i v_i, \quad (3.2)$

and expresses the parameter $\theta_i$ as a fixed effect $\mathbf{z}_i^{\top} \boldsymbol{\beta}$ plus a random effect $v_i$ multiplied by $b_i$. In the production system, the $b_i$ term has a default value of one but can be specified by the user to control heteroscedastic errors or the impact of influential observations. The random effects $v_i$ are independently and identically distributed with mean 0 and unknown model variance $\sigma_v^2$; that is, $E_m(v_i) = 0$ and $V_m(v_i) = \sigma_v^2$, where $E_m$ denotes the model expectation and $V_m$ the model variance. The random errors $e_i$ are independent of the random effects $v_i$. The combination of the sampling model and linking model results in a single generalized linear mixed model (GLMM) given by

$\hat{\theta}_i = \mathbf{z}_i^{\top} \boldsymbol{\beta} + b_i v_i + e_i. \quad (3.3)$
From the Fay-Herriot model (3.3), we observe that $E_m(\hat{\theta}_i) = \mathbf{z}_i^{\top} \boldsymbol{\beta}$ and $V_m(\hat{\theta}_i) = \sigma_v^2 b_i^2 + \psi_i$, where $\psi_i$ is the smoothed design variance of $\hat{\theta}_i$. In general, we cannot treat $\psi_i$ as fixed, as it is not strictly a function of auxiliary data. If the $\psi_i$ and $\sigma_v^2$ are known, the solution to the GLMM yields the Best Linear Unbiased Predictor (BLUP),

$\tilde{\theta}_i = \gamma_i \hat{\theta}_i + (1 - \gamma_i)\, \mathbf{z}_i^{\top} \tilde{\boldsymbol{\beta}}, \quad (3.4)$

where $\gamma_i = \sigma_v^2 b_i^2 / (\sigma_v^2 b_i^2 + \psi_i)$ and

$\tilde{\boldsymbol{\beta}} = \left( \sum_{i=1}^{m} \frac{\mathbf{z}_i \mathbf{z}_i^{\top}}{\sigma_v^2 b_i^2 + \psi_i} \right)^{-1} \sum_{i=1}^{m} \frac{\mathbf{z}_i \hat{\theta}_i}{\sigma_v^2 b_i^2 + \psi_i}.$
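For illustration, the following Python sketch computes the shrinkage weights $\gamma_i$, the generalized least squares estimate $\tilde{\boldsymbol{\beta}}$ and the predictors $\tilde{\theta}_i$ of equation (3.4) for given values of $\sigma_v^2$ and of the $\psi_i$. The function and array names are illustrative only and are not part of the production system.

    import numpy as np

    def area_level_blup(theta_hat, Z, psi, sigma2_v, b=None):
        """BLUP of equation (3.4) for known model variance sigma2_v and variances psi."""
        m = len(theta_hat)
        b = np.ones(m) if b is None else b
        var_tot = sigma2_v * b**2 + psi              # total variance sigma_v^2 b_i^2 + psi_i
        gamma = sigma2_v * b**2 / var_tot            # shrinkage weights gamma_i
        w = 1.0 / var_tot                            # GLS weights
        A = (Z * w[:, None]).T @ Z                   # sum_i w_i z_i z_i'
        c = (Z * w[:, None]).T @ theta_hat           # sum_i w_i z_i theta_hat_i
        beta_tilde = np.linalg.solve(A, c)           # GLS estimate of beta
        synthetic = Z @ beta_tilde                   # synthetic part z_i' beta_tilde
        theta_blup = gamma * theta_hat + (1.0 - gamma) * synthetic
        return theta_blup, beta_tilde, gamma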
There are four recursive procedures for estimating $\sigma_v^2$ and $\boldsymbol{\beta}$ in the production system. The first three assume that $\psi_i$ is known, or that a smoothed version of it is available (see the following section for details). Under this assumption, the variance components can be computed via the Fay-Herriot procedure (FH) as outlined in Fay and Herriot (1979), the restricted maximum likelihood (REML), or the Adjusted Density Maximization (ADM) due to Li and Lahiri (2010). The fourth procedure, WF, due to Wang and Fuller (2003), assumes that $\psi_i$ is estimated by the direct variance estimator $\hat{\psi}_i$, provided that the sample in each area is large enough to compute it. The WF procedure does not require any smoothing of the estimated $\hat{\psi}_i$ values before estimating $\sigma_v^2$. Wang and Fuller (2003) carried out simulations with area sample sizes ranging from 9 to 36 and found that their procedure yielded reasonable estimates of $\theta_i$ as well as of its estimated mean squared error.
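As a concrete illustration of one such procedure, the sketch below solves the Fay-Herriot moment equation, $\sum_{i=1}^{m} (\hat{\theta}_i - \mathbf{z}_i^{\top} \tilde{\boldsymbol{\beta}})^2 / (\hat{\sigma}_v^2 b_i^2 + \psi_i) = m - p$, with $p$ the dimension of $\mathbf{z}_i$, using a simple Newton-type update. It is a minimal sketch of the FH moment method, not the scoring algorithm of the production system, and it truncates negative solutions to zero as discussed below.

    import numpy as np

    def fh_variance_estimate(theta_hat, Z, psi, b=None, tol=1e-8, max_iter=200):
        """Fay-Herriot moment estimator of the model variance sigma_v^2 (sketch)."""
        m, p = Z.shape
        b = np.ones(m) if b is None else b
        sigma2 = max(np.var(theta_hat) - np.mean(psi), 0.0)   # crude starting value
        for _ in range(max_iter):
            var_tot = sigma2 * b**2 + psi
            w = 1.0 / var_tot
            A = (Z * w[:, None]).T @ Z
            c = (Z * w[:, None]).T @ theta_hat
            beta = np.linalg.solve(A, c)
            resid = theta_hat - Z @ beta
            # moment equation: weighted residual sum of squares should equal m - p
            score = np.sum(resid**2 / var_tot) - (m - p)
            deriv = -np.sum(b**2 * resid**2 / var_tot**2)     # derivative of score in sigma2
            sigma2_new = max(sigma2 - score / deriv, 0.0)     # Newton step, truncated at zero
            if abs(sigma2_new - sigma2) < tol:
                return sigma2_new
            sigma2 = sigma2_new
        return sigma2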
The main difference between these four procedures is how the $\hat{\gamma}_i$ are computed. They are all based on an iterative scoring algorithm that obtains $\hat{\sigma}_v^2$ as an estimate of the model variance $\sigma_v^2$. The FH, REML, and WF procedures may yield values of $\hat{\sigma}_v^2$ that are smaller than zero. If this occurs, the $\hat{\sigma}_v^2$ are set to zero for both the FH and REML procedures. A drawback of truncating the estimated $\hat{\sigma}_v^2$ to zero is that the resulting small area estimator will be synthetic for all areas. Li and Lahiri (2010) suggested the ADM as a way to address the problem of obtaining negative $\hat{\sigma}_v^2$ by maximizing an adjusted likelihood defined as a product of the model variance and a standard likelihood. Although the ADM method always gives a positive solution for $\hat{\sigma}_v^2$, it should be used cautiously because it overestimates the model variance. The REML, FH and ADM procedures use the smoothed values $\tilde{\psi}_i$ of the estimated design variances $\hat{\psi}_i$ obtained from the sample, or some estimate provided by the user. For the WF procedure, if the estimate $\hat{\sigma}_v^2$ is negative or small relative to its estimated standard error, Wang and Fuller (2003) suggested replacing $\hat{\gamma}_i$ by an adjusted value.
Plugging $\hat{\sigma}_v^2$ and an estimate of $\boldsymbol{\beta}$ into the BLUP defined by equation (3.4) yields the Empirical Best Linear Unbiased Predictor (EBLUP), $\hat{\theta}_i^{\mathrm{EBLUP}}$. It is given by

$\hat{\theta}_i^{\mathrm{EBLUP}} = \hat{\gamma}_i \hat{\theta}_i + (1 - \hat{\gamma}_i)\, \mathbf{z}_i^{\top} \hat{\boldsymbol{\beta}},$

where $\hat{\gamma}_i = \hat{\sigma}_v^2 b_i^2 / (\hat{\sigma}_v^2 b_i^2 + \psi_i)$ and the value used for $\psi_i$ is chosen according to the procedure used. For the REML, FH and ADM procedures, the $\psi_i$ are the smoothed values $\tilde{\psi}_i$ of the estimated design variances obtained from the sample, or some estimate provided by the user. For the WF procedure, we have that $\psi_i = \hat{\psi}_i$. If the estimated model variance $\hat{\sigma}_v^2$ is relatively small compared with $\psi_i$, then $\hat{\gamma}_i$ will be small and more weight will be attached to the synthetic estimator $\mathbf{z}_i^{\top} \hat{\boldsymbol{\beta}}$. Similarly, more weight is attached to the direct estimator, $\hat{\theta}_i$, if the design variance $\psi_i$ is relatively small. Details of the required computations can be found in the methodology specifications for the production system in Estevao et al. (2015).
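For illustration, the two sketches above can be combined on simulated data; areas with a large $\psi_i$ receive a small $\hat{\gamma}_i$ and are pulled toward the synthetic estimator, as described in the previous paragraph. The data, seed and parameter values below are arbitrary.

    import numpy as np

    rng = np.random.default_rng(2015)
    m = 30
    Z = np.column_stack([np.ones(m), rng.normal(size=m)])    # intercept plus one covariate
    beta_true = np.array([10.0, 2.0])
    sigma2_v_true = 1.0
    psi = rng.uniform(0.5, 4.0, size=m)                      # smoothed design variances
    theta = Z @ beta_true + rng.normal(scale=np.sqrt(sigma2_v_true), size=m)
    theta_hat = theta + rng.normal(scale=np.sqrt(psi), size=m)

    sigma2_hat = fh_variance_estimate(theta_hat, Z, psi)
    eblup, beta_hat, gamma_hat = area_level_blup(theta_hat, Z, psi, sigma2_hat)
    # Areas with large psi_i get small gamma_i: the EBLUP leans on the synthetic part.
    print(np.round(np.column_stack([psi, gamma_hat])[:5], 2))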
3.1 Estimation of the smooth design variance
The design variance, $V_p(\hat{\theta}_i)$, could be used as an estimator of the smooth design variance $\psi_i$ if it were known. In most cases, it is unknown. To get around this difficulty, a design-unbiased variance estimator $\hat{\psi}_i$ of $V_p(\hat{\theta}_i)$ is assumed to be available; i.e., $E_p(\hat{\psi}_i) = V_p(\hat{\theta}_i)$. Under this assumption, we have that $E_m E_p(\hat{\psi}_i) = E_m\{V_p(\hat{\theta}_i)\} = \psi_i$. A simple unbiased estimator of the smooth design variance $\psi_i$ is therefore $\hat{\psi}_i$ itself. However, $\hat{\psi}_i$ may be quite unstable when the sample size in domain $i$ is small. A more efficient estimator is obtained by modelling $\hat{\psi}_i$ given $\hat{\theta}_i$.
Dick (1995) and Rivest and Belmonte (2000) considered smoothing models given by

$\log \hat{\psi}_i = \mathbf{q}_i^{\top} \boldsymbol{\alpha} + \varepsilon_i,$

where $\mathbf{q}_i$ is a vector of explanatory variables that are functions of $\hat{\theta}_i$, $\boldsymbol{\alpha}$ is a vector of unknown model parameters to be estimated, and $\varepsilon_i$ is a random error with $E_m(\varepsilon_i \mid \mathbf{q}_i) = 0$ and constant variance $\sigma_\varepsilon^2$. We also assume that the errors $\varepsilon_i$ are identically distributed conditionally on $\mathbf{q}_i$. From the above model, we observe that

$\psi_i = E_m(\hat{\psi}_i \mid \mathbf{q}_i) = K \exp(\mathbf{q}_i^{\top} \boldsymbol{\alpha}),$

where $K = E_m\{\exp(\varepsilon_i)\}$.
Dick (1995) estimated $\psi_i$ by $\exp(\mathbf{q}_i^{\top} \hat{\boldsymbol{\alpha}})$, omitting the factor $K$. Rivest and Belmonte (2000) estimated $K$ by $\exp(\hat{\sigma}_\varepsilon^2 / 2)$, assuming that the errors $\varepsilon_i$ are normally distributed. However, we observed empirically that the resulting estimator of $\psi_i$ is sensitive to deviations from the normality assumption. This assumption is avoided by using a method of moments (see Beaumont and Bocci, 2016). This leads to the estimator of $K$ given by

$\hat{K} = \frac{\sum_{i=1}^{m} \hat{\psi}_i}{\sum_{i=1}^{m} \exp(\mathbf{q}_i^{\top} \hat{\boldsymbol{\alpha}})}.$
An estimator $\hat{\boldsymbol{\alpha}}$ of the vector of unknown model parameters $\boldsymbol{\alpha}$ is necessary to estimate $\psi_i$. It is obtained using the ordinary least squares method as

$\hat{\boldsymbol{\alpha}} = \left( \sum_{i=1}^{m} \mathbf{q}_i \mathbf{q}_i^{\top} \right)^{-1} \sum_{i=1}^{m} \mathbf{q}_i \log \hat{\psi}_i.$

The estimator $\tilde{\psi}_i$ of $\psi_i$ is then given by

$\tilde{\psi}_i = \hat{K} \exp(\mathbf{q}_i^{\top} \hat{\boldsymbol{\alpha}}).$

A nice property of $\tilde{\psi}_i$ is that the average of the smooth design variance estimators, $m^{-1} \sum_{i=1}^{m} \tilde{\psi}_i$, is equal to the average of the direct variance estimators, $m^{-1} \sum_{i=1}^{m} \hat{\psi}_i$; i.e.,

$\frac{1}{m} \sum_{i=1}^{m} \tilde{\psi}_i = \frac{1}{m} \sum_{i=1}^{m} \hat{\psi}_i.$

This ensures that $\tilde{\psi}_i$ does not systematically overestimate or underestimate $\psi_i$.
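A minimal Python sketch of this smoothing step is given below, under the assumption (for illustration only) that the explanatory vector is $\mathbf{q}_i = (1, \log \hat{\theta}_i)^{\top}$; any other choice of $\mathbf{q}_i$ is handled the same way, and the function name is not part of the production system.

    import numpy as np

    def smooth_design_variance(theta_hat, psi_hat):
        """Log-linear smoothing of direct variance estimates (illustrative sketch)."""
        m = len(psi_hat)
        # assumed explanatory vector q_i = (1, log theta_hat_i); requires positive estimates
        Q = np.column_stack([np.ones(m), np.log(theta_hat)])
        # ordinary least squares fit of log psi_hat_i on q_i
        alpha_hat, *_ = np.linalg.lstsq(Q, np.log(psi_hat), rcond=None)
        fitted = np.exp(Q @ alpha_hat)                # exp(q_i' alpha_hat)
        K_hat = psi_hat.sum() / fitted.sum()          # method of moments factor K_hat
        psi_smooth = K_hat * fitted                   # smoothed variances psi_tilde_i
        # by construction, psi_smooth.mean() equals psi_hat.mean() up to rounding error
        return psi_smooth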
3.2 Benchmarking
If the parameter of interest $\theta_i$ is a total, the user may wish to have the sum of the small area estimates, $\sum_{i=1}^{m} \hat{\theta}_i^{\mathrm{EBLUP}}$, agree with the estimated total at the overall sample level, $\sum_{i=1}^{m} \hat{\theta}_i$; i.e., $\sum_{i=1}^{m} \hat{\theta}_i^{\mathrm{EBLUP}} = \sum_{i=1}^{m} \hat{\theta}_i$. In the case of a mean, this benchmarking condition becomes $\sum_{i=1}^{m} W_i \hat{\theta}_i^{\mathrm{EBLUP}} = \sum_{i=1}^{m} W_i \hat{\theta}_i$, where $W_i = N_i / N$ and $N = \sum_{i=1}^{m} N_i$. Two methods are available in the production system to ensure benchmarking for area level small area estimates. The first one is based on a difference adjustment and the second one is based on an augmented vector. They are valid for any method used to compute $\hat{\sigma}_v^2$ and regardless of whether the variance estimates $\hat{\psi}_i$ have been smoothed or not. The benchmarking based on a difference adjustment is an adaptation of the benchmarking given in Battese et al. (1988). The benchmarking based on an augmented vector is due to Wang, Fuller and Qu (2008).
Difference adjustment: For this method, the EBLUP estimator is adjusted only for those areas where the realized sample size $n_i > 0$; the synthetic estimates $\mathbf{z}_i^{\top} \hat{\boldsymbol{\beta}}$ for the areas with $n_i = 0$ are left as is. The resulting benchmarked estimator, denoted $\hat{\theta}_i^{\mathrm{bench}}$, adds a difference adjustment to the EBLUP estimates of the sampled areas, computed with weights $W_i$, where $W_i = 1$ for $i = 1, \ldots, m$ if the benchmarking is to a total, and $W_i = N_i / N$ if the benchmarking is for the mean. The benchmark itself is a value provided by the user that represents the total or mean of the $y$-values of population $U$. The benchmarking ensures that $\sum_{i=1}^{m} W_i \hat{\theta}_i^{\mathrm{bench}}$ is equal to this user-provided value.
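The sketch below shows one simple difference-type adjustment for the case of a total ($W_i = 1$), in which the discrepancy between a user-provided benchmark and the sum of the small area estimates is spread equally over the sampled areas. It only illustrates the idea; it is not the exact allocation used by the production system.

    import numpy as np

    def difference_benchmark_total(theta_eblup, n, benchmark):
        """Spread the benchmarking discrepancy equally over sampled areas (illustration)."""
        theta_bench = np.array(theta_eblup, dtype=float)   # copy of the small area estimates
        sampled = n > 0                                    # areas with a realized sample
        gap = benchmark - theta_bench.sum()                # discrepancy to be allocated
        theta_bench[sampled] += gap / sampled.sum()        # equal share to each sampled area
        return theta_bench                                 # theta_bench.sum() equals benchmark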
Augmented vector: The vector $\mathbf{z}_i$ is augmented with $W_i \psi_i$ to form $\mathbf{z}_i^{a} = (\mathbf{z}_i^{\top}, W_i \psi_i)^{\top}$, with $W_i$ and $\psi_i$ as previously defined. The resulting augmented generalized linear mixed model (GLMM) equation is given by

$\hat{\theta}_i = \mathbf{z}_i^{a\top} \boldsymbol{\beta}^{a} + b_i v_i + e_i, \quad (3.5)$

where $\mathbf{z}_i^{a}$ is the augmented vector of auxiliary variables and $\boldsymbol{\beta}^{a}$ is the corresponding vector of regression coefficients. The estimates for $\boldsymbol{\beta}^{a}$ and $\sigma_v^2$ are once more obtained recursively for each of the four EBLUP procedures applied to the augmented model. The resulting benchmarked estimator is given by

$\hat{\theta}_i^{\mathrm{bench},a} = \hat{\gamma}_i^{a} \hat{\theta}_i + (1 - \hat{\gamma}_i^{a})\, \mathbf{z}_i^{a\top} \hat{\boldsymbol{\beta}}^{a},$

where $\hat{\gamma}_i^{a}$ and $\hat{\boldsymbol{\beta}}^{a}$ are the analogues of $\hat{\gamma}_i$ and $\hat{\boldsymbol{\beta}}$. All the components of $\hat{\theta}_i^{\mathrm{bench},a}$ are computed using the augmented model given by (3.5). It can be shown that $\sum_{i=1}^{m} W_i \hat{\theta}_i^{\mathrm{bench},a} = \sum_{i=1}^{m} W_i \hat{\theta}_i$, and hence the benchmarking holds.
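Continuing the illustrative code above, the augmented vector approach amounts to appending the column $W_i \psi_i$ to the matrix of auxiliary variables and recomputing the EBLUP. The sketch assumes that the same variances $\psi_i$ are used in the augmentation, in the GLS weights and in $\hat{\gamma}_i$.

    import numpy as np

    def augmented_benchmark_eblup(theta_hat, Z, psi, W):
        """EBLUP under the augmented model (3.5): append W_i * psi_i as a covariate (sketch)."""
        Z_aug = np.column_stack([Z, W * psi])                  # augmented vector z_i^a
        sigma2_hat = fh_variance_estimate(theta_hat, Z_aug, psi)
        theta_bench, beta_a, gamma_a = area_level_blup(theta_hat, Z_aug, psi, sigma2_hat)
        # The GLS normal equation for the appended column forces
        # sum_i W_i * theta_bench_i == sum_i W_i * theta_hat_i (self-benchmarking).
        return theta_bench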
The difference adjustment and augmented vector methods
are two ways that benchmarking can be satisfied. Wang et al. (2008)
suggested other procedures that can be used. Specifically, they adapted to the area level model the self-calibrated estimator that You and Rao (2002) developed in the context of the unit level model. You, Rao and Hidiroglou (2013)
obtained an estimator of the mean
squared prediction error and its bias
under a misspecified model.
3.3 Mean squared error estimation
The reliability of the EBLUP estimators is obtained as the mean squared error, $\mathrm{MSE}(\hat{\theta}_i^{\mathrm{EBLUP}}) = E(\hat{\theta}_i^{\mathrm{EBLUP}} - \theta_i)^2$. The expectation is with respect to model (3.3) for the non-benchmarked estimator and model (3.5) for the benchmarked estimator.
The estimated Mean Squared Errors (MSEs) of the area level estimators are given in Table 3.1. The specific form of the $g$ terms and the estimated variances can be found in Rao and Molina (2015) or in Estevao et al. (2015). For the benchmarked estimators, the estimated MSE for the difference adjustment approach uses the non-benchmarked MSE formulas. For the case of the augmented vector approach, the MSE is based on augmenting the vector $\mathbf{z}_i$ with $W_i \psi_i$.
Table 3.1
MSE estimates (mse) for the area level estimators
Estimator | mse
Fay-Herriot | $g_{1i}(\hat{\sigma}_v^2) + g_{2i}(\hat{\sigma}_v^2) + 2 g_{3i}(\hat{\sigma}_v^2) - B(\hat{\sigma}_v^2)\, \nabla g_{1i}(\hat{\sigma}_v^2)$
ADM | $g_{1i}(\hat{\sigma}_v^2) + g_{2i}(\hat{\sigma}_v^2) + 2 g_{3i}(\hat{\sigma}_v^2) - B(\hat{\sigma}_v^2)\, \nabla g_{1i}(\hat{\sigma}_v^2)$
REML | $g_{1i}(\hat{\sigma}_v^2) + g_{2i}(\hat{\sigma}_v^2) + 2 g_{3i}(\hat{\sigma}_v^2)$
WF | $g_{1i}(\hat{\sigma}_v^2) + g_{2i}(\hat{\sigma}_v^2) + 2 g_{3i}(\hat{\sigma}_v^2) + g_{4i}(\hat{\sigma}_v^2)$
The various terms in Table 3.1 can be interpreted as follows. The term $B(\hat{\sigma}_v^2)\, \nabla g_{1i}(\hat{\sigma}_v^2)$, in which $B(\hat{\sigma}_v^2)$ denotes the bias of the corresponding estimator of $\sigma_v^2$ and $\nabla g_{1i}$ the derivative of $g_{1i}$ with respect to $\sigma_v^2$, is a bias correction term for FH and ADM. The $g_{1i}$ term, given by $g_{1i}(\hat{\sigma}_v^2) = \hat{\gamma}_i \psi_i$, accounts for most of the MSE if the number of areas is large. The $g_{2i}$ term accounts for the estimation of $\boldsymbol{\beta}$, and $g_{3i}$ accounts for the estimation of $\sigma_v^2$. The $g_{4i}$ term in the WF procedure reflects that the estimated value of $\psi_i$ has been used. The estimated variance of $\hat{\sigma}_v^2$, given by $\hat{V}(\hat{\sigma}_v^2)$, is dependent on the particular procedure used to estimate $\sigma_v^2$.
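For reference, under the basic area level model with $b_i = 1$, the standard forms of the first three terms given in Rao and Molina (2015) are

$g_{1i}(\sigma_v^2) = \gamma_i \psi_i,$

$g_{2i}(\sigma_v^2) = (1 - \gamma_i)^2\, \mathbf{z}_i^{\top} \left( \sum_{j=1}^{m} \frac{\mathbf{z}_j \mathbf{z}_j^{\top}}{\sigma_v^2 + \psi_j} \right)^{-1} \mathbf{z}_i,$

$g_{3i}(\sigma_v^2) = \frac{\psi_i^2}{(\sigma_v^2 + \psi_i)^3}\, \overline{V}(\hat{\sigma}_v^2),$

where $\overline{V}(\hat{\sigma}_v^2)$ denotes the asymptotic variance of the estimator of $\sigma_v^2$ and depends on the estimation procedure.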