Two local diagnostics to evaluate the efficiency of the empirical best predictor under the Fay-Herriot model
Section 2. The Fay-Herriot model and the best predictor


We consider a finite population $U$ of size $N$ and a sample $s$ of size $n$ drawn from $U$ according to a sampling design $p(s).$ The population $U$ is partitioned into $m$ non-overlapping domains, indexed by $i = 1, \dots, m.$ The population of domain $i,$ of size $N_{i},$ is denoted by $U_{i}.$ The sample in domain $i$ is denoted by $s_{i}$ and its size by $n_{i}.$ We are interested in estimating $m$ finite population parameters, $θ_{i}, i = 1, \dots, m,$ one for each domain. The parameter $θ_{i}$ is usually a total, an average or a ratio for domain $i.$ Auxiliary information is available as a vector $z_{i}$ for every domain $i = 1, \dots, m.$ The set of the $m$ auxiliary vectors is denoted by $Z = \{z_{i}\}_{i = 1, \dots, m}.$ Furthermore, we denote by $Ω$ the set of all variables used to make inferences, excluding the sample inclusion indicators; $Ω$ includes $Z$ and $θ_{i}, i = 1, \dots, m,$ among others. The design expectation of a random variable, say $A,$ will thus be denoted by $E(A | Ω).$

We consider a linking model that breaks down the parameters of interest $θ_{i}$ as follows:

$θ_{i} = β^{Τ} z_{i} + b_{i} v_{i}, \quad i = 1, \dots, m, \qquad (2.1)$

where $β$ is a vector of model parameters of the same dimension as $z_{i},$ the $b_{i}$ are fixed factors that can be used to account for heteroscedasticity in the model and the $v_{i}$ are error terms that follow a normal distribution: $v_{i} | Z \sim N (0, σ_{v}^{2}),$ where $σ_{v}^{2}$ is a model parameter. In practice, $b_{i} = 1$ is a common choice, but it may be more natural to choose $b_{i} = N_{i}$ when $θ_{i}$ is a total. The term $β^{Τ} z_{i}$ is the part of the finite population parameter $θ_{i}$ that is explained by the model (the known effect), while $b_{i} v_{i}$ is the unexplained part, called the unexplained local effect of $θ_{i}$ or simply the local effect of $θ_{i}.$

The direct estimator of $θ_{i}$ is denoted by ${\hat{θ}}_{i} .$ It is usually obtained by assigning a survey weight to each unit of the sample $s_{i} .$ The survey weight of a unit can simply be the inverse of its probability of selection in the sample $s$ or a calibration weight. The sampling error is defined as:

$e_{i} = {\hat{θ}}_{i} - θ_{i}. \qquad (2.2)$

In what follows, the direct estimator is assumed to be design-unbiased, i.e., $E ({\hat{θ}}_{i} | Ω) = θ_{i}$ or, equivalently, $E (e_{i} | Ω) = 0.$ This assumption is not always satisfied in practice, for example when calibration weights are used, but we make the usual assumption that the bias is negligible. We also assume that the direct estimator ${\hat{θ}}_{i},$ and thus the error $e_{i},$ follows a normal distribution. As discussed in Rao and Molina (2015, page 77), the normality assumption on the errors $e_{i}$ is possibly weaker than the normality assumption on the errors $v_{i}$ because of the effect of the central limit theorem on ${\hat{θ}}_{i}.$ Of course, this effect is less pronounced for smaller domains. Under these assumptions, we have $e_{i} | Ω \sim N (0, ψ_{i}),$ where $ψ_{i} = V ({\hat{θ}}_{i} | Ω)$ is the design variance of ${\hat{θ}}_{i}.$ The sample size $n_{i}$ can be very small, which can lead to poor precision of the direct estimator ${\hat{θ}}_{i}.$ This problem is what motivated small area estimation research.

By combining the model (2.1) and the expression (2.2), we obtain the combined model, also called the Fay-Herriot model:

${\hat{θ}}_{i} = β^{Τ} z_{i} + b_{i} v_{i} + e_{i}. \qquad (2.3)$
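To make the two layers of the combined model concrete, the following minimal sketch simulates one realization of (2.1) to (2.3) using only the Python standard library. It is an illustration under stated assumptions, not part of the original development: the function name `simulate_fay_herriot`, the numerical values of $β,$ $σ_{v}^{2}$ and $ψ_{i},$ and the choice to treat $ψ_{i}$ as fixed and known are all hypothetical.

```python
import math
import random


def simulate_fay_herriot(z, beta, b, sigma2_v, psi, rng):
    """Draw (theta_i, theta_hat_i) for each domain from models (2.1)-(2.3).

    Assumption: the design variances psi_i are treated as fixed and known.
    """
    thetas, theta_hats = [], []
    for z_i, b_i, psi_i in zip(z, b, psi):
        # beta' z_i: the part of theta_i explained by the linking model
        mean_i = sum(beta_k * z_ik for beta_k, z_ik in zip(beta, z_i))
        v_i = rng.gauss(0.0, math.sqrt(sigma2_v))  # local effect in (2.1)
        e_i = rng.gauss(0.0, math.sqrt(psi_i))     # sampling error in (2.2)
        theta_i = mean_i + b_i * v_i               # linking model (2.1)
        thetas.append(theta_i)
        theta_hats.append(theta_i + e_i)           # combined model (2.3)
    return thetas, theta_hats


# Hypothetical inputs: m = 5 domains, an intercept and one covariate.
rng = random.Random(2022)
m = 5
z = [[1.0, float(i)] for i in range(1, m + 1)]
beta = [10.0, 2.0]
b = [1.0] * m
psi = [4.0, 2.0, 1.0, 0.5, 0.25]
thetas, theta_hats = simulate_fay_herriot(z, beta, b, 1.0, psi, rng)
```

Setting `sigma2_v` and every `psi` value to zero removes both error terms, so that each simulated ${\hat{θ}}_{i}$ equals $β^{Τ} z_{i}$ exactly; this gives a quick sanity check on the structure of the model.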

Noting that $v_{i}$ is fixed under the sampling design, it can easily be shown that $V (b_{i} v_{i} + e_{i} | Z) = b_{i}^{2} σ_{v}^{2} + {\tilde{ψ}}_{i},$ where ${\tilde{ψ}}_{i} = E (ψ_{i} | Z)$ is the smoothed variance (see the remark at the end of this section). The standardized error of the combined model is given by:

$ε_{i} = \frac{{\hat{θ}}_{i} - β^{Τ} z_{i}}{\sqrt{b_{i}^{2} σ_{v}^{2} + {\tilde{ψ}}_{i}}}. \qquad (2.4)$
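For reference, the variance $b_{i}^{2} σ_{v}^{2} + {\tilde{ψ}}_{i}$ in the denominator of (2.4) can be obtained from the law of total variance by conditioning on $Ω.$ The following is a sketch that uses only the assumptions already stated, namely $E (e_{i} | Ω) = 0,$ $V (e_{i} | Ω) = ψ_{i}$ and the fact that $v_{i}$ is fixed given $Ω$:

$V (b_{i} v_{i} + e_{i} | Z) = E \{ V (b_{i} v_{i} + e_{i} | Ω) | Z \} + V \{ E (b_{i} v_{i} + e_{i} | Ω) | Z \}$

$= E (ψ_{i} | Z) + V (b_{i} v_{i} | Z)$

$= {\tilde{ψ}}_{i} + b_{i}^{2} σ_{v}^{2}.$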

The direct estimate ${\hat{θ}}_{i}$ provides information about $θ_{i} .$ Rao and Molina (2015, Chapter 9, pages 271-272) give the conditional distribution of $θ_{i} :$

$θ_{i} | Z, {\hat{θ}}_{i} \sim N \left( β^{Τ} z_{i} + γ_{i} ({\hat{θ}}_{i} - β^{Τ} z_{i}), \; (1 - γ_{i}) b_{i}^{2} σ_{v}^{2} \right), \qquad (2.5)$

where $γ_{i} = \frac{b_{i}^{2} σ_{v}^{2}}{b_{i}^{2} σ_{v}^{2} + {\tilde{ψ}}_{i}} .$
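As a sketch of where (2.5) comes from, suppose that ${\tilde{ψ}}_{i} = ψ_{i},$ so that $θ_{i}$ and ${\hat{θ}}_{i}$ are jointly normal given $Z$ (this joint normality is an assumption of the sketch). Since $E (e_{i} | Ω) = 0,$ the sampling error is uncorrelated with $θ_{i}$ given $Z,$ and therefore $Cov (θ_{i}, {\hat{θ}}_{i} | Z) = V (θ_{i} | Z) = b_{i}^{2} σ_{v}^{2}.$ The standard formulas for the conditional mean and variance of a bivariate normal distribution then give:

$E (θ_{i} | Z, {\hat{θ}}_{i}) = β^{Τ} z_{i} + \frac{b_{i}^{2} σ_{v}^{2}}{b_{i}^{2} σ_{v}^{2} + {\tilde{ψ}}_{i}} ({\hat{θ}}_{i} - β^{Τ} z_{i}) = β^{Τ} z_{i} + γ_{i} ({\hat{θ}}_{i} - β^{Τ} z_{i}),$

$V (θ_{i} | Z, {\hat{θ}}_{i}) = b_{i}^{2} σ_{v}^{2} - \frac{(b_{i}^{2} σ_{v}^{2})^{2}}{b_{i}^{2} σ_{v}^{2} + {\tilde{ψ}}_{i}} = (1 - γ_{i}) b_{i}^{2} σ_{v}^{2}.$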

The best predictor of $θ_{i},$ conditionally on ${\hat{θ}}_{i}$ (Rao and Molina, 2015), is then given by:

${\hat{θ}}_{i}^{B} = E (θ_{i} | Z, {\hat{θ}}_{i}) = γ_{i} {\hat{θ}}_{i} + (1 - γ_{i}) β^{Τ} z_{i}. \qquad (2.6)$
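The quantities in (2.4) and (2.6) are simple to compute once $β^{Τ} z_{i},$ $σ_{v}^{2}$ and ${\tilde{ψ}}_{i}$ are treated as known. The sketch below is only illustrative: the function names `gamma_factor`, `standardized_error` and `best_predictor`, the argument `model_mean_i` (standing for $β^{Τ} z_{i}$) and the numerical inputs are hypothetical choices that do not come from the original text.

```python
import math


def gamma_factor(b_i, sigma2_v, psi_tilde_i):
    """gamma_i: share of the total variance due to the local effect."""
    model_var = b_i ** 2 * sigma2_v
    return model_var / (model_var + psi_tilde_i)


def standardized_error(theta_hat_i, model_mean_i, b_i, sigma2_v, psi_tilde_i):
    """Standardized error of the combined model, equation (2.4)."""
    total_var = b_i ** 2 * sigma2_v + psi_tilde_i
    return (theta_hat_i - model_mean_i) / math.sqrt(total_var)


def best_predictor(theta_hat_i, model_mean_i, b_i, sigma2_v, psi_tilde_i):
    """B estimator, equation (2.6): weighted average of the direct
    estimate and the model mean beta' z_i, with weight gamma_i."""
    g = gamma_factor(b_i, sigma2_v, psi_tilde_i)
    return g * theta_hat_i + (1.0 - g) * model_mean_i


# Hypothetical domain: direct estimate 14, model mean 12,
# b_i = 1, sigma2_v = 1, smoothed variance 3.
g = gamma_factor(1.0, 1.0, 3.0)                         # 0.25
eps = standardized_error(14.0, 12.0, 1.0, 1.0, 3.0)     # 1.0
theta_b = best_predictor(14.0, 12.0, 1.0, 1.0, 3.0)     # 12.5
```

In this example the direct estimator is three times as variable as the local effect, so $γ_{i} = 0.25$ and the B estimator moves three quarters of the way from ${\hat{θ}}_{i}$ towards $β^{Τ} z_{i}.$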

In the remainder of this paper, the best predictor ${\hat{θ}}_{i}^{B}$ will be called the B estimator.

In Sections 3 and 4, the theory is developed assuming that $β,$ $σ_{v}^{2}$ and ${\tilde{ψ}}_{i}$ are known. In Section 5, the estimation of these three quantities is discussed, which allows us to obtain an empirical version of the best predictor and our diagnostics.

Remark: In the literature on small area estimation, the theory is usually developed under the assumption that $ψ_{i}$ is fixed. Therefore, it is implicitly assumed that ${\tilde{ψ}}_{i} = ψ_{i} .$ When making inferences under the Fay-Herriot model, $ψ_{i}$ cannot be expected to be fixed. For example, consider the case where $θ_{i}$ is a proportion in the domain $i$ and a stratified simple random sampling with replacement design is used with strata that coincide with domains. The direct estimator ${\hat{θ}}_{i}$ is simply the sample proportion in the domain $i$ and it is well known that its variance is given by $ψ_{i} = n_{i}^{- 1} θ_{i} (1 - θ_{i}) .$ In this case, it is obvious that $ψ_{i}$ is random since it depends on $θ_{i} .$ It is also easy to show that ${\tilde{ψ}}_{i} = n_{i}^{- 1} (β^{Τ} z_{i} (1 - β^{Τ} z_{i}) - b_{i}^{2} σ_{v}^{2}) \neq ψ_{i}$ unless $v_{i} = σ_{v} = 0.$ In the rest of this paper, the entire theory is developed under the usual assumption that ${\tilde{ψ}}_{i} = ψ_{i} .$ In practice, these two variances are unknown and have to be estimated. Section 5 discusses the estimation of ${\tilde{ψ}}_{i}$ using a smoothing model. It can easily be shown that if a model-unbiased estimator, ${\hat{\tilde{ψ}}}_{i},$ is available, that is $E ({\hat{\tilde{ψ}}}_{i} | Z) = {\tilde{ψ}}_{i},$ then this estimator is also model-unbiased for $ψ_{i},$ that is $E ({\hat{\tilde{ψ}}}_{i} - ψ_{i} | Z) = 0.$ The reverse is also true: a model-unbiased estimator for $ψ_{i}$ will also be model-unbiased for ${\tilde{ψ}}_{i} .$ Therefore, although ${\tilde{ψ}}_{i} \neq ψ_{i},$ both variances can be estimated by the same estimator. This suggests that the assumption ${\tilde{ψ}}_{i} = ψ_{i}$ may not be so critical in practice.
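The expression for ${\tilde{ψ}}_{i}$ in the proportion example can be checked numerically. The sketch below averages $ψ_{i} = n_{i}^{- 1} θ_{i} (1 - θ_{i})$ over simulated draws of $θ_{i}$ from the linking model (2.1) and compares the result with the closed form. The function names and the numerical values ($β^{Τ} z_{i} = 0.3,$ $b_{i} = 1,$ $σ_{v} = 0.05,$ $n_{i} = 50$) are hypothetical; they are chosen so that the normal draws of $θ_{i}$ stay well inside $(0, 1).$

```python
import math
import random


def design_variance(theta_i, n_i):
    """psi_i for a sample proportion: theta_i (1 - theta_i) / n_i."""
    return theta_i * (1.0 - theta_i) / n_i


def smoothed_variance(model_mean_i, b_i, sigma2_v, n_i):
    """Closed form of E(psi_i | Z) given in the remark."""
    return (model_mean_i * (1.0 - model_mean_i) - b_i ** 2 * sigma2_v) / n_i


# Hypothetical inputs for one domain.
model_mean_i, b_i, sigma2_v, n_i = 0.3, 1.0, 0.05 ** 2, 50
rng = random.Random(0)
draws = 100_000

# Monte Carlo approximation of E(psi_i | Z) under the linking model (2.1).
total = 0.0
for _ in range(draws):
    theta_i = model_mean_i + b_i * rng.gauss(0.0, math.sqrt(sigma2_v))
    total += design_variance(theta_i, n_i)
mc_smoothed = total / draws

closed_form = smoothed_variance(model_mean_i, b_i, sigma2_v, n_i)
print(mc_smoothed, closed_form)
```

With these inputs the closed form is $(0.21 - 0.0025) / 50 = 0.00415,$ and the Monte Carlo average should agree with it up to simulation error, whereas the design variance $ψ_{i}$ of any single realization of $θ_{i}$ generally differs from it.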

ISSN : 1492-0921

Editorial policy

Survey Methodology publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves. All papers will be refereed. However, the authors retain full responsibility for the contents of their papers and opinions expressed are not necessarily those of the Editorial Board or of Statistics Canada.

Submission of Manuscripts

Survey Methodology is published twice a year in electronic format. Authors are invited to submit their articles in English or French in electronic form, preferably in Word to the Editor, (statcan.smj-rte.statcan@canada.ca, Statistics Canada, 150 Tunney’s Pasture Driveway, Ottawa, Ontario, Canada, K1A 0T6). For formatting instructions, please see the guidelines provided in the journal and on the web site (www.statcan.gc.ca/SurveyMethodology).

Copyright

Published by authority of the Minister responsible for Statistics Canada.

Use of this publication is governed by the Statistics Canada Open Licence Agreement.

Catalogue No. 12-001-X

Frequency: Semi-annual

Ottawa

Date modified: 2022-01-06
