Two local diagnostics to evaluate the efficiency of the empirical best predictor under the Fay-Herriot model
Section 2. The Fay-Herriot model and the best predictor
We consider a finite population $U$ of size $N$ and a sample $s$ of size $n$ drawn from $U$ according to a sampling design $p(s)$. The population $U$ is partitioned into $m$ non-overlapping domains, identified by the subscript $i$ taking values from 1 to $m$. The population of domain $i$, of size $N_i$, is denoted by $U_i$. The sample of domain $i$ is denoted by $s_i$ and its size is $n_i$. We are interested in estimating $m$ finite population parameters, $\theta_i$, associated with the $m$ domains. The parameter $\theta_i$ is usually a total, an average or a ratio for domain $i$. Auxiliary information is available in the form of vectors, $\mathbf{z}_i$, available for all domains $i = 1, \ldots, m$. The set containing the $m$ auxiliary vectors is denoted by $\mathbf{Z}$. Furthermore, we denote by $\Omega$ the set of all variables used to make inferences, excluding the sample inclusion indicators; $\Omega$ includes $\theta_i$ and $\mathbf{z}_i$, $i = 1, \ldots, m$, among others. The design expectation of a random variable, say $X$, will thus be denoted by $E_p(X \mid \Omega)$.
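We consider a linking model that breaks down the parameters of interest $\theta_i$ as follows:

$$\theta_i = \mathbf{z}_i^{\top} \boldsymbol{\beta} + b_i v_i, \tag{2.1}$$

where $\boldsymbol{\beta}$ is a vector of model parameters of the same dimension as $\mathbf{z}_i$, the $b_i$ are fixed factors that can be used to account for heteroscedasticity in the model, and the $v_i$ are error terms that follow the normal distribution:

$$v_i \mid \mathbf{Z} \sim N(0, \sigma_v^2),$$

where $\sigma_v^2$ is a model parameter. In practice, $b_i = 1$ is a common choice, but it may be more natural to choose $b_i = N_i$ when $\theta_i$ is a total. The term $\mathbf{z}_i^{\top} \boldsymbol{\beta}$ is the known effect, or effect explained by the model, of the finite population parameter $\theta_i$, while $b_i v_i$ is the unknown or unexplained effect, called the unexplained local effect of $\theta_i$ or simply the local effect of $\theta_i$.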
The direct estimator of $\theta_i$ is denoted by $\hat{\theta}_i$. It is usually obtained by assigning a survey weight to each unit of the sample $s_i$. The survey weight of a unit can simply be the inverse of its probability of selection in the sample $s$ or a calibration weight. The sampling error is defined as:

$$e_i = \hat{\theta}_i - \theta_i. \tag{2.2}$$

In what follows, the direct estimator will be assumed to be design-unbiased, i.e., $E_p(\hat{\theta}_i \mid \Omega) = \theta_i$ or $E_p(e_i \mid \Omega) = 0$. This assumption is not always satisfied in practice, for example when using calibration weights, but we will make the usual assumption that the bias remains negligible. We will also assume that the direct estimator $\hat{\theta}_i$, and thus the error $e_i$, follows a normal distribution. As discussed in Rao and Molina (2015, page 77), the normality assumption of the errors $e_i$ is possibly weaker than the normality assumption of the errors $v_i$ because of the effect of the central limit theorem on $\hat{\theta}_i$. Of course, this effect is less pronounced for smaller domains. Under these assumptions, we have:

$$e_i \mid \Omega \sim N(0, \tilde{\psi}_i),$$

where $\tilde{\psi}_i = \mathrm{var}_p(\hat{\theta}_i \mid \Omega)$ is the design variance of $\hat{\theta}_i$. The sample size $n_i$ can be very small, which can lead to poor precision of the direct estimator $\hat{\theta}_i$. This problem is at the origin of small area estimation research.
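By combining the model (2.1) and the expression (2.2), we obtain the combined model, also called the Fay-Herriot model:

$$\hat{\theta}_i = \mathbf{z}_i^{\top} \boldsymbol{\beta} + a_i, \qquad a_i = b_i v_i + e_i.$$

Noting that $\theta_i$ is fixed under the sampling design, it can easily be shown that

$$E_{mp}(a_i \mid \mathbf{Z}) = 0 \quad \text{and} \quad \mathrm{var}_{mp}(a_i \mid \mathbf{Z}) = b_i^2 \sigma_v^2 + \psi_i,$$

where $\psi_i = E_m(\tilde{\psi}_i \mid \mathbf{Z})$ is the smoothed variance (see the remark at the end of this section), the subscripts $m$ and $p$ referring to the model and the sampling design, respectively. The standardized error of the combined model is given by:

$$\frac{a_i}{\sqrt{b_i^2 \sigma_v^2 + \psi_i}}.$$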
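A short derivation of the mean and variance of the combined error $a_i = b_i v_i + e_i$ may help here. It is only a sketch under the assumptions of this section: the notation $E_p$, $E_m$ and $E_{mp}$ for design, model and joint moments is assumed, and the argument uses the fact that $v_i$ is fixed given $\Omega$ (because $\theta_i$ and $\mathbf{z}_i$ belong to $\Omega$), together with design unbiasedness, $E_p(e_i \mid \Omega) = 0$.

```latex
\begin{align*}
E_{mp}(a_i \mid \mathbf{Z})
  &= E_m\{ E_p(b_i v_i + e_i \mid \Omega) \mid \mathbf{Z} \} \\
  &= b_i\, E_m(v_i \mid \mathbf{Z})
     + E_m\{ E_p(e_i \mid \Omega) \mid \mathbf{Z} \} = 0, \\[4pt]
\mathrm{var}_{mp}(a_i \mid \mathbf{Z})
  &= E_m\{ \mathrm{var}_p(a_i \mid \Omega) \mid \mathbf{Z} \}
     + \mathrm{var}_m\{ E_p(a_i \mid \Omega) \mid \mathbf{Z} \} \\
  &= E_m(\tilde{\psi}_i \mid \mathbf{Z})
     + \mathrm{var}_m(b_i v_i \mid \mathbf{Z})
   = \psi_i + b_i^2 \sigma_v^2 .
\end{align*}
```

The first variance term is the model expectation of the design variance, which is the smoothed variance; the second is the variance of the local effect.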
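The direct estimate $\hat{\theta}_i$ provides information about $\theta_i$. Rao and Molina (2015, Chapter 9, pages 271-272) give the conditional distribution of $\theta_i$ given $\hat{\theta}_i$:

$$\theta_i \mid \hat{\theta}_i, \mathbf{Z} \sim N\left( \gamma_i \hat{\theta}_i + (1 - \gamma_i)\, \mathbf{z}_i^{\top} \boldsymbol{\beta},\; \gamma_i \psi_i \right),$$

where $\gamma_i = b_i^2 \sigma_v^2 / (b_i^2 \sigma_v^2 + \psi_i)$. The best predictor of $\theta_i$ conditionally on $\hat{\theta}_i$ (Rao and Molina, 2015) is then given by:

$$\hat{\theta}_i^{B} = E_{mp}(\theta_i \mid \hat{\theta}_i, \mathbf{Z}) = \gamma_i \hat{\theta}_i + (1 - \gamma_i)\, \mathbf{z}_i^{\top} \boldsymbol{\beta}.$$

In the remainder of this paper, the best predictor $\hat{\theta}_i^{B}$ will be called the B estimator.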
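To make the shrinkage form of the B estimator concrete, the following minimal Python sketch computes the weight $\gamma_i = b_i^2 \sigma_v^2 / (b_i^2 \sigma_v^2 + \psi_i)$ and the corresponding weighted average of the direct estimate and the synthetic part $\mathbf{z}_i^{\top} \boldsymbol{\beta}$ for a single domain. The function and argument names are hypothetical, chosen only for illustration, and all model quantities are assumed known, as in Sections 3 and 4.

```python
from typing import Sequence, Tuple


def shrinkage_factor(b: float, sigma2_v: float, psi: float) -> float:
    """Weight gamma_i = b_i^2 * sigma_v^2 / (b_i^2 * sigma_v^2 + psi_i)."""
    model_var = b * b * sigma2_v          # variance of the local effect b_i * v_i
    total_var = model_var + psi           # variance of the combined error a_i
    if sigma2_v < 0.0 or psi < 0.0 or total_var <= 0.0:
        raise ValueError("variances must be non-negative with a positive sum")
    return model_var / total_var


def best_predictor(
    theta_hat: float,
    z: Sequence[float],
    beta: Sequence[float],
    b: float,
    sigma2_v: float,
    psi: float,
) -> Tuple[float, float]:
    """Return (gamma_i, B estimate) for one domain.

    Illustrative only: beta, sigma2_v and psi are treated as known.
    """
    if len(z) != len(beta):
        raise ValueError("z and beta must have the same dimension")
    synthetic = sum(zj * bj for zj, bj in zip(z, beta))   # z_i' beta
    gamma = shrinkage_factor(b, sigma2_v, psi)
    return gamma, gamma * theta_hat + (1.0 - gamma) * synthetic
```

For example, `best_predictor(10.0, [1.0, 2.0], [4.0, 1.0], 1.0, 1.0, 3.0)` gives a synthetic part of 6, a weight of 0.25 and a B estimate of 7.0: the larger the smoothed variance relative to the model variance, the more the estimate is pulled toward the synthetic part.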
In Sections 3 and 4, the theory is developed assuming that $\boldsymbol{\beta}$, $\sigma_v^2$ and $\psi_i$ are known. In Section 5, the estimation of these three quantities is discussed, which allows us to obtain an empirical version of the best predictor and of our diagnostics.
Remark: In the literature on small area estimation, the theory is usually developed under the assumption that $\tilde{\psi}_i$ is fixed. Therefore, it is implicitly assumed that $\tilde{\psi}_i = \psi_i$. When making inferences under the Fay-Herriot model, $\tilde{\psi}_i$ cannot be expected to be fixed. For example, consider the case where $\theta_i$ is a proportion in domain $i$ and a stratified simple random sampling with replacement design is used, with strata that coincide with domains. The direct estimator $\hat{\theta}_i$ is simply the sample proportion in domain $i$, and it is well known that its variance is given by $\tilde{\psi}_i = \theta_i (1 - \theta_i) / n_i$. In this case, it is obvious that $\tilde{\psi}_i$ is random since it depends on $\theta_i$. It is also easy to show that $\tilde{\psi}_i \neq \psi_i$ unless $\sigma_v^2 = 0$. In the rest of this paper, the entire theory is developed under the usual assumption that $\tilde{\psi}_i = \psi_i$. In practice, these two variances are unknown and have to be estimated. Section 5 discusses the estimation of $\psi_i$ using a smoothing model. It can easily be shown that if a model-unbiased estimator, $\hat{\psi}_i$, is available for $\psi_i$, that is, $E_{mp}(\hat{\psi}_i \mid \mathbf{Z}) = \psi_i$, then this estimator is also model-unbiased for $\tilde{\psi}_i$, that is, $E_{mp}(\hat{\psi}_i - \tilde{\psi}_i \mid \mathbf{Z}) = 0$. The reverse is also true: a model-unbiased estimator for $\tilde{\psi}_i$ will also be model-unbiased for $\psi_i$. Therefore, although $\tilde{\psi}_i \neq \psi_i$, both variances can be estimated by the same estimator. This suggests that the assumption $\tilde{\psi}_i = \psi_i$ may not be so critical in practice.
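The two claims of the remark can be checked with a short worked calculation. This is a sketch only: it assumes the notation $\tilde{\psi}_i$ for the design variance, $\psi_i = E_m(\tilde{\psi}_i \mid \mathbf{Z})$ for the smoothed variance and $\hat{\psi}_i$ for an estimator, writes $\mu_i = \mathbf{z}_i^{\top} \boldsymbol{\beta}$ as shorthand, treats $n_i$ as fixed, and applies the linking model $\theta_i = \mu_i + b_i v_i$ to the proportion example.

```latex
\begin{align*}
E_m\{\theta_i (1 - \theta_i) \mid \mathbf{Z}\}
  &= \mu_i - (\mu_i^2 + b_i^2 \sigma_v^2)
   = \mu_i (1 - \mu_i) - b_i^2 \sigma_v^2, \\[4pt]
\psi_i = E_m(\tilde{\psi}_i \mid \mathbf{Z})
  &= \frac{\mu_i (1 - \mu_i) - b_i^2 \sigma_v^2}{n_i}, \\[4pt]
\tilde{\psi}_i - \psi_i
  &= \frac{\theta_i (1 - \theta_i) - \mu_i (1 - \mu_i) + b_i^2 \sigma_v^2}{n_i}, \\[4pt]
E_{mp}(\hat{\psi}_i - \tilde{\psi}_i \mid \mathbf{Z})
  &= E_{mp}(\hat{\psi}_i \mid \mathbf{Z}) - E_m(\tilde{\psi}_i \mid \mathbf{Z})
   = \psi_i - \psi_i = 0 .
\end{align*}
```

The third line vanishes when $\sigma_v^2 = 0$, since then $\theta_i = \mu_i$; otherwise it is a non-degenerate random variable. The last line holds whenever $\hat{\psi}_i$ is model-unbiased for $\psi_i$, because $\tilde{\psi}_i$ does not depend on the sample.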
ISSN : 1492-0921
Copyright
Published by authority of the Minister responsible for Statistics Canada.
© His Majesty the King in Right of Canada as represented by the Minister of Industry, 2022
Use of this publication is governed by the Statistics Canada Open Licence Agreement.
Catalogue No. 12-001-X
Frequency: Semi-annual
Ottawa