Bayesian benchmarking of the Fay-Herriot model using random deletion
Section 1. Introduction
In official statistics, it is important for lower-level estimates to sum to upper-level estimates. For example, the National Agricultural Statistics Service (NASS) often uses a “top-down” sequence in the release of its official estimates, in which national and state estimates, e.g., estimated corn acreage totals, are published prior to the completion of supplemental data collection and the estimation of corresponding county estimates (Cruze, Erciulescu, Nandram, Barboza and Young, 2019). Within these small administrative areas, the survey data often become sparse. Several popular modeling techniques give rise to more reliable small area estimates. However, the small area estimates may not automatically satisfy relationships with estimates at other levels of aggregation, and benchmarking procedures may be applied to enforce consistency among estimates.
There is a considerable history of benchmarking techniques, which have been used to impose agreement among multiple levels of estimates and to protect against possible model misspecification. These procedures can be broadly classified into two categories: internal benchmarking, in which the target is derived from the current survey data, and external benchmarking, in which the desired target is taken from other sources such as administrative data or previously established estimates. In accordance with NASS’s “top-down” procedure, we discuss external benchmarking of the Fay-Herriot (FH) model (Fay and Herriot, 1979).
The most recent review of small area estimation is given in Pfeffermann (2013), while Rao and Molina (2015) provide the most up-to-date textbook treatment. Earlier, Jiang and Lahiri (2006) gave an extensive review of the classical inferential approach for the linear and generalized linear mixed models used in small area estimation. Benchmarking is discussed in these works as well, but the latter review did not cover the hierarchical Bayes approach that is of primary interest in this paper.
Within the hierarchical Bayes framework, You, Rao and Dick (2004) studied benchmarked estimators for small area estimation based on the unmatched sampling and linking models proposed earlier by You and Rao (2002). They applied this approach to undercoverage estimation for the ten provinces in the 1991 Canadian Census. Wang, Fuller and Qu (2008) characterized the best linear unbiased predictor (BLUP) of the small area means under an area-level model as the linear unbiased predictor that satisfies a benchmarking constraint and minimizes a quadratic loss criterion. They also presented an alternative way of imposing the benchmarking constraint so that the BLUP estimator has a self-calibrated property (discussed in You and Rao, 2002). More generally, Wang et al. (2008) characterized a class of benchmarked estimators as the predictors that minimize a quadratic loss function subject to a benchmarking restriction. Their proposed self-calibrated augmented model reduces bias at both the overall and the small area level. Other benchmarking procedures are given by Datta and Ghosh (2013), Ghosh and Steorts (2013), Pfeffermann, Sikov and Tiller (2014) and Pfeffermann and Tiller (2006).
Whether fitting unit-level or area-level models, incorporating a fixed, external target amounts to imposing the general constraint
$$\sum_{i=1}^{\ell} w_i \hat{\theta}_i = a,$$
where $a$ is a known constant and the $\hat{\theta}_i$ denote the small area estimates to be benchmarked; for totals, the weights $w_i$ are all equal to 1. One way to do so is by using the following transformation: keep $\hat{\theta}_1, \ldots, \hat{\theta}_{\ell-1}$ unchanged and “delete” the last small area, replacing its estimate with
$$\hat{\theta}_{\ell} = \Big(a - \sum_{i=1}^{\ell-1} w_i \hat{\theta}_i\Big)\Big/ w_{\ell}.$$
Janicki and Vesper (2017) introduced a slightly different transformation, which is essentially an internal benchmarking that preserves the sum of all estimates. If that sum (of all unbenchmarked estimates) were prescribed as an external target, then $a = \sum_{i=1}^{\ell} \hat{\theta}_i$ and Janicki and Vesper’s transformation becomes equivalent to deleting the last small area estimate.
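To make the deletion transformation concrete, the following is a minimal sketch (ours, not from the original paper; the function name, the uniform default weights, and the example numbers are illustrative assumptions):

```python
import numpy as np

def benchmark_delete_last(theta, a, w=None):
    """Benchmark small area estimates to an external target `a` by
    'deleting' the last area: every estimate except the last is kept
    unchanged, and the last is replaced so that sum(w * theta) == a."""
    theta = np.asarray(theta, dtype=float).copy()
    w = np.ones_like(theta) if w is None else np.asarray(w, dtype=float)
    theta[-1] = (a - np.dot(w[:-1], theta[:-1])) / w[-1]
    return theta

# Example: three county estimates forced to sum to a state total of 100.
print(benchmark_delete_last([30.0, 42.0, 25.0], a=100.0))  # -> [30. 42. 28.]
```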
External benchmarking procedures that delete the last small area estimate were explored by Nandram and Sayit (2011) and by Nandram, Berg and Barboza (2014) for the purposes of benchmarking binomial probabilities and forecasts of crop yield, respectively. (In both of these contexts the constraint was actually imposed on the weighted sum of small area estimates.) Erciulescu, Cruze and Nandram (2019) considered a variety of external benchmarking techniques, including deletion, difference benchmarking and ratio benchmarking, in the context of hierarchical Bayesian small area models. Collectively, the external benchmarking constraint has been inserted into the likelihood function (e.g., Toto and Nandram, 2010), the joint density of the area effects (e.g., Nandram and Sayit, 2011), or the posterior density of the area effects (Janicki and Vesper, 2017), although the latter choice applies the prior knowledge or requirements embodied in the constraint a posteriori rather than through the prior distributions themselves.
Datta, Ghosh, Steorts and Maples (2011), henceforth DGSM, proposed a general class of constrained Bayes estimators to provide benchmarked estimates. Referring specifically to the method of Toto and Nandram (2010) for unit-level models, DGSM wrote the following: “A disadvantage to such an approach is that results can differ depending on which unit is dropped”. This criticism applies equally to Nandram and Toto (2010), Nandram, Toto and Choi (2011), Janicki and Vesper (2017) and others, and in the same way to an area-level model subject to an external constraint. The procedures of DGSM depend on an important area-specific parameter (see Section 4). This parameter admits several different specifications, and it can be argued that the resulting estimates are likewise affected by the choice of specification. Moreover, the procedures of DGSM do not provide posterior standard errors or credible intervals.
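For context, a representative member of this class (stated here in our own notation, which may differ from DGSM’s) minimizes a weighted posterior squared error loss subject to the benchmarking constraint,
$$\min_{\delta_1,\ldots,\delta_\ell}\ \sum_{i=1}^{\ell} \phi_i\, E\big[(\theta_i-\delta_i)^2 \mid \text{data}\big] \quad \text{subject to} \quad \sum_{i=1}^{\ell} w_i \delta_i = a,$$
whose solution shifts each unbenchmarked posterior mean $\hat{\theta}_i^{HB}$ by a share of the overall discrepancy,
$$\delta_i = \hat{\theta}_i^{HB} + \frac{w_i/\phi_i}{\sum_{j=1}^{\ell} w_j^2/\phi_j}\Big(a - \sum_{j=1}^{\ell} w_j \hat{\theta}_j^{HB}\Big),$$
so that the area-specific weights $\phi_i$ (e.g., $\phi_i = w_i$ or $\phi_i = 1$) determine how the benchmarking adjustment is allocated across areas.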
In response to DGSM’s comment on last-area deletion benchmarking, we introduce random deletion benchmarking, which gives every area, not just the last one, a chance to be deleted. The random deletion benchmarking method is motivated mathematically in Appendix A. Empirical results show only slight differences between last-area deletion benchmarking and random deletion benchmarking.
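As a rough illustration of the idea (a sketch under our own assumptions, e.g., that each area is deleted with equal probability; the actual procedure and its motivation are developed in Section 3 and Appendix A):

```python
import numpy as np

rng = np.random.default_rng(2024)

def benchmark_delete_random(theta, a, w=None, rng=rng):
    """One draw of random-deletion benchmarking: pick an area uniformly
    at random, keep the other estimates unchanged, and replace the
    deleted one so that sum(w * theta) == a."""
    theta = np.asarray(theta, dtype=float).copy()
    w = np.ones_like(theta) if w is None else np.asarray(w, dtype=float)
    j = rng.integers(len(theta))              # area chosen for deletion
    keep = np.arange(len(theta)) != j
    theta[j] = (a - np.dot(w[keep], theta[keep])) / w[j]
    return theta

# Repeating the random deletion (e.g., once per posterior draw) lets every
# area absorb the adjustment; each draw satisfies the constraint exactly,
# so the averaged estimates still hit the external target.
draws = np.array([benchmark_delete_random([30.0, 42.0, 25.0], 100.0)
                  for _ in range(1000)])
print(draws.mean(axis=0).sum())  # == 100.0
```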
In this paper, we discuss random deletion benchmarking in the context of a Bayesian FH (BFH) model. Section 2 introduces the BFH model without a constraint. The methodology for imposing an external target on the BFH model through random deletion is developed in Section 3. In Section 4, we describe empirical studies that assess features of the estimates obtained from random deletion benchmarking, including related measures of uncertainty. Finally, Section 5 gives concluding remarks; additional technical details are provided in several appendices.