Small area benchmarked estimation under the basic unit level model when the sampling rates are non‑negligible
Section 1. Introduction

Small area estimation (SAE) has grown in importance in recent years due to the demand for reliable small area statistics. Direct estimators are used to estimate parameters of interest when the sample size is reasonably large. However, when applied to small areas, where the realized sample size is typically quite small, they have large standard errors and coefficients of variation. It is therefore necessary to use models that borrow strength from other related areas or from past surveys to obtain stable estimators for these small areas. Model-based estimates typically show a substantial improvement over direct estimates in terms of mean squared error (MSE).

The available theory for small area estimation is based on either area-level or unit-level models, depending on the level of available auxiliary information. Unit-level methods use the data of the individual units as auxiliary information, whereas area-level methods use aggregates or means of the data of the units within the small areas. The model of Fay and Herriot (1979), hereafter the FH model, is the most widely used area-level model in small area estimation. The one-fold nested error regression model proposed in Battese, Harter and Fuller (1988), also known as the basic unit-level model, is frequently used when unit-level information is available. We denote this model as the BHF model. Both are special cases of a general linear mixed model in SAE (see Rao and Molina, 2015, for an excellent account of small area estimation).
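
For orientation, the nested error regression model is commonly written as shown below. The notation (y_ij for the response of unit j in area i, x_ij for its auxiliary vector, v_i for the area effect, e_ij for the unit error, m areas of size N_i) follows common textbook usage and is an assumption here, not necessarily the notation adopted in the later sections of the paper.

```latex
y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + v_i + e_{ij},
\qquad j = 1, \dots, N_i, \quad i = 1, \dots, m,
\\
v_i \overset{\text{iid}}{\sim} (0, \sigma_v^2), \qquad
e_{ij} \overset{\text{iid}}{\sim} (0, \sigma_e^2), \qquad
v_i \ \text{independent of} \ e_{ij}.
```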

Small area means or totals are the linear parameters most frequently estimated in SAE. In these cases, the most popular approach is to use linear mixed models to derive the best linear unbiased predictor (BLUP) of the small area mean or total. The BLUP minimizes the MSE within the class of linear unbiased estimators. Alternatively, it can be shown that the BLUP can be obtained by solving the mixed model equations, whose unknowns are the fixed and random effects of the model. The mixed model equations result from the maximization of the joint density of the data and the vector of random small area effects. A BLUP estimator depends on the variances (and covariances) of the random effects, which can be estimated by the Henderson method of fitting constants (FC), maximum likelihood (ML) or restricted maximum likelihood (REML). Using these estimated components in the BLUP estimator leads to a two-stage estimator referred to as the empirical best linear unbiased predictor (EBLUP).
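
The following is a minimal sketch of how a BLUP of small area means can be computed under a nested error model. It makes several simplifying assumptions that are not taken from the paper: the variance components are treated as known (so this is a BLUP, not an EBLUP), the sampling fractions are treated as negligible, and the function name `bhf_blup_means` and its arguments are hypothetical. The regression vector is obtained by generalized least squares, and each area effect is predicted by shrinking the area residual mean with the usual factor gamma_i = sigma_v^2 / (sigma_v^2 + sigma_e^2 / n_i).

```python
import numpy as np


def bhf_blup_means(y, X, area, Xbar_pop, sigma2_v, sigma2_e):
    """Illustrative BLUP of small area means under a nested error model.

    Assumptions (not from the source): variance components are known and
    sampling fractions are negligible. Areas are processed in the order
    returned by np.unique(area); Xbar_pop[k] is the population mean of
    the auxiliary vector for the k-th area in that order.
    """
    areas = np.unique(area)
    p = X.shape[1]
    A = np.zeros((p, p))
    b = np.zeros(p)

    # Generalized least squares: accumulate X_i' V_i^{-1} X_i and X_i' V_i^{-1} y_i.
    for a in areas:
        idx = area == a
        Xi, yi = X[idx], y[idx]
        n_i = yi.shape[0]
        Vi = sigma2_e * np.eye(n_i) + sigma2_v * np.ones((n_i, n_i))
        A += Xi.T @ np.linalg.solve(Vi, Xi)
        b += Xi.T @ np.linalg.solve(Vi, yi)
    beta = np.linalg.solve(A, b)

    # Predict each area effect by shrinking the area residual mean.
    estimates = np.empty(len(areas))
    for k, a in enumerate(areas):
        idx = area == a
        n_i = idx.sum()
        gamma = sigma2_v / (sigma2_v + sigma2_e / n_i)
        v_hat = gamma * (y[idx].mean() - X[idx].mean(axis=0) @ beta)
        estimates[k] = Xbar_pop[k] @ beta + v_hat
    return beta, estimates
```

When sigma2_v is zero the shrinkage factor is zero and every area receives the synthetic regression estimate; as sigma2_v grows relative to sigma2_e / n_i, the estimate moves toward the survey regression form built on the area sample mean.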

A potential difficulty with EBLUP estimators is that when they are aggregated over all the small areas, they may not agree with the overall estimate for a larger area obtained via direct estimation. Statistical agencies favor an overall agreement between the sum of the model-based small area estimates and the direct estimate at a higher level that corresponds to the union of the small areas. Benchmarking is a method of modifying the model-based estimates to agree with the direct estimator for the larger area.
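
As a sketch, the benchmark condition for small area means can be written as below. The symbols are assumptions introduced for illustration: the benchmarked estimate of the mean of area i, the area weights W_i (taken here as population shares N_i / N), and the direct estimate of the mean for the union of the m areas.

```latex
\sum_{i=1}^{m} W_i \, \hat{\mu}_i^{\text{bench}} = \hat{\mu}^{\text{direct}},
\qquad W_i = \frac{N_i}{N}, \qquad N = \sum_{i=1}^{m} N_i .
```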

Existing benchmarking methods are either frequentist or Bayesian. In this paper, we focus on the frequentist approach to benchmarking (for Bayesian benchmarking procedures, see You, Rao and Dick, 2004; Datta, Ghosh, Steorts and Maples, 2011; and Nandram and Sayit, 2011). The frequentist methods can be applied to obtain benchmarked small area estimates under both area-level and unit-level models.

We briefly summarize the existing literature for both types of models. We first describe the procedures developed to benchmark area-level estimates. Pfeffermann and Barnard (1991) obtained a constrained benchmarked estimator by maximizing the joint density of the data and the vector of random small area effects subject to the benchmark restriction. Their benchmarked estimator was constructed with modified estimates of fixed and small area effects that are solutions to the constrained maximization problem. Wang, Fuller and Qu (2008) developed a benchmarked EBLUP for the FH area-level model by minimizing a loss function subject to the constraint given by the benchmark condition. They obtained a second benchmarked estimator by adding a suitable auxiliary variable to the FH model without imposing a constraint. They showed that the EBLUP estimator based on the augmented FH model is self-benchmarked: it satisfies the benchmark condition without further adjustment. Bell, Datta and Ghosh (2013) generalized the result in Wang et al. (2008) to the case of multiple benchmark constraints by considering a more general loss function. You, Rao and Hidiroglou (2013) obtained another self-benchmarked estimator under the FH model by replacing the regression vector used in the EBLUP estimator with an alternative estimator that depends on the benchmarking weights.

We now turn to procedures that benchmark unit-level model-based estimates. The objective is to obtain small area estimators that benchmark to a direct estimator at a given level of aggregation of the small areas. The direct estimators most often used by statistical agencies are the generalized regression (GREG) estimator in Särndal, Swensson and Wretman (1989) or, more generally, the calibration estimator based on procedures in Deville and Särndal (1992). You and Rao (2002) developed a pseudo-EBLUP predictor (YR predictor) that incorporates survey weights. This estimator is self-benchmarked in the sense that the sum of the small area estimates adds up to an estimator that has the same form as the GREG. However, that aggregate is not a direct estimator, because the estimated regression vector that is part of it reflects the error structure of the nested error model. Assuming that the sampling rates are negligible, Stefan and Hidiroglou (2020) proposed several procedures to ensure that the EBLUP and pseudo-EBLUP estimators benchmark to the GREG estimator, given that both the model and the GREG estimator use the same vector of auxiliary variables. Ugarte, Militino and Goicoa (2009) developed a restricted EBLUP estimator for a small area total that benchmarks to a synthetic estimator.

The objective of this paper is to compare several benchmarked estimators of a small area mean for the basic unit-level model when the sampling rates are non‑negligible. We compare six benchmarked estimators: two benchmarked estimators based on the procedures proposed by Stefan and Hidiroglou (2020), two restricted estimators based on the procedure proposed by Ugarte et al. (2009), and two ratio estimators obtained by multiplying each small area EBLUP or YR estimator by a common adjustment factor. The paper is organized as follows. Section 2 presents a summary of EBLUP and pseudo-EBLUP estimators under the basic unit-level model. Section 3 describes the six benchmarked estimators. The first two estimators are based on simple ratio adjustments. Then, we show how the two benchmarking procedures proposed by Stefan and Hidiroglou (2020) for negligible sampling rates can be adapted to produce benchmarked small area mean estimators when the sampling rates are non‑negligible. Finally, we describe the restricted EBLUP estimator of Ugarte et al. (2009) and propose a pseudo restricted estimator, a variant of the restricted EBLUP that incorporates survey weights. We also propose a re-parameterized restricted maximum likelihood (reREML) method for estimating the variance components. This method is useful when computing restricted EBLUP small area mean estimators because it yields strictly positive variance component estimates. Section 4 presents the results of a Monte Carlo simulation based on generated data sets, whereas Section 5 reports the results of a simulation study based on a real data set. Finally, Section 6 gives some concluding remarks.
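
The ratio adjustment mentioned above can be illustrated with a short sketch. It is a generic version of the idea, not the exact estimator defined in Section 3: the function name `ratio_benchmark`, the use of area weights that aggregate the small area means, and the example numbers are all assumptions made for illustration.

```python
import numpy as np


def ratio_benchmark(estimates, weights, direct_estimate):
    """Scale model-based small area estimates by one common factor so that
    their weighted aggregate equals a direct estimate for the larger area.

    Hypothetical helper: `weights` are assumed to be the area weights used
    to aggregate the small area means (for example population shares).
    """
    estimates = np.asarray(estimates, dtype=float)
    weights = np.asarray(weights, dtype=float)
    aggregate = weights @ estimates
    if aggregate == 0.0:
        raise ValueError("weighted aggregate of the estimates is zero")
    factor = direct_estimate / aggregate
    return factor * estimates, factor


# Illustrative numbers only (not from the paper).
model_means = [10.0, 20.0, 30.0]
shares = [0.5, 0.3, 0.2]
benchmarked, factor = ratio_benchmark(model_means, shares, direct_estimate=18.7)
```

Because every area is multiplied by the same factor, the relative ordering of the small area estimates is preserved, while areas with precise and imprecise model-based estimates are adjusted proportionally alike.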

ISSN : 1492-0921

Editorial policy

Survey Methodology publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves. All papers will be refereed. However, the authors retain full responsibility for the contents of their papers and opinions expressed are not necessarily those of the Editorial Board or of Statistics Canada.

Submission of Manuscripts

Survey Methodology is published twice a year in electronic format. Authors are invited to submit their articles in English or French in electronic form, preferably in Word to the Editor, (statcan.smj-rte.statcan@canada.ca, Statistics Canada, 150 Tunney’s Pasture Driveway, Ottawa, Ontario, Canada, K1A 0T6). For formatting instructions, please see the guidelines provided in the journal and on the web site (www.statcan.gc.ca/SurveyMethodology).

Note of appreciation

Canada owes the success of its statistical system to a long-standing partnership between Statistics Canada, the citizens of Canada, its businesses, governments and other institutions. Accurate and timely statistical information could not be produced without their continued co-operation and goodwill.

Standards of service to the public

Statistics Canada is committed to serving its clients in a prompt, reliable and courteous manner. To this end, the Agency has developed standards of service which its employees observe in serving its clients.

Copyright

Published by authority of the Minister responsible for Statistics Canada.

Use of this publication is governed by the Statistics Canada Open Licence Agreement.

Catalogue No. 12-001-X

Frequency: Semi-annual

Ottawa

Date modified: 2021-06-24
