Small area benchmarked estimation under the basic unit level model when the sampling rates are non‑negligible
Section 1. Introduction

Small area estimation (SAE) has grown in importance in recent years because of the demand for reliable small area statistics. Direct estimators perform well for parameters of interest when the sample size is reasonably large. Applied to small areas, however, they have large standard errors and coefficients of variation, because the realized sample size is typically quite small. It is therefore necessary to use models that borrow strength from related areas or from past surveys to obtain stable estimators for these small areas. Model-based estimates typically show a substantial improvement over direct estimates in terms of mean squared error (MSE).

The available theory for small area estimation is based on either area-level or unit-level models, depending on the level of available auxiliary information. Unit-level methods use data on the individual units as auxiliary information, whereas area-level methods use aggregates or means of the unit data within the small areas. The model of Fay and Herriot (1979), denoted hereafter as the FH model, is the most widely used area-level model in small area estimation. The one-fold nested error regression model proposed in Battese, Harter and Fuller (1988), also known as the basic unit-level model, is frequently used when unit-level information is available. We denote this model as the BHF model. Both are special cases of a general linear mixed model in SAE (see Rao and Molina, 2015, for an excellent account of small area estimation).
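
For orientation, the two models are commonly written as below. The notation is an assumption made here for illustration and is not taken from this section: $\hat{\theta}_i$ is the direct estimate for area $i$ with known sampling variance $\psi_i$, $y_{ij}$ is the response of unit $j$ in area $i$, $\mathbf{z}_i$ and $\mathbf{x}_{ij}$ are area-level and unit-level auxiliary vectors, $v_i$ is the random area effect, and $(0, \sigma^2)$ denotes mean zero and variance $\sigma^2$. Normality of the random terms is often added but is not needed to define the models.

```latex
% FH area-level model: direct estimate = linking model + sampling error
\[
\hat{\theta}_i = \mathbf{z}_i^{\top}\boldsymbol{\beta} + v_i + e_i,
\qquad v_i \sim (0, \sigma_v^2), \quad e_i \sim (0, \psi_i),
\qquad i = 1, \dots, m .
\]

% BHF unit-level model: one-fold nested error regression
\[
y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + v_i + e_{ij},
\qquad v_i \sim (0, \sigma_v^2), \quad e_{ij} \sim (0, \sigma_e^2),
\qquad j = 1, \dots, N_i, \quad i = 1, \dots, m .
\]
```

In both cases the area effects $v_i$ and the errors are assumed mutually independent, and the shared $\boldsymbol{\beta}$ and $\sigma_v^2$ are what allow an area to borrow strength from the others.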

Small area means or totals are the most frequent linear parameters estimated in SAE. In these cases, the most popular approach is to use linear mixed models to derive the best linear unbiased predictor (BLUP) of the small area mean or total. The BLUP minimizes the MSE within the class of linear unbiased predictors. Alternatively, it can be shown that the BLUP can be obtained by solving the mixed model equations, whose unknowns are the fixed and random effects of the model. The mixed model equations result from the maximization of the joint density of the data and the vector of random small area effects. The BLUP depends on the variances (and covariances) of the random effects, which can be estimated by the Henderson method of fitting constants (FC), maximum likelihood (ML) or restricted maximum likelihood (REML). Using these estimated components in the BLUP leads to a two-stage estimator referred to as the empirical best linear unbiased predictor (EBLUP).
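
The equivalence between the two routes to the BLUP can be checked numerically. The following is a minimal sketch, not the paper's implementation: it assumes a balanced toy data set with hypothetical sizes and parameter values, a nested error model with one covariate, and variance components treated as known (so it computes a BLUP rather than an EBLUP). All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: m areas, n sampled units per area, one covariate.
m, n = 5, 4
sigma_v2, sigma_e2 = 1.0, 2.0            # variance components, treated as known
area = np.repeat(np.arange(m), n)        # area label of each sampled unit
X = np.column_stack([np.ones(m * n), rng.normal(size=m * n)])
Z = np.eye(m)[area]                      # unit-to-area incidence matrix, (m*n) x m
v_true = rng.normal(scale=np.sqrt(sigma_v2), size=m)
e = rng.normal(scale=np.sqrt(sigma_e2), size=m * n)
y = X @ np.array([1.0, 2.0]) + Z @ v_true + e

# Route 1: mixed model equations with R = sigma_e2 * I and G = sigma_v2 * I.
p = X.shape[1]
lhs = np.block([
    [X.T @ X / sigma_e2, X.T @ Z / sigma_e2],
    [Z.T @ X / sigma_e2, Z.T @ Z / sigma_e2 + np.eye(m) / sigma_v2],
])
rhs = np.concatenate([X.T @ y, Z.T @ y]) / sigma_e2
solution = np.linalg.solve(lhs, rhs)
beta_mme, v_mme = solution[:p], solution[p:]

# Route 2: generalized least squares for beta, then the BLUP of the area effects.
V = sigma_v2 * Z @ Z.T + sigma_e2 * np.eye(m * n)
V_inv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
v_blup = sigma_v2 * Z.T @ V_inv @ (y - X @ beta_gls)

# Route 2 in shrinkage form: gamma * (area mean of the GLS residuals).
gamma = sigma_v2 / (sigma_v2 + sigma_e2 / n)
v_shrink = gamma * (Z.T @ (y - X @ beta_gls)) / n
```

The three versions of the predicted area effects agree up to floating-point error. The shrinkage form makes the role of the variance components explicit: the smaller the area sample size, the more the area effect is pulled toward zero.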

A potential difficulty with EBLUP estimators is that when they are aggregated over all the small areas, they may not agree with the overall estimate for a larger area obtained via direct estimation. Statistical agencies favor an overall agreement between the sum of the model-based small area estimates and the direct estimate at a higher level that corresponds to the union of the small areas. Benchmarking is a method of modifying the model-based estimates to agree with the direct estimator for the larger area.
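
In symbols, the benchmarking requirement can be stated as follows. The notation is assumed here for illustration: there are $m$ small areas, $W_i$ are known aggregation weights (for example, population shares $N_i / N$ when the target is the mean of the larger area), $\hat{\theta}_i^{\mathrm{EBLUP}}$ is the model-based estimate for area $i$, $\hat{\theta}_i^{\mathrm{B}}$ is its benchmarked version, and $\hat{\theta}^{\mathrm{dir}}$ is the direct estimate for the larger area.

```latex
\[
\sum_{i=1}^{m} W_i \, \hat{\theta}_i^{\mathrm{B}} = \hat{\theta}^{\mathrm{dir}},
\qquad \text{whereas in general} \qquad
\sum_{i=1}^{m} W_i \, \hat{\theta}_i^{\mathrm{EBLUP}} \neq \hat{\theta}^{\mathrm{dir}} .
\]
```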

Existing benchmarking methods are either frequentist or Bayesian. In this paper, we focus on the frequentist approach to benchmarking (for Bayesian benchmarking procedures, see You, Rao and Dick, 2004; Datta, Ghosh, Steorts and Maples, 2011; and Nandram and Sayit, 2011). The frequentist methods can be applied to obtain benchmarked small area estimates for both the area-level and unit-level models.

We briefly summarize the existing literature for both types of models. We first describe the procedures developed to benchmark area-level model-based estimates. Pfeffermann and Barnard (1991) obtained a constrained benchmarked estimator by maximizing the joint density of the data and the vector of random small area effects subject to the benchmark restriction. Their benchmarked estimator was constructed from modified estimates of the fixed and small area effects that solve the constrained maximization problem. Wang, Fuller and Qu (2008) developed a benchmarked EBLUP for the FH area-level model by minimizing a loss function subject to the constraint given by the benchmark condition. They obtained a second benchmarked estimator by adding a suitable auxiliary variable to the FH model without imposing a constraint. They showed that the EBLUP estimator based on the augmented FH model is self-benchmarked: the estimator satisfies the benchmark condition without further adjustment. Bell, Datta and Ghosh (2013) generalized the result in Wang et al. (2008) to the case of multiple benchmark constraints by considering a more general loss function. You, Rao and Hidiroglou (2013) obtained another self-benchmarked estimator under the FH model by replacing the regression vector used in the EBLUP estimator with an alternative estimator that depends on the benchmarking weights.
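
The idea of benchmarking by constrained minimization can be illustrated with a simplified, deterministic version of the problem. This sketch is not the derivation of Wang, Fuller and Qu (2008), who minimize an expected loss under the FH model; it only solves the analogous weighted least squares problem with a Lagrange multiplier. The function name, the loss weights `phi` and all numbers are hypothetical.

```python
import numpy as np

def benchmark_min_loss(theta_hat, W, phi, target):
    """Minimize sum_i phi_i * (t_i - theta_hat_i)**2
    subject to the single constraint sum_i W_i * t_i == target.

    Setting the gradient of the Lagrangian to zero gives
    t_i = theta_hat_i + lam_i * (target - sum_j W_j * theta_hat_j),
    with lam_i = (W_i / phi_i) / sum_j (W_j**2 / phi_j).
    """
    theta_hat, W, phi = (np.asarray(a, dtype=float) for a in (theta_hat, W, phi))
    lam = (W / phi) / np.sum(W**2 / phi)
    return theta_hat + lam * (target - W @ theta_hat)

# Hypothetical model-based estimates, aggregation weights and loss weights.
theta_hat = np.array([10.0, 12.0, 9.0, 15.0])
W = np.array([0.4, 0.3, 0.2, 0.1])
phi = np.array([1.0, 2.0, 0.5, 4.0])
target = 11.5                            # direct estimate for the larger area

t = benchmark_min_loss(theta_hat, W, phi, target)
```

The adjustment spreads the discrepancy between the direct estimate and the aggregated model-based estimates across areas, with areas carrying a small loss weight `phi` absorbing a larger share of it.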

We now turn to procedures that benchmark unit-level model-based estimates. The objective is to obtain small area estimators that benchmark to a direct estimator at a given level of aggregation of the small areas. The direct estimators most often used by statistical agencies are the generalized regression (GREG) estimator in Särndal, Swensson and Wretman (1989) or, more generally, the calibration estimator based on procedures in Deville and Särndal (1992). You and Rao (2002) developed a pseudo-EBLUP predictor (YR predictor) that incorporates survey weights. A property of this estimator is that it is self-benchmarked, that is, the sum of the small area estimates adds up to an estimator that has the same form as the GREG. However, it is not a direct estimator because the estimated regression vector that is part of this estimator reflects the error structure of the nested error model. Assuming that the sampling rates are negligible, Stefan and Hidiroglou (2020) proposed several procedures to ensure that the EBLUP and pseudo-EBLUP estimators would benchmark to the GREG estimator, given that both the model and the GREG estimator used the same vector of auxiliary variables. Ugarte, Militino and Goicoa (2009) developed a restricted EBLUP estimator for a small area total that benchmarks to a synthetic estimator.
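
As a reminder of what the GREG benchmark looks like, the following minimal sketch computes a GREG estimate of a total from survey weights and known auxiliary totals. It assumes unit variance constants in the regression fit, equal design weights and simulated data; the function name and every value are hypothetical and are not taken from the paper.

```python
import numpy as np

def greg_total(y, X, w, X_pop_totals):
    """GREG estimate of a population total.

    y: sampled responses, X: sampled auxiliary vectors (n x p),
    w: design weights, X_pop_totals: known population totals of the auxiliaries.
    Returns the estimate and the calibrated weights w_k * g_k.
    """
    y, X, w = np.asarray(y, float), np.asarray(X, float), np.asarray(w, float)
    T = (X * w[:, None]).T @ X                  # sum_k w_k x_k x_k'
    B = np.linalg.solve(T, (X * w[:, None]).T @ y)  # survey-weighted regression vector
    x_ht, y_ht = w @ X, w @ y                   # weighted (Horvitz-Thompson type) totals
    g = 1.0 + X @ np.linalg.solve(T, X_pop_totals - x_ht)  # g-weights
    return y_ht + (X_pop_totals - x_ht) @ B, w * g

# Hypothetical sample of n = 30 units from a population of N = 300.
rng = np.random.default_rng(1)
n, N = 30, 300
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 2.0 + 3.0 * x + rng.normal(size=n)
w = np.full(n, N / n)
X_pop_totals = np.array([N, 3.1 * N])           # assumed known totals of (1, x)

estimate, calibrated_w = greg_total(y, X, w, X_pop_totals)
```

The calibrated weights reproduce the known auxiliary totals exactly, which is the calibration property that links the GREG to the more general calibration estimators.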

The objective of this paper is to compare several benchmarked estimators of a small area mean for the basic unit-level model when the sampling rates are non‑negligible. We compare six benchmarked estimators: two benchmarked estimators based on the procedures proposed by Stefan and Hidiroglou (2020), two restricted estimators based on the procedure proposed by Ugarte et al. (2009) and two ratio estimators obtained by multiplying each small area EBLUP or YR estimator by a common adjustment factor. The paper is organized as follows. Section 2 presents a summary of EBLUP and pseudo-EBLUP estimators under the basic unit-level model. Section 3 describes the six benchmarked estimators. The first two estimators are based on simple ratio adjustments. Then, we show how the two benchmarking procedures proposed by Stefan and Hidiroglou (2020) for negligible sampling rates can be adapted to produce benchmarked small area mean estimators when the sampling rates are non‑negligible. Finally, we describe the restricted EBLUP estimator of Ugarte et al. (2009) and propose a pseudo restricted estimator, a variant of the restricted EBLUP that incorporates survey weights. We also propose a re-parameterized restricted maximum likelihood (reREML) method for estimating the variance components. This method is useful when computing restricted EBLUP small area mean estimators because it yields strictly positive variance component estimates. Section 4 presents the results of a Monte Carlo simulation based on generated data sets, whereas Section 5 reports the results of a simulation study based on a real data set. Finally, Section 6 gives some concluding remarks.
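
Of the six estimators, the ratio adjustment is the simplest to illustrate. The sketch below assumes the benchmark is an overall mean, so that the aggregation weights are population shares; the function name, the weights and all numbers are hypothetical, and the exact form used in the paper is the one given in Section 3.

```python
import numpy as np

def ratio_benchmark(theta_hat, W, direct):
    """Scale every small area estimate by one common factor so that
    the weighted sum of the scaled estimates equals the direct estimate."""
    theta_hat, W = np.asarray(theta_hat, float), np.asarray(W, float)
    factor = direct / (W @ theta_hat)
    return factor * theta_hat, factor

# Hypothetical EBLUP (or YR) estimates of four small area means.
theta_hat = np.array([10.0, 12.0, 9.0, 15.0])
W = np.array([0.4, 0.3, 0.2, 0.1])       # assumed population shares N_i / N
direct = 11.5                            # direct estimate of the overall mean

benchmarked, factor = ratio_benchmark(theta_hat, W, direct)
```

Because every area is multiplied by the same factor, the ratios between area estimates are preserved; the adjustment does not take the relative precision of the areas into account.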

