Two local diagnostics to evaluate the efficiency of the empirical best predictor under the Fay-Herriot model
Section 1. Introduction
Governments need socioeconomic information at increasingly fine levels of detail. National statistical offices are therefore required to produce statistics for sub-populations that were not identified or could not be taken into account when the survey’s precision objectives were determined. As a result, the number of sampled units for these sub-populations may be too small to ensure good precision of standard design-based direct estimators such as the Horvitz-Thompson estimator or calibration estimators. This type of sub-population, where the sample size is insufficient, is called a small domain (or small area). To remedy the lack of precision of direct estimators for small domains, indirect estimators, or small area estimators, can be used. These small area estimators usually rely on a model such as the Fay-Herriot model (Fay and Herriot, 1979). The Empirical Best (EB) predictor, also called the Empirical Bayes predictor or EB estimator, is a small area estimator frequently used in practice.
Small area estimation methods use statistical models to leverage information from the survey and from auxiliary data sources. The Fay-Herriot model is a linear model that breaks down the parameter of interest of a domain into two terms: the first term is the effect explained by the model and the second term is the model error that can be interpreted as an unexplained and unknown local effect.
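In standard notation (a sketch using the usual parametrization of the Fay-Herriot model; the symbols here are illustrative and may differ from those adopted later in the paper), this decomposition for a domain $d$ can be written as:

```latex
\begin{align*}
% Linking model: the parameter of interest \theta_d splits into an
% effect explained by auxiliary data x_d and an unexplained local
% effect u_d (the model error).
\theta_d &= \mathbf{x}_d^{\top}\boldsymbol{\beta} + u_d,
  & u_d &\sim N(0,\sigma_u^2), \\
% Sampling model: the direct estimator is design-unbiased for \theta_d
% with known design variance \psi_d.
\hat{\theta}_d^{\mathrm{dir}} &= \theta_d + e_d,
  & e_d &\sim N(0,\psi_d).
\end{align*}
```

The first equation is the explanatory part of the model together with the unexplained local effect $u_d$; the second links the direct estimator to the parameter of interest through the sampling error $e_d$.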
Classical statistical tools, such as graphs of model residuals, can be used to assess the validity of the Fay-Herriot model. However, these tools give little indication of the efficiency of an indirect estimate for a particular domain. The model Mean Square Error (MSE) of an indirect estimator can be viewed as a local quality indicator since it varies across domains. The model MSE accounts for the effect explained by the model, but integrates out the unexplained local effect.
The design MSE is an alternative to the model MSE that does not integrate out the unexplained local effect. However, unbiased design MSE estimates of small area estimators tend to be very unstable, particularly for domains with few sampled units (Rivest and Belmonte, 2000; Rao, Rubin-Bleuer and Estevao, 2018; Pfeffermann and Ben-Hur, 2019). To circumvent this problem, the literature suggests averaging the design MSE over several domains (Rao and Molina, 2015; Pfeffermann and Ben-Hur, 2019) as a quality measure. However, many users of official statistics are concerned only with their specific domain and are reluctant to accept an overall quality criterion as a measure of the efficiency of the estimates for their domain of interest. This is especially the case when they are convinced that their domain is very specific and that this specificity is captured not by the explanatory term of the model, but by the error term, i.e., the unexplained local effect.
To address the problem of the instability of unbiased estimators of the design MSE, Rao, Rubin-Bleuer and Estevao (2018) proposed a composite estimator that they evaluated in a simulation study. Their composite estimator consists of taking a weighted average of a model MSE estimator and a design MSE estimator. They achieve greater stability at the cost of an increase in bias. Pfeffermann and Ben-Hur (2019) also proposed a method for estimating the design MSE of a small area estimator. The method is rather complex and relies mainly on the choice of an appropriate model. It is therefore not entirely design-based. Apart from these attempts to estimate the design MSE, to the best of our knowledge, there is no local diagnostic in the literature that can be used to determine whether small area estimation is preferable to direct estimation for a specific domain.
In this paper, a different approach is proposed to compare the design-based efficiency of the EB and direct estimators. We proceed in two steps. First, we determine the interval of unexplained local effects for which the design MSE of the Best (B) predictor, also called the Bayes predictor or B estimator, is smaller than the design MSE of the direct estimator. The second step is to assess whether it is plausible that the unexplained local effect lies within this interval. To this end, two diagnostics are proposed: one based on the conditional distribution of the unexplained local effect given the direct estimate, and a second based on a hypothesis test on the unexplained local effect carried out with respect to the sampling design. We find that, depending on the magnitude of the standardized model residual and a factor associated with the precision of the direct estimate, it is possible to detect whether the B or EB estimator is likely to have a smaller design MSE than that of the direct estimator.
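To fix ideas, the first step can be sketched under the usual Fay-Herriot parametrization (an illustrative derivation with known model parameters, not necessarily the exact development given later in the paper). Write the B predictor as the shrinkage combination $\tilde{\theta}_d = \gamma_d\hat{\theta}_d^{\mathrm{dir}} + (1-\gamma_d)\mathbf{x}_d^{\top}\boldsymbol{\beta}$ with $\gamma_d = \sigma_u^2/(\sigma_u^2+\psi_d)$, where $\psi_d$ is the design variance of the direct estimator. Conditionally on the unexplained local effect $u_d$, the prediction error is $\tilde{\theta}_d - \theta_d = \gamma_d e_d - (1-\gamma_d)u_d$, so that:

```latex
% Design MSE of the B predictor, conditional on the local effect u_d:
\mathrm{MSE}_p\!\left(\tilde{\theta}_d \mid u_d\right)
  = \gamma_d^2\,\psi_d + (1-\gamma_d)^2\,u_d^2.
% The B predictor improves on the direct estimator, whose design MSE
% is \psi_d, if and only if
\gamma_d^2\,\psi_d + (1-\gamma_d)^2\,u_d^2 < \psi_d
\;\Longleftrightarrow\;
u_d^2 < \psi_d\,\frac{1+\gamma_d}{1-\gamma_d}.
```

This yields a symmetric interval for $u_d$ within which the B predictor has smaller design MSE than the direct estimator; the second step then asks whether $u_d$ plausibly lies in that interval.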
Section 2 presents the Fay-Herriot model and describes how the best predictor (B estimator) of the population parameter of interest is constructed. In Section 3, the model and design MSEs of the direct estimator and best predictor are derived. Section 4 describes the two proposed diagnostics. Section 5 explains how to estimate the model parameters and obtain the empirical best predictor (EB estimator) and the estimators of the diagnostics. Section 6 presents the results of a simulation study using real auxiliary data. A brief conclusion is provided in Section 7.