Estimation of domain discontinuities using Hierarchical Bayesian Fay-Herriot models
Section 1. Introduction

Official statistics produced by national statistical institutes are generally based on repeated sample surveys. Much of their value lies in their continuity, which enables developments in society and the economy to be monitored and policy actions to be decided. Besides sampling errors, survey estimates are subject to various sources of non-sampling error that have a systematic effect on the outcomes of a survey. As long as the survey process is kept constant, this bias component is not visible. This is often an argument for keeping the survey processes of repeated surveys unchanged as long as possible. From time to time, however, survey processes must be changed to improve efficiency, reduce survey-related costs, or meet new requirements; a prominent example in official statistics is the introduction of mixed-mode surveys that include web-based questionnaires. A redesign of the survey process generally has systematic effects on the survey estimates, since it changes the biases induced by the aforementioned non-sampling errors, which disturbs comparability with figures published in the past.

Systematic differences in the outcomes of a repeated survey due to a redesign of the survey process are called discontinuities. To prevent the implementation of a new survey process from disturbing the comparability of estimates over time, it is important to quantify these discontinuities. This avoids confounding real change in the parameters of interest with changes in measurement bias due to alterations of the survey process.

Several methods to quantify discontinuities have been proposed in the literature (van den Brakel, Smith and Compton, 2008). A reliable and straightforward approach is to conduct the old and new approaches alongside each other for some period of time, further referred to as a parallel run. Ideally, this is based on a randomized experiment embedded in the probability sample of the survey (van den Brakel, 2008). In this paper, small area estimation methods for estimating domain discontinuities are proposed. We consider the situation where the regular survey, used for the production of official figures, is conducted at the full sample size in parallel with an alternative approach. Due to budget limitations, the sample assigned to the alternative approach is often too small to detect differences of a prespecified size at prespecified significance and power levels using standard direct estimators, particularly for subpopulations or domains.
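To illustrate why a small alternative sample is a problem, the Python sketch below computes the minimum detectable difference of a two-sided z-test for the difference between two direct domain estimates. It is a minimal illustration under assumptions that are not part of the text: the two estimates are independent (separate samples in the parallel run), approximately normal, and have known design variances; the function name `minimum_detectable_difference` and the numbers are hypothetical, and this is not the power analysis used for any particular survey.

```python
import math
from statistics import NormalDist


def minimum_detectable_difference(var_r, var_a, alpha=0.05, power=0.80):
    """Smallest true difference a two-sided z-test detects with the given power.

    var_r, var_a : design variances of the regular and alternative direct
                   domain estimates, assumed independent of each other.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)  # critical value of the test
    z_beta = z.inv_cdf(power)  # quantile matching the required power
    return (z_alpha + z_beta) * math.sqrt(var_r + var_a)


# Hypothetical illustration: the variance of a domain mean scales roughly like
# S^2 / n, so cutting the alternative sample to a quarter multiplies its
# variance by about four.
mdd_equal = minimum_detectable_difference(1.0, 1.0)
mdd_quarter = minimum_detectable_difference(1.0, 4.0)
print(round(mdd_equal, 3), round(mdd_quarter, 3))
```

Because the alternative sample enters through $\operatorname{Var}({\hat{y}}_{i}^{a})$, reducing its size inflates the detectable difference, and this effect is amplified for small domains.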

To explain the problem addressed in this paper, some notation is introduced. Let $θ_{i}$ denote the true population value of a variable of interest for domain $i$. Furthermore, ${\hat{y}}_{i}^{r}$ and ${\hat{y}}_{i}^{a}$ denote direct estimates of $θ_{i}$ based on the regular survey and the alternative survey approach, respectively. Since the regular survey is conducted at the regular sample size, ${\hat{y}}_{i}^{r}$ is a reliable direct estimate of $θ_{i}$, at least for the planned domains. Due to the reduced sample size of the alternative approach in the parallel run, however, ${\hat{y}}_{i}^{a}$ will be insufficiently precise. More precise domain estimates with the small sample available under the new approach can be obtained with the Fay-Herriot (FH) model (Fay and Herriot, 1979), which is defined as ${\hat{y}}_{i}^{a} = x_{i}^{t} β + ν_{i} + e_{i}^{a}$, with $x_{i}$ a vector of covariates at the domain level, $β$ the regression coefficients, $ν_{i}$ the random domain effects and $e_{i}^{a}$ the sampling error. To obtain more precise domain estimates for the alternative approach, van den Brakel, Buelens and Boonstra (2016) proposed a hierarchical Bayesian (HB) univariate FH model, in which sample estimates of the regular survey are considered as potential auxiliary variables in a model selection procedure. This implies that ${\hat{y}}_{i}^{r}$ is used as a covariate in $x_{i}$, besides the usual covariates that are available from registers or censuses. This results in an area level model with measurement error in the covariates (Ybarra and Lohr, 2008). The use of reliable direct estimates observed in the regular survey significantly increased the precision of the domain estimates for the alternative approach conducted at reduced sample size (van den Brakel et al., 2016).
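For intuition on how the FH model borrows strength, the sketch below computes the classical FH best linear unbiased predictor when the variance components are known. Each prediction is a weighted average $γ_{i} {\hat{y}}_{i}^{a} + (1 - γ_{i}) x_{i}^{t} \hat{β}$ with $γ_{i} = σ_{ν}^{2} / (σ_{ν}^{2} + ψ_{i})$, where $σ_{ν}^{2}$ is the variance of $ν_{i}$ and $ψ_{i}$ the (assumed known) design variance of $e_{i}^{a}$. This is a frequentist, fixed-variance simplification and not the HB model of van den Brakel et al. (2016), in which $β$ and $σ_{ν}^{2}$ receive priors and are integrated out by MCMC; the function name `fh_blup` and the toy data are hypothetical.

```python
import numpy as np


def fh_blup(y, X, psi, sigma2_v):
    """Fay-Herriot BLUP with known variance components (illustrative sketch).

    y        : (m,) direct domain estimates, e.g. y_hat_i^a
    X        : (m, p) domain-level covariates x_i (may include y_hat_i^r)
    psi      : (m,) design variances of the sampling errors e_i^a, treated as known
    sigma2_v : variance of the random domain effects nu_i, assumed known here
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    psi = np.asarray(psi, dtype=float)
    # Marginal variance of y_i under the model is sigma2_v + psi_i
    w = 1.0 / (sigma2_v + psi)
    # Weighted least squares (GLS) estimate of beta
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    # Shrinkage factor: the weight given to the direct estimate
    gamma = sigma2_v / (sigma2_v + psi)
    pred = gamma * y + (1.0 - gamma) * (X @ beta)
    # Leading MSE term gamma_i * psi_i (ignores uncertainty in beta and sigma2_v)
    g1 = gamma * psi
    return pred, beta, gamma, g1


# Toy example with an intercept-only model and equal design variances
y_a = np.array([1.0, 2.0, 3.0, 6.0])
X = np.ones((4, 1))
pred, beta, gamma, g1 = fh_blup(y_a, X, psi=np.ones(4), sigma2_v=1.0)
```

With $σ_{ν}^{2} = ψ_{i}$ every direct estimate is shrunk halfway towards the estimated mean; domains with larger $ψ_{i}$ (smaller samples) would be shrunk further towards the synthetic part $x_{i}^{t} \hat{β}$, which is why a strong covariate such as ${\hat{y}}_{i}^{r}$ pays off.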

Let ${\tilde{y}}_{i}^{a}$ denote the small area prediction for $θ_{i}$ based on the aforementioned FH model under the small sample assigned to the alternative survey approach. In the approach followed by van den Brakel et al. (2016), point estimates for domain discontinuities are obtained as the difference between the direct estimate obtained with the regular survey and the model-based domain prediction obtained under the alternative approach, i.e., ${\tilde{Δ}}_{i} = {\hat{y}}_{i}^{r} - {\tilde{y}}_{i}^{a}$. Using the direct estimate of the regular survey as an auxiliary variable in the small domain predictions of the alternative survey results in strong positive correlations between both estimators, which cannot be ignored when computing the standard errors of the discontinuities. More precisely, $\operatorname{Var}({\tilde{Δ}}_{i}) = \operatorname{Var}({\hat{y}}_{i}^{r}) + \operatorname{MSE}({\tilde{y}}_{i}^{a}) - 2 \operatorname{Cov}({\hat{y}}_{i}^{r}, {\tilde{y}}_{i}^{a})$. Since ${\hat{y}}_{i}^{r}$ is also used as a covariate in $x_{i}$ in the FH model for ${\tilde{y}}_{i}^{a}$, $\operatorname{Cov}({\hat{y}}_{i}^{r}, {\tilde{y}}_{i}^{a})$ will be nonzero. To account for this covariance, two analytic approximations for the standard errors of the discontinuities were proposed. The first approach combines the design-based variance estimate of the direct estimator of the regular survey, $\operatorname{Var}({\hat{y}}_{i}^{r})$, with the posterior variance of the HB domain predictions of the alternative survey, $\operatorname{MSE}({\tilde{y}}_{i}^{a})$, and a design-based estimator for the covariance between both point estimates, $\operatorname{Cov}({\hat{y}}_{i}^{r}, {\tilde{y}}_{i}^{a})$. This approach is unstable: strongly positive covariance estimates can even result in negative variance estimates. A related issue is that design-based and model-based variance approximations are combined in one uncertainty measure for the discontinuities.
Therefore, a second analytic approximation was proposed, in which a design-based estimator for the mean squared error of the HB domain predictions, $\operatorname{MSE}({\tilde{y}}_{i}^{a})$, is derived and combined with the design-based variance of the direct estimator of the regular survey and the design-based covariance between both point estimates.
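The variance decomposition of the discontinuity can be made concrete with a few lines of Python. The sketch simply evaluates ${\tilde{Δ}}_{i}$ and $\operatorname{Var}({\tilde{Δ}}_{i})$ from separately estimated components; the function name `discontinuity` and all input values are hypothetical, and the sketch does not reproduce either of the analytic approximations themselves, only the way their components are combined.

```python
import math


def discontinuity(y_r, y_a_tilde, var_r, mse_a, cov_ra):
    """Point estimate and variance of Delta_i = y_hat_i^r - y_tilde_i^a.

    var_r  : design-based variance of the regular direct estimate
    mse_a  : MSE (or posterior variance) of the small area prediction
    cov_ra : covariance between the two, nonzero when y_hat_i^r is a covariate
    """
    delta = y_r - y_a_tilde
    var_delta = var_r + mse_a - 2.0 * cov_ra
    # Combining separately estimated components can give a negative "variance";
    # no standard error is reported in that case.
    se = math.sqrt(var_delta) if var_delta > 0 else float("nan")
    return delta, var_delta, se


# Hypothetical inputs for a single domain
print(discontinuity(20.0, 18.5, var_r=4.0, mse_a=3.0, cov_ra=2.5))
print(discontinuity(20.0, 18.5, var_r=4.0, mse_a=3.0, cov_ra=4.0))
```

In the first call, ignoring the covariance would give a variance of 7.0 instead of 2.0; in the second, a strongly positive covariance estimate produces a variance of -1.0, which is the instability described above.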

Several references to design-based mean squared error estimation in small area estimation can be found in the literature. Gonzalez and Waksberg (1973) introduced the concept of an average design-based mean squared error of a set of synthetic estimators and proposed an estimator that, however, can be unstable and take negative values. Marker (1995) proposed a more stable but biased estimator of the design-based mean squared error of small area estimates, which can also take negative values. Lahiri and Pramanik (2019) proposed a design-based estimator that cannot take negative values, following the concept of an average design-based mean squared error originally introduced by Gonzalez and Waksberg (1973). Rivest and Belmonte (2000) proposed an estimator for the mean squared error that measures the uncertainty with respect to the sampling design, conditional on the random effects of the model and assuming normality of the sampling model. Rao, Rubin-Bleuer and Estevao (2018) and Pfeffermann and Ben-Hur (2018) also propose a model for the design-based mean squared error of small area estimators. Rao et al. (2018) estimate the model parameters through restricted maximum likelihood, while Pfeffermann and Ben-Hur (2018) apply a bootstrap method.

The complications with variance estimation of domain discontinuities under a univariate FH model can also be circumvented by setting up a fully Bayesian framework for the analysis of the domain discontinuities. Two approaches are proposed in this paper. The first approach is a bivariate FH model that models the direct estimates under the regular and alternative approach simultaneously, i.e., a bivariate area level model for the vector ${({\hat{y}}_{i}^{r}, {\hat{y}}_{i}^{a})}^{t}$. The random component of this model accounts for the correlation between the domain parameters under the regular and alternative approach. The precision of the estimated discontinuities is improved by increasing the effective sample size within the domains by means of cross-sectional correlations. In addition, a positive correlation between the random domain effects further decreases the standard error of the estimated discontinuities. The second approach uses a univariate FH model for the direct estimates of the discontinuities, i.e., a univariate FH model for ${\hat{Δ}}_{i} = {\hat{y}}_{i}^{r} - {\hat{y}}_{i}^{a}$. This method is considered a less complex alternative to the bivariate FH model. It is anticipated, however, that good prediction models are harder to construct in this case, since the covariates available from registers might be good predictors of the target variables of the sample survey but probably not of the systematic differences between two estimates of the same variable obtained with different survey processes.
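Why a positive correlation between the random domain effects helps can be seen in a simplified bivariate normal calculation. Assuming, for illustration only, that $β$ and the variance components are fixed and known, that the sampling errors of the two independent samples are uncorrelated, and that $|ρ| < 1$, the posterior covariance of $(θ_{i}^{r}, θ_{i}^{a})$ is $(Σ_{ν}^{-1} + Σ_{e}^{-1})^{-1}$, and the posterior variance of the discontinuity is the contrast $c^{t}(Σ_{ν}^{-1} + Σ_{e}^{-1})^{-1}c$ with $c = (1, -1)^{t}$. The full HB model instead places priors on these quantities and samples them by MCMC; the function name and numbers below are hypothetical.

```python
import numpy as np


def posterior_var_discontinuity(sigma_r, sigma_a, rho, psi_r, psi_a):
    """Posterior variance of theta_i^r - theta_i^a under a bivariate FH model
    with beta and all variance components held fixed (illustrative sketch)."""
    # Covariance matrix of the random domain effects (nu_i^r, nu_i^a); needs |rho| < 1
    Sigma_v = np.array([[sigma_r**2, rho * sigma_r * sigma_a],
                        [rho * sigma_r * sigma_a, sigma_a**2]])
    # Sampling errors of the two approaches assumed uncorrelated
    Sigma_e = np.diag([psi_r, psi_a])
    # Conjugate normal update: posterior precision = prior precision + data precision
    post_cov = np.linalg.inv(np.linalg.inv(Sigma_v) + np.linalg.inv(Sigma_e))
    c = np.array([1.0, -1.0])  # contrast theta_r - theta_a
    return float(c @ post_cov @ c)


# Small design variance for the regular survey, large for the alternative one
for rho in (0.0, 0.5, 0.9):
    print(rho, round(posterior_var_discontinuity(1.0, 1.0, rho, 0.1, 1.0), 4))
```

For comparison, the variance of the direct difference ${\hat{Δ}}_{i}$ in this setting is $ψ^{r} + ψ^{a} = 1.1$; the posterior variance of the discontinuity is already smaller at $ρ = 0$ and decreases further as $ρ$ approaches one.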

The univariate FH model proposed by van den Brakel et al. (2016) was applied to estimate domain discontinuities in five key target variables of the Dutch Crime Victimization Survey (CVS), using data obtained in a parallel run in which the regular survey was conducted at the regular sample size and the alternative survey at about one fourth of the regular sample size. In this paper, the bivariate FH model and the univariate FH model for the domain discontinuities are applied to the same redesign of the CVS, and the results are compared with those of the univariate FH model proposed in van den Brakel et al. (2016).

Model selection in this paper is based on a step-forward selection procedure that minimizes the widely applicable information criterion, WAIC (Watanabe, 2010, 2013). To avoid selecting over-parameterized models, it is proposed to add covariates in the step-forward selection procedure only if they decrease the WAIC by more than the standard error of the WAIC. This prevents the selection of several covariates that each only marginally improve the WAIC, which would result in models that tend to overfit the data.
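The selection rule can be sketched as follows. The `waic` function uses the standard definition on the deviance scale (log pointwise predictive density minus the summed posterior variances of the pointwise log-likelihood, times $-2$) and assumes a matrix of pointwise log-likelihood values evaluated at MCMC draws. The function `forward_select` and its `score` callable are hypothetical, and comparing the WAIC decrease with the standard error of the candidate model's WAIC is an assumed reading of the rule, not necessarily the exact implementation used in this paper.

```python
import numpy as np


def waic(loglik):
    """WAIC on the deviance scale from an (S draws x n domains) log-likelihood matrix.

    Returns the WAIC and its standard error, sqrt(n * var(waic_i)), computed
    from the n pointwise contributions.
    """
    loglik = np.asarray(loglik, dtype=float)
    # Log pointwise predictive density, via a numerically stable log-mean-exp
    mx = loglik.max(axis=0)
    lppd_i = mx + np.log(np.exp(loglik - mx).mean(axis=0))
    # Penalty: posterior variance of the pointwise log-likelihood
    p_i = loglik.var(axis=0, ddof=1)
    waic_i = -2.0 * (lppd_i - p_i)
    n = waic_i.size
    return waic_i.sum(), np.sqrt(n * waic_i.var(ddof=1))


def forward_select(candidates, score):
    """Step-forward selection with the 'more than one standard error' rule.

    score(covs) is a hypothetical user-supplied function that fits the model
    with covariate list `covs` and returns (waic, se_waic).
    """
    selected = []
    best_waic, _ = score(selected)
    remaining = list(candidates)
    while remaining:
        # Try adding each remaining covariate and keep the best candidate
        trials = [(score(selected + [c]), c) for c in remaining]
        (new_waic, new_se), best_c = min(trials, key=lambda t: t[0][0])
        # Assumption: stop unless the WAIC drops by more than the candidate's SE
        if best_waic - new_waic <= new_se:
            break
        selected.append(best_c)
        remaining.remove(best_c)
        best_waic = new_waic
    return selected, best_waic
```

With this rule, a covariate that lowers the WAIC by less than its standard error is not added, even if it is the best remaining candidate, which keeps the selected model parsimonious.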

The FH model (Fay and Herriot, 1979) is frequently applied in the context of small area estimation (Rao and Molina, 2015). FH models are particularly appropriate if auxiliary information is available at the domain level. Datta, Ghosh, Nangia and Natarajan (1996) employed a multivariate FH model fitted in an HB framework to estimate median income. Multivariate FH models fitted in a frequentist framework are considered in Gonzalez-Manteiga, Lombardia, Molina, Morales and Santamaria (2008) and Benavent and Morales (2016). Several authors proposed time-series FH models that use sample information from previous editions of a survey as a form of small area estimation (Rao and Yu, 1994; Datta, Lahiri, Maiti and Lu, 1999; You and Rao, 2000; Esteban, Morales, Perez and Santamaria, 2012; Marhuenda, Molina and Morales, 2013). Pfeffermann and Burck (1990), Pfeffermann and Tiller (2006), van den Brakel and Krieg (2016) and Bollineni-Balabay, van den Brakel, Palm and Boonstra (2017) are some examples of FH time-series models cast in a state-space framework. Boonstra and van den Brakel (2019) discuss how FH time-series models can be expressed either in a state-space framework and fitted with the Kalman filter, or as time-series multilevel models in a hierarchical Bayesian framework and estimated using a Gibbs sampler.

The paper is structured as follows. In Section 2, the Crime Victimization Survey, the redesign and the setup of the parallel run are described. The bivariate FH model is explained in Section 3, including the HB framework and the model selection and evaluation approach. Results are presented in Section 4. The paper ends with a discussion in Section 5.

ISSN : 1492-0921

Editorial policy

Survey Methodology publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves. All papers will be refereed. However, the authors retain full responsibility for the contents of their papers and opinions expressed are not necessarily those of the Editorial Board or of Statistics Canada.

Submission of Manuscripts

Survey Methodology is published twice a year in electronic format. Authors are invited to submit their articles in English or French in electronic form, preferably in Word, to the Editor (statcan.smj-rte.statcan@canada.ca, Statistics Canada, 150 Tunney’s Pasture Driveway, Ottawa, Ontario, Canada, K1A 0T6). For formatting instructions, please see the guidelines provided in the journal and on the web site (www.statcan.gc.ca/SurveyMethodology).

Note of appreciation

Canada owes the success of its statistical system to a long-standing partnership between Statistics Canada, the citizens of Canada, its businesses, governments and other institutions. Accurate and timely statistical information could not be produced without their continued co-operation and goodwill.

Standards of service to the public

Statistics Canada is committed to serving its clients in a prompt, reliable and courteous manner. To this end, the Agency has developed standards of service which its employees observe in serving its clients.

Copyright

Published by authority of the Minister responsible for Statistics Canada.

Use of this publication is governed by the Statistics Canada Open Licence Agreement.

Catalogue No. 12-001-X

Frequency: Semi-annual

Ottawa

Date modified: 2021-06-24
