Estimation of domain discontinuities using Hierarchical Bayesian Fay-Herriot models
Section 1. Introduction

Official statistics produced by national statistical institutes are generally based on repeated sample surveys. Much of their value lies in their continuity, which enables developments in society and the economy to be monitored and policy actions to be decided. Besides sampling errors, survey estimates are affected by different sources of non-sampling errors that have a systematic effect on the outcomes of a survey. As long as the survey process is kept constant, this bias component is not visible. This is often an argument to keep the survey processes of repeated surveys unchanged as long as possible. From time to time, changes in surveys are needed to improve efficiency, reduce survey-related costs, or meet new requirements; a prominent example is the introduction of mixed-mode surveys, including web-based questionnaires, in official statistics. A redesign of the survey process generally has systematic effects on the survey estimates, since the biases induced by the aforementioned non-sampling errors change, which disturbs comparability with figures published in the past.

Systematic differences in the outcomes of a repeated survey due to redesign of the survey process are called discontinuities. To avoid the implementation of a new survey process disturbing the comparability of estimates over time, it is important to quantify these discontinuities. This avoids confounding real change in the parameters of interest with changing measurement bias due to alteration of the survey process.

Several methods to quantify discontinuities are proposed in the literature (van den Brakel, Smith and Compton, 2008). A reliable and straightforward approach is to run the old and new approach alongside each other for some period of time, further referred to as a parallel run. Ideally this is based on a randomized experiment that can be embedded in the probability sample of the survey (van den Brakel, 2008). In this paper small area estimation methods for estimating domain discontinuities are proposed. We consider the situation where the regular survey, used for the production of official figures, is conducted at the full sample size in parallel with an alternative approach. Due to budget limitations, the sample that is assigned to the alternative approach is often not sufficiently large to observe minimum detectable differences at prespecified significance and power levels using standard direct estimators, particularly for subpopulations or domains.

To explain the problem addressed in this paper, some notation is introduced. Let $\theta_i$ denote the real population value of a variable of interest for domain $i$. Furthermore, $\hat{y}_i^r$ and $\hat{y}_i^a$ denote direct estimates of $\theta_i$ based on the regular survey and the alternative survey approach, respectively. Since the regular survey is conducted at the regular sample size, $\hat{y}_i^r$ is a reliable direct estimate for $\theta_i$, at least for the planned domains. Due to the reduced sample size of the new survey in the parallel run, $\hat{y}_i^a$ will, however, be insufficiently precise. More precise domain estimates with the small sample available under the new approach can be obtained with the Fay-Herriot (FH) model (Fay and Herriot, 1979), which is defined as $\hat{y}_i^a = \mathbf{x}_i^t \beta + \nu_i + e_i^a$, with $\mathbf{x}_i$ a vector with covariates at the domain level, $\beta$ the regression coefficients, $\nu_i$ the random domain effects and $e_i^a$ the sampling error. To obtain more precise domain estimates for the alternative approach, van den Brakel, Buelens and Boonstra (2016) proposed a hierarchical Bayesian (HB) univariate FH model, where sample estimates of the regular survey are considered as potential auxiliary variables in a model selection procedure. This implies that $\hat{y}_i^r$ is used as a covariate in $\mathbf{x}_i$, besides the usual covariates that are available from registers or censuses. This results in an area level model with measurement error (Ybarra and Lohr, 2008). The use of reliable direct estimates observed in the regular survey significantly increased the precision of the domain estimates for the alternative approach conducted at reduced sample size (van den Brakel et al., 2016).
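The way an FH model borrows strength can be illustrated with a minimal sketch. It is not the HB procedure of van den Brakel et al. (2016): it assumes that the regression coefficients, the sampling variance (called `psi` here) and the variance of the random domain effects (called `sigma2_v` here) are all known, in which case the standard FH predictor is a weighted average of the direct estimate and the regression prediction. The function name and all parameter names are hypothetical.

```python
def fh_prediction(y_direct, x, beta, psi, sigma2_v):
    """Plug-in Fay-Herriot prediction for one domain.

    Assumes (for illustration only) that beta, the sampling variance psi
    and the random-effect variance sigma2_v are known rather than estimated.
    """
    # Regression (synthetic) part x_i^t beta
    synthetic = sum(xk * bk for xk, bk in zip(x, beta))
    # Shrinkage weight: close to 1 when the direct estimate is precise
    gamma = sigma2_v / (sigma2_v + psi)
    prediction = gamma * y_direct + (1.0 - gamma) * synthetic
    return prediction, gamma


# Made-up numbers: a noisy direct estimate is pulled towards the synthetic part
pred, gamma = fh_prediction(y_direct=10.0, x=[1.0, 2.0], beta=[4.0, 1.0],
                            psi=3.0, sigma2_v=1.0)
print(pred, gamma)
```

With a small alternative sample, `psi` is large relative to `sigma2_v`, so the prediction leans mostly on the covariates; this is why a strong covariate such as the regular survey estimate can improve precision considerably.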

Let $\tilde{y}_i^a$ denote the small area prediction for $\theta_i$ based on the aforementioned FH model under the small sample assigned to the alternative survey approach. In the approach followed by van den Brakel et al. (2016), point estimates for domain discontinuities are obtained as the difference between the direct estimate obtained with the regular survey and the model-based domain prediction obtained under the alternative approach, i.e., $\tilde{\Delta}_i = \hat{y}_i^r - \tilde{y}_i^a$. The use of the direct estimate of the regular survey as an auxiliary variable in the small domain predictions of the alternative survey results in strong positive correlations between both estimators, which cannot be ignored when computing the standard errors for the discontinuities. More precisely, $\mathrm{Var}(\tilde{\Delta}_i) = \mathrm{Var}(\hat{y}_i^r) + \mathrm{MSE}(\tilde{y}_i^a) - 2\,\mathrm{Cov}(\hat{y}_i^r, \tilde{y}_i^a)$. Since $\hat{y}_i^r$ is also used as a covariate in $\mathbf{x}_i$ in the FH model for $\tilde{y}_i^a$, $\mathrm{Cov}(\hat{y}_i^r, \tilde{y}_i^a)$ will be nonzero. To this end, two analytic approximations for the standard errors of the discontinuities were proposed. The first approach combines the design-based variance estimate of the direct estimator of the regular survey $(\mathrm{Var}(\hat{y}_i^r))$ with the posterior variance of the HB domain predictions of the alternative survey $(\mathrm{MSE}(\tilde{y}_i^a))$ and a design-based estimator for the covariance between both point estimates $(\mathrm{Cov}(\hat{y}_i^r, \tilde{y}_i^a))$. This approach is unstable in the sense that even negative variance estimates occur in the case of strong positive covariance estimates. A related issue is that design-based and model-based variance approximations are combined in one uncertainty measure for the discontinuities. Therefore a second analytic approximation was proposed, where a design-based estimator for the variance of the HB domain predictions $(\mathrm{MSE}(\tilde{y}_i^a))$ is derived and combined with the design-based variance for the direct estimator for the regular survey and the design-based covariance between both point estimates.
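The instability of the first approximation can be seen from the arithmetic of the variance decomposition itself. The sketch below plugs separately obtained estimates into the formula for the variance of the discontinuity; the function names and the numbers are made up for illustration and are not taken from the CVS application. With exact variances and covariances the result could not be negative, but with three separately estimated components nothing enforces this.

```python
def discontinuity_variance(var_regular, mse_alternative, cov):
    """Var(regular) + MSE(alternative) - 2 * Cov(regular, alternative)."""
    return var_regular + mse_alternative - 2.0 * cov


def discontinuity_se(var_regular, mse_alternative, cov):
    """Standard error of the discontinuity, or None if the plug-in
    variance is negative and therefore unusable."""
    v = discontinuity_variance(var_regular, mse_alternative, cov)
    return v ** 0.5 if v >= 0.0 else None


# Moderate covariance estimate: a valid standard error results
print(discontinuity_se(1.0, 0.5, 0.6))
# Strong positive covariance estimate: the plug-in variance turns negative
print(discontinuity_variance(1.0, 0.5, 0.9))
```

Returning `None` rather than clipping at zero is a design choice of this sketch; it keeps the failure visible instead of reporting a spuriously precise discontinuity.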

Several references to design-based mean squared error estimation in small area estimation can be found in the literature. Gonzalez and Waksberg (1973) introduced the concept of an average design-based mean squared error of a set of synthetic estimators and proposed an estimator that, however, can be unstable and take negative values. Marker (1995) proposed a more stable but biased estimator for the design-based mean squared error for small area estimates, which can also take negative values. Lahiri and Pramanik (2019) proposed a design-based estimator that cannot take negative values, following the concept of an average design-based mean squared error originally introduced by Gonzalez and Waksberg (1973). Rivest and Belmonte (2000) proposed an estimator for the mean squared error that measures the uncertainty with respect to the sampling design, conditional on the random effects of the model and assuming normality of the sampling model. Rao, Rubin-Bleuer and Estevao (2018) and Pfeffermann and Ben-Hur (2018) also propose a model for the design-based mean squared error of small area estimators. Rao et al. (2018) estimate the model parameters through restricted maximum likelihood, while Pfeffermann and Ben-Hur (2018) apply a bootstrap method.

The complications with variance estimation of domain discontinuities under a univariate FH model can also be circumvented by setting up a full Bayesian framework for the analysis of the domain discontinuities. Two approaches are proposed in this paper. The first approach is a bivariate FH model to model the direct estimates under the regular and alternative approach simultaneously, i.e., a bivariate area level model for the vector $(\hat{y}_i^r, \hat{y}_i^a)^t$. The random component of this model accounts for the correlation between the domain parameters under the regular and alternative approach. The precision of the estimated discontinuities is improved by increasing the effective sample size within the domains by means of cross-sectional correlations. In addition, a positive correlation between the random domain effects further decreases the standard error of the estimated discontinuities. The second approach uses a univariate FH model for the direct estimates of the discontinuities, i.e., a univariate FH model for $\hat{\Delta}_i = \hat{y}_i^r - \hat{y}_i^a$. This method is considered a less complex alternative to the bivariate FH model. It is, however, anticipated that it is harder to construct good prediction models, since the available covariates from registers might be good predictors for the target variables of the sample survey but probably not for systematic differences between two estimates for the same variable obtained with different survey processes.
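The remark that a positive correlation between the random domain effects lowers the standard error of the discontinuities can be made concrete with a short calculation. The symbols $\nu_i^r$ and $\nu_i^a$ (random domain effects under the regular and alternative approach), their variances $\sigma_r^2$ and $\sigma_a^2$, and their correlation $\rho$ are notation assumed here for illustration only. The calculation covers just the random-effect contribution, not the full posterior variance of a discontinuity.

```latex
\begin{align*}
\operatorname{Var}\left(\nu_i^r - \nu_i^a\right)
  &= \sigma_r^2 + \sigma_a^2 - 2\rho\,\sigma_r\sigma_a \\
  &= 2\sigma^2 (1 - \rho) \qquad \text{if } \sigma_r = \sigma_a = \sigma .
\end{align*}
```

For fixed variances this expression decreases as $\rho$ increases, and in the equal-variance case it vanishes as $\rho$ approaches one, which is the sense in which correlated domain effects sharpen the estimated difference.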

The univariate FH model proposed by van den Brakel et al. (2016) was applied to estimate domain discontinuities in five key target variables of the Dutch Crime Victimization Survey (CVS), using data obtained in a parallel run where the regular survey is conducted at the regular sample size and the alternative survey at a sample size that is about one-fourth of the regular sample size. In this paper the bivariate FH model and the univariate FH model for the domain discontinuities are applied to the same redesign of the CVS. The results are compared with the univariate FH model proposed in van den Brakel et al. (2016).

Model selection in this paper is based on a step forward selection procedure that minimizes the WAIC criterion (Watanabe, 2010, 2013). To avoid selecting over-parameterized models, it is proposed to add covariates in a step forward selection procedure only if they decrease the WAIC by more than the standard error of the WAIC. This prevents the selection of several covariates that only marginally improve the WAIC, which would result in models that tend to overfit the data.
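The selection rule described above can be sketched as a simple greedy loop. This is a minimal illustration under stated assumptions: `fit_waic` is a hypothetical callable that fits a model with the given covariates and returns its WAIC together with a standard error, and the standard error of the candidate model is used as the threshold, which is one possible reading of the rule. The covariate names and WAIC values in `toy_waic` are made up.

```python
def forward_select(candidates, fit_waic):
    """Step forward selection: add the covariate that lowers the WAIC most,
    but only if the decrease exceeds the WAIC standard error."""
    selected = []
    remaining = list(candidates)
    current_waic, _ = fit_waic(tuple(selected))
    while remaining:
        # Fit every one-covariate extension of the current model
        trials = [(fit_waic(tuple(selected + [c])), c) for c in remaining]
        (best_waic, best_se), best_c = min(trials, key=lambda t: t[0][0])
        # Stop when the best improvement is within one standard error
        if current_waic - best_waic <= best_se:
            break
        selected.append(best_c)
        remaining.remove(best_c)
        current_waic = best_waic
    return selected, current_waic


# Hypothetical (WAIC, standard error) values for illustration only
_TOY = {(): (100.0, 5.0), ("a",): (90.0, 4.0),
        ("b",): (97.0, 4.0), ("a", "b"): (88.0, 4.0)}


def toy_waic(covariates):
    return _TOY[tuple(sorted(covariates))]


print(forward_select(["a", "b"], toy_waic))
```

In the toy example the first covariate lowers the WAIC by 10, well above its standard error of 4, and is kept; adding the second lowers it by only 2 and is rejected, even though the WAIC itself would still decrease.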

The FH model (Fay and Herriot, 1979) is frequently applied in the context of small area estimation (Rao and Molina, 2015). FH models are particularly appropriate if auxiliary information is available at the domain level. Datta, Ghosh, Nangia and Natarajan (1996) employed a multivariate FH model fitted in an HB framework to estimate median income. Multivariate FH models fitted in a frequentist framework are considered in Gonzales-Manteiga, Lombardia, Molina, Morales and Santamaria (2008) and Benavent and Morales (2016). Several authors proposed time-series FH models that use sample information from previous editions of a survey as a form of small area estimation (Rao and Yu, 1994; Datta, Lahiri, Maiti and Lu, 1999; You and Rao, 2000; Esteban, Morales, Perez and Santamaria, 2012; Marhuenda, Molina and Morales, 2013). Pfeffermann and Burck (1990), Pfeffermann and Tiller (2006), van den Brakel and Krieg (2016) and Bollineni-Balabay, van den Brakel, Palm and Boonstra (2017) are some examples of FH time-series models cast in a state-space framework. Boonstra and van den Brakel (2019) discuss how FH time-series models can be expressed either in a state-space framework and fitted with the Kalman filter, or alternatively as time-series multilevel models in a hierarchical Bayesian framework and estimated using a Gibbs sampler.

The paper is structured as follows. In Section 2 the Crime Victimization Survey, the redesign and the setup of the parallel run are described. The bivariate FH model is explained in Section 3, including the HB framework and the model selection and evaluation approach. Results are presented in Section 4. The paper ends with a discussion in Section 5.

