Estimation of domain discontinuities using Hierarchical Bayesian Fay-Herriot models
Section 3. Methods
3.1 Small area estimation for domain discontinuities
Testing hypotheses about differences between estimates of a finite population parameter observed under different survey processes implies the existence of measurement errors. Therefore a measurement error model is required to explain systematic differences between survey estimates for the same population parameter observed under two different survey approaches. Let $\theta_i$ denote the population parameter of domain $i$, $i = 1, \ldots, m$. Let $\theta_i^r$ and $\theta_i^a$ denote the observed value for $\theta_i$ in the case of a complete enumeration under the regular approach and alternative approach, respectively. Direct estimates for $\theta_i^r$ and $\theta_i^a$ are obtained with the GREG estimator based on the samples assigned to the regular and alternative survey and are denoted as $\hat\theta_i^r$ and $\hat\theta_i^a$, respectively. The relation between the observed values under a complete enumeration and the real population parameter is:

$$\theta_i^k = \theta_i + b_i^k, \quad k \in \{r, a\},$$

with $b_i^k$ the real measurement bias if $\theta_i$ is measured with survey approach $k$. Without any external information, it is not possible to estimate $b_i^k$. From the sample data only the relative bias, say $\Delta_i = \theta_i^a - \theta_i^r = b_i^a - b_i^r$, is identifiable.
Direct estimates for these discontinuities are obtained from the survey data as
the contrast between the GREG estimates, i.e., $\hat\Delta_i = \hat\theta_i^a - \hat\theta_i^r$.
In the case of the Dutch
CVS the sample size of the regular survey is large enough to obtain
sufficiently precise direct estimates for the planned domains, since the sample
is designed to publish official statistics for these domains. The sample
assigned to the alternative survey for the parallel run has only a size of one
third of the regular sample size, which is insufficient to obtain precise
direct estimates for the planned domains. In an earlier paper (van den Brakel et al., 2016) univariate FH models were developed to obtain more
precise predictions for the domain parameters observed with the small sample
size assigned to the alternative survey approach using auxiliary variables
derived from three different sources. The first source contains demographic
variables derived from the Municipal Basis Administration (MBA), which is an
administration of all people residing in the Netherlands. The second source
contains related variables available in the Police Register of Reported
Offences (PRRO). The third source, which is unique in the case of a parallel
run, contains direct estimates for the same variables observed under the
regular survey, which are sufficiently precise at least for the planned domains
like police districts. The direct estimates from the regular survey are often
selected as auxiliary variables for these univariate FH models. This is not
surprising, since these are survey estimates for the same population
parameters; although they are measured with a different survey process, strong
positive correlations can be expected. Substantial improvements in the
precision of the small domain predictions are indeed found if the set of
potential auxiliary variables,
i.e., from MBA and PRRO, is extended with the direct estimates from the regular
CVS.
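The contrast estimator and the precision gap between the two samples can be made concrete with a short sketch. All domain figures below are purely hypothetical; the only facts taken from the text are that the samples are independent (so design variances add) and that the alternative sample is about one third of the regular one (so its standard errors are roughly $\sqrt{3}$ times larger).

```python
import math

# Hypothetical GREG estimates and standard errors for three illustrative
# domains; the alternative sample is ~1/3 the regular sample size, so its
# standard errors are roughly sqrt(3) times larger.
theta_reg = [0.245, 0.198, 0.221]   # regular survey, per domain
theta_alt = [0.232, 0.205, 0.209]   # alternative survey, per domain
se_reg    = [0.006, 0.007, 0.006]
se_alt    = [0.011, 0.012, 0.011]

# Direct discontinuity estimate per domain: the contrast of the two GREG
# estimates.  Because the two samples are drawn independently, the design
# variance of the contrast is the sum of the two design variances.
for i in range(3):
    delta = theta_alt[i] - theta_reg[i]
    se_delta = math.sqrt(se_reg[i] ** 2 + se_alt[i] ** 2)
    print(f"domain {i + 1}: delta = {delta:+.3f}, se = {se_delta:.4f}")
```

Note that the standard error of the contrast is dominated by the imprecise alternative-survey estimate, which is what motivates the model-based approaches below.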
In this application the
sampling error in the auxiliary variables that come from the regular CVS can be
ignored in the FH model, since the sample size, and therefore the sampling
error, is more or less equal across these domains (Ybarra and Lohr, 2008).
Ignoring it implies that the variance component of the random domain effects is
inflated with the sampling error of the auxiliary variables, which is
acceptable as long as the sampling error does not differ between domains. In
most applications this is not the case, and the methods proposed by Ybarra and
Lohr (2008) should be used to account for sampling error in the auxiliary
variables.
FH multilevel models can
be fitted under a frequentist approach using the EBLUP or under an HB approach
(Rao and Molina, 2015). In van den Brakel et al. (2016) the HB approach is
preferred over the EBLUP, since the strong auxiliary information in the fixed
effect part of the model often results in zero estimates for the variance
component of the random domain effects, giving too much weight to the synthetic
regression part and too little weight to the direct estimates in the EBLUP
(Bell, 1999; Rao and Molina, 2015). This problem can also be overcome with
adjusted maximum likelihood estimation, see e.g., Li and Lahiri (2010) and
Hirose and Lahiri (2018).
Let $\tilde\theta_i^a$ denote the HB prediction for domain $i$ under the alternative approach. Now domain discontinuities are obtained by $\tilde\Delta_i = \tilde\theta_i^a - \hat\theta_i^r$. Using direct estimates of the regular survey as auxiliary variables in the fixed part of the FH model for the alternative survey considerably increases the complexity of the variance estimation for the discontinuities. The variance of $\tilde\Delta_i$ can be expressed as

$$\mathrm{Var}(\tilde\Delta_i) = \mathrm{Var}(\tilde\theta_i^a) + \mathrm{Var}(\hat\theta_i^r) - 2\,\mathrm{Cov}(\tilde\theta_i^a, \hat\theta_i^r).$$

The use of $\hat\theta_i^r$ or related sample estimates as auxiliary variables to predict $\tilde\theta_i^a$ results in non-zero values for $\mathrm{Cov}(\tilde\theta_i^a, \hat\theta_i^r)$ that cannot be ignored. To approximate $\mathrm{Var}(\tilde\Delta_i)$, van den Brakel et al. (2016) proposed an approximately design-unbiased estimator for $\mathrm{Cov}(\tilde\theta_i^a, \hat\theta_i^r)$ and $\mathrm{Var}(\hat\theta_i^r)$, while $\mathrm{Var}(\tilde\theta_i^a)$ is approximated with the posterior variance of the HB domain predictions. A major disadvantage of this approach is that model-based and design-based uncertainty measures are intertwined. On the one hand, the MSEs for $\tilde\theta_i^a$ are approximated with their posterior variances. On the other hand, the covariances between $\tilde\theta_i^a$ and $\hat\theta_i^r$ are approximated from a design-based perspective. Consequently, naive application of this approach may give negative variance estimates for the discontinuities. This drawback has been solved using a design-based approximation for $\mathrm{Var}(\tilde\theta_i^a)$, resulting in a fully design-based approximation for the uncertainty of the estimated domain discontinuities.
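The role of the covariance term can be illustrated numerically. The sketch below (with invented variances, not the CVS data) simulates correlated pairs standing in for the model prediction under the alternative survey and the regular direct estimate of the same domain, and verifies the variance identity for their contrast: ignoring the covariance would misstate the uncertainty.

```python
import random

random.seed(1)

# Simulate correlated pairs (A, B): the correlation stands in for the
# dependence created by using the regular estimate as a covariate.
n = 50_000
pairs = []
for _ in range(n):
    z = random.gauss(0, 1)                   # shared component
    a = 0.8 * z + 0.6 * random.gauss(0, 1)   # Var(A) = 1
    b = 0.8 * z + 0.6 * random.gauss(0, 1)   # Var(B) = 1, Cov(A, B) = 0.64
    pairs.append((a, b))

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cov(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

A = [p[0] for p in pairs]
B = [p[1] for p in pairs]
D = [a - b for a, b in pairs]

# Var(A - B) = Var(A) + Var(B) - 2 Cov(A, B); with a positive covariance the
# naive sum Var(A) + Var(B) overstates the variance of the contrast.
print(var(D), var(A) + var(B) - 2 * cov(A, B))
```

For sample variances the identity holds exactly, draw by draw; the simulation only supplies concrete numbers.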
In this paper a full HB
framework for estimating domain discontinuities is proposed as an alternative
by developing a bivariate FH model for the domain parameters observed under
both the regular and alternative approach. The advantage of this approach is
that it improves the precision of the direct domain estimates of both the
regular and the alternative survey by borrowing strength from other domains and
from both surveys. Negative variance estimates for the estimated domain discontinuities
are precluded by definition under this multivariate HB framework. This method
is compared with a simple alternative, namely a univariate FH model for the
direct estimates of the discontinuities. As mentioned in the introduction, it
is anticipated that it might be hard to find covariates in the available
registers that explain the discontinuities. An advantage of both models is that
they avoid the complications of accounting for sampling error in the auxiliary
variables, which is necessary if the survey estimates of the regular survey are
used as covariates in univariate FH models and the sampling error differs
between domains.
3.2 Bivariate Fay-Herriot model
A bivariate version of the FH model (Fay and Herriot, 1979) starts with a measurement model for the two GREG estimates observed in each domain:

$$\hat{\boldsymbol\theta}_i = \boldsymbol\theta_i + \mathbf{e}_i, \qquad (3.1)$$

with $\hat{\boldsymbol\theta}_i = (\hat\theta_i^r, \hat\theta_i^a)^T$ a vector containing the GREG estimates of $\boldsymbol\theta_i = (\theta_i^r, \theta_i^a)^T$ and $\mathbf{e}_i = (e_i^r, e_i^a)^T$ a vector with the sampling errors of $\hat{\boldsymbol\theta}_i$, for which it is assumed that

$$\mathbf{e}_i \sim N(\mathbf{0}_2, \boldsymbol\Phi_i). \qquad (3.2)$$

Here $\mathbf{0}_2$ is a 2-dimensional column vector with each element equal to zero. Since the samples for the regular and alternative survey are drawn independently, it is assumed that $\boldsymbol\Phi_i = \mathrm{Diag}(\psi_i^r, \psi_i^a)$, where $\psi_i^k$ is the design variance of $\hat\theta_i^k$, $k \in \{r, a\}$. It is also assumed that these design variances are known, although they are replaced by their estimates in practice. The true domain parameters are modelled with a multilevel model. For the fixed effects it is assumed that the regular and alternative approach share the same covariates. In the most general case the regression coefficients for the fixed part are different for both variables, i.e., $\theta_i^k = \mathbf{x}_i^T \boldsymbol\beta^k + v_i^k$, with $\mathbf{x}_i$ a $p$-vector with covariates of domain $i$ and $\boldsymbol\beta^k$ a $p$-vector of regression coefficients, which are equal over the domains but might be different between the two survey approaches. It is assumed that the first element of $\mathbf{x}_i$ corresponds to the intercept. Furthermore, $v_i^r$ and $v_i^a$ are random domain effects. This gives rise to the following bivariate multilevel model for the two domain parameters:

$$\boldsymbol\theta_i = \mathbf{X}_i \boldsymbol\beta + \mathbf{v}_i, \qquad (3.3)$$

where $\mathbf{X}_i = \mathbf{I}_2 \otimes \mathbf{x}_i^T$, with $\mathbf{I}_2$ a 2-dimensional identity matrix and $\otimes$ the Kronecker product, and $\boldsymbol\beta = ((\boldsymbol\beta^r)^T, (\boldsymbol\beta^a)^T)^T$. For the random domain effects it is assumed that

$$\mathbf{v}_i = (v_i^r, v_i^a)^T \sim N(\mathbf{0}_2, \boldsymbol\Sigma_v), \qquad (3.4)$$

with $\boldsymbol\Sigma_v$ a general $2 \times 2$ covariance matrix for the random domain effects. Inserting (3.3) into (3.1) gives:

$$\hat{\boldsymbol\theta}_i = \mathbf{X}_i \boldsymbol\beta + \mathbf{v}_i + \mathbf{e}_i, \qquad (3.5)$$

with model assumptions (3.2) and (3.4).
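To make the data-generating process of the bivariate model concrete, the following sketch simulates direct estimates from it for one target variable. All numbers (coefficients, variance components, design variances, number of domains) are invented for illustration; the only structure taken from the text is that each domain yields a correlated pair of true values plus independent sampling errors, with the alternative survey about three times less precise.

```python
import math
import random

random.seed(42)

m = 25                            # number of domains (illustrative)
beta_r = [0.20, 0.05]             # coefficients, regular approach (invented)
beta_a = [0.17, 0.05]             # coefficients, alternative approach
Sigma_v = [[0.004, 0.003],        # full covariance of the random effects
           [0.003, 0.004]]

# Cholesky factor of the 2x2 covariance matrix, computed by hand.
l11 = math.sqrt(Sigma_v[0][0])
l21 = Sigma_v[1][0] / l11
l22 = math.sqrt(Sigma_v[1][1] - l21 ** 2)

data = []
for i in range(m):
    x = [1.0, random.uniform(-1, 1)]             # covariate vector x_i
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    v_r, v_a = l11 * z1, l21 * z1 + l22 * z2     # v_i ~ N(0, Sigma_v)
    th_r = sum(b * xi for b, xi in zip(beta_r, x)) + v_r   # theta_i^r
    th_a = sum(b * xi for b, xi in zip(beta_a, x)) + v_a   # theta_i^a
    psi_r, psi_a = 0.0004, 0.0012                # design variances (alt ~3x)
    est_r = th_r + random.gauss(0, math.sqrt(psi_r))       # sampling error e_i^r
    est_a = th_a + random.gauss(0, math.sqrt(psi_a))       # sampling error e_i^a
    data.append((est_r, est_a, psi_r, psi_a))
```

Fitting the model would then treat `data` as observed and the betas and variance components as unknowns; here the sketch only shows the generating side of equations (3.1) to (3.5).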
Since the number of domains in this application is small, it is important to select parsimonious models. One way to reduce model complexity is to assume that the regression coefficients are equal for both survey approaches. In this case a dummy indicator, say $\delta^k$, is introduced, which is equal to zero for the regular survey ($\delta^r = 0$) and equal to one for the alternative survey ($\delta^a = 1$). In this case the first element of $\boldsymbol\beta$ corresponds to the overall intercept and $\delta^k$ is the indicator whose coefficient, say $\gamma$, measures the difference between the intercepts of the variables observed under both surveys. So $\tilde{\mathbf{x}}_i^k = (\mathbf{x}_i^T, \delta^k)^T$ and in (3.3) $\mathbf{X}_i = (\tilde{\mathbf{x}}_i^r, \tilde{\mathbf{x}}_i^a)^T$ and $\boldsymbol\beta = (\boldsymbol\beta_x^T, \gamma)^T$ a $(p+1)$-vector with the corresponding regression coefficients. As a result, two versions for the fixed effects are considered:

- FE_uq: A fixed effect model where the regular and alternative approach share the same covariates, but have different regression coefficients. In this case, domain discontinuities are given by

$$\Delta_i = \mathbf{x}_i^T (\boldsymbol\beta^a - \boldsymbol\beta^r) + v_i^a - v_i^r. \qquad (3.6)$$

- FE_eq: A more parsimonious version for the fixed effect component, obtained by assuming that the regression coefficients are equal for the regular and alternative approach. In this case domain discontinuities are given by

$$\Delta_i = \gamma + v_i^a - v_i^r, \qquad (3.7)$$

with $\gamma$ the regression coefficient for the dummy indicator $\delta^k$.
The following covariance structures for the random domain effects are considered:

- RE_f: A full covariance matrix for the random domain effects,

$$\boldsymbol\Sigma_v = \begin{pmatrix} \sigma_r^2 & \sigma_{ra} \\ \sigma_{ra} & \sigma_a^2 \end{pmatrix}. \qquad (3.8)$$

Positive correlation between the random domain effects will further increase the precision of the estimates for the domain discontinuities, since domain estimates borrow strength not only from different domains but also across the two surveys.

- RE_d: A diagonal covariance matrix with separate variances for the regular and alternative approach, i.e., $\boldsymbol\Sigma_v = \mathrm{Diag}(\sigma_r^2, \sigma_a^2)$. This covariance structure in combination with model FE_uq comes down to applying a univariate FH model to both surveys separately. In this case models only use sample information from other domains within the same survey, but not across the two surveys, to improve the precision of the estimates for domain discontinuities.

- RE_s: A diagonal covariance matrix with equal variances for the regular and alternative approach, i.e., $\boldsymbol\Sigma_v = \sigma_v^2 \mathbf{I}_2$.
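A small worked computation shows why the full covariance matrix can pay off. The random-effect part of a domain discontinuity is the contrast of the two domain effects, whose variance is the sum of the two variances minus twice their covariance; the variance components below are illustrative assumptions.

```python
# Random-effect part of a domain discontinuity: v_i^a - v_i^r, with variance
# sigma_r^2 + sigma_a^2 - 2 * sigma_ra.  Illustrative values (assumptions):
sigma_r2, sigma_a2 = 0.004, 0.004
sigma_ra = 0.003            # positive covariance, only present under RE_f

var_RE_f = sigma_r2 + sigma_a2 - 2 * sigma_ra   # full covariance matrix
var_RE_d = sigma_r2 + sigma_a2                  # diagonal, unequal variances
var_RE_s = 2 * sigma_r2                         # diagonal, equal variances

# Under RE_f the positive covariance cancels part of the domain effects in the
# contrast, which is why it can sharpen the discontinuity estimates.
print(var_RE_f, var_RE_d, var_RE_s)
```

With these numbers the RE_f contrast variance is 0.002 against 0.008 for the diagonal structures, a fourfold reduction driven entirely by the covariance term.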
3.3 Univariate Fay-Herriot model for domain
discontinuities
The univariate FH model for the direct estimates of the discontinuities starts with defining a measurement error model for the GREG estimates of the discontinuities:

$$\hat\Delta_i = \Delta_i + e_i^\Delta, \qquad (3.9)$$

with $\Delta_i$ the true discontinuity of domain $i$ under a complete enumeration of the population under both approaches, $\hat\Delta_i = \hat\theta_i^a - \hat\theta_i^r$ the GREG estimate for $\Delta_i$ based on the parallel run, and $e_i^\Delta$ the sampling error of $\hat\Delta_i$. It is assumed that $e_i^\Delta \sim N(0, \psi_i^\Delta)$, with $\psi_i^\Delta = \psi_i^r + \psi_i^a$, and that the design variances of the sampling errors are known. For $\Delta_i$ the following linear model is assumed:

$$\Delta_i = \mathbf{x}_i^T \boldsymbol\beta^\Delta + v_i^\Delta, \qquad (3.10)$$

with $v_i^\Delta \sim N(0, \sigma_\Delta^2)$. Inserting (3.10) into (3.9) gives:

$$\hat\Delta_i = \mathbf{x}_i^T \boldsymbol\beta^\Delta + v_i^\Delta + e_i^\Delta. \qquad (3.11)$$
3.4 Estimation of the bivariate Fay-Herriot model
The models developed in Subsections 3.2 and 3.3 are fitted with an HB approach using Markov chain Monte Carlo (MCMC) sampling. In particular the Gibbs sampler is used. The following priors are used for the model parameters and hyperparameters. For the regression coefficients uniform improper priors are assumed, i.e., $p(\boldsymbol\beta) \propto 1$. For the random domain effects a multivariate normal prior is used: $\mathbf{v}_i \sim N(\mathbf{0}_2, \boldsymbol\Sigma_v)$. In the case of a full covariance matrix for the random domain effects, the prior for $\boldsymbol\Sigma_v$ is taken to be a scaled inverse Wishart distribution (O'Malley and Zaslavsky, 2008). This distribution is obtained by writing $\boldsymbol\Sigma_v = \mathrm{Diag}(\boldsymbol\xi)\,\mathbf{Q}\,\mathrm{Diag}(\boldsymbol\xi)$, with $\boldsymbol\xi = (\xi_1, \xi_2)^T$, and assuming a standard normal distribution for $\xi_1$ and $\xi_2$, i.e., $\xi_l \sim N(0, 1)$, and an inverse Wishart distribution for $\mathbf{Q}$, i.e., $\mathbf{Q} \sim \mathrm{IW}(d + 1, \mathbf{S})$, with $d + 1$ degrees of freedom, with $d$ the dimension of $\boldsymbol\Sigma_v$, which is equal to 2 in this application, and scale parameter $\mathbf{S}$. In the case of a diagonal covariance matrix for the random domain effects, the priors for $\sigma_r$ and $\sigma_a$ (in the case of unequal variances) or $\sigma_v$ (in the case of equal variances) are half-Cauchy distributions. These are more robust prior distributions than the more commonly used inverse chi-squared distribution (Gelman, 2006). The inverse chi-squared distribution might be informative, even in the case of small scale and shape parameters. In addition, convergence problems might occur with the Gibbs sampler. Both problems are largely avoided with a redundant multiplicative parametrization of the random effects (Gelman, 2006; Gelman, Van Dyk, Huang and Boscardin, 2008; Polson and Scott, 2012). See van den Brakel and Boonstra (2018) for more details on the priors of this model.
Let $\hat{\boldsymbol\theta}$ denote the $2m$-dimensional column vector obtained by stacking the $m$ column vectors $\hat{\boldsymbol\theta}_i$, with $m$ the number of domains, and $\mathbf{X}$ the matrix obtained by stacking the matrices $\mathbf{X}_i$. In the case of unequal regression coefficients, $\mathbf{X}$ is a $2m \times 2p$ matrix. In the case of equal regression coefficients, $\mathbf{X}$ is a $2m \times (p + 1)$ matrix. The likelihood function can be written as

$$p(\hat{\boldsymbol\theta} \mid \boldsymbol\eta) = N(\mathbf{X}\boldsymbol\beta + \mathbf{v}, \boldsymbol\Phi),$$

with $\boldsymbol\Phi$ a $2m \times 2m$ diagonal matrix with the design variances of the direct estimates $\hat{\boldsymbol\theta}$, and $\boldsymbol\eta$ a vector containing all model parameters. The joint prior distribution $p(\boldsymbol\eta)$ equals the product of the aforementioned priors. The posterior distribution of $\boldsymbol\eta$ is proportional to the joint density, i.e.,

$$p(\boldsymbol\eta \mid \hat{\boldsymbol\theta}) \propto p(\hat{\boldsymbol\theta} \mid \boldsymbol\eta)\,p(\boldsymbol\eta).$$

The model is fitted using the Gibbs sampler (Geman and Geman, 1984; Gelfand and Smith, 1990). The full conditional distributions used in the Gibbs sampler are specified in van den Brakel and Boonstra (2018).
For each model
considered, the Gibbs sampler is run in three independent chains with randomly
generated starting values. The length of each chain after the burn-in period
for each run is 10,000 iterations. This gives 30,000 draws to compute estimates
and standard errors. The convergence of the MCMC simulation is assessed using
trace and autocorrelation plots as well as the Gelman-Rubin potential scale
reduction factor (Gelman and
Rubin, 1992), which diagnoses the mixing
of the chains. The diagnostics suggest that all chains converge well within 500
draws. The estimated Monte Carlo simulation errors are small compared to the
posterior standard errors for all parameters, so that the number of draws is
more than sufficient for our purposes.
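The Gelman-Rubin diagnostic used above can be sketched in a few lines. This is a plain (non-rank-normalized) potential scale reduction factor, shown on simulated chains rather than actual Gibbs output; the chain lengths and targets are illustrative.

```python
import random

random.seed(0)

def gelman_rubin(chains):
    """Potential scale reduction factor for one scalar parameter.

    chains: list of equally long lists of post-burn-in draws.
    """
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain variance B and mean within-chain variance W.
    B = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    W = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_plus = (n - 1) / n * W + B / n     # pooled variance estimate
    return (var_plus / W) ** 0.5

# Three well-mixed chains targeting the same distribution: R-hat close to 1.
chains = [[random.gauss(0, 1) for _ in range(10_000)] for _ in range(3)]
print(gelman_rubin(chains))
```

Chains that have not mixed (e.g. stuck around different means) drive the between-chain term up and push the factor well above 1, which is what the diagnostic flags.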
The estimands of interest
are expressed as functions of the parameters, and applying these functions to
the MCMC output for the parameters results in draws from the posteriors for
these estimands. Domain predictions for the target variables under the bivariate FH model are obtained as the posterior means approximated by the Gibbs sampler output and are denoted as $\tilde\theta_i^r$ and $\tilde\theta_i^a$. Domain predictions for the discontinuities are obtained as the posterior means of (3.6) or (3.7) approximated by the Gibbs sampler output and are denoted as $\tilde\Delta_i$. Mean squared errors for $\tilde\theta_i^r$, $\tilde\theta_i^a$ and $\tilde\Delta_i$ are obtained as posterior variances approximated from the Gibbs sampler output.
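The "functions of draws" recipe can be sketched as follows. The joint draws below are simulated stand-ins for Gibbs output for a single domain (the location, spread, and correlation are invented); applying the discontinuity contrast draw by draw yields draws from its posterior, whose mean and variance give the point estimate and MSE measure.

```python
import random

random.seed(7)

# Stand-in joint posterior draws for one domain's parameters (theta^r, theta^a);
# a shared component induces posterior correlation between the two.
R = 30_000
draws = []
for _ in range(R):
    z = random.gauss(0, 1)
    th_r = 0.245 + 0.006 * z + 0.003 * random.gauss(0, 1)
    th_a = 0.230 + 0.006 * z + 0.005 * random.gauss(0, 1)
    draws.append((th_r, th_a))

# The discontinuity is a function of the parameters; applying it to each draw
# yields draws from its posterior.  Posterior mean = point estimate,
# posterior variance = MSE measure.
delta = [a - r for r, a in draws]
post_mean = sum(delta) / R
post_var = sum((d - post_mean) ** 2 for d in delta) / (R - 1)
print(post_mean, post_var ** 0.5)
```

Note that the shared component cancels in the contrast, so the posterior standard deviation of the discontinuity is smaller than a naive combination of the two marginal spreads would suggest.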
The methods are
implemented in R using the
mcmcsae R-package (Boonstra, 2020).
3.5 Pooling design variances
Estimates for the design variances $\psi_i^r$ and $\psi_i^a$ are available from the GREG estimator and are used as if the true design variances were known. This is a standard assumption in small area estimation. Therefore it is important to provide reliable estimates for these design variances. For the regular survey the variance estimates of the GREG estimates are considered to be reliable enough to be used in the FH model. For the alternative survey the estimates of the design variances are unreliable and are therefore smoothed to improve the stability of the estimates of $\psi_i^a$. Under the assumption that the population variances of the GREG residuals under the alternative approach are equal across domains, the analysis-of-variance type of pooled variance estimator is used:

$$\hat\psi_i^a = \frac{1 - f_i^a}{n_i^a}\,\tilde S_a^2, \qquad \tilde S_a^2 = \frac{\sum_{i=1}^m (n_i^a - 1)\,S_i^2}{\sum_{i=1}^m (n_i^a - 1)},$$

with $f_i^a$ the sample fraction in domain $i$ of the alternative survey, $n_i^a$ the number of respondents in domain $i$ under the alternative survey, $S_i^2$ the within-domain variance of the GREG residuals, and $\tilde S_a^2$ the estimated population variance of the GREG residuals, pooled over the domains.
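Under the stated pooling assumption, the estimator can be sketched as follows. The domain inputs are hypothetical, and the degrees-of-freedom weighting shown is a standard ANOVA-type choice for pooling within-domain variances.

```python
# Pooled (ANOVA-type) design-variance estimator for the alternative survey.
# s2[i]: within-domain variance of the GREG residuals (hypothetical values),
# n[i]:  number of respondents in domain i,
# f[i]:  sample fraction in domain i.
n  = [90, 140, 110]
f  = [0.001, 0.002, 0.001]
s2 = [0.52, 0.47, 0.50]

# Population residual variance pooled over domains, weighted by degrees of
# freedom, as in a one-way analysis of variance.
pooled_s2 = sum((ni - 1) * si for ni, si in zip(n, s2)) / sum(ni - 1 for ni in n)

# Smoothed design variance per domain: (1 - f_i) * pooled_s2 / n_i.
psi_alt = [(1 - fi) * pooled_s2 / ni for fi, ni in zip(f, n)]
print(pooled_s2, psi_alt)
```

The smoothed variances now differ across domains only through the sample sizes and fractions, not through the noisy domain-level variance estimates.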
Alternatively, variance estimates of the direct estimates can be smoothed by
modeling the variance estimates along with the GREG estimates themselves (You
and Chapman, 2006; Sugasawa, Tamae and Kubokawa, 2017). Their approach can be
traced back to Arora and Lahiri (1997). Another possibility is to smooth
variance estimates by applying generalized variance functions (Wolter, 2007,
Chapter 7; Hawala and Lahiri, 2018).
3.6 Model selection and evaluation
Frequently applied model selection criteria in HB settings are the Widely Applicable Information Criterion or Watanabe-Akaike Information Criterion (WAIC) (Watanabe, 2010, 2013) and the Deviance Information Criterion (DIC) (Spiegelhalter, Best, Carlin and van der Linde, 2002). They are popular because they are easy to compute from MCMC simulation output and because of their ability to make a reasonable tradeoff between model fit and model complexity. The WAIC is seen as an improvement on the DIC, since the latter can produce negative estimates for the effective number of parameters and is not defined for singular models (Vehtari, Gelman and Gabry, 2017). The penalty used for model complexity in the DIC and WAIC is closely related to the effective number of parameters proposed by Hodges and Sargent (2001) for linear multilevel models, where each fixed effect contributes one degree of freedom and the random effects contribute a value in the range between zero and the number of random effects, depending on the size of the variance component. As follows from the definition of the WAIC, models with lower WAIC values are preferred. The WAIC values are estimates and therefore uncertain; an approximation of their standard error is provided by Vehtari et al. (2017), equation (23), and can be computed using the R package loo (Vehtari, Gelman and Gabry, 2015).
Covariates are selected
from the set of auxiliary variables listed in van den Brakel and Boonstra (2018) using a step-forward selection procedure. Various
models are compared using the aforementioned WAIC estimates. From the set of
potential covariates, the covariate with the lowest WAIC value is selected in
the model. This selection process is iteratively repeated as long as adding a
new covariate further decreases the WAIC value. In this application, this
step-forward selection procedure, further abbreviated as step-WAIC, often results in models with a large number of
covariates. Since the WAIC values are estimates that contain error, it may not
be desirable to keep adding covariates to the model merely because the point
estimate of the WAIC decreases. As an alternative we applied a step-forward
selection procedure where covariates are added to the model as long as a new
covariate decreases the WAIC by a value that exceeds the estimated standard
error of the WAIC. This method will be referred to as step-WAIC-se.
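The step-WAIC-se rule can be sketched as a loop. `fit_waic` below is a hypothetical stand-in for fitting the FH model with a given covariate set and returning the WAIC point estimate and its standard error; it is mocked with fixed, invented numbers (and invented covariate names) purely to make the control flow concrete.

```python
# Mocked WAIC results per covariate set: (WAIC point estimate, standard error).
# All numbers and covariate names are invented for illustration.
WAIC_TABLE = {
    frozenset():                          (120.0, 4.0),
    frozenset({"mba_age"}):               (109.0, 4.0),
    frozenset({"prro_theft"}):            (112.0, 4.0),
    frozenset({"mba_age", "prro_theft"}): (107.5, 4.0),
}

def fit_waic(covariates):
    """Hypothetical stand-in for fitting a model and computing its WAIC."""
    return WAIC_TABLE[frozenset(covariates)]

def step_waic_se(candidates):
    selected = []
    best_waic, best_se = fit_waic(selected)
    while True:
        trials = [(c, *fit_waic(selected + [c]))
                  for c in candidates
                  if c not in selected
                  and frozenset(selected + [c]) in WAIC_TABLE]
        if not trials:
            return selected
        c, waic, se = min(trials, key=lambda t: t[1])
        # step-WAIC-se rule: accept the covariate only if the WAIC drop
        # exceeds the standard error (plain step-WAIC would accept any drop).
        if best_waic - waic <= best_se:
            return selected
        selected, best_waic, best_se = selected + [c], waic, se

print(step_waic_se(["mba_age", "prro_theft"]))
```

With these mocked numbers, the first covariate drops the WAIC by 11 (accepted, well beyond one standard error) while the second drops it by only 1.5 (rejected), so the procedure stops with a one-covariate model where plain step-WAIC would have kept both.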
The step-forward
selection procedure is applied to each of the six different combinations of the
two fixed effect versions (FE_uq and FE_eq) and the three covariance structures of the random
component (RE_f, RE_d, and RE_s). From the resulting six models the one with the
lowest WAIC value is selected. Model adequacy of these six selected models is
evaluated with posterior predictive checks. This implies that replicate data sets, simulated from the posterior predictive distribution, are compared with the originally observed data to study systematic discrepancies and to evaluate how well the selected model fits the observed data (Gelman, Carlin, Stern, Dunson, Vehtari and Rubin, 2004). Posterior predictive $p$-values are calculated for six different test statistics that evaluate particular aspects of the posterior predictive distribution. Posterior predictive $p$-values for the domain discontinuities are defined as

$$p = P\left(T(\hat{\boldsymbol\Delta}^{rep}, \boldsymbol\Delta) \ge T(\hat{\boldsymbol\Delta}, \boldsymbol\Delta) \mid \hat{\boldsymbol\Delta}\right), \qquad (3.12)$$

where $\hat{\boldsymbol\Delta}^{rep} = (\hat\Delta_1^{rep}, \ldots, \hat\Delta_m^{rep})^T$ are replicates of the observed discontinuities for the $m$ domains under the posterior predictive distribution, $\hat{\boldsymbol\Delta} = (\hat\Delta_1, \ldots, \hat\Delta_m)^T$ the observed direct estimates for the $m$ domain discontinuities, and $T(\hat{\boldsymbol\Delta}^{rep}, \boldsymbol\Delta)$ (or $T(\hat{\boldsymbol\Delta}, \boldsymbol\Delta)$) a test statistic that depends on $\hat{\boldsymbol\Delta}^{rep}$ (or $\hat{\boldsymbol\Delta}$) and the unknown true values for the $m$ domain discontinuities $\boldsymbol\Delta = (\Delta_1, \ldots, \Delta_m)^T$. Posterior predictive $p$-values are estimated from the Gibbs sampler output as the average over the $R$ Monte Carlo samples:

$$\hat p = \frac{1}{R}\sum_{r=1}^{R} I\left(T(\hat{\boldsymbol\Delta}^{rep,(r)}, \boldsymbol\Delta^{(r)}) \ge T(\hat{\boldsymbol\Delta}, \boldsymbol\Delta^{(r)})\right),$$

with $I(\cdot)$ the indicator function with value one if the condition is fulfilled and zero otherwise, $\hat{\boldsymbol\Delta}^{rep,(r)}$ the replicated domain discontinuities in the $r$-th replicate of the MCMC simulation, and $\boldsymbol\Delta^{(r)}$ the true values of the $m$ domain discontinuities in the $r$-th replicate of the MCMC simulation. If a model fits the observed data adequately, then it is expected that $T(\hat{\boldsymbol\Delta}, \boldsymbol\Delta^{(r)})$ is in the bulk of the histogram of the replicates $T(\hat{\boldsymbol\Delta}^{rep,(r)}, \boldsymbol\Delta^{(r)})$. Therefore $p$-values close to zero or one are indications of a poor fit with respect to that test statistic. In the expressions below, it is understood that $T$ is a function of $\hat{\boldsymbol\Delta}^{rep}$ or $\hat{\boldsymbol\Delta}$, depending on the component that is evaluated in (3.12). The following posterior predictive tests are defined (You, 2008):

- A general goodness-of-fit test statistic

$$T_1(\hat{\boldsymbol\Delta}, \boldsymbol\Delta) = \sum_{i=1}^m \frac{(\hat\Delta_i - \Delta_i)^2}{\psi_i^\Delta}.$$

Here $\psi_i^\Delta$ is the design variance of $\hat\Delta_i$.

- $T_2(\hat{\boldsymbol\Delta}) = \min_i \hat\Delta_i$ and $T_3(\hat{\boldsymbol\Delta}) = \max_i \hat\Delta_i$, which are sensitive for deviations in the tails of the distribution.

- $T_4(\hat{\boldsymbol\Delta}) = \frac{1}{m}\sum_{i=1}^m \hat\Delta_i$, i.e., the mean, which is sensitive for bias in the domain predictions.

- $T_5(\hat{\boldsymbol\Delta}) = \frac{1}{m-1}\sum_{i=1}^m \left(\hat\Delta_i - T_4(\hat{\boldsymbol\Delta})\right)^2$, i.e., the variance of the domain estimates, which is sensitive for, e.g., overshrinkage.

- $T_6(\hat{\boldsymbol\Delta}) = \frac{1}{m}\sum_{i=1}^m \left(\frac{\hat\Delta_i - T_4(\hat{\boldsymbol\Delta})}{\sqrt{T_5(\hat{\boldsymbol\Delta})}}\right)^3$, i.e., the skewness of the domain estimates, which is sensitive to asymmetry in the distribution.
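Estimating a posterior predictive $p$-value from MCMC output can be sketched as follows, here for the mean statistic used as a bias check. The "posterior draws" are simulated stand-ins for Gibbs output, and the observed values are invented so that the data are unbiased by construction; only the indicator-averaging recipe is taken from the text.

```python
import random

random.seed(3)

# Observed direct discontinuity estimates for m = 10 illustrative domains,
# chosen to sum to zero so the data are unbiased by construction.
obs = [0.03, -0.01, 0.02, -0.04, 0.00, 0.01, -0.02, 0.04, -0.03, 0.00]
m, R = len(obs), 5000
psi = [0.0016] * m            # design variances of the estimates (assumed)

def mean_T(xs):               # the mean-over-domains statistic (bias check)
    return sum(xs) / len(xs)

hits = 0
for _ in range(R):
    # Stand-in for one MCMC replicate: "true" discontinuities drawn from a
    # stand-in posterior ...
    true_delta = [random.gauss(0.0, 0.05) for _ in range(m)]
    # ... and replicated data drawn from the measurement model around them.
    rep = [random.gauss(d, psi[i] ** 0.5) for i, d in enumerate(true_delta)]
    # Indicator that the replicate's statistic is at least the observed one.
    if mean_T(rep) >= mean_T(obs):
        hits += 1

p_value = hits / R
print(p_value)   # close to 0.5 here; values near 0 or 1 would flag misfit
```

For statistics that also involve the unknown true values, such as the goodness-of-fit statistic, the observed-side statistic is recomputed in every replicate with that replicate's draw of the true discontinuities, exactly as the averaging formula above prescribes.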