Estimation of domain discontinuities using Hierarchical Bayesian Fay-Herriot models
Section 3. Methods
3.1 Small area estimation for domain discontinuities
Testing hypotheses about differences between estimates of a finite population parameter observed under different survey processes implies the existence of measurement errors. Therefore a measurement error model is required to explain systematic differences between survey estimates for the same population parameter observed under two different survey approaches. Let $\theta_i$ denote the population parameter of domain $i$, $i = 1, \ldots, m$. Let $\theta_i^r$ and $\theta_i^a$ denote the observed value for $\theta_i$ in the case of a complete enumeration under the regular approach and alternative approach, respectively. Direct estimates for $\theta_i^r$ and $\theta_i^a$ are obtained with the GREG estimator based on the samples assigned to the regular and alternative survey and are denoted as $\hat\theta_i^r$ and $\hat\theta_i^a$, respectively. The relation between the observed values under a complete enumeration and the real population parameter is:

$$\theta_i^k = \theta_i + b_i^k, \quad k \in \{r, a\},$$

with $b_i^k$ the real measurement bias if $\theta_i$ is measured with survey approach $k$. Without any external information, it is not possible to estimate $b_i^k$. From the sample data only the relative bias, say $\Delta_i = \theta_i^a - \theta_i^r = b_i^a - b_i^r$, is identifiable.
Direct estimates for these discontinuities are obtained from the survey data as
the contrast between the GREG estimates, i.e., $\hat\Delta_i = \hat\theta_i^a - \hat\theta_i^r$.
In the case of the Dutch
CVS the sample size of the regular survey is large enough to obtain
sufficiently precise direct estimates for the planned domains, since the sample
is designed to publish official statistics for these domains. The sample
assigned to the alternative survey for the parallel run has only a size of one
third of the regular sample size, which is insufficient to obtain precise
direct estimates for the planned domains. In an earlier paper (van den Brakel et al., 2016) univariate FH models were developed to obtain more
precise predictions for the domain parameters observed with the small sample
size assigned to the alternative survey approach using auxiliary variables
derived from three different sources. The first source contains demographic
variables derived from the Municipal Basis Administration (MBA), which is an
administration of all people residing in the Netherlands. The second source
contains related variables available in the Police Register of Reported
Offences (PRRO). The third source, which is unique in the case of a parallel
run, contains direct estimates for the same variables observed under the
regular survey, which are sufficiently precise at least for the planned domains
like police districts. The direct estimates from the regular survey are often
selected as auxiliary variables for these univariate FH models. This is not
surprising, since these are survey estimates for the same population
parameters; although they are measured with a different survey process, strong
positive correlations can be expected. Substantial improvements in the
precision of the small domain predictions are indeed found if the set of
potential auxiliary variables,
i.e., from MBA and PRRO, is extended with the direct estimates from the regular
CVS.
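The contrast estimator and the precision gap between the two samples can be made concrete with a short sketch. All domain figures below are purely hypothetical; the only facts taken from the text are that the samples are independent (so design variances add) and that the alternative sample is about one third of the regular one (so its standard errors are roughly $\sqrt{3}$ times larger).

```python
import math

# Hypothetical GREG estimates and standard errors for three illustrative
# domains; the alternative sample is ~1/3 the regular sample size, so its
# standard errors are roughly sqrt(3) times larger.
theta_reg = [0.245, 0.198, 0.221]   # regular survey, per domain
theta_alt = [0.232, 0.205, 0.209]   # alternative survey, per domain
se_reg    = [0.006, 0.007, 0.006]
se_alt    = [0.011, 0.012, 0.011]

# Direct discontinuity estimate per domain: the contrast of the two GREG
# estimates.  Because the two samples are drawn independently, the design
# variance of the contrast is the sum of the two design variances.
for i in range(3):
    delta = theta_alt[i] - theta_reg[i]
    se_delta = math.sqrt(se_reg[i] ** 2 + se_alt[i] ** 2)
    print(f"domain {i + 1}: delta = {delta:+.3f}, se = {se_delta:.4f}")
```

Note that the standard error of the contrast is dominated by the imprecise alternative-survey estimate, which is what motivates the model-based approaches below.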
In this application the
sampling error in the auxiliary variables that come from the regular CVS can be
ignored in the FH model, since the sample size, and therefore the sampling
error, is more or less equal across these domains (Ybarra and Lohr, 2008).
Ignoring it implies that the variance component of the random domain effects is
inflated with the sampling error of the auxiliary variables, which is
acceptable as long as the sampling error does not differ between domains. In
most applications this is not the case, and the methods proposed by Ybarra and
Lohr (2008) should be used to account for sampling error in the auxiliary
variables.
FH multilevel models can
be fitted under a frequentist approach using the EBLUP or under an HB approach
(Rao and Molina, 2015). In van den Brakel et al. (2016) the HB approach is
preferred over the EBLUP, since the strong auxiliary information in the fixed
effect part of the model often results in zero estimates for the variance
component of the random domain effects, giving too much weight to the synthetic
regression part and too little weight to the direct estimates in the EBLUP
(Bell, 1999; Rao and Molina, 2015). This problem can also be overcome with
adjusted maximum likelihood estimation, see e.g., Li and Lahiri (2010) and
Hirose and Lahiri (2018).
Let $\tilde\theta_i^a$ denote the HB prediction for domain $i$ under the alternative approach. Now domain discontinuities are obtained by $\tilde\Delta_i = \tilde\theta_i^a - \hat\theta_i^r$. Using direct estimates of the regular survey as auxiliary variables in the fixed part of the FH model for the alternative survey considerably increases the complexity of the variance estimation for the discontinuities. The variance of $\tilde\Delta_i$ can be expressed as

$$\mathrm{Var}(\tilde\Delta_i) = \mathrm{Var}(\tilde\theta_i^a) + \mathrm{Var}(\hat\theta_i^r) - 2\,\mathrm{Cov}(\tilde\theta_i^a, \hat\theta_i^r).$$

The use of $\hat\theta_i^r$ or related sample estimates as auxiliary variables to predict $\tilde\theta_i^a$ results in non-zero values for $\mathrm{Cov}(\tilde\theta_i^a, \hat\theta_i^r)$ that cannot be ignored. To approximate $\mathrm{Var}(\tilde\Delta_i)$, van den Brakel et al. (2016) proposed an approximately design-unbiased estimator for $\mathrm{Cov}(\tilde\theta_i^a, \hat\theta_i^r)$ and $\mathrm{Var}(\hat\theta_i^r)$, while $\mathrm{Var}(\tilde\theta_i^a)$ is approximated with the posterior variance of the HB domain predictions. A major disadvantage of this approach is that model-based and design-based uncertainty measures are intertwined. On the one hand, the MSEs for $\tilde\theta_i^a$ are approximated with their posterior variances. On the other hand, the covariances between $\tilde\theta_i^a$ and $\hat\theta_i^r$ are approximated from a design-based perspective. Consequently, naive application of this approach may give negative variance estimates for the discontinuities. This drawback has been solved using a design-based approximation for $\mathrm{Var}(\tilde\theta_i^a)$, resulting in a fully design-based approximation for the uncertainty of the estimated domain discontinuities.
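The role of the covariance term can be illustrated numerically. The sketch below (with invented variances, not the CVS data) simulates correlated pairs standing in for the model prediction under the alternative survey and the regular direct estimate of the same domain, and verifies the variance identity for their contrast: ignoring the covariance would misstate the uncertainty.

```python
import random

random.seed(1)

# Simulate correlated pairs (A, B): the correlation stands in for the
# dependence created by using the regular estimate as a covariate.
n = 50_000
pairs = []
for _ in range(n):
    z = random.gauss(0, 1)                   # shared component
    a = 0.8 * z + 0.6 * random.gauss(0, 1)   # Var(A) = 1
    b = 0.8 * z + 0.6 * random.gauss(0, 1)   # Var(B) = 1, Cov(A, B) = 0.64
    pairs.append((a, b))

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cov(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

A = [p[0] for p in pairs]
B = [p[1] for p in pairs]
D = [a - b for a, b in pairs]

# Var(A - B) = Var(A) + Var(B) - 2 Cov(A, B); with a positive covariance the
# naive sum Var(A) + Var(B) overstates the variance of the contrast.
print(var(D), var(A) + var(B) - 2 * cov(A, B))
```

For sample variances the identity holds exactly, draw by draw; the simulation only supplies concrete numbers.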
In this paper a full HB
framework for estimating domain discontinuities is proposed as an alternative
by developing a bivariate FH model for the domain parameters observed under
both the regular and alternative approach. The advantage of this approach is
that it improves the precision of the direct domain estimates of both the
regular and the alternative survey by borrowing strength from other domains and
from both surveys. Negative variance estimates for the estimated domain discontinuities
are precluded by definition under this multivariate HB framework. This method
is compared with a simple alternative, namely a univariate FH model for the
direct estimates of the discontinuities. As mentioned in the introduction, it
is anticipated that it might be hard to find covariates in the available
registers that explain the discontinuities. An advantage of both models is that
they avoid the complications of accounting for sampling error in the auxiliary
variables, which is necessary if the survey estimates of the regular survey are
used as covariates in univariate FH models and the sampling error differs
between domains.
3.2 Bivariate Fay-Herriot model
A bivariate version of the FH model (Fay and Herriot, 1979) starts with a measurement model for the two GREG estimates observed in each domain:

$$\hat{\boldsymbol\theta}_i = \boldsymbol\theta_i + \mathbf{e}_i, \qquad (3.1)$$

with $\hat{\boldsymbol\theta}_i = (\hat\theta_i^r, \hat\theta_i^a)^T$ a vector containing the GREG estimates of $\boldsymbol\theta_i = (\theta_i^r, \theta_i^a)^T$ and $\mathbf{e}_i = (e_i^r, e_i^a)^T$ a vector with the sampling errors of $\hat{\boldsymbol\theta}_i$, for which it is assumed that

$$\mathbf{e}_i \sim N(\mathbf{0}_2, \boldsymbol\Phi_i). \qquad (3.2)$$

Here $\mathbf{0}_2$ is a 2-dimensional column vector with each element equal to zero. Since the samples for the regular and alternative survey are drawn independently, it is assumed that $\boldsymbol\Phi_i = \mathrm{Diag}(\psi_i^r, \psi_i^a)$, where $\psi_i^k$ is the design variance of $\hat\theta_i^k$, $k \in \{r, a\}$. It is also assumed that these design variances are known, although they are replaced by their estimates in practice. The true domain parameters are modelled with a multilevel model. For the fixed effects it is assumed that the regular and alternative approach share the same covariates. In the most general case the regression coefficients for the fixed part are different for both variables, i.e., $\theta_i^k = \mathbf{x}_i^T \boldsymbol\beta^k + v_i^k$, with $\mathbf{x}_i$ a $p$-vector with covariates of domain $i$ and $\boldsymbol\beta^k$ a $p$-vector of regression coefficients, which are equal over the domains but might be different between the two survey approaches. It is assumed that the first element of $\mathbf{x}_i$ corresponds to the intercept. Furthermore, $v_i^r$ and $v_i^a$ are random domain effects. This gives rise to the following bivariate multilevel model for the two domain parameters:

$$\boldsymbol\theta_i = \mathbf{X}_i \boldsymbol\beta + \mathbf{v}_i, \qquad (3.3)$$

where $\mathbf{X}_i = \mathbf{I}_2 \otimes \mathbf{x}_i^T$, with $\mathbf{I}_2$ a 2-dimensional identity matrix and $\otimes$ the Kronecker product, and $\boldsymbol\beta = ((\boldsymbol\beta^r)^T, (\boldsymbol\beta^a)^T)^T$. For the random domain effects it is assumed that

$$\mathbf{v}_i = (v_i^r, v_i^a)^T \sim N(\mathbf{0}_2, \boldsymbol\Sigma_v), \qquad (3.4)$$

with $\boldsymbol\Sigma_v$ a general $2 \times 2$ covariance matrix for the random domain effects. Inserting (3.3) into (3.1) gives:

$$\hat{\boldsymbol\theta}_i = \mathbf{X}_i \boldsymbol\beta + \mathbf{v}_i + \mathbf{e}_i, \qquad (3.5)$$

with model assumptions (3.2) and (3.4).
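To make the data-generating process of the bivariate model concrete, the following sketch simulates direct estimates from it for one target variable. All numbers (coefficients, variance components, design variances, number of domains) are invented for illustration; the only structure taken from the text is that each domain yields a correlated pair of true values plus independent sampling errors, with the alternative survey about three times less precise.

```python
import math
import random

random.seed(42)

m = 25                            # number of domains (illustrative)
beta_r = [0.20, 0.05]             # coefficients, regular approach (invented)
beta_a = [0.17, 0.05]             # coefficients, alternative approach
Sigma_v = [[0.004, 0.003],        # full covariance of the random effects
           [0.003, 0.004]]

# Cholesky factor of the 2x2 covariance matrix, computed by hand.
l11 = math.sqrt(Sigma_v[0][0])
l21 = Sigma_v[1][0] / l11
l22 = math.sqrt(Sigma_v[1][1] - l21 ** 2)

data = []
for i in range(m):
    x = [1.0, random.uniform(-1, 1)]             # covariate vector x_i
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    v_r, v_a = l11 * z1, l21 * z1 + l22 * z2     # v_i ~ N(0, Sigma_v)
    th_r = sum(b * xi for b, xi in zip(beta_r, x)) + v_r   # theta_i^r
    th_a = sum(b * xi for b, xi in zip(beta_a, x)) + v_a   # theta_i^a
    psi_r, psi_a = 0.0004, 0.0012                # design variances (alt ~3x)
    est_r = th_r + random.gauss(0, math.sqrt(psi_r))       # sampling error e_i^r
    est_a = th_a + random.gauss(0, math.sqrt(psi_a))       # sampling error e_i^a
    data.append((est_r, est_a, psi_r, psi_a))
```

Fitting the model would then treat `data` as observed and the betas and variance components as unknowns; here the sketch only shows the generating side of equations (3.1) to (3.5).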
Since the number of domains in this application is small, it is important to select parsimonious models. One way to reduce model complexity is to assume that the regression coefficients are equal for both survey approaches. In this case a dummy indicator, say $\delta^k$, is introduced, which is equal to zero for the regular survey ($\delta^r = 0$) and equal to one for the alternative survey ($\delta^a = 1$). In this case the first element of $\boldsymbol\beta$ corresponds to the overall intercept and $\delta^k$ is the indicator whose coefficient, say $\gamma$, measures the difference between the intercepts of the variables observed under both surveys. So $\tilde{\mathbf{x}}_i^k = (\mathbf{x}_i^T, \delta^k)^T$ and in (3.3) $\mathbf{X}_i = (\tilde{\mathbf{x}}_i^r, \tilde{\mathbf{x}}_i^a)^T$ and $\boldsymbol\beta = (\boldsymbol\beta_x^T, \gamma)^T$ a $(p+1)$-vector with the corresponding regression coefficients. As a result, two versions for the fixed effects are considered:

- FE_uq: A fixed effect model where the regular and alternative approach share the same covariates, but have different regression coefficients. In this case, domain discontinuities are given by

$$\Delta_i = \mathbf{x}_i^T (\boldsymbol\beta^a - \boldsymbol\beta^r) + v_i^a - v_i^r. \qquad (3.6)$$

- FE_eq: A more parsimonious version for the fixed effect component, obtained by assuming that the regression coefficients are equal for the regular and alternative approach. In this case domain discontinuities are given by

$$\Delta_i = \gamma + v_i^a - v_i^r, \qquad (3.7)$$

with $\gamma$ the regression coefficient for the dummy indicator $\delta^k$.
The following covariance structures for the random domain effects are considered:

- RE_f: A full covariance matrix for the random domain effects,

$$\boldsymbol\Sigma_v = \begin{pmatrix} \sigma_r^2 & \sigma_{ra} \\ \sigma_{ra} & \sigma_a^2 \end{pmatrix}. \qquad (3.8)$$

Positive correlation between the random domain effects will further increase the precision of the estimates for the domain discontinuities, since domain estimates borrow strength not only from different domains but also across the two surveys.

- RE_d: A diagonal covariance matrix with separate variances for the regular and alternative approach, i.e., $\boldsymbol\Sigma_v = \mathrm{Diag}(\sigma_r^2, \sigma_a^2)$. This covariance structure in combination with model FE_uq comes down to applying a univariate FH model to both surveys separately. In this case models only use sample information from other domains within the same survey, but not across the two surveys, to improve the precision of the estimates for domain discontinuities.

- RE_s: A diagonal covariance matrix with equal variances for the regular and alternative approach, i.e., $\boldsymbol\Sigma_v = \sigma_v^2 \mathbf{I}_2$.
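A small worked computation shows why the full covariance matrix can pay off. The random-effect part of a domain discontinuity is the contrast of the two domain effects, whose variance is the sum of the two variances minus twice their covariance; the variance components below are illustrative assumptions.

```python
# Random-effect part of a domain discontinuity: v_i^a - v_i^r, with variance
# sigma_r^2 + sigma_a^2 - 2 * sigma_ra.  Illustrative values (assumptions):
sigma_r2, sigma_a2 = 0.004, 0.004
sigma_ra = 0.003            # positive covariance, only present under RE_f

var_RE_f = sigma_r2 + sigma_a2 - 2 * sigma_ra   # full covariance matrix
var_RE_d = sigma_r2 + sigma_a2                  # diagonal, unequal variances
var_RE_s = 2 * sigma_r2                         # diagonal, equal variances

# Under RE_f the positive covariance cancels part of the domain effects in the
# contrast, which is why it can sharpen the discontinuity estimates.
print(var_RE_f, var_RE_d, var_RE_s)
```

With these numbers the RE_f contrast variance is 0.002 against 0.008 for the diagonal structures, a fourfold reduction driven entirely by the covariance term.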
3.3 Univariate Fay-Herriot model for domain
discontinuities
The univariate FH model for the direct estimates of the discontinuities starts with defining a measurement error model for the GREG estimates of the discontinuities:

$$\hat\Delta_i = \Delta_i + e_i^\Delta, \qquad (3.9)$$

with $\Delta_i$ the true discontinuity of domain $i$ under a complete enumeration of the population under both approaches, $\hat\Delta_i = \hat\theta_i^a - \hat\theta_i^r$ the GREG estimate for $\Delta_i$ based on the parallel run, and $e_i^\Delta$ the sampling error of $\hat\Delta_i$. It is assumed that $e_i^\Delta \sim N(0, \psi_i^\Delta)$, with $\psi_i^\Delta = \psi_i^r + \psi_i^a$, and that the design variances of the sampling errors are known. For $\Delta_i$ the following linear model is assumed:

$$\Delta_i = \mathbf{x}_i^T \boldsymbol\beta^\Delta + v_i^\Delta, \qquad (3.10)$$

with $v_i^\Delta \sim N(0, \sigma_\Delta^2)$. Inserting (3.10) into (3.9) gives:

$$\hat\Delta_i = \mathbf{x}_i^T \boldsymbol\beta^\Delta + v_i^\Delta + e_i^\Delta. \qquad (3.11)$$
3.4 Estimation of the bivariate Fay-Herriot model
The models developed in Subsections 3.2 and 3.3 are fitted with an HB approach using Markov chain Monte Carlo (MCMC) sampling. In particular the Gibbs sampler is used. The following priors are used for the model parameters and hyperparameters. For the regression coefficients uniform improper priors are assumed, i.e., $p(\boldsymbol\beta) \propto 1$. For the random domain effects a multivariate normal prior is used: $\mathbf{v}_i \sim N(\mathbf{0}_2, \boldsymbol\Sigma_v)$. In the case of a full covariance matrix for the random domain effects, the prior for $\boldsymbol\Sigma_v$ is taken to be a scaled inverse Wishart distribution (O'Malley and Zaslavsky, 2008). This distribution is obtained by writing $\boldsymbol\Sigma_v = \mathrm{Diag}(\boldsymbol\xi)\,\mathbf{Q}\,\mathrm{Diag}(\boldsymbol\xi)$, with $\boldsymbol\xi = (\xi_1, \xi_2)^T$, and assuming a standard normal distribution for $\xi_1$ and $\xi_2$, i.e., $\xi_l \sim N(0, 1)$, and an inverse Wishart distribution for $\mathbf{Q}$, i.e., $\mathbf{Q} \sim \mathrm{IW}(d + 1, \mathbf{S})$, with $d + 1$ degrees of freedom, with $d$ the dimension of $\boldsymbol\Sigma_v$, which is equal to 2 in this application, and scale parameter $\mathbf{S}$. In the case of a diagonal covariance matrix for the random domain effects, the priors for $\sigma_r$ and $\sigma_a$ (in the case of unequal variances) or $\sigma_v$ (in the case of equal variances) are half-Cauchy distributions. These are more robust prior distributions than the more commonly used inverse chi-squared distribution (Gelman, 2006). The inverse chi-squared distribution might be informative, even in the case of small scale and shape parameters. In addition, convergence problems might occur with the Gibbs sampler. Both problems are largely avoided with a redundant multiplicative parametrization of the random effects (Gelman, 2006; Gelman, Van Dyk, Huang and Boscardin, 2008; Polson and Scott, 2012). See van den Brakel and Boonstra (2018) for more details on the priors of this model.
Let $\hat{\boldsymbol\theta}$ denote the $2m$-dimensional column vector obtained by stacking the $m$ column vectors $\hat{\boldsymbol\theta}_i$, with $m$ the number of domains, and $\mathbf{X}$ the matrix obtained by stacking the matrices $\mathbf{X}_i$. In the case of unequal regression coefficients, $\mathbf{X}$ is a $2m \times 2p$ matrix. In the case of equal regression coefficients, $\mathbf{X}$ is a $2m \times (p + 1)$ matrix. The likelihood function can be written as

$$p(\hat{\boldsymbol\theta} \mid \boldsymbol\eta) = N(\mathbf{X}\boldsymbol\beta + \mathbf{v}, \boldsymbol\Phi),$$

with $\boldsymbol\Phi$ a $2m \times 2m$ diagonal matrix with the design variances of the direct estimates $\hat{\boldsymbol\theta}$, and $\boldsymbol\eta$ a vector containing all model parameters. The joint prior distribution $p(\boldsymbol\eta)$ equals the product of the aforementioned priors. The posterior distribution of $\boldsymbol\eta$ is proportional to the joint density, i.e.,

$$p(\boldsymbol\eta \mid \hat{\boldsymbol\theta}) \propto p(\hat{\boldsymbol\theta} \mid \boldsymbol\eta)\,p(\boldsymbol\eta).$$

The model is fitted using the Gibbs sampler (Geman and Geman, 1984; Gelfand and Smith, 1990). The full conditional distributions used in the Gibbs sampler are specified in van den Brakel and Boonstra (2018).
For each model
considered, the Gibbs sampler is run in three independent chains with randomly
generated starting values. The length of each chain after the burn-in period
for each run is 10,000 iterations. This gives 30,000 draws to compute estimates
and standard errors. The convergence of the MCMC simulation is assessed using
trace and autocorrelation plots as well as the Gelman-Rubin potential scale
reduction factor (Gelman and
Rubin, 1992), which diagnoses the mixing
of the chains. The diagnostics suggest that all chains converge well within 500
draws. The estimated Monte Carlo simulation errors are small compared to the
posterior standard errors for all parameters, so that the number of draws is
more than sufficient for our purposes.
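The Gelman-Rubin diagnostic used above can be sketched in a few lines. This is a plain (non-rank-normalized) potential scale reduction factor, shown on simulated chains rather than actual Gibbs output; the chain lengths and targets are illustrative.

```python
import random

random.seed(0)

def gelman_rubin(chains):
    """Potential scale reduction factor for one scalar parameter.

    chains: list of equally long lists of post-burn-in draws.
    """
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain variance B and mean within-chain variance W.
    B = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    W = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_plus = (n - 1) / n * W + B / n     # pooled variance estimate
    return (var_plus / W) ** 0.5

# Three well-mixed chains targeting the same distribution: R-hat close to 1.
chains = [[random.gauss(0, 1) for _ in range(10_000)] for _ in range(3)]
print(gelman_rubin(chains))
```

Chains that have not mixed (e.g. stuck around different means) drive the between-chain term up and push the factor well above 1, which is what the diagnostic flags.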
The estimands of interest
are expressed as functions of the parameters, and applying these functions to
the MCMC output for the parameters results in draws from the posteriors for
these estimands. Domain predictions for the target variables under the bivariate FH model are obtained as the posterior means approximated by the Gibbs sampler output and are denoted as $\tilde\theta_i^r$ and $\tilde\theta_i^a$. Domain predictions for the discontinuities are obtained as the posterior means of (3.6) or (3.7) approximated by the Gibbs sampler output and are denoted as $\tilde\Delta_i$. Mean squared errors for $\tilde\theta_i^r$, $\tilde\theta_i^a$ and $\tilde\Delta_i$ are obtained as posterior variances approximated from the Gibbs sampler output.
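The "functions of draws" recipe can be sketched as follows. The joint draws below are simulated stand-ins for Gibbs output for a single domain (the location, spread, and correlation are invented); applying the discontinuity contrast draw by draw yields draws from its posterior, whose mean and variance give the point estimate and MSE measure.

```python
import random

random.seed(7)

# Stand-in joint posterior draws for one domain's parameters (theta^r, theta^a);
# a shared component induces posterior correlation between the two.
R = 30_000
draws = []
for _ in range(R):
    z = random.gauss(0, 1)
    th_r = 0.245 + 0.006 * z + 0.003 * random.gauss(0, 1)
    th_a = 0.230 + 0.006 * z + 0.005 * random.gauss(0, 1)
    draws.append((th_r, th_a))

# The discontinuity is a function of the parameters; applying it to each draw
# yields draws from its posterior.  Posterior mean = point estimate,
# posterior variance = MSE measure.
delta = [a - r for r, a in draws]
post_mean = sum(delta) / R
post_var = sum((d - post_mean) ** 2 for d in delta) / (R - 1)
print(post_mean, post_var ** 0.5)
```

Note that the shared component cancels in the contrast, so the posterior standard deviation of the discontinuity is smaller than a naive combination of the two marginal spreads would suggest.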
The methods are
implemented in R using the
mcmcsae R-package (Boonstra, 2020).
3.5 Pooling design variances
Estimates for the design variances $\psi_i^r$ and $\psi_i^a$ are available from the GREG estimator and are used as if the true design variances were known. This is a standard assumption in small area estimation. Therefore it is important to provide reliable estimates for these design variances. For the regular survey the variance estimates of the GREG estimates are considered to be reliable enough to be used in the FH model. For the alternative survey the estimates of the design variances are unreliable and are therefore smoothed to improve the stability of the estimates of $\psi_i^a$. Under the assumption that the population variances of the GREG residuals under the alternative approach are equal across domains, the analysis-of-variance type of pooled variance estimator is used:

$$\hat\psi_i^a = \frac{1 - f_i^a}{n_i^a}\,\tilde S_a^2, \qquad \tilde S_a^2 = \frac{\sum_{i=1}^m (n_i^a - 1)\,S_i^2}{\sum_{i=1}^m (n_i^a - 1)},$$

with $f_i^a$ the sample fraction in domain $i$ of the alternative survey, $n_i^a$ the number of respondents in domain $i$ under the alternative survey, $S_i^2$ the within-domain variance of the GREG residuals, and $\tilde S_a^2$ the estimated population variance of the GREG residuals, pooled over the domains.
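Under the stated pooling assumption, the estimator can be sketched as follows. The domain inputs are hypothetical, and the degrees-of-freedom weighting shown is a standard ANOVA-type choice for pooling within-domain variances.

```python
# Pooled (ANOVA-type) design-variance estimator for the alternative survey.
# s2[i]: within-domain variance of the GREG residuals (hypothetical values),
# n[i]:  number of respondents in domain i,
# f[i]:  sample fraction in domain i.
n  = [90, 140, 110]
f  = [0.001, 0.002, 0.001]
s2 = [0.52, 0.47, 0.50]

# Population residual variance pooled over domains, weighted by degrees of
# freedom, as in a one-way analysis of variance.
pooled_s2 = sum((ni - 1) * si for ni, si in zip(n, s2)) / sum(ni - 1 for ni in n)

# Smoothed design variance per domain: (1 - f_i) * pooled_s2 / n_i.
psi_alt = [(1 - fi) * pooled_s2 / ni for fi, ni in zip(f, n)]
print(pooled_s2, psi_alt)
```

The smoothed variances now differ across domains only through the sample sizes and fractions, not through the noisy domain-level variance estimates.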
Alternatively, variance estimates of the direct estimates can be smoothed by
modeling the variance estimates along with the GREG estimates themselves (You
and Chapman, 2006; Sugasawa, Tamae and Kubokawa, 2017). Their approach can be
traced back to Arora and Lahiri (1997). Another possibility is to smooth
variance estimates by applying generalized variance functions (Wolter, 2007,
Chapter 7; Hawala and Lahiri, 2018).
3.6 Model selection and evaluation
Frequently applied model selection criteria in HB settings are the Widely Applicable Information Criterion or Watanabe-Akaike Information Criterion (WAIC) (Watanabe, 2010, 2013) and the Deviance Information Criterion (DIC) (Spiegelhalter, Best, Carlin and van der Linde, 2002). They are popular because they are easy to compute from MCMC simulation output and because of their ability to make a reasonable tradeoff between model fit and model complexity. The WAIC is seen as an improvement on the DIC, since the latter can produce negative estimates for the effective number of parameters and is not defined for singular models (Vehtari, Gelman and Gabry, 2017). The penalty used for model complexity in the DIC and WAIC is closely related to the effective number of parameters proposed by Hodges and Sargent (2001) for linear multilevel models, where each fixed effect contributes one degree of freedom and the random effects contribute a value in the range between zero and the number of random effects, depending on the size of the variance component. As follows from the definition of the WAIC, models with lower WAIC values are preferred. The WAIC values are estimates and therefore uncertain; an approximation of their standard error is provided by Vehtari et al. (2017), equation (23), and can be computed using the R package loo (Vehtari, Gelman and Gabry, 2015).
Covariates are selected
from the set of auxiliary variables listed in van den Brakel and Boonstra (2018) using a step-forward selection procedure. Various
models are compared using the aforementioned WAIC estimates. From the set of
potential covariates, the covariate with the lowest WAIC value is selected in
the model. This selection process is iteratively repeated as long as adding a
new covariate further decreases the WAIC value. In this application, this
step-forward selection procedure, further abbreviated as step-WAIC, often results in models with a large number of
covariates. Since the WAIC values are estimates that contain error, it may not
be desirable to keep adding covariates to the model merely because the point
estimate of the WAIC decreases. As an alternative we applied a step-forward
selection procedure where covariates are added to the model as long as a new
covariate decreases the WAIC by a value that exceeds the estimated standard
error of the WAIC. This method will be referred to as step-WAIC-se.
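The step-WAIC-se rule can be sketched as a loop. `fit_waic` below is a hypothetical stand-in for fitting the FH model with a given covariate set and returning the WAIC point estimate and its standard error; it is mocked with fixed, invented numbers (and invented covariate names) purely to make the control flow concrete.

```python
# Mocked WAIC results per covariate set: (WAIC point estimate, standard error).
# All numbers and covariate names are invented for illustration.
WAIC_TABLE = {
    frozenset():                          (120.0, 4.0),
    frozenset({"mba_age"}):               (109.0, 4.0),
    frozenset({"prro_theft"}):            (112.0, 4.0),
    frozenset({"mba_age", "prro_theft"}): (107.5, 4.0),
}

def fit_waic(covariates):
    """Hypothetical stand-in for fitting a model and computing its WAIC."""
    return WAIC_TABLE[frozenset(covariates)]

def step_waic_se(candidates):
    selected = []
    best_waic, best_se = fit_waic(selected)
    while True:
        trials = [(c, *fit_waic(selected + [c]))
                  for c in candidates
                  if c not in selected
                  and frozenset(selected + [c]) in WAIC_TABLE]
        if not trials:
            return selected
        c, waic, se = min(trials, key=lambda t: t[1])
        # step-WAIC-se rule: accept the covariate only if the WAIC drop
        # exceeds the standard error (plain step-WAIC would accept any drop).
        if best_waic - waic <= best_se:
            return selected
        selected, best_waic, best_se = selected + [c], waic, se

print(step_waic_se(["mba_age", "prro_theft"]))
```

With these mocked numbers, the first covariate drops the WAIC by 11 (accepted, well beyond one standard error) while the second drops it by only 1.5 (rejected), so the procedure stops with a one-covariate model where plain step-WAIC would have kept both.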
The step-forward
selection procedure is applied to each of the six different combinations of the
two fixed effect versions (FE_uq and FE_eq) and the three covariance structures of the random
component (RE_f, RE_d, and RE_s). From the resulting six models the one with the
lowest WAIC value is selected. Model adequacy of these six selected models is
evaluated with posterior predictive checks. This implies that replicate data sets, simulated from the posterior predictive distribution, are compared with the originally observed data to study systematic discrepancies and to evaluate how well the selected model fits the observed data (Gelman, Carlin, Stern, Dunson, Vehtari and Rubin, 2004). Posterior predictive $p$-values are calculated for six different test statistics that evaluate particular aspects of the posterior predictive distribution. Posterior predictive $p$-values for the domain discontinuities are defined as

$$p = P\left(T(\hat{\boldsymbol\Delta}^{rep}, \boldsymbol\Delta) \ge T(\hat{\boldsymbol\Delta}, \boldsymbol\Delta) \mid \hat{\boldsymbol\Delta}\right), \qquad (3.12)$$

where $\hat{\boldsymbol\Delta}^{rep} = (\hat\Delta_1^{rep}, \ldots, \hat\Delta_m^{rep})^T$ are replicates of the observed discontinuities for the $m$ domains under the posterior predictive distribution, $\hat{\boldsymbol\Delta} = (\hat\Delta_1, \ldots, \hat\Delta_m)^T$ the observed direct estimates for the $m$ domain discontinuities, and $T(\hat{\boldsymbol\Delta}^{rep}, \boldsymbol\Delta)$ (or $T(\hat{\boldsymbol\Delta}, \boldsymbol\Delta)$) a test statistic that depends on $\hat{\boldsymbol\Delta}^{rep}$ (or $\hat{\boldsymbol\Delta}$) and the unknown true values for the $m$ domain discontinuities $\boldsymbol\Delta = (\Delta_1, \ldots, \Delta_m)^T$. Posterior predictive $p$-values are estimated from the Gibbs sampler output as the average over the $R$ Monte Carlo samples:

$$\hat p = \frac{1}{R}\sum_{r=1}^{R} I\left(T(\hat{\boldsymbol\Delta}^{rep,(r)}, \boldsymbol\Delta^{(r)}) \ge T(\hat{\boldsymbol\Delta}, \boldsymbol\Delta^{(r)})\right),$$

with $I(\cdot)$ the indicator function with value one if the condition is fulfilled and zero otherwise, $\hat{\boldsymbol\Delta}^{rep,(r)}$ the replicated domain discontinuities in the $r$-th replicate of the MCMC simulation, and $\boldsymbol\Delta^{(r)}$ the true values of the $m$ domain discontinuities in the $r$-th replicate of the MCMC simulation. If a model fits the observed data adequately, then it is expected that $T(\hat{\boldsymbol\Delta}, \boldsymbol\Delta^{(r)})$ is in the bulk of the histogram of the replicates $T(\hat{\boldsymbol\Delta}^{rep,(r)}, \boldsymbol\Delta^{(r)})$. Therefore $p$-values close to zero or one are indications of a poor fit with respect to that test statistic. In the expressions below, it is understood that $T$ is a function of $\hat{\boldsymbol\Delta}^{rep}$ or $\hat{\boldsymbol\Delta}$, depending on the component that is evaluated in (3.12). The following posterior predictive tests are defined (You, 2008):

- A general goodness-of-fit test statistic

$$T_1(\hat{\boldsymbol\Delta}, \boldsymbol\Delta) = \sum_{i=1}^m \frac{(\hat\Delta_i - \Delta_i)^2}{\psi_i^\Delta}.$$

Here $\psi_i^\Delta$ is the design variance of $\hat\Delta_i$.

- $T_2(\hat{\boldsymbol\Delta}) = \min_i \hat\Delta_i$ and $T_3(\hat{\boldsymbol\Delta}) = \max_i \hat\Delta_i$, which are sensitive for deviations in the tails of the distribution.

- $T_4(\hat{\boldsymbol\Delta}) = \frac{1}{m}\sum_{i=1}^m \hat\Delta_i$, i.e., the mean, which is sensitive for bias in the domain predictions.

- $T_5(\hat{\boldsymbol\Delta}) = \frac{1}{m-1}\sum_{i=1}^m \left(\hat\Delta_i - T_4(\hat{\boldsymbol\Delta})\right)^2$, i.e., the variance of the domain estimates, which is sensitive for, e.g., overshrinkage.

- $T_6(\hat{\boldsymbol\Delta}) = \frac{1}{m}\sum_{i=1}^m \left(\frac{\hat\Delta_i - T_4(\hat{\boldsymbol\Delta})}{\sqrt{T_5(\hat{\boldsymbol\Delta})}}\right)^3$, i.e., the skewness of the domain estimates, which is sensitive to asymmetry in the distribution.
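Estimating a posterior predictive $p$-value from MCMC output can be sketched as follows, here for the mean statistic used as a bias check. The "posterior draws" are simulated stand-ins for Gibbs output, and the observed values are invented so that the data are unbiased by construction; only the indicator-averaging recipe is taken from the text.

```python
import random

random.seed(3)

# Observed direct discontinuity estimates for m = 10 illustrative domains,
# chosen to sum to zero so the data are unbiased by construction.
obs = [0.03, -0.01, 0.02, -0.04, 0.00, 0.01, -0.02, 0.04, -0.03, 0.00]
m, R = len(obs), 5000
psi = [0.0016] * m            # design variances of the estimates (assumed)

def mean_T(xs):               # the mean-over-domains statistic (bias check)
    return sum(xs) / len(xs)

hits = 0
for _ in range(R):
    # Stand-in for one MCMC replicate: "true" discontinuities drawn from a
    # stand-in posterior ...
    true_delta = [random.gauss(0.0, 0.05) for _ in range(m)]
    # ... and replicated data drawn from the measurement model around them.
    rep = [random.gauss(d, psi[i] ** 0.5) for i, d in enumerate(true_delta)]
    # Indicator that the replicate's statistic is at least the observed one.
    if mean_T(rep) >= mean_T(obs):
        hits += 1

p_value = hits / R
print(p_value)   # close to 0.5 here; values near 0 or 1 would flag misfit
```

For statistics that also involve the unknown true values, such as the goodness-of-fit statistic, the observed-side statistic is recomputed in every replicate with that replicate's draw of the true discontinuities, exactly as the averaging formula above prescribes.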