Estimation of level and change for unemployment using structural time series models
Section 4. Time series small area estimation
The
initial monthly domain estimates for the separate waves, accompanied by
variance and covariance estimates, are the input for the time series models. In
the next step STM models are applied to smooth the initial estimates and
correct for RGB. The estimated models are used to make predictions for
provincial unemployment fractions, provincial unemployment trends, and
month-to-month changes in the trends. In Subsection 4.1 the STMs are
defined and subsequently expressed as state space models fitted in a
frequentist framework. Subsection 4.2 explains how these STMs can be
expressed as time series multilevel models fitted in an hierarchical Bayesian
framework.
4.1 State space model
This
section develops a structural time series model for the monthly data at
provincial level for twelve provinces simultaneously to take advantage of
temporal and cross-sectional sample information. Let
denote the five-dimensional vector containing
the survey regression estimates
defined by (3.1) in period
and domain
This vector can be modeled with the folowing
structural time series model (Pfeffermann, 1991; van den Brakel and
Krieg, 2009, 2015):
where
a scalar denoting the true population
parameter for period
in domain
a five-dimensional vector that models the RGB
and
a five-dimensional vector with sampling
errors. The population parameter
in (4.1) is modeled as
where
denotes a stochastic trend model to capture
low frequency variation (trend plus business cycle),
a stochastic seasonal component to model
monthly fluctuations and
a white noise for the unexplained variation in
For the stochastic trend component, the so-called smooth
trend model is used, which is defined by the following set of equations:
For the stochastic seasonal component the trigonometric form is used, see Boonstra
and van den Brakel (2016) for details. The white noise in (4.2) is
defined as
The
RGB between the series of the survey regression estimates, is modeled in (4.1)
with
The model is identified by taking
This implies that the relative bias in the
follow-up waves with respect to the first wave is estimated and it assumes that
the survey regression estimates of the first wave are the most reliable
approximations for
see van den Brakel and Krieg (2009)
for a motivation. The remaining components model the systematic difference
between wave
with respect to the first wave and are modeled
as random walks to allow for time dependent patterns in the RGB,
Finally,
a time series model for the survey errors is developed. Let
denote the five-dimensional vector containing
the survey errors of the five waves. The variance estimates of the survey
regression estimates are used as prior information in the time series model to
account for heteroscedasticity due to varying sample sizes over time using the
following survey error model:
and
defined by (3.2). Since the first wave is observed for the
first time there is no autocorrelation with samples observed in the past. To
model the autocorrelation between survey errors of the follow-up waves,
appropriate AR models for
are derived by applying the Yule-Walker
equations to the correlation coefficients
which are derived from the micro data as described in Section 3. Based
on this analysis an AR(1) model is assumed for wave 2 through 5 where the
autocorrelation coefficients depend on wave and month. These considerations
result in the following model for the survey errors:
with
the time-dependent partial autocorrelation
coefficients between wave
and
derived from (4.6). As a result,
and
for
The variances
are scaling parameters with values close to
one for the first wave and close to
for the other waves, where
denotes the length of the observed series.
Model
(4.1) uses sample information observed in preceding periods within each domain
to improve the precision of the survey regression estimator and accounts for
RGB and serial correlation induced by the rotating panel design. To take
advantage of sample information across domains, model (4.1) for the separate
domains can be combined in one multivariate model:
where
denotes the number of domains, which is equal
to twelve in this application. This multivariate setting allows to use sample
information across domains by modeling the correlation between the disturbance
terms of the different structural time series components (trend, seasonal, RGB)
or by defining the hyperparameters or the state variables of these components
equal over the domains. In this paper models with cross-sectional correlation
between the slope disturbance terms of the trend (4.3) are considered, i.e.,
The most parsimonious covariance structure is a diagonal matrix where all
the domains share the same variance component, i.e.,
for all
and
for all
and
These are so-called seemingly unrelated structural time
series models and are a synthetic approach to use sample information across
domains. A slightly more complex and realistic covariance structure is a
diagonal matrix where each domain has a separate variance component, i.e.,
for all
and
In this case the model only borrows strength over time and
does not take advantage of cross-sectional information. The most complex
covariance structure allows for a full covariance matrix. Strong correlation
between the slope disturbances across the domains can result in cointegrated
trends. This implies that
common trends are required to model the
dynamics of the trends for the
domains and allows the specification of
so-called common trend models (Koopman, Harvey, Doornik and Shephard, 1999; Krieg
and van den Brakel, 2012). Initial STM analyses showed that the
seasonal and RGB component turned out to be time independent. It is therefore
not sensible to model correlations between seasonal and RGB disturbance terms.
Since the hyperparameters of the white noise population domain parameters tend
to zero, it turned out to be better to remove this component completely from
the model implying that modeling correlations between population noise is not
considered. Correlations between survey errors for different domains is also
not considered, since the domains are geographical regions from which samples
are drawn independently.
As
an alternative to a model with a full covariance matrix for the slope
disturbances, a trend model is considered that has one common smooth trend
model for all provinces plus
trend components that describe the deviation
of each domain from this overall trend. In this case (4.2) is given by
Here
is the overal smooth trend component, defined by
(4.3), and
the deviation from the overall trend for the
separate domains, defined as local levels
or as smooth trends as in (4.3). These trend models implicitly allow for
(positive) correlations between the trends of the different domains.
The
parameters to be estimated with the time series modeling approach are the trend
and the signal. The latter is defined as the trend plus the seasonal component.
The time series approach is particularly suitable for estimating month-to-month
changes. Seasonal patterns hamper a straightforward interpretation of
month-to-month changes of direct estimates and smoothed signals. Therefore
month-to-month changes are calculated for the trends only. Due to the strong
positive correlation between the levels of consecutive periods, the standard
errors of month-to-month changes in the level of the trends are much smaller
than those of e.g., month-to-month changes of the direct estimates. The
month-to-month change of the trend is defined as
for models with separate trends for the
domains or
for models with an overall trend and
trends for the deviation from the overall
trend for the separate domains. This modeling approach is also useful to
estimate year-to-year developments for trend defined as
or
Year-to-year differences are also sensible for
signals, since the main part of the seasonal component cancels out. These
developments are defined equivalently to the year-to-year developments of the
trend.
The
aforementioned structural time series models are analyzed by putting them in
the so-called state space form. Subsequently the Kalman filter is used to fit
the models, where the unknown hyperparameters are replaced by their ML
estimates. The analysis is conducted with software developed in OxMetrics in
combination with the subroutines of SsfPack 3.0, (Doornik, 2009; Koopman,
Shephard and Doornik, 1999, 2008). ML estimates for the hyperparameters are
obtained using the numerical optimization procedure maxBFGS in OxMetrics. More
details about the state space representation, initialization of the Kalman
filter and software used to fit these models is included in Boonstra and
van den Brakel (2016).
4.2 Time series multilevel model
For
the description of the multilevel time series representation of the STMs, the
initial estimates
are combined into a vector
i.e., wave index runs faster than time index
which runs faster than area index. The numbers of areas, periods and waves are
denoted by
and
respectively. The total length of
is therefore
12(areas)* 72(months)* 5(waves) = 4,320. Similarly, the variance estimates
are put in the same order along the diagonal
of a
covariance matrix
The
covariance matrix
is not diagonal because of the correlations
induced by the panel design. It is a sparse band matrix, and the ordering of
the vector
is such that it achieves minimum possible
bandwidth, which is advantageous from a computational point of view.
The
multilevel models considered for modeling the vector of direct estimates
take the general linear additive form
where
is a
design matrix for the fixed effects
and the
are
design matrices for random effect vectors
Here the sum over
runs over several possible random effect terms
at different levels, such as a national level smooth trend, provincial local
level trends, white noise, etc. This is explained in more detail below. The
sampling errors
are taken to be normally distributed as
where
with
the covariance matrix for the initial
estimates for province
and
a province-specific variance scale parameter
to be estimated. As described in Section 3 the design variances in
are pooled over provinces and because of the
discrete nature of the unemployment data they thereby lose some of their
dependence on the unemployment level. It was found that incorporating the
variance scale factors
allows the model to rescale the estimated
design variances to a level that better fits the data.
To
describe the general model for each vector
of random effects, we suppress the superscript
Each vector
has
components corresponding to
effects allowed to vary over
levels of a factor variable. In particular,
where
and
are
and
covariance matrices, respectively. As in
Section 4.1 the covariance matrix
is allowed to be parameterised in three
different ways. Most generally, it is an unstructured, i.e., fully
parameterised covariance matrix. More parsimonious forms are
or
If
the three parameterisations are equivalent. The
covariance matrix
describes the covariance structure between the
levels of the factor variable, and is assumed to be known. It is typically more
convenient to use the precision matrix
as it is sparse for many common temporal and
spatial correlation structures (Rue and Held, 2005).
4.2.1 Relations between state space and time series
multilevel representations
A
single smooth trend can be represented as a random intercept
varying over time
with temporal correlation determined by a
band sparse precision matrix
associated with a second order random walk (Rue
and Held, 2005). In this case
and the design matrix
is the
indicator matrix for month, i.e., the matrix
with a single 1 in each row for the corresponding month and 0s elsewhere. The
sparsity of both
and
can be exploited in computations. The
precision matrix for the smooth trend component has two singular vectors,
and
This means that the corresponding specification
(4.14) is completely uninformative about the overall level and linear trend. In
order to prevent unidentifiability among various terms in the model, the
overall level and trend can be removed from
by imposing the constraints
where
is the
matrix with the two singular vectors as its
rows. The overall level and trend are then included in the vector
of fixed effects. In the state space
representation, this model is obtained by defining one trend model (4.3) for
all domains, i.e.,
and
for all
Defining the state variables for the trend
equal over the domains is a very synthetic approach to use sample information
from other domains and is based on assumptions that are not met in most cases.
A
smooth trend for each province is obtained with
and
a
covariance matrix, either diagonal with a
single variance parameter, diagonal with
variance parameters, or unstructured, i.e.,
fully parametrised in terms of
variance parameters and
correlation parameters. The design matrix is
in this case. In the state space
representation, these models are obtained with trend model (4.3) and covariance
structure (4.9).
An
alternative trend model consists of a single global smooth trend (second order
random walk) supplemented by a local level trend, i.e., an ordinary (first
order) random walk, for each province. The latter can be modeled as discussed
in the previous paragraph, but with precision matrix associated with a first
order random walk. This trend model corresponds to the models (4.10) and (4.11)
in the state space context. In contrast to the state space approach, it is not
necessary to remove one of the provincial random walk trends from the model for
identifiability. The reason is that in the multilevel approach constraints are
imposed to ensure that the smooth overall trend as well as all provincial
random walk trends sum to zero over time. The constrained components correspond
to global and provincial intercepts, which are separately included in the model
as fixed effects with one provincial fixed effect excluded.
Seasonal
effects can be expressed in terms of correlated random effects (4.14) as well.
The trigonometric seasonal is equivalent to the balanced dummy variable
seasonal model (Proietti, 2000; Harvey, 2006), corresponding to first order
random walks over time for each month, subject to a sum-to-zero constraint over
the months. In this case
(seasons),
and
with
the precision matrix of a first order random
walk. The sum-to-zero constraints over seasons at each time, together with the
sum-to-zero constraints over time of each random walk can be imposed as
with
the
matrix
Together with fixed effects for each season (again with a sum-to-zero
constraint imposed) this random effect term is equivalent to the trigonometric
seasonal. It can be extended to a seasonal for each province, with a separate
variance parameter for each province.
To
account for the RGB, the multilevel model includes fixed effects for waves 2 to
5. These effects can optionally be modeled dynamically by adding random walks
over time for each wave. Another choice to be made is whether the fixed and
random effects are crossed with province.
Further
fixed effects can be included in the model, for example those associated with
the auxiliary variables used in the survey regression estimates. Some fixed
effect interactions, for example season
province or wave
province might alternatively
be modeled as random effects to reduce the risk of overfitting.
Finally,
a white noise term can be added to the model, to account for unexplained
variation by area and time in the signal.
Model
(4.12) can be regarded as a generalization of the Fay-Herriot area-level model.
The Fay-Herriot model only includes a single vector of uncorrelated random
effects over the levels of a single factor variable (typically areas). The
models used in this paper contain various combinations of uncorrelated and
correlated random effects over areas and months. Earlier accounts of multilevel
time series models extending the Fay-Herriot model are Rao and Yu (1994); Datta
et al. (1999); You (2008). In Datta et al. (1999) and You (2008) time
series models are used with independent area effects and first-order random
walks over time for each area. In Rao and Yu (1994) a model is used with
independent random area effects and a stationary autoregressive AR(1) instead
of a random walk model over time. In You et al. (2003) the random walk
model was found to fit the Canadian unemployment data slightly better than
AR(1) models with autocorrelation parameter fixed at 0.5 or 0.75. We do not
consider AR(1) models in this paper, and refer to Diallo (2014) for an approach
that allows both stationary and non-stationary trends. Compared to the
aforementioned references a novel feature of our model is that smooth trends
are considered instead of or in addition to first-order random walks or
autoregressive components. We also include independent area-by-time random
effects as a white noise term accounting for unexplained variation at the
aggregation level of interest.
4.2.2 Estimating time series multilevel models
A
Bayesian approach is used to fit model (4.12)-(4.14). This means we need prior
distributions for all (hyper)parameters in the model. The following priors are
used:
- The
data-level variance parameters
for
are assigned inverse chi-squared priors with
degrees of freedom and scale parameters equal to 1.
- The
fixed effects are assigned a normal prior with zero mean and fixed diagonal
variance matrix with very large values (1e10).
- For
a fully parameterized covariance matrix
in (4.14) we use the scaled-inverse
Wishart prior as proposed in O’Malley and Zaslavsky (2008) and recommended by Gelman
and Hill (2007). Conditionally on a
-dimensional vector parameter
-
- where
is chosen, and
The vector
is assigned a normal distribution
- All
other variance parameters appearing in a diagonal matrix
in (4.14) are assigned, conditionally
on an auxiliary parameter
inverse chi-squared priors with 1 degree of
freedom and scale parameter
Each parameter
is assigned a
prior. Marginally, the standard deviation
parameters have half-Cauchy priors. Gelman (2006) demonstrates that these
priors are better default priors than the more common inverse chi-squared
priors.
The
model is fit using Markov Chain Monte Carlo (MCMC) sampling, in particular the
Gibbs sampler (Geman and Geman, 1984; Gelfand and Smith, 1990). The multilevel
models considered belong to the class of additive latent Gaussian models with
random effect terms being Gaussian Markov Random Fields (GMRFs), and we make
use of the sparse matrix and block sampling techniques described in Rue and
Held (2005) for efficiently fitting such models to the data. Moreover, the
parametrization in terms of the aforementioned auxiliary parameters
(Gelman, Van Dyk, Huang and Boscardin, 2008), greatly
improves the convergence of the Gibbs sampler used. See Boonstra and
van den Brakel (2016) for more details on the Gibbs sampler used,
including specifications of the full conditional distributions. The methods are
implemented in R using the mcmcsae R-package (Boonstra, 2016).
For
each model considered, the Gibbs sampler is run in three independent chains
with randomly generated starting values. Each chain is run for 2,500
iterations. The first 500 draws are discarded as a “burn-in sample”. From the
remaining 2,000 draws from each chain, we keep every fifth draw to save memory
while reducing the effect of autocorrelation between successive draws. This
leaves 3 * 400 = 1,200 draws to compute estimates and
standard errors. It was found that the effective number of independent draws
was near 1,200 for most model parameters, meaning that most autocorrelation was
indeed removed by the thinning. The convergence of the MCMC simulation is
assessed using trace and autocorrelation plots as well as the Gelman-Rubin
potential scale reduction factor (Gelman and Rubin, 1992), which diagnoses the
mixing of the chains. The diagnostics suggest that all chains converge well
within the burnin stage, and that the chains mix well, since all Gelman-Rubin
factors are close to one. Also, the estimated Monte Carlo simulation errors
(accounting for any remaining autocorrelation in the chains) are small compared
to the posterior standard errors for all parameters, so that the number of
retained draws is sufficient for our purposes.
The
estimands of interest can be expressed as functions of the parameters, and
applying these functions to the MCMC output for the parameters results in draws
from the posteriors for these estimands. In this paper we summarize those draws
in terms of their mean and standard deviation, serving as estimates and
standard errors, respectively. All estimands considered can be expressed as
linear predictors, i.e., as linear combinations of the model parameters.
Estimates and standard errors for the following estimands are computed:
- Signal:
the vector
including all fixed and random effects, except
those associated with waves 2 to 5. These correspond to the fitted values
associated with each fifth row
of
and the design matrices.
- Trend:
prediction of the long-term trend. This is computed by only incorporating the
trend components of each model in the linear predictor. For most models
considered the trend corresponds to seasonally adjusted figures, i.e.,
predictions of the signal with all seasonal effects removed.
- Growth
of trend: the differences between trends at two consecutive months.