Small area estimation for unemployment using latent Markov models
Section 6. Final remarks
In this paper we develop a new area-level SAE method that uses a Latent Markov
Model (LMM) as the linking model. In LMMs (Bartolucci et al., 2013), the
characteristic of interest, and its evolution in time, is represented by a
latent process that follows a Markov chain, usually of first order. Under the
assumption of normality for the conditional distribution of the response
variables given the latent variables, the model is estimated using an augmented
data Gibbs sampler. The proposed model has been applied to quarterly data from
the Italian LFS from 2004 to 2014. The model-based method has been found to be
effective for producing LMA-level estimates of unemployment incidence, and the
reduction in the coefficient of variation relative to the direct estimator is
evident. The proposed approach is also more accurate than both the direct
estimator and the time-series model-based estimator proposed by You et al.
(2003) in reproducing census data. A further advantage of this methodology is
that it provides a clustering of the small areas into homogeneous groups.
LMMs can be seen as an extension of latent class models to longitudinal data. In
this regard, our approach represents an extension of the latent class SAE model
proposed by Fabrizi et al. (2016). Moreover, LMMs may be seen as an
extension of Markov chain models that controls for measurement errors; they can
easily handle multivariate data, providing a very flexible modeling framework.
The approach could be extended using spatial correlation information, and it
could consider different distributions for the manifest variables, such as
Poisson, Binomial, and Multinomial responses. In this scenario, we could fit
unmatched sampling and linking models and handle departures from the normality
assumption; however, the Gibbs sampler could no longer be used, and
Metropolis-Hastings sampling would be an option. The proposed univariate model
can account for measurement errors, and an extension to the multivariate
framework would also be possible under the conditional independence assumption.
In this application we have not explicitly accounted for the serial correlation
induced by the rotating panel design. A natural way to take the different
features of this design into account, such as the rotating group bias and the
autocorrelation of the survey errors, is to use state-space model
specifications, as in Pfeffermann (1991), Pfeffermann and Rubin-Bleuer (1993)
and, more recently, Van den Brakel and Krieg (2015) and Boonstra and
Van den Brakel (2016). In this context, it would also be interesting
to extend to SAE the LMM with serial correlation in the measurement model
proposed by Bartolucci and Farcomeni (2009). State-space model specifications
can also be a useful tool to capture and model the strong trend and seasonality
of this type of data.
Acknowledgements
The work of Bertarelli, Ranalli, D’Alò and Solari was developed with the
support of the project PRIN-2012F42NS8 “Household wealth and youth
unemployment: new survey methods to meet current challenges”. The authors are
grateful to the Associate Editor and two anonymous referees, who provided very
useful comments on earlier versions of this paper.
Appendix A
Model estimation
In the following we illustrate Bayesian estimation and model selection based
on an MCMC algorithm implemented in a data augmentation framework (Tanner
and Wong, 1987).
A.1 Data augmentation method
In order to estimate the small area parameters, the measurement parameters
and the latent parameters, we follow a data augmentation approach. We
recall that the observed data consist of the direct estimates
the corresponding smoothed
and the covariate vectors
with
and
Moreover, the data augmentation approach explicitly introduces the latent
variables, treated as missing data, whose values are updated during the MCMC
algorithm, which is therefore based on a complete data likelihood. In this
context, the use of priors that are conjugate to the complete data likelihood
allows us to sample from the conditional posterior of the model parameters in a
straightforward way. Since the state space is finite, sampling the latent
states conditionally on the model parameters is also simple.
To generate samples from the joint posterior distribution of the model parameters
and latent states, the proposed MCMC algorithm proceeds as follows. Let
be the matrix of realizations of the available
direct estimates that is defined as in (4.1), with each
replaced by
and let
be the matrix of the latent variable
with elements organized as in
Then the posterior distribution of all model
parameters and latent variables, given the observed data, has the following
expression:
The MCMC algorithm alternates between sampling the latent variables and
the parameters from the corresponding full conditional distribution. This
scheme is repeated for
iterations. At the end of each iteration
the sampled model parameters and latent
variables are obtained and are denoted by
and
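More precisely, each iteration consists of:
- drawing the latent variables from their full conditional distribution (Section A.1.1);
- drawing the latent parameters from their full conditional distribution (Section A.1.2);
- drawing the measurement parameters from their full conditional distribution (Section A.1.3);
- drawing the small area parameters from their full conditional distribution (Section A.1.4).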
In the following we illustrate each of the above steps in detail. Note that
our illustration refers to the case where all elements of the matrix of direct
estimates
are available. However, in our application,
some elements of this matrix are missing. This requires a minor adjustment to
the MCMC algorithm: the missing values are imputed within the Gibbs sampler by
drawing them directly from their full conditional distribution.
A.1.1 Simulation of the latent variables
Each latent variable is drawn separately from the corresponding
full conditional distribution, which is of multinomial type with specific
parameters. In particular, we have that
where
disappears for
and
disappears for
Moreover, the probability vector
is defined as follows:
- for
has elements proportional to
- for
has elements proportional to
- for
has elements proportional to
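The three cases above can be sketched in code under the usual first-order latent Markov structure. This is only an illustration, not the authors' implementation: it assumes that the full conditional weight of a state is the product of the term for entering that state (initial probability at the first occasion, transition probability from the previous state otherwise), the term for leaving it (transition probability to the next state, absent at the last occasion), and the conditional density of the current observation. All function and argument names are hypothetical.

```python
import random

def latent_state_probs(t, n_times, prev_state, next_state,
                       init_probs, trans, emis_lik):
    """Full conditional probabilities of one latent state (assumed form).

    t          : 0-based time index of the state being updated
    n_times    : number of time occasions
    prev_state : state at time t-1 (ignored when t == 0)
    next_state : state at time t+1 (ignored when t == n_times - 1)
    init_probs : initial probabilities, one per state
    trans      : trans[u][v] = probability of moving from state u to state v
    emis_lik   : emis_lik[u] = density of the observation at time t given state u
    """
    weights = []
    for u in range(len(init_probs)):
        w = emis_lik[u]
        # Term for entering state u.
        w *= init_probs[u] if t == 0 else trans[prev_state][u]
        # Term for leaving state u; it disappears at the last occasion.
        if t < n_times - 1:
            w *= trans[u][next_state]
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]

def draw_latent_state(probs, rng=random):
    """Draw one state index from the categorical distribution `probs`."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Within a Gibbs iteration, a function like `latent_state_probs` would be called once per area and time occasion, using the most recent values of the neighbouring states.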
A.1.2 Simulation of the latent parameters
Recalling
that
we first draw
from the full conditional distribution:
where
and
is the number of areas in state
at time 1, with
Moreover, we draw each row of matrix
from the distribution
where
and
is the number of areas moving from state
to state
at time
with
and
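A minimal sketch of this step follows, assuming Dirichlet priors (the conjugate choice for multinomial-type counts), a transition matrix that is constant over time, and a common prior hyperparameter `prior`. These are assumptions made for illustration and may differ from the specification used in the paper; the function names are hypothetical. The posterior parameters are obtained by adding the observed counts (areas per state at the first occasion, and moves between states) to the prior parameters.

```python
import random

def draw_dirichlet(alpha, rng=random):
    """Draw from a Dirichlet distribution by normalising Gamma variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]

def state_counts(states, k):
    """Count initial states and transitions.

    states[i][t] is the current latent state (0..k-1) of area i at time t.
    Returns (init, trans), where init[u] is the number of areas in state u at
    the first occasion and trans[u][v] is the number of moves from u to v,
    summed over all areas and occasions.
    """
    init = [0] * k
    trans = [[0] * k for _ in range(k)]
    for path in states:
        init[path[0]] += 1
        for u, v in zip(path, path[1:]):
            trans[u][v] += 1
    return init, trans

def draw_latent_parameters(states, k, prior=1.0, rng=random):
    """Draw the initial probabilities and each row of the transition matrix."""
    init, trans = state_counts(states, k)
    init_probs = draw_dirichlet([prior + n for n in init], rng)
    trans_probs = [draw_dirichlet([prior + n for n in row], rng) for row in trans]
    return init_probs, trans_probs
```

Each row of the transition matrix is drawn independently, which mirrors the row-by-row description in the text.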
A.1.3 Simulation of the measurement parameters
Considering
that
, we first draw each
from the full conditional distribution:
where
with
denoting the indicator function equal to 1 if
its argument is true and to 0 otherwise. Then, we draw each
from
with
where
is the number of areas in state
regardless of the specific time occasion.
A.1.4 Simulation of the small area parameters
The goal of SAE is to predict each small area parameter
based on the model and the observed data. This
amounts to drawing these elements from
where
with
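As an illustration of this kind of draw, the sketch below assumes a normal sampling model with known sampling variance for the direct estimate and a normal linking model whose mean and variance depend on the current latent state. Under these assumptions, which are stated here for illustration and are not copied from the paper's formulas, the full conditional is normal with a mean that shrinks the direct estimate towards the model mean. All names are hypothetical.

```python
import math
import random

def theta_full_conditional(direct_est, samp_var, model_mean, model_var):
    """Mean and variance of the normal full conditional of one area parameter.

    direct_est : direct estimate for the area and time occasion
    samp_var   : sampling variance of the direct estimate (treated as known)
    model_mean : linking-model mean given the current latent state
    model_var  : linking-model variance given the current latent state
    """
    # Shrinkage weight: close to 1 when the direct estimate is precise.
    gamma = model_var / (model_var + samp_var)
    mean = gamma * direct_est + (1.0 - gamma) * model_mean
    var = gamma * samp_var
    return mean, var

def draw_theta(direct_est, samp_var, model_mean, model_var, rng=random):
    """Draw one small area parameter from its normal full conditional."""
    mean, var = theta_full_conditional(direct_est, samp_var, model_mean, model_var)
    return rng.gauss(mean, math.sqrt(var))
```

When the sampling variance is large relative to the model variance, the weight `gamma` is small and the draw is pulled towards the state-specific model mean, which is where the borrowing of strength comes from.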
A.2 Model selection: The Chib estimator
The method proposed in Chib (1995) can be applied to perform model selection
starting from the Gibbs sampler output. The posterior density can be written
as the product of the likelihood function and the priors, divided by the
marginal likelihood:
Therefore, it is possible to write the marginal likelihood of the data
as
for any
and
We drop the dependence on
for ease of notation. This is the model
selection criterion used in Section 5. Then, choosing specific values of the
latent variables and model parameters, denoted by
and
we can estimate
through the following decomposition:
The use of the log transformation is motivated by numerical stability (Chib,
1995).
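The identity behind this decomposition can be checked on a toy model in which every term is available in closed form. The sketch below uses a Binomial likelihood with a Beta prior, which is not the model of this paper and is chosen only because the posterior is known exactly. It computes the log marginal likelihood as log-likelihood plus log-prior minus log-posterior at an arbitrary point `p_star`, and compares it with the analytic value. All function names are hypothetical.

```python
from math import lgamma, log

def log_beta_fn(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_beta_pdf(p, a, b):
    """Log density of a Beta(a, b) distribution at p."""
    return (a - 1.0) * log(p) + (b - 1.0) * log(1.0 - p) - log_beta_fn(a, b)

def log_binom_pmf(y, n, p):
    """Log probability of y successes in n Binomial trials."""
    log_choose = lgamma(n + 1) - lgamma(y + 1) - lgamma(n - y + 1)
    return log_choose + y * log(p) + (n - y) * log(1.0 - p)

def chib_log_marginal(y, n, a, b, p_star):
    """Log marginal likelihood from the identity, evaluated at p_star."""
    log_lik = log_binom_pmf(y, n, p_star)
    log_prior = log_beta_pdf(p_star, a, b)
    # The posterior is Beta(a + y, b + n - y) by conjugacy.
    log_post = log_beta_pdf(p_star, a + y, b + n - y)
    return log_lik + log_prior - log_post

def exact_log_marginal(y, n, a, b):
    """Analytic Beta-Binomial log marginal likelihood."""
    log_choose = lgamma(n + 1) - lgamma(y + 1) - lgamma(n - y + 1)
    return log_choose + log_beta_fn(a + y, b + n - y) - log_beta_fn(a, b)
```

The result does not depend on the choice of `p_star`, although in practice a high-density point is preferred because the posterior ordinate is then estimated more accurately.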
The first five terms on the right-hand side of (A.5) can be computed directly
from the assumed distributions of the parameters and the data. Obtaining the
last component, on the other hand, is more challenging. By the law of total
probability,
may be decomposed as
Following Chib (1995), we compute the first term of (A.6) using the Gibbs
scheme outlined in Section A.1, whereas the other three terms are estimated
from the Gibbs output. In particular, we estimate
as
based on
draws from a reduced Gibbs sampler in which
is not updated. In order to estimate
we use
Finally, to estimate
we use
with
draws from a third reduced Gibbs sampler.
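Estimators of this kind average a full conditional density, evaluated at the chosen point, over the draws of a (reduced) Gibbs run. Because the final decomposition is on the log scale, the average is best computed from log densities in a numerically stable way. The helper below is a generic sketch of that computation, with a hypothetical name; it is not taken from the paper.

```python
from math import exp, log

def log_mean_exp(log_values):
    """Return log(mean(exp(v) for v in log_values)) without underflow.

    log_values holds the log full conditional density at the chosen point,
    one value per Gibbs draw.
    """
    largest = max(log_values)
    # Subtracting the maximum keeps every exponent at or below zero.
    scaled_sum = sum(exp(v - largest) for v in log_values)
    return largest + log(scaled_sum / len(log_values))
```

For example, `log_mean_exp([log(0.2), log(0.4)])` equals `log(0.3)` up to rounding, and the function still returns a finite value when all inputs are very negative.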
References
Bartolucci, F., and
Farcomeni, A. (2009). A multivariate extension of the dynamic logit model for longitudinal
data based on a latent Markov heterogeneity structure. Journal of the American Statistical Association, 104, 816-831.
Bartolucci, F., Farcomeni, A. and Pennoni, F. (2013). Latent
Markov Models for Longitudinal Data. Boca Raton, FL: CRC Press.
Bartolucci, F.,
Lupparelli, M. and Montanari, G.E. (2009). Latent Markov model for longitudinal
binary data: An application to the performance evaluation of nursing homes. The Annals of Applied Statistics, 3, 611-636.
Bartolucci, F., Pennoni, F.
and Francis, B. (2007). A latent Markov model for detecting patterns of
criminal activity. Journal of the Royal
Statistical Society, Series A, 170, 115-132.
Boonstra, H.J. (2012).
hbsae: Hierarchical Bayesian small area estimation. R Package Version 1.
Boonstra, H.J. (2014).
Time-series small area estimation for unemployment based on a rotating panel
survey. Technical report, CBS. Available at https://www.cbs.nl/nl-nl/achtergrond/2014/25/time-series-small-area-estimation-for-unemployment-based-on-a-rotating-panel-survey.
Boonstra, H.J., and van den Brakel,
J.A. (2016). Estimation of Level and
Change for Unemployment Using Multilevel and Structural Time Series Models.
Discussion paper 2016-10. Statistics Netherlands, Heerlen.
Boys, R., and Henderson,
D. (2003). Data augmentation and marginal updating schemes for inference in
hidden Markov models. Technical report, Univ. Newcastle.
Carlin, B.P., and Chib,
S. (1995). Bayesian model choice via Markov chain Monte Carlo methods. Journal of the Royal Statistical Society,
Series B, 57, 473-484.
Chib, S. (1995). Marginal
likelihood from the Gibbs output. Journal
of the American Statistical Association, 90, 1313-1321.
D’Alò, M., Di Consiglio, L.,
Falorsi, S., Ranalli, M.G. and Solari, F. (2012). Use of spatial information in
small area models for unemployment rate estimation at sub-provincial areas in
Italy. Journal of the Indian Society of
Agricultural Statistics, 66, 43-53.
Datta, G.S., Lahiri, P., Maiti,
T. and Lu, K.L. (1999). Hierarchical Bayes estimation of unemployment rates for
the states of the US. Journal of the
American Statistical Association, 94, 1074-1082.
Fabrizi, E., Montanari, G.E.
and Ranalli, M.G. (2016). A hierarchical latent class model for predicting
disability small area counts from survey data. Journal of the Royal Statistical Society, Series A, 179, 103-132.
Fay, R.E., and Herriot, R.A.
(1979). Estimates of income for small places: An application of James-Stein
procedures to census data. Journal of the
American Statistical Association, 74, 269-277.
Gelman, A. (2006). Prior
distributions for variance parameters in hierarchical models (comment on
article by Browne and Draper). Bayesian
Analysis, 1(3), 515-534.
Gelman, A., Jakulin, A., Pittau,
M.G. and Su, Y.-S. (2008). A weakly informative default prior distribution for
logistic and other regression models. The
Annals of Applied Statistics, 2, 1360-1383.
Germain, S.E. (2010). Bayesian Spatio-Temporal Modelling of Rainfall
Through Non-Homogenous Hidden Markov Models. Ph.D. thesis, University of
Newcastle Upon Tyne.
Ghosh, M., Nangia, N. and
Kim, D.H. (1996). Estimation of median income of four-person families: A
Bayesian time series approach. Journal of
the American Statistical Association, 91, 1423-1431.
Harvey, A., and Chung,
C.-H. (2000). Estimating the underlying change in unemployment in the UK. Journal of the Royal Statistical Society,
Series A, 163, 303-309.
ISTAT (2014). I sistemi
locali del lavoro 2011. Rapporto Annuale
2014.
Jasra, A., Holmes, C. and
Stephens, D. (2005). Markov chain Monte Carlo methods and the label switching
problem in Bayesian mixture modeling. Statistical
Science, 20, 50-67.
Krieg, S., and van den Brakel,
J.A. (2012). Estimation of the monthly unemployment rate for six domains
through structural time series modelling with cointegrated trends. Computational Statistics & Data Analysis,
56, 2918-2933.
Lazarsfeld, P.F., Henry, N.W.
and Anderson, T.W. (1968). Latent Structure
Analysis. Houghton Mifflin Boston.
Liu, J.S., Wong, W.H. and
Kong, A. (1994). Covariance structure of the Gibbs sampler with applications to
the comparisons of estimators and augmentation schemes. Biometrika, 81, 27-40.
MacDonald, I.L., and
Zucchini, W. (1997). Hidden Markov and
Other Models for Discrete-Valued Time Series. London: Chapman & Hall.
Marhuenda, Y., Molina, I.
and Morales, D. (2013). Small area estimation with spatio-temporal Fay-Herriot
models. Computational Statistics &
Data Analysis, 58, 308-325.
Marin, J.-M., Mengersen, K.
and Robert, C.P. (2005). Bayesian modelling and inference on mixtures of
distributions. Handbook of Statistics, 25, 459-507.
Meng, X.-L. (1994).
Posterior predictive p-values. The Annals
of Statistics, 22, 1142-1160.
Pfeffermann, D. (1991).
Estimation and seasonal adjustment of population means using data from repeated
surveys. Journal of Business &
Economic Statistics, 9, 163-175.
Pfeffermann, D., and
Burck, L. (1990). Robust small area estimation combining time series and cross-sectional
data. Survey Methodology, 16, 2, 217-237.
Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/1990002/article/14534-eng.pdf.
Pfeffermann, D., and Rubin-Bleuer,
S. (1993). Robust joint modelling of labour force series of small areas. Survey Methodology, 19, 2, 149-163.
Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/1993002/article/14458-eng.pdf.
Pfeffermann, D., and Tiller,
R. (2006). Small-area estimation with state-space models subject to benchmark
constraints. Journal of the American
Statistical Association, 101, 1387-1397.
Polson, N.G., and Scott, J.G.
(2012). On the half-Cauchy prior for a global scale parameter. Bayesian Analysis, 7, 4, 887-902.
Rao, J.N.K. (2003). Small Area Estimation. Wiley Online
Library.
Rao, J.N.K., and Yu, M.
(1994). Small-area estimation by combining time-series and cross-sectional data. The Canadian Journal of Statistics, 22, 4, 511-528.
Spezia, L. (2010).
Bayesian analysis of multivariate Gaussian hidden Markov models with an unknown
number of regimes. Journal of Time Series
Analysis, 31, 1-11.
Statistics Canada (2016).
Guide to the Labour Force Survey. Technical report, Statistics Canada,
Catalogue 71-543-G, available at
https://www150.statcan.gc.ca/n1/pub/71-543-g/71-543-g2016001-eng.pdf.
Tanner, M.A., and Wong,
W.H. (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical
Association, 82, 528-540.
Van den Brakel, J.A., and
Krieg, S. (2015). Dealing with small sample sizes, rotation group bias and
discontinuities in a rotating panel design. Survey
Methodology, 41, 2, 267-296. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/2015002/article/14231-eng.pdf.
Van den Brakel, J.A., and
Krieg, S. (2016). Small area estimation with state space common factor models
for rotating panels. Journal of the Royal
Statistical Society, Series A, 179, 763-791.
Van Dyk, D.A., and Meng,
X.-L. (2001). The art of data augmentation. Journal
of Computational and Graphical Statistics, 10(1), 1-50.
Vermunt, J.K., and
Magidson, J. (2002). Latent class cluster analysis. Applied Latent Class Analysis, 11, 89-106.
Wiggins, L.M. (1973). Panel Analysis: Latent Probability Models
for Attitude and Behavior Processes. Jossey-Bass.
Wolter, K. (2007). Introduction to Variance Estimation. New
York: Springer Science & Business Media.
You, Y., Rao, J.N.K. and
Gambino, J. (2003). Model-based unemployment rate estimation for the Canadian Labour
Force Survey: A hierarchical Bayes approach. Survey Methodology, 29, 1, 25-32. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/2003001/article/6602-eng.pdf.