State space time series modelling of the Dutch Labour Force Survey: Model selection and mean squared errors estimation
Section 1. Introduction

Figures on the labour force produced by national statistical institutes (NSIs) are generally based on Labour Force Surveys (LFS). There is increasing interest in producing these indicators at a monthly frequency (EUROSTAT 2015). Sample sizes, however, are hardly ever large enough, even at the national level, to produce sufficiently precise monthly labour force figures with the design-based estimators known from classical sampling theory (Särndal, Swensson and Wretman 1992; Cochran 1977). In such situations, small area estimation (SAE) techniques can be used to increase the effective sample size of domains by borrowing information from preceding periods or other domains; see Rao and Molina (2015) and Pfeffermann (2013). Repeated surveys in particular offer scope for such improvement within the framework of structural time series (STS) or multilevel time series models.

STS models, as well as multilevel models, usually contain unknown hyperparameters that have to be estimated. If the uncertainty in these estimates (hereafter referred to as hyperparameter uncertainty) is not taken into account, the estimated mean squared errors (MSEs) of the domain predictors are negatively biased. Within the framework of multilevel models, accounting for hyperparameter uncertainty is a necessary and common practice. It is routinely done when these models are estimated with the empirical best linear unbiased predictor (EBLUP) or the hierarchical Bayesian (HB) approach; see Rao and Molina (2015), Chapters 6-7 and 10. STS models, in contrast, are not as widely used in SAE as multilevel models. The Kalman filter, usually applied to fit STS models, treats the estimated hyperparameters as if they were known and therefore produces negatively biased MSE estimates. Applications that report substantial advantages of STS models over the design-based approach also treat the estimated model hyperparameters as known; see, e.g., Bollineni-Balabay, van den Brakel and Palm (2016a), Krieg and van den Brakel (2012), Pfeffermann and Rubin-Bleuer (1993) and Tiller (1992).
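
To make the source of this bias concrete, the sketch below uses a univariate local level model as a stand-in for the much larger DLFS model; this simplification, the function name `local_level_filter` and the variance values are assumptions for illustration only. The filtered variance P_{t|t} depends on the hyperparameters alone, so when estimated values are plugged in, the filter reports the MSE as if those values were true and contains no term for their estimation error.

```python
import numpy as np


def local_level_filter(y, sigma2_eps, sigma2_eta):
    """Kalman filter for a local level model (hypothetical stand-in):
        y_t      = mu_t + eps_t,   eps_t ~ N(0, sigma2_eps)
        mu_{t+1} = mu_t + eta_t,   eta_t ~ N(0, sigma2_eta)
    Returns filtered means mu_{t|t}, filtered variances P_{t|t} and the
    diffuse log-likelihood (y_0 only initialises the state)."""
    n = len(y)
    a_filt = np.empty(n)
    p_filt = np.empty(n)
    # Diffuse prior: after observing y_0, mu_{0|0} = y_0 and P_{0|0} = sigma2_eps.
    a_filt[0], p_filt[0] = y[0], sigma2_eps
    loglik = 0.0
    for t in range(1, n):
        a_pred = a_filt[t - 1]               # random walk: E[mu_t | y_0..y_{t-1}]
        p_pred = p_filt[t - 1] + sigma2_eta  # prediction variance
        f = p_pred + sigma2_eps              # innovation variance
        v = y[t] - a_pred                    # innovation
        k = p_pred / f                       # Kalman gain
        a_filt[t] = a_pred + k * v
        p_filt[t] = p_pred * (1.0 - k)       # "naive" MSE: hyperparameters taken as known
        loglik += -0.5 * (np.log(2.0 * np.pi * f) + v * v / f)
    return a_filt, p_filt, loglik


rng = np.random.default_rng(1)
y = np.cumsum(rng.normal(0.0, 1.0, 200)) + rng.normal(0.0, 2.0, 200)
mu_hat, p_naive, ll = local_level_filter(y, sigma2_eps=4.0, sigma2_eta=1.0)
```

With sigma2_eps = 4 and sigma2_eta = 1, `p_naive` converges to the steady-state value (1 + √17)/2 − 1 ≈ 1.56 regardless of the observed data, which is exactly why plugging in estimated hyperparameters cannot reflect their sampling variability.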

At Statistics Netherlands, a multivariate STS model proposed by Pfeffermann (1991) is used to produce official monthly labour force figures from the Dutch Labour Force Survey (DLFS). As in many other countries, the DLFS is based on a rotating panel design, and its sample sizes are too small to produce monthly figures. The STS model, applied to the design-based estimates, uses sample information from preceding time periods and accounts for the so-called rotation group bias (RGB) and for autocorrelation in the survey errors. In this way, sufficiently precise monthly estimates of the unemployed labour force are obtained (see van den Brakel and Krieg 2015). STS models are also used to produce official statistics at the US Bureau of Labor Statistics (Tiller 1992). Interest in this technique has been growing among NSIs around the world, for example in Australia (Zhang and Honchar 2016), Israel and the UK (ONS 2015).
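
The two features named above, RGB and autocorrelated survey errors, can be illustrated with a deliberately simplified data-generating process for a five-wave rotating panel. This is a hypothetical sketch, not the DLFS model itself: it omits several components of the production model, and the choices of wave 1 as the RGB reference, white-noise errors for wave 1, stationary AR(1) errors for waves 2 to 5, and all parameter values are assumptions.

```python
import numpy as np


def simulate_rotating_panel(n_months, rgb, rho, sd_level, sd_err, seed=0):
    """Hypothetical simplified DGP: wave j's design-based estimate is
        y[t, j] = level[t] + rgb[j] + e[t, j],
    with rgb[0] = 0 (wave 1 as reference), white-noise survey errors for
    wave 1 and stationary AR(1) survey errors (coefficient rho) for the
    later waves, which re-interview the same households."""
    rng = np.random.default_rng(seed)
    n_waves = len(rgb)
    level = np.cumsum(rng.normal(0.0, sd_level, n_months))  # random-walk level
    e = np.empty((n_months, n_waves))
    e[:, 0] = rng.normal(0.0, sd_err, n_months)              # wave 1: white noise
    # AR(1) innovations scaled so that the marginal sd equals sd_err.
    innov_sd = sd_err * np.sqrt(1.0 - rho ** 2)
    e[0, 1:] = rng.normal(0.0, sd_err, n_waves - 1)
    for t in range(1, n_months):
        e[t, 1:] = rho * e[t - 1, 1:] + rng.normal(0.0, innov_sd, n_waves - 1)
    y = level[:, None] + np.asarray(rgb, dtype=float)[None, :] + e
    return y, level


# Hypothetical RGB offsets for waves 1..5 relative to wave 1.
y, level = simulate_rotating_panel(240, [0.0, 0.5, 0.8, 1.0, 1.2],
                                   rho=0.3, sd_level=0.1, sd_err=1.0)
```

A model fitted to such data has to separate the common level from the wave-specific offsets and from the serial correlation in the later waves' errors; ignoring either would bias the level or misstate its precision.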

This paper presents an extended Monte Carlo simulation study in which the DLFS model acts as the data generating process. Such a simulation is an insightful step in the model selection process before a model is implemented in the production of official statistics. First, evaluating the distributions of the hyperparameter estimators under different model specifications provides additional insight into whether certain hyperparameters should be retained in the model. Standard model diagnostics for state space models provide limited information on irrelevant hyperparameters. In the case of model overspecification, not only may the distribution of the redundant hyperparameter estimates deviate strongly from normality, but the estimation of other hyperparameters may also be disturbed. Therefore, even if the model diagnostics are satisfactory, it may still be wise to simulate the model and examine the distribution of the maximum likelihood (ML) estimator of its hyperparameters.
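
A minimal version of this check can be sketched with the same toy local level model, again an assumption standing in for the DLFS model. Series are simulated once with a genuine random-walk level and once with a constant level, in which case the level variance is redundant; ML is approximated by a crude grid search (a hypothetical simplification of numerical optimisation). The redundant variance estimate piles up at its boundary value of zero, a strongly non-normal distribution.

```python
import numpy as np

# Hypothetical grids: measurement variance strictly positive, level variance including 0.
GRID_EPS = np.geomspace(0.05, 10.0, 40)
GRID_ETA = np.concatenate(([0.0], np.geomspace(1e-3, 10.0, 40)))


def loglik_grid(y, s_eps, s_eta):
    """Diffuse log-likelihood of the local level model, evaluated for every
    (s_eps, s_eta) pair at once; s_eps and s_eta are arrays of equal shape."""
    a = np.full(s_eps.shape, y[0], dtype=float)  # mu_{0|0} = y_0 under a diffuse prior
    p = s_eps.astype(float)                      # P_{0|0} = sigma2_eps
    ll = np.zeros(s_eps.shape)
    for t in range(1, len(y)):
        p_pred = p + s_eta
        f = p_pred + s_eps
        v = y[t] - a
        k = p_pred / f
        a = a + k * v
        p = p_pred * (1.0 - k)
        ll -= 0.5 * (np.log(2.0 * np.pi * f) + v * v / f)
    return ll


def ml_grid(y):
    """Crude ML estimator: maximise the log-likelihood over the 2-D grid."""
    e, h = np.meshgrid(GRID_EPS, GRID_ETA, indexing="ij")
    ll = loglik_grid(y, e, h)
    i, j = np.unravel_index(np.argmax(ll), ll.shape)
    return GRID_EPS[i], GRID_ETA[j]


def simulate_ml(n_rep, n_obs, sigma2_eps, sigma2_eta, seed=0):
    """Monte Carlo distribution of (sigma2_eps_hat, sigma2_eta_hat)."""
    rng = np.random.default_rng(seed)
    est = np.empty((n_rep, 2))
    for r in range(n_rep):
        level = 10.0 + np.cumsum(rng.normal(0.0, np.sqrt(sigma2_eta), n_obs))
        y = level + rng.normal(0.0, np.sqrt(sigma2_eps), n_obs)
        est[r] = ml_grid(y)
    return est


# Correctly specified: the level really is a random walk.
est_ok = simulate_ml(n_rep=30, n_obs=100, sigma2_eps=1.0, sigma2_eta=0.5)
# Overspecified: the level is constant, so sigma2_eta is redundant.
est_red = simulate_ml(n_rep=30, n_obs=100, sigma2_eps=1.0, sigma2_eta=0.0)
print("share of sigma2_eta_hat at 0, correct model:", np.mean(est_ok[:, 1] == 0.0))
print("share of sigma2_eta_hat at 0, redundant    :", np.mean(est_red[:, 1] == 0.0))
```

Plotting histograms of the columns of `est_ok` and `est_red` makes the contrast visible; in a richer model the same loop would also reveal whether the other hyperparameter estimates shift when a redundant one is included.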

Another aim of the simulation is to evaluate to what extent uncertainty in the hyperparameter estimates affects the STS model-based MSE estimates. Ignoring hyperparameter uncertainty in MSE estimation is only acceptable if the available time series are sufficiently long, and what counts as “sufficiently long” varies from one application to another. Uninterrupted time series available at NSIs are most often relatively short, mainly because of survey redesigns. The literature offers several ways to account for hyperparameter uncertainty in STS models: asymptotic approximation, bootstrapping and the full Bayesian approach (for the latter, see Durbin and Koopman (2012), Chapter 13). The approaches considered in this paper are the asymptotic approximation developed by Hamilton (1986) and the parametric and non-parametric bootstrap approaches developed by Pfeffermann and Tiller (2005) and Rodriguez and Ruiz (2012). These methods are applied to the DLFS model to find the best MSE estimation method for this real-life application. This paper also illustrates how the hyperparameter uncertainty problem diminishes as the length of the DLFS time series increases from 48 to 200 months.
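
The parametric bootstrap can be sketched for the toy local level model as follows. This follows our reading of the Pfeffermann and Tiller (2005) bias-corrected form, MSE ≈ 2P_T(θ̂) − mean_b P_T(θ̂_b) + mean_b [α̂_{T,b}(θ̂_b) − α̂_{T,b}(θ̂)]², where series b is simulated under θ̂ and θ̂_b is re-estimated from it; the model, the grid-search ML and all function names are hypothetical simplifications, not the DLFS implementation.

```python
import numpy as np

GRID_EPS = np.geomspace(0.05, 20.0, 40)
GRID_ETA = np.concatenate(([0.0], np.geomspace(1e-3, 20.0, 40)))


def kalman_ll(y, s_eps, s_eta):
    """Local level Kalman filter, vectorised over hyperparameter arrays.
    Returns mu_{T|T}, P_{T|T} and the diffuse log-likelihood."""
    s_eps = np.asarray(s_eps, dtype=float)
    s_eta = np.asarray(s_eta, dtype=float)
    a = np.full(s_eps.shape, y[0], dtype=float)
    p = s_eps.copy()
    ll = np.zeros(s_eps.shape)
    for t in range(1, len(y)):
        p_pred = p + s_eta
        f = p_pred + s_eps
        v = y[t] - a
        k = p_pred / f
        a = a + k * v
        p = p_pred * (1.0 - k)
        ll = ll - 0.5 * (np.log(2.0 * np.pi * f) + v * v / f)
    return a, p, ll


def ml_grid(y):
    """Crude grid-search ML estimate of (sigma2_eps, sigma2_eta)."""
    e, h = np.meshgrid(GRID_EPS, GRID_ETA, indexing="ij")
    _, _, ll = kalman_ll(y, e, h)
    i, j = np.unravel_index(np.argmax(ll), ll.shape)
    return GRID_EPS[i], GRID_ETA[j]


def simulate(n, s_eps, s_eta, rng, start=0.0):
    level = start + np.cumsum(rng.normal(0.0, np.sqrt(s_eta), n))
    return level + rng.normal(0.0, np.sqrt(s_eps), n)


def pt_bootstrap_mse(y, n_boot=100, seed=0):
    """Naive P_{T|T} and parametric bootstrap MSE of mu_{T|T}."""
    rng = np.random.default_rng(seed)
    th = ml_grid(y)
    _, p_hat, _ = kalman_ll(y, *th)          # naive: theta_hat treated as known
    p_boot = np.empty(n_boot)
    d2 = np.empty(n_boot)
    for b in range(n_boot):
        y_b = simulate(len(y), *th, rng)     # bootstrap series generated under theta_hat
        th_b = ml_grid(y_b)                  # re-estimated hyperparameters
        a_bb, p_bb, _ = kalman_ll(y_b, *th_b)
        a_bh, _, _ = kalman_ll(y_b, *th)
        p_boot[b] = p_bb
        d2[b] = (a_bb - a_bh) ** 2           # contribution of hyperparameter uncertainty
    # Bias-corrected filter variance plus the hyperparameter-uncertainty term.
    mse = 2.0 * p_hat - p_boot.mean() + d2.mean()
    return float(p_hat), float(mse)


rng = np.random.default_rng(42)
y = simulate(60, 1.0, 0.5, rng)
p_naive, p_pt = pt_bootstrap_mse(y, n_boot=100)
print(f"naive P_T|T = {p_naive:.3f}, bootstrap MSE = {p_pt:.3f}")
```

Rerunning the sketch on longer simulated series is one way to see the hyperparameter-uncertainty term shrink as the series length grows, in the spirit of the 48-to-200-month comparison described above.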

The contribution of the paper is fourfold. First, it shows how Monte Carlo simulation can be used to check for model overspecification (i.e., for redundant hyperparameters). Second, it identifies the best of the considered MSE estimation approaches for the DLFS and offers a more realistic evaluation of the variance reduction obtained with the STS model compared with the design-based approach. Third, this Monte Carlo study refutes, for a more complex model, the claim of Rodriguez and Ruiz (2012) that their method is superior to the bootstrap of Pfeffermann and Tiller (2005). Finally, apart from comparing the bias of the MSE estimators, this paper also provides insight into their variance and MSE. To the best of our knowledge, the variability of the above-mentioned bootstrap methods has not been studied before.
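
Because the true states are known in a simulation, each MSE estimator can be judged against the empirical MSE over replications. The helper below is a hypothetical sketch of such summaries (relative bias, variance and MSE of the MSE estimator) for a single domain and time point; it is not necessarily the exact set of criteria used in this paper.

```python
import numpy as np


def evaluate_mse_estimator(mse_hat, errors):
    """mse_hat[r]: MSE estimate produced in replication r.
    errors[r]:  realised error (state estimate minus true state) in replication r.
    The empirical 'true' MSE is the mean of the squared realised errors."""
    mse_hat = np.asarray(mse_hat, dtype=float)
    true_mse = np.mean(np.asarray(errors, dtype=float) ** 2)
    bias = mse_hat.mean() - true_mse
    variance = mse_hat.var()  # ddof=0, so bias**2 + variance = mean((mse_hat - true_mse)**2)
    return {
        "true_mse": true_mse,
        "rel_bias": bias / true_mse,
        "variance": variance,
        "mse_of_mse": bias ** 2 + variance,
    }


summary = evaluate_mse_estimator([1.0, 3.0], [1.0, -1.0])
```

Reporting the variance alongside the bias matters because a bootstrap estimator can be nearly unbiased yet so variable that its MSE exceeds that of a simpler, slightly biased alternative.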

The paper is structured as follows. Section 2 describes the DLFS and the model currently used by Statistics Netherlands. Section 3 reviews the above-mentioned approaches to MSE estimation. Details of the simulation setup specific to the DLFS are given in Section 4. Results are presented in Section 5. Section 6 contains concluding remarks.
