Estimation of level and change for unemployment using structural time series models
Section 5. Results
The results obtained with the state space and multilevel time series representations of the STMs are described in Subsections 5.1 and 5.2, respectively. First, two discrepancy measures are defined to evaluate and compare the different models. The first measure is the Mean Relative Bias (MRB), which summarizes the differences between model estimates and direct estimates averaged over time, as a percentage of the latter. For a given model $M$, the $\text{MRB}_i$ for province $i$ is defined as
$$\text{MRB}_i = \frac{\sum_t \left( \hat{\theta}^M_{it} - \hat{Y}_{it} \right)}{\sum_t \hat{Y}_{it}} \times 100\%, \qquad (5.1)$$
where $\hat{\theta}^M_{it}$ are the estimates under model $M$ and $\hat{Y}_{it}$ are the direct estimates by province and month incorporating the ratio RGB adjustment mentioned at the end of Section 3. This benchmark measure shows for each province how much the model-based estimates deviate from the direct estimates. The discrepancies should not be too large, as one may expect that the direct estimates averaged over time are close to the true average level of unemployment. The second discrepancy measure is the Relative Reduction of the Standard Errors (RRSE), which measures the percentage reduction in estimated standard errors between model-based and direct estimates, i.e.,
$$\text{RRSE}_i = 100\% \times \frac{1}{T} \sum_t \frac{\text{se}(\hat{Y}_{it}) - \text{se}(\hat{\theta}^M_{it})}{\text{se}(\hat{Y}_{it})}, \qquad (5.2)$$
for a given model $M$, with $T$ the number of months. Here the estimated standard errors for the direct estimates follow from a variance approximation for $\hat{Y}_{it}$, whereas the model-based standard errors are posterior standard deviations or follow from the Kalman filter/smoother. Posterior standard deviations, standard errors obtained via the Kalman filter and standard errors of the direct estimators come from different frameworks and are, formally speaking, not comparable. They are used in (5.2) only to quantify the reduction with respect to the direct estimator and are not intended as model selection criteria.
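As an illustration of the two discrepancy measures, the following minimal sketch computes them for a single province. It assumes that the MRB is the sum over months of the differences between model and direct estimates divided by the sum of the direct estimates, and that the RRSE is the monthly relative reduction in standard error averaged over months. The function names and the numbers are made up for illustration and are not taken from the paper.

```python
from statistics import mean

def mean_relative_bias(model_est, direct_est):
    """MRB (%) for one province: summed differences over months,
    relative to the summed direct estimates (assumed form)."""
    total_diff = sum(m - d for m, d in zip(model_est, direct_est))
    return 100.0 * total_diff / sum(direct_est)

def relative_reduction_se(se_direct, se_model):
    """RRSE (%) for one province: relative reduction in standard error,
    averaged over months (assumed form)."""
    return 100.0 * mean((sd - sm) / sd for sd, sm in zip(se_direct, se_model))

# Made-up numbers for a single province over three months (not from the paper).
direct = [10.0, 12.0, 8.0]
model = [10.5, 11.5, 8.6]
se_direct = [2.0, 4.0, 2.0]
se_model = [1.0, 1.0, 1.0]

mrb = mean_relative_bias(model, direct)
rrse = relative_reduction_se(se_direct, se_model)
print(f"MRB = {mrb:.1f}%, RRSE = {rrse:.1f}%")
```

Positive and negative monthly differences cancel in the MRB, so it measures a systematic level difference rather than month-to-month noise.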
5.1 Results state space models
Ten different state space models are compared. Four different trend models are distinguished. The first trend component is a smooth trend model without correlations between the domains (4.3), abbreviated as T1. The second trend model, T2, is a smooth trend model (4.3) with a full correlation matrix for the slope disturbances (4.9). The third trend component, T3, is a common smooth trend model for all provinces with eleven local level trend models for the deviation of the domains from this overall trend ((4.10) in combination with (4.11)). The fourth trend model, T4, is a common smooth trend model for all provinces with eleven smooth trend models for the deviation of the domains from this overall trend ((4.10) in combination with (4.3)). In T3 and T4 the trend of the province Groningen is taken equal to the overall trend. The component for the RGB (4.4) can be domain specific (indicated by the letter “R” in the model’s name) or chosen equal for all domains (no “R” in the model’s name). An alternative simplification is to assume that the RGB for waves 2, 3, 4 and 5 are equal but domain specific (indicated by “R2”). In a similar way the seasonal component can be chosen domain specific (indicated by “S”) or taken equal for all domains. All models share the same component for the survey error, i.e., an AR(1) model with time varying autocorrelation coefficients for waves 2 through 5 to model the autocorrelation in the survey errors. The following state space models are compared: T1SR, T2SR, T2S, T2R, T3SR, T3R, T3R2, T3, T4SR and T4R.
For all models, the ML estimates for the hyperparameters of the RGB and the seasonals tend to zero, which implies that these components are time invariant. The ML estimates for the variance components of the white noise of the population domain parameters also tend to zero. This component is therefore removed from model (4.2). The ML estimates for the variance components of the survey errors in the first wave vary between 0.93 and 1.90. For the follow-up waves, the ML estimates vary between 0.86 and 1.80. The variances of the direct estimates are pooled over the domains (3.2), which might introduce some bias, e.g., underestimation of the variance in domains with high unemployment rates. Scaling the variances of the survey errors with these ML estimates is necessary to correct for this bias. The ML estimates for the hyperparameters for the trend components can be found in Boonstra and van den Brakel (2016).
Models are compared using the log likelihoods. To account for differences in model complexity, the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are used; see Durbin and Koopman (2012), Section 7.4. Results are summarized in Table 5.1. Parsimonious models where the seasonals or RGB are equal over the domains are preferred by both AIC and BIC. Note, however, that the likelihoods are not completely comparable between models. To obtain comparable likelihoods, the first 24 months of the series are ignored in the computation of the likelihood for all models. Some of the likelihoods are nevertheless odd. For example, the likelihood of T2SR is smaller than the likelihood of T2S, although T2SR contains more model parameters. This is probably the result of large and complex time series models in combination with relatively short time series, which gives rise to flat likelihood functions. From this point of view as well, sparse models that avoid over-fitting are favorable, which is in line with the AIC and BIC values in Table 5.1.
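Table 5.1: Log likelihood, number of states, number of hyperparameters, AIC and BIC for the state space models

Model | log likelihood | states | hyperparameters | AIC | BIC |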
---|---|---|---|---|---|
T1SR | 9,813.82 | 204 | 24 | -399.41 | -390.52 |
T2SR | 9,862.86 | 204 | 35 | -400.99 | -391.68 |
T2S | 9,879.03 | 160 | 35 | -403.50 | -395.90 |
T2R | 9,859.97 | 83 | 35 | -405.92 | -401.32 |
T3SR | 9,855.35 | 193 | 24 | -401.60 | -393.14 |
T3R | 9,851.62 | 72 | 24 | -406.48 | -402.74 |
T3R2 | 9,871.65 | 36 | 24 | -408.82 | -406.48 |
T3 | 9,881.16 | 28 | 24 | -409.55 | -407.52 |
T4SR | 9,857.47 | 204 | 24 | -401.23 | -392.34 |
T4R | 9,853.65 | 83 | 24 | -406.11 | -401.94 |
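The AIC and BIC columns of Table 5.1 can be recomputed from the other columns. The sketch below assumes the per-observation form of the criteria, with the penalty based on the number of states plus the number of hyperparameters, and a sample size of 48. That sample size is not stated in the text; it is an assumption chosen because it reproduces the tabulated values to rounding precision.

```python
import math

N_OBS = 48  # assumed sample size; not stated in the text, inferred from the table

# (log likelihood, states, hyperparameters, AIC, BIC) as reported in Table 5.1
TABLE_5_1 = {
    "T1SR": (9813.82, 204, 24, -399.41, -390.52),
    "T2SR": (9862.86, 204, 35, -400.99, -391.68),
    "T2S":  (9879.03, 160, 35, -403.50, -395.90),
    "T2R":  (9859.97, 83, 35, -405.92, -401.32),
    "T3SR": (9855.35, 193, 24, -401.60, -393.14),
    "T3R":  (9851.62, 72, 24, -406.48, -402.74),
    "T3R2": (9871.65, 36, 24, -408.82, -406.48),
    "T3":   (9881.16, 28, 24, -409.55, -407.52),
    "T4SR": (9857.47, 204, 24, -401.23, -392.34),
    "T4R":  (9853.65, 83, 24, -406.11, -401.94),
}

def aic(loglik, n_states, n_hyper, n=N_OBS):
    """AIC divided by the sample size (assumed form)."""
    return (-2.0 * loglik + 2.0 * (n_states + n_hyper)) / n

def bic(loglik, n_states, n_hyper, n=N_OBS):
    """BIC divided by the sample size (assumed form)."""
    return (-2.0 * loglik + (n_states + n_hyper) * math.log(n)) / n

# Largest absolute gap between recomputed and tabulated criteria over all models.
max_gap = max(
    max(abs(aic(ll, q, w) - a), abs(bic(ll, q, w) - b))
    for ll, q, w, a, b in TABLE_5_1.values()
)
best = min(TABLE_5_1, key=lambda name: TABLE_5_1[name][3])
print(f"largest deviation from the table: {max_gap:.4f}; lowest AIC: {best}")
```

Because the penalty grows with the number of states, models with domain-specific seasonals (more than 190 states) are penalized heavily compared with models such as T3 (28 states).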
Modeling correlations between slope disturbances of the trend results in a significant model improvement. For example, model T1SR is nested within T2SR and a likelihood ratio test clearly favours the latter. For model T2SR it follows that the dynamics of the trends for these 12 domains can be modeled with only 2 underlying common trends, since the rank of the covariance matrix equals two. As a result, the full covariance matrix for the slope disturbances of the 12 domains is actually modeled with 23 instead of 78 hyperparameters. This shows that the correlations between the slope disturbances are very strong. Correlations indeed vary between 0.98 and 1.00. See Boonstra and van den Brakel (2016) for the ML estimates of the full covariance matrix.
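The parameter counts quoted above follow from a standard counting argument, sketched here: an unrestricted symmetric $p \times p$ covariance matrix has $p(p+1)/2$ free parameters, while one of rank $r$ has $pr - r(r-1)/2$. The sketch also computes the likelihood ratio statistic for T1SR against T2SR from the log likelihoods in Table 5.1. The function name is hypothetical, and no p-value is computed because the reference distribution would need care given the reduced-rank estimate.

```python
def n_cov_params(p, rank=None):
    """Free parameters of a symmetric p x p covariance matrix,
    optionally restricted to a given rank."""
    if rank is None or rank >= p:
        return p * (p + 1) // 2
    return p * rank - rank * (rank - 1) // 2

full = n_cov_params(12)             # unrestricted 12 x 12 covariance matrix
reduced = n_cov_params(12, rank=2)  # two underlying common trends

# Likelihood ratio statistic for T1SR (nested) against T2SR, from Table 5.1.
lr_stat = 2.0 * (9862.86 - 9813.82)
extra_params = 35 - 24  # difference in the number of hyperparameters

print(full, reduced, round(lr_stat, 2), extra_params)
```

The difference of 11 hyperparameters between T2SR and T1SR in Table 5.1 is consistent with replacing 12 separate slope variances by the 23 parameters of a rank-two covariance matrix.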
Table 5.2 shows the MRB, defined by (5.1). Models that assume that the RGB is equal over the domains, i.e., T2S and T3, have large relative biases for some of the domains. Large biases occur in the domains where unemployment is high (e.g., Groningen) or low (e.g., Utrecht) compared to the national average. A possible compromise between parsimony and bias is to assume that the RGB is equal for the four follow-up waves but still domain specific (T3R2). For this model the bias is small, with the exception of Gelderland.
Table 5.2: Mean Relative Bias (MRB, %) by model and province

Model | Grn | Frs | Drn | Ovr | Flv | Gld | Utr | N-H | Z-H | Zln | N-B | Lmb |
---|---|---|---|---|---|---|---|---|---|---|---|---|
T1SR | 1.1 | 0.5 | 2.0 | -0.2 | 0.1 | 3.4 | 0.1 | 0.6 | 1.7 | -2.1 | 0.5 | 2.1 |
T2SR | 1.2 | 0.7 | 2.2 | -0.1 | 0.2 | 3.5 | 0.2 | 0.6 | 1.7 | -2.1 | 0.5 | 2.1 |
T2S | -3.1 | 3.1 | 0.7 | 0.9 | -4.4 | 2.8 | 2.4 | 0.8 | 0.5 | 1.7 | 1.8 | 1.5 |
T2R | 0.9 | 0.8 | 1.8 | -0.2 | -0.4 | 3.4 | 0.1 | 0.6 | 1.7 | -1.6 | 0.6 | 2.2 |
T3SR | 0.8 | 0.6 | 2.0 | -0.2 | -0.3 | 3.5 | 0.3 | 0.5 | 1.7 | -2.0 | 0.6 | 2.0 |
T3R2 | -0.1 | 1.3 | 2.1 | -0.6 | -0.8 | 3.6 | 0.9 | 0.6 | 1.5 | -1.1 | 1.0 | 1.2 |
T3R | 0.5 | 0.7 | 1.8 | -0.2 | -0.8 | 3.5 | 0.3 | 0.5 | 1.6 | -1.5 | 0.7 | 2.1 |
T3 | -4.0 | 2.5 | 0.1 | 0.9 | -5.0 | 2.8 | 2.3 | 0.7 | 0.6 | 2.5 | 2.0 | 1.3 |
T4SR | 0.8 | 0.7 | 2.1 | -0.2 | 0.0 | 3.5 | 0.2 | 0.6 | 1.7 | -1.9 | 0.5 | 2.1 |
T4R | 0.6 | 0.7 | 1.8 | -0.2 | -0.6 | 3.4 | 0.1 | 0.6 | 1.7 | -1.3 | 0.7 | 2.1 |
In Figure 5.1 the smoothed trends and standard errors of models T1SR, T2SR and T2S are compared. The month-to-month development of the trend and the standard errors for these three models are compared in Figure 5.2. The smoothed trends obtained with the common trend model are slightly more flexible compared to a model without correlation between the slope disturbances. This is clearly visible in the month-to-month change of the trends. Modeling the correlation between slope disturbances clearly reduces the standard error of the trend and the month-to-month change of the trend. Assuming that the RGB is equal for all domains (model T2S) affects the level of the trend and further reduces the standard error, mainly because the number of state variables is reduced. The difference between the trend under T2SR and T2S is a level shift. This follows from the month-to-month changes of the trend under models T2SR and T2S, which are exactly equal. According to AIC and BIC, the reduction of the number of state variables by assuming equal RGB for all domains is an improvement of the model. In this application, however, interest is focused on the model fit for the separate domains. Assuming that the RGB is equal over all domains is on average efficient for overall goodness of fit measures, like AIC and BIC, but not necessarily for all separate domains. The bias introduced in the trends of some of the domains by taking the RGB equal over the domains is undesirable.
In Figure 5.3 the smoothed trends and standard errors of models T2SR, T3SR and T4SR are compared. The month-to-month developments of the trend and the standard errors can be found in Boonstra and van den Brakel (2016). The trends obtained with one overall smooth trend plus eleven trends for the domain deviations from the overall trend resemble the trends obtained with the common trend model. In this application the dynamics based on the two common trends of model T2SR are reasonably well approximated by the alternative trends of models T3SR and T4SR. This is an empirical finding that may not generalize to other situations, particularly when more common factors are required. The common trend model, however, has the smallest standard errors for the trend. Furthermore, the trends under the model with a local level for the domain deviations from the overall trend are in some domains more volatile compared to the other two models. This is most obvious in the month-to-month changes of the trend. It is a general feature of trend models with random levels to have more volatile trends, see Durbin and Koopman (2012), Chapter 3. The more flexible trend model of T3 also results in a higher standard error of the month-to-month changes.
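The difference in volatility between a local level and a smooth trend can be illustrated with a small simulation. The sketch below assumes the usual textbook forms of the two models: in the local level model the month-to-month change is white noise, whereas in the smooth trend model the change equals a slope that itself follows a random walk. The function names, the seed and the "roughness" measure are hypothetical choices for illustration only and are not part of the paper.

```python
import random
import statistics

def simulate_changes(n_months, sigma, smooth, seed=0):
    """Month-to-month changes of a simulated trend.

    smooth=True : smooth trend, L_t = L_{t-1} + R_{t-1}, R_t = R_{t-1} + eta_t
    smooth=False: local level,  L_t = L_{t-1} + eta_t
    """
    rng = random.Random(seed)
    slope = 0.0
    changes = []
    for _ in range(n_months):
        eta = rng.gauss(0.0, sigma)
        if smooth:
            changes.append(slope)  # this month's change equals the current slope
            slope += eta           # the slope follows a random walk
        else:
            changes.append(eta)    # the change is pure white noise
    return changes

def roughness(changes):
    """Std. dev. of the change in the changes, relative to the std. dev. of the changes."""
    second_diff = [b - a for a, b in zip(changes, changes[1:])]
    return statistics.pstdev(second_diff) / statistics.pstdev(changes)

rough_local = roughness(simulate_changes(240, 1.0, smooth=False))
rough_smooth = roughness(simulate_changes(240, 1.0, smooth=True))
print(f"local level: {rough_local:.2f}, smooth trend: {rough_smooth:.2f}")
```

Under the local level model successive changes are unrelated, so the changes jump around from month to month; under the smooth trend model each change is close to the previous one.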
Assuming that the seasonals are equal for all domains is another way of reducing the number of state variables and avoiding over-fitting of the data. This assumption does not affect the level of the trend, since the MRB is small (see Table 5.2), and results in a significant improvement of the model according to AIC and BIC. Particularly if interest is focused on trend estimates, some bias in the seasonal patterns is acceptable, and a model with a trend based on T2 or T4, with the seasonal component assumed equal over the domains, might be a good compromise between a model that accounts sufficiently for differences between domains and model parsimony to avoid over-fitting of the data.
Model T3 is the most parsimonious model and the best model according to AIC and BIC. However, the assumption of equal RGB in particular results in biased trend estimates in some of the domains (see Table 5.2). See Boonstra and van den Brakel (2016) for a comparison of the trend and the month-to-month development of the trend of models T2R, T3 and T4R. Assuming that the seasonals are equal over the domains results in a less pronounced seasonal pattern. See Boonstra and van den Brakel (2016) for a comparison of the signals for models T2SR and T2R.
In Boonstra and van den Brakel (2016) results for year-to-year change of the trends under models T2R and T3R2 are included. Time series estimates for year-to-year change are very stable and precise and greatly improve the direct estimates for year-to-year change.
Table 5.3 shows the RRSE, defined by (5.2), for the ten state space models. Recall that the RRSE quantifies the reduction with respect to the direct estimator and is not intended as a model selection criterion. Table 5.4 contains the averages of standard errors for signal, trend, and growth (month-to-month differences of trend). The average is taken over all months and provinces. Modeling the correlation between the trends explicitly (T2) or implicitly (T3 or T4) reduces the standard errors for the trend and signal significantly. The time series modeling approach is particularly appropriate to estimate month-to-month changes through the trend component. The precision of the month-to-month changes, however, strongly depends on the choice of the trend model. A local level trend model (T3) results in more volatile trends and has a clearly larger standard error for the month-to-month change. Parsimonious models where RGB or the seasonal components are assumed equal over the domains result in further strong standard error reductions at the cost of introducing bias in the trend or the seasonal patterns.
Description for Figure 5.1
This figure shows 6 line-graphs with the time scale on the x-axis ranging from January 2014 to the end of 2008 and the unemployment rate on the y-axis with various ranges. The direct estimation and the three methods (STS T1SR, STS T2SR and STS T2S) are compared in each graph representing one of 3 domains in two forms: the estimates themselves of the smoothed trend and their standard errors.
Description for Figure 5.2
This figure shows 6 line-graphs with the time scale on the x-axis ranging from January 2014 to the end of 2008 and the unemployment rate on the y-axis with various ranges. The three methods (STS T1SR, STS T2SR and STS T2S) are compared in each graph representing one of 3 domains in two forms: the smoothed month-to-month development and their standard errors.
Description for Figure 5.3
This figure shows 6 line-graphs with the time scale on the x-axis ranging from January 2014 to the end of 2008 and the unemployment rate on the y-axis with various ranges. The direct estimation and three methods (STS T2SR, STS T3SR and STS T4SR) are compared in each graph representing one of 3 domains in two forms: the estimates themselves of the smoothed trend and their standard errors.
Table 5.3: Relative Reduction of the Standard Errors (RRSE, %) by model and province

Model | Grn | Frs | Drn | Ovr | Flv | Gld | Utr | N-H | Z-H | Zln | N-B | Lmb |
---|---|---|---|---|---|---|---|---|---|---|---|---|
T1SR | 36 | 36 | 38 | 42 | 43 | 44 | 47 | 47 | 45 | 50 | 47 | 43 |
T2SR | 43 | 42 | 43 | 48 | 49 | 49 | 53 | 53 | 50 | 54 | 53 | 48 |
T2S | 49 | 48 | 51 | 53 | 55 | 54 | 58 | 56 | 54 | 58 | 56 | 54 |
T2R | 64 | 63 | 62 | 65 | 66 | 63 | 68 | 68 | 63 | 73 | 67 | 64 |
T3SR | 45 | 41 | 45 | 48 | 42 | 51 | 49 | 50 | 48 | 53 | 49 | 50 |
T3R | 67 | 62 | 63 | 66 | 56 | 61 | 62 | 64 | 65 | 70 | 60 | 66 |
T3R2 | 68 | 63 | 64 | 67 | 57 | 62 | 62 | 65 | 65 | 70 | 60 | 67 |
T3 | 79 | 74 | 76 | 75 | 65 | 69 | 69 | 69 | 69 | 76 | 63 | 76 |
T4SR | 43 | 41 | 45 | 48 | 45 | 50 | 49 | 53 | 50 | 54 | 51 | 49 |
T4R | 65 | 63 | 64 | 65 | 62 | 62 | 63 | 68 | 63 | 73 | 63 | 65 |
Table 5.4: Average standard errors for signal, trend and growth under the state space models (direct estimates = 100)

Model | se(signal) | se(trend) | se(growth) |
---|---|---|---|
direct | 100 | | |
T1SR | 57 | 41 | 6 |
T2SR | 51 | 33 | 4 |
T2S | 46 | 23 | 4 |
T2R | 34 | 33 | 4 |
T3SR | 53 | 35 | 9 |
T3R | 36 | 35 | 9 |
T3R2 | 36 | 34 | 9 |
T3 | 28 | 26 | 9 |
T4SR | 52 | 34 | 4 |
T4R | 35 | 34 | 4 |
5.2 Results multilevel models
The ten models T1SR to T4R of Subsection 5.1, fitted there as state space models with the Kalman filter, have also been fitted using the Bayesian multilevel approach with a Gibbs sampler. See Boonstra and van den Brakel (2016) for a detailed description of the fixed effect design matrices and random effect design and precision matrices corresponding to these models. The Bayesian approach accounts for uncertainty in the hyperparameters by considering their posterior distributions, implying that variance parameters do not actually become zero, as frequently happens for the ML estimates in the state space approach. For comparison purposes, however, effects absent from the state space model due to zero ML estimates have also been suppressed in the corresponding multilevel models. In addition to these ten models we consider one more model with extra terms including a dynamic RGB component as well as a white noise term.
Differences between state space and multilevel estimates based on the ten models considered can arise because of
- the different estimation methods, ML versus MCMC,
- the different modeling of survey errors. In the multilevel models the survey errors’ covariance matrix is taken to be $\Sigma = \oplus_i \lambda_i \Phi_i$, with $\Phi_i$ the covariance matrix of estimated design variances for the initial estimates for province $i$ and $\lambda_i$ scaling factors, one for each province. In the state space models the survey errors are allowed to depend on more parameters, though eventually an AR(1) model is used to approximate these dependencies,
- the slightly different parameterizations of the trend components. For the trend in model T3, for example, the province of Groningen is singled out by the state space model used, because no local level component is added for that province.
The estimates and, to a lesser extent, the standard errors based on the multilevel models are quite similar to the results obtained with the state space models. We show this only for the smoothed signals of model T2R in Figure 5.4, as the qualitative differences between state space and multilevel results are quite consistent over all models. More comparisons for signals, trends and month-to-month developments for models T2R and T3R2 can be found in Boonstra and van den Brakel (2016).
The small differences between the state space and multilevel signal estimates are due to slightly more flexible trends in the estimated multilevel models. Larger differences can be seen in the standard errors of the signal: the multilevel models yield almost always larger standard errors for provinces with high unemployment levels (Flevoland and Zuid-Holland in the figure), whereas for provinces with smaller unemployment levels (e.g., Zeeland) the differences are somewhat less pronounced.
The larger flexibility of the multilevel model trends is most likely due to the relatively large uncertainty about the variance parameters for the trend, which is accounted for in the Bayesian multilevel approach but ignored in the ML approach for the state space models. The posterior distributions for the trend variance parameters are also somewhat right-skewed. The posterior means for the standard deviations are always larger than the ML estimates for the corresponding hyperparameters of the state space models (compare Table 2 and Table 8 in Boonstra and van den Brakel (2016)). For the models with trend T2, i.e., with a fully parametrized covariance matrix over provinces, the multilevel models show positive correlations among the provinces, as do the state space ML estimates, but the latter are much more concentrated near 1, whereas the posterior means for correlations in the corresponding multilevel model T2SR are all between 0.45 and 0.8.
Table 5.5 contains values of the DIC model selection criterion (Spiegelhalter, Best, Carlin and van der Linde, 2002), the associated effective number of model parameters and the posterior mean of the log-likelihood. The parsimonious model T3 is selected as the most favourable model by the DIC criterion. So in this case the DIC criterion selects the same model as the AIC and BIC criteria do for the state space models. An advantage of DIC is that it uses an effective number of model parameters depending on the size of random effects, instead of just the number of model parameters used in AIC/BIC. That said, the effective numbers of parameters are broadly in line with the totals of the numbers of states and hyperparameters in Table 5.1 for the state space models.
Description for Figure 5.4
This figure shows 6 line-graphs with the time scale on the x-axis ranging from January 2014 to the end of 2008 and the unemployment rate on the y-axis with various ranges. The smoothed signals of model T2R obtained with the state space and multilevel approaches are compared in each graph representing one of 3 domains in two forms: the estimates themselves of the smoothed signal and their standard errors.
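Table 5.5: DIC, effective number of model parameters and posterior mean of the log-likelihood for the multilevel models

Model | DIC | effective number of parameters | mean llh |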
---|---|---|---|
T1SR | -29,054 | 255 | 14,655 |
T2SR | -29,076 | 235 | 14,656 |
T2S | -29,129 | 196 | 14,662 |
T2R | -29,164 | 118 | 14,641 |
T3SR | -29,081 | 242 | 14,662 |
T3R | -29,174 | 126 | 14,650 |
T3R2 | -29,217 | 94 | 14,655 |
T3 | -29,230 | 82 | 14,656 |
T4SR | -29,084 | 228 | 14,656 |
T4R | -29,170 | 109 | 14,640 |
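The three numeric columns of Table 5.5 (DIC, effective number of model parameters, posterior mean of the log-likelihood) are linked by the usual DIC decomposition: the posterior mean deviance plus the effective number of parameters. The sketch below checks this relation against the tabulated values, assuming the deviance is minus twice the log-likelihood. Because the table entries are rounded, agreement is only expected to within a couple of units.

```python
# (DIC, effective number of parameters, posterior mean log-likelihood) from Table 5.5
TABLE_5_5 = {
    "T1SR": (-29054, 255, 14655),
    "T2SR": (-29076, 235, 14656),
    "T2S":  (-29129, 196, 14662),
    "T2R":  (-29164, 118, 14641),
    "T3SR": (-29081, 242, 14662),
    "T3R":  (-29174, 126, 14650),
    "T3R2": (-29217, 94, 14655),
    "T3":   (-29230, 82, 14656),
    "T4SR": (-29084, 228, 14656),
    "T4R":  (-29170, 109, 14640),
}

def dic_from_mean_llh(mean_llh, p_eff):
    """DIC as posterior mean deviance (-2 * mean log-likelihood)
    plus the effective number of parameters."""
    return -2.0 * mean_llh + p_eff

# Gap between recomputed and tabulated DIC for every model.
gaps = {
    name: dic_from_mean_llh(llh, p_eff) - dic
    for name, (dic, p_eff, llh) in TABLE_5_5.items()
}
max_gap = max(abs(g) for g in gaps.values())
best = min(TABLE_5_5, key=lambda name: TABLE_5_5[name][0])
print(f"largest gap: {max_gap:.0f}; lowest DIC: {best}")
```

The posterior mean log-likelihoods differ by only about 20 units across models, so the ranking by DIC is driven largely by the effective number of parameters.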
As was the case for the state space models, the parsimonious model T3 comes with larger average bias over time for the provinces Groningen and Flevoland, which have the highest rates of unemployment. Model T3R2 has much smaller average biases for Groningen and Flevoland and since its DIC value is not that much higher than for model T3, model T3R2 seems to be a good compromise between models T3 and T3R, being more parsimonious than T3R and respecting provincial differences better than model T3.
Table 5.6 contains the average standard errors for signal, trend and month-to-month differences in the trend, in comparison to the average for the direct estimates. The average is taken over all months and provinces. The results are again similar to the results obtained with the state space models, see Table 5.4, although especially the standard errors of month-to-month changes are larger under the multilevel models.
Table 5.6: Average standard errors for signal, trend and growth under the multilevel models (direct estimates = 100)

Model | se(signal) | se(trend) | se(growth) |
---|---|---|---|
direct | 100 | | |
T1SR | 55 | 41 | 8 |
T2SR | 52 | 37 | 6 |
T2S | 49 | 33 | 7 |
T2R | 39 | 38 | 6 |
T3SR | 53 | 38 | 15 |
T3R | 39 | 38 | 15 |
T3R2 | 39 | 38 | 15 |
T3 | 34 | 32 | 15 |
T4SR | 51 | 36 | 6 |
T4R | 37 | 36 | 6 |
Finally, a multilevel model based on model T3R2 but with additional random effects has been fitted to the data. This extended model includes a white noise term, the balanced dummy seasonal (equivalent to the trigonometric seasonal), and a dynamic RGB component. These components were seen to be absent or time independent in the state space approach due to zero ML hyperparameter estimates, and therefore were also not included in the multilevel models considered so far. In addition, the extended multilevel model includes season by province random effects, as a compromise between fixed provincial seasonal effects and no such interaction effects at all. More details and figures comparing the estimation results from this extended model to those from multilevel models T3R2 and T3SR can be found in Boonstra and van den Brakel (2016). It was found that most additional random effects were small so that the estimates based on the extended model are quite close to the estimates based on model T3R2, and the estimated standard errors are only slightly larger than those for model T3R2. A DIC value of -29,260 was found, well below the DIC value for model T3R2. This improvement in DIC was seen to be almost entirely due to the dynamic RGB component. Apparently, modeling the RGB as time-dependent results in a better fit. This seems to be in line with the temporal variations in differences between first wave and follow-up wave survey regression estimates, visible from Figure 3 in Boonstra and van den Brakel (2016).