Small area estimation for unemployment using latent Markov models
Section 5. Results

In this section we report the results of applying the LMM area level SAE model to the LFS data presented in Section 2. We fit the model with $k = 2, \ldots, 6$ latent states. For each value of $k$, we run one Markov chain of 100,000 iterations and discard the first 50,000 as burn-in. The posterior means are approximated by the means of the retained MCMC samples. Similarly, the variance of the retained samples approximates the posterior variance of $\theta_{it}$. We select $k = 4$ using the proposed model selection approach. In fact, using expression (A.4), we obtain the following values for the posterior density of the data: $p(\hat{\boldsymbol{\Theta}} \mid k = 2) = 59{,}152.41$, $p(\hat{\boldsymbol{\Theta}} \mid k = 3) = 64{,}405.11$, $p(\hat{\boldsymbol{\Theta}} \mid k = 4) = 68{,}816.06$, and $p(\hat{\boldsymbol{\Theta}} \mid k = 5) = 68{,}703.75$.
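The burn-in and summary step described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' code: the function name `posterior_summary` is hypothetical, and the toy chain is a simulated stand-in for the sampled values of a single $\theta_{it}$.

```python
import random
import statistics


def posterior_summary(chain, burn_in):
    """Return (mean, variance) of the draws kept after the burn-in period.

    The mean approximates the posterior mean and the sample variance
    approximates the posterior variance of the monitored quantity.
    """
    kept = chain[burn_in:]
    if len(kept) < 2:
        raise ValueError("need at least two retained draws")
    return statistics.fmean(kept), statistics.variance(kept)


# Toy stand-in for the MCMC draws of one theta_it (hypothetical values):
# 100,000 iterations, of which the first 50,000 are discarded as burn-in.
rng = random.Random(0)
chain = [rng.gauss(8.0, 1.5) for _ in range(100_000)]
post_mean, post_var = posterior_summary(chain, burn_in=50_000)
print(f"posterior mean ~ {post_mean:.2f}, posterior variance ~ {post_var:.2f}")
```

In a real application the chain would come from the sampler, and convergence would need to be checked before trusting these summaries.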

We validate our model selection procedure by comparing the final choice with that obtained using the Deviance Information Criterion (DIC). In particular, we focus on $k = 4, 5$ latent states, for which the Bayes rule provides the largest values. The DIC confirms our choice: we obtain 8,334.0 for $k = 4$ and 8,362.4 for $k = 5$, and the smaller DIC again favors $k = 4$.
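The two selection rules can be checked directly on the figures reported above. The snippet below is a small sketch that uses only the values quoted in the text (no value is reported for $k = 6$, so it is omitted); the variable names are illustrative.

```python
# Values reported in the text for each number of latent states k.
posterior_value = {2: 59152.41, 3: 64405.11, 4: 68816.06, 5: 68703.75}
dic = {4: 8334.0, 5: 8362.4}

# Bayes rule as used in the text: choose the k with the largest value.
k_bayes = max(posterior_value, key=posterior_value.get)

# DIC: smaller is better, so choose the k with the smallest value.
k_dic = min(dic, key=dic.get)

print(k_bayes, k_dic)
```

Both rules point to the same number of latent states, which is the agreement the paragraph above describes.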

Figure 5.1 compares the maps of estimates for the first and the last quarter of the whole period. These can be compared with the maps of direct estimates reported in Figure 2.1. In particular, estimates on the first row of Figure 5.1 are obtained with the proposed LMM area level model. Those on the second row are obtained using a cross-sectional Fay-Herriot (FH) model computed with the R package hbsae (Boonstra, 2012), while those on the last row are obtained using the You et al. (2003, YRG) model, for which we have considered three possible choices for $\rho$, namely 0.50, 0.75, and 1.00, as in You et al. (2003). To measure the overall fit of the three alternative YRG models we have compared posterior predictive $p$-values (Meng, 1994). In particular, simulated values of a suitable discrepancy measure are generated from the posterior predictive distribution and then compared with the corresponding measure for the observed data. More specifically, if $d(\hat{\boldsymbol{\Theta}}, \boldsymbol{\Theta})$ is a discrepancy measure that depends on the observed data, $\hat{\boldsymbol{\Theta}}$, and the parameter matrix $\boldsymbol{\Theta}$, then the posterior predictive $p$-value is defined as $P[d(\hat{\boldsymbol{\Theta}}^{*}, \boldsymbol{\Theta}) > d(\hat{\boldsymbol{\Theta}}, \boldsymbol{\Theta}) \mid \hat{\boldsymbol{\Theta}}]$, where $\hat{\boldsymbol{\Theta}}^{*}$ is a sample from the posterior predictive distribution. If a model fits the observed data well, then the two values of the discrepancy measure are similar and, as a result, the $p$-value is expected to be close to 0.5. On the other hand, $p$-values near 0 or 1 signal a model that is not well suited to the data. As in Datta et al. (1999) and in You et al. (2003), we use the following discrepancy measure

$$d(\hat{\boldsymbol{\Theta}}, \boldsymbol{\Theta}) = \sum_{i=1}^{m} (\hat{\boldsymbol{\theta}}_i - \boldsymbol{\theta}_i)^{\prime} \boldsymbol{\Psi}_i^{-1} (\hat{\boldsymbol{\theta}}_i - \boldsymbol{\theta}_i)$$

for the overall fit. The posterior predictive $p$-value suggests that, among the three YRG models, the one with $\rho = 1$ provides the best fit to the data: the $p$-value is 0.188 for $\rho = 1.00$, 0.103 for $\rho = 0.75$, and 0.032 for $\rho = 0.50$. Note that for our model we obtain a $p$-value equal to 0.311. We have also implemented the Datta et al. (1999) estimation approach. However, the number of areas and the overall number of observations made the estimation computationally prohibitive. For this reason, it is not considered further.
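The posterior predictive check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually coded by the authors: the function names are hypothetical, the toy data are simulated, the "posterior draws" are a crude stand-in for real MCMC output, and the sampling covariance matrices $\boldsymbol{\Psi}_i$ are taken to be diagonal purely to keep the simulation short.

```python
import random


def discrepancy(theta_hat, theta, psi_inv):
    """d = sum_i (theta_hat_i - theta_i)' Psi_i^{-1} (theta_hat_i - theta_i).

    theta_hat, theta: one list of length T per area.
    psi_inv: one T x T inverse sampling covariance matrix per area.
    """
    total = 0.0
    for th_hat_i, th_i, p_inv in zip(theta_hat, theta, psi_inv):
        diff = [a - b for a, b in zip(th_hat_i, th_i)]
        for r, row in enumerate(p_inv):
            for c, value in enumerate(row):
                total += diff[r] * value * diff[c]
    return total


def posterior_predictive_p_value(theta_hat_obs, theta_draws, rep_draws, psi_inv):
    """Share of draws with d(replicate, theta) > d(observed, theta)."""
    exceed = 0
    for theta, theta_hat_rep in zip(theta_draws, rep_draws):
        d_rep = discrepancy(theta_hat_rep, theta, psi_inv)
        d_obs = discrepancy(theta_hat_obs, theta, psi_inv)
        exceed += d_rep > d_obs
    return exceed / len(theta_draws)


# Toy data (all hypothetical): m = 3 areas, T = 2 time points, and a
# diagonal sampling covariance with variance 0.25 for every estimate.
rng = random.Random(1)
m, T, var = 3, 2, 0.25
psi_inv = [[[1 / var if r == c else 0.0 for c in range(T)] for r in range(T)]
           for _ in range(m)]
theta_hat_obs = [[rng.gauss(5.0, 1.0) for _ in range(T)] for _ in range(m)]
# Stand-in for posterior draws of Theta, and one replicate data set per draw.
theta_draws = [[[x + rng.gauss(0.0, 0.3) for x in row] for row in theta_hat_obs]
               for _ in range(2000)]
rep_draws = [[[x + rng.gauss(0.0, var ** 0.5) for x in row] for row in theta]
             for theta in theta_draws]
p_value = posterior_predictive_p_value(theta_hat_obs, theta_draws, rep_draws, psi_inv)
print(f"posterior predictive p-value: {p_value:.3f}")
```

With real output, `theta_draws` would hold the retained MCMC draws of $\boldsymbol{\Theta}$ and each replicate would be simulated from the sampling model given the corresponding draw.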

From Figure 5.1, we observe that all model-based estimates are smoother than the original direct estimates. Maps are color-coded according to the quartiles of the direct estimates for 2004-Q1. In general, estimates for 2004-Q1 show a quite distinct division between the North, Center, and South of Italy, with relatively higher unemployment incidences in the South of the country. For 2014-Q4, unemployment incidences are much higher all over the country, because of the economic crisis that hit the country in 2008. LMM and FH show similar patterns, which are in line with those of the direct estimator. YRG, on the other hand, shrinks the estimates more strongly, and this is particularly evident for 2014-Q4, where it yields a general and distinct underestimation. This behavior is displayed at all time points. In fact, Figure 5.2 shows the absolute difference between the direct estimates and the model-based estimates. Areas are ordered according to the estimated variance of the direct estimates. All model-based estimators show a common general behavior: smaller differences for more precise estimates and increasingly larger differences for more variable direct estimates. However, we note that YRG provides systematically larger positive differences, which raises some concerns about bias.

Figure 5.1 of article 54956 issue 2018002

Description for Figure 5.1

Figure showing six maps of Italy comparing the unemployment incidences estimated using LMM, FH and YRG for 2004-Q1 and 2014-Q4. In general, estimates for 2004-Q1 show a quite distinct division between the North, Center, and South of Italy, with relatively higher unemployment incidences in the South of the country. For 2014-Q4, unemployment incidences are much higher all over the country. LMM and FH show similar patterns, which are in line with those of the direct estimator. YRG, on the other hand, shrinks the estimates more strongly, and this is particularly evident for 2014-Q4, where it yields a general and distinct underestimation.

Figure 5.2 of article 54956 issue 2018002

Description for Figure 5.2

Figure presenting three scatter plots of the differences between DIR and model-based small area estimates. Models are LMM, FH and YRG. Areas are arranged according to increasing estimated variance of the direct estimator. For each graph, the difference between the DIR and the model (dir-lmm, dir-fh and dir-yrg) is on the y-axis, ranging from -5 to 15. The ordered MSEs are on the x-axis, ranging from 0 to 25,000. All model-based estimators show a common general behavior: smaller differences for more precise estimates and increasingly larger differences for more variable direct estimates. However, we note that YRG provides systematically larger positive differences, which raises some concerns about bias.

As mentioned earlier, LMM uses a discrete random variable to model unobserved heterogeneity rather than the more common continuous (usually Gaussian) assumption. As a consequence, small areas can be clustered according to the latent state to which they belong at each time point. In this application, latent states are ordered and can be associated with the level of unemployment, conditionally on the covariates. Figure 5.3 shows the evolution of the latent state clustering for the small areas over the 44 time points. The fourth cluster is very small and comprises areas with a very high unemployment incidence. In addition, the pattern seems to be very stable over time, as the probability of changing latent state is very low. Note that, although there is a noticeable temporal trend in the data, this is captured by the dummy variables included to account for trend and seasonality. These findings are supported by the estimated initial and transition probabilities:

$$\hat{\boldsymbol{\pi}} = (0.505,\; 0.340,\; 0.144,\; 0.011)^{\prime}, \qquad \hat{\boldsymbol{\Pi}} = \begin{pmatrix} 0.967 & 0.027 & 0.004 & 0.002 \\ 0.020 & 0.956 & 0.020 & 0.004 \\ 0.007 & 0.035 & 0.946 & 0.012 \\ 0.035 & 0.007 & 0.030 & 0.929 \end{pmatrix}.$$
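To make the stability of the latent states concrete, the reported transition matrix can be turned into per-quarter exit probabilities and expected lengths of stay. The sketch below is illustrative only. It assumes that rows of the matrix index the current state and columns the next state, and that the chain is time-homogeneous, so that the length of a stay in state $u$ is geometric with mean $1/(1 - \hat{\Pi}_{uu})$.

```python
# Estimated initial and transition probabilities as reported in the text.
pi_hat = [0.505, 0.340, 0.144, 0.011]
Pi_hat = [
    [0.967, 0.027, 0.004, 0.002],
    [0.020, 0.956, 0.020, 0.004],
    [0.007, 0.035, 0.946, 0.012],
    [0.035, 0.007, 0.030, 0.929],
]

# Each row should sum to one, up to rounding of the published
# three-decimal values.
row_sums = [sum(row) for row in Pi_hat]

# Probability of leaving state u within one quarter, and the expected number
# of consecutive quarters spent in state u (mean of a geometric distribution).
leave_prob = [1.0 - Pi_hat[u][u] for u in range(4)]
expected_stay = [1.0 / p for p in leave_prob]

for u in range(4):
    print(f"state {u + 1}: P(leave) = {leave_prob[u]:.3f}, "
          f"expected stay = {expected_stay[u]:.1f} quarters")
```

Under these assumptions the expected stays are roughly 30, 23, 19 and 14 quarters for states 1 to 4, which is long relative to the 44 quarters observed and consistent with the stable pattern in Figure 5.3.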

Figure 5.3 of article 54956 issue 2018002

Description for Figure 5.3

Figure showing the latent states distribution from 2004-Q1 to 2014-Q4. The LLMAs are on the y-axis and time is on the x-axis. The latent states are divided into four clusters, u = 1 to 4. The pattern seems to be very stable over time, as the probability of changing latent state is very low. Cluster u = 2 is the largest, followed by clusters u = 1 and u = 3 and finally, cluster u = 4 is by far the smallest.

Figure 5.4 shows the time series of direct estimates and the corresponding model-based estimates for a selection of small areas. Aosta, in panel (a), is a small LMA in the far North of the country, with a low level of unemployment. LMM smooths the direct estimates more than the other methods, while YRG tracks the path of the direct estimates but shows a noticeable negative bias. Milan, in panel (b), is a large city in the North of the country, and the corresponding LMA usually has a very large sample size. As expected, FH and LMM track the values of DIR, while YRG exhibits a clear tendency to underestimate. Perugia and Brindisi are two mid-size towns in the Center and in the South of Italy, respectively. The pattern of the model-based estimators is very clear: LMM provides a very good smoothing of the quite erratic trend of the direct estimates, better than FH, while YRG again displays a tendency to negative bias, particularly after the first few quarters.

It is expected that model-based estimates, besides providing estimates for the out-of-sample areas, provide gains in efficiency over direct estimates. In Figure 5.5 we report, for each time point, the distribution of the CV of the model-based small area estimates, classified as in Figure 2.3 according to relevant threshold values of the CV. FH provides estimates for out-of-sample areas, but it does not seem to be a useful estimation option for these data, since only a few estimates have a CV smaller than 16.6%. On the other hand, YRG provides a very good improvement in terms of estimated efficiency, with almost all estimates having a CV smaller than 33.3%. LMM provides a good improvement over FH, with only approximately 15% of the small area estimates having a CV larger than 33.3%.
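The classification behind Figure 5.5 can be sketched as follows. This is a hedged illustration, not the authors' code: the function names and the toy estimates are hypothetical, the CV is assumed to be the square root of the estimated MSE divided by the estimate (in percent), and values falling exactly on a threshold are assigned to the middle class by assumption.

```python
def cv_percent(estimate, mse):
    """Coefficient of variation in percent: 100 * sqrt(MSE) / estimate."""
    return 100.0 * mse ** 0.5 / estimate


def cv_class(cv):
    """Assign a CV (in percent) to one of the three classes of Figure 2.3."""
    if cv < 16.6:
        return "below 16.6%"
    if cv <= 33.3:
        return "16.6% to 33.3%"
    return "above 33.3%"


# Hypothetical (estimate, estimated MSE) pairs for four small areas.
toy = [(10.0, 1.0), (5.0, 1.0), (2.0, 1.0), (8.0, 0.64)]
classes = [cv_class(cv_percent(est, mse)) for est, mse in toy]

# Share of estimates in the least reliable class.
share_above = classes.count("above 33.3%") / len(classes)
print(classes, share_above)
```

Applied to every area and quarter, the same share is the quantity the paragraph above reports as approximately 15% for LMM.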

Figure 5.4 of article 54956 issue 2018002

Description for Figure 5.4

Figure made of four line graphs comparing the time series of direct and model-based estimates for a selection of four small areas. For each graph, the x-axis is the time from 2004-Q1 to 2014-Q4. There are four lines on each graph: the direct, FH, YRG and LMM estimates. The first graph presents Aosta, with unemployment on the y-axis going from 0 to 5. We can see that LMM smooths the direct estimates more than the other methods, while YRG tracks the path of the direct estimates but shows a noticeable negative bias. The second graph presents Milan, with unemployment on the y-axis going from 0 to 4. Here, FH and LMM track the values of DIR, while YRG exhibits a clear tendency to underestimate. The third and fourth graphs present Perugia (unemployment on the y-axis going from 0 to 10) and Brindisi (unemployment on the y-axis going from 0 to 9). For these two towns, LMM provides a very good smoothing of the quite erratic trend of the direct estimates, better than FH, while YRG again displays a tendency to negative bias, particularly after the first few quarters.

In addition, small area estimates should be close to population level quantities, when available. Here, we use data from the 2011 Italian Population Census and consider unemployment incidence for LMAs from the Census as a gold standard. In particular, we evaluate the distance between small area estimates for the closest time point, namely 2011-Q4, and the Census value, $\mathrm{Cens}_i$, and compute the Absolute Relative Error ($\mathrm{ARE}_i$) as

$$\mathrm{ARE}_i = \frac{\left| \hat{\theta}_i - \mathrm{Cens}_i \right|}{\mathrm{Cens}_i} \qquad (5.1)$$

for each area $i$. The $\mathrm{ARE}_i$ also provides a rough measure of relative bias and is important for evaluating and comparing the performance of the estimates in terms of overall error. Note that the small area parameter of interest and the Census quantity do not have exactly the same definition. In fact, the LFS is a continuous survey and the corresponding unemployment incidence refers to a quarter, while that from the Census refers to a specific calendar day. In addition, the order and wording of the items used to evaluate unemployment status differ slightly between the two questionnaires. We compare the distribution of $\mathrm{ARE}_i$ for LMM and YRG in Figure 5.6. From the empirical distribution of $\mathrm{ARE}_i$, we observe that LMM systematically provides smaller values than YRG. When looking at the subgroup of in-sample areas, we can compare this distribution with that of the direct estimator, and we conclude that LMM is in line with DIR for almost one half of the small areas, and beyond that LMM provides estimates with relatively smaller values of $\mathrm{ARE}_i$. In conclusion, YRG estimates have a lower estimated variance but exhibit higher estimated bias, judged by the comparison with the Census and with the direct estimates. This raises concerns about coverage. On the other hand, LMM estimates are not as good as YRG estimates in terms of CV, but in terms of bias their overall behavior seems to be much more reliable.
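The comparison based on equation (5.1) can be sketched as follows. This is a minimal illustration with made-up numbers: the area labels, estimates and Census values are hypothetical, and `ecdf` is a simple helper written for this example to mimic the empirical distribution plotted in Figure 5.6.

```python
def absolute_relative_error(estimate, census):
    """ARE_i = |theta_hat_i - Cens_i| / Cens_i, as in equation (5.1)."""
    if census <= 0:
        raise ValueError("Census value must be positive")
    return abs(estimate - census) / census


def ecdf(values, x):
    """Empirical distribution function: share of values not larger than x."""
    return sum(v <= x for v in values) / len(values)


# Hypothetical 2011-Q4 estimates and Census values for three areas.
estimates = {"A": 4.5, "B": 12.0, "C": 8.0}
census = {"A": 5.0, "B": 10.0, "C": 8.0}

are = {area: absolute_relative_error(estimates[area], census[area])
       for area in estimates}

# Share of areas whose ARE does not exceed 0.15.
share_small = ecdf(list(are.values()), 0.15)
print(are, share_small)
```

Computing `ecdf` over a grid of thresholds for each estimator gives curves of the kind compared in Figure 5.6: the estimator whose curve lies higher has smaller relative errors.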

Figure 5.5 of article 54956 issue 2018002

Description for Figure 5.5

Figure made of four graphs: the distribution of the coefficients of variation for DIR, LMM, FH and YRG estimates, from 2004-Q1 to 2014-Q4. LLMAs are on the y-axis and time is on the x-axis. The CVs are divided into three classes: below 16.6%, between 16.6 and 33.3% and above 33.3%. The first graph showing the CV distribution of the direct estimates is the same as Figure 2.3. The CV distribution of the FH estimates shows that only a few estimates have CV smaller than 16.6%. On the other hand, almost all YRG estimates have a CV smaller than 33.3% and a large portion has a CV lower than 16.6%. Finally, LMM provides a good improvement over FH with only approximately 15% of the small area estimates with a CV larger than 33.3%.

Figure 5.6 of article 54956 issue 2018002

Description for Figure 5.6

Figure made of two line graphs presenting the empirical distribution of $\mathrm{ARE}_i$ for in-sample areas and for all areas. The $\mathrm{ARE}_i$ is on the x-axis, ranging from 0 to 1.5. On the first graph, illustrating the in-sample areas, there are three lines showing the LMM, YRG and DIR estimates. We can see that LMM is in line with DIR for almost one half of the small areas, and then LMM provides estimates with a relatively smaller value of $\mathrm{ARE}_i$. On the second graph, for all areas, there are two lines illustrating the LMM and YRG estimates. For both graphs, LMM systematically provides smaller values of $\mathrm{ARE}_i$ than YRG.

