Small area estimation using Fay-Herriot area level model with sampling variance smoothing and modeling
Section 4. Application

In this section, we apply the models in Sections 2 and 3 to the Canadian Labour Force Survey (LFS) data and compare the EBLUP and HB estimates. The LFS releases monthly unemployment rate estimates for large areas such as the nation and provinces, as well as for local areas such as Census Metropolitan Areas (CMAs) and Census Agglomerations (CAs) across Canada. The direct LFS estimates for some local areas are not reliable: they exhibit very large coefficients of variation (CVs) due to small sample sizes. Model-based estimators are considered to improve the direct LFS estimates. As an illustration, we apply the Fay-Herriot model to the May 2016 unemployment rate estimates at the CMA/CA level, and compare both the model-based estimates and the direct estimates with the census estimates to assess the effects of sampling variance smoothing and modeling. Hidiroglou et al. (2019) also compared the model-based LFS estimates with the census estimates. For the unemployment rate estimation, the local area employment insurance monthly beneficiary rate is used as an auxiliary variable in the model. To compare point estimates, we compute the absolute relative error (ARE) of the direct and model estimates with respect to the census estimates for each CMA/CA as follows:

$$\mathrm{ARE}_i = \left| \frac{\theta_i^{\mathrm{Census}} - \theta_i^{\mathrm{Est}}}{\theta_i^{\mathrm{Census}}} \right|,$$

where $\theta_i^{\mathrm{Est}}$ is the direct or the EBLUP/HB estimate and $\theta_i^{\mathrm{Census}}$ is the corresponding census value of the unemployment rate. We then take the average of the AREs over the CMA/CAs. For CV, we compute the average CV of the direct and model-based estimates. We prefer a model with smaller ARE and smaller CV.

We first apply the models to all 117 CMA/CAs with sample size $\geq 2$, then to the 92 CMA/CAs with sample size $\geq 5$, and finally to the 79 CMA/CAs with sample size $\geq 7$. Table 4.1 presents the average ARE and the corresponding average CV (in parentheses). In Table 4.1, Smoothed sv indicates that a smoothed sampling variance is used, and Direct sv indicates that a direct sampling variance estimate is used.
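The ARE computation and the averaging over nested groups of areas can be sketched as follows. This is only an illustration: the function names and the four-area input values are hypothetical and are not taken from the LFS or the census.

```python
import numpy as np

def absolute_relative_error(census, estimate):
    """ARE_i = |(census_i - estimate_i) / census_i| for each area."""
    census = np.asarray(census, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return np.abs((census - estimate) / census)

def average_are_by_min_sample_size(census, estimate, sample_size,
                                   thresholds=(2, 5, 7)):
    """Average ARE over the areas whose sample size is at least each threshold."""
    are = absolute_relative_error(census, estimate)
    sample_size = np.asarray(sample_size)
    return {t: float(are[sample_size >= t].mean()) for t in thresholds}

# Made-up values for four hypothetical areas (not LFS or census data).
census = [0.080, 0.060, 0.100, 0.050]
direct = [0.100, 0.060, 0.070, 0.055]
n = [2, 5, 7, 12]

avg = average_are_by_min_sample_size(census, direct, n)
for threshold, value in avg.items():
    print(f"sample size >= {threshold}: average ARE = {value:.4f}")
```

The same function can be called once per estimator (direct, EBLUP, HB) so that all estimators are averaged over exactly the same set of areas.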

With Smoothed sv, both FH-EBLUP and FH-HB substantially improve on the direct survey estimates, with much smaller ARE and CV. In particular, FH-HB has the smallest ARE, and FH-EBLUP has the smallest CV. For example, over the 117 areas, the direct LFS estimator has ARE 0.263 with average CV 0.329, FH-EBLUP with Smoothed sv has ARE 0.124 with average CV 0.087, and FH-HB with Smoothed sv has ARE 0.118 with average CV 0.116. The good performance of FH-EBLUP and FH-HB with Smoothed sv indicates that the smoothing GVF (2.2) is effective in improving the model-based estimates.
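To make the size of the improvement concrete, the relative reduction in ARE and CV with respect to the direct estimator can be worked out from the figures quoted above for the 117 areas. The helper name `relative_reduction` is introduced here purely for illustration.

```python
def relative_reduction(direct, model):
    """Fraction by which the model value is smaller than the direct value."""
    return (direct - model) / direct

# Averages over the 117 CMA/CAs quoted in the text: (ARE, CV).
direct_are, direct_cv = 0.263, 0.329
smoothed_models = {
    "FH-EBLUP Smoothed sv": (0.124, 0.087),
    "FH-HB Smoothed sv": (0.118, 0.116),
}

reductions = {
    name: (relative_reduction(direct_are, are),
           relative_reduction(direct_cv, cv))
    for name, (are, cv) in smoothed_models.items()
}

for name, (are_red, cv_red) in reductions.items():
    print(f"{name}: ARE reduced by {are_red:.1%}, CV reduced by {cv_red:.1%}")
```

On these figures, the average ARE falls by roughly 53% for FH-EBLUP and 55% for FH-HB, and the average CV falls by roughly 74% and 65%, respectively.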

With Direct sv, FH-EBLUP and FH-HB perform the worst among all the models, and their results are almost identical. The other three HB models perform better than FH-EBLUP and FH-HB using Direct sv. YLLM and STKM perform better than YCM, with smaller ARE and smaller CV. YLLM and STKM perform very similarly for all the CMA/CA groups: YLLM consistently has slightly smaller ARE than STKM, but slightly larger CV. For example, over the 117 areas, YLLM has ARE 0.135 with average CV 0.123, and STKM has ARE 0.137 with average CV 0.122. YCM has ARE 0.148 with CV 0.136, and FH-HB has ARE 0.171 with CV 0.221.


Table 4.1
Comparison of average absolute relative error (ARE), with average CV in parentheses

| CMA/CAs | Direct (LFS) | FH-EBLUP (Smoothed sv) | FH-HB (Smoothed sv) | FH-EBLUP (Direct sv) | FH-HB (Direct sv) | YCM (Direct sv) | YLLM (Direct sv) | STKM (Direct sv) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Average over 117 CMA/CAs (sample size $\geq 2$) | 0.263 (0.329) | 0.124 (0.087) | 0.118 (0.116) | 0.170 (0.238) | 0.171 (0.221) | 0.148 (0.136) | 0.135 (0.123) | 0.137 (0.122) |
| Average over 92 CMA/CAs (sample size $\geq 5$) | 0.216 (0.262) | 0.124 (0.076) | 0.116 (0.103) | 0.133 (0.123) | 0.132 (0.123) | 0.132 (0.121) | 0.125 (0.117) | 0.127 (0.116) |
| Average over 79 CMA/CAs (sample size $\geq 7$) | 0.181 (0.232) | 0.122 (0.057) | 0.113 (0.094) | 0.126 (0.115) | 0.122 (0.115) | 0.122 (0.115) | 0.118 (0.114) | 0.120 (0.113) |

Now we present a Bayesian model comparison using the conditional predictive ordinate (CPO) for the four HB models with Direct sv. CPOs are the observed likelihoods based on the cross-validation predictive distribution $f(y_i \mid y_{\mathrm{obs}(i)})$. We compute the CPO value for each observed data point $y_{i,\mathrm{obs}}$; a larger CPO indicates that $y_{i,\mathrm{obs}}$ supports the model, that is, a better model fit. For model choice, we can compute the CPO ratio of model A against model B. If this ratio is greater than 1, then $y_{i,\mathrm{obs}}$ supports model A. We compute the CPO ratios for YCM/FH-HB, YLLM/FH-HB and STKM/FH-HB, and count the number of CPO ratios that are larger than 1. We can also plot the CPO values or summarize them by taking the average of the estimated CPOs. For more detail on CPO, see, for example, Gilks, Richardson and Spiegelhalter (1996), page 153, You and Rao (2000), and Molina, Nandram and Rao (2014). Table 4.2 presents the CPO mean and median values over the 117 CMA/CAs and the number of CPO ratios larger than 1.
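A minimal sketch of how CPO values and CPO ratios might be computed from MCMC output is given below. It rests on assumptions that are not stated in this section: it uses the common harmonic-mean Monte Carlo estimate, in which the CPO for area $i$ is the inverse of the posterior average of $1/f(y_i \mid \text{parameters})$, and it assumes a normal sampling density for $y_i$. The function names are hypothetical, and the draws are synthetic stand-ins rather than posterior samples from any of the models compared here.

```python
import numpy as np

def normal_pdf(y, mean, var):
    """Normal density of y with the given mean and variance."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def cpo_from_draws(y, theta_draws, var_draws):
    """Harmonic-mean CPO estimate for each area.

    theta_draws has shape (G, m): G draws of the m area means.
    var_draws holds the sampling variances, either fixed with shape (m,)
    or drawn with shape (G, m) when the variances are modeled.
    """
    y = np.asarray(y, dtype=float)[None, :]
    lik = normal_pdf(y, np.asarray(theta_draws, dtype=float),
                     np.asarray(var_draws, dtype=float))
    return 1.0 / np.mean(1.0 / lik, axis=0)

def count_ratios_above_one(cpo_a, cpo_b):
    """Number of areas whose CPO ratio (model A over model B) exceeds 1."""
    return int(np.sum(np.asarray(cpo_a) / np.asarray(cpo_b) > 1.0))

# Synthetic illustration only: three areas, made-up variances and draws.
rng = np.random.default_rng(0)
y = np.array([0.07, 0.09, 0.05])
psi = np.array([0.0004, 0.0009, 0.0001])
G = 1000
theta_a = y + rng.normal(0.0, 0.01, size=(G, 3))         # stand-in for model A
theta_b = y + 0.02 + rng.normal(0.0, 0.01, size=(G, 3))  # stand-in for model B

cpo_a = cpo_from_draws(y, theta_a, psi)
cpo_b = cpo_from_draws(y, theta_b, psi)
n_support_a = count_ratios_above_one(cpo_a, cpo_b)
print("mean CPO, model A:", cpo_a.mean(), "model B:", cpo_b.mean())
print("areas with CPO ratio A/B > 1:", n_support_a)
```

The harmonic-mean estimate can be numerically unstable when some likelihood values are very small, so a production implementation would typically work on the log scale.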


Table 4.2
Summary of CPO values and CPO ratios over 117 CMA/CAs

| | FH-HB (Direct sv) | YCM (Direct sv) | YLLM (Direct sv) | STKM (Direct sv) |
| --- | --- | --- | --- | --- |
| CPO Mean | 0.1053 | 0.1222 | 0.1242 | 0.1238 |
| CPO Median | 0.0976 | 0.1004 | 0.1045 | 0.1051 |
| # of CPO ratios > 1 | - | 72 | 78 | 76 |

It is clear from Table 4.2 that YCM, YLLM and STKM have larger CPO values than FH-HB, which indicates that an HB model with sampling variance modeling is preferred when the direct sampling variance estimates are used. YLLM and STKM are also better than YCM. For the CPO ratios against FH-HB, among the 117 areas, 72 areas (observations) support YCM, 78 areas support YLLM and 76 areas support STKM. Therefore, more observations support YCM, YLLM and STKM than FH-HB, and YLLM has the largest number of CPO ratios greater than 1. The CPO comparison is consistent with the results reported in Table 4.1. For other model checking and evaluation methods, see Hidiroglou et al. (2019).

