Small area estimation using Fay-Herriot area level model with sampling variance smoothing and modeling
Section 4. Application
In this section, we apply the models in
Sections 2 and 3 to the Canadian Labour Force Survey (LFS) data and
compare the EBLUP and HB estimates. The LFS releases monthly
unemployment rate estimates for large areas such as the nation and provinces as
well as local areas such as Census Metropolitan Areas (CMAs) and Census
Agglomerations (CAs) across Canada. The direct LFS estimates for some
local areas are unreliable, exhibiting very large coefficients of variation
(CVs) due to small sample sizes. Model-based estimators are considered to
improve the direct LFS estimates. As an illustration, we apply the
Fay-Herriot model to the May 2016 unemployment rate estimates at the CMA/CA
level, and compare the model-based estimates and the direct estimates with the
census estimates to assess the effects of sampling variance smoothing and
modeling. Hidiroglou et al. (2019) also compared the model-based LFS
estimates with the census estimates. For the unemployment rate
estimation, the local area employment insurance monthly beneficiary rate is
used as an auxiliary variable in the model. For comparison of point estimates,
we compute the absolute relative error (ARE) of the direct and model estimates
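with respect to the census estimates for each CMA/CA as follows:
$$\text{ARE}_i = \frac{\left| \theta_i^{\text{census}} - \hat{\theta}_i^{\text{est}} \right|}{\theta_i^{\text{census}}},$$
where $\hat{\theta}_i^{\text{est}}$ is the direct or the EBLUP/HB
estimate and $\theta_i^{\text{census}}$ is the corresponding census value
of the unemployment rate. Then we take the average of the AREs over CMA/CAs. For
CV, we compute the average CVs of the direct and model-based estimates. We
prefer a model with smaller ARE and smaller CV.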
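As a minimal sketch of the comparison criteria described above, the following Python code computes the ARE of each estimate against its census value, computes the CV as the square root of an estimated variance (or MSE) divided by the estimate, and averages both over areas. The function names and the three-area numbers are hypothetical and chosen only for illustration; they are not LFS or census figures.

```python
import math

def absolute_relative_errors(estimates, census_values):
    """ARE_i = |census_i - estimate_i| / census_i for each area."""
    return [abs(c - e) / c for e, c in zip(estimates, census_values)]

def coefficients_of_variation(estimates, variances):
    """CV_i = sqrt(variance_i) / estimate_i for each area."""
    return [math.sqrt(v) / e for e, v in zip(estimates, variances)]

def average(values):
    """Plain arithmetic mean over areas."""
    return sum(values) / len(values)

# Hypothetical unemployment rates for three areas (illustration only).
census = [0.080, 0.065, 0.100]
direct = [0.100, 0.052, 0.090]
direct_var = [0.0004, 0.0001, 0.0009]

avg_are_direct = average(absolute_relative_errors(direct, census))
avg_cv_direct = average(coefficients_of_variation(direct, direct_var))
print(f"average ARE = {avg_are_direct:.3f}, average CV = {avg_cv_direct:.3f}")
```

The same two functions would be applied to each set of model-based estimates, so that every column of the comparison is computed in the same way.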
We first apply the models to all the 117 CMA/CAs meeting the first sample size condition, then to the 92 CMA/CAs
meeting the second, and finally to the 79 CMA/CAs meeting the third. Table 4.1 presents the average ARE and
the corresponding average CV (in parentheses). In Table 4.1, Smoothed sv
indicates that a smoothed sampling variance is used in the model, and Direct sv
indicates that a direct sampling variance estimate is used.
With Smoothed sv, both FH-EBLUP and FH-HB substantially improve on the
direct survey estimates, with much smaller ARE and CV. In particular, FH-HB has
the smallest ARE, and FH-EBLUP has the smallest CV. For example, over the 117
areas, the direct LFS estimator has ARE 0.263 with average CV 0.329, FH-EBLUP with Smoothed
sv has ARE 0.124 with average CV 0.087, and FH-HB with Smoothed sv has ARE 0.118 with
average CV 0.116. The good performance of FH-EBLUP and FH-HB with Smoothed sv
indicates that the smoothing GVF (2.2) is very useful and effective in
improving the model-based estimates.
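One way to see why the sampling variance input matters: under the standard Fay-Herriot model, the predictor is a weighted average of the direct estimate and the regression fit, with weight equal to the model variance divided by the sum of the model variance and the sampling variance. An unstable sampling variance estimate therefore makes the weight unstable. The sketch below assumes an intercept plus a single auxiliary variable and treats the model variance as known, which gives the BLUP rather than the EBLUP (the EBLUP would plug in an estimate of the model variance). All function names, variable names and input numbers are hypothetical.

```python
def fay_herriot_blup(y, x, sampling_var, model_var):
    """BLUP under a Fay-Herriot model with an intercept and one covariate.

    model_var is treated as known (an assumption made for this sketch).
    Returns (predictions, shrinkage weights gamma).
    """
    # Weighted least squares with weights 1 / (model_var + sampling_var_i).
    w = [1.0 / (model_var + s) for s in sampling_var]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # gamma_i near 1 keeps the direct estimate; near 0 leans on the regression fit.
    gamma = [model_var / (model_var + s) for s in sampling_var]
    fitted = [intercept + slope * xi for xi in x]
    preds = [g * yi + (1.0 - g) * fi for g, yi, fi in zip(gamma, y, fitted)]
    return preds, gamma

# Hypothetical direct estimates, auxiliary rates and sampling variances.
y = [0.100, 0.052, 0.090, 0.070]
x = [0.030, 0.020, 0.035, 0.025]
smoothed_var = [0.0004, 0.0004, 0.0004, 0.0004]
noisy_var = [0.0016, 0.0001, 0.0004, 0.0009]  # same areas, unstable variance estimates

_, gamma_smooth = fay_herriot_blup(y, x, smoothed_var, model_var=0.0004)
_, gamma_noisy = fay_herriot_blup(y, x, noisy_var, model_var=0.0004)
print(gamma_smooth)
print(gamma_noisy)
```

With equal smoothed variances every area receives the same weight, whereas the noisy variances spread the weights widely across areas even though the underlying precision may be similar.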
With Direct sv, FH-EBLUP and FH-HB perform the worst among all the
models, with almost identical results under this scenario. The other three HB
models perform better than FH-EBLUP and FH-HB using Direct sv. YLLM and
STKM perform better than YCM, with smaller ARE and smaller CV. YLLM and STKM
perform very similarly for all the CMA/CA groups: YLLM consistently has
slightly smaller ARE than STKM but slightly larger CV. For
example, over the 117 areas, YLLM has ARE 0.135 with average CV 0.123 and STKM
has ARE 0.137 with average CV 0.122, while YCM has ARE 0.148 with CV
0.136 and FH-HB has ARE 0.171 with CV 0.221.
Table 4.1
Comparison of average absolute relative error (ARE) and average CV (in parentheses)
| CMA/CAs | Direct LFS | FH-EBLUP (Smoothed sv) | FH-HB (Smoothed sv) | FH-EBLUP (Direct sv) | FH-HB (Direct sv) | YCM (Direct sv) | YLLM (Direct sv) | STKM (Direct sv) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Average over 117 CMA/CAs | 0.263 | 0.124 | 0.118 | 0.170 | 0.171 | 0.148 | 0.135 | 0.137 |
| (sample size …) | (0.329) | (0.087) | (0.116) | (0.238) | (0.221) | (0.136) | (0.123) | (0.122) |
| Average over 92 CMA/CAs | 0.216 | 0.124 | 0.116 | 0.133 | 0.132 | 0.132 | 0.125 | 0.127 |
| (sample size …) | (0.262) | (0.076) | (0.103) | (0.123) | (0.123) | (0.121) | (0.117) | (0.116) |
| Average over 79 CMA/CAs | 0.181 | 0.122 | 0.113 | 0.126 | 0.122 | 0.122 | 0.118 | 0.120 |
| (sample size …) | (0.232) | (0.057) | (0.094) | (0.115) | (0.115) | (0.115) | (0.114) | (0.113) |
Now we present a Bayesian model comparison using the
conditional predictive ordinate (CPO) for the four HB models with Direct sv.
CPOs are the observed likelihoods based on the cross-validation predictive
distribution $f(y_i \mid y_{(-i)})$, where $y_{(-i)}$ denotes the data with the
observation $y_i$ removed. We compute the CPO value for each observed
data point $y_i$; a larger CPO indicates that $y_i$ supports the model and that the model fits better. For
model choice, we can compute the CPO ratio of model A against model B. If this ratio
is greater than 1, then $y_i$ supports model A. We compute the CPO ratios for
YCM/FH-HB, YLLM/FH-HB and STKM/FH-HB, and count the number of CPO ratios
that are larger than 1. We can also plot the CPO values or summarize them
by taking the average of the estimated CPOs. For more detail on CPO, see, for example,
Gilks, Richardson and Spiegelhalter (1996), page 153, You and Rao (2000),
and Molina, Nandram and Rao (2014). Table 4.2 presents the CPO mean and
median values over the 117 CMA/CAs and the number of CPO ratios larger than 1.
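The CPO computation described above can be sketched as follows, assuming posterior draws from an MCMC sampler are available. The sketch uses the usual Monte Carlo approximation of the CPO as the harmonic mean of the likelihood of an observation over the draws, and takes that likelihood to be normal with the area mean and sampling variance as parameters. The source does not state which estimator was used, so this choice, along with all function names and numbers, is an assumption.

```python
import math

def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def cpo_estimate(y_i, theta_draws, var_draws):
    """Harmonic-mean estimate of CPO_i from posterior draws for area i."""
    inverse_likelihoods = [
        1.0 / normal_pdf(y_i, theta, var)
        for theta, var in zip(theta_draws, var_draws)
    ]
    return len(inverse_likelihoods) / sum(inverse_likelihoods)

def count_ratios_above_one(cpo_model_a, cpo_model_b):
    """Number of areas whose CPO ratio (model A over model B) exceeds 1."""
    return sum(1 for a, b in zip(cpo_model_a, cpo_model_b) if a / b > 1.0)

# Hypothetical posterior draws of the area mean for one area under two models.
y_obs = 0.090
draws_a = [0.085, 0.092, 0.088, 0.095]
draws_b = [0.060, 0.070, 0.065, 0.075]
# Sampling variance held fixed here; a model that treats it as unknown
# would supply one posterior draw of the variance per iteration instead.
sampling_var = [0.0004] * 4

cpo_a = cpo_estimate(y_obs, draws_a, sampling_var)
cpo_b = cpo_estimate(y_obs, draws_b, sampling_var)
print(cpo_a / cpo_b > 1.0)
```

Applying `cpo_estimate` to every area under each model gives one list of CPO values per model; `count_ratios_above_one` then yields the kind of count reported in the last row of a CPO summary table.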
Table 4.2
Summary of CPO values and CPO ratios over 117 CMA/CAs
| | FH-HB (Direct sv) | YCM (Direct sv) | YLLM (Direct sv) | STKM (Direct sv) |
| --- | --- | --- | --- | --- |
| CPO Mean | 0.1053 | 0.1222 | 0.1242 | 0.1238 |
| CPO Median | 0.0976 | 0.1004 | 0.1045 | 0.1051 |
| Number of CPO ratios > 1 | - | 72 | 78 | 76 |
It is clear from Table 4.2 that YCM, YLLM and STKM
have larger CPO values than FH-HB, which indicates that the HB models with
sampling variance modeling are preferred when the direct sampling variance
estimates are used, and that YLLM and STKM are better than YCM. For the CPO ratios,
among the 117 areas, 72 areas (observations) support YCM, 78 areas support YLLM
and 76 areas support STKM. Therefore a majority of the observations support each of YCM, YLLM and
STKM over FH-HB, and YLLM has the largest number of CPO ratios that are larger
than 1. The CPO comparison is consistent with the results reported in Table 4.1.
For other model checking and evaluation methods, see Hidiroglou et al. (2019).
ISSN : 1492-0921
Copyright
Published by authority of the Minister responsible for Statistics Canada.
© His Majesty the King in Right of Canada as represented by the Minister of Industry, 2022
Use of this publication is governed by the Statistics Canada Open Licence Agreement.
Catalogue No. 12-001-X
Frequency: Semi-annual
Ottawa