Development of a small area estimation system at Statistics Canada
Section 7. Conclusion
A frequent
demand from users of data from NSOs is for more granularity for use in planning
and policy research purposes. NSOs can no longer simply increase the sample
sizes of their surveys to obtain reliable estimates at the requested level of
detail. Reasons for this include the high costs of doing so, response burden
concerns, as well as the difficult task of obtaining responses from sampled
units. An alternative being investigated by many NSOs is the use of small area
estimation techniques that provide a way to address the demand for more
granular data. With this in mind, Statistics Canada began the development of an SAE production system in the early 2000s and now have such a system available to
their statistical programs. The production system handles area and unit level
models, with multiple options such as different methods to estimate the
variance components, different linking models and both the EBLUP and HB
estimation methods. It is currently being used to produce experimental
estimates for several Statistics Canada statistical programs and it is expected
that the first published small area estimates will be available in 2019.
As it was
mentioned in the introduction, the only existing software in 2006 that would
produce small area estimates and their associated mean squared estimates was
sponsored by the EURAREA (2004) project.
The current production system developed at Statistics Canada is written in SAS,
with its methodology closely following Rao (2003) and includes some recent
advances. As it stands, it satisfies the existing requirements for small area
estimation at Statistics Canada. However, as
the use of small area estimation becomes more common within Statistics Canada,
there will be a need to add functionality to the system to meet this demand.
The recent book authored by Rao and Molina (2015) provides an idea of how much
development has taken place in small area estimation during recent years. The
incorporation of all this development into the production system would be
extremely time consuming, expensive, and may not be directly applicable to the
needs of Statistics Canada. It, therefore, follows that options other than programming these new functionalities in
the current SAS production system should be considered. One option would be to
investigate how packages developed elsewhere, such as those written in
can be integrated
into it. Notable packages written in R include sae (Molina and Marhuenda, 2015), mme (Lopez-Vizcaino, Lombardia and Morales, 2014), saery (Esteban, Morales and Perez, 2014) and sae2 (Fay and Diallo, 2015). These packages include small area
procedures that are not in the present system such as multinomial linear mixed
models, area level models with time effects and time series area level models
supporting univariate and multivariate applications. The existing SAS production system meets the needs of
Statistics Canada at this point in time, and there are no concrete plans to add
functionality to it.
Acknowledgements
We would like to thank the reviewers for their comments
and suggestions that led to improvements in this paper.
Appendix
Justification of the coefficient of determination
In order to determine a coefficient of determination
associated with the linking model,
we first rewrite it as
where
and
We
assume that an intercept is implicitly or explicitly included in
i.e.,
there exists a vector
such
that
In other
words, we assume that there exists a vector
such
that
If
were
known, we could estimate the unknown vector of model parameters
by the
least squares estimator
and the
unknown model variance
by the
unbiased estimator
The well-known adjusted coefficient of determination is
where
It is an
ideal coefficient of determination because it cannot be computed (since
is
unknown) but this is the target we would like to estimate. Simply replacing
with
does not
solve the problem as
reflects
the combined model and not just the linking model. The resulting coefficient of
determination would typically be too small. To obtain a better estimate of
we first
decompose
as
Assuming that an intercept is implicitly or explicitly
included in
and from the expression for
we have that
and
From (A.4), we can rewrite
as
where
As a result, the cross product term in (A.2) vanishes and equation (A.2)
reduces to
where
From (A.5), it follows that the ideal coefficient of
determination (A.1) can be rewritten as
The only unknown quantities in (A.7) are
and
A computable coefficient of determination can
thus be obtained by replacing
and
in (A.7) with
and
the consistent estimators of
and
implemented in the SAE system and described in
Section 3. The resulting coefficient of determination can be expressed as
with the function
defined in (A.7), and is a consistent
estimator of the ideal coefficient of determination
References
Battese, G.E.,
Harter, R.M. and Fuller, W.A. (1988). An error-components model for prediction
of crop areas using survey and satellite data. Journal of the American
Statistical Association, 83, 28-36.
Beaumont
, J.-F., and Bocci, C. (2016). Small area estimation
in the Labour Force Survey. Paper presented at the Advisory Committee on
Statistical Methods, May 2016,
Statistics Canada.
Brackstone, G.J. (1987). Small area data: Policy issues
and technical challenges. In Small
Area Statistics, (Eds., R. Platek, J.N.K. Rao, C.-E. Särndal and
M.P. Singh),
New York
:
John Wiley & Sons, Inc.,
3-20.
Chib, S., and Greenberg, E. (1995). Understanding the
Metropolis-Hastings algorithm. American Statistician, 49, 327-335.
Dick, P. (1995).
Modelling net undercoverage in the 1991 Canadian Census. Survey Methodology,
21, 1, 45-54. Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/1995001/article/14411-eng.pdf.
Drew, D., Singh,
M.P. and Choudhry, G.H. (1982). Evaluation of small area estimation techniques
for the Canadian Labour Force Survey. Survey Methodology, 8, 1, 17-47.
Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/1982001/article/14328-eng.pdf.
Esteban, M.D.,
Morales, D. and Perez, A. (2014). saery:
Small Area Estimation for Rao and Yu Model.
URL http://CRAN.R-project.org/package=saery. R package version 1.0.
Estevao, V.,
Hidiroglou, M.A. and You, Y. (2015). Area
Level Model, Unit Level, and Hierarchical Bayes Methodology Specifications.
Internal document, Statistics Canada.
EURAREA (2004). Enhancing Small Area Estimation Techniques
to meet European Needs. https://cordis.europa.eu/project/rcn/58374_en.html.
Fay, R.E., and
Diallo, M. (2015). sae2: Small Area Estimation: Time-Series Models. URL
http://CRAN.Rproject.org/package=sae2. R package version 0.1-1.
Fay, R.E., and
Herriot, R.A. (1979). Estimation of income for small places: An application of
James-Stein procedures to Census data. Journal of the American Statistical
Association, 74, 269-277.
Fuller, W.A.,
and Rao, J.N.K. (2001). A regression composite estimator with application to
the Canadian Labour Force Survey. Survey Methodology, 27, 1, 45-51.
Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2001001/article/5853-eng.pdf.
Gambino, J.,
Kennedy, B. and Singh, M.P. (2001). Regression composite estimation for the
Canadian Labour Force Survey: Evaluation and implementation. Survey
Methodology, 27, 1, 65-74. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/2001001/article/5855-eng.pdf.
Gelfand, A.E., and Smith, A.F.M. (1990). Sample-based approaches to calculating
marginal densities. Journal of the American Statistical Association, 85,
972-985.
Ghangurde, P.D.,
and Singh, M.P. (1977). Synthetic estimation in periodic household surveys. Survey Methodology, 3, 2, 152-181.
Gonzalez, M.E.,
and Hoza, C. (1978). Small-area estimation
with application to unemployment and housing estimates. Journal of the American Statistical
Association, 73, 7-15.
Kott, P. (1989).
Robust small domain estimation using random effects modeling. Survey
Methodology, 15, 1, 3-12. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/1989001/article/14581-eng.pdf.
Li, H., and
Lahiri, P. (2010). Adjusted maximum
method in the small area estimation problem. Journal of Multivariate
Analysis, 101, 882-892.
Lopez-Vizcaino,
E., Lombardia, M.J. and Morales, D. (2014). mme: Multinomial Mixed Effects
Models, 2014. URL http://CRAN.R-project.org/package=mme. R package version
0.1-5.
Molina,
I.
, and Marhuenda, Y. (2015). sae: An R package for small area estimation. The R Journal, 7, 1, 81-98.
Prasad, N.G.N.,
and Rao, J.N.K. (1990). The estimation of the mean
squared error of small-area estimators. Journal of the American Statistical
Association, 85, 163-171.
Prasad, N.G.N.,
and Rao, J.N.K. (1999). On robust small area estimation using a simple random
effects model. Survey Methodology, 25, 1, 67-72. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/1999001/article/4713-eng.pdf.
Rao, J.N.K.
(2003). Small Area Estimation.
New
York
: John
Wiley & Sons, Inc.
Rao, J.N.K., and
Molina,
I.
(2015). Small Area Estimation.
New
York
: John
Wiley & Sons, Inc.
Rivest, L.-P.,
and Belmonte, E. (2000). A conditional mean squared error of small area
estimators. Survey Methodology, 26, 1, 67-78. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/2000001/article/5179-eng.pdf.
Rubin-Bleuer, S.
(2014). Specifications for EBLUP and Pseudo-EBLUP Estimators with
Nonnegligible Sampling Fractions.
Statistics
Canada
document.
Rubin-Bleuer,
S., Jang, L. and Godbout, S. (2016). The Pseudo-EBLUP estimator for a weighted
average with an application to the Canadian Survey of Employment, Payrolls and
Hours. Journal of Survey Statistics and Methodology, 4, 417-435.
Singh, M.P., and
Tessier, R. (1976). Some estimators for domain totals. Journal of the
American Statistical Association,
71, 322-325.
Singh, A.C.,
Kennedy, B. and Wu, S. (2001). Regression composite estimation for the Canadian
Labour Force Survey with a rotating panel design. Survey Methodology,
27, 1, 33-44. Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2001001/article/5852-eng.pdf.
Stukel, D., and
Rao, J.N.K. (1997). Small-area estimation under two-fold nested error regression
model. Journal of Statistical Planning
and Inference, 78, 131-147.
Wang, J., and
Fuller, W.A. (2003). The mean square error of small area estimators constructed
with estimated area variances. Journal of American Statistical Association,
98, 716-723.
Wang, J.,
Fuller, W.A. and Qu, Y. (2008). Small area estimation under
a restriction. Survey Methodology, 34, 1, 29-36. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/2008001/article/10619-eng.pdf.
You, Y., and
Rao, J.N.K. (2002). A pseudo empirical best linear unbiased prediction approach
to small area estimation using survey weights. The Canadian Journal of
Statistics, 30, 431-439.
You, Y., Rao,
J.N.K. and Dick, P. (2004). Benchmarking hierarchical Bayes small area
estimators in the Canadian census undercoverage
estimation. Statistics in Transition,
6, 631-640.
You, Y., Rao,
J.N.K. and Hidiroglou, M. (2013). On the performance of self benchmarked small
area estimators under the Fay-Herriot area level model. Survey Methodology,
39, 1, 217-229. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/2013001/article/11830-eng.pdf.