Development of a small area estimation system at Statistics Canada
Section 1. Introduction
Today’s data users are becoming increasingly sophisticated and are asking for more data at more detailed levels. For National Statistical Offices (NSOs) facing declining response rates, producing data at finer levels of detail is a particularly daunting challenge. Small area estimation techniques offer one way to meet this demand by producing estimates for specified sub-populations, or small areas. A small area is a subgroup of the population for which the sample size is too small for direct estimates to be reliable enough to publish. Examples of small areas include a geographical region (e.g., a province, county or municipality), a demographic group (e.g., age by sex), a demographic group within a geographical region, or a detailed industry group. The demand for small area data has been recognized for years (see Brackstone, 1987), but it has increased greatly in recent years, as noted in the Spring 2014 Report of the Auditor General of Canada.
The study of small area estimation procedures has a long history at Statistics Canada, beginning in the 1970s with Singh and Tessier (1976) and Ghangurde and Singh (1977). Drew, Singh and Choudhry (1982) proposed a sample-dependent procedure to estimate employment characteristics below the provincial level, and Dick (1995) modeled net undercoverage for the 1991 Canadian Census of Population. The development of a small area estimation system suited to Statistics Canada surveys is well-timed, as there is now a substantial literature on the subject, including the books by Rao (2003) and Rao and Molina (2015).
Four papers that have had a great impact on small area estimation (SAE) are Gonzalez and Hoza (1978), Fay and Herriot (1979), Battese, Harter and Fuller (1988), and Prasad and Rao (1990). Gonzalez and Hoza (1978) were among the first to propose small area estimation procedures, mainly synthetic estimation. Fay and Herriot (1979) developed procedures to estimate income for small areas using long-form census data. Their area level method and its variants are among the most widely used procedures for producing small area estimates by combining direct survey estimates with auxiliary data. Battese et al. (1988) developed a unit level procedure to estimate crop areas using survey data together with satellite data available for individual sample units. Finally, Prasad and Rao (1990) derived a nearly unbiased estimator of the model-based mean squared error (MSE) of both the Fay-Herriot and the Battese-Harter-Fuller estimators.
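To make the Fay-Herriot approach concrete, here is a minimal sketch in Python with NumPy; the language is our choice for illustration, whereas the Statistics Canada system itself is written in SAS. It assumes the standard Fay-Herriot area level model: each direct estimate y_i equals the true area value θ_i plus sampling error with known variance ψ_i, and θ_i = x_i'β + v_i with model variance σ_v². The EBLUP is then the composite γ_i y_i + (1 − γ_i) x_i'β̂, with γ_i = σ_v²/(σ_v² + ψ_i). Here σ_v² is estimated with the Prasad-Rao moment estimator, truncated at zero. The function name `fay_herriot_eblup` is hypothetical, and the sketch omits the Prasad-Rao MSE estimator.

```python
import numpy as np


def fay_herriot_eblup(y, X, psi):
    """Sketch of the Fay-Herriot EBLUP (hypothetical helper, not the system's code).

    y   : direct survey estimates for m areas, shape (m,)
    X   : area level auxiliary variables, shape (m, p)
    psi : sampling variances of the direct estimates, assumed known, shape (m,)
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    psi = np.asarray(psi, dtype=float)
    m, p = X.shape

    # Prasad-Rao moment estimator of sigma_v^2 from ordinary least squares residuals.
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_ols = XtX_inv @ X.T @ y
    resid = y - X @ beta_ols
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # leverages x_i'(X'X)^{-1} x_i
    sigma2_v = max(0.0, (resid @ resid - np.sum(psi * (1.0 - h))) / (m - p))

    # Weighted least squares estimate of beta, weights 1 / (sigma_v^2 + psi_i).
    w = 1.0 / (sigma2_v + psi)
    beta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))

    # Shrinkage factor: near 1 when the direct estimate is precise (small psi_i).
    gamma = sigma2_v / (sigma2_v + psi)
    synthetic = X @ beta
    eblup = gamma * y + (1.0 - gamma) * synthetic
    return eblup, gamma, sigma2_v, beta


# Hypothetical simulated data: 30 areas, one covariate plus an intercept.
rng = np.random.default_rng(1)
m = 30
X = np.column_stack([np.ones(m), rng.uniform(0, 10, m)])
psi = rng.uniform(0.5, 4.0, m)                      # assumed known sampling variances
theta = X @ np.array([2.0, 1.5]) + rng.normal(0, 1.0, m)  # true area values
y = theta + rng.normal(0, np.sqrt(psi))             # simulated direct estimates
eblup, gamma, sigma2_v, beta = fay_herriot_eblup(y, X, psi)
```

The shrinkage factor γ_i captures the design choice behind the method: areas whose direct estimates are imprecise (large ψ_i) lean on the regression-synthetic part x_i'β̂, while precise direct estimates are largely retained.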
The statistical theory of model-based SAE is rather complex, and much of the SAE software available at NSOs was programmed on a one-time basis and is therefore not suited to a production environment. Statistics Canada consequently decided to develop a system that would serve both as a production tool and as a learning tool for employees. When this decision was made, around 2006, the EURAREA (2004) project had already produced computer programs for small area estimation. However, these programs were no longer being developed and did not reflect the latest advances in the field, so a flexible small area estimation system meeting the needs of production was developed at Statistics Canada. Its basic requirements included allowing for both area and unit level models; incorporating the sampling design in the estimation of the parameters of interest and of their mean squared error; ensuring that the small area estimates add up to reliable higher level estimates (i.e., totals); and providing diagnostic tools to test the adequacy of the models used for small area estimation. A prototype system written in SAS was developed by Estevao, Hidiroglou and You (2015) to meet these requirements, and it has since been turned into a production system that is currently used by Statistics Canada.
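The requirement that small area estimates add up to reliable higher level estimates is usually met by benchmarking. One simple option is ratio benchmarking, sketched below in Python under the assumption that a reliable aggregate (for example, a direct provincial total) is available. The source does not say which benchmarking method the system uses; the function `ratio_benchmark` and its `weights` parameter are hypothetical.

```python
import numpy as np


def ratio_benchmark(estimates, benchmark, weights=None):
    """Ratio-adjust small area estimates to agree with a reliable aggregate.

    estimates : model-based small area estimates, shape (m,)
    benchmark : reliable higher level estimate, e.g. a direct total
    weights   : hypothetical aggregation weights a_i; ones (the default) for
                totals, population shares when benchmarking area means
    """
    estimates = np.asarray(estimates, dtype=float)
    if weights is None:
        weights = np.ones_like(estimates)
    weights = np.asarray(weights, dtype=float)
    aggregate = weights @ estimates
    if aggregate == 0:
        raise ValueError("weighted sum of estimates is zero; ratio adjustment undefined")
    # Every area is scaled by the same factor, so the weighted sum equals the benchmark.
    return estimates * (benchmark / aggregate)


# Hypothetical example: three area totals benchmarked to a reliable total of 120.
area_totals = np.array([10.0, 20.0, 30.0])
adjusted = ratio_benchmark(area_totals, 120.0)
print(adjusted, adjusted.sum())
```

Ratio adjustment preserves the relative sizes of the area estimates. This sketch only rescales point estimates and does not account for the effect of benchmarking on mean squared error estimation.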
The paper is organized as follows. Section 2 introduces the notation used in the article. Section 3 discusses the options available in the production system for area level models estimated with Empirical Best Linear Unbiased Prediction (EBLUP) methods, and Section 4 presents the corresponding options for unit level models. Section 5 presents the Hierarchical Bayes approach for the area level model. Section 6 illustrates the production system using Statistics Canada’s Labour Force Survey. Finally, Section 7 gives some conclusions.