# Development of a small area estimation system at Statistics Canada

## Section 2. Core notation and background

We first introduce some notation that will be used to define the various small area estimators included in the production system. Let $U$ denote a population of size $N$. This population is partitioned into $M$ mutually exclusive and exhaustive areas $U_i \subset U$, $i = 1, \dots, M$, where area $U_i$ contains $N_i$ units. A sample $s$ of size $n$ is drawn from the population using a well-defined probability mechanism $p(s)$, and the resulting sample is split into areas $s_i = s \cap U_i$, $i = 1, \dots, M$. Note that, for some of the areas, the realized sample size $n_i$ may be zero. The set of $m$ ($m \le M$) areas where $n_i$ is strictly greater than 0 will be denoted by $A$. The set of the remaining areas, where $n_i$ equals 0, will be denoted by $\overline{A}$.
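As an illustration of this notation, the sketch below computes the realized area sample sizes $n_i$ and splits the areas into $A$ and $\overline{A}$. It assumes, purely for illustration, that each sampled unit is represented only by its area label $1, \dots, M$; the function name `split_sample_by_area` is hypothetical and not part of the production system.

```python
def split_sample_by_area(sample_areas, M):
    """Compute n_i = |s ∩ U_i| for each area and classify the areas.

    sample_areas: list of area labels (1..M), one per sampled unit (assumed input).
    Returns (n_by_area, A, A_bar), where A holds the areas with n_i > 0
    and A_bar holds the areas with n_i == 0.
    """
    # Start every area at zero so empty areas are kept, not silently dropped.
    n_by_area = {i: 0 for i in range(1, M + 1)}
    for area in sample_areas:
        n_by_area[area] += 1
    A = sorted(i for i, n_i in n_by_area.items() if n_i > 0)
    A_bar = sorted(i for i, n_i in n_by_area.items() if n_i == 0)
    return n_by_area, A, A_bar


n_by_area, A, A_bar = split_sample_by_area([1, 1, 3, 4, 4, 4], M=5)
m = len(A)  # number of areas with at least one sampled unit
```

Here areas 2 and 5 receive no sampled units, so they fall in $\overline{A}$, while $m = 3$.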

Let $\pi_j = \sum_{\{s:\, j \in s\}} p(s)$, $j \in U$, be the inclusion probabilities, where the summation is over all samples $s$ containing unit $j$. We denote the sampling weight for unit $j$ by $d_j$, where $d_j = \pi_j^{-1}$. The final weight associated with unit $j$ will be denoted by $w_j$. This weight will normally be the product of the original design weight $d_j$ and an adjustment factor that reflects the incorporation of available auxiliary data (via regression or calibration), as well as non-response adjustments. Note that the auxiliary data used in the adjustment factor are not necessarily the same as those used for small area estimation.
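To make the definition of $\pi_j$ concrete, the sketch below evaluates the sum over all samples containing unit $j$ by brute-force enumeration and then forms $d_j = \pi_j^{-1}$. The design used, simple random sampling without replacement with $p(s) = 1/\binom{N}{n}$, is an assumption chosen only because its answer $\pi_j = n/N$ is easy to check; the helper names are hypothetical. Enumeration is feasible only for tiny populations.

```python
from fractions import Fraction
from itertools import combinations


def inclusion_probabilities(N, design):
    """pi_j = sum of p(s) over all samples s that contain unit j.

    design: dict mapping each possible sample (tuple of unit labels 0..N-1) to p(s).
    """
    pi = {j: Fraction(0) for j in range(N)}
    for s, p_s in design.items():
        for j in s:
            pi[j] += p_s
    return pi


def srswor_design(N, n):
    """Assumed example design: simple random sampling without replacement."""
    samples = list(combinations(range(N), n))
    p = Fraction(1, len(samples))  # every sample of size n is equally likely
    return {s: p for s in samples}


N, n = 5, 2
pi = inclusion_probabilities(N, srswor_design(N, n))
d = {j: 1 / pi_j for j, pi_j in pi.items()}  # design weights d_j = 1 / pi_j
```

Each unit appears in 4 of the 10 possible samples, so $\pi_j = 2/5$ and $d_j = 5/2$. For any fixed-size design, the $\pi_j$ also sum to $n$.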

The objective of a small area estimation system is to estimate a population parameter $\theta_i$ (e.g., a mean or a total) for each area $i$ for a given variable of interest $y$ when some area sample sizes $n_i$ are too small for direct estimation procedures to be used. A direct estimator of $\theta_i$ is one that uses values of the variable of interest, $y$, only from the sample units in area $i$. A major disadvantage of such estimators is that they may have unacceptably large standard errors, especially when the area sample size is small. Small area procedures use indirect estimators that borrow strength across areas by using models that link all areas through some common parameters. Indirect estimators will be efficient (i.e., they increase the effective sample size and thus decrease the standard error) if the model holds for each area; departures from the model will result in reduced accuracy. A wide variety of indirect estimators is available, and a good summary is provided in Rao and Molina (2015).
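The sketch below shows what "using only the sample units in area $i$" means for a direct estimator. It assumes, as one common choice rather than necessarily the one used in the production system, the weighted total $\hat{Y}_i = \sum_{j \in s_i} w_j y_j$ and the weighted (Hájek-type) mean $\hat{Y}_i / \sum_{j \in s_i} w_j$. Areas in $\overline{A}$ get no direct estimate at all, which is precisely where an indirect estimator is needed. The function name is hypothetical.

```python
def direct_estimates(y, w, area, M):
    """Direct estimates for each area, using only the sample units in that area.

    total_i = sum over j in s_i of w_j * y_j
    mean_i  = total_i / (sum over j in s_i of w_j)   (Hajek-type weighted mean)
    Areas with n_i = 0 (the set A-bar) are mapped to None.
    """
    totals = {i: 0.0 for i in range(1, M + 1)}
    wsums = {i: 0.0 for i in range(1, M + 1)}
    counts = {i: 0 for i in range(1, M + 1)}
    for y_j, w_j, i in zip(y, w, area):
        totals[i] += w_j * y_j
        wsums[i] += w_j
        counts[i] += 1
    result = {}
    for i in range(1, M + 1):
        if counts[i] > 0:
            result[i] = (totals[i], totals[i] / wsums[i])
        else:
            result[i] = None  # no sampled units: a direct estimate does not exist
    return result


est = direct_estimates(y=[10, 20, 30], w=[2.0, 2.0, 4.0], area=[1, 1, 2], M=3)
```

With only one or two units per area, such estimates are exactly the kind whose standard errors can be unacceptably large.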

Small area estimators are classified as area level or unit level depending on the level at which the modeling is performed. Area level small area estimators are based on models linking a given parameter of interest to area-specific auxiliary variables. Unit level small area estimators are based on models linking the variable of interest to unit-specific auxiliary variables. Area level small area estimators are used when unit level data are not available. They can also be computed when unit level data are available, by aggregating those data to the appropriate area level. This can be useful in practice because area level small area estimators may be less prone to outliers than their unit level counterparts.
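The aggregation step mentioned above can be sketched as collapsing unit-level sample records into one record per sampled area. This sketch assumes, for illustration, a single unit-level auxiliary variable `x` and uses weighted sample means as the area-level summaries; in practice area-level auxiliary variables often come from other sources (e.g., known population totals), and the record layout and function name here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    area: int   # area label i
    y: float    # variable of interest
    x: float    # unit-level auxiliary variable (hypothetical)
    w: float    # final weight w_j


def aggregate_to_area_level(units):
    """Collapse unit-level sample records into one record per sampled area.

    Returns {area: (n_i, weighted mean of y, weighted mean of x)} for areas in A.
    """
    acc = {}
    for u in units:
        n_i, sw, swy, swx = acc.get(u.area, (0, 0.0, 0.0, 0.0))
        acc[u.area] = (n_i + 1, sw + u.w, swy + u.w * u.y, swx + u.w * u.x)
    return {i: (n_i, swy / sw, swx / sw) for i, (n_i, sw, swy, swx) in acc.items()}


area_data = aggregate_to_area_level([
    Unit(area=1, y=10.0, x=1.0, w=2.0),
    Unit(area=1, y=20.0, x=3.0, w=2.0),
    Unit(area=2, y=5.0, x=2.0, w=1.0),
])
```

Because each area is reduced to a weighted average, a single extreme unit has less direct leverage on an area level model fitted to these records than on a unit level model fitted to the raw data.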

