# Development of a small area estimation system at Statistics Canada

Section 2. Core notation and background

We first introduce the notation used to define the various small area estimators included in the production system. Let $U$ denote a population of size $N.$ This population is partitioned into $M$ mutually exclusive and exhaustive areas, where each area ${U}_{i}\subset U,\,\,i=1,\,\dots ,\,M,$ has ${N}_{i}$ observations. A sample, $s,$ of size $n$ is drawn from the population using a well-defined probability mechanism $p\left(s\right),$ and the resulting sample is split into areas ${s}_{i}=s\cap {U}_{i},\,\,i=1,\,\dots ,\,M.$ Note that, for some of the areas, the realized sample size ${n}_{i}$ may be zero. The set of $m$ $\left(m\le M\right)$ areas where ${n}_{i}$ is strictly greater than 0 will be denoted as $A.$ The set of the remaining areas, where ${n}_{i}$ is equal to 0, will be denoted as $\overline{A}.$
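As an illustration of this notation, the following minimal Python sketch splits a sample into areas and builds the sets $A$ and $\overline{A}.$ The population, the area labels and the sample are hypothetical toy values chosen only for the example; they do not come from the production system.

```python
from collections import Counter

# Hypothetical toy population: area label of each unit j in U (N = 10, M = 4).
population_area = {
    1: "a", 2: "a", 3: "a",
    4: "b", 5: "b",
    6: "c", 7: "c", 8: "c",
    9: "d", 10: "d",
}

# Hypothetical realized sample s (unit identifiers), n = 4.
sample = [1, 3, 4, 9]

areas = sorted(set(population_area.values()))        # the M areas
N_i = Counter(population_area.values())              # area population sizes N_i
n_i = Counter(population_area[j] for j in sample)    # realized sample sizes n_i

# s_i = s ∩ U_i: the sampled units that fall in area i.
s_i = {i: [j for j in sample if population_area[j] == i] for i in areas}

# A: areas with n_i > 0; A_bar: areas with n_i = 0.
A = [i for i in areas if n_i[i] > 0]
A_bar = [i for i in areas if n_i[i] == 0]

print("A =", A, "A_bar =", A_bar)
```

In this toy example, area `c` has no sampled units, so it belongs to $\overline{A}$ and no direct estimate can be computed for it.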

Let ${\pi}_{j}={\displaystyle {\sum}_{\left\{s:\,j\in s\right\}}p\left(s\right)},\,j\in U,$ be the inclusion probabilities, where the summation over $\left\{s:\,j\in s\right\}$ is taken over all samples $s$ containing unit $j.$ We denote the sampling weight for unit $j$ as ${d}_{j},$ where ${d}_{j}={\pi}_{j}^{-1}.$ The final weight associated with unit $j$ will be denoted as ${w}_{j}.$ This weight will normally be the product of the original design weight $\left({d}_{j}\right)$ and an adjustment factor that reflects the incorporation of available auxiliary data (via regression or calibration), as well as non-response adjustments. Note that the auxiliary data used in the adjustment factor are not necessarily the same as those used for small area estimation.
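The definition of ${\pi}_{j}$ can be checked by brute force on a very small design. The sketch below assumes, purely for illustration, simple random sampling without replacement of $n=2$ units from $N=4,$ so that every possible sample has the same probability $p\left(s\right).$ The adjustment factors `g` are hypothetical placeholders standing in for calibration or non-response adjustments.

```python
from fractions import Fraction
from itertools import combinations

units = [1, 2, 3, 4]   # toy population U, N = 4
n = 2                  # sample size

# Assumed design: every sample of size n is equally likely.
samples = list(combinations(units, n))
p = {s: Fraction(1, len(samples)) for s in samples}

# pi_j: sum of p(s) over all samples s that contain unit j.
pi = {j: sum(p[s] for s in samples if j in s) for j in units}

# Design weights d_j = 1 / pi_j.
d = {j: 1 / pi[j] for j in units}

# Hypothetical adjustment factors; final weights w_j = d_j * g_j.
g = {1: Fraction(11, 10), 2: Fraction(9, 10), 3: Fraction(1), 4: Fraction(1)}
w = {j: d[j] * g[j] for j in units}

print(pi, d, w)
```

Exact fractions are used so that the identity $\sum_{j\in U}{\pi}_{j}=n,$ which holds for any fixed-size design, can be verified without rounding error.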

The objective of a small area estimation system is to estimate a population parameter ${\theta}_{i}$ (e.g., a mean or a total) for each area $i$ for a given variable of interest $y$ when some area sample sizes ${n}_{i}$ are too small to use *direct estimation* procedures. A *direct estimator* of ${\theta}_{i}$ is one that uses values of the variable of interest, $y,$ strictly from the sample units in area $i.$ However, a major disadvantage of such estimators is that unacceptably large standard errors may result: this is especially true if the area sample size is small. Small area procedures use *indirect estimators* that borrow strength across areas, by using models which link all areas through some common parameters. Indirect estimators will be efficient (i.e., increase the effective sample size and thus decrease the standard error) if the model holds for each area. Departures from the model will result in reduced accuracy. There is a wide variety of indirect estimators available and a good summary is provided in Rao and Molina (2015).
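To make the idea of a direct estimator concrete, the sketch below computes one common weighted form for each sampled area: the estimated total $\sum_{j\in {s}_{i}}{w}_{j}{y}_{j}$ and the corresponding weighted mean obtained by dividing by $\sum_{j\in {s}_{i}}{w}_{j}.$ This particular form is an assumption made for illustration and is not necessarily the direct estimator used in the production system; the function name `direct_estimates` and the data are hypothetical.

```python
def direct_estimates(records):
    """Weighted direct estimates by area.

    records: iterable of (area, w_j, y_j) tuples for the sampled units only.
    Returns {area: {"total": ..., "mean": ...}} for areas with n_i > 0.
    """
    totals, weight_sums = {}, {}
    for area, w, y in records:
        totals[area] = totals.get(area, 0.0) + w * y
        weight_sums[area] = weight_sums.get(area, 0.0) + w
    return {
        area: {"total": totals[area], "mean": totals[area] / weight_sums[area]}
        for area in totals
    }

# Hypothetical sampled units: (area, final weight w_j, value y_j).
records = [("a", 2.0, 10.0), ("a", 3.0, 20.0), ("b", 4.0, 5.0)]
est = direct_estimates(records)
print(est)
```

Only values of $y$ from units sampled in area $i$ enter the estimate for that area, which is why areas with few sampled units yield unstable estimates and areas in $\overline{A}$ yield none at all.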

Small area estimators are classified as area level or unit level, depending on the level at which the modeling is performed. *Area level* small area estimators are based on models linking a given parameter of interest to area-specific auxiliary variables. *Unit level* small area estimators are based on models linking the variable of interest to unit-specific auxiliary variables. Area level small area estimators are computed if unit level data are not available. They can also be computed when unit level data are available, by aggregating those data to the appropriate area level. This might be useful in practice because area level small area estimators may be less prone to outliers than their unit level counterparts.
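Aggregating unit level data to the area level can be sketched as follows. The example assumes a hypothetical file of unit level auxiliary values ${x}_{j}$ with an area label for each unit, and reduces it to one area mean per area, which could then serve as an area-specific auxiliary variable.

```python
from collections import defaultdict

# Hypothetical unit level auxiliary records: (area, x_j).
unit_x = [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 20.0), ("b", 30.0)]

sums = defaultdict(float)
counts = defaultdict(int)
for area, x in unit_x:
    sums[area] += x
    counts[area] += 1

# One aggregated value per area: the mean of x over the units in that area.
area_mean_x = {area: sums[area] / counts[area] for area in sums}
print(area_mean_x)
```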
