# A note on the concept of invariance in two-phase sampling designs Section 2. The concept of invariance

We distinguish strong invariance, which may also be called distribution invariance, from weak invariance, which may also be called first-two-moment invariance.

Definition 1. A two-phase sampling design is said to be strongly (or distribution) invariant provided that

$F\left({I}_{2}\mid {I}_{1}\right)=F\left({I}_{2}\right)\qquad \left(2.1\right)$

A consequence of Definition 1 is that $F\left({I}_{1},{I}_{2}\right)=F\left({I}_{1}\right)F\left({I}_{2}\right)$; therefore, with a strongly invariant two-phase sampling design, the vector ${I}_{2}$ can be generated prior to the vector ${I}_{1}.$ In practice, strong invariance is satisfied by only a few two-phase sampling designs. A first example is Poisson sampling at the second phase. This covers the case of nonresponse, which is often viewed as a Poisson sampling design at the second phase. Another example is two-stage sampling. Both are described in greater detail below.
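The factorization stated above follows in one step from the product rule for joint distributions combined with (2.1):

$F\left({I}_{1},{I}_{2}\right)=F\left({I}_{1}\right)F\left({I}_{2}\mid {I}_{1}\right)=F\left({I}_{1}\right)F\left({I}_{2}\right).$

The first equality holds for any two-phase design; only the second one uses strong invariance.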

Example 1. At the first phase, a sample ${s}_{1}$ is selected according to an arbitrary sampling design. Poisson sampling is then used at the second phase, with unit selection probabilities ${\pi }_{2i}\left({I}_{1}\right)$ that are set prior to sampling, which means that ${\pi }_{2i}\left({I}_{1}\right)={\pi }_{2i}$ for $i\in U.$ Since Poisson sampling is completely characterized by its first-order selection probabilities, we have $F\left({I}_{2}\mid {I}_{1}\right)=F\left({I}_{2}\right).$ As a result, this sampling design is strongly invariant. It can be implemented as follows: first, generate the vector ${I}_{2}$ according to the Poisson sampling design $F\left({I}_{2}\right)$ and, independently, generate the vector ${I}_{1}$ according to the design $F\left({I}_{1}\right).$
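The reversed implementation in Example 1 can be illustrated with a minimal Python sketch. The population size, the values in `pi2`, the helper names, and the choice of simple random sampling without replacement at the first phase are all assumptions made for illustration; the example itself allows an arbitrary first-phase design.

```python
import random

def poisson_sample(pi, rng):
    """Poisson sampling: unit i is selected independently with probability pi[i]."""
    return [1 if rng.random() < p else 0 for p in pi]

def srswor_indicators(N, n, rng):
    """Illustrative first-phase design (an assumption): SRSWOR of n units out of N."""
    chosen = set(rng.sample(range(N), n))
    return [1 if i in chosen else 0 for i in range(N)]

rng = random.Random(2024)
N = 8
# Hypothetical second-phase probabilities, fixed before any sampling takes place.
pi2 = [0.2, 0.5, 0.5, 0.8, 0.3, 0.6, 0.4, 0.7]

# Reversed order: I2 is generated first, for every unit in U ...
I2 = poisson_sample(pi2, rng)
# ... and I1 is then generated independently of I2.
I1 = srswor_indicators(N, 4, rng)

# The second-phase sample contains the units selected in both phases.
s1 = [i for i in range(N) if I1[i] == 1]
s2 = [i for i in range(N) if I1[i] == 1 and I2[i] == 1]
print("s1 =", s1)
print("s2 =", s2)
```

Because `pi2` never looks at `I1`, swapping the two generation steps would leave the joint distribution of the indicators unchanged, which is the practical content of strong invariance here.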

Example 2. Two-stage cluster sampling can be described as follows: at the first stage, a sample of clusters is selected randomly from the population of clusters. Then, at the second stage, within each cluster selected at the first stage, a sample of elements is randomly selected. Note that, even in this case, the vector ${I}_{1}$ is still defined at the element level, with its size $N$ corresponding to the number of elements in the population. Under this set-up, the selection indicator for an element $j$ within cluster $i,$ ${I}_{1ij},$ is equal to 1 for all elements $j$ within a selected cluster $i.$ Therefore, two-stage sampling is a special case of two-phase sampling as described in Section 1. If the selection within clusters is independent of which clusters have been selected at the first stage, then we are in the presence of a strongly invariant two-stage cluster sampling design. This is satisfied if the selection of elements within clusters is independent of the selection of elements in any other cluster. A strongly invariant two-stage cluster sampling design can be implemented by reversing the actual act of sampling: instead of sampling the clusters first, we begin by selecting the elements in each of the population clusters, and then sample the clusters.
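The reversed order of selection described in Example 2 might look as follows in Python. The cluster labels, the cluster sizes, and the use of simple random sampling without replacement at both stages are hypothetical choices for this sketch, not part of the example.

```python
import random

rng = random.Random(7)

# Hypothetical population: cluster label -> labels of its elements.
clusters = {
    "A": ["a1", "a2", "a3", "a4"],
    "B": ["b1", "b2", "b3"],
    "C": ["c1", "c2", "c3", "c4", "c5"],
    "D": ["d1", "d2", "d3"],
}
m = 2           # elements drawn per cluster (assumed design: SRSWOR)
n_clusters = 2  # clusters drawn (assumed design: SRSWOR)

# Step 1 (usually the second stage): draw elements in EVERY population cluster,
# independently from one cluster to another.
within = {c: sorted(rng.sample(elems, m)) for c, elems in clusters.items()}

# Step 2 (usually the first stage): draw the clusters.
selected_clusters = sorted(rng.sample(sorted(clusters), n_clusters))

# Final sample: pre-drawn elements that belong to a selected cluster.
sample = [e for c in selected_clusters for e in within[c]]
print(selected_clusters, sample)
```

The draws made in clusters that end up unselected are simply discarded; they never influence which clusters are chosen, which is why the order of the two steps does not matter.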

Note that our definition of strong invariance for two-stage designs is slightly different from the one given in Särndal, Swensson and Wretman (1992, Chapter 4) because the latter restricts attention to clusters selected at the first stage. However, for practical purposes, both definitions are essentially equivalent. We used Definition 1 rather than the standard definition of Särndal et al. (1992) because the latter does not extend easily to the case of two-phase sampling.

Definition 2. A two-phase sampling design is said to be weakly (or first-two-moment) invariant if

${\pi }_{2i}\left({I}_{1}\right)={\pi }_{2i}\quad \text{and}\quad {\pi }_{2ij}\left({I}_{1}\right)={\pi }_{2ij},\quad i\in {s}_{1},\ j\in {s}_{1}.$

Clearly, a strongly invariant two-phase sampling design is weakly invariant, but the converse is not true. The next example describes a sampling design that is weakly invariant but not strongly invariant.

Example 3. At the first phase, we select a sample, ${s}_{1},$ of size ${n}_{1},$ according to an arbitrary fixed-size sampling design. From ${s}_{1},$ we select a simple random sample without replacement, ${s}_{2},$ of size ${n}_{2},$ where ${n}_{2}$ is fixed prior to sampling. This two-phase sampling design is weakly invariant since ${\pi }_{2i}={n}_{2}/{n}_{1}$ and ${\pi }_{2ij}={n}_{2}\left({n}_{2}-1\right)/\left\{{n}_{1}\left({n}_{1}-1\right)\right\},$ which remain the same from one realization of ${I}_{1}$ to another. However, it is not strongly invariant since it is not possible to generate ${I}_{2}$ prior to ${I}_{1}$ and still meet the fixed sample size constraint for ${n}_{2}.$ In fact, this would also be true for any fixed-size sampling design at the second phase satisfying ${\pi }_{2i}\left({I}_{1}\right)={\pi }_{2i}$ and ${\pi }_{2ij}\left({I}_{1}\right)={\pi }_{2ij}.$
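The weak invariance claimed in Example 3 can be checked by brute force on a small case. The sketch below enumerates every possible second-phase sample for two hypothetical first-phase realizations (the unit labels and the sizes `n1 = 5`, `n2 = 3` are assumptions) and computes exact inclusion probabilities with rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations

def srswor_inclusion_probs(s1, n2):
    """Enumerate all equally likely size-n2 subsets of s1 and return the exact
    first- and second-order inclusion probabilities."""
    subsets = list(combinations(s1, n2))
    total = len(subsets)
    first = {i: Fraction(sum(i in s for s in subsets), total) for i in s1}
    second = {
        (i, j): Fraction(sum(i in s and j in s for s in subsets), total)
        for i, j in combinations(s1, 2)
    }
    return first, second

n1, n2 = 5, 3
# Two hypothetical first-phase realizations of the same size n1.
for s1 in [(0, 1, 2, 3, 4), (2, 5, 7, 8, 9)]:
    first, second = srswor_inclusion_probs(s1, n2)
    # Every unit has probability n2/n1, whatever s1 is.
    assert set(first.values()) == {Fraction(n2, n1)}
    # Every pair has probability n2(n2-1)/{n1(n1-1)}, whatever s1 is.
    assert set(second.values()) == {Fraction(n2 * (n2 - 1), n1 * (n1 - 1))}

print(Fraction(n2, n1), Fraction(n2 * (n2 - 1), n1 * (n1 - 1)))  # prints: 3/5 3/10
```

The check only covers the first two moments. It says nothing about generating ${I}_{2}$ before ${I}_{1},$ which is where the fixed-size constraint gets in the way.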

Finally, we describe a non-invariant two-phase sampling design.

Example 4. At the first phase, we select a sample, ${s}_{1},$ of size ${n}_{1},$ according to an arbitrary fixed-size sampling design. For every $i\in {s}_{1},$ we record an auxiliary variable $x.$ From ${s}_{1},$ a second-phase sample, ${s}_{2},$ of fixed size ${n}_{2},$ is selected using an inclusion probability proportional-to-size procedure. In this case, we have

${\pi }_{2i}\left({I}_{1}\right)=\frac{{n}_{2}{x}_{i}}{\sum _{k\in U}{x}_{k}{I}_{1k}}.$

Clearly, the inclusion probability of unit $i$ in ${s}_{2}$ varies from one realization of ${I}_{1}$ to another. Since ${\pi }_{2i}\left({I}_{1}\right)$ is a function of ${I}_{1},$ it is known only after the first-phase sample ${s}_{1}$ is actually realized.
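A small numerical illustration of Example 4, under assumed values: the list `x` and the two first-phase samples below are hypothetical, and `x` is stored for the whole population only so that both realizations can be evaluated side by side. Unit 0 belongs to both first-phase samples, yet its second-phase inclusion probability changes.

```python
from fractions import Fraction

def pps_inclusion_probs(x, s1, n2):
    """pi_2i(I1) = n2 * x_i / (sum of x_k over k in s1), for each i in s1."""
    total = sum(x[k] for k in s1)
    return {i: Fraction(n2 * x[i], total) for i in s1}

x = [4, 6, 10, 5, 15, 20]  # hypothetical auxiliary values for 6 population units
n2 = 2

probs_a = pps_inclusion_probs(x, (0, 1, 2, 3), n2)  # sum of x over s1 is 25
probs_b = pps_inclusion_probs(x, (0, 2, 4, 5), n2)  # sum of x over s1 is 49

print(probs_a[0], probs_b[0])  # prints: 8/25 8/49
```

In both realizations the probabilities sum to `n2` over the first-phase sample, as expected for a fixed-size design, but the individual values depend on which units accompany unit 0 in ${s}_{1}.$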
