# Unequal probability inverse sampling: Section 2. Formalization of the problem

The following notation is used:

- $U:$ a population of $N$ enterprises, i.e., $U = \{1, \dots, i, \dots, N\}$ ($U$ may denote the population of enterprises in an economic region),

- $L:$ the list of occupations,

- $M:$ the number of occupations in the list, i.e., the size of $L,$

- ${F}_{i}:$ the list of occupations in enterprise $i,$ with ${F}_{i}\subset L\mathrm{,}$

- ${D}_{i}:$ the list of occupations absent from enterprise $i,$ with ${D}_{i}\subset L\mathrm{,}$ ${F}_{i}\cup {D}_{i}\mathrm{=}L$ and ${D}_{i}\cap {F}_{i}\mathrm{=}\varnothing \mathrm{,}$

- $M{p}_{i}:$ the number of occupations in enterprise $i,$ i.e., the size of ${F}_{i},$

- $r:$ the number of distinct occupations to be obtained in each enterprise,

- ${X}_{i}:$ the number of failures before the $r$ occupations in enterprise $i$ are obtained by selecting the occupations using a given design.
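To make the roles of ${F}_{i},$ ${D}_{i},$ $r$ and ${X}_{i}$ concrete, the following minimal sketch simulates one enterprise. It assumes, purely for illustration, that occupations are drawn from $L$ one at a time with equal probabilities and without replacement until $r$ occupations present in the enterprise are obtained; a draw that falls in ${D}_{i}$ counts as a failure. The function name `count_failures` and the toy data are hypothetical, and the design is an assumed one, not necessarily one of the designs considered in the paper.

```python
import random


def count_failures(L, F_i, r, rng):
    """Return X_i, the number of draws that fall in D_i = L minus F_i
    before r occupations of F_i have been obtained.

    Assumed design (illustration only): equal-probability draws from L
    without replacement, so the r occupations obtained are distinct.
    """
    if not 1 <= r <= len(F_i):
        raise ValueError("r must be between 1 and the size of F_i")
    order = list(L)
    rng.shuffle(order)  # a random permutation gives the order of the draws
    successes = 0
    failures = 0
    for occupation in order:
        if occupation in F_i:
            successes += 1
            if successes == r:
                return failures
        else:
            failures += 1
    raise RuntimeError("unreachable when F_i is a subset of L")


# Toy data: M = 10 occupations, 4 of them present in enterprise i, r = 2.
L = list(range(10))
F_i = {0, 3, 5, 8}
x_i = count_failures(L, F_i, r=2, rng=random.Random(2024))
```

Under this assumed design, ${X}_{i}$ can never exceed the size of ${D}_{i},$ and it is always zero when every occupation in $L$ is present in the enterprise.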

The main objective is to estimate the average wage for each occupation in the total population. Let ${y}_{ik}$ be the average wage for occupation $k$ in enterprise $i,$ and let ${z}_{ik}$ be the number of employees with occupation $k$ in enterprise $i.$ The parameter of interest for occupation $k$ is the employee-weighted average wage over the enterprises in which occupation $k$ is present:

$$\overline{Y}_{k} = \frac{\displaystyle \sum_{i \in U \,|\, F_{i} \ni k} z_{ik} y_{ik}}{\displaystyle \sum_{i \in U \,|\, F_{i} \ni k} z_{ik}}.$$
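As a small numerical check of this definition, the sketch below computes ${\overline{Y}}_{k}$ for a toy population. The function name `population_mean_wage`, the dictionary layout and all figures are assumptions made for illustration; they do not come from the paper.

```python
def population_mean_wage(k, F, z, y):
    """Employee-weighted average wage for occupation k.

    F maps enterprise i to the set F_i of occupations present in it;
    z[i, k] is the number of employees and y[i, k] the average wage
    for occupation k in enterprise i.
    """
    numerator = 0.0
    denominator = 0.0
    for i, occupations in F.items():
        if k in occupations:  # only enterprises whose F_i contains k
            numerator += z[i, k] * y[i, k]
            denominator += z[i, k]
    if denominator == 0:
        raise ValueError("occupation k has no employees in the population")
    return numerator / denominator


# Hypothetical population of N = 3 enterprises.
F = {1: {"welder", "clerk"}, 2: {"clerk"}, 3: {"welder"}}
z = {(1, "welder"): 10, (1, "clerk"): 5, (2, "clerk"): 20, (3, "welder"): 30}
y = {(1, "welder"): 30.0, (1, "clerk"): 20.0, (2, "clerk"): 25.0, (3, "welder"): 28.0}

welder_mean = population_mean_wage("welder", F, z, y)  # (10*30 + 30*28) / 40 = 28.5
clerk_mean = population_mean_wage("clerk", F, z, y)    # (5*20 + 20*25) / 25 = 24.0
```

Enterprise 2 does not contribute to the welder average because "welder" is not in its list ${F}_{2}.$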

Assume that a sample of enterprises ${S}_{1}$ is selected from $U$ using some given design with inclusion probabilities ${\pi}_{1i}\mathrm{.}$ In enterprise $i,$ a sample of occupations ${S}_{i}$ is selected using one of the designs described above with inclusion probability ${\pi}_{k\text{\hspace{0.17em}}|\text{\hspace{0.17em}}i}.$ If the design is with replacement, ${\pi}_{k\text{\hspace{0.17em}}|\text{\hspace{0.17em}}i}$ represents the expected number of times that occupation $k$ is selected in enterprise $i.$

${\overline{Y}}_{k}$ can be estimated using a “ratio” type estimator (Hájek 1971):

$$\widehat{\overline{Y}}_{k} = \frac{\displaystyle \sum_{i \in S_{1} \,|\, \left(S_{i} \cap F_{i}\right) \ni k} \frac{z_{ik} y_{ik}}{\pi_{1i} \pi_{k \,|\, i}}}{\displaystyle \sum_{i \in S_{1} \,|\, \left(S_{i} \cap F_{i}\right) \ni k} \frac{z_{ik}}{\pi_{1i} \pi_{k \,|\, i}}}.$$
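The ratio estimator can be sketched in a few lines once both sets of inclusion probabilities are available. In the example below, the function name `hajek_mean_wage`, the data layout and the numerical values are hypothetical, and the probabilities are treated as known, which is not the case under an inverse design.

```python
def hajek_mean_wage(k, sample, pi1, pi_cond, z, y):
    """Ratio-type (Hajek) estimate of the average wage for occupation k.

    sample maps each sampled enterprise i (in S_1) to the set of selected
    occupations that are present in it (S_i intersected with F_i).
    pi1[i] is the inclusion probability of enterprise i and pi_cond[i, k]
    the inclusion probability of occupation k within enterprise i.
    """
    numerator = 0.0
    denominator = 0.0
    for i, selected in sample.items():
        if k in selected:
            weight = 1.0 / (pi1[i] * pi_cond[i, k])  # inverse of the overall inclusion probability
            numerator += weight * z[i, k] * y[i, k]
            denominator += weight * z[i, k]
    if denominator == 0:
        raise ValueError("occupation k was not observed in the sample")
    return numerator / denominator


# Hypothetical sample: enterprises 1 and 3 were selected, both yielding "welder".
sample = {1: {"welder"}, 3: {"welder"}}
pi1 = {1: 0.5, 3: 0.25}
pi_cond = {(1, "welder"): 0.5, (3, "welder"): 0.5}
z = {(1, "welder"): 10, (3, "welder"): 30}
y = {(1, "welder"): 30.0, (3, "welder"): 28.0}

estimate = hajek_mean_wage("welder", sample, pi1, pi_cond, z, y)
```

With these values the weights are 4 and 8, so the estimate equals $(4 \cdot 300 + 8 \cdot 840) / (4 \cdot 10 + 8 \cdot 30) = 7920 / 280.$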

This estimator requires the probability that an occupation is selected in an enterprise. With an inverse type design, however, that probability is unknown and must itself be estimated before ${\overline{Y}}_{k}$ can be estimated. Since the inclusion probabilities appear in the denominator, it is preferable to estimate the inverses of ${\pi}_{k\text{\hspace{0.17em}}|\text{\hspace{0.17em}}i}$ directly. In an enterprise, an occupation’s probability of being selected decreases as the number of occupations in that enterprise increases. The probability also depends on the inverse sampling design used in each enterprise.
