# Unequal probability inverse sampling: Section 3. Simple random sampling with replacement

Assume that a proportion ${p}_{i}$ of the occupations in the list are present in enterprise $i.$ If occupations are drawn with replacement until $r$ occupations present in the enterprise have been identified, then ${X}_{i},$ the number of drawn occupations that are not present in the enterprise, has a negative binomial distribution, denoted by ${X}_{i}\sim NB\left(r\mathrm{,}{p}_{i}\right).$ In that case,

$$\mathrm{Pr}\left({X}_{i}\mathrm{=}{x}_{i}\right)\mathrm{=}\left(\begin{array}{c}r+{x}_{i}-1\\ {x}_{i}\end{array}\right){p}_{i}^{r}{\left(1-{p}_{i}\right)}^{{x}_{i}}\mathrm{,}$$

with ${x}_{i}\in \mathbb{N}\mathrm{=}\left\{\mathrm{0,1,2,3,}\dots \right\}\mathrm{,}{p}_{i}\in \left[\mathrm{0,1}\right]\mathrm{,}r\in {\mathbb{N}}^{\mathrm{*}}\mathrm{=}\left\{\mathrm{1,2,3,}\dots \right\}.$ Furthermore,

$$\text{E}\left({X}_{i}\right)\mathrm{=}\frac{r\left(1-{p}_{i}\right)}{{p}_{i}}\text{\hspace{0.17em}}\text{and}\text{\hspace{0.17em}}\text{var}\left({X}_{i}\right)\mathrm{=}\frac{r\left(1-{p}_{i}\right)}{{p}_{i}^{2}}\mathrm{.}$$
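The mean and variance above can be checked numerically from the probability function itself. The following is a minimal sketch, not part of the original derivation: the function names `nb_pmf` and `nb_moments`, the truncation point `x_max` and the values of `r` and `p` are illustrative assumptions.

```python
from math import comb

def nb_pmf(x: int, r: int, p: float) -> float:
    """Pr(X = x): x failures before the r-th success, success probability p."""
    return comb(r + x - 1, x) * p**r * (1 - p)**x

def nb_moments(r: int, p: float, x_max: int = 2000) -> tuple[float, float]:
    """Mean and variance computed from the pmf, truncating the infinite sum at x_max."""
    probs = [nb_pmf(x, r, p) for x in range(x_max + 1)]
    mean = sum(x * q for x, q in enumerate(probs))
    var = sum((x - mean) ** 2 * q for x, q in enumerate(probs))
    return mean, var

r, p = 3, 0.4                      # illustrative values, not from the source
mean, var = nb_moments(r, p)
print(mean, r * (1 - p) / p)       # numerical mean next to r(1 - p)/p
print(var, r * (1 - p) / p**2)     # numerical variance next to r(1 - p)/p^2
```

The truncation is harmless here because the tail probabilities decay geometrically; for $p$ very close to 0 a larger `x_max` would be needed.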

Let ${A}_{ik}\mathrm{,}k\in L\mathrm{,}$ be the number of times that unit $k$ is selected in the sample taken from enterprise $i.$ In a simple design with replacement of size $n,$ the values of ${A}_{ik}$ have a multinomial distribution. Therefore,

$$\mathrm{Pr}\left({A}_{ik}\mathrm{=}{a}_{ik}\mathrm{,}k\in L\right)\mathrm{=}\frac{n\mathrm{!}}{{M}^{n}}{\displaystyle \prod _{k\in L}}\frac{1}{{a}_{ik}\mathrm{!}}\mathrm{,}$$

where ${a}_{ik}\mathrm{=0,}\dots \mathrm{,}n\mathrm{,}$ and

$${\displaystyle \sum _{k\in L}}\text{\hspace{0.17em}}{a}_{ik}\mathrm{=}n\mathrm{.}$$

If this multinomial vector is conditioned on a fixed size in a given part of the population, then

$$\begin{array}{ll}\mathrm{Pr}\left({A}_{ik}\mathrm{=}{a}_{ik}\mathrm{,}k\in {F}_{i}|\text{\hspace{0.17em}}{\displaystyle \sum _{k\in {F}_{i}}}{A}_{ik}\mathrm{=}r\right)\hfill & \mathrm{=}\frac{\mathrm{Pr}\left({A}_{ik}\mathrm{=}{a}_{ik}\mathrm{,}k\in {F}_{i}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{and}\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\displaystyle \sum _{k\in {F}_{i}}}{A}_{ik}\mathrm{=}r\right)}{\mathrm{Pr}\left({\displaystyle \sum _{k\in {F}_{i}}}{A}_{ik}\mathrm{=}r\right)}\hfill \\ \hfill & \mathrm{=}\frac{\frac{n\mathrm{!}{\left(1-{p}_{i}\right)}^{\left(n-r\right)}}{\left(n-r\right)\mathrm{!}{M}^{r}}{\displaystyle \prod _{k\in {F}_{i}}}\frac{1}{\text{\hspace{0.17em}}{a}_{ik}\mathrm{!}}}{\frac{n\mathrm{!}{p}_{i}^{r}{\left(1-{p}_{i}\right)}^{n-r}}{r\mathrm{!}\left(n-r\right)\mathrm{!}}}\hfill \\ \hfill & \mathrm{=}r\mathrm{!}{\left(\frac{1}{M{p}_{i}}\right)}^{r}{\displaystyle \prod _{k\in {F}_{i}}}\frac{1}{{a}_{ik}\mathrm{!}}\mathrm{,}\hfill \end{array}$$

with

$${\displaystyle \sum _{k\in {F}_{i}}}\text{\hspace{0.17em}}{a}_{ik}\mathrm{=}r\mathrm{.}$$

This shows that, conditionally on the sum of the ${A}_{ik}$ over one part of the population, the distribution remains multinomial: within that part, the design is still a simple design with replacement.
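The conditional result can be verified by brute force on a very small population. The sketch below is only an illustration under assumed values: a list of `M = 4` units, of which the two labelled `0` and `1` play the role of ${F}_{i}$ (so ${p}_{i}\mathrm{=}1/2$), with `n = 3` draws and `r = 2` of them falling in ${F}_{i}.$ It enumerates all ${M}^{n}$ equally likely ordered samples and compares the exact conditional probabilities with the closed form.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import factorial, prod

M, n, r = 4, 3, 2           # list size, total draws, draws in F_i (illustrative)
F = (0, 1)                  # hypothetical labels of the units in F_i
p = Fraction(len(F), M)     # p_i = 1/2

# Pr(A_ik = a_ik for k in F_i, and the counts in F_i sum to r)
joint = Counter()
for draws in product(range(M), repeat=n):      # all M**n ordered samples
    counts = tuple(draws.count(k) for k in F)
    if sum(counts) == r:
        joint[counts] += Fraction(1, M**n)

total = sum(joint.values())                    # Pr(sum over F_i of A_ik = r)
for counts, prob in sorted(joint.items()):
    conditional = prob / total
    closed_form = (factorial(r) * (1 / (M * p)) ** r
                   / prod(factorial(a) for a in counts))
    print(counts, conditional, closed_form)
    assert conditional == closed_form          # exact equality with fractions
```

Exact rational arithmetic is used so that the comparison does not depend on floating-point rounding.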

With the procedure in which we draw with replacement until we obtain $r$ occupations in enterprise $i,$ we have

$$\text{E}\left({A}_{ik}\text{\hspace{0.17em}}|\text{\hspace{0.17em}}{X}_{i}\right)\mathrm{=}\left\{\begin{array}{ll}\frac{r}{M{p}_{i}}\hfill & \text{if}\text{\hspace{0.17em}}\text{\hspace{0.17em}}k\in {F}_{i}\hfill \\ \frac{{X}_{i}}{M-M{p}_{i}}\hfill & \text{if}\text{\hspace{0.17em}}\text{\hspace{0.17em}}k\in {D}_{i}\mathrm{.}\hfill \end{array}\right.$$

Indeed, conditionally on ${X}_{i},$ $r$ occupations are selected in ${F}_{i},$ which has size $M{p}_{i}\mathrm{,}$ and ${X}_{i}$ occupations are selected in ${D}_{i},$ which has size $M\left(1-{p}_{i}\right).$
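The two conditional expectations can be illustrated by simulating the drawing procedure. This is a rough sketch under assumed values: the function `inverse_sample`, the unit labelling (units `0` to `n_F - 1` stand for ${F}_{i}$), the seed and the conditioning value `x0` are all hypothetical choices made for the illustration.

```python
import random
from collections import defaultdict

def inverse_sample(M: int, n_F: int, r: int, rng: random.Random) -> list[int]:
    """Draw units 0..M-1 with replacement until r draws fall in F_i = {0, ..., n_F-1}.

    Returns the selection counts A_ik for every unit k of the list.
    """
    counts = [0] * M
    hits = 0
    while hits < r:
        k = rng.randrange(M)       # equal-probability draw with replacement
        counts[k] += 1
        if k < n_F:                # unit k is present in the enterprise
            hits += 1
    return counts

rng = random.Random(2024)          # fixed seed so the sketch is reproducible
M, n_F, r = 20, 5, 3               # illustrative values, so p_i = 5/20
by_x = defaultdict(list)           # replicates grouped by the observed X_i
for _ in range(20000):
    a = inverse_sample(M, n_F, r, rng)
    by_x[sum(a[n_F:])].append(a)   # X_i = number of draws that fell in D_i

x0 = 6                             # condition on X_i = 6
group = by_x[x0]
mean_F = sum(a[0] for a in group) / len(group)       # one unit of F_i
mean_D = sum(a[M - 1] for a in group) / len(group)   # one unit of D_i
print(mean_F, r / n_F)             # empirical mean next to r / (M p_i)
print(mean_D, x0 / (M - n_F))      # empirical mean next to X_i / (M - M p_i)
```

The empirical means should be close to, but not exactly equal to, the theoretical values, since they are Monte Carlo averages.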

In the case with replacement, what is calculated is not really an inclusion probability but the expected value of ${A}_{ik},$ which is denoted by ${\pi}_{k\text{\hspace{0.17em}}|\text{\hspace{0.17em}}i}\mathrm{:}$

$${\pi}_{k\text{\hspace{0.17em}}|\text{\hspace{0.17em}}i}\mathrm{=}\text{E}\left\{\text{E}\left({A}_{ik}\text{\hspace{0.17em}}|\text{\hspace{0.17em}}{X}_{i}\right)\right\}\mathrm{=}\frac{r}{M{p}_{i}}\mathrm{,}\text{\hspace{0.17em}}\text{\hspace{0.17em}}k\in L\mathrm{.}$$

The same value is obtained for $k\in {D}_{i},$ because $\text{E}\left({X}_{i}\right)/\left(M-M{p}_{i}\right)\mathrm{=}r/\left(M{p}_{i}\right).$ The problem is that we know $M\mathrm{,}$ $r$ and ${X}_{i},$ but not ${p}_{i}.$ We can estimate ${p}_{i}$ using the method of moments by solving $\text{E}\left({X}_{i}\right)\mathrm{=}{X}_{i},$ which yields

$${X}_{i}\mathrm{=}\frac{r\left(1-{\widehat{p}}_{i}\right)}{{\widehat{p}}_{i}}$$

and therefore

$${\widehat{p}}_{i1}\mathrm{=}\frac{r}{{X}_{i}+r}\mathrm{.}$$

The maximum likelihood method yields the same estimator as the method of moments, but this estimator is biased (Mikulski and Smith 1976; Johnson, Kemp and Kotz 2005, page 222). If $r\ge \mathrm{2,}$ the minimum variance unbiased estimator of ${p}_{i}$ is

$${\widehat{p}}_{i2}\mathrm{=}\frac{r-1}{{X}_{i}+r-1}\mathrm{.}$$

However, $1/{\widehat{p}}_{i1}$ is unbiased for $1/{p}_{i}.$
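These three statements about bias can be checked numerically by summing over the negative binomial probability function. The sketch below is an illustration only: the helper `expectation`, the truncation point `x_max` and the values of `r` and `p` are assumptions, not part of the original text.

```python
from math import comb

def expectation(g, r: int, p: float, x_max: int = 3000) -> float:
    """E[g(X)] for X ~ NB(r, p), truncating the infinite sum at x_max."""
    return sum(g(x) * comb(r + x - 1, x) * p**r * (1 - p)**x
               for x in range(x_max + 1))

r, p = 4, 0.3                                               # illustrative values
e_p1 = expectation(lambda x: r / (x + r), r, p)             # E of p-hat_1
e_p2 = expectation(lambda x: (r - 1) / (x + r - 1), r, p)   # E of p-hat_2
e_inv_p1 = expectation(lambda x: (x + r) / r, r, p)         # E of 1 / p-hat_1

print(e_p1, p)           # p-hat_1 overestimates p on average
print(e_p2, p)           # p-hat_2 is unbiased for p
print(e_inv_p1, 1 / p)   # 1 / p-hat_1 is unbiased for 1 / p
```

The upward bias of ${\widehat{p}}_{i1}$ is what Jensen's inequality predicts, since $r/\left(x+r\right)$ is a convex function of $x.$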

Since the weights are the inverses of ${\pi}_{k\text{\hspace{0.17em}}|\text{\hspace{0.17em}}i},$ these inverses are estimated as follows:

$$\widehat{1/{\pi}_{k\text{\hspace{0.17em}}|\text{\hspace{0.17em}}i}}\mathrm{=}\left\{\begin{array}{lll}\frac{M{\widehat{p}}_{i2}}{r}\hfill & \mathrm{=}\frac{M\left(r-1\right)}{r\left({X}_{i}+r-1\right)}\hfill & \text{if}\text{\hspace{0.17em}}\text{\hspace{0.17em}}k\in {F}_{i}\hfill \\ \frac{M\left(1-{\widehat{p}}_{i2}\right)}{{X}_{i}}\hfill & \mathrm{=}\frac{M}{{X}_{i}+r-1}\hfill & \text{if}\text{\hspace{0.17em}}\text{\hspace{0.17em}}k\in {D}_{i}\mathrm{.}\hfill \end{array}\right.$$
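As a small worked example, the two estimated weights can be computed directly from $M\mathrm{,}$ $r$ and the observed ${X}_{i}.$ The function name `estimated_weights` and the numerical values are hypothetical; the sketch uses the simplified right-hand expressions, so it also returns a value for ${D}_{i}$ when ${X}_{i}\mathrm{=}0,$ a case in which no unit of ${D}_{i}$ is in the sample anyway.

```python
from fractions import Fraction

def estimated_weights(M: int, r: int, x: int) -> tuple[Fraction, Fraction]:
    """Estimated inverse of pi_{k|i} for units in F_i and in D_i.

    M: list size, r: required number of occupations found in the enterprise,
    x: observed X_i (number of draws that fell in D_i).
    """
    if r < 2:
        raise ValueError("p-hat_2 requires r >= 2")
    if x < 0:
        raise ValueError("x must be non-negative")
    p_hat2 = Fraction(r - 1, x + r - 1)   # unbiased estimator of p_i
    w_F = M * p_hat2 / r                  # M (r - 1) / (r (X_i + r - 1))
    w_D = Fraction(M, x + r - 1)          # M / (X_i + r - 1)
    return w_F, w_D

w_F, w_D = estimated_weights(M=100, r=5, x=15)   # illustrative values
print(w_F, w_D)                                  # 80/19 100/19
```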

However, the case with replacement is not very satisfactory: selecting $r$ occupations with replacement does not necessarily yield $r$ distinct occupations, because the same occupation may be selected more than once. Furthermore, sampling may take especially long if $M{p}_{i}$ is small. Therefore, sampling without replacement is preferred.
