# Unequal probability inverse sampling Section 3. Simple random sampling with replacement

Assume that a proportion ${p}_{i}$ of the occupations in the list are present in enterprise $i.$ If occupations are drawn with replacement for enterprise $i$ until $r$ occupations present in the enterprise have been identified, then ${X}_{i},$ the number of selected occupations that are not in the enterprise, has a negative binomial distribution, denoted ${X}_{i}\sim NB\left(r,{p}_{i}\right).$ In that case,

$\mathrm{Pr}\left({X}_{i}={x}_{i}\right)=\left(\begin{array}{c}r+{x}_{i}-1\\ {x}_{i}\end{array}\right){p}_{i}^{r}{\left(1-{p}_{i}\right)}^{{x}_{i}},$

with ${x}_{i}\in \mathbb{N}=\left\{0,1,2,3,\dots \right\},{p}_{i}\in \left[0,1\right],r\in {\mathbb{N}}^{*}=\left\{1,2,3,\dots \right\}.$ Furthermore,

$\text{E}\left({X}_{i}\right)=\frac{r\left(1-{p}_{i}\right)}{{p}_{i}}\text{\hspace{0.17em}}\text{and}\text{\hspace{0.17em}}\text{var}\left({X}_{i}\right)=\frac{r\left(1-{p}_{i}\right)}{{p}_{i}^{2}}.$
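As a quick numerical sanity check, the mean and variance above can be recovered by summing the probability function directly. The sketch below is illustrative only: the function names, the truncation point `x_max` and the values $r=3$ and ${p}_{i}=0.25$ are assumptions chosen for the example, not part of the method.

```python
from math import comb

def nb_pmf(x, r, p):
    """Pr(X = x) for X ~ NB(r, p): x failures before the r-th success."""
    return comb(r + x - 1, x) * p**r * (1 - p)**x

def nb_moments(r, p, x_max=500):
    """Approximate total mass, mean and variance by truncating the sum at x_max."""
    probs = [nb_pmf(x, r, p) for x in range(x_max + 1)]
    total = sum(probs)
    mean = sum(x * q for x, q in enumerate(probs))
    var = sum((x - mean) ** 2 * q for x, q in enumerate(probs))
    return total, mean, var

# Hypothetical values: r = 3 required occupations, p_i = 0.25.
r, p = 3, 0.25
total, mean, var = nb_moments(r, p)

# Closed forms from the text: r(1 - p)/p and r(1 - p)/p^2.
expected_mean = r * (1 - p) / p
expected_var = r * (1 - p) / p**2
```

With these values the truncated sums should agree with the closed forms to within floating-point error, since the neglected tail is negligible.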

Let ${A}_{ik},k\in L,$ be the number of times that unit $k$ is selected in the sample taken from enterprise $i.$ In a simple design with replacement of size $n,$ the values of ${A}_{ik}$ have a multinomial distribution. Therefore,

$\mathrm{Pr}\left({A}_{ik}={a}_{ik},k\in L\right)=\frac{n!}{{M}^{n}}\prod _{k\in L}\frac{1}{{a}_{ik}!},$

where ${a}_{ik}=0,\dots ,n,$ and

$\sum _{k\in L}\text{\hspace{0.17em}}{a}_{ik}=n.$

If this multinomial vector is conditioned on a fixed size in a given part of the population, then

$\begin{array}{ll}\mathrm{Pr}\left({A}_{ik}={a}_{ik},k\in {F}_{i}|\text{\hspace{0.17em}}\sum _{k\in {F}_{i}}{A}_{ik}=r\right)\hfill & =\frac{\mathrm{Pr}\left({A}_{ik}={a}_{ik},k\in {F}_{i}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{and}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\sum _{k\in {F}_{i}}{A}_{ik}=r\right)}{\mathrm{Pr}\left(\sum _{k\in {F}_{i}}{A}_{ik}=r\right)}\hfill \\ \hfill & =\frac{\frac{n!{\left(1-{p}_{i}\right)}^{n-r}}{\left(n-r\right)!{M}^{r}}\prod _{k\in {F}_{i}}\frac{1}{{a}_{ik}!}}{\frac{n!{p}_{i}^{r}{\left(1-{p}_{i}\right)}^{n-r}}{r!\left(n-r\right)!}}\hfill \\ \hfill & =r!{\left(\frac{1}{M{p}_{i}}\right)}^{r}\prod _{k\in {F}_{i}}\frac{1}{{a}_{ik}!},\hfill \end{array}$

with

$\sum _{k\in {F}_{i}}\text{\hspace{0.17em}}{a}_{ik}=r.$

This shows that, if we condition on the sum of the ${A}_{ik}$ over one part of the population, the distribution remains multinomial, and conditionally the design within that part is still a simple design with replacement.
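The conditional result can be verified exactly on a small case by enumerating every ordered sample. The following sketch assumes a hypothetical list of $M=4$ units of which two belong to ${F}_{i}$ (so ${p}_{i}=1/2$), a sample of size $n=3$ and $r=2;$ all names and values are chosen for illustration only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import factorial, prod

M, n, r = 4, 3, 2      # hypothetical list size, sample size, conditioning count
F = (0, 1)             # hypothetical units forming F_i, so p_i = 2/4
p = Fraction(len(F), M)

# Enumerate the M**n equally likely ordered samples drawn with replacement,
# keeping those with exactly r draws in F_i.
joint = Counter()
for draw in product(range(M), repeat=n):
    a_F = tuple(draw.count(k) for k in F)   # counts a_ik restricted to F_i
    if sum(a_F) == r:
        joint[a_F] += 1

total = sum(joint.values())
conditional = {a: Fraction(c, total) for a, c in joint.items()}

def closed_form(a_F):
    """r! (1 / (M p_i))^r * prod over F_i of 1 / a_ik!, as in the text."""
    return factorial(r) * (1 / (M * p)) ** r * prod(
        Fraction(1, factorial(a)) for a in a_F
    )

matches = all(conditional[a] == closed_form(a) for a in conditional)
```

Because exact fractions are used, `matches` compares the enumerated conditional probabilities with the closed form without rounding error.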

With the procedure in which we draw with replacement until we obtain $r$ occupations in enterprise $i,$ we have

$\text{E}\left({A}_{ik}\text{\hspace{0.17em}}|\text{\hspace{0.17em}}{X}_{i}\right)=\left\{\begin{array}{ll}\frac{r}{M{p}_{i}}\hfill & \text{if}\text{\hspace{0.17em}}\text{\hspace{0.17em}}k\in {F}_{i}\hfill \\ \frac{{X}_{i}}{M-M{p}_{i}}\hfill & \text{if}\text{\hspace{0.17em}}\text{\hspace{0.17em}}k\in {D}_{i}.\hfill \end{array}\right.$

Indeed, conditionally on ${X}_{i},$ $r$ occupations are selected in ${F}_{i},$ which has size $M{p}_{i},$ and ${X}_{i}$ occupations are selected in ${D}_{i},$ which has size $M\left(1-{p}_{i}\right).$
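The conditional expectations can also be checked by simulating the drawing procedure. The sketch below is a Monte Carlo illustration under assumed values ($M=10,$ four units in ${F}_{i},$ $r=3$); the function name `inverse_sample`, the seed and the number of replications are hypothetical choices.

```python
import random

def inverse_sample(M, F, r, rng):
    """Draw units from range(M) with replacement until r draws fall in F.

    Returns the per-unit selection counts and the number X of draws outside F.
    """
    counts = [0] * M
    hits = 0
    while hits < r:
        k = rng.randrange(M)
        counts[k] += 1
        if k in F:
            hits += 1
    return counts, sum(counts) - r

rng = random.Random(12345)   # fixed seed so the run is reproducible
M, r = 10, 3
F = set(range(4))            # hypothetical F_i, so p_i = 4/10
reps = 20000

mean_F = 0.0    # average of A_ik for one unit k in F_i
resid_D = 0.0   # average of A_ik - X_i / (M - M p_i) for one unit k in D_i
for _ in range(reps):
    counts, x = inverse_sample(M, F, r, rng)
    mean_F += counts[0] / reps
    resid_D += (counts[9] - x / (M - len(F))) / reps

target_F = r / len(F)        # r / (M p_i)
```

Up to simulation noise, `mean_F` should be close to $r/\left(M{p}_{i}\right)$ and `resid_D` close to zero, which is what the two cases of the conditional expectation imply.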

In the case with replacement, what is calculated is not really an inclusion probability, but rather the expected value of ${A}_{ik}$ which is denoted as ${\pi }_{k\text{\hspace{0.17em}}|\text{\hspace{0.17em}}i},$

${\pi }_{k\text{\hspace{0.17em}}|\text{\hspace{0.17em}}i}=\text{E}\left[\text{E}\left({A}_{ik}\text{\hspace{0.17em}}|\text{\hspace{0.17em}}{X}_{i}\right)\right]=\frac{r}{M{p}_{i}},$

$k\in L.$ The problem is that we know $M,r$ and ${X}_{i},$ but not ${p}_{i}.$ We can estimate ${p}_{i}$ using the method of moments by solving $\text{E}\left({X}_{i}\right)={X}_{i},$ which yields

${X}_{i}=\frac{r\left(1-{\stackrel{^}{p}}_{i}\right)}{{\stackrel{^}{p}}_{i}}$

and therefore

${\stackrel{^}{p}}_{i1}=\frac{r}{{X}_{i}+r}.$

The maximum likelihood method gives the same estimator as the method of moments, but this estimator is biased (Mikulski and Smith 1976; Johnson, Kemp and Kotz 2005, page 222). If $r\ge 2,$ the minimum variance unbiased estimator of ${p}_{i}$ is

${\stackrel{^}{p}}_{i2}=\frac{r-1}{{X}_{i}+r-1}.$

However, $1/{\stackrel{^}{p}}_{i1}$ is unbiased for $1/{p}_{i}.$
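These bias properties can be illustrated numerically by taking expectations over the negative binomial probability function. The sketch below assumes $r=3$ and ${p}_{i}=0.25$ and truncates the infinite sum at a hypothetical `x_max`; the helper names are illustrative.

```python
from math import comb

def nb_pmf(x, r, p):
    """Pr(X = x) for X ~ NB(r, p): x failures before the r-th success."""
    return comb(r + x - 1, x) * p**r * (1 - p)**x

def expectation(g, r, p, x_max=500):
    """Approximate E[g(X)] by truncating the sum over x at x_max."""
    return sum(g(x) * nb_pmf(x, r, p) for x in range(x_max + 1))

r, p = 3, 0.25   # hypothetical values

# Method of moments / maximum likelihood estimator: r / (X + r).
e_p1 = expectation(lambda x: r / (x + r), r, p)
# Unbiased estimator for r >= 2: (r - 1) / (X + r - 1).
e_p2 = expectation(lambda x: (r - 1) / (x + r - 1), r, p)
# Inverse of the first estimator: (X + r) / r.
e_inv_p1 = expectation(lambda x: (x + r) / r, r, p)
```

Here `e_p2` should reproduce ${p}_{i}$ and `e_inv_p1` should reproduce $1/{p}_{i}$ to within floating-point error, while `e_p1` exceeds ${p}_{i},$ which shows the upward bias of the first estimator in this example.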

Since we are using weights that are inverses of ${\pi }_{k\text{\hspace{0.17em}}|\text{\hspace{0.17em}}i},$ the inverses of ${\pi }_{k\text{\hspace{0.17em}}|\text{\hspace{0.17em}}i}$ are thus estimated as follows:

$\stackrel{^}{1/{\pi }_{k\text{\hspace{0.17em}}|\text{\hspace{0.17em}}i}}=\left\{\begin{array}{lll}\frac{M{\stackrel{^}{p}}_{i2}}{r}\hfill & =\frac{M\left(r-1\right)}{r\left({X}_{i}+r-1\right)}\hfill & \text{if}\text{\hspace{0.17em}}\text{\hspace{0.17em}}k\in {F}_{i}\hfill \\ \frac{M\left(1-{\stackrel{^}{p}}_{i2}\right)}{{X}_{i}}\hfill & =\frac{M}{{X}_{i}+r-1}\hfill & \text{if}\text{\hspace{0.17em}}\text{\hspace{0.17em}}k\in {D}_{i}.\hfill \end{array}\right.$
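As an illustration, the two estimated weights can be computed from $M,$ $r$ and the observed ${X}_{i}.$ The function below is a minimal sketch; its name and the example values ($M=100,$ $r=5,$ ${X}_{i}=15$) are assumptions made for the example.

```python
def estimated_inverse_weights(M, r, x):
    """Return the estimated weights (for k in F_i, for k in D_i).

    M: size of the list, r: required number of occupations in the enterprise,
    x: observed number X_i of selected occupations outside the enterprise.
    Uses the unbiased estimator p_hat2 = (r - 1) / (x + r - 1), valid for r >= 2.
    """
    if r < 2:
        raise ValueError("the unbiased estimator requires r >= 2")
    p_hat2 = (r - 1) / (x + r - 1)
    w_F = M * p_hat2 / r        # M (r - 1) / (r (x + r - 1))
    w_D = M / (x + r - 1)       # equals M (1 - p_hat2) / x when x > 0
    return w_F, w_D

# Hypothetical example: list of 100 occupations, r = 5, X_i = 15.
w_F, w_D = estimated_inverse_weights(M=100, r=5, x=15)
```

Writing the weight for ${D}_{i}$ in its simplified form avoids a division by zero when ${X}_{i}=0,$ a case in which no unit of ${D}_{i}$ is selected and the weight is not needed.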

However, the case with replacement is not very satisfactory, because selecting $r$ occupations with replacement does not necessarily yield $r$ distinct occupations: the same occupation may be selected more than once. Furthermore, sampling may take especially long if $M{p}_{i}$ is small. Therefore, sampling without replacement is preferred.
