# Unequal probability inverse sampling: Section 4. Simple random sampling without replacement

For the case without replacement, the notation is the same as for the draw with replacement. The general framework is as follows. We consider a population of size $M$ in which there are $M{p}_{i}$ favourable units, namely the occupations in the list that exist in the enterprise. If units are drawn with equal probability without replacement until $r$ favourable units appear, the number of failures ${X}_{i}$ has a negative hypergeometric distribution, ${X}_{i}\sim NH\left(M, r, M{p}_{i}\right)$: it counts the number of failures before $r$ favourable events occur. This distribution is the counterpart of the negative binomial for draws without replacement. It is little known, to the point that Miller and Fridell (2007) presented it as a “forgotten” distribution.

The probability distribution is

$$\mathrm{Pr}\left({X}_{i}\mathrm{=}x\right)\mathrm{=}p\left(x\mathrm{;}M\mathrm{,}r\mathrm{,}M{p}_{i}\right)\mathrm{=}\frac{\left(\begin{array}{c}x+r-1\\ x\end{array}\right)\left(\begin{array}{c}M-x-r\\ M{p}_{i}-r\end{array}\right)}{\left(\begin{array}{c}M\\ M{p}_{i}\end{array}\right)}\mathrm{,}$$

where $x\in \left\{\mathrm{0,}\dots \mathrm{,}M\left(1-{p}_{i}\right)\right\},$ $M\in \left\{\mathrm{1,2,}\dots \right\},$ $M{p}_{i}\in \left\{\mathrm{1,2,}\dots \mathrm{,}M\right\},$ and $r\in \left\{\mathrm{1,2,}\dots \mathrm{,}M{p}_{i}\right\}.$

The mean and variance are

$$\text{E}\left({X}_{i}\right)=\frac{Mr\left(1-{p}_{i}\right)}{M{p}_{i}+1}, \qquad \text{var}\left({X}_{i}\right)=\frac{rM\left(1-{p}_{i}\right)\left(M+1\right)\left(M{p}_{i}-r+1\right)}{{\left(M{p}_{i}+1\right)}^{2}\left(M{p}_{i}+2\right)}.$$
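As a numerical check of the distribution and its moments, the following minimal sketch evaluates the probability function with exact rational arithmetic and compares the resulting mean and variance with the closed forms. The function names and the example values $M=20,$ $r=3$ and $M{p}_{i}=8$ (written `K` in the code) are assumptions chosen for illustration only.

```python
from fractions import Fraction
from math import comb

def neg_hypergeom_pmf(x, M, r, K):
    """Pr(X = x): x failures before the r-th favourable unit,
    with K = M * p_i favourable units in a population of size M."""
    if x < 0 or x > M - K:
        return Fraction(0)
    return Fraction(comb(x + r - 1, x) * comb(M - x - r, K - r), comb(M, K))

def moments(M, r, K):
    """Mean and variance obtained by summing over the support 0..M-K."""
    support = range(M - K + 1)
    mean = sum(x * neg_hypergeom_pmf(x, M, r, K) for x in support)
    second = sum(x * x * neg_hypergeom_pmf(x, M, r, K) for x in support)
    return mean, second - mean ** 2

# Illustrative values (hypothetical, not taken from the text).
M, r, K = 20, 3, 8
total = sum(neg_hypergeom_pmf(x, M, r, K) for x in range(M - K + 1))
mean, var = moments(M, r, K)

# Closed forms from the text, written with K in place of M * p_i.
closed_mean = Fraction(r * (M - K), K + 1)
closed_var = Fraction(r * (M - K) * (M + 1) * (K - r + 1),
                      (K + 1) ** 2 * (K + 2))

print(total, mean, closed_mean, var, closed_var)
```

Exact fractions are used here so that the comparison is not affected by floating-point rounding.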

Again, ${A}_{ik}$ denotes the number of times that unit $k$ is selected in the sample. Now, the value of ${A}_{ik}$ can be only 0 or 1. If $n$ units are selected using a simple design without replacement in $L,$ the sample design is defined as

$$\mathrm{Pr}\left({A}_{ik}\mathrm{=}{a}_{ik}\mathrm{,}k\in L\right)\mathrm{=}{\left(\begin{array}{c}M\\ n\end{array}\right)}^{-1}\mathrm{,}$$

where ${a}_{ik}\in \left\{\mathrm{0,1}\right\}\mathrm{,}$ and

$$\sum _{k\in L}{a}_{ik}=n.$$

If the vector of ${A}_{ik}$ is conditioned on a fixed size in one part of the population, we have

$$\begin{aligned}\mathrm{Pr}\left({A}_{ik}={a}_{ik}, k\in {F}_{i}\,\middle|\,\sum _{k\in {F}_{i}}{A}_{ik}=r\right) &= \frac{\mathrm{Pr}\left({A}_{ik}={a}_{ik}, k\in {F}_{i}\ \text{and}\ \sum _{k\in {F}_{i}}{A}_{ik}=r\right)}{\mathrm{Pr}\left(\sum _{k\in {F}_{i}}{A}_{ik}=r\right)} \\ &= {\left[\frac{\binom{M{p}_{i}}{r}\binom{M-M{p}_{i}}{n-r}}{\binom{M}{n}}\right]}^{-1}\sum _{\substack{{a}_{ik}\in \left\{0,1\right\},\ k\in {D}_{i} \\ \sum _{k\in {D}_{i}}{a}_{ik}=n-r}}\frac{1}{\binom{M}{n}} \\ &= {\left[\frac{\binom{M{p}_{i}}{r}\binom{M-M{p}_{i}}{n-r}}{\binom{M}{n}}\right]}^{-1}\frac{\binom{M-M{p}_{i}}{n-r}}{\binom{M}{n}} \\ &= {\binom{M{p}_{i}}{r}}^{-1},\end{aligned}$$

with

$$\sum _{k\in {F}_{i}}{a}_{ik}=r.$$

This shows that, if the sum of ${A}_{ik}$ is conditioned on one part of the population, we still have a simple design without replacement. In the procedure in which we draw without replacement until we obtain $r$ occupations in enterprise $i,$ we therefore have

$$\text{E}\left({A}_{ik}\,|\,{X}_{i}\right)=\left\{\begin{array}{ll}\frac{r}{M{p}_{i}} & \text{if } k\in {F}_{i}\\ \frac{{X}_{i}}{M-M{p}_{i}} & \text{if } k\in {D}_{i}.\end{array}\right.$$

The inclusion probability is therefore

$${\pi}_{k\,|\,i}=\text{E}\left[\text{E}\left({A}_{ik}\,|\,{X}_{i}\right)\right]=\left\{\begin{array}{ll}\frac{r}{M{p}_{i}} & \text{if } k\in {F}_{i}\\ \frac{\text{E}\left({X}_{i}\right)}{M-M{p}_{i}}=\frac{r}{M{p}_{i}+1} & \text{if } k\in {D}_{i},\end{array}\right.$$
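The two inclusion probabilities can be checked by brute force on a very small population. The sketch below enumerates every equally likely draw order, stops each one at the $r$th favourable unit, and records how often each unit is selected. The labelling of the units (the first `K` labels are taken to be the favourable units of ${F}_{i},$ the rest those of ${D}_{i}$) and the values $M=6,$ $M{p}_{i}=3$ and $r=2$ are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations

def inclusion_probabilities(M, K, r):
    """Exact inclusion probabilities under inverse sampling without replacement.

    Hypothetical labelling: units 0..K-1 are favourable (F_i),
    units K..M-1 are not (D_i). Every draw order is equally likely.
    """
    counts = [0] * M
    total = 0
    for order in permutations(range(M)):
        total += 1
        found = 0
        for unit in order:
            counts[unit] += 1          # the unit is drawn, hence selected
            if unit < K:               # favourable unit
                found += 1
                if found == r:         # stop at the r-th favourable unit
                    break
    return [Fraction(c, total) for c in counts]

pi = inclusion_probabilities(M=6, K=3, r=2)
print(pi)
```

In this example the enumerated values can be compared with $r/\left(M{p}_{i}\right)$ for the favourable units and $r/\left(M{p}_{i}+1\right)$ for the others. Full enumeration is only practical for small $M,$ since the number of orders grows as $M!.$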

for all $k\in L.$ Again, the problem is that we know $M,$ $r$ and ${X}_{i},$ but not ${p}_{i}.$ We can estimate ${p}_{i}$ by maximum likelihood, using a numerical method to maximize the likelihood.

Using the method of moments, an estimate can be obtained by solving for ${p}_{i}$ in the equation ${X}_{i}\mathrm{=}\text{E}\left({X}_{i}\right)$ , that is,

$${X}_{i}\mathrm{=}\frac{Mr\left(1-{\widehat{p}}_{i}\right)}{M{\widehat{p}}_{i}+1}\mathrm{.}$$

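Multiplying both sides by $M{\widehat{p}}_{i}+1$ and collecting the terms in ${\widehat{p}}_{i}$ gives $M{\widehat{p}}_{i}\left(r+{X}_{i}\right)=Mr-{X}_{i}.$ Hence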

$${\widehat{p}}_{i1}\mathrm{=}\frac{Mr-{X}_{i}}{M\left(r+{X}_{i}\right)}\mathrm{.}$$

However, it can be verified in a few lines that, if $r\ge 2,$

$${\widehat{p}}_{i2}\mathrm{=}\frac{r-1}{r+{X}_{i}-1}$$

is unbiased for ${p}_{i}\mathrm{.}$
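The difference between the two estimators can be illustrated numerically. The following sketch computes the exact expectation of each estimator under the negative hypergeometric distribution for one parameter setting. The function names and the values $M=20,$ $r=3$ and $M{p}_{i}=8$ (written `K`) are hypothetical and chosen only for illustration. It is a check on one example, not a proof.

```python
from fractions import Fraction
from math import comb

def neg_hypergeom_pmf(x, M, r, K):
    """Pr(X = x) with K = M * p_i favourable units in a population of size M."""
    return Fraction(comb(x + r - 1, x) * comb(M - x - r, K - r), comb(M, K))

def p_hat_moments(x, M, r):
    """Method-of-moments estimate (M r - x) / (M (r + x))."""
    return Fraction(M * r - x, M * (r + x))

def p_hat_unbiased(x, r):
    """Estimate (r - 1) / (r + x - 1), defined for r >= 2."""
    return Fraction(r - 1, r + x - 1)

def expectation(estimator, M, r, K):
    """Exact expectation of estimator(X) over the support 0..M-K."""
    return sum(estimator(x) * neg_hypergeom_pmf(x, M, r, K)
               for x in range(M - K + 1))

# Illustrative values (hypothetical, not taken from the text).
M, r, K = 20, 3, 8
p = Fraction(K, M)
mean_p1 = expectation(lambda x: p_hat_moments(x, M, r), M, r, K)
mean_p2 = expectation(lambda x: p_hat_unbiased(x, r), M, r, K)

print("p =", p)
print("E(p_hat_1) - p =", mean_p1 - p)
print("E(p_hat_2) - p =", mean_p2 - p)
```

For this setting the second difference is exactly zero, while the method-of-moments estimator shows a positive bias.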

Again, the weights are the inverses of the inclusion probabilities ${\pi}_{k\,|\,i},$ which are estimated as follows:

$$\widehat{1/{\pi}_{k\,|\,i}}=\left\{\begin{array}{lll}\frac{M{\widehat{p}}_{i2}}{r} & =\frac{M\left(r-1\right)}{r\left({X}_{i}+r-1\right)} & \text{if } k\in {F}_{i}\\ \frac{M\left(1-{\widehat{p}}_{i2}\right)}{{X}_{i}} & =\frac{M}{{X}_{i}+r-1} & \text{if } k\in {D}_{i}.\end{array}\right.$$

These weights are also used in the estimator by Murthy (1957), which is unbiased (see also Salehi and Seber 2001). If $M{p}_{i}\mathrm{<}r\mathrm{,}$ all occupations will be selected in enterprise $i$ and the estimated inclusion probabilities are then equal to 1.
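A minimal sketch of the weight computation is given below, assuming the closed forms for the estimated inverse inclusion probabilities. The function name `estimated_weights` and the example values are hypothetical. The sketch does not handle the case $M{p}_{i}<r,$ in which all weights equal 1.

```python
from fractions import Fraction

def estimated_weights(M, r, x):
    """Estimated inverse inclusion probabilities for enterprise i.

    M: size of the list, r: required number of favourable units (r >= 2),
    x: observed number of failures X_i.
    Returns (weight for k in F_i, weight for k in D_i).
    """
    if r < 2:
        raise ValueError("the estimator of p_i requires r >= 2")
    if x < 0:
        raise ValueError("x must be non-negative")
    w_favourable = Fraction(M * (r - 1), r * (x + r - 1))
    w_other = Fraction(M, x + r - 1)
    return w_favourable, w_other

# Illustrative values (hypothetical, not taken from the text).
M, r, x = 20, 3, 4
w_f, w_d = estimated_weights(M, r, x)

# Same weights obtained from p_hat_i2 = (r - 1) / (r + x - 1).
p_hat = Fraction(r - 1, r + x - 1)
alt_f = M * p_hat / r
alt_d = M * (1 - p_hat) / x

print(w_f, w_d, r * w_f + x * w_d)
```

With these formulas, the weights of the $r$ favourable and ${X}_{i}$ other selected units add up to $M,$ which follows directly from the closed forms and is what the last printed value shows.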
