# A short note on quantile and expectile estimation in unequal probability samples

## 2. Quantile estimation

We consider a finite population with $N$ elements and a continuous survey variable $Y.$ We are interested in quantiles of the cumulative distribution function $F\left(y\right)\mathrm{=}{\displaystyle {\sum}_{i\mathrm{=1}}^{N}}1\left\{{Y}_{i}\le y\right\}/N$ and denote by

$$Q\left(\alpha \right)\mathrm{=}\text{inf}\left\{\mathrm{arg}\underset{q}{\mathrm{min}}{\displaystyle \sum _{i\mathrm{=1}}^{N}}\text{\hspace{0.17em}}{w}_{\alpha}\left({Y}_{i}-q\right)\left|\text{\hspace{0.17em}}{Y}_{i}-q\text{\hspace{0.17em}}\right|\right\}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}(2.1)$$

the quantile function of $Y$ (see Koenker 2005), where

$${w}_{\alpha}\left(\epsilon \right)\mathrm{=}\left\{\begin{array}{ll}\alpha \hfill & \text{for}\text{\hspace{0.17em}}\epsilon \mathrm{>0}\hfill \\ 1-\alpha \hfill & \text{for}\text{\hspace{0.17em}}\epsilon \le 0.\hfill \end{array}\right.$$
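As an illustration of definition (2.1), the following Python sketch evaluates the loss over a toy population and returns its smallest minimizer. The function names `check_weight`, `population_loss` and `population_quantile`, as well as the toy data, are assumptions made here for illustration and are not part of the original note. Because the loss is piecewise linear and convex in $q,$ a minimizer is always attained at one of the population values, so the sketch searches only over those.

```python
import numpy as np


def check_weight(eps, alpha):
    """Asymmetric weight w_alpha: alpha for eps > 0, 1 - alpha otherwise."""
    return np.where(eps > 0, alpha, 1.0 - alpha)


def population_loss(q, Y, alpha):
    """Sum over the population of w_alpha(Y_i - q) * |Y_i - q|."""
    eps = Y - q
    return float(np.sum(check_weight(eps, alpha) * np.abs(eps)))


def population_quantile(Y, alpha):
    """Smallest minimizer of the loss, searched over the population values."""
    candidates = np.sort(np.asarray(Y, dtype=float))
    losses = np.array([population_loss(q, candidates, alpha) for q in candidates])
    # Several candidates may attain the minimum; keep the smallest (the "inf").
    is_min = np.isclose(losses, losses.min(), rtol=1e-9, atol=1e-12)
    return float(candidates[is_min][0])


# Hypothetical toy population of N = 4 values.
Y = np.array([1.0, 2.0, 3.0, 4.0])
median = population_quantile(Y, 0.5)  # the loss is flat on [2, 3]; "inf" picks 2
```

With an even number of distinct values the median loss is constant between the two middle values, which is the non-uniqueness that the "inf" in (2.1) resolves.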

The “inf” argument in (2.1) is required in finite populations since the “arg min” need not be unique. We draw a sample from the population with known inclusion probabilities ${\pi}_{i},$ $i\mathrm{=1,}\dots \mathrm{,}N.$ Denoting by ${y}_{1}\mathrm{,}\dots \mathrm{,}{y}_{n}$ the resulting sample, we estimate the quantile function by replacing (2.1) with its weighted sample version

$${\widehat{Q}}_{N}\left(\alpha \right)\mathrm{=}\mathrm{inf}\left\{\mathrm{arg}\underset{q}{\mathrm{min}}{\displaystyle \sum _{j\mathrm{=1}}^{n}}\text{\hspace{0.17em}}\frac{1}{{\pi}_{j}}{w}_{\alpha \mathrm{,}j}\left|\text{\hspace{0.17em}}{y}_{j}-q\text{\hspace{0.17em}}\right|\right\}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}(2.2)$$

with ${w}_{\alpha \mathrm{,}j}\mathrm{=}{w}_{\alpha}\left({y}_{j}-q\right)$ as defined above. It is easy to see that the sum in (2.2) is a design-unbiased estimate for the sum in $Q\left(\alpha \right)$ given in (2.1). Nonetheless, because we take the “arg min”, which is a nonlinear function of this sum, ${\widehat{Q}}_{N}\left(\alpha \right)$ is in general not unbiased for $Q\left(\alpha \right).$ We therefore look at consistency statements for ${\widehat{Q}}_{N}\left(\alpha \right)$ as follows. Let ${R}_{i}\left(q\right)\mathrm{=}{w}_{\alpha}\left({Y}_{i}-q\right)\left|\text{\hspace{0.17em}}{Y}_{i}-q\text{\hspace{0.17em}}\right|$ and

$${\overline{R}}_{N}\left(q\right)\mathrm{:=}\frac{1}{N}{\displaystyle \sum _{i\mathrm{=1}}^{N}}\text{\hspace{0.17em}}{R}_{i}\left(q\right)\mathrm{.}$$
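The design-unbiasedness of the weighted sum in (2.2) can be made explicit in one line. The sample membership indicator $I_i$ (equal to one if element $i$ is drawn and zero otherwise, so that $E\left(I_i\right) = \pi_i$) is notation introduced here for illustration only, and the step assumes $\pi_i > 0$ for all $i.$ For any fixed $q,$

$$E\left(\sum_{j=1}^{n} \frac{1}{\pi_j}\, w_{\alpha}\left(y_j - q\right)\left|\, y_j - q\,\right|\right) = \sum_{i=1}^{N} \frac{E\left(I_i\right)}{\pi_i}\, R_i\left(q\right) = \sum_{i=1}^{N} R_i\left(q\right) = N\,\overline{R}_N\left(q\right).$$

The identity holds pointwise in $q$ and does not by itself carry over to the minimizer of the sum.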

We draw a sample from ${R}_{i}\left(q\right)\mathrm{,}i\mathrm{=1,}\dots \mathrm{,}N$ and assume we apply a consistent sampling scheme in that

$${\overline{r}}_{n}\left(q\right)\mathrm{:=}\frac{1}{N}{\displaystyle \sum _{j\mathrm{=1}}^{n}}\text{\hspace{0.17em}}\frac{1}{{\pi}_{j}}{r}_{j}\left(q\right)$$

is design-consistent for ${\overline{R}}_{N}\left(q\right),$ where ${r}_{j}\left(q\right)$ denotes the sampled values of ${R}_{i}\left(q\right).$ Note that ${r}_{j}\left(q\right)$ and hence ${\overline{r}}_{n}\left(q\right),$ ${R}_{i}\left(q\right)$ and ${\overline{R}}_{N}\left(q\right)$ also depend on $\alpha ,$ which is suppressed in the notation for readability. Let ${q}_{0}$ be the minimizer of ${\overline{R}}_{N}\left(q\right),$ which is not necessarily unique due to the finite structure of the population. We can take the “inf” argument, i.e., ${q}_{0}\mathrm{=}\mathrm{inf}\left\{\text{arg}\text{\hspace{0.17em}}\mathrm{min}{\overline{R}}_{N}\left(q\right)\right\},$ but for simplicity we assume a superpopulation model (see Isaki and Fuller 1982) by considering the finite population to be a sample from an infinite superpopulation. In the latter we assume that the survey variable $Y$ has a continuous cumulative distribution function, so that ${q}_{0}$ is a unique $\alpha $-quantile. We get for $\delta \mathrm{>0}$

$$P\left({\overline{r}}_{n}\left({q}_{0}\right)\mathrm{<}{\overline{r}}_{n}\left({q}_{0}-\delta \right)\right)\mathrm{=}P\left(\frac{1}{N}{\displaystyle \sum _{j\mathrm{=1}}^{n}}\text{\hspace{0.17em}}\frac{1}{{\pi}_{j}}\left\{{r}_{j}\left({q}_{0}\right)-{r}_{j}\left({q}_{0}-\delta \right)\right\}\mathrm{<0}\right)\mathrm{.}$$

Note that the argument in the probability statement is a design-consistent estimate for ${\overline{R}}_{N}\left({q}_{0}\right)-{\overline{R}}_{N}\left({q}_{0}-\delta \right),$ which is less than zero since ${q}_{0}$ is the minimizer of ${\overline{R}}_{N}(\cdot ).$ Hence, the probability tends to one in the sense of design consistency defined in Isaki and Fuller (1982). The same holds of course for $\delta \mathrm{<0.}$ With this statement we may conclude that the estimated minimizer ${\widehat{q}}_{0}\mathrm{=}\text{arg}\text{\hspace{0.17em}}\mathrm{min}{\displaystyle {\sum}_{j\mathrm{=1}}^{n}}1/{\pi}_{j}\text{\hspace{0.17em}}{r}_{j}\left(q\right)$ is a design-consistent estimate for ${q}_{0},$ so that ${\widehat{Q}}_{N}\left(\alpha \right)$ in (2.2) is in turn design-consistent for $Q\left(\alpha \right).$ It is easily shown that ${\widehat{Q}}_{N}\left(\alpha \right)$ is the inverse of the normed weighted cumulative distribution function

$${\widehat{F}}_{N}\left(y\right)\mathrm{:=}\frac{{\displaystyle \sum _{j\mathrm{=1}}^{n}}\text{\hspace{0.17em}}1\left\{{y}_{j}\le y\right\}/{\pi}_{j}}{{\displaystyle \sum _{j\mathrm{=1}}^{n}}\text{\hspace{0.17em}}1/{\pi}_{j}}$$

using the same notation as in Kuk (1988). Note that ${\widehat{F}}_{N}\left(y\right)$ is the Hájek (1971) estimate of the cumulative distribution function (see also Rao and Wu 2009) and as such not a Horvitz-Thompson estimate. As a consequence ${\widehat{Q}}_{N}\left(\alpha \right)$ is not design-unbiased. Nonetheless, ${\widehat{F}}_{N}\left(y\right)$ is a valid distribution function, and hence it can be considered as a normalized version of the Lahiri or Horvitz-Thompson estimator of the distribution function (see Lahiri 1951), which is denoted by

$${\widehat{F}}_{L}\left(y\right)\mathrm{:=}\frac{1}{N}{\displaystyle \sum _{j\mathrm{=1}}^{n}}1/{\pi}_{j}1\left\{{y}_{j}\le y\right\}.$$
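The statement that ${\widehat{Q}}_{N}\left(\alpha \right)$ is the inverse of ${\widehat{F}}_{N}$ can be checked numerically. The Python sketch below computes the estimate once by minimizing the weighted loss in (2.2) over the sample values and once as the smallest sample value at which ${\widehat{F}}_{N}$ reaches $\alpha .$ The function names `weighted_check_loss`, `quantile_by_loss` and `quantile_by_cdf`, the toy sample, and the small numerical tolerances are assumptions made here for illustration.

```python
import numpy as np


def weighted_check_loss(q, y, pi, alpha):
    """Sum over the sample of (1 / pi_j) * w_alpha(y_j - q) * |y_j - q|."""
    eps = y - q
    w = np.where(eps > 0, alpha, 1.0 - alpha)
    return float(np.sum(w * np.abs(eps) / pi))


def quantile_by_loss(y, pi, alpha):
    """Smallest minimizer of the weighted loss, searched over sample values."""
    candidates = np.sort(y)
    losses = np.array([weighted_check_loss(q, y, pi, alpha) for q in candidates])
    is_min = np.isclose(losses, losses.min(), rtol=1e-9, atol=1e-12)
    return float(candidates[is_min][0])


def quantile_by_cdf(y, pi, alpha):
    """Smallest sample value at which the normalized weighted CDF is >= alpha."""
    order = np.argsort(y)
    d = 1.0 / pi[order]                    # design weights in sorted order
    F = np.cumsum(d) / np.sum(d)           # normalized CDF at the sorted y values
    idx = int(np.argmax(F >= alpha - 1e-12))  # first index where F reaches alpha
    return float(y[order][idx])


# Hypothetical sample with unequal inclusion probabilities.
y = np.array([3.0, 1.0, 4.0, 2.0, 5.0])
pi = np.array([0.5, 0.2, 0.25, 0.5, 0.1])

median_loss = quantile_by_loss(y, pi, 0.5)
median_cdf = quantile_by_cdf(y, pi, 0.5)
```

In this toy sample the weights $1/{\pi}_{j}$ sum to 23 and the normalized distribution function first reaches 0.5 at $y = 4,$ which is also where the weighted loss attains its minimum.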

Kuk (1988) proposes to replace ${\widehat{F}}_{L}(\cdot )$ with alternative estimates of the distribution function: instead of estimating the distribution function itself, he suggests estimating the complementary proportion ${\widehat{S}}_{R}\left(y\right),$ which then leads to the estimate ${\widehat{F}}_{R}\left(y\right)$ defined through

$${\widehat{F}}_{R}\left(y\right)\mathrm{=1}-{\widehat{S}}_{R}\left(y\right)\mathrm{=1}-\frac{1}{N}{\displaystyle \sum _{j\mathrm{=1}}^{n}}1/{\pi}_{j}1\left\{{y}_{j}\mathrm{>}y\right\}\mathrm{.}$$

It follows directly from these definitions that ${\widehat{F}}_{R}(\cdot )$ can be expressed in terms of ${\widehat{F}}_{L}(\cdot ),$ and hence in terms of ${\widehat{F}}_{N}(\cdot ),$ through

$${\widehat{F}}_{R}\mathrm{=1}-\frac{1}{N}{\displaystyle \sum _{j\mathrm{=1}}^{n}}1/{\pi}_{j}+{\widehat{F}}_{L}\text{\hspace{1em}}\text{and}\text{\hspace{1em}}{\widehat{F}}_{L}\mathrm{=}\frac{{\displaystyle \sum _{j\mathrm{=1}}^{n}}1/{\pi}_{j}}{N}{\widehat{F}}_{N}\mathrm{.}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}(2.3)$$
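For completeness, both identities in (2.3) can be verified from the definitions above, using only $1\left\{y_j > y\right\} = 1 - 1\left\{y_j \le y\right\}$:

$$\widehat{F}_R\left(y\right) = 1 - \frac{1}{N}\sum_{j=1}^{n} \frac{1}{\pi_j}\left(1 - 1\left\{y_j \le y\right\}\right) = 1 - \frac{1}{N}\sum_{j=1}^{n} \frac{1}{\pi_j} + \widehat{F}_L\left(y\right),$$

$$\widehat{F}_L\left(y\right) = \frac{\sum_{j=1}^{n} 1/\pi_j}{N} \cdot \frac{\sum_{j=1}^{n} 1\left\{y_j \le y\right\}/\pi_j}{\sum_{j=1}^{n} 1/\pi_j} = \frac{\sum_{j=1}^{n} 1/\pi_j}{N}\, \widehat{F}_N\left(y\right).$$

The three estimators therefore differ only through the ratio $\sum_{j=1}^{n} 1/\pi_j \,/\, N.$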

Kuk (1988) shows that, under sampling with unequal probabilities, estimation of the median derived from ${\widehat{F}}_{R}$ outperforms median estimates derived from ${\widehat{F}}_{N}$ and ${\widehat{F}}_{L}$ in terms of mean squared estimation error. Note that the estimators ${\widehat{F}}_{N},$ ${\widehat{F}}_{L}$ and ${\widehat{F}}_{R}$ coincide in the case of simple random sampling without replacement where ${\pi}_{j}\mathrm{=}\pi \mathrm{=}n/N.$
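The relation between the three estimators can be illustrated with a short Python sketch. The function name `cdf_estimates`, the population size $N = 20$ and the toy sample are assumptions chosen here for illustration; they do not come from Kuk (1988) or from the note itself.

```python
import numpy as np


def cdf_estimates(t, y, pi, N):
    """Return (F_N, F_L, F_R) evaluated at the point t."""
    d = 1.0 / np.asarray(pi, dtype=float)   # design weights 1 / pi_j
    below = np.asarray(y) <= t              # indicator 1{y_j <= t}
    F_N = np.sum(d * below) / np.sum(d)     # normalized by the sum of weights
    F_L = np.sum(d * below) / N             # normalized by the known N
    F_R = 1.0 - np.sum(d * ~below) / N      # one minus the estimated complement
    return float(F_N), float(F_L), float(F_R)


y = np.array([3.0, 1.0, 4.0, 2.0, 5.0])

# Simple random sampling without replacement: pi_j = n / N for every unit.
N, n = 20, 5
srs = cdf_estimates(3.0, y, np.full(n, n / N), N)

# Unequal probabilities: the weights 1 / pi_j sum to 23 rather than N = 20.
pi_unequal = np.array([0.5, 0.2, 0.25, 0.5, 0.1])
top = cdf_estimates(5.0, y, pi_unequal, N)
```

Under the equal-probability design all three values agree (0.6 at $y = 3$). In the unequal-probability example, ${\widehat{F}}_{L}$ evaluates to 1.15 at the largest sample value while ${\widehat{F}}_{N}$ and ${\widehat{F}}_{R}$ equal 1 there, which shows that ${\widehat{F}}_{L}$ need not be a proper distribution function.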
