# A comparison between nonparametric estimators for finite population distribution functions 4. Design-based properties

In the previous section we showed that the model-based estimators $\stackrel{^}{F}\left(t\right)$ and ${\stackrel{^}{F}}^{*}\left(t\right)$ are asymptotically model-unbiased and consistent in model mean square error. However, they are not design-unbiased in general, and therefore should not be used when the sample inclusion probabilities are not constant. In such cases the generalized difference estimators $\stackrel{˜}{F}\left(t\right)$ and ${\stackrel{˜}{F}}^{*}\left(t\right)$ should be used instead. Indeed, it follows from the results in Breidt and Opsomer (2000) that under fairly general conditions $\stackrel{˜}{F}\left(t\right)$ is asymptotically design-unbiased and that its design mean square error is given by

${E}_{d}\left({|\stackrel{˜}{F}\left(t\right)-{F}_{N}\left(t\right)\text{\hspace{0.17em}}|}^{2}\right)=\frac{1}{{N}^{2}}\sum _{i,j\in U}\frac{{\pi }_{i,j}-{\pi }_{i}{\pi }_{j}}{{\pi }_{i}{\pi }_{j}}\left[I\left({y}_{i}\le t\right)-{\overline{G}}_{i}\left(t\right)\right]\left[I\left({y}_{j}\le t\right)-{\overline{G}}_{j}\left(t\right)\right]+o\left({n}^{-1}\right),$

where ${E}_{d}\left(\cdot \right)$ denotes expectation with respect to the sample design, ${\pi }_{i,j}$ denotes the joint sample inclusion probability of units $i$ and $j$ (it is understood that ${\pi }_{i,i}={\pi }_{i}$), and where

${\overline{G}}_{i}\left(t\right):=\sum _{j\in U}{\overline{w}}_{i,j}I\left({y}_{j}\le t\right).$

The regression weights ${\overline{w}}_{i,j}$ in the definition of ${\overline{G}}_{i}\left(t\right)$ refer to the whole finite population $U$ and are given by

where

${\overline{M}}_{r,s}\left(x\right):=\sum _{k\in U}\frac{1}{N\lambda }K\left(\frac{x-{x}_{k}}{\lambda }\right){\left(\frac{x-{x}_{k}}{\lambda }\right)}^{r},\text{ }\text{ }\text{ }r=0,1,2.$
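As a rough illustration of the kernel moments defined above, the following Python sketch evaluates the moment of order $r$ at a point $x$ over the whole population. The Epanechnikov kernel and the function names `epanechnikov` and `kernel_moment` are assumptions made for this example only; the text does not fix a particular kernel $K$ at this point.

```python
def epanechnikov(u):
    """Epanechnikov kernel (an assumed choice; K is left unspecified in the text)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_moment(x, xs, lam, r, kernel=epanechnikov):
    """Population kernel moment of order r at the point x.

    Computes sum_k K((x - x_k) / lam) * ((x - x_k) / lam) ** r / (N * lam),
    where xs holds the auxiliary values x_k of all N population units
    and lam is the bandwidth.
    """
    N = len(xs)
    total = 0.0
    for xk in xs:
        u = (x - xk) / lam
        total += kernel(u) * u ** r
    return total / (N * lam)
```

For auxiliary values placed symmetrically around $x$, the first-order moment vanishes, which gives a quick sanity check on an implementation of this kind.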

Moreover, according to Breidt and Opsomer (2000),

$\stackrel{˜}{V}\left(\stackrel{˜}{F}\left(t\right)\right):=\frac{1}{{N}^{2}}\sum _{i,j\in s}\frac{{\pi }_{i,j}-{\pi }_{i}{\pi }_{j}}{{\pi }_{i,j}{\pi }_{i}{\pi }_{j}}\left[I\left({y}_{i}\le t\right)-{\stackrel{˜}{G}}_{i}\left(t\right)\right]\left[I\left({y}_{j}\le t\right)-{\stackrel{˜}{G}}_{j}\left(t\right)\right]$

is a consistent estimator for the design mean square error of $\stackrel{˜}{F}\left(t\right).$
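The double sum in this variance estimator can be evaluated directly once the fitted values are available. The sketch below is only an illustration under stated assumptions: the function name `variance_estimate`, the argument layout, and the use of a nested list for the joint inclusion probabilities are hypothetical, and the fitted values ${\stackrel{˜}{G}}_{i}\left(t\right)$ are assumed to have been computed beforehand for the sampled units.

```python
def variance_estimate(y, g_fit, pi, pi_joint, t, N):
    """Plug-in estimate of the design mean square error at the point t.

    y        : study variable values of the n sampled units
    g_fit    : fitted values G_i(t) for the same units (computed elsewhere)
    pi       : first-order inclusion probabilities pi_i
    pi_joint : n x n nested list of joint inclusion probabilities pi_{i,j}
               (diagonal entries are ignored, since pi_{i,i} = pi_i)
    N        : population size
    """
    n = len(y)
    # Residuals I(y_i <= t) - G_i(t) for the sampled units.
    resid = [(1.0 if y[i] <= t else 0.0) - g_fit[i] for i in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            pij = pi[i] if i == j else pi_joint[i][j]
            coef = (pij - pi[i] * pi[j]) / (pij * pi[i] * pi[j])
            total += coef * resid[i] * resid[j]
    return total / N ** 2
```

Under simple random sampling without replacement, with ${\pi }_{i}=n/N$ and ${\pi }_{i,j}=n\left(n-1\right)/\left(N\left(N-1\right)\right)$ for $i\ne j$, this expression reduces to $\left(1-n/N\right){s}_{e}^{2}/n,$ where ${s}_{e}^{2}$ is the sample variance of the residuals, which is a convenient check.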

Unfortunately, the results in Breidt and Opsomer (2000) do not carry over to the generalized difference estimator ${\stackrel{˜}{F}}^{*}\left(t\right),$ because it does not belong to the class of local polynomial regression estimators: the regression function estimators ${\stackrel{˜}{m}}_{i}$ and ${\stackrel{˜}{m}}_{j}$ appear inside the indicator functions in the fitted values ${\stackrel{˜}{G}}_{i}^{*}\left(t\right).$ However, the results for $\stackrel{˜}{F}\left(t\right)$ suggest that in large samples ${\stackrel{˜}{G}}_{i}^{*}\left(t\right)$ and

${\overline{G}}_{i}^{*}\left(t\right):=\sum _{j\in U}{\overline{w}}_{i,j}I\left({y}_{j}-{\overline{m}}_{j}\le t-{\overline{m}}_{i}\right),$

where ${\overline{m}}_{i}:={\sum }_{j\in U}{\overline{w}}_{i,j}{y}_{j},$ are approximately the same, and that

${E}_{d}\left({|\text{\hspace{0.17em}}{\stackrel{˜}{F}}^{*}\left(t\right)-{F}_{N}\left(t\right)\text{\hspace{0.17em}}|}^{2}\right)=\frac{1}{{N}^{2}}\sum _{i,j\in U}\frac{{\pi }_{i,j}-{\pi }_{i}{\pi }_{j}}{{\pi }_{i}{\pi }_{j}}\left[I\left({y}_{i}\le t\right)-{\overline{G}}_{i}^{*}\left(t\right)\right]\left[I\left({y}_{j}\le t\right)-{\overline{G}}_{j}^{*}\left(t\right)\right]+o\left({n}^{-1}\right).$
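To make the population-level quantities ${\overline{m}}_{i}$ and ${\overline{G}}_{i}^{*}\left(t\right)$ more concrete, here is a small Python sketch that computes them from a given weight matrix. The function names `fitted_means` and `g_star` are hypothetical, and the regression weights ${\overline{w}}_{i,j}$ are assumed to be supplied as a nested list `w` rather than derived from a kernel.

```python
def fitted_means(w, y):
    """Fitted regression values m_i = sum_j w[i][j] * y[j]."""
    return [sum(wij * yj for wij, yj in zip(row, y)) for row in w]


def g_star(w, y, t):
    """Fitted values G*_i(t) = sum_j w[i][j] * I(y_j - m_j <= t - m_i).

    The indicator compares the residual of unit j with the threshold t
    shifted by the fitted mean of unit i.
    """
    m = fitted_means(w, y)
    n = len(y)
    return [
        sum(w[i][j] for j in range(n) if y[j] - m[j] <= t - m[i])
        for i in range(n)
    ]
```

With equal weights $1/N$ for every pair of units, all ${\overline{m}}_{i}$ coincide with the population mean and ${\overline{G}}_{i}^{*}\left(t\right)$ equals the population distribution function evaluated at $t$ for every $i.$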

Based on this conjecture, we tested

$\stackrel{˜}{V}\left({\stackrel{˜}{F}}^{*}\left(t\right)\right):=\frac{1}{{N}^{2}}\sum _{i,j\in s}\frac{{\pi }_{i,j}-{\pi }_{i}{\pi }_{j}}{{\pi }_{i,j}{\pi }_{i}{\pi }_{j}}\left[I\left({y}_{i}\le t\right)-{\stackrel{˜}{G}}_{i}^{*}\left(t\right)\right]\left[I\left({y}_{j}\le t\right)-{\stackrel{˜}{G}}_{j}^{*}\left(t\right)\right]$

as an estimator of the design mean square error of the generalized difference estimator ${\stackrel{˜}{F}}^{*}\left(t\right)$ in the simulation study of the following section.
