# A comparison between nonparametric estimators for finite population distribution functions: 3. Model-based properties

In this section we provide asymptotic expansions for the model bias and the model variance of the estimators introduced in the previous section. The expansions are based on the following assumptions:

• (C1)    $N\to\infty$ and the sequence of population $x_i$-values and of sample designs are such that

$$H_{N,s}(x) := \frac{1}{n}\sum_{i\in s} I(x_i\le x)$$

and

$$H_{N,\bar{s}}(x) := \frac{1}{N-n}\sum_{i\notin s} I(x_i\le x)$$

converge to absolutely continuous distribution functions $H_s(x) := \int_a^x h_s(z)\,dz$ and $H_{\bar{s}}(x) := \int_a^x h_{\bar{s}}(z)\,dz$, respectively. The support of $H_s(x)$ and $H_{\bar{s}}(x)$ is a bounded interval $[a,b]$, the density functions $h_s(x)$ and $h_{\bar{s}}(x)$ have bounded first derivatives for $x\in(a,b)$, and $h_s(x)$ is bounded away from zero.
• (C2)    The kernel function $K(u)$ is symmetric, has support on $[-1,1]$ and has a bounded derivative for $u\in(-1,1)$. The bandwidth sequence $\lambda$ goes to zero slowly enough to ensure that

is of order $o(\lambda)$.
• (C3)    The population $y_i$-values are generated from model (2.1). The function $m(x)$ is such that

for some $\delta>0$, and the family of error-component distribution functions $G(\epsilon\,|\,x)$ is such that

$$\begin{aligned}
\Big|\, & G(\epsilon\,|\,x) - G(\epsilon_0\,|\,x_0) - G^{(1,0)}(\epsilon_0\,|\,x_0)(\epsilon-\epsilon_0) - G^{(0,1)}(\epsilon_0\,|\,x_0)(x-x_0) \\
& - \tfrac{1}{2}\Big(G^{(2,0)}(\epsilon_0\,|\,x_0)(\epsilon-\epsilon_0)^2 + 2\,G^{(1,1)}(\epsilon_0\,|\,x_0)(\epsilon-\epsilon_0)(x-x_0) + G^{(0,2)}(\epsilon_0\,|\,x_0)(x-x_0)^2\Big)\,\Big| \\
& \qquad \le C\Big(|\epsilon-\epsilon_0|^{2+\delta} + |x-x_0|^{2+\delta}\Big)
\end{aligned}$$

• for some $C>0$ and some $\delta >0,$ where

Assumption (C1) restricts how the sample and nonsample $x_i$-values are generated. Together with assumption (C2) it ensures that the estimation errors of the kernel density estimators for $h_s(x)$ and $h_{\bar{s}}(x)$ go to zero uniformly for $x\in[a+\lambda, b-\lambda]$ and are uniformly bounded for $x\in[a,b]$. Replacing (C1) by more specific assumptions may allow (C2) to be relaxed and the uniform convergence rate of the kernel density estimators to be improved (see, for example, the results in Hansen 2008). Finally, assumption (C3) ensures that the model mean square errors of the two estimators converge to zero; it can be relaxed at the cost of slower convergence rates. In addition to assumptions (C1) to (C3), we shall also need the following assumption (C4) to ensure that the model mean square errors of the generalized difference estimators go to zero:

• (C4)    The first-order sample inclusion probabilities are given by

where $n^{*}$ is the expected sample size and $\pi(x)$ is a function that is bounded away from zero and has a bounded first derivative for $x\in(a,b)$.
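As a standalone illustration of the quantities appearing in (C1) and (C2) (not part of the paper's argument), the Python sketch below computes the sample distribution function $H_{N,s}(x)$ and a kernel density estimate of $h_s(x)$; the Epanechnikov kernel, the uniform design for the sample $x_i$-values and the bandwidth choice are all assumptions made only for this example:

```python
import numpy as np

def epanechnikov(u):
    """Kernel satisfying (C2): symmetric, support [-1, 1], bounded derivative."""
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def empirical_cdf(x_sample, x):
    """H_{N,s}(x): fraction of sample x-values at or below each point of x."""
    return np.mean(x_sample[:, None] <= x[None, :], axis=0)

def kde(x_sample, x, lam):
    """Kernel density estimate of h_s at the points x with bandwidth lam."""
    u = (x[None, :] - x_sample[:, None]) / lam
    return epanechnikov(u).mean(axis=0) / lam

rng = np.random.default_rng(0)
x_s = rng.uniform(0.0, 1.0, size=500)        # sample x-values; [a, b] = [0, 1] assumed
grid = np.linspace(-0.5, 1.5, 2001)          # grid extends beyond [a, b] + bandwidth
dens = kde(x_s, grid, lam=500 ** (-1 / 3))   # bandwidth shrinking like n^{-1/3}
```

The grid deliberately extends past $[a,b]$ so that the full mass of every kernel bump is covered; the estimated density then integrates to one up to quadrature error.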

Proposition 1. Under assumptions (C1) to (C3) it follows that:

$$\begin{aligned}
E\big(\hat{F}(t)-F_N(t)\big) ={}& \lambda^2\,\frac{N-n}{N}\,\frac{\mu_2}{2\mu_0}\int_a^b \Big[G^{(2,0)}(t-m(x)\,|\,x)\big(m'(x)\big)^2 - G^{(1,0)}(t-m(x)\,|\,x)\,m''(x) \\
& \quad - 2\,G^{(1,1)}(t-m(x)\,|\,x)\,m'(x) + G^{(0,2)}(t-m(x)\,|\,x)\Big]\,h_{\bar{s}}(x)\,dx + o(\lambda^2)
\end{aligned}$$

and

$$\begin{aligned}
\operatorname{var}\big(\hat{F}(t)-F_N(t)\big) ={}& \frac{1}{n}\Big(\frac{N-n}{N}\Big)^{2}\int_a^b \big[G(t-m(x)\,|\,x) - G^{2}(t-m(x)\,|\,x)\big]\,\big[h_{\bar{s}}(x)/h_s(x)\big]\,h_{\bar{s}}(x)\,dx \\
& + \frac{1}{N-n}\Big(\frac{N-n}{N}\Big)^{2}\int_a^b \big[G(t-m(x)\,|\,x) - G^{2}(t-m(x)\,|\,x)\big]\,h_{\bar{s}}(x)\,dx + o(n^{-1}),
\end{aligned}$$

where $\mu_r := \int_{-1}^{1} K(u)\,u^{r}\,du$ for $r=0,1,2$.
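These kernel moments can be verified numerically for a concrete kernel. The Python sketch below assumes the Epanechnikov kernel $K(u)=\tfrac{3}{4}(1-u^{2})$, which satisfies (C2), and recovers $\mu_0 = 1$, $\mu_1 = 0$ and $\mu_2 = 1/5$ by quadrature:

```python
import numpy as np

# Quadrature check of mu_r = ∫_{-1}^{1} K(u) u^r du for the (assumed)
# Epanechnikov kernel; since K(±1) = 0, a simple Riemann sum coincides
# with the trapezoidal rule here.
h = 1e-4
u = np.linspace(-1.0, 1.0, 20001)
K = 0.75 * (1 - u**2)
mu = [np.sum(K * u**r) * h for r in range(3)]
# For this kernel: mu_0 = 1, mu_1 = 0 (by symmetry), mu_2 = 1/5.
```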

Adding assumption (C4) it can be shown that

$$\begin{aligned}
E\big(\tilde{F}(t)-F_N(t)\big) ={}& \lambda^2\,\frac{N-n}{N}\,\frac{\mu_2}{2\mu_0}\int_a^b \Big[G^{(2,0)}(t-m(x)\,|\,x)\big(m'(x)\big)^2 - G^{(1,0)}(t-m(x)\,|\,x)\,m''(x) \\
& \quad - 2\,G^{(1,1)}(t-m(x)\,|\,x)\,m'(x) + G^{(0,2)}(t-m(x)\,|\,x)\Big]\,h(x)\,dx + o(\lambda^2),
\end{aligned}$$

where

$$h(x) := h_{\bar{s}}(x) + \big(1 - \pi^{-1}(x)\big)\,h_s(x),$$

and it can be shown that

$\text{var}\left(\stackrel{˜}{F}\left(t\right)-{F}_{N}\left(t\right)\right)=\text{var}\left(\stackrel{^}{F}\left(t\right)-{F}_{N}\left(t\right)\right)+o\left({n}^{-1}\right).$

Proposition 2. Under assumptions (C1) to (C3) and assuming that

• i)    the function

$$\sigma^{2}(x) := \int_{-\infty}^{\infty} \epsilon^{2}\, dG(\epsilon\,|\,x)$$

has a bounded first derivative for $x\in(a,b)$, and

• ii)

$$\sup_{x\in[a,b]} \int_{-\infty}^{\infty} \epsilon^{4}\, dG(\epsilon\,|\,x) < \infty,$$

it can be shown that

$$\begin{aligned}
E\big(\hat{F}^{*}(t)-F_N(t)\big) ={}& \lambda^2\,\frac{N-n}{N}\,\frac{\mu_2}{\mu_0}\int_a^b G^{(0,2)}(t-m(x)\,|\,x)\,h_{\bar{s}}(x)\,dx \\
& + \frac{1}{n\lambda}\,\frac{N-n}{N}\Big[\frac{K(0)-\kappa}{\mu_0}\int_a^b G^{(1,0)}(t-m(x)\,|\,x)\big(t-m(x)\big)\,h_s^{-1}(x)\,h_{\bar{s}}(x)\,dx \\
& \quad + \frac{\kappa-\theta}{\mu_0^{2}}\int_a^b G^{(2,0)}(t-m(x)\,|\,x)\,\sigma^{2}(x)\,h_s^{-1}(x)\,h_{\bar{s}}(x)\,dx\Big] + o\big(\lambda^2 + (n\lambda)^{-1}\big),
\end{aligned}$$

where $\kappa :={\int }_{-1}^{1}{K}^{2}\left(u\right)du$ and $\theta :={\int }_{-1}^{1}K\left(v\right){\int }_{-1}^{1}K\left(u+v\right)K\left(u\right)dudv,$ and it can be shown that

$\text{var}\left({\stackrel{^}{F}}^{*}\text{​}\left(t\right)-{F}_{N}\left(t\right)\right)=\text{var}\left(\stackrel{^}{F}\left(t\right)-{F}_{N}\left(t\right)\right)+o\left({n}^{-1}+{\lambda }^{5}\right).$

Adding assumption (C4) it can also be shown that

$$\begin{aligned}
E\big(\tilde{F}^{*}(t)-F_N(t)\big) ={}& \lambda^2\,\frac{N-n}{N}\,\frac{\mu_2}{\mu_0}\int_a^b G^{(0,2)}(t-m(x)\,|\,x)\,h(x)\,dx \\
& + \frac{1}{n\lambda}\,\frac{N-n}{N}\Big[\frac{K(0)-\kappa}{\mu_0}\int_a^b G^{(1,0)}(t-m(x)\,|\,x)\big(t-m(x)\big)\,h_s^{-1}(x)\,h(x)\,dx \\
& \quad + \frac{\kappa-\theta}{\mu_0^{2}}\int_a^b G^{(2,0)}(t-m(x)\,|\,x)\,\sigma^{2}(x)\,h_s^{-1}(x)\,h(x)\,dx\Big] + o\big(\lambda^2 + (n\lambda)^{-1}\big)
\end{aligned}$$

and that

$\text{var}\left({\stackrel{˜}{F}}^{*}\text{​}\left(t\right)-{F}_{N}\left(t\right)\right)=\text{var}\left(\stackrel{^}{F}\left(t\right)-{F}_{N}\left(t\right)\right)+o\left({n}^{-1}+{\lambda }^{5}\right).$
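The constants $\kappa$ and $\theta$ in Proposition 2 rarely have convenient closed forms. As an illustration outside the paper, the Python sketch below evaluates both by quadrature for the (assumed) Epanechnikov kernel, for which $\kappa = 3/5$ and $K(0) = 3/4$, so that $K(0)-\kappa$ and $\kappa-\theta$ are both positive:

```python
import numpy as np

def K(u):
    """Epanechnikov kernel, assumed only for this example; K(0) = 3/4."""
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

# Since the integrands vanish at the endpoints of [-1, 1], plain Riemann
# sums coincide with the trapezoidal rule on this grid.
h = 2e-3
u = np.linspace(-1.0, 1.0, 1001)
kappa = np.sum(K(u) ** 2) * h                               # ∫ K^2(u) du = 3/5
inner = np.sum(K(u[None, :] + u[:, None]) * K(u[None, :]), axis=1) * h
theta = np.sum(K(u) * inner) * h                            # ∫∫ K(v) K(u+v) K(u) du dv
```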

The proofs of the Propositions are given in the Appendix. Dorfman and Hall (1993) derived similar expansions for the Kuo estimator with local constant regression weights instead of local linear ones.

Note that, in view of the asymptotic expansions, the bandwidth sequence $\lambda$ can be chosen so that the squared model biases are of smaller order of magnitude than the corresponding model variances. For the estimators based on the fitted values of Kuo this is achieved whenever $\lambda = o(n^{-1/4})$, while for the estimators with the modified fitted values it requires that $\lambda$ go to zero faster than $O(n^{-1/4})$ and slower than $O(n^{-1/2})$. The convergence rates for the model biases of the latter estimators are optimized when $\lambda = O(n^{-1/3})$, in which case both model biases are of order $O(n^{-2/3})$. The model biases of the estimators based on the fitted values of Kuo can be made to converge much faster, depending on the sequences $H_{N,s}(x)$ and $H_{N,\bar{s}}(x)$ and on the bandwidth sequence $\lambda$.
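The rate comparison in the preceding paragraph reduces to simple arithmetic. The following Python sketch (an illustration, not part of the paper) checks that with $\lambda = n^{-1/3}$ the two bias components $\lambda^{2}$ and $(n\lambda)^{-1}$ from Proposition 2 balance at $n^{-2/3}$, whose square lies below the $O(n^{-1})$ model-variance order:

```python
# Pure-arithmetic check of the bandwidth rates; no estimator is simulated.
# With lambda = n^{-1/3}, lambda^2 and (n * lambda)^{-1} both equal n^{-2/3}.
def bias_orders(n: int) -> tuple[float, float]:
    lam = n ** (-1 / 3)
    return lam**2, 1.0 / (n * lam)
```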

Given these considerations concerning the model biases, and given that the leading terms in the model variances are the same for both types of fitted values, it would be of interest to know the second-order terms in the model variances in order to establish which estimator is more efficient from the model-based perspective. The proofs in the Appendix suggest, however, that the second-order terms depend on more specific assumptions than (C1) to (C3) and that they are difficult to determine, in particular for the estimators based on the modified fitted values.
