7. Conclusions
Piero Demetrio Falorsi and Paolo Righi
The paper proposes a new approach for defining the optimal inclusion probabilities in survey contexts characterized by the need to disseminate survey estimates of prescribed accuracy for a multiplicity of both variables and domains of interest.
This paper's main contribution is the practical computation of these probabilities by means of a new algorithm, which is suitable for a general multi-way sampling design in which standard stratified sampling represents a special case. The proposed approach, the algorithm and the final computation are domain- and variable-driven.
In our framework, the domain membership indicator variables are assumed to be known, while the variables of interest are not. The procedure is therefore applied to the values of the characteristics of interest predicted via a superpopulation model, and the algorithm takes model uncertainty into account, reflecting that the values of the variables of interest are unknown. Using the Anticipated Variance as the measure of the estimators' precision, this approach overcomes the limits of the standard sample-allocation algorithms, in which the variables of interest driving the solution are assumed to be known.
The proposed algorithm builds on standard procedures, but introduces some computational innovations that are useful for dealing with the complexity arising from the fact that the Anticipated Variances are implicit functions of the inclusion probabilities. The algorithm was tested on simulated and real survey data to evaluate its performance and properties. The results of a small set of experiments are presented here; they confirm an improvement in the efficiency of the sampling strategy. A natural generalization of the case examined here may be developed by considering as known, during the design planning stage, both the domain indicators and other quantitative independent variables. We note that the Anticipated Variance based only on the domain indicators is larger than the Anticipated Variance of this more general case. Thus, our solution represents an upper (and somewhat robust) bound in the design phase. Furthermore, the algorithmic solution can easily be adapted to this more general situation.
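To fix ideas, the following sketch computes a feasible (not optimal) allocation in a simplified stratified setting with several variance thresholds at once; the data, the square-root shape heuristic and the scaling step are our own illustrative simplifications, not the paper's method, which instead searches the optimal inclusion probabilities by fixed-point iterations.

```python
import math

# Hypothetical stratified setting: 3 strata, 2 variables of interest.
# a[j][h] is the variance contribution of stratum h to variable j
# (e.g., W_h^2 * S_jh^2), and V[j] is the variance threshold for j.
a = [[4.0, 9.0, 1.0],
     [2.0, 5.0, 8.0]]
V = [0.9, 0.8]
H, J = 3, 2

# Heuristic shape: n_h proportional to the square root of the summed
# per-stratum contributions over all variables.
shape = [math.sqrt(sum(a[j][h] for j in range(J))) for h in range(H)]

# Scale the shape so that every constraint sum_h a[j][h]/n_h <= V[j]
# holds; the most demanding (binding) threshold fixes the scale factor.
f = max(sum(a[j][h] / shape[h] for h in range(H)) / V[j] for j in range(J))
n = [f * s for s in shape]

# All variance thresholds are now met, with the binding one attained
# exactly; this feasible allocation is an upper bound on the optimum.
```

The binding constraint determines the total sample size; a multivariate optimizer can only shrink the allocation from this feasible point.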
Acknowledgements
This research was funded by the partnership of the Global Strategy to Improve Agricultural and Rural Statistics: http://www.fao.org/economic/ess/ess-capacity/ess-strategy/en/.
Appendix
Appendix A1
AV of the HT estimator
Let us consider
the residual as expressed by equation (3.5),
and replace the term with thus obtaining
The weighted least squares predictions of and with predictors and weights are
and
with
Using the formulae
(A1.2) and (A1.3), the expression (A1.1) may be reformulated as
Therefore, the model expectation
of is
because Furthermore,
where and with
and
Expression
(4.5) is easily derived by plugging expressions from (A1.2) to (A1.8) into
equation (4.3).
Appendix A2
Convergence of the algorithm
The optimization problem (5.1) is solved by two nested fixed-point iterations. Given an unknown vector of dimension the fixed-point iteration chooses an initial guess It then computes subsequent iterates by with being the system of updating equations. The multivariate function has a fixed point in a domain if maps into Let be the Jacobian matrix of the first partial derivatives of evaluated at If there exists a constant such that, in some natural matrix norm, then has a unique fixed point, and the fixed-point iteration is guaranteed to converge to it for any initial guess chosen in As regards the proposed algorithm, the convergence of the IL and OL is obtained when the terms converge to the fixed point; that is, the vectors and do not change across the OL and IL iterations.
The demonstration below considers the method proposed by Chromy (1987) to solve the LCSP of system (5.7), and makes use of some reasonable assumptions: (1) (2) (3) (4) with (5) Assumption (1) corresponds to the upward approximation of the Anticipated Variance given in Remark 4.1, and implies that Assumption (3) implies that Assumption (4) states that the structure of the inclusion probabilities remains roughly constant across the IL iterations. This assumption is reasonable because the updating equation (A2.2) below (for a given inclusion probability) is essentially determined by the variance threshold that requires the largest sample size, and it is plausible that this threshold remains more or less the same in the subsequent IL iterations of a given OL.
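The contraction condition invoked above can be illustrated with a small numeric sketch; the two-dimensional map below is purely illustrative (Lipschitz constant 0.5), not the paper's updating system.

```python
import math

# Fixed-point iteration x_{t+1} = g(x_t): when the Jacobian of g is
# bounded in norm by a constant L < 1 on the domain, the iterates
# converge to the unique fixed point from any starting guess.
def fixed_point(g, x0, tol=1e-12, max_iter=10_000):
    x = x0
    for _ in range(max_iter):
        x_new = g(x)
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("no convergence")

# Illustrative map: g(x, y) = (0.5*cos(y), 0.5*sin(x)); every partial
# derivative is bounded by 0.5, so the iteration is a contraction.
sol = fixed_point(lambda v: (0.5 * math.cos(v[1]), 0.5 * math.sin(v[0])),
                  (0.0, 0.0))
```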
Proof of convergence of the Inner Loop. By reformulating expression (4.6) in accordance with the assumptions from (1) to
(4),
Considering in problem (5.7) that the values are fixed, each value of the vector is obtained as a solution of the LCSP with the Chromy algorithm. Denote by the iteration of the Chromy algorithm at which it converges, where Then, the IL updates the generic probability according to the expression
where the
right-hand term represents the updating formula of the Chromy algorithm, and stands for and is the generalized Lagrange
multiplier, where
and
The
Kuhn-Tucker theory states that therefore, and
iff
Chromy asserts that few are larger than zero and that, in most cases, only one value is strictly positive. Denoting with we define as the system of updating equations whose generic equation is obtained by plugging expression (A2.2) into (A2.1). If convergence is obtained, then in the last iteration,
The function of equation (A2.4)
is continuous and differentiable. Moreover, it maps onto the interval of the
possible values of
Then, the IL converges if the
following condition is fulfilled:
The Jacobian matrix is positive semi-definite, and a well-known result states that By considering the Frobenius norm it is Thus, we can use the trace of the Jacobian matrix to verify condition (A2.5). Let be the element of the diagonal of Using the Kuhn-Tucker condition Since many (Chromy 1987), the respective is null. When then Therefore, the should be less than 1.
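A norm condition of this kind can also be checked numerically: estimate the Jacobian of an updating map by finite differences and compute its Frobenius norm. The utility below is generic and is shown on an illustrative affine map, not on the paper's system (A2.4).

```python
import math

# Estimate the Jacobian of a map g: R^n -> R^n by forward differences
# and return its Frobenius norm; the Frobenius norm bounds the spectral
# norm from above, so a value below 1 certifies a contraction.
def frobenius_norm_jacobian(g, x, h=1e-6):
    fx = g(x)
    total = 0.0
    for j in range(len(x)):
        xp = list(x)
        xp[j] += h
        fp = g(xp)
        total += sum(((fp_i - fx_i) / h) ** 2 for fp_i, fx_i in zip(fp, fx))
    return math.sqrt(total)

# Illustrative affine map with Jacobian [[0.25, 0.1], [0.2, 0.3]];
# its Frobenius norm is sqrt(0.2025) = 0.45 < 1.
g = lambda v: [0.25 * v[0] + 0.1 * v[1] + 1.0,
               0.2 * v[0] + 0.3 * v[1] - 0.5]
norm = frobenius_norm_jacobian(g, [0.3, 0.2])
```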
Proof of convergence of the Outer Loop. Let be the fixed point solution of
the IL; then, the OL updates the vector with Under conditions (1), (2) and
(3),
Plugging
expression (A2.2) into formula (A2.6) when the IL converges, the system of updating equations of is given by where the generic equation of is
Denoting with the system j may be expressed in a recursive form
with as the system of updating equations of with respect to the previous
values of the OL, To demonstrate the convergence of
OL, it is necessary to demonstrate that the Jacobian norm is lower than 1. Using standard
results of matrix algebra,
in which the generic norm is less than 1 (see the IL proof of convergence). Let be the element on the diagonal of It is
Therefore, we have
The following inequality holds
Consequently,
the norm
and therefore the OL converges.
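The two-level structure of the algorithm, an outer loop each step of which runs an inner fixed-point loop to convergence, can be sketched generically; both updating maps below are illustrative contractions chosen for the example, not the paper's systems.

```python
import math

def inner_loop(g, x0, tol=1e-12, max_iter=10_000):
    # IL: iterate x <- g(x) until successive iterates stop changing.
    x = x0
    for _ in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("IL did not converge")

def outer_loop(y0, tol=1e-10, max_iter=1_000):
    # OL: each outer iterate is updated from the fixed point of an
    # inner map parameterized by the previous outer value.
    y = y0
    for _ in range(max_iter):
        x_star = inner_loop(lambda x: 0.5 * math.cos(x + y), 0.0)
        y_new = 0.5 * math.sin(x_star)  # illustrative OL updating equation
        if abs(y_new - y) < tol:
            return y_new, x_star
        y = y_new
    raise RuntimeError("OL did not converge")

y_star, x_star = outer_loop(0.0)
```

Because both maps have derivatives bounded by 0.5, the inner and outer iterations each satisfy the contraction condition and converge jointly.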
Appendix A3
Proof that the approximation of Remark 4.1 is upward
Since is the weighted least squares prediction of by using a different value of the such as we obtain where Replacing the terms with in expression (A1.5), the AAV (4.3) is inflated. The approximation implies that Finally, we emphasize that in most cases, the upward bias is slight, since the are obtained from variables that generally have very low predictive power for the values (see Section 4). In these situations So and
Appendix A4
Proof of expression (4.7)
In this case, each vector has all elements equal to zero except one element equal to 1 (corresponding to the planned population to which the unit belongs). Given the input values, the optimization procedure for Under the above assumption, is a diagonal matrix with the element given by Considering expressions (A1.2) and (A1.3) can be reformulated as, respectively,
but
as the sum of the residuals of a regression model.
Using the formulae
(A4.1) and (A4.2), expression (4.5) is given by
since
and expression (4.7) may be
obtained.
References
Bethel, J. (1989). Sample allocation in multivariate surveys. Survey Methodology, 15, 1, 47-57.
Boyd, S., and Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press.
Breidt, F.J., and Chauvet, G. (2011). Improved variance estimation
for balanced samples drawn via the cube method. Journal of Statistical Planning and Inference, 141, 479-487.
Chauvet, G., Bonnéry, D. and Deville, J.-C. (2011). Optimal
inclusion probabilities for balanced sampling. Journal of Statistical Planning and Inference, 141, 984-994.
Choudhry, G.H., Rao, J.N.K. and Hidiroglou, M.A. (2012). On sample allocation for efficient domain estimation. Survey Methodology, 38, 1, 23-29.
Chromy, J. (1987). Design optimization with multiple objectives. Proceedings of the Survey Research
Methods Section, American Statistical Association, 194-199.
Cochran, W.G. (1977). Sampling Techniques. New York: John Wiley & Sons, Inc.
Deville, J.-C., and Tillé, Y. (2004). Efficient balanced
sampling: The cube method. Biometrika,
91, 893-912.
Deville, J.-C., and Tillé, Y. (2005). Variance approximation under balanced sampling. Journal of Statistical Planning and Inference, 128, 569-591.
Dykstra, R., and Wollan, P. (1987). Finding I-projections subject to a finite set of linear inequality constraints. Applied Statistics, 36, 377-383.
Ernst, L.R. (1989). Further applications of linear programming
to sampling problems. Proceedings of the
Survey Research Methods Section, American Statistical Association, 625-631.
Falorsi,
P.D., and Righi, P. (2008). A balanced sampling approach for multi-way stratification
designs for small area estimation. Survey
Methodology, 34, 2, 223-234.
Falorsi, P.D., Orsini, D. and Righi, P. (2006). Balanced
and coordinated sampling designs for small domain estimation. Statistics in Transition, 7, 1173-1198.
Gonzalez, J.M., and Eltinge, J.L. (2010). Optimal survey design: A review. Section on Survey Research Methods – JSM 2010, October.
Isaki, C.T., and Fuller, W.A. (1982). Survey design
under a regression superpopulation model. Journal
of the American Statistical Association, 77, 89-96.
Khan, M.G.M., Mati, T. and Ahsan, M.J. (2010). An
optimal multivariate stratified sampling design using auxiliary information: An
integer solution using goal programming approach. Journal of Official Statistics, 26, 695-708.
Kokan, A., and Khan, S. (1967). Optimum allocation in
multivariate surveys: An analytical solution. Journal of the Royal Statistical Society, Series B, 29, 115-125.
Lu, W., and Sitter, R.R. (2002). Multi-way stratification
by linear programming made practical. Survey
Methodology, 28, 2, 199-207.
Nedyalkova,
D., and Tillé, Y. (2008). Optimal sampling and estimation strategies
under the linear model. Biometrika,
95, 521-537.
Tillé, Y. (2006). Sampling Algorithms. New York: Springer-Verlag.
Tillé, Y., and Favre, A.-C. (2005). Optimal allocation
in balanced sampling. Statistics and Probability
Letters, 74, 31-37.
Winkler, W.E. (2001). Multi-way survey stratification and sampling. Research Report Series, Statistics #2001-01. Statistical Research Division, U.S. Bureau of the Census, Washington, D.C. 20233.