7. Conclusions
Piero Demetrio Falorsi and Paolo Righi
The paper proposes a new approach for defining the optimal inclusion probabilities in survey contexts characterized by the need to disseminate survey estimates of prescribed accuracy for a multiplicity of both variables and domains of interest.
This paper's main contribution is the practical computation of these probabilities by means of a new algorithm, which is suitable for a general multi-way sampling design in which standard stratified sampling represents a special case. The proposed approach, the algorithm and the final computation are domain- and variable-driven.
In our framework, the domain membership indicator variables are assumed to be known, while the variables of interest are not. The procedure is therefore applied to the values of the characteristics of interest predicted via a superpopulation model, and the algorithm takes model uncertainty into account, reflecting that the values of the variables of interest are unknown. Using the Anticipated Variance as the measure of the estimators' precision, this approach overcomes the limits of the standard sample-allocation algorithms, in which the variables of interest driving the solution are assumed to be known.
The proposed algorithm builds on standard procedures, but introduces some computational innovations that are useful for dealing with the complexity arising from the fact that the Anticipated Variances are implicit functions of the inclusion probabilities. The algorithm was tested on simulated and real survey data to evaluate its performance and properties. The results of a small set of experiments are presented here; they confirm an improvement in the efficiency of the sampling strategy. A natural generalization of the case examined here may be developed by considering as known, during the design planning stage, both the domain indicators and other quantitative independent variables. We note that the Anticipated Variance based only on the domain indicators is larger than the Anticipated Variance of this more general case. Thus, our solution represents an upper (and somewhat robust) bound in the design phase. Furthermore, the algorithmic solution can easily be adapted to this more general situation.
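To fix ideas, the following sketch computes a feasible (not optimal) allocation in a simplified stratified setting with several variance thresholds at once; the data, the square-root shape heuristic and the scaling step are our own illustrative simplifications, not the paper's method, which instead searches the optimal inclusion probabilities by fixed-point iterations.

```python
import math

# Hypothetical stratified setting: 3 strata, 2 variables of interest.
# a[j][h] is the variance contribution of stratum h to variable j
# (e.g., W_h^2 * S_jh^2), and V[j] is the variance threshold for j.
a = [[4.0, 9.0, 1.0],
     [2.0, 5.0, 8.0]]
V = [0.9, 0.8]
H, J = 3, 2

# Heuristic shape: n_h proportional to the square root of the summed
# per-stratum contributions over all variables.
shape = [math.sqrt(sum(a[j][h] for j in range(J))) for h in range(H)]

# Scale the shape so that every constraint sum_h a[j][h]/n_h <= V[j]
# holds; the most demanding (binding) threshold fixes the scale factor.
f = max(sum(a[j][h] / shape[h] for h in range(H)) / V[j] for j in range(J))
n = [f * s for s in shape]

# All variance thresholds are now met, with the binding one attained
# exactly; this feasible allocation is an upper bound on the optimum.
```

The binding constraint determines the total sample size; a multivariate optimizer can only shrink the allocation from this feasible point.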
Acknowledgements
This research was funded by the partnership of the Global Strategy to Improve Agricultural and Rural Statistics: http://www.fao.org/economic/ess/ess-capacity/ess-strategy/en/.
Appendix
Appendix A1
AV of the HT estimator
Let us consider
the residual as expressed by equation (3.5),
and replace the term with thus obtaining
The weighted least squares predictions of and with predictors and weights are
and
with
Using the formulae
(A1.2) and (A1.3), the expression (A1.1) may be reformulated as
Therefore, the model expectation
of is
because Furthermore,
where and with
and
Expression
(4.5) is easily derived by plugging expressions from (A1.2) to (A1.8) into
equation (4.3).
Appendix A2
Convergence of the algorithm
The optimization problem (5.1) is solved by two nested fixed-point iterations. Given an unknown vector of dimension the fixed-point iteration chooses an initial guess It then computes subsequent iterates by with being the system of updating equations. The multivariate function has a fixed point in a domain if maps into Let be the Jacobian matrix of the first partial derivatives of evaluated at If there exists a constant such that, in some natural matrix norm, then has a unique fixed point, and the fixed-point iteration is guaranteed to converge to it for any initial guess chosen in As regards the proposed algorithm, the convergence of the IL and OL is obtained when the terms converge to the fixed point; that is, the vectors and do not change across the OL and IL iterations.
The demonstration below considers the method proposed by Chromy (1987) to solve the LCSP of system (5.7), and makes use of some reasonable assumptions: (1) (2) (3) (4) with (5) Assumption (1) corresponds to the upward approximation of the Anticipated Variance given in Remark 4.1, and implies that Assumption (3) implies that Assumption (4) states that the structure of the inclusion probabilities remains roughly constant across the IL iterations. This assumption is reasonable because the updating equation (A2.2) below (for a given inclusion probability) is essentially determined by the variance threshold that requires the largest sample size, and it is plausible that this threshold remains more or less the same in the subsequent IL iterations of a given OL.
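The contraction condition invoked above can be illustrated with a small numeric sketch; the two-dimensional map below is purely illustrative (Lipschitz constant 0.5), not the paper's updating system.

```python
import math

# Fixed-point iteration x_{t+1} = g(x_t): when the Jacobian of g is
# bounded in norm by a constant L < 1 on the domain, the iterates
# converge to the unique fixed point from any starting guess.
def fixed_point(g, x0, tol=1e-12, max_iter=10_000):
    x = x0
    for _ in range(max_iter):
        x_new = g(x)
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("no convergence")

# Illustrative map: g(x, y) = (0.5*cos(y), 0.5*sin(x)); every partial
# derivative is bounded by 0.5, so the iteration is a contraction.
sol = fixed_point(lambda v: (0.5 * math.cos(v[1]), 0.5 * math.sin(v[0])),
                  (0.0, 0.0))
```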
Proof of convergence of the Inner Loop. By reformulating expression (4.6) in accordance with the assumptions from (1) to
(4),
Considering in problem (5.7) that the values are fixed, each value of the vector is obtained as a solution of the LCSP with the Chromy algorithm. Denote by the iteration of the Chromy algorithm at which it converges, where Then, the IL updates the generic probability according to the expression
where the
right-hand term represents the updating formula of the Chromy algorithm, and stands for and is the generalized Lagrange
multiplier, where
and
The
Kuhn-Tucker theory states that therefore, and
iff
Chromy asserts that few are larger than zero and that, in most cases, only one value is strictly positive. Denoting with we define as the system of updating equations whose generic equation is obtained by plugging expression (A2.2) into (A2.1). If convergence is obtained, then in the last iteration,
The function of equation (A2.4)
is continuous and differentiable. Moreover, it maps onto the interval of the
possible values of
Then, the IL converges if the
following condition is fulfilled:
The Jacobian matrix is positive semi-definite, and a well-known result states that By considering the Frobenius norm it is Thus, we can use the trace of the Jacobian matrix to verify condition (A2.5). Let be the element of the diagonal of Using the Kuhn-Tucker condition Since many (Chromy 1987), the respective is null. When then Therefore, the should be less than 1.
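A norm condition of this kind can also be checked numerically: estimate the Jacobian of an updating map by finite differences and compute its Frobenius norm. The utility below is generic and is shown on an illustrative affine map, not on the paper's system (A2.4).

```python
import math

# Estimate the Jacobian of a map g: R^n -> R^n by forward differences
# and return its Frobenius norm; the Frobenius norm bounds the spectral
# norm from above, so a value below 1 certifies a contraction.
def frobenius_norm_jacobian(g, x, h=1e-6):
    fx = g(x)
    total = 0.0
    for j in range(len(x)):
        xp = list(x)
        xp[j] += h
        fp = g(xp)
        total += sum(((fp_i - fx_i) / h) ** 2 for fp_i, fx_i in zip(fp, fx))
    return math.sqrt(total)

# Illustrative affine map with Jacobian [[0.25, 0.1], [0.2, 0.3]];
# its Frobenius norm is sqrt(0.2025) = 0.45 < 1.
g = lambda v: [0.25 * v[0] + 0.1 * v[1] + 1.0,
               0.2 * v[0] + 0.3 * v[1] - 0.5]
norm = frobenius_norm_jacobian(g, [0.3, 0.2])
```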
Proof of convergence of the Outer Loop. Let be the fixed point solution of
the IL; then, the OL updates the vector with Under conditions (1), (2) and
(3),
Plugging
expression (A2.2) into formula (A2.6) when the IL converges, the system of updating equations of is given by where the generic equation of is
Denoting with the system j may be expressed in a recursive form
with as the system of updating equations of with respect to the previous
values of the OL, To demonstrate the convergence of
OL, it is necessary to demonstrate that the Jacobian norm is lower than 1. Using standard
results of matrix algebra,
in which the generic norm is less than 1 (see the IL proof of convergence). Let be the element on the diagonal of It is
Therefore, we have
The following inequality holds
Consequently,
the norm
and therefore the OL converges.
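The two-level structure of the algorithm, an outer loop each step of which runs an inner fixed-point loop to convergence, can be sketched generically; both updating maps below are illustrative contractions chosen for the example, not the paper's systems.

```python
import math

def inner_loop(g, x0, tol=1e-12, max_iter=10_000):
    # IL: iterate x <- g(x) until successive iterates stop changing.
    x = x0
    for _ in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("IL did not converge")

def outer_loop(y0, tol=1e-10, max_iter=1_000):
    # OL: each outer iterate is updated from the fixed point of an
    # inner map parameterized by the previous outer value.
    y = y0
    for _ in range(max_iter):
        x_star = inner_loop(lambda x: 0.5 * math.cos(x + y), 0.0)
        y_new = 0.5 * math.sin(x_star)  # illustrative OL updating equation
        if abs(y_new - y) < tol:
            return y_new, x_star
        y = y_new
    raise RuntimeError("OL did not converge")

y_star, x_star = outer_loop(0.0)
```

Because both maps have derivatives bounded by 0.5, the inner and outer iterations each satisfy the contraction condition and converge jointly.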
Appendix A3
Proof that the approximation of Remark 4.1 is upward
Since is the weighted least squares prediction of by using a different value of the such as we obtain where Replacing the terms with in expression (A1.5), the AAV (4.3) is inflated. The approximation implies that Finally, we emphasize that in most cases, the upward bias is slight, since the are obtained from variables that generally have very low predictive power for the values (see Section 4). In these situations So and
Appendix A4
Proof of expression (4.7)
In this case, each vector has all elements equal to zero except one element equal to 1 (corresponding to the planned population to which the unit belongs). Given the input values, the optimization procedure for Under the above assumption, is a diagonal matrix with the element given by Considering expressions (A1.2) and (A1.3) can be reformulated as, respectively,
but
as the sum of the residuals of a regression model.
Using the formulae
(A4.1) and (A4.2), expression (4.5) is given by
since
and expression (4.7) may be
obtained.
References
Bethel, J. (1989). Sample allocation in multivariate surveys. Survey Methodology, 15, 1, 47-57.
Boyd, S., and Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press.
Breidt, F.J., and Chauvet, G. (2011). Improved variance estimation
for balanced samples drawn via the cube method. Journal of Statistical Planning and Inference, 141, 479-487.
Chauvet, G., Bonnéry, D. and Deville, J.-C. (2011). Optimal
inclusion probabilities for balanced sampling. Journal of Statistical Planning and Inference, 141, 984-994.
Choudhry, G.H., Rao, J.N.K. and Hidiroglou, M.A. (2012). On sample allocation for efficient domain estimation. Survey Methodology, 38, 1, 23-29.
Chromy, J. (1987). Design optimization with multiple objectives. Proceedings of the Survey Research
Methods Section, American Statistical Association, 194-199.
Cochran, W.G. (1977). Sampling Techniques. New York: John Wiley & Sons, Inc.
Deville, J.-C., and Tillé, Y. (2004). Efficient balanced
sampling: The cube method. Biometrika,
91, 893-912.
Deville, J.-C., and Tillé, Y. (2005). Variance approximation under balanced sampling. Journal of Statistical Planning and Inference, 128, 569-591.
Dykstra, R., and Wollan, P. (1987). Finding I-projections subject to a finite set of linear inequality constraints. Applied Statistics, 36, 377-383.
Ernst, L.R. (1989). Further applications of linear programming
to sampling problems. Proceedings of the
Survey Research Methods Section, American Statistical Association, 625-631.
Falorsi,
P.D., and Righi, P. (2008). A balanced sampling approach for multi-way stratification
designs for small area estimation. Survey
Methodology, 34, 2, 223-234.
Falorsi, P.D., Orsini, D. and Righi, P. (2006). Balanced
and coordinated sampling designs for small domain estimation. Statistics in Transition, 7, 1173-1198.
Gonzalez, J.M., and Eltinge, J.L. (2010). Optimal survey design: A review. Section on Survey Research Methods – JSM 2010, October.
Isaki, C.T., and Fuller, W.A. (1982). Survey design
under a regression superpopulation model. Journal
of the American Statistical Association, 77, 89-96.
Khan, M.G.M., Mati, T. and Ahsan, M.J. (2010). An
optimal multivariate stratified sampling design using auxiliary information: An
integer solution using goal programming approach. Journal of Official Statistics, 26, 695-708.
Kokan, A., and Khan, S. (1967). Optimum allocation in
multivariate surveys: An analytical solution. Journal of the Royal Statistical Society, Series B, 29, 115-125.
Lu, W., and Sitter, R.R. (2002). Multi-way stratification
by linear programming made practical. Survey
Methodology, 28, 2, 199-207.
Nedyalkova,
D., and Tillé, Y. (2008). Optimal sampling and estimation strategies
under the linear model. Biometrika,
95, 521-537.
Tillé, Y. (2006). Sampling Algorithms. New York: Springer-Verlag.
Tillé, Y., and Favre, A.-C. (2005). Optimal allocation
in balanced sampling. Statistics and Probability
Letters, 74, 31-37.
Winkler, W.E. (2001). Multi-way survey stratification and sampling. Research Report Series, Statistics #2001-01. Statistical Research Division, U.S. Bureau of the Census, Washington, D.C. 20233.