1. Introduction

Piero Demetrio Falorsi and Paolo Righi


Surveys conducted in the context of official statistics commonly produce a large number of estimates relating to both different parameters of interest and highly detailed estimation domains. When the domain indicator variables are available for each sampling unit in the sampling frame, the survey designer can attempt to select a sample in which the size for each domain is fixed. Direct estimates can then be obtained for each domain, and sampling errors at the domain level can be controlled. We present a unified and general framework for defining the optimal inclusion probabilities for single-stage sampling designs when the domain membership variables are known at the design stage. This is probably the most common scenario in establishment surveys, and it also arises in other contexts, such as agricultural surveys or social surveys in which the domains are geographical (e.g., type of municipality, region, province, etc.). The growing integration of administrative registers and survey frames may further increase the applicability of the approach to social surveys. The proposal may also be useful for planning an optimal second-phase survey if the domain membership variables have been collected during the first phase.

The problem of defining optimal sampling designs has been addressed in some recent papers. Gonzalez and Eltinge (2010) present an interesting overview of the approaches for defining optimal sampling strategies. The optimization problem is usually dealt with in stratified sampling designs with a fixed sample size in each stratum. The optimal allocation in stratified sampling for a univariate population is well known in the sampling literature (Cochran 1977). In multivariate cases, where more than one characteristic is to be measured on each sampled unit, the optimal allocation for individual characteristics is of little practical use unless the various characteristics under study are highly correlated. This is because an allocation that is optimal for one characteristic is generally far from optimal for the others. The multidimensionality of the problem leads to the definition of a compromise allocation method (Khan, Mati and Ahsan 2010), with a loss of precision compared to the individual optimal allocations. Several authors have discussed various criteria for obtaining a feasible compromise allocation; see, e.g., Kokan and Khan (1967), Chromy (1987), Bethel (1989), Falorsi and Righi (2008), Falorsi, Orsini and Righi (2006) and Choudhry, Rao and Hidiroglou (2012).
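To make the conflict between individual optimal allocations concrete, the following minimal sketch applies the classical Neyman allocation, n_h = n N_h S_h / sum_h N_h S_h, separately to two characteristics. The stratum sizes, standard deviations, total sample size and function name are invented for illustration and are not taken from the paper; finite population corrections and the cap n_h <= N_h are ignored.

```python
def neyman_allocation(n, sizes, sds):
    """Allocate a total sample size n across strata in proportion to N_h * S_h.

    sizes: stratum population sizes N_h (hypothetical values below)
    sds:   stratum standard deviations S_h of one characteristic
    Returns unrounded stratum sample sizes; no cap at N_h is applied.
    """
    weights = [N_h * S_h for N_h, S_h in zip(sizes, sds)]
    total = sum(weights)
    return [n * w / total for w in weights]


# Hypothetical population with three strata and two characteristics.
sizes = [1000, 500, 100]
sd_y1 = [1.0, 2.0, 5.0]   # y1 varies most in the smallest stratum
sd_y2 = [4.0, 1.0, 1.0]   # y2 varies most in the largest stratum

alloc_y1 = neyman_allocation(300, sizes, sd_y1)
alloc_y2 = neyman_allocation(300, sizes, sd_y2)

print("optimal for y1:", [round(a, 1) for a in alloc_y1])
print("optimal for y2:", [round(a, 1) for a in alloc_y2])
```

Under these assumed values the two allocations differ markedly across strata, which is the situation that motivates a compromise allocation.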

Recently, some papers have focused on finding optimal inclusion probabilities in balanced sampling (Tillé and Favre 2005; Chauvet, Bonnéry and Deville 2011), a general class of sampling designs that includes stratified sampling designs as special cases. In particular, Chauvet et al. (2011) propose the adoption of the fixed point algorithm for defining the optimal inclusion probabilities. Nevertheless, the above-mentioned papers do not address the case in which the balancing variables depend on the inclusion probabilities, and they present only a partial solution to the problem that the sampling variance is an implicit function of the inclusion probabilities. Choudhry et al. (2012) propose an optimal allocation algorithm for domain estimates in stratified sampling (if the estimation domains do not cut across the strata). Their algorithm represents a special case of the approach proposed herein. The methodological setting illustrated here is a substantial improvement over the earlier version of the methodology described in Falorsi and Righi (2008), which only accounted for the case in which the values of the variables of interest were known and the measure of accuracy was expressed by the design variance; furthermore, the previous version did not consider the fact that the design variance, bounded in the optimization problem, is an implicit function of the inclusion probabilities. This paper studies the more realistic case in which the variables of interest are not known and must be estimated. Moreover, it explicitly deals with the problem that the anticipated variances are implicit functions of the inclusion probabilities. The new optimization algorithm can be easily performed because it is based on a general decomposition of the measure of accuracy.

We propose a general sampling design that includes most of the single-stage sampling designs adopted in actual surveys, e.g., simple random sampling without replacement (SRSWOR), stratified SRSWOR, stratified PPS and designs with incomplete stratification. The framework is based on the joint use of balanced sampling designs (Deville and Tillé 2004), which represent a wide-ranging class of sampling designs depending on how the balancing equations are defined, and superpopulation models for predicting the unknown values of the variables of interest.

The paper is structured as follows. Section 2 introduces definitions and notation. Sections 3 and 4 illustrate the sampling design and the anticipated variance. The algorithm for defining the optimal inclusion probabilities is described in Section 5. In Section 6, some experiments based on real business data show the empirical properties of the algorithm. The conclusions are given in Section 7.

