Variance estimation in multi-phase calibration
Section 1. Introduction

Survey statistics uses auxiliary information, in the form of known population totals, to improve survey estimates. A calibration estimator uses calibrated weights that are as close as possible, according to a given distance measure, to the initial sampling design weights, while also satisfying a set of constraints induced by the auxiliary information. In the multi-phase setting considered here, arbitrary sampling designs are allowed at every phase of sampling, and auxiliary information available at any phase can be incorporated in the estimation process.
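To make the idea of calibration concrete, the following is a minimal sketch of the familiar single-phase case only, assuming the generalized least squares (chi-square) distance sum((w_k - d_k)^2 / d_k) with unit scale factors. Under that assumption the calibrated weights have the closed form w_k = d_k (1 + x_k' lambda), where lambda solves (sum d_k x_k x_k') lambda = X - sum d_k x_k and X is the vector of known totals. The function names `solve_linear` and `gls_calibrate` and the toy data are hypothetical and used for illustration; this is not the multi-phase method developed in this paper.

```python
from typing import List, Sequence


def solve_linear(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("auxiliary cross-product matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def gls_calibrate(
    d: Sequence[float],
    x: Sequence[Sequence[float]],
    totals: Sequence[float],
) -> List[float]:
    """Single-phase GLS calibration (assumed distance: sum (w - d)^2 / d).

    d      : design weights of the sampled units
    x      : auxiliary vectors of the sampled units
    totals : known population totals of the auxiliary variables
    """
    p = len(totals)
    # Weighted cross-product matrix: sum_k d_k x_k x_k'.
    t = [
        [sum(dk * xk[i] * xk[j] for dk, xk in zip(d, x)) for j in range(p)]
        for i in range(p)
    ]
    # Design-weighted estimates of the auxiliary totals: sum_k d_k x_k.
    est = [sum(dk * xk[i] for dk, xk in zip(d, x)) for i in range(p)]
    lam = solve_linear(t, [totals[i] - est[i] for i in range(p)])
    # Calibrated weights: w_k = d_k * (1 + x_k' lambda).
    return [dk * (1.0 + sum(l * v for l, v in zip(lam, xk))) for dk, xk in zip(d, x)]


if __name__ == "__main__":
    # Made-up toy data: an intercept and one auxiliary variable.
    d = [10.0, 10.0, 20.0, 20.0]
    x = [[1.0, 2.0], [1.0, 3.0], [1.0, 5.0], [1.0, 4.0]]
    known_totals = [62.0, 250.0]
    w = gls_calibrate(d, x, known_totals)
    print("calibrated weights:", w)
    print("calibrated totals:",
          [sum(wk * xk[i] for wk, xk in zip(w, x)) for i in range(2)])
```

The calibrated totals reproduce the known totals up to rounding, and when the design-weighted estimates already equal the known totals the weights are left unchanged.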

Multi-phase sampling combined with calibration to known auxiliary information is a powerful and cost-effective technique. Calibration itself has been studied extensively, but among multi-phase designs only the special case of two phases has been investigated in detail. Rao (1973) and Cochran (1977, chapter 12) provided the basic results for stratification and non-response in two-phase sampling. A detailed framework for the linear weighting approach in two-phase sampling appears in Särndal, Swensson and Wretman (1992, chapter 9). Other estimation procedures were investigated for important sampling designs, such as cases in which the second-phase sample has been restratified using information gathered from the first-phase sample (Binder, Babyak, Brodeur, Hidiroglou and Jocelyn 2000). Variance estimation has been an active area of research, with approaches including the linearization method presented in Binder (1996), the jackknife (Kott and Stukel 1997) and other replication procedures (Rao and Shao 1992; Fuller 1998; Kim, Navarro and Fuller 2006). More closely related to our work, Breidt and Fuller (1993) gave efficient estimation procedures for three-phase sampling in the presence of auxiliary information, and Hidiroglou and Särndal (1998) studied the use of auxiliary information in two-phase sampling while allowing a minor modification of the distance function that results in additive calibration factors (also known as g-factors) rather than multiplicative ones. A common characteristic of these results is that the calibrated weights of the last phase are presented in terms of the calibrated weights of the previous phases. This is a major drawback: the weights of all earlier phases must be computed in order to obtain those of later ones, which makes it difficult to provide a well-established methodology for estimating the variance of calibrated estimators in designs with more than two phases.

To address this problem, we use the modification of the generalized least squares (GLS) distance function introduced by Hidiroglou and Särndal (1998) to obtain a presentation of the vector of multi-phase calibrated weights that is expressed solely through the initial weights based on the sampling design and does not include g-factors. From this presentation we construct multi-phase calibrated estimators that have the form of multivariate regression estimators, which in turn makes it possible to derive a general formula for a consistent estimator of the variance of multi-phase calibrated estimators that holds for any number of phases of calibration. A comparison in the relatively simple case of two phases, where an alternative formula for a variance estimator exists in the literature, shows that the two estimators differ fundamentally in form and interpretation. It is important to note that in that specific case the proposed variance estimator is neither superior nor inferior in terms of its bias or variance, though it has some other favorable characteristics, which are discussed in Section 3.2. The main goal of this paper, however, is not to prove superiority in the two-phase case but to introduce an alternative approach under which the new presentation of the calibrated weights yields a closed-form formula for an estimator of the variance of multi-phase calibrated estimators that holds for any number of phases.

The paper is organized as follows. Section 2 sets up the notation, which is very similar to that used by Hidiroglou and Särndal (1998). Section 3 provides the methodology; the special cases of two-phase and three-phase calibration are treated in more detail in Section 3.2. In Section 4 we present a simulation study to demonstrate some characteristics of the new approach. Finally, in Section 5 we state our concluding remarks and suggest some areas for future study.

