4. Anticipated variance
Piero Demetrio Falorsi and Paolo Righi
Prior to sampling, the $y$ values are not known, so the variance expressed in formula (3.4) cannot be used for planning the sampling precision at the design phase. In practice, it is necessary either to obtain some proxy values or to predict the $y$ values based on superpopulation models that exploit auxiliary information. The increasing availability of auxiliary information (derived from the integration of administrative registers and survey frames) facilitates the use of predictions. Under model-based inference, the $y$ values are assumed to be the realization of a superpopulation model $M$.
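The model we study has the following form:

$$y_{rk} = f(\mathbf{x}_k; \boldsymbol{\beta}_r) + u_{rk}, \qquad E_M(u_{rk}) = 0, \quad E_M(u_{rk}^2) = \sigma_{rk}^2, \quad E_M(u_{rk} u_{rl}) = 0 \;\; (k \neq l) \qquad (4.1)$$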
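where $\mathbf{x}_k$ is a vector of predictors (available in the sampling frame), $\boldsymbol{\beta}_r$ is a vector of regression coefficients, $f(\mathbf{x}_k; \boldsymbol{\beta}_r)$ is a known function, $u_{rk}$ is the error term and $E_M(\cdot)$ denotes the expectation under the model. The parameters $\boldsymbol{\beta}_r$ and the variances $\sigma_{rk}^2$ are assumed to be known, although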
in practice they are usually estimated. The model (4.1) is variable-specific: different models may be used for different variables, and this does not create additional difficulty.

As a measure of uncertainty, we consider the Anticipated Variance (AV) (Isaki and Fuller 1982):

$$AV(\hat{t}_{(dr)}) = E_M E_p \left[ (\hat{t}_{(dr)} - t_{(dr)})^2 \right] \qquad (4.2)$$
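To make the definition concrete, the following minimal sketch computes the anticipated variance of the Horvitz-Thompson (HT) total in a deliberately simple setting. It assumes Poisson sampling (independent unit selections) rather than the balanced design studied in this paper, because in that case the design variance has the exact closed form $\sum_k (1/\pi_k - 1) y_k^2$, and taking the model expectation gives $\sum_k (1/\pi_k - 1)(\mu_k^2 + \sigma_k^2)$. The function names and the symbols `mu`, `sigma2` and `pi` are illustrative assumptions, not notation from the paper.

```python
from itertools import product


def ht_design_variance_poisson(y, pi):
    """Exact design variance of the HT total under Poisson sampling.

    Enumerates all 2^N possible samples, so it is only usable for tiny
    populations. It serves as a brute-force check of the closed form.
    """
    total = sum(y)
    variance = 0.0
    for sample in product([0, 1], repeat=len(y)):
        # Probability of this sample: units are selected independently.
        prob = 1.0
        for s_k, pi_k in zip(sample, pi):
            prob *= pi_k if s_k else (1.0 - pi_k)
        ht = sum(s_k * y_k / pi_k for s_k, y_k, pi_k in zip(sample, y, pi))
        variance += prob * (ht - total) ** 2
    return variance


def anticipated_variance_poisson(mu, sigma2, pi):
    """AV = E_M[V_p] under Poisson sampling (illustrative assumption).

    mu[k] is the model prediction of y_k and sigma2[k] its model variance.
    """
    return sum(
        (1.0 / p - 1.0) * (m ** 2 + s2) for m, s2, p in zip(mu, sigma2, pi)
    )


if __name__ == "__main__":
    mu = [10.0, 20.0, 30.0]       # hypothetical predicted values
    sigma2 = [4.0, 9.0, 16.0]     # hypothetical model variances
    pi = [0.5, 0.25, 0.8]         # hypothetical inclusion probabilities
    print(anticipated_variance_poisson(mu, sigma2, pi))
```

When all model variances are zero, the anticipated variance collapses to the design variance evaluated at the predicted values, which is what the brute-force enumeration checks.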
A general expression for the AV under linear models was derived by Nedyalkova and Tillé (2008). Their formulation is obtained by considering a linear function $f(\mathbf{x}_k; \boldsymbol{\beta}_r) = \mathbf{x}_k' \boldsymbol{\beta}_r$ and a unique set of auxiliary variables, $\mathbf{x}_k$, used both for predicting the $y$ values and for balancing the sample. In our context, we have introduced two distinct sets of auxiliary variables, highlighting that the variables used for prediction and for balancing can be different. The prediction variables $\mathbf{x}_k$ must be as predictive of $y$ as possible, while the balancing variables play an instrumental role in controlling the sample sizes for sub-populations.
In the context considered here, inserting the approximate variance (3.4) into equation (4.2), we obtain the approximate expression of the AV
where the
terms
in (3.4) are replaced by
By defining
the equation
(4.3) may be reformulated as
where the
third variance component of
is
and
and
are real numbers defined
respectively in equations (A1.4), (A1.7) and (A1.8) of Appendix A1.
Remark 4.1. Expression (4.5) is cumbersome but, for all practical purposes, calculations may be simplified by using a slight upward approximation, obtained by
setting
in (4.6). The proof is given in
Appendix A3. An upward approximation is a safe choice in this setting, since it avoids the risk of defining a sample size that is insufficient for the expected accuracy.
Remark 4.2. The SSRSWOR design is obtained if the planned domains define a unique partition of the population (Option 1 of the example in Section 2) and the model (4.1) is specified so that the predicted values are:
with
(for
The
becomes
where
is the
set of planned domains included in
(see
Appendix A4). Note that expression (4.7) agrees with Result 2 of Nedyalkova and Tillé (2008), except for the term
If
the expression (4.7) would approximate the variance of the HT estimator in the SSRSWOR design. The above approximation holds when the number of domains remains small compared to the overall population size $N$ and when the domain sizes are large.
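As a rough illustration of how an anticipated variance can drive sample size planning in the stratified case, the sketch below uses the standard textbook expression for the HT total under stratified simple random sampling without replacement, $\sum_h N_h (N_h / n_h - 1) \sigma_h^2$. This holds under an assumed stratum-mean model with uncorrelated errors of constant variance $\sigma_h^2$ within each stratum. It is not expression (4.7), which differs by the term discussed above, and the function names and the per-stratum precision caps `max_av_h` are hypothetical.

```python
import math


def av_stratified_srswor(N_h, n_h, sigma2_h):
    """Textbook AV of the HT total under stratified SRSWOR.

    Assumes y_k = mu_h + u_k within stratum h, with uncorrelated errors of
    variance sigma2_h, so that AV = sum_h N_h * (N_h / n_h - 1) * sigma2_h.
    """
    return sum(
        N * (N / n - 1.0) * s2 for N, n, s2 in zip(N_h, n_h, sigma2_h)
    )


def min_sample_sizes(N_h, sigma2_h, max_av_h):
    """Smallest n_h keeping each stratum's AV at or below its cap.

    Solves N * (N / n - 1) * s2 <= cap for n, i.e.
    n >= N^2 * s2 / (cap + N * s2), then rounds up and clips to [1, N].
    """
    sizes = []
    for N, s2, cap in zip(N_h, sigma2_h, max_av_h):
        n = math.ceil(N * N * s2 / (cap + N * s2))
        sizes.append(min(N, max(1, n)))
    return sizes


if __name__ == "__main__":
    N_h = [100, 400]              # hypothetical stratum sizes
    sigma2_h = [4.0, 1.0]         # hypothetical model variances
    caps = [3000.0, 7600.0]       # hypothetical precision targets
    n_h = min_sample_sizes(N_h, sigma2_h, caps)
    print(n_h, av_stratified_srswor(N_h, n_h, sigma2_h))
```

Rounding the sample size up mirrors the reasoning of Remark 4.1: erring on the side of a slightly larger variance bound, or a slightly larger sample, protects against falling short of the required accuracy.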