2. Estimation for multiple sampling frames
Guillaume Chauvet and Guylène Tandeau de Marsac
Consider a finite population $U$ on which a variable of interest $y$ is defined, taking the value $y_k$ for individual $k$. If a sample $S$ is selected from $U$ with inclusion probabilities $\pi_k$, the estimator $\hat{Y} = \sum_{k \in S} y_k / \pi_k$ proposed by Narain (1951) and Horvitz and Thompson (1952) is unbiased for the total $Y = \sum_{k \in U} y_k$ if all probabilities $\pi_k$ are strictly positive.
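
The unbiasedness property can be checked numerically on a very small case. The sketch below is only an illustration under stated assumptions: the four-unit population, its values and the simple random sampling design are hypothetical and do not come from the paper. It enumerates every possible sample and averages the Horvitz-Thompson estimates.

```python
from itertools import combinations


def horvitz_thompson(sample, y, pi):
    """Estimate the population total as the sum of y_k / pi_k over sampled units."""
    return sum(y[k] / pi[k] for k in sample)


# Hypothetical toy population (values chosen for illustration only).
y = [3.0, 5.0, 8.0, 4.0]
N, n = len(y), 2

# Simple random sampling without replacement: pi_k = n / N for every unit.
pi = [n / N] * N

# All samples of size n are equally likely, so the expectation of the
# estimator is the plain average of its value over every possible sample.
all_samples = list(combinations(range(N), n))
expected_estimate = sum(horvitz_thompson(s, y, pi) for s in all_samples) / len(all_samples)
true_total = sum(y)

print(expected_estimate, true_total)  # prints 20.0 20.0
```

Exhaustive enumeration is used instead of simulation so that the equality holds exactly rather than up to Monte Carlo error; it is only feasible for very small populations.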
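We are interested in the scenario where the population $U$ is fully covered by two overlapping sampling frames, $U_A$ and $U_B$. We use Lohr’s (2011) notation, namely $a = U_A \setminus U_B$, the domain covered by $U_A$ only; $b = U_B \setminus U_A$, the domain covered by $U_B$ only; and $ab = U_A \cap U_B$, the domain covered by both $U_A$ and $U_B$. A sample $S_A$ is selected in $U_A$ with inclusion probabilities $\pi_k^A$. For any domain $d \subset U_A$, the sub-total $Y_d = \sum_{k \in d} y_k$ is unbiasedly estimated by $\hat{Y}_d^A = \sum_{k \in S_A \cap d} d_k^A y_k$ with design weight $d_k^A = 1/\pi_k^A$. Similarly, a sample $S_B$ is selected in $U_B$ with inclusion probabilities $\pi_k^B$. For any domain $d \subset U_B$, the sub-total $Y_d$ is unbiasedly estimated by $\hat{Y}_d^B = \sum_{k \in S_B \cap d} d_k^B y_k$ with $d_k^B = 1/\pi_k^B$. The objective is to combine the samples $S_A$ and $S_B$ to obtain as accurate an estimate as possible.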
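
As a minimal sketch of this notation, the three domains can be obtained with set operations and a domain sub-total estimated by restricting the weighted sum to sampled units in that domain. The frames, the values $y_k = k$, the sample and the name `domain_estimate` are all hypothetical choices made for illustration.

```python
# Hypothetical frames, given as sets of unit labels (illustrative assumption).
U_A = {1, 2, 3, 4, 5, 6}
U_B = {5, 6, 7, 8, 9}

a = U_A - U_B    # covered by frame A only
b = U_B - U_A    # covered by frame B only
ab = U_A & U_B   # covered by both frames


def domain_estimate(sample, pi, y, domain):
    """Weighted sum of y_k / pi_k over the sampled units that fall in `domain`."""
    return sum(y[k] / pi[k] for k in sample if k in domain)


# Toy values: y_k = k, and a sample of 3 units out of 6 drawn from frame A,
# so pi_k^A = 0.5 under simple random sampling.
y = {k: float(k) for k in U_A | U_B}
pi_A = {k: 0.5 for k in U_A}
S_A = [1, 5, 6]

Y_ab_A = domain_estimate(S_A, pi_A, y, ab)   # (5 + 6) / 0.5 = 22.0
Y_a_A = domain_estimate(S_A, pi_A, y, a)     # 1 / 0.5 = 2.0
```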
2.1 Hartley estimator
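Hartley (1962) proposes the class of unbiased estimators
$$\hat{Y}_{\theta} = \hat{Y}_a^A + \theta \hat{Y}_{ab}^A + (1-\theta) \hat{Y}_{ab}^B + \hat{Y}_b^B \qquad (2.1)$$
with one parameter $\theta$ to be determined. The choice $\theta = 1/2$ gives samples $S_A$ and $S_B$ the same weight for the estimation on the intersection domain $ab$. Hartley (1962) proposes choosing the parameter $\theta$ that minimizes the variance of $\hat{Y}_{\theta}$. This leads to
$$\theta_{opt} = \frac{\mathrm{Cov}\left(\hat{Y}_{ab}^B - \hat{Y}_{ab}^A,\; \hat{Y}_a^A + \hat{Y}_{ab}^B + \hat{Y}_b^B\right)}{V\left(\hat{Y}_{ab}^A - \hat{Y}_{ab}^B\right)} \qquad (2.2)$$
which can be re-expressed as
$$\theta_{opt} = \frac{V(\hat{Y}_{ab}^B) + \mathrm{Cov}(\hat{Y}_b^B, \hat{Y}_{ab}^B) - \mathrm{Cov}(\hat{Y}_a^A, \hat{Y}_{ab}^A)}{V(\hat{Y}_{ab}^A) + V(\hat{Y}_{ab}^B)} \qquad (2.3)$$
when the samples $S_A$ and $S_B$ are independent. As noted by Lohr (2007), the optimal coefficient $\theta_{opt}$ may not lie between 0 and 1 if a covariance term in (2.3) is large. To simplify, let us assume that $\mathrm{Cov}(\hat{Y}_b^B, \hat{Y}_{ab}^B) = 0$, which is the case if $b$ and $ab$ are used as strata in the selection of $S_B$. Then $\theta_{opt} > 1$ if and only if $\mathrm{Cov}(\hat{Y}_a^A, \hat{Y}_{ab}^A) < -V(\hat{Y}_{ab}^A)$. When $S_A$ is selected by simple random sampling, this will be the case, for example, if in $U_A$ the low values of the variable $y$ are concentrated in the domain $ab$.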
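
The behaviour of the Hartley class can be illustrated by exact enumeration on a toy design. The sketch below rests on assumptions that are not in the paper: a six-unit population, two frames of four units each, and independent simple random samples of size 2 in each frame. It checks that the estimator is unbiased for any $\theta$ and computes the optimal coefficient for independent samples as $[V(\hat{Y}_{ab}^B) + \mathrm{Cov}(\hat{Y}_b^B, \hat{Y}_{ab}^B) - \mathrm{Cov}(\hat{Y}_a^A, \hat{Y}_{ab}^A)] / [V(\hat{Y}_{ab}^A) + V(\hat{Y}_{ab}^B)]$, using exact variances and covariances.

```python
from itertools import combinations, product

# Hypothetical toy setup: frames, y-values and sample sizes are assumptions.
y = {0: 2.0, 1: 4.0, 2: 6.0, 3: 1.0, 4: 5.0, 5: 3.0}
U_A, U_B = [0, 1, 2, 3], [2, 3, 4, 5]
a, ab, b = {0, 1}, {2, 3}, {4, 5}
n_A = n_B = 2
d_A = len(U_A) / n_A   # design weight 1 / pi_k^A under simple random sampling
d_B = len(U_B) / n_B   # design weight 1 / pi_k^B


def domain_estimate(sample, domain, weight):
    """Weighted sum of y_k over the sampled units that fall in `domain`."""
    return sum(weight * y[k] for k in sample if k in domain)


# Independent simple random samples: every (S_A, S_B) pair is equally likely.
# Each draw stores the four components (Y_a^A, Y_ab^A, Y_ab^B, Y_b^B).
draws = [
    (domain_estimate(s_A, a, d_A), domain_estimate(s_A, ab, d_A),
     domain_estimate(s_B, ab, d_B), domain_estimate(s_B, b, d_B))
    for s_A, s_B in product(combinations(U_A, n_A), combinations(U_B, n_B))
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def cov(i, j):
    """Exact covariance between components i and j over all possible draws."""
    m_i, m_j = mean(d[i] for d in draws), mean(d[j] for d in draws)
    return mean((d[i] - m_i) * (d[j] - m_j) for d in draws)


def hartley(d, theta):
    """Hartley estimator for one draw of the four domain estimates."""
    return d[0] + theta * d[1] + (1 - theta) * d[2] + d[3]


def hartley_variance(theta):
    m = mean(hartley(d, theta) for d in draws)
    return mean((hartley(d, theta) - m) ** 2 for d in draws)


# Optimal coefficient for independent samples (formula given in the lead-in).
theta_opt = (cov(2, 2) + cov(3, 2) - cov(0, 1)) / (cov(1, 1) + cov(2, 2))
```

In this toy case `hartley_variance(theta_opt)` is no larger than the variance obtained with $\theta = 1/2$ or any other value, since the variance is a quadratic function of $\theta$.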
In practice, the variance and covariance terms are unknown and must be replaced by estimators, which introduces additional variability. Another disadvantage is that the optimal parameter depends on the variable of interest considered. If optimal estimators are calculated for different variables of interest, the resulting estimates may be internally inconsistent (Lohr 2011).
2.2 Kalton and Anderson estimator
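A more general class of estimators is obtained by noting that the total $Y$ can be re-expressed as
$$Y = \sum_{k \in a} y_k + \sum_{k \in ab} \theta_k y_k + \sum_{k \in ab} (1-\theta_k) y_k + \sum_{k \in b} y_k$$
with $\theta_k$ a coefficient specific to the individual $k$. Kalton and Anderson (1986) propose the choice $\theta_k = \pi_k^A / (\pi_k^A + \pi_k^B)$, which leads to the estimator
$$\hat{Y}_{KA} = \sum_{k \in S_A} m_k^A d_k^A y_k + \sum_{k \in S_B} m_k^B d_k^B y_k$$
with on one hand $m_k^A = 1$ if $k \in a$ and $m_k^A = \pi_k^A / (\pi_k^A + \pi_k^B)$ if $k \in ab$, and on the other hand $m_k^B = 1$ if $k \in b$ and $m_k^B = \pi_k^B / (\pi_k^A + \pi_k^B)$ if $k \in ab$. The estimation weights are the same regardless of the variable of interest, which guarantees internal consistency of the estimates; on the other hand, the Kalton and Anderson estimator is less efficient than Hartley’s optimal estimator for a given variable of interest. Note that it is a Hansen-Hurwitz (1943) type estimator, which can be re-expressed as
$$\hat{Y}_{KA} = \sum_{k \in U} \frac{W_k}{E(W_k)} y_k$$
where $W_k$ denotes the number of times unit $k$ is selected in the pooled sample $S_A \cup S_B$. In particular, this gives $E(W_k) = \pi_k^A + \pi_k^B$ for $k \in ab$.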
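
The equivalence between the weighted form and the Hansen-Hurwitz form can be checked on a toy design. The following sketch assumes a hypothetical six-unit population with independent simple random samples of size 2 in frame A and size 1 in frame B; the function names are illustrative and not taken from the paper.

```python
from itertools import combinations, product

# Hypothetical toy setup: frames, y-values and sample sizes are assumptions.
y = {0: 2.0, 1: 4.0, 2: 6.0, 3: 1.0, 4: 5.0, 5: 3.0}
U_A, U_B = [0, 1, 2, 3], [2, 3, 4, 5]
n_A, n_B = 2, 1
pi_A = {k: n_A / len(U_A) for k in U_A}   # 0.5 for every unit of frame A
pi_B = {k: n_B / len(U_B) for k in U_B}   # 0.25 for every unit of frame B


def m_A(k):
    """Adjustment for a unit sampled from frame A: 1 in a, pi_A/(pi_A+pi_B) in ab."""
    return pi_A[k] / (pi_A[k] + pi_B[k]) if k in pi_B else 1.0


def m_B(k):
    """Adjustment for a unit sampled from frame B: 1 in b, pi_B/(pi_A+pi_B) in ab."""
    return pi_B[k] / (pi_A[k] + pi_B[k]) if k in pi_A else 1.0


def kalton_anderson(s_A, s_B):
    """Weighted form: adjusted design weights applied within each sample."""
    return (sum(m_A(k) * y[k] / pi_A[k] for k in s_A)
            + sum(m_B(k) * y[k] / pi_B[k] for k in s_B))


def hansen_hurwitz_form(s_A, s_B):
    """Same estimator written with the selection counts W_k and E(W_k)."""
    total = 0.0
    for k in y:
        w_k = (k in s_A) + (k in s_B)                       # times k was selected
        expected_w_k = pi_A.get(k, 0.0) + pi_B.get(k, 0.0)  # E(W_k)
        total += w_k * y[k] / expected_w_k
    return total


# Every (S_A, S_B) pair is equally likely under independent simple random sampling.
pairs = list(product(combinations(U_A, n_A), combinations(U_B, n_B)))
expected_estimate = sum(kalton_anderson(s_A, s_B) for s_A, s_B in pairs) / len(pairs)
```

Unequal sampling fractions are used on purpose: with equal inclusion probabilities in both frames, the Kalton and Anderson weights would coincide with those of the Hartley estimator with $\theta = 1/2$.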
2.3 Bankier estimator
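Bankier (1986) proposes using a Horvitz-Thompson type estimator, calculating the inclusion probabilities in the pooled sample $S_A \cup S_B$. If the samples $S_A$ and $S_B$ are independent, we get $\pi_k = \pi_k^A + \pi_k^B - \pi_k^A \pi_k^B$ for $k \in ab$ and the estimator
$$\hat{Y}_{Bank} = \sum_{k \in S_A \cap a} \frac{y_k}{\pi_k^A} + \sum_{k \in (S_A \cup S_B) \cap ab} \frac{y_k}{\pi_k^A + \pi_k^B - \pi_k^A \pi_k^B} + \sum_{k \in S_B \cap b} \frac{y_k}{\pi_k^B}$$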
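
A minimal sketch of this pooled-sample approach is given below, under assumptions that are not in the paper: a hypothetical six-unit population and independent simple random samples of size 2 in frame A and size 1 in frame B. A unit of the overlap domain selected in both samples is counted only once, and its inclusion probability in the pooled sample is $\pi_k^A + \pi_k^B - \pi_k^A \pi_k^B$.

```python
from itertools import combinations, product

# Hypothetical toy setup: frames, y-values and sample sizes are assumptions.
y = {0: 2.0, 1: 4.0, 2: 6.0, 3: 1.0, 4: 5.0, 5: 3.0}
U_A, U_B = [0, 1, 2, 3], [2, 3, 4, 5]
n_A, n_B = 2, 1
pi_A = {k: n_A / len(U_A) for k in U_A}   # 0.5 for every unit of frame A
pi_B = {k: n_B / len(U_B) for k in U_B}   # 0.25 for every unit of frame B


def pooled_inclusion_probability(k):
    """P(k in S_A or S_B) for independent samples; reduces to pi_A or pi_B outside ab."""
    p_A, p_B = pi_A.get(k, 0.0), pi_B.get(k, 0.0)
    return p_A + p_B - p_A * p_B


def bankier(s_A, s_B):
    """Horvitz-Thompson estimator on the distinct units of the pooled sample."""
    pooled = set(s_A) | set(s_B)
    return sum(y[k] / pooled_inclusion_probability(k) for k in pooled)


# Every (S_A, S_B) pair is equally likely under independent simple random sampling.
pairs = list(product(combinations(U_A, n_A), combinations(U_B, n_B)))
expected_estimate = sum(bankier(s_A, s_B) for s_A, s_B in pairs) / len(pairs)

# Empirical check of the pooled inclusion probability for unit 2, which is in ab.
freq_unit_2 = sum(1 for s_A, s_B in pairs if 2 in s_A or 2 in s_B) / len(pairs)
```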