4 Proposed methods
Pierre Lavallée and Sébastien Labelle-Blanchet
The methods proposed in this section for reducing the
variance of the estimates are mainly based on the use of weighted links for the
computation of the estimates of under Indirect Sampling. We will therefore use
estimator (2.9), rather than estimator (2.3). A first set of methods is based
on the use of weighted links that are proportional to some measure of size
for the establishments. The second set of methods uses the optimal solutions
presented in Section 2.2, under different assumptions. Finally, the last set of
methods considers the use of the exact selection probabilities, rather than the
estimation weights obtained from the GWSM, under two sampling scenarios.
4.1 Methods based on the use of weighted links
Method 1: proportional to
We first propose to reduce the variance (2.11) by
setting proportional to Formally, this can be written as . In business surveys, because
stratification is usually done by size (according to some size measure),
setting proportional to can be viewed as assigning large weights to
links of large establishments, and small weights to small ones, which is a
natural approach.
With this method, we have Because of the many-to-one correspondence
between and we obtain Therefore, from (2.8), we have
(4.1)
Using (4.1), we can rewrite estimator (2.9) as
(4.2)
where
(4.3)
for and It should be noted that if all establishments of a given enterprise belong to the same
stratum say, we have and estimator (4.2) is then equivalent to
estimator (2.1) (and (2.3)).
For computing the variance of we use formula (2.11) with the values (4.3).
For the example of Section 3, we get 439,111, which is a strong
reduction compared to 1,115,111,
but still relatively far from 80,480.
Method 2: proportional to some size measure
We propose to reduce the variance (2.11) by setting proportional to some size measure correlated with the variable of interest We assume that variable is available for all establishments This variable could be used, for instance, to
stratify the sampling frame by size. As for Method 1, setting can be viewed as assigning large weights to
links of large establishments, and small weights to small ones, which again is
a natural approach. With this method, we have
We have because of the many-to-one correspondence
between and Therefore, from (2.8), we have
(4.4)
Using (4.4), we can rewrite (2.9) as
(4.5)
where
(4.6)
for and
To compute the variance of we use formula (2.11) together with (4.6). For
the example of Section 3, the variable corresponds to the number of employees (see
Figure 3.1). The correlation between the revenue and the number of employees is relatively high (92.8%).
We obtain 686,540, which is again a
strong reduction compared to 1,115,111,
but still relatively far from 80,480.
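As a minimal sketch of the idea behind Methods 1 and 2, the snippet below builds weighted links proportional to a size measure and uses them in a weighted estimate. It assumes, as is usual with the GWSM, that the links are standardised so that they sum to one within each enterprise. The function names, the toy data and the selection probabilities are hypothetical and do not reproduce the example of Section 3.

```python
from collections import defaultdict

def proportional_links(enterprise_of, size):
    """Standardised weighted links proportional to a size measure.

    enterprise_of[j] is the enterprise owning establishment j (many-to-one);
    size[j] is the size measure of establishment j (assumed positive).
    Returns theta[j] = size[j] / (total size of the owning enterprise), so the
    links of each enterprise sum to one (assumed standardisation).
    """
    total = defaultdict(float)
    for j, i in enterprise_of.items():
        total[i] += size[j]
    return {j: size[j] / total[i] for j, i in enterprise_of.items()}

def weighted_link_estimate(sample, pi, theta, enterprise_of, y):
    """Sum, over sampled establishments j, of theta[j] * y[enterprise of j] / pi[j]."""
    return sum(theta[j] * y[enterprise_of[j]] / pi[j] for j in sample)

# Hypothetical toy data: establishments a and b belong to enterprise E1, c to E2.
enterprise_of = {"a": "E1", "b": "E1", "c": "E2"}
size = {"a": 30.0, "b": 10.0, "c": 5.0}   # e.g., number of employees
pi = {"a": 1.0, "b": 0.5, "c": 0.5}       # establishment selection probabilities
y = {"E1": 400.0, "E2": 50.0}             # enterprise-level variable of interest

theta = proportional_links(enterprise_of, size)
estimate = weighted_link_estimate(["a", "c"], pi, theta, enterprise_of, y)
```

Because the links of each enterprise sum to one, the expectation of this estimate over the sampling design equals the enterprise total, whatever size measure is chosen; the choice only affects the variance.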
Method 3: proportional to the variable of interest
The third method proposed is to reduce the variance
(2.11) by setting proportional to the variable of interest measured for the establishment belonging to enterprise Obviously, setting assigns large weights to links of large
establishments, and small weights to small ones, which again is a natural
approach. Because is unknown at the beginning of the survey,
this method might not appear implementable since depends on Now, because of the many-to-one correspondence
between and every quantity entering in is measured through the Indirect Sampling
process.
The proposed method is feasible in this setting and we
have The weights are directly given by (4.4), by replacing by Estimator obtained from (2.9) reduces to
(4.7)
which is simply estimator (3.1) obtained
from classical sampling theory.
Note that in general, this method requires one set of
weighted links per variable of interest One solution would be to restrict the
determination of the weighted links to a few key
variables of interest, each associated with a larger set of correlated
covariates. However, in the present situation, such a restriction is not
necessary because estimator (4.7) corresponds simply to estimator (3.1).
Indeed, in the end, we obtain estimation weights that simply correspond to the
sampling weights.
For computing the variance of (4.7), we simply use
formula (3.2). For the example of Section 3, we obtain 80,480: this is a very
large reduction compared to 1,115,111.
4.2 Methods using weak-optimal weighted links
Method 4: Using weak-optimal weighted links under stratified SRSWoR
This method uses the weak-optimal weighted links of Deville and Lavallée (2006) described in
Section 2.2. As mentioned earlier, these are obtained by minimising the
variance (2.11) for a very specific choice of variable of interest: for an enterprise of and for all other enterprises of The resulting weak-optimal weighted links do
not involve the variable per se. Writing the values of involves expressions that can be cleverly
expressed in matrix notation. Using summations, the expressions become much
more complicated, because they involve a mixture of the joint selection
probabilities of establishments and that can belong to the same stratum, or not.
Let us define the square matrix of size where Let be the inverse of matrix i.e.,
Let be the square submatrix of containing all elements (establishments) belonging to enterprise Following Deville and Lavallée (2006), we have
Unfortunately, for the present case, the many-to-one
correspondence between and does not help further in obtaining a simpler
form for
Note that if an enterprise contains an establishment in the take-all stratum 1,
we have and the matrix is singular. In this case, the chosen solution
is to set for the take-all establishment of enterprise and for the other establishments of enterprise This means that and for this means that the complete value will be assigned to establishment that contributes 0 to the variance.
We have
(4.8)
Using (4.8), we can rewrite estimator (2.9) as
(4.9)
where
(4.10)
for and
To compute the variance of we use formula (2.11) with the values (4.10).
For the example of Section 3, we get 23,111, which is a
tremendous reduction of variance compared to both 1,115,111 and 80,480.
Method 5: Using weak-optimal weighted links under Poisson Sampling
In the context of business surveys, Poisson Sampling
selects sample by going through the establishments of population and selecting establishment if where The selection probabilities are simply given
by for and the resulting realised stratum sample size
is random. In this context, this sampling
design can also be seen as stratified Bernoulli Sampling (see Särndal, Swensson
and Wretman 1992).
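The selection mechanism just described can be sketched as follows. This is an illustration only: it assumes that each establishment is compared against an independent uniform random number and that the selection probability of an establishment is the sampling fraction of its stratum. The function name and the toy frame are hypothetical.

```python
import random

def poisson_sample(stratum_of, fraction, rng):
    """Stratified Bernoulli (Poisson) selection.

    stratum_of[j] is the stratum of establishment j and fraction[h] is the
    selection probability used in stratum h. Each establishment is selected
    independently, so the realised sample size per stratum is random.
    """
    sample = []
    for j, h in stratum_of.items():
        # rng.random() is uniform on [0, 1): a fraction of 1 always selects,
        # a fraction of 0 never does.
        if rng.random() < fraction[h]:
            sample.append(j)
    return sample

# Hypothetical frame: one take-all stratum (1) and one take-some stratum (2).
stratum_of = {"a": 1, "b": 2, "c": 2, "d": 2}
fraction = {1: 1.0, 2: 0.5}
sample = poisson_sample(stratum_of, fraction, random.Random(2013))
```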
Poisson Sampling (or stratified Bernoulli Sampling) is a
very simple sampling design. As can be noted, the selection of each
establishment of is done independently from one establishment
to another. This means that the joint selection probability of two different establishments and of is simply given by By conditioning on it can be shown that stratified Bernoulli
Sampling corresponds to stratified SRSWoR. The estimator to be used with
stratified Bernoulli Sampling is the ratio estimator
(4.11)
The variance of estimator (4.11) is approximately
given by formula (2.11) (see Brewer and Hanif 1983). Because of the relative
closeness between the two designs, assuming Poisson Sampling can be a
reasonable approach for computing the weak-optimal weighted links
The weak-optimal weighted links are obtained by computing as in Method 4, but assuming that sample
selection is done using Poisson Sampling. This assumption significantly
simplifies the calculations because the matrix then becomes a diagonal matrix, which is easy
to invert. Because of the many-to-one correspondence between and we obtain after the minimisation process
(4.12)
where Therefore, from (2.8), we have
(4.13)
Using (4.13), we can rewrite (2.9) as
(4.14)
where
(4.15)
for and Note that the previous results assume that for all establishments of For the case where for a given establishment of an enterprise we set and for For computing the variance of we use formula (2.11) with the values
given by (4.15).
For the example of Section 3, we get 22,857. Again, this is a
very large reduction of variance compared to both 1,115,111 and 80,480.
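The simplification brought by the diagonal matrix can be illustrated with a small sketch. It assumes that, for a single enterprise, the quantity being minimised under Poisson Sampling is the sum over its establishments of the squared link times (1 - p)/p, where p is the selection probability of the establishment, and that the links must sum to one. Under these assumptions, a Lagrange-multiplier argument gives links proportional to p/(1 - p). The code is an illustration of this argument, not a transcription of (4.12); the function names are hypothetical, and picking the first take-all establishment when there are several is an arbitrary choice made here.

```python
def weak_optimal_links_poisson(pi):
    """Links for ONE enterprise; pi maps establishment -> selection probability.

    Assumes 0 < pi[j] <= 1. If an establishment is selected with certainty,
    it receives the whole link and the others receive zero.
    """
    take_all = [j for j, p in pi.items() if p >= 1.0]
    if take_all:
        chosen = take_all[0]  # arbitrary choice if there are several
        return {j: 1.0 if j == chosen else 0.0 for j in pi}
    odds = {j: p / (1.0 - p) for j, p in pi.items()}
    total = sum(odds.values())
    return {j: o / total for j, o in odds.items()}

def link_variance_term(theta, pi):
    """Assumed objective: sum of theta_j^2 * (1 - pi_j) / pi_j."""
    return sum(theta[j] ** 2 * (1.0 - pi[j]) / pi[j] for j in pi)

# Hypothetical enterprise with two establishments in take-some strata.
pi = {"a": 0.5, "b": 0.2}
theta_opt = weak_optimal_links_poisson(pi)
theta_equal = {"a": 0.5, "b": 0.5}
term_opt = link_variance_term(theta_opt, pi)
term_equal = link_variance_term(theta_equal, pi)
```

In this toy case the optimised links give a smaller value of the assumed objective than equal links do, and the objective drops to zero as soon as one establishment of the enterprise is in a take-all stratum.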
Method 6: Using weak-optimal weighted links under Poisson Sampling of grouped-establishments
This method once again uses the weak-optimal
weighted links of Deville and Lavallée (2006) described in Section 2.2, but
with grouped-establishments. As a first step, we build grouped-establishments
in the population where a grouped-establishment consists of all establishments that are part
of the same stratum and that belong to the same enterprise This creates a new population containing grouped-establishments. The sample of grouped-establishments contains all
grouped-establishments formed from the establishments of sample The selection probability of the
grouped-establishment is given by
(4.16)
for where is the number of establishments within the
grouped-establishment
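A selection probability of this kind can be sketched numerically. The snippet assumes that a grouped-establishment is selected as soon as at least one of its establishments enters the SRSWoR sample of its stratum, so that its selection probability is one minus the hypergeometric probability of selecting none of them. The function and argument names are hypothetical, and the expression is offered as an illustration rather than as a transcription of (4.16).

```python
from math import comb

def group_selection_probability(N_h, n_h, M):
    """P(at least one of M establishments is in an SRSWoR of n_h out of N_h).

    N_h: stratum population size, n_h: stratum sample size,
    M: number of establishments in the grouped-establishment.
    """
    if not (0 <= n_h <= N_h and 0 <= M <= N_h):
        raise ValueError("need 0 <= n_h <= N_h and 0 <= M <= N_h")
    # comb(N_h - M, n_h) counts the samples that miss all M establishments;
    # it is 0 when fewer than n_h other establishments remain.
    return 1.0 - comb(N_h - M, n_h) / comb(N_h, n_h)

p_single = group_selection_probability(10, 3, 1)    # reduces to n_h / N_h
p_pair = group_selection_probability(10, 3, 2)
p_take_all = group_selection_probability(4, 4, 2)   # take-all stratum
```

With a single establishment in the group, the expression reduces to the sampling fraction of the stratum, and it equals one in a take-all stratum.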
The rationale behind the use of grouped-establishments
is to have only one unit per stratum belonging to a given enterprise. Because,
by construction, the grouped-establishments of an enterprise belong to different strata, their selection is
done independently from one grouped-establishment to another. This implies that
the solution to weak optimality is then similar to the one obtained in Method
5 for Poisson Sampling, but with grouped-establishments. Therefore, we have
(4.17)
where and is the number of grouped-establishments
contained in enterprise
The use of grouped-establishments can be seen as an
intermediate step in the Indirect Sampling process going from population to population That is, the Indirect Sampling process goes
from population to population and then from population to population In the present case, we have for all establishments. Following the rules of
transitivity defined by Deville and Lavallée (2006), we can show that the
weak-optimal weighted links for and (and thus, are given by
(4.18)
Therefore, from (2.8), we have
(4.19)
Using (4.19), we can rewrite (2.9) as
(4.20)
where
(4.21)
for and Note that the previous results assume that for all grouped-establishments of For the case where for a given grouped-establishment of an enterprise we set for all the establishments of this grouped-establishment and for all other establishments not part of the
grouped-establishment We have when at least one establishment belonging to has For computing the variance of we use formula (2.11) with the
values (4.21).
For the example of Section 3, we get 23,000. Again, this is a
very significant reduction of variance compared to both 1,115,111 and 80,480.
4.3 Other methods
Method 7: Using a designated establishment
As mentioned before, the rationale behind the use of
grouped-establishments in Method 6 is to have only one unit per stratum
belonging to a given enterprise. Using a similar idea, one can decide on a
single establishment that will represent the complete enterprise. That is, for
each enterprise belonging to we identify one establishment of that will be used for the selection of its
owning enterprise. A natural choice for the designated
establishment within the enterprise is the one with the largest value for a
given variable For example, can be the establishment's revenue.
Choosing a designated establishment yields a new
sampling frame that contains the same number of units as the
target population i.e.,
Since there is a one-to-one correspondence
between the designated establishment and its owning enterprise, the designated
establishment of enterprise may also be labelled using The new frame can keep the same stratification definition as
the original frame That is, if the stratification of was done by province and industrial classes
based on the establishments' values, the stratification of is done by the same classes based on the
designated establishments' values.
From the sampling frame we select a sample of designated establishments with stratified
SRSWoR by using sampling fractions equal to the original ones, i.e., for The estimation of the total is obtained using the following estimator:
(4.22)
It can be shown that estimator (4.22) is unbiased,
and its variance is given by
(4.23)
where and
Note that although we only keep one designated
establishment per enterprise, we are still able to produce estimates per
stratum, or for any domain of interest (e.g.,
different industrial activities). For example, let us consider the small
example of Section 3. For the first enterprise of (i.e.,
the one with a total revenue of 2,400), the designated establishment would be
the first establishment of the take-all stratum of (i.e.,
the establishment with 25 employees). None of the three other establishments of
this enterprise would be available for sampling. However, if we were interested
in producing an estimate for the second stratum, we would simply restrict the
computation of the values in (4.22) to the establishments belonging to
this second stratum. In the present case, rather than using 2,400
in (4.22), we would then use 300.
This corresponds to domain estimation (Särndal et al. 1992).
For the example of Section 3, we obtain 1,820,000! With this
method, since an establishment inherits all revenues of the enterprise, the use
of a designated establishment is advantageous when this establishment is in the
take-all stratum. However, the designated establishment may itself be contained
in a take-some stratum, and this results in a stratum that is even more skewed.
The total revenue of the enterprise, multiplied by the sampling weight, is
assigned to this take-some stratum, and this increases the variance
significantly.
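The construction of the frame of designated establishments can be sketched as below. The sketch assumes that the designated establishment is the one with the largest value of the chosen variable within each enterprise, with ties resolved in favour of the first establishment encountered (an arbitrary choice made here). The names and the toy data are hypothetical.

```python
def designated_establishments(enterprise_of, x):
    """Return, for each enterprise, its establishment with the largest x.

    enterprise_of[j] is the enterprise owning establishment j;
    x[j] is the variable used to choose the designated establishment.
    """
    best = {}
    for j, i in enterprise_of.items():
        # Strict inequality keeps the first establishment in case of ties.
        if i not in best or x[j] > x[best[i]]:
            best[i] = j
    return best

# Hypothetical data: two enterprises, three establishments.
enterprise_of = {"a": "E1", "b": "E1", "c": "E2"}
revenue = {"a": 100.0, "b": 250.0, "c": 40.0}
frame = designated_establishments(enterprise_of, revenue)
```

The resulting frame has exactly one unit per enterprise, which is the one-to-one correspondence the method relies on.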
Method 8: Using the selection probabilities of the enterprises
As mentioned in Lavallée (2002, 2007), using the Rao-Blackwell
theorem, sufficient statistics can improve an existing estimator by producing a
new estimator with a mean squared error that is smaller than or equal to that
of the initial estimator (see Cassel, Särndal and Wretman 1977). Note that this
form of improvement was used, for instance, by Thompson (1990) in the context
of Adaptive Cluster Sampling.
Starting from estimator (2.1) (or (2.3)), the estimator obtained by using the Rao-Blackwell theorem is
given by
(4.24)
where is the probability of having selected
establishment from given that the enterprises of have been selected from
Using the many-to-one correspondence between and an approximation to the probability can be obtained. That is, for we have
(4.25)
where is the selection probability of enterprise which corresponds to the probability of
selecting any of its establishments. Note that result (4.25)
becomes exact in the context of Poisson Sampling. Using (4.25), estimator
(4.24) is then approximately equivalent to the following Horvitz-Thompson
estimator
(4.26)
Since estimator (4.26) is simply a Horvitz-Thompson
estimator based on the selection of enterprises, its variance is given by
(4.27)
The computation of the selection probability requires the knowledge of the selection
probabilities of all establishments of enterprise In general, this can be difficult or even
impossible to obtain (see Lavallée 2002, 2007). This can be a severe barrier
to using estimator (4.26) in practice and is, in fact, one of the
main reasons for using the GWSM. However, in the present case, this turns out
to be possible because the complete frame is available for the selection of the
establishments. The task is also simplified by the use of stratified SRSWoR.
The selection probabilities can then be computed by adapting formula
(4.16). It is also possible to compute the joint selection probabilities but this is more difficult.
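As an illustration of how such enterprise-level probabilities might be computed, the sketch below assumes stratified SRSWoR with independent selection across strata, and takes an enterprise to be selected as soon as any one of its establishments is sampled. Its selection probability is then one minus the product, over strata, of the probability that none of its establishments in that stratum is drawn. The function names and the toy figures are hypothetical.

```python
from math import comb

def enterprise_selection_probability(counts, N, n):
    """P(at least one establishment of the enterprise is sampled).

    counts[h]: number of establishments of the enterprise in stratum h,
    N[h], n[h]: population and sample sizes of stratum h (SRSWoR per stratum,
    strata sampled independently).
    """
    p_none = 1.0
    for h, M in counts.items():
        # Probability that the stratum-h sample misses all M establishments.
        p_none *= comb(N[h] - M, n[h]) / comb(N[h], n[h])
    return 1.0 - p_none

def horvitz_thompson_total(sampled_y, sampled_pi):
    """Horvitz-Thompson estimate: sum of y_i / pi_i over selected enterprises."""
    return sum(y / p for y, p in zip(sampled_y, sampled_pi))

# Hypothetical design with two take-some strata.
N = {1: 10, 2: 5}
n = {1: 3, 2: 1}
pi_one = enterprise_selection_probability({1: 1}, N, n)
pi_two = enterprise_selection_probability({1: 1, 2: 1}, N, n)
total = horvitz_thompson_total([100.0, 50.0], [0.5, 0.25])
```

An enterprise with establishments in several strata has a higher selection probability than any of its establishments taken alone, which is what lowers its Horvitz-Thompson weight relative to the establishment-level sampling weights.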
For the example of Section 3, we obtain 14,545, and this value
corresponds to the lowest variance of the proposed methods.