3 The problem with skewed populations
Pierre Lavallée and Sébastien Labelle-Blanchet
As mentioned in the introduction, the application of the
GWSM to business surveys can produce estimates with large variances. This lack
of precision is due to the skewness of the population. We propose to illustrate
the problem with a small example given in Figure 3.1.
We want to study the revenue of the population of Figure 3.1, which contains 3
enterprises: enterprises 1 and 2 each contain 4 establishments, and enterprise 3
contains 3 establishments. As can be observed from Figure 3.1, the revenues of
the 11 establishments form a skewed population.
For the survey, we build a frame containing the 11
establishments, and we stratify the establishments by revenue into three
size strata, numbered 1, 2, and 3 (in practice, such a stratification is not
possible since the stratification variable is the same as the variable of interest;
instead, we would use some size variable highly correlated with the variable of
interest). In stratum 1, we use a sampling fraction of 1 (i.e., every
establishment in the stratum is selected); in stratum 2, the sample size is 1;
and in stratum 3, the sample size is 2.
There are 45 possible samples that can be selected from the frame for estimating the true total revenue of 3,800.
For each of these 45 samples, we computed the Indirect Sampling estimate of the total using (2.1). The estimates are presented in
the left box plot of Figure 3.2.
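The count of 45 possible samples constrains the stratum sizes, which are not given explicitly in the text. The following is a minimal sketch, assuming only the facts stated above (11 establishments, a take-all stratum 1, and sample sizes of 1 and 2 in strata 2 and 3). It searches for the stratum sizes consistent with 45 samples. The function name is illustrative and does not come from the paper.

```python
from math import comb


def consistent_stratum_sizes(n_units=11, n_samples=45):
    """Return every (N1, N2, N3) with N1 + N2 + N3 == n_units that yields
    exactly n_samples possible samples when stratum 1 is take-all,
    1 unit is drawn from stratum 2 and 2 units are drawn from stratum 3."""
    solutions = []
    for N1 in range(1, n_units):
        for N2 in range(1, n_units - N1):
            N3 = n_units - N1 - N2
            if N3 < 2:
                continue  # cannot draw 2 units from fewer than 2
            # The take-all stratum contributes exactly one possible sample.
            if comb(N2, 1) * comb(N3, 2) == n_samples:
                solutions.append((N1, N2, N3))
    return solutions


print(consistent_stratum_sizes())  # [(2, 3, 6)]
```

Under these assumptions the only consistent design has 2, 3, and 6 establishments in strata 1, 2, and 3, since 3 × C(6, 2) = 3 × 15 = 45.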
Data table for Figure 3.1
We also computed estimates of the total assuming the use of stratified SRSWoR without Indirect Sampling. That is, in
each stratum we select a sample of establishments using SRSWoR, and we measure
the variable of interest only for the establishments of the target population directly linked to the sampled establishments of the frame. Thus, we measure the variable of interest only for the sampled establishments themselves. Unlike Indirect Sampling, we do not measure
the variable of interest for the other establishments of the enterprises
containing the initially sampled establishments. This corresponds to the
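classical sampling theory. Thus, we estimated the total using

$$\hat{Y}^{STR} = \sum_{h=1}^{3} \frac{N_h}{n_h} \sum_{k \in s_h} y_{hk} \tag{3.1}$$

where $s_h$ is the sample of $n_h$ establishments drawn from the $N_h$ establishments of stratum $h$, and $y_{hk}$ is the revenue of establishment $k$ of stratum $h$. It can be proved that estimator (3.1) is unbiased,
and its variance is given by

$$\mathrm{Var}(\hat{Y}^{STR}) = \sum_{h=1}^{3} N_h^2 \left(1 - \frac{n_h}{N_h}\right) \frac{S_h^2}{n_h} \tag{3.2}$$

where $\bar{Y}_h = \frac{1}{N_h} \sum_{k=1}^{N_h} y_{hk}$ and $S_h^2 = \frac{1}{N_h - 1} \sum_{k=1}^{N_h} (y_{hk} - \bar{Y}_h)^2$. The estimates are presented in the right box
plot of Figure 3.2.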
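The unbiasedness and variance of the classical stratified estimator can be checked by brute force. The sketch below enumerates every possible stratified SRSWoR sample and compares the variance of the resulting estimates with the textbook stratified variance formula. The revenues are hypothetical values chosen for illustration, not the data of Figure 3.1, and the stratum sizes (2, 3, 6) are an assumption consistent with the 45 samples mentioned in the text.

```python
from itertools import combinations, product

# HYPOTHETICAL revenues by stratum (NOT the Figure 3.1 data).
strata = {
    1: [1500, 1000],                   # take-all stratum
    2: [300, 250, 200],
    3: [100, 90, 80, 70, 60, 50],
}
sample_sizes = {1: 2, 2: 1, 3: 2}


def expansion_estimate(sample):
    """Stratified expansion estimator: sum over strata of (N_h / n_h) * sample total."""
    return sum(len(strata[h]) / len(ys) * sum(ys) for h, ys in sample.items())


def variance_formula():
    """Textbook variance of the stratified SRSWoR expansion estimator."""
    total = 0.0
    for h, ys in strata.items():
        N, n = len(ys), sample_sizes[h]
        mean = sum(ys) / N
        S2 = sum((y - mean) ** 2 for y in ys) / (N - 1)
        total += N ** 2 * (1 - n / N) * S2 / n
    return total


# Enumerate all equally likely samples (one combination per stratum).
labels = (1, 2, 3)
per_stratum = [list(combinations(strata[h], sample_sizes[h])) for h in labels]
estimates = [expansion_estimate(dict(zip(labels, combo)))
             for combo in product(*per_stratum)]

true_total = sum(sum(ys) for ys in strata.values())
mean_estimate = sum(estimates) / len(estimates)
enumerated_variance = sum((e - mean_estimate) ** 2 for e in estimates) / len(estimates)

print(len(estimates), true_total, mean_estimate)
print(enumerated_variance, variance_formula())
```

For these hypothetical values, the mean of the estimates equals the true total, and the variance over all samples agrees with the formula up to floating-point rounding.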
Data table for Figure 3.2
As we can see from Figure 3.2, the estimates obtained
from Indirect Sampling (and the GWSM) are quite variable from one sample to the
next. If we do not use Indirect Sampling (i.e.,
we use the classical approach), the variability is much smaller. This result can
be seen directly from the variances of the two estimators. Using formulas (2.7) and (3.2), we obtain
a variance of 80,480 for the classical estimator,
while the variance of the Indirect Sampling estimator is 1,115,111!
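The mechanism behind this gap can be illustrated with a small enumeration. The sketch below is an illustration under stated assumptions, not the paper's computation. The establishments, their enterprise membership, and their strata are hypothetical, not the data of Figure 3.1. Each frame establishment is assumed to be linked only to itself in the target population. Under that assumption, the GWSM weight shared by all establishments of an enterprise is the average, over the enterprise, of the initial sampling weights, with unsampled establishments counting as zero.

```python
from itertools import combinations, product

# HYPOTHETICAL establishments: (enterprise, stratum, revenue).
units = [
    (1, 1, 1500), (1, 2, 300), (1, 3, 100), (1, 3, 90),
    (2, 1, 1000), (2, 2, 250), (2, 3, 80), (2, 3, 70),
    (3, 2, 200), (3, 3, 60), (3, 3, 50),
]
sample_sizes = {1: 2, 2: 1, 3: 2}

by_stratum = {h: [i for i, u in enumerate(units) if u[1] == h] for h in sample_sizes}
# Initial sampling weight N_h / n_h of each establishment.
weight = {i: len(idx) / sample_sizes[h] for h, idx in by_stratum.items() for i in idx}

enterprise_units = {}
for i, (ent, _, _) in enumerate(units):
    enterprise_units.setdefault(ent, []).append(i)


def classical(sample):
    """Expansion estimator using only the sampled establishments."""
    return sum(weight[i] * units[i][2] for i in sample)


def gwsm(sample):
    """GWSM estimator, assuming one-to-one links between frame and target units."""
    estimate = 0.0
    for members in enterprise_units.values():
        # Weight shared by every establishment of the enterprise.
        w = sum(weight[i] for i in members if i in sample) / len(members)
        estimate += w * sum(units[i][2] for i in members)
    return estimate


def all_samples():
    per_stratum = [combinations(by_stratum[h], sample_sizes[h]) for h in sorted(by_stratum)]
    for combo in product(*per_stratum):
        yield {i for part in combo for i in part}


def mean_and_variance(values):
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)


samples = list(all_samples())
classical_mean, classical_var = mean_and_variance([classical(s) for s in samples])
gwsm_mean, gwsm_var = mean_and_variance([gwsm(s) for s in samples])
print(classical_mean, classical_var)
print(gwsm_mean, gwsm_var)
```

With these hypothetical values both estimators are unbiased, but the GWSM variance is far larger. A small establishment drawn with a large weight carries the revenue of its whole enterprise, including a large take-all establishment, so the estimate swings widely depending on which enterprises the small strata happen to hit.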
The next section presents methods designed to reduce the
variability of the estimates produced using Indirect Sampling.