On combining independent probability samples
Section 2. Combining separate estimates
We assume that we have $m$ estimators, $\hat{t}_1, \hat{t}_2, \dots, \hat{t}_m$, of a population total $t$ resulting from $m$ independent samples from the same population.
Our options greatly depend on what information is available. If we have
estimates and corresponding variance estimates, then a linear combination based
on weights calculated from estimated variances may be an interesting option. We
could also weight the estimators with respect to sample size, if available, but
that is known to be far from optimal in some situations. We recall the theory for an optimal linear combination of independent unbiased estimators. The linear combination of $\hat{t}_1, \hat{t}_2, \dots, \hat{t}_m$ with the smallest variance is

$\hat{t}_w = \sum_{i=1}^{m} w_i \hat{t}_i, \quad w_i = \frac{1/V(\hat{t}_i)}{\sum_{j=1}^{m} 1/V(\hat{t}_j)},$

where $w_1, \dots, w_m$ are positive weights that sum to 1. The variance of $\hat{t}_w$ is

$V(\hat{t}_w) = \left( \sum_{i=1}^{m} \frac{1}{V(\hat{t}_i)} \right)^{-1}.$

It is common that variance estimates are used in place of the unknown variances when calculating the $w$-weights, see Cochran and Carroll (1953) and Cochran (1954). If the variance estimators are consistent, that approach will asymptotically provide the optimal weighting. Moreover, under the assumption that the variance estimators are independent of the estimators $\hat{t}_1, \dots, \hat{t}_m$, the resulting estimator $\hat{t}_w$ is unbiased and its variance depends only on the variances of the $\hat{t}_i$ and the MSEs of the variance estimators, see Rubin and Weisberg (1974). However, as we will soon illustrate, the assumption of independence is likely to be violated in many sampling applications. In the case of positive correlation between the estimators and their variance estimators, we will on average put more weight on small estimates because they tend to have smaller estimated variances. Thus the combined estimator (using weights based on estimated variances) will be negatively biased, and the negative bias can increase as the number of independent surveys we combine increases, see Example 1. The opposite holds as well in the case of negative correlation, but that is likely a rarer situation in sampling applications.
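To fix ideas, a minimal Python sketch of this weighting follows (the function name and the numbers are our own illustration, not taken from the paper or from any survey package); the returned variance is the nominal $V(\hat{t}_w)$, valid only when the plugged-in variances equal the true ones:

import numpy as np

def combine_inverse_variance(estimates, variances):
    # Combine independent unbiased estimates with weights proportional
    # to the inverse of their (estimated) variances; weights sum to 1.
    estimates = np.asarray(estimates, dtype=float)
    inv = 1.0 / np.asarray(variances, dtype=float)
    w = inv / inv.sum()
    return float(np.sum(w * estimates)), 1.0 / inv.sum()

# Three hypothetical independent estimates of the same total
print(combine_inverse_variance([1020.0, 980.0, 1100.0], [2500.0, 1600.0, 4900.0]))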
Example 1: A very simplistic example that illustrates that the bias can increase as the number of independent surveys we combine increases. Let the unbiased estimator $\hat{t}$ for one sample take the values 1 or 2 with equal probabilities, and let the variance estimator take values $\hat{V}(\hat{t}) = \hat{t}/6$, i.e., $1/6$ times the estimator (perfectly correlated), so that it is unbiased: $E[\hat{V}(\hat{t})] = V(\hat{t}) = 1/4$. Clearly the expected value of $\hat{t}$ is 1.5. Next, we consider the linear combination of two independent estimators $\hat{t}_1, \hat{t}_2$ of the same type as $\hat{t}$, using estimated variances. The pair $(\hat{t}_1, \hat{t}_2)$ has the following four possible outcomes (1,1), (1,2), (2,1), (2,2), each with probability 1/4. The corresponding outcomes for the linear combination $\hat{t}_w$ with estimated variances are 1, 4/3, 4/3, 2, with expectation $17/12 \approx 1.417$. It is negatively biased. If a third independent estimator of the same type is added we have the eight outcomes (1,1,1), (1,1,2), (1,2,1), (1,2,2), (2,1,1), (2,1,2), (2,2,1), (2,2,2), each with equal probability 1/8. The corresponding outcomes for $\hat{t}_w$ are 1, 6/5, 6/5, 3/2, 6/5, 3/2, 3/2, 2, with expectation $111/80 \approx 1.389$. It is even more negatively biased, and the bias continues to grow as more independent estimators of the same type are added in the combination.
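The enumeration in Example 1 is easy to reproduce. The following Python sketch (our own; the function name is arbitrary) computes the exact expectation of the combined estimator for any number $m$ of such estimators and shows that it decreases as $m$ grows:

import itertools
import numpy as np

def expected_combined(m):
    # Exact expectation of the inverse-variance-weighted combination of m
    # independent estimators taking values 1 or 2 with equal probability,
    # each with (perfectly correlated) variance estimator estimate/6.
    total = 0.0
    for outcome in itertools.product([1.0, 2.0], repeat=m):
        t = np.array(outcome)
        w = 1.0 / (t / 6.0)              # inverse estimated variances
        w = w / w.sum()                  # weights from estimated variances
        total += np.sum(w * t) / 2.0**m  # each outcome has probability 1/2^m
    return total

for m in range(1, 6):
    print(m, expected_combined(m))       # 1.5, 1.4167, 1.3875, ... decreasing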
2.1 Why positive correlation between estimator and
variance estimator is common in sampling applications
The issue of positive correlation between the estimator $\hat{t}$ of a total and its variance estimator has previously been noticed by e.g., Gregoire and Schabenberger (1999) when sampling skewed biological populations, but we show that a high correlation may appear in more general sampling applications. Assume that the target variable $y$ is non-negative and that $y_i > 0$ for exactly $N_p$ of the $N$ population units. The proportion of non-zero (positive) $y$-values is denoted by $p = N_p/N$. This is a very common situation in sampling and we get such a target variable if we estimate a domain total ($y_i = 0$ outside of the domain) or if only a subset of the population has the property of interest.
The design-based unbiased Horvitz-Thompson (HT) estimator is given by

$\hat{t} = \sum_{i \in S} \frac{y_i}{\pi_i},$

where $S$ denotes the random set of sampled units and $\pi_i = \Pr(i \in S)$ is the inclusion probability of unit $i$. Under fixed size designs the variance of $\hat{t}$ is

$V(\hat{t}) = \frac{1}{2} \sum_{i \in U} \sum_{j \in U, j \neq i} (\pi_i \pi_j - \pi_{ij}) \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^2,$

where $U$ denotes the set of population units and $\pi_{ij} = \Pr(i, j \in S)$ is the second order inclusion probability. The corresponding variance estimator is

$\hat{V}(\hat{t}) = \frac{1}{2} \sum_{i \in S} \sum_{j \in S, j \neq i} \frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}} \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^2.$

Provided that all $\pi_{ij}$ are strictly positive, it follows that the variance estimator is an unbiased estimator of $V(\hat{t})$. The number of non-zero $y_i$ in $S$ (and hence in $\hat{t}$) is here denoted by $n_p$, and it will usually be a random number. It can be shown that the number of non-zero terms in $\hat{V}(\hat{t})$ is approximately proportional to $n_p$ if $p$ is small, which indicates that there might be a strong correlation between $\hat{t}$ and $\hat{V}(\hat{t})$ in general if $p$ is small. To show that the number of non-zero terms in $\hat{V}(\hat{t})$ is approximately proportional to $n_p$, we look at three cases, where the third case is the most general.
Case 1: Assume that all the non-zero $y_i/\pi_i$ are different, i.e., $y_i/\pi_i \neq y_j/\pi_j$ for all $i \neq j$ with $y_i > 0$ and $y_j > 0$. The double sum in $\hat{V}(\hat{t})$ then contains $n_p(n_p - 1)$ non-zero terms of the form $\frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}} \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^2$ where both $y_i$ and $y_j$ are non-zero. There are $2n_p(n - n_p)$ non-zero terms of the form $\frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}} \left( \frac{y_i}{\pi_i} \right)^2$ where $y_i > 0$ and $y_j = 0$. In total the number of non-zero terms is $n_p(n_p - 1) + 2n_p(n - n_p)$. If $n$ is fairly large and $p$ is small, then $n_p \ll n$ and roughly we have $n_p(n_p - 1) + 2n_p(n - n_p) \approx 2nn_p$. The number of non-zero terms is approximately proportional to $n_p$.
Case 2: Assume that all the non-zero $y_i/\pi_i$ are equal, e.g., $y$ is an indicator variable and $\pi_i = \pi$ for all units with $y_i = 1$. Then the double sum in $\hat{V}(\hat{t})$ contains $2n_p(n - n_p)$ non-zero terms of the form $\frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}} \left( \frac{y_i}{\pi_i} \right)^2$ where $y_i > 0$ and $y_j = 0$. If $n$ is fairly large and $p$ is small, then $n_p \ll n$ and roughly we have $2n_p(n - n_p) \approx 2nn_p$. Thus, the number of non-zero terms is still approximately proportional to $n_p$.
Case 3: If some of the non-zero $y_i/\pi_i$ are equal and the rest are different, then the number of non-zero terms will be between $2n_p(n - n_p)$ (case 2) and $n_p(n_p - 1) + 2n_p(n - n_p)$ (case 1). Thus, the number of non-zero terms in $\hat{V}(\hat{t})$ is always approximately proportional to $n_p$ if $p$ is small.
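To make the counting concrete, the following Python sketch (our own illustration; the population values and seed are arbitrary choices) evaluates $\hat{t}$ and $\hat{V}(\hat{t})$ for one SRS sample and counts the non-zero terms in the double sum, which for distinct non-zero $y_i/\pi_i$ (case 1) equals $n_p(n_p - 1) + 2n_p(n - n_p) \approx 2nn_p$:

import numpy as np

rng = np.random.default_rng(1)

N, n = 1000, 100
y = np.zeros(N)
y[:100] = rng.lognormal(2.0, 1.0, 100)    # p = 0.1: 100 distinct non-zero values

pi = n / N                                # first order inclusion probability (SRS)
pij = n * (n - 1) / (N * (N - 1))         # second order inclusion probability (SRS)

s = rng.choice(N, size=n, replace=False)  # one SRS sample
ys = y[s]
t_hat = np.sum(ys / pi)                   # Horvitz-Thompson estimate

d = ys[:, None] / pi - ys[None, :] / pi   # pairwise differences y_i/pi_i - y_j/pi_j
terms = (pi * pi - pij) / pij * d**2      # SYG terms; the factor is constant under SRS
np.fill_diagonal(terms, 0.0)
v_hat = 0.5 * np.sum(terms)               # variance estimate

n_p = int(np.sum(ys > 0))                 # non-zero y-values in the sample
print(t_hat, v_hat, n_p, int(np.sum(terms > 0)), 2 * n * n_p)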
If $\pi_i \pi_j - \pi_{ij} > 0$ for all $i \neq j$, then all non-zero terms are positive. This condition holds e.g., for simple random sampling (SRS) and high entropy unequal probability designs such as Conditional Poisson, Sampford and Pareto. More discussion about entropy of sampling designs can be found in e.g., Grafström (2010). The average size of the positive terms in $\hat{t}$ or $\hat{V}(\hat{t})$ is not likely to depend much on $n_p$. Thus, if $\hat{t}$ contains $n_p$ positive terms, and $\hat{V}(\hat{t})$ contains a number of positive terms that is proportional to $n_p$, their sizes are mainly determined by $n_p$. A high relative variance in $n_p$ can cause a high correlation between $\hat{t}$ and $\hat{V}(\hat{t})$, see Example 2.
Commonly used designs can produce a high relative variance for $n_p$. If we do simple random sampling without replacement we get $E(n_p) = np$ and $V(n_p) = np(1 - p)\frac{N - n}{N - 1}$, which means that we need a large $np$ or a large sample fraction $n/N$ in order to achieve a small relative variance for $n_p$. In many applications we will have a rather small $np$ and a small sampling fraction $n/N$ and, thus, for many designs (that do not use prior information which can explain to some extent if $y_i > 0$ or not) there will be a high relative variance for $n_p$. To illustrate the magnitude of the resulting correlation between the estimator and its variance estimator, an example for simple random sampling without replacement follows.
Example 2: For this example we first simulate a population of size $N = 1{,}000$, where $y_i > 0$ for 100 units, i.e., $p = 0.1$. The 100 non-zero $y$-values are simulated from a positively skewed distribution. We select samples of size $n$ with simple random sampling, so $\pi_i = n/N$ and $\pi_{ij} = \frac{n(n-1)}{N(N-1)}$ for $i \neq j$. The observed correlation between $\hat{t}$ and $\hat{V}(\hat{t})$ was 0.974 over repeated samples, see Figure 2.1 for the first 1,000 observations of $(\hat{t}, \hat{V}(\hat{t}))$. If we increase $p$ to 0.3, the correlation is still above 0.9. The results remain unchanged if the ratio $n/N$ remains unchanged, e.g., we get the same correlations if $N$ and $n$ are scaled by a common factor.
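A simulation in the spirit of Example 2 can be sketched as follows (our own assumptions: the lognormal values, sample size and number of replicates are illustrative choices, not the paper's). Under SRS the Sen-Yates-Grundy variance estimator reduces to the familiar $N^2(1 - n/N)s_y^2/n$ used below:

import numpy as np

rng = np.random.default_rng(7)

N, n = 1000, 100
y = np.zeros(N)
y[:100] = rng.lognormal(2.0, 1.0, 100)    # 90% zeros, skewed non-zero part

t_hats, v_hats = [], []
for _ in range(10000):
    ys = y[rng.choice(N, size=n, replace=False)]   # one SRS sample
    t_hats.append(N * ys.mean())                   # HT estimate under SRS
    v_hats.append(N**2 * (1 - n / N) * ys.var(ddof=1) / n)

print(np.corrcoef(t_hats, v_hats)[0, 1])  # typically close to 1 for small p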
Now, assume we have access to more than one sample for the estimation of $t$. As previously noted, with high positive correlations between the estimators and their corresponding variance estimators there is a risk of severe bias if we use a linear combination with estimated variances. The interest in using combined information may be largest for small domains or rare properties, in which case the problem of high correlation is the most likely. Next, we turn to alternative options for using combined information from multiple samples.

Description for Figure 2.1
Scatter plot showing the relationship between the Horvitz-Thompson estimator and its variance estimator for a variable with 90% zeros, for the first 1,000 observations. The variance estimate is on the y-axis, ranging from 20,000 to 70,000. The estimate is on the x-axis, ranging from 400 to 1,800. The observed correlation of 0.974 between the estimate and its variance estimate appears as a strong positive linear relationship between x and y.