On combining independent probability samples
Section 2. Combining separate estimates
We assume that we have $m$ estimators, $\hat{t}_1, \hat{t}_2, \dots, \hat{t}_m$, of a population total $t$ resulting from $m$ independent samples from the same population.
Our options greatly depend on what information is available. If we have
estimates and corresponding variance estimates, then a linear combination based
on weights calculated from estimated variances may be an interesting option. We
could also weight the estimators with respect to sample size, if available, but
that is known to be far from optimal in some situations. We recall the theory for an optimal linear combination of independent unbiased estimators. The linear combination of $\hat{t}_1, \hat{t}_2, \dots, \hat{t}_m$ with the smallest variance is

$\hat{t}_w = \sum_{i=1}^{m} w_i \hat{t}_i, \quad w_i = \frac{1/V(\hat{t}_i)}{\sum_{j=1}^{m} 1/V(\hat{t}_j)},$

where $w_1, \dots, w_m$ are positive weights that sum to 1. The variance of $\hat{t}_w$ is

$V(\hat{t}_w) = \left( \sum_{i=1}^{m} \frac{1}{V(\hat{t}_i)} \right)^{-1}.$

It is common that variance estimates are used in place of the unknown variances when calculating the $w$-weights, see Cochran and Carroll (1953) and Cochran (1954). If the variance estimators are consistent, that approach will asymptotically provide the optimal weighting. Moreover, under the assumption that the variance estimators are independent of the estimators $\hat{t}_1, \dots, \hat{t}_m$, the resulting estimator $\hat{t}_w$ is unbiased and its variance depends only on the variances of the $\hat{t}_i$ and the MSEs of the variance estimators, see Rubin and Weisberg (1974). However, as we will soon illustrate, the assumption of independence is likely to be violated in many sampling applications. In the case of positive correlation between the estimators and their variance estimators, we will on average put more weight on small estimates because they tend to have smaller estimated variances. Thus the combined estimator (using weights based on estimated variances) will be negatively biased, and the negative bias can increase as the number of independent surveys we combine increases, see Example 1. The opposite holds as well in the case of negative correlation, but that is likely a rarer situation in sampling applications.
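To fix ideas, a minimal Python sketch of this weighting follows (the function name and the numbers are our own illustration, not taken from the paper or from any survey package); the returned variance is the nominal $V(\hat{t}_w)$, valid only when the plugged-in variances equal the true ones:

import numpy as np

def combine_inverse_variance(estimates, variances):
    # Combine independent unbiased estimates with weights proportional
    # to the inverse of their (estimated) variances; weights sum to 1.
    estimates = np.asarray(estimates, dtype=float)
    inv = 1.0 / np.asarray(variances, dtype=float)
    w = inv / inv.sum()
    return float(np.sum(w * estimates)), 1.0 / inv.sum()

# Three hypothetical independent estimates of the same total
print(combine_inverse_variance([1020.0, 980.0, 1100.0], [2500.0, 1600.0, 4900.0]))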
Example 1: A very simplistic example that illustrates that the bias can increase as the number of independent surveys we combine increases. Let the unbiased estimator $\hat{t}$ for one sample take the values 1 or 2 with equal probabilities, and let the variance estimator take values $\hat{V}(\hat{t}) = \hat{t}/6$, i.e., $1/6$ times the estimator (perfectly correlated), so that it is unbiased: $E[\hat{V}(\hat{t})] = V(\hat{t}) = 1/4$. Clearly the expected value of $\hat{t}$ is 1.5. Next, we consider the linear combination of two independent estimators $\hat{t}_1, \hat{t}_2$ of the same type as $\hat{t}$, using estimated variances. The pair $(\hat{t}_1, \hat{t}_2)$ has the following four possible outcomes (1,1), (1,2), (2,1), (2,2), each with probability 1/4. The corresponding outcomes for the linear combination $\hat{t}_w$ with estimated variances are 1, 4/3, 4/3, 2, with expectation $17/12 \approx 1.417$. It is negatively biased. If a third independent estimator of the same type is added we have the eight outcomes (1,1,1), (1,1,2), (1,2,1), (1,2,2), (2,1,1), (2,1,2), (2,2,1), (2,2,2), each with equal probability 1/8. The corresponding outcomes for $\hat{t}_w$ are 1, 6/5, 6/5, 3/2, 6/5, 3/2, 3/2, 2, with expectation $111/80 \approx 1.389$. It is even more negatively biased, and the bias continues to grow as more independent estimators of the same type are added in the combination.
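The enumeration in Example 1 is easy to reproduce. The following Python sketch (our own; the function name is arbitrary) computes the exact expectation of the combined estimator for any number $m$ of such estimators and shows that it decreases as $m$ grows:

import itertools
import numpy as np

def expected_combined(m):
    # Exact expectation of the inverse-variance-weighted combination of m
    # independent estimators taking values 1 or 2 with equal probability,
    # each with (perfectly correlated) variance estimator estimate/6.
    total = 0.0
    for outcome in itertools.product([1.0, 2.0], repeat=m):
        t = np.array(outcome)
        w = 1.0 / (t / 6.0)              # inverse estimated variances
        w = w / w.sum()                  # weights from estimated variances
        total += np.sum(w * t) / 2.0**m  # each outcome has probability 1/2^m
    return total

for m in range(1, 6):
    print(m, expected_combined(m))       # 1.5, 1.4167, 1.3875, ... decreasing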
2.1 Why positive correlation between estimator and
variance estimator is common in sampling applications
The issue of positive correlation between the estimator $\hat{t}$ of a total and its variance estimator has previously been noticed by e.g., Gregoire and Schabenberger (1999) when sampling skewed biological populations, but we show that a high correlation may appear in more general sampling applications. Assume that the target variable $y$ is non-negative and that $y_i > 0$ for exactly $N_p$ of the $N$ population units. The proportion of non-zero (positive) $y$-values is denoted by $p = N_p/N$. This is a very common situation in sampling and we get such a target variable if we estimate a domain total ($y_i = 0$ outside of the domain) or if only a subset of the population has the property of interest.
The design-based unbiased Horvitz-Thompson (HT) estimator is given by

$\hat{t} = \sum_{i \in S} \frac{y_i}{\pi_i},$

where $S$ denotes the random set of sampled units and $\pi_i = \Pr(i \in S)$ is the inclusion probability of unit $i$. Under fixed size designs the variance of $\hat{t}$ is

$V(\hat{t}) = \frac{1}{2} \sum_{i \in U} \sum_{j \in U, j \neq i} (\pi_i \pi_j - \pi_{ij}) \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^2,$

where $U$ denotes the set of population units and $\pi_{ij} = \Pr(i, j \in S)$ is the second order inclusion probability. The corresponding variance estimator is

$\hat{V}(\hat{t}) = \frac{1}{2} \sum_{i \in S} \sum_{j \in S, j \neq i} \frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}} \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^2.$

Provided that all $\pi_{ij}$ are strictly positive, it follows that the variance estimator is an unbiased estimator of $V(\hat{t})$. The number of non-zero $y_i$ in $S$ (and hence in $\hat{t}$) is here denoted by $n_p$, and it will usually be a random number. It can be shown that the number of non-zero terms in $\hat{V}(\hat{t})$ is approximately proportional to $n_p$ if $p$ is small, which indicates that there might be a strong correlation between $\hat{t}$ and $\hat{V}(\hat{t})$ in general if $p$ is small. To show that the number of non-zero terms in $\hat{V}(\hat{t})$ is approximately proportional to $n_p$, we look at three cases, where the third case is the most general.
Case 1: Assume that all the non-zero $y_i/\pi_i$ are different, i.e., $y_i/\pi_i \neq y_j/\pi_j$ for all $i \neq j$ with $y_i > 0$ and $y_j > 0$. The double sum in $\hat{V}(\hat{t})$ then contains $n_p(n_p - 1)$ non-zero terms of the form $\frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}} \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^2$ where both $y_i$ and $y_j$ are non-zero. There are $2n_p(n - n_p)$ non-zero terms of the form $\frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}} \left( \frac{y_i}{\pi_i} \right)^2$ where $y_i > 0$ and $y_j = 0$. In total the number of non-zero terms is $n_p(n_p - 1) + 2n_p(n - n_p)$. If $n$ is fairly large and $p$ is small, then $n_p \ll n$ and roughly we have $n_p(n_p - 1) + 2n_p(n - n_p) \approx 2nn_p$. The number of non-zero terms is approximately proportional to $n_p$.
Case 2: Assume that all the non-zero $y_i/\pi_i$ are equal, e.g., $y$ is an indicator variable and $\pi_i = \pi$ for all units with $y_i = 1$. Then the double sum in $\hat{V}(\hat{t})$ contains $2n_p(n - n_p)$ non-zero terms of the form $\frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}} \left( \frac{y_i}{\pi_i} \right)^2$ where $y_i > 0$ and $y_j = 0$. If $n$ is fairly large and $p$ is small, then $n_p \ll n$ and roughly we have $2n_p(n - n_p) \approx 2nn_p$. Thus, the number of non-zero terms is still approximately proportional to $n_p$.
Case 3: If some of the non-zero $y_i/\pi_i$ are equal and the rest are different, then the number of non-zero terms will be between $2n_p(n - n_p)$ (case 2) and $n_p(n_p - 1) + 2n_p(n - n_p)$ (case 1). Thus, the number of non-zero terms in $\hat{V}(\hat{t})$ is always approximately proportional to $n_p$ if $p$ is small.
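To make the counting concrete, the following Python sketch (our own illustration; the population values and seed are arbitrary choices) evaluates $\hat{t}$ and $\hat{V}(\hat{t})$ for one SRS sample and counts the non-zero terms in the double sum, which for distinct non-zero $y_i/\pi_i$ (case 1) equals $n_p(n_p - 1) + 2n_p(n - n_p) \approx 2nn_p$:

import numpy as np

rng = np.random.default_rng(1)

N, n = 1000, 100
y = np.zeros(N)
y[:100] = rng.lognormal(2.0, 1.0, 100)    # p = 0.1: 100 distinct non-zero values

pi = n / N                                # first order inclusion probability (SRS)
pij = n * (n - 1) / (N * (N - 1))         # second order inclusion probability (SRS)

s = rng.choice(N, size=n, replace=False)  # one SRS sample
ys = y[s]
t_hat = np.sum(ys / pi)                   # Horvitz-Thompson estimate

d = ys[:, None] / pi - ys[None, :] / pi   # pairwise differences y_i/pi_i - y_j/pi_j
terms = (pi * pi - pij) / pij * d**2      # SYG terms; the factor is constant under SRS
np.fill_diagonal(terms, 0.0)
v_hat = 0.5 * np.sum(terms)               # variance estimate

n_p = int(np.sum(ys > 0))                 # non-zero y-values in the sample
print(t_hat, v_hat, n_p, int(np.sum(terms > 0)), 2 * n * n_p)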
If $\pi_i \pi_j - \pi_{ij} > 0$ for all $i \neq j$, then all non-zero terms are positive. This condition holds e.g., for simple random sampling (SRS) and high entropy unequal probability designs such as Conditional Poisson, Sampford and Pareto. More discussion about entropy of sampling designs can be found in e.g., Grafström (2010). The average size of the positive terms in $\hat{t}$ or $\hat{V}(\hat{t})$ is not likely to depend much on $n_p$. Thus, if $\hat{t}$ contains $n_p$ positive terms, and $\hat{V}(\hat{t})$ contains a number of positive terms that is proportional to $n_p$, their sizes are mainly determined by $n_p$. A high relative variance in $n_p$ can cause a high correlation between $\hat{t}$ and $\hat{V}(\hat{t})$, see Example 2.
Commonly used designs can produce a high relative variance for $n_p$. If we do simple random sampling without replacement we get $E(n_p) = np$ and $V(n_p) = np(1 - p)\frac{N - n}{N - 1}$, which means that we need a large $np$ or a large sample fraction $n/N$ in order to achieve a small relative variance for $n_p$. In many applications we will have a rather small $np$ and a small sampling fraction $n/N$ and, thus, for many designs (that do not use prior information which can explain to some extent if $y_i > 0$ or not) there will be a high relative variance for $n_p$. To illustrate the magnitude of the resulting correlation between the estimator and its variance estimator, an example for simple random sampling without replacement follows.
Example 2: For this example we first simulate a population of size $N = 1{,}000$, where $y_i > 0$ for 100 units, i.e., $p = 0.1$. The 100 non-zero $y$-values are simulated from a positively skewed distribution. We select samples of size $n$ with simple random sampling, so $\pi_i = n/N$ and $\pi_{ij} = \frac{n(n-1)}{N(N-1)}$ for $i \neq j$. The observed correlation between $\hat{t}$ and $\hat{V}(\hat{t})$ was 0.974 over repeated samples, see Figure 2.1 for the first 1,000 observations of $(\hat{t}, \hat{V}(\hat{t}))$. If we increase $p$ to 0.3, the correlation is still above 0.9. The results remain unchanged if the ratio $n/N$ remains unchanged, e.g., we get the same correlations if $N$ and $n$ are scaled by a common factor.
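A simulation in the spirit of Example 2 can be sketched as follows (our own assumptions: the lognormal values, sample size and number of replicates are illustrative choices, not the paper's). Under SRS the Sen-Yates-Grundy variance estimator reduces to the familiar $N^2(1 - n/N)s_y^2/n$ used below:

import numpy as np

rng = np.random.default_rng(7)

N, n = 1000, 100
y = np.zeros(N)
y[:100] = rng.lognormal(2.0, 1.0, 100)    # 90% zeros, skewed non-zero part

t_hats, v_hats = [], []
for _ in range(10000):
    ys = y[rng.choice(N, size=n, replace=False)]   # one SRS sample
    t_hats.append(N * ys.mean())                   # HT estimate under SRS
    v_hats.append(N**2 * (1 - n / N) * ys.var(ddof=1) / n)

print(np.corrcoef(t_hats, v_hats)[0, 1])  # typically close to 1 for small p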
Now, assume we have access to more than one sample for the estimation of $t$. As previously noted, with high positive correlations between the estimators and their corresponding variance estimators there is a risk of severe bias if we use a linear combination with estimated variances. The interest in using combined information may be largest for small domains or rare properties, in which case the problem of high correlation is the most likely. Next, we turn to alternative options for using combined information from multiple samples.

Description for Figure 2.1
Scatter plot showing the relationship between the Horvitz-Thompson estimator and its variance estimator for a variable with 90% zeros, for the first 1,000 observations. The variance estimate is on the y-axis, ranging from 20,000 to 70,000. The estimate is on the x-axis, ranging from 400 to 1,800. The observed correlation of 0.974 between the estimate and its variance estimate appears as a strong positive linear relationship between x and y.