On combining independent probability samples
Section 1. Introduction
The idea of using all available information to produce better estimates is very appealing, but it is seldom clear how to proceed to achieve the best results. There is a vast literature on what has become known as meta-analysis, which builds on the idea of combining the results of multiple studies. Cochran and Carroll (1953) and Cochran (1954) are two early papers that treat the combination of estimates from different experiments. Koricheva, Gurevitch and Mengersen (2013) and Schmidt and Hunter (2014) are two books that provide an updated and more comprehensive treatment of meta-analysis. In this paper we do not treat the combination of results from traditional experiments, but rather from multiple probability samples. We present all required design elements, such as inclusion probabilities of first and second order, for a general combination of multiple independent samples from different sampling designs. We also present new estimators for the variance of the separate estimators, based on the design of the combined samples. These suggested variance estimators can be thought of as general pooled variance estimators that use all available information. In particular, such pooled variance estimators can be used in a linear combination of separate estimators to reduce the mean square error (MSE) compared to using the separate, and thus independent, variance estimators.
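As a rough illustration of what a linear combination of separate estimators looks like, the sketch below weights independent unbiased estimates of the same total by the inverse of their variances, the standard minimum-variance choice when the variances are known. The function name `combine_estimates` and the numbers are hypothetical and are not taken from the paper; in practice the variances must be estimated, which is where the choice of variance estimator affects the MSE.

```python
def combine_estimates(estimates, variances):
    """Inverse-variance weighted combination of independent unbiased estimates.

    Returns the combined estimate, its variance (valid only if the supplied
    variances are the true ones), and the weights, which sum to one.
    """
    if not estimates or len(estimates) != len(variances):
        raise ValueError("need equally many estimates and variances")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    weights = [p / total_precision for p in precisions]
    combined = sum(w * e for w, e in zip(weights, estimates))
    combined_variance = 1.0 / total_precision
    return combined, combined_variance, weights


# Hypothetical example: two estimates of the same total with different precision.
combined, combined_variance, weights = combine_estimates([100.0, 110.0], [4.0, 12.0])
```

The more precise estimate receives the larger weight, and the variance of the combination is never larger than the smallest of the input variances when those variances are known exactly.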
A restriction is that we only treat the combination of independent probability samples selected from the same population at the same point in time, or under the assumption that there has been no significant change in the target variable. Further, we assume that each sampling design is known to the extent that inclusion probabilities of first and second order are known for all units. In general, we also need to be able to uniquely identify each unit, so that we can detect whether the same unit is selected in more than one sample, or multiple times in the same sample. Some of these assumptions are quite restrictive and may not hold in practice.
Let $U = \{1, 2, \ldots, N\}$ be the set of labels of the $N$ units in the population. Our objective is to estimate the total $Y$ of a target variable $y$ that takes value $y_i$ for unit $i \in U$. Thus we wish to estimate $Y = \sum_{i \in U} y_i$. We assume access to $k$ independent probability samples $S_1, S_2, \ldots, S_k$ from $U$, where the samples may be from different sampling designs. Under these assumptions, we investigate different options for estimating the population total by use of all available information. Knowledge of which units have been included in several different samples is required in some cases. Such knowledge is more readily available today in environmental monitoring and natural resource surveys, following the widespread use of accurate satellite-based positioning systems (Næsset and Gjevestad, 2008). In environmental studies the units can often be considered as locations with given coordinates, so the situation differs from surveys of, e.g., people, who may be anonymous or unidentifiable. Further, several countries run landscape and forest monitoring programmes (Tomppo, Gschwantner, Lawrence and McRoberts, 2009; Ståhl, Allard, Esseen, Glimskår, Ringvall, Svensson, Sundquist, Christensen, Gallegos Torell, Högström, Lagerqvist, Marklund, Nilsson and Inghe, 2011; Fridman, Holm, Nilsson, Nilsson, Ringvall and Ståhl, 2014), which sometimes need to be augmented by special sampling programmes in order to reach specific accuracy targets for certain regions or years (Christensen and Ringvall, 2013).
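To make the estimation target concrete, here is a minimal sketch of estimating a population total from a single probability sample with known first-order inclusion probabilities. It assumes the Horvitz-Thompson estimator, which sums each sampled value divided by that unit's inclusion probability; the paragraph above does not name an estimator, so this choice, the function name `horvitz_thompson_total` and the data are illustrative assumptions only.

```python
def horvitz_thompson_total(sample_values, inclusion_probs):
    """Estimate a population total as the sum of y_i / pi_i over sampled units.

    sample_values: target variable values for the distinct sampled units.
    inclusion_probs: first-order inclusion probabilities of those same units.
    """
    if len(sample_values) != len(inclusion_probs):
        raise ValueError("one inclusion probability is needed per sampled unit")
    if any(not 0.0 < p <= 1.0 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return sum(y / p for y, p in zip(sample_values, inclusion_probs))


# Hypothetical sample of three distinct units with unequal inclusion probabilities.
y_sample = [12.0, 7.5, 20.0]
pi_sample = [0.5, 0.25, 0.5]
estimate = horvitz_thompson_total(y_sample, pi_sample)  # 24 + 30 + 40 = 94.0
```

Each of the independent samples yields one such estimate of the same total, which is what makes a combination of them possible.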
In Section 2 we first recall the theory for an optimal linear combination of separate independent estimators. Then, in Section 3, we present the theory for combining independent samples. As a unit may be included in more than one sample, or multiple times in the same sample, we need to choose between single and multiple count of inclusion. With single count the resulting design becomes a without-replacement design, whereas multiple count results in a form of with-replacement design. Two examples comparing different alternatives for estimation are presented in Section 4. We end with a discussion in Section 5.
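The difference between single and multiple count can be sketched at the level of one unit. Assuming independent samples in which each design includes the unit at most once, the probability that the unit appears in at least one sample is one minus the product of its non-inclusion probabilities, while the expected number of times it appears is the sum of its inclusion probabilities. The function names below are hypothetical, and only first-order quantities are covered.

```python
import math


def single_count_inclusion_prob(probs):
    """P(unit is in at least one sample) for independent samples.

    probs: the unit's first-order inclusion probability in each sample.
    """
    return 1.0 - math.prod(1.0 - p for p in probs)


def expected_inclusion_count(probs):
    """Expected number of inclusions of the unit across all samples,
    assuming each design can include the unit at most once."""
    return sum(probs)


# Hypothetical unit with inclusion probability 0.5 in one sample and 0.2 in another.
probs_for_unit = [0.5, 0.2]
pi_single = single_count_inclusion_prob(probs_for_unit)  # 1 - 0.5 * 0.8
e_multiple = expected_inclusion_count(probs_for_unit)    # 0.5 + 0.2
```

The single-count probability never exceeds the expected count, and the gap between them is the extra weight that multiple count gives to repeated selections of the same unit.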