On combining independent probability samples
Section 1. Introduction

The idea of using all available information to produce better estimates is very appealing, but it is seldom clear how to proceed to achieve the best results. There is a vast literature on what has become known as meta-analysis, which builds on the idea of combining the results of multiple studies. Cochran and Carroll (1953) and Cochran (1954) are two early papers that treat the combination of estimates from different experiments. Koricheva, Gurevitch and Mengersen (2013) and Schmidt and Hunter (2014) are two books that provide an updated and more comprehensive treatment of meta-analysis. In this paper we do not treat the combination of results from traditional experiments, but rather from multiple probability samples. We present all required design elements, such as inclusion probabilities of first and second order, for a general combination of multiple independent samples from different sampling designs. We also present new estimators for the variance of separate estimators based on the design of the combined samples. These suggested variance estimators can be thought of as general pooled variance estimators using all available information. In particular, such pooled variance estimators can be used in a linear combination of separate estimators to reduce the mean square error (MSE) compared to using the separate, and thus independent, variance estimators.
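As a point of reference for the linear combination of separate estimators, the following minimal Python sketch shows classical inverse-variance weighting of independent unbiased estimates. It assumes that the variances are known. The function name `combine_estimates` and the input values are illustrative and not taken from the paper, and the sketch is not the pooled-variance approach proposed here.

```python
def combine_estimates(estimates, variances):
    """Inverse-variance weighted combination of independent estimates.

    Assumption (not from the paper): the estimates are unbiased and
    independent, and their variances are known and strictly positive.
    """
    if not estimates or len(estimates) != len(variances):
        raise ValueError("need equally many estimates and variances")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be strictly positive")

    # Each estimate is weighted by its precision (1 / variance),
    # normalized so that the weights sum to one.
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    weights = [p / total_precision for p in precisions]

    combined = sum(w * e for w, e in zip(weights, estimates))
    # Variance of the combination under independence and known variances.
    combined_variance = 1.0 / total_precision
    return combined, combined_variance, weights


# Hypothetical inputs: two estimates of the same total.
estimate, variance, weights = combine_estimates([100.0, 120.0], [1.0, 3.0])
```

When the variances have to be estimated from the same samples, the weights become random and the combined estimator no longer has exactly these properties, which is one reason the choice of variance estimator matters for the resulting MSE.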

A restriction is that we only treat the combination of independent probability samples selected from the same population at the same point in time, or under the assumption that there has been no significant change in the target variable. Further, we assume that each sampling design is known to the extent that inclusion probabilities of first and second order are known for all units. In general we will also need to be able to uniquely identify each unit, so that we can detect if the same unit is selected in more than one sample or multiple times in the same sample. Some of these assumptions may be quite restrictive, as they may not hold in practice.

Let $U = \{1, 2, \dots, N\}$ be the set of labels of the $N$ units in the population. Our objective is to estimate the total of a target variable $y,$ which takes the value $y_{i}$ for unit $i \in U .$ Thus we wish to estimate $Y = \sum_{i =1}^{N} y_{i} .$ We assume access to $k$ independent probability samples $S^{(l)} ,$ $l =1, \dots, k,$ from $U,$ where the samples may be from different sampling designs. Under these assumptions, we investigate different options for estimating the population total by use of all available information. Knowledge of which units have been included in more than one sample is required in some cases. Such knowledge is more readily available today in environmental monitoring and natural resource surveys, following the widespread use of accurate satellite-based positioning systems (Næsset and Gjevestad, 2008). In environmental studies the units can often be considered as locations with given coordinates, so the situation is different from surveys of, for example, people, who may be anonymous or unidentifiable. Further, in several countries landscape and forest monitoring programmes are performed (Tomppo, Gschwantner, Lawrence and McRoberts, 2009; Ståhl, Allard, Esseen, Glimskår, Ringvall, Svensson, Sundquist, Christensen, Gallegos Torell, Högström, Lagerqvist, Marklund, Nilsson and Inghe, 2011; Fridman, Holm, Nilsson, Nilsson, Ringvall and Ståhl, 2014), which sometimes need to be augmented by special sampling programmes in order to reach specific accuracy targets for certain regions or years (Christensen and Ringvall, 2013).

In Section 2 we first recall the theory for an optimal linear combination of separate independent estimators. Then, in Section 3, we present the theory for combining independent samples. As a unit may be included in more than one sample, or multiple times in the same sample, we need to choose between single and multiple count of inclusion. With single count the resulting design is a without-replacement design, whereas multiple count results in a form of with-replacement design. Two examples comparing different alternatives for estimation are presented in Section 4. We end with a discussion in Section 5.
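To make the distinction between single and multiple count concrete, the following Python sketch computes both kinds of design quantity for one unit and uses them to estimate a total. It assumes that each of the independent designs is itself without replacement, so that a unit's inclusion probability in a sample is also its expected number of inclusions in that sample. All function names, labels and values are hypothetical, and the estimators shown are the standard inverse-probability forms rather than a restatement of the paper's formulas.

```python
from math import prod


def single_count_probability(probs):
    """Probability that a unit is in at least one of the independent samples."""
    return 1.0 - prod(1.0 - p for p in probs)


def expected_inclusions(probs):
    """Expected number of times a unit is included over all samples.

    Assumption: every separate design is without replacement, so each
    per-sample inclusion probability equals the expected count there.
    """
    return sum(probs)


def total_single_count(samples, y, pi):
    """Each distinct unit counted once, weighted by 1 / P(in the union)."""
    distinct = set().union(*samples)
    return sum(y[i] / single_count_probability(pi[i]) for i in distinct)


def total_multiple_count(samples, y, pi):
    """Each selection counted, weighted by 1 / expected number of inclusions."""
    return sum(y[i] / expected_inclusions(pi[i]) for s in samples for i in s)


# Hypothetical data: unit labels, target values, and for every unit its
# first-order inclusion probability under each of k = 2 designs.
y = {1: 3.0, 2: 4.0, 3: 6.0}
pi = {1: [0.5, 0.5], 2: [0.5, 0.5], 3: [0.5, 0.5]}
samples = [[1, 2], [1, 3]]  # unit 1 was selected in both samples

t_single = total_single_count(samples, y, pi)
t_multiple = total_multiple_count(samples, y, pi)
```

Both functions need the unit labels to detect that unit 1 appears twice, which illustrates why unique identification of units is required when samples are combined.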

ISSN : 1492-0921

Editorial policy

Survey Methodology publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves. All papers will be refereed. However, the authors retain full responsibility for the contents of their papers and opinions expressed are not necessarily those of the Editorial Board or of Statistics Canada.

Submission of Manuscripts

Survey Methodology is published twice a year in electronic format. Authors are invited to submit their articles in English or French in electronic form, preferably in Word, to the Editor (statcan.smj-rte.statcan@canada.ca, Statistics Canada, 150 Tunney’s Pasture Driveway, Ottawa, Ontario, Canada, K1A 0T6). For formatting instructions, please see the guidelines provided in the journal and on the web site (www.statcan.gc.ca/SurveyMethodology).

Note of appreciation

Canada owes the success of its statistical system to a long-standing partnership between Statistics Canada, the citizens of Canada, its businesses, governments and other institutions. Accurate and timely statistical information could not be produced without their continued co-operation and goodwill.

Standards of service to the public

Statistics Canada is committed to serving its clients in a prompt, reliable and courteous manner. To this end, the Agency has developed standards of service which its employees observe in serving its clients.

Copyright

Published by authority of the Minister responsible for Statistics Canada.

Use of this publication is governed by the Statistics Canada Open Licence Agreement.

Catalogue No. 12-001-X

Frequency: Semi-annual

Ottawa

Date modified: 2019-07-04
