Comparison of the conditional bias and Kokic and Bell methods for Poisson and stratified sampling
Section 3. Review of methods based on conditional bias

3.1 Definition

The conditional bias of an estimator $\hat{θ}$ for the parameter $θ,$ for a unit $i \in U$ was defined in the framework of Sampling Theory by Moreno-Rebollo et al. (1999) as follows:

$B_{1 i}^{\hat{θ}} = E_{P} (\hat{θ} - θ | I_{i} =1), (3.1)$

$B_{0 i}^{\hat{θ}} = E_{P} (\hat{θ} - θ | I_{i} =0) . (3.2)$

The conditional bias of a sampled unit is equal to the average of the difference between $\hat{θ}$ and $θ$ on the set of samples containing that unit. Similarly, the conditional bias of an unsampled unit is equal to the average of the sampling error for all samples not containing that unit.

In the case of a one-phase sampling design, the conditional bias of the Horvitz-Thompson estimator $\hat{T} (X) = \sum_{i \in S} \frac{x_{i}}{π_{i}}$ associated with a sampled unit $i$ is defined by

$B_{1 i}^{\hat{T} (X)} = \sum_{j \in U} (\frac{π_{i j} - π_{i} π_{j}}{π_{i} π_{j}}) x_{j} (3.3)$

where $π_{i j}$ designates the joint inclusion probability of units $i$ and $j$ in the sample. Conditional bias (3.3) is, in general, unknown since the values of the variable of interest are only observed for the units in the sample. In practice, it is possible to estimate it without bias, or in a robust way, from the sample. We consider the conditionally unbiased estimator (see, for example, Beaumont et al., 2013):

${\hat{B}}_{1 i}^{\hat{T} (X)} = \sum_{j \in S} (\frac{π_{i j} - π_{i} π_{j}}{π_{j} π_{i j}}) x_{j} . (3.4)$

This estimator is conditionally unbiased in the sense that $E_{P} ({\hat{B}}_{1 i}^{\hat{T} (X)} | I_{i} =1) = B_{1 i}^{\hat{T} (X)},$ provided that the joint inclusion probabilities $π_{i j}$ are all strictly positive. Moreover, conditional bias (3.3) and its estimator (3.4) depend on the inclusion probabilities $π_{i}$ and the joint inclusion probabilities $π_{i j} .$ In other words, conditional bias is a measure that takes the sampling design into account.
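As an illustration, estimator (3.4) can be sketched in a few lines of Python. This is a minimal sketch rather than a reference implementation: the function name `estimated_conditional_bias`, the argument names and the small simple random sample used as an example are all assumptions made for illustration, and the diagonal of the joint-probability matrix is assumed to hold $π_{i i} = π_{i} .$

```python
import numpy as np

def estimated_conditional_bias(x_s, pi_s, pi_joint_s):
    """Estimate the conditional bias (3.4) of each sampled unit.

    x_s        : values x_j of the n sampled units
    pi_s       : first-order inclusion probabilities pi_j of those units
    pi_joint_s : n x n matrix of joint probabilities pi_ij, with pi_ii = pi_i
    """
    x_s = np.asarray(x_s, dtype=float)
    pi_s = np.asarray(pi_s, dtype=float)
    pi_joint_s = np.asarray(pi_joint_s, dtype=float)
    if np.any(pi_joint_s <= 0):
        raise ValueError("all joint inclusion probabilities must be > 0")
    # weights[i, j] = (pi_ij - pi_i * pi_j) / (pi_j * pi_ij)
    weights = (pi_joint_s - np.outer(pi_s, pi_s)) / (pi_s[None, :] * pi_joint_s)
    return weights @ x_s

# Hypothetical example: simple random sampling without replacement,
# n = 4 units drawn from a population of N = 10.
N, n = 10, 4
x_sample = np.array([3.0, 5.0, 4.0, 40.0])
pi = np.full(n, n / N)
pi_joint = np.full((n, n), n * (n - 1) / (N * (N - 1)))
np.fill_diagonal(pi_joint, n / N)
b_hat_srs = estimated_conditional_bias(x_sample, pi, pi_joint)
```

For this particular design, (3.4) simplifies algebraically to $\frac{N - n}{n - 1} (x_{i} - \bar{x}),$ with $\bar{x}$ the sample mean, so the estimated conditional biases sum to zero over the sample and the unit with value 40 receives by far the largest one.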

For a Poisson design, writing $d_{i} = 1 / π_{i}$ for the design weight of unit $i,$ the conditional bias of the sampled unit $i$ is given by

$B_{1 i}^{\hat{T} (X)} = (d_{i} - 1) x_{i} . (3.5)$

Unlike the case of other sampling designs, such as simple random sampling without replacement, conditional bias (3.5) is known directly for all sample units and does not require estimation from the sample because it does not depend on any parameter of the finite population.
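The closed form (3.5) follows from (3.3) because, under Poisson sampling, units are selected independently, so that $π_{i j} = π_{i} π_{j}$ for $j \neq i$ and only the term $j = i$ remains. The sketch below checks this on a hypothetical four-unit population by enumerating every possible sample and computing definition (3.1) directly; the population values, the probabilities and the function name are illustrative assumptions.

```python
from itertools import product

# Hypothetical population of four units with unequal inclusion probabilities.
x = [10.0, 20.0, 5.0, 400.0]
pi = [0.5, 0.25, 0.8, 0.1]
total = sum(x)

def conditional_bias_by_enumeration(i):
    """E_P(T_hat - T | I_i = 1), summing over all 2^N Poisson samples."""
    numerator = 0.0
    for indicators in product([0, 1], repeat=len(x)):
        if indicators[i] == 0:
            continue  # keep only the samples that contain unit i
        # Poisson sampling: units are selected independently of each other.
        prob = 1.0
        for I_j, pi_j in zip(indicators, pi):
            prob *= pi_j if I_j else 1.0 - pi_j
        ht = sum(I_j * x_j / pi_j for I_j, x_j, pi_j in zip(indicators, x, pi))
        numerator += prob * (ht - total)
    return numerator / pi[i]  # divide by P(I_i = 1) = pi_i

enumerated = [conditional_bias_by_enumeration(i) for i in range(len(x))]
closed_form = [(1.0 / p - 1.0) * v for p, v in zip(pi, x)]  # (d_i - 1) x_i
```

Up to floating-point rounding, `enumerated` and `closed_form` agree for every unit, which is what (3.5) states.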

As demonstrated by Beaumont et al. (2013), conditional bias is a direct measure of the influence of each unit on the variance and on the estimation error, as the following two relations show; the second holds for maximum entropy sampling designs:

$V [\hat{T} (X)] = \sum_{i \in U} B_{1 i}^{\hat{T} (X)} x_{i} (3.6)$

$\hat{T} (X) - T (X) \approx \sum_{i \in S} B_{1 i}^{\hat{T} (X)} + \sum_{i \in U - S} B_{0 i}^{\hat{T} (X)} . (3.7)$
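For Poisson sampling, both relations can be checked numerically. The sketch below reads the factor multiplying the conditional bias in (3.6) as the value of the variable of interest for unit $i,$ and uses $B_{0 i}^{\hat{T} (X)} = - x_{i},$ which is a derivation from (3.2) under independent selection and not a formula given in the text. The population, the probabilities and the random seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population and Poisson inclusion probabilities.
x = np.array([10.0, 20.0, 5.0, 400.0, 35.0])
pi = np.array([0.5, 0.25, 0.8, 0.1, 0.6])

b1 = (1.0 / pi - 1.0) * x  # conditional bias of a sampled unit, as in (3.5)
b0 = -x                    # conditional bias of a non-sampled unit (derived, see lead-in)

# (3.6): design variance of the Horvitz-Thompson estimator under Poisson sampling,
# computed directly and then as the sum of B_1i * x_i over the population.
variance_direct = np.sum((1.0 - pi) / pi * x**2)
variance_from_bias = np.sum(b1 * x)

# (3.7): decomposition of the sampling error for one realised Poisson sample.
in_sample = rng.random(x.size) < pi
sampling_error = np.sum(x[in_sample] / pi[in_sample]) - x.sum()
decomposition = b1[in_sample].sum() + b0[~in_sample].sum()
```

In this setting the two variance expressions coincide, and the decomposition reproduces the sampling error exactly rather than approximately, whichever sample is drawn.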

3.2 A robust estimator based on conditional bias

As shown by formulas (3.6) and (3.7), the conditional bias (CB) measures the effect of each unit on the estimation error and the estimation variance. A robust estimator should therefore be constructed so that the conditional bias of each sampled observation is kept within controlled limits. Based on this idea, Beaumont et al. (2013) suggested using an estimator of the form:

$\begin{array}{rl} {\hat{T}}^{CB} (X) (c) & = \hat{T} (X) + \sum_{i \in S} Ψ_{c} [{\hat{B}}_{1 i}^{\hat{T} (X)}] - \sum_{i \in S} {\hat{B}}_{1 i}^{\hat{T} (X)} \\ & = \hat{T} (X) - \sum_{i \in S} [{\hat{B}}_{1 i}^{\hat{T} (X)} - Ψ_{c} ({\hat{B}}_{1 i}^{\hat{T} (X)})] \end{array}$

with $Ψ_{c}$ the Huber function defined by

$Ψ_{c} (t) = \left\{ \begin{array}{ll} c & \text{if } t \geq c \\ t & \text{if } - c < t < c \\ - c & \text{if } t \leq - c \end{array} \right.$

and ${\hat{B}}_{1 i}^{\hat{T} (X)}$ the estimator defined in (3.4).
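The construction can be sketched as follows, taking the Huber function to be the clipping of its argument to the interval $[- c, c] .$ The function names, the three-unit Poisson sample and the threshold $c = 1000$ are hypothetical choices made only to show the mechanics; the conditional biases are computed from (3.5).

```python
import numpy as np

def huber(t, c):
    """Huber function: t clipped to the interval [-c, c]."""
    return np.clip(t, -c, c)

def robust_total(x_s, pi_s, b_hat, c):
    """Robust estimator: HT total minus the part of each bias removed by truncation."""
    x_s = np.asarray(x_s, dtype=float)
    pi_s = np.asarray(pi_s, dtype=float)
    b_hat = np.asarray(b_hat, dtype=float)
    ht_total = np.sum(x_s / pi_s)
    return ht_total - np.sum(b_hat - huber(b_hat, c))

# Hypothetical Poisson sample of three units; the third is highly influential.
x_sample = np.array([10.0, 20.0, 400.0])
pi = np.array([0.5, 0.25, 0.1])
b_hat = (1.0 / pi - 1.0) * x_sample           # (d_i - 1) x_i, as in (3.5)
ht_estimate = np.sum(x_sample / pi)            # Horvitz-Thompson total
cb_estimate = robust_total(x_sample, pi, b_hat, c=1000.0)
```

Here only the third unit has a conditional bias above the threshold (3600 against 1000), so the Horvitz-Thompson total of 4100 is reduced by the truncated excess of 2600, giving 1500. Letting $c$ grow leaves every bias untouched and returns the Horvitz-Thompson estimator.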

The Huber function is used to limit the influence of the most influential units by truncating their conditional bias. Parameter $c$ can be chosen according to various optimization criteria for the robust estimator. For example, $c$ can be chosen to obtain the estimator with the smallest mean square error under the sampling design. However, it is relatively complex, and sometimes impossible, to obtain an analytical expression of $c$ for a given sampling design.

Beaumont et al. (2013) suggest choosing $c^{*} \in \arg\min_{c} \max_{i \in S} | {\hat{B}}_{1 i}^{{\hat{T}}^{CB} (X)} (c) |,$ i.e., the value of the constant $c$ that minimizes the largest absolute value, over the sample observations, of the estimated conditional bias of the robust estimator. In this case, the robust estimator is equal to:

${\hat{T}}^{CB} (X) (c^{*}) = {\hat{T}}^{BHR} (X) = \hat{T} (X) - \frac{{min}_{i} {\hat{B}}_{1 i}^{\hat{T} (X)} + {max}_{i} {\hat{B}}_{1 i}^{\hat{T} (X)}}{2} . (3.8)$
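Formula (3.8) requires only the smallest and the largest estimated conditional bias in the sample, as the following minimal sketch shows. The function name `bhr_total` and the three-unit Poisson sample are hypothetical, and the conditional biases are again taken from (3.5).

```python
import numpy as np

def bhr_total(x_s, pi_s, b_hat):
    """Estimator (3.8): HT total minus the mid-range of the estimated conditional biases."""
    x_s = np.asarray(x_s, dtype=float)
    pi_s = np.asarray(pi_s, dtype=float)
    b_hat = np.asarray(b_hat, dtype=float)
    ht_total = np.sum(x_s / pi_s)
    return ht_total - 0.5 * (b_hat.min() + b_hat.max())

# Hypothetical Poisson sample of three units; the third is highly influential.
x_sample = np.array([10.0, 20.0, 400.0])
pi = np.array([0.5, 0.25, 0.1])
b_hat = (1.0 / pi - 1.0) * x_sample   # (d_i - 1) x_i, as in (3.5)
bhr_estimate = bhr_total(x_sample, pi, b_hat)
```

In this example the Horvitz-Thompson total is 4100 and the extreme conditional biases are 10 and 3600, so the estimate is $4100 - (10 + 3600) / 2 = 2295 .$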

The Beaumont, Haziza and Ruiz-Gazen (BHR) estimator is thus simple to implement. Compared to the Kokic and Bell method, it is more general: it is valid for all sampling designs, it can be computed without any information from outside the sample, and it does not rely on any assumptions about the variable of interest. The resulting estimator is robust with respect to the sampling design alone, whereas the Kokic and Bell estimator takes into account both the sampling design and the distribution of the variable of interest. However, the BHR estimator is not designed to have the smallest mean square error; it is designed to limit the influence of each unit by minimizing the influence of the most influential unit.

The method has been extended to integrate more elements of the sampling design and to adapt to certain situations. Favre-Martinoz et al. (2016) extended the method to two-phase sampling designs, which makes it possible to take non-response into account when it is treated as a second phase of Poisson sampling. Favre-Martinoz et al. (2015) proposed a method for ensuring the consistency of the robust estimators obtained when the parameters of interest are the totals of a variable in nested domains.

ISSN: 1492-0921

Copyright

Published by authority of the Minister responsible for Statistics Canada.

Use of this publication is governed by the Statistics Canada Open Licence Agreement.

Catalogue No. 12-001-X

Frequency: Semi-annual

Ottawa

Date modified: 2018-12-20
