2. Measure of influence: Conditional bias

Cyril Favre Martinoz, David Haziza and Jean-François Beaumont

Consider a finite population of individuals, denoted by $U,$ of size $N .$ We want to estimate the total for the variable of interest $y,$ denoted by $t = \sum_{i \in U} y_{i} .$ From the population we select a sample $S,$ of (expected) size $n,$ using the sampling design $p (S) .$ A classical estimator of $t$ is the expansion estimator, also known as the Horvitz-Thompson estimator, $\hat{t} = \sum_{i \in S} d_{i} y_{i},$ where $d_{i} = 1 / π_{i}$ is the sampling weight of unit $i$ and $π_{i}$ denotes its probability of inclusion in the sample. Although the expansion estimator, $\hat{t},$ is design-unbiased for $t,$ it can be highly unstable in the presence of influential values.
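The expansion estimator can be sketched in a few lines. This is a minimal illustration only: the function name and the toy values are hypothetical and do not come from the paper. The third unit combines a large $y$-value with a large weight, which is the kind of situation in which the estimator becomes unstable.

```python
def horvitz_thompson_total(y_sample, pi_sample):
    """Expansion (Horvitz-Thompson) estimate: sum of d_i * y_i with d_i = 1 / pi_i."""
    if len(y_sample) != len(pi_sample):
        raise ValueError("y_sample and pi_sample must have the same length")
    return sum(y / p for y, p in zip(y_sample, pi_sample))


# Hypothetical sample: observed y-values and first-order inclusion probabilities.
y_s = [10.0, 20.0, 400.0]
pi_s = [0.5, 0.5, 0.125]

t_hat = horvitz_thompson_total(y_s, pi_s)
# The third unit contributes 400 / 0.125 = 3200 of the estimated total.
print(t_hat)
```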

To measure the impact (or influence) that a sampled unit has on the expansion estimator, we use the concept of conditional bias of a unit; see Moreno-Rebollo, Muñoz-Reyes and Muñoz-Pichardo (1999), Moreno-Rebollo, Muñoz-Reyes, Jiménez-Gamero and Muñoz-Pichardo (2002) and Beaumont, Haziza and Ruiz-Gazen (2013). Let $I_{i}$ be the sample selection indicator variable for unit $i,$ such that $I_{i} = 1$ if $i \in S$ and $I_{i} = 0$ otherwise. The conditional bias of the estimator $\hat{t}$ associated with sampled unit $i$ is defined as

$B_{1 i}^{HT} = E_{p} (\hat{t} \mid I_{i} = 1) - t = \sum_{j \in U} \left(\frac{π_{i j} - π_{i} π_{j}}{π_{i} π_{j}}\right) y_{j}, \qquad (2.1)$

where $π_{i j}$ is the joint probability of inclusion of units $i$ and $j$ in the sample, with $π_{i i} = π_{i} .$ In general, the conditional bias (2.1) is unknown, since the values of the variable of interest are observed only for the sampled units, so in practice it must be estimated. We consider the conditionally unbiased estimator (for example, see Beaumont et al. 2013):

$\begin{aligned} {\hat{B}}_{1 i}^{HT} & = \sum_{j \in S} \left(\frac{π_{i j} - π_{i} π_{j}}{π_{j} π_{i j}}\right) y_{j} \\ & = (d_{i} - 1) y_{i} + \sum_{j \in S, j \neq i} \left(\frac{π_{i j} - π_{i} π_{j}}{π_{j} π_{i j}}\right) y_{j} . \qquad (2.2) \end{aligned}$
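The following sketch evaluates (2.1) and (2.2) on a toy example. It assumes simple random sampling without replacement of $n = 2$ units from a hypothetical population of $N = 4$ values; the function names and the data are illustrative and not taken from the paper. Because the population is tiny, every sample containing a given unit can be listed, so the definition $E_{p} (\hat{t} \mid I_{i} = 1) - t$ can be compared directly with the closed form (2.1) and with the average of the estimator (2.2) over those samples.

```python
from itertools import combinations


def conditional_bias(y, pi, pi_joint, i):
    """Right-hand side of (2.1); needs y for the whole population."""
    return sum((pi_joint[i][j] - pi[i] * pi[j]) / (pi[i] * pi[j]) * y[j]
               for j in range(len(y)))


def estimated_conditional_bias(y, pi, pi_joint, sample, i):
    """Estimator (2.2); only the y-values of sampled units are used."""
    return sum((pi_joint[i][j] - pi[i] * pi[j]) / (pi[j] * pi_joint[i][j]) * y[j]
               for j in sample)


# Hypothetical toy population and design (SRS without replacement, n = 2, N = 4).
y = [1.0, 2.0, 3.0, 10.0]
N, n = len(y), 2
pi = [n / N] * N
pi_pair = n * (n - 1) / (N * (N - 1))
# Joint inclusion matrix, with pi_joint[i][i] = pi[i] on the diagonal.
pi_joint = [[pi[a] if a == b else pi_pair for b in range(N)] for a in range(N)]

unit = 3  # index of the unit with the largest y-value
t = sum(y)
samples_with_unit = [s for s in combinations(range(N), n) if unit in s]

# Definition: under this design, samples containing the unit are equally likely,
# so the conditional expectation is a plain average of the expansion estimates.
ht_estimates = [sum(y[j] / pi[j] for j in s) for s in samples_with_unit]
bias_by_definition = sum(ht_estimates) / len(ht_estimates) - t

# Closed form (2.1).
bias_by_formula = conditional_bias(y, pi, pi_joint, unit)

# Estimator (2.2), averaged over the same samples.
estimates = [estimated_conditional_bias(y, pi, pi_joint, s, unit)
             for s in samples_with_unit]
mean_of_estimates = sum(estimates) / len(estimates)

print(bias_by_definition, bias_by_formula, mean_of_estimates)
```

In this toy setting the three quantities agree up to floating-point rounding, although the individual values of (2.2) vary from one sample to another.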

This estimator is conditionally unbiased in the sense that $E_{p} ({\hat{B}}_{1 i}^{HT} \mid I_{i} = 1) = B_{1 i}^{HT} .$ We make the following remarks on the conditional bias and its estimator:

(i) The conditional bias (2.1) and its estimator (2.2) depend on the inclusion probabilities $π_{i}$ and the joint inclusion probabilities $π_{i j} .$ In other words, the conditional bias is a measure that takes the sampling design into account.

(ii) If $π_{i} = 1,$ then $B_{1 i}^{HT} = 0$ and, similarly, ${\hat{B}}_{1 i}^{HT} = 0.$ That is, when $π_{i} = 1,$ unit $i$ is selected in all possible samples, and consequently $E_{p} (\hat{t} \mid I_{i} = 1) - t = E_{p} (\hat{t}) - t = 0,$ since $\hat{t}$ is a design-unbiased estimator of $t .$ A unit that is selected with certainty therefore has no influence and does not contribute to the variance of $\hat{t} .$

(iii) The estimated conditional bias (2.2) depends on the second-order inclusion probabilities, $π_{i j} .$ For some designs, these probabilities may be difficult to calculate, in which case approximations are used. For sampling designs that belong to the class of high-entropy designs (e.g., Berger 1998), a number of approximations of the second-order inclusion probabilities have been proposed in the literature; for example, see Haziza, Mecatti and Rao (2008). An alternative solution is to approximate the $π_{i j}$ using Monte Carlo methods; see Fattorini (2006) and Thompson and Wu (2008).
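The Monte Carlo idea in remark (iii) can be sketched as follows. This is a plain frequency count over repeated draws, not necessarily the estimator proposed in the cited papers. The helper names, the number of replicates and the design are assumptions made for illustration: simple random sampling without replacement is used because its exact joint probabilities are known, which makes the approximation easy to check.

```python
import random


def monte_carlo_joint_probs(draw_sample, N, reps, seed=0):
    """Approximate pi_ij by the share of simulated samples containing both i and j.

    draw_sample(rng) must return the unit indices of one sample drawn under
    the design of interest. The diagonal entries approximate pi_i.
    """
    rng = random.Random(seed)
    counts = [[0] * N for _ in range(N)]
    for _ in range(reps):
        s = draw_sample(rng)
        for a in s:
            for b in s:
                counts[a][b] += 1
    return [[c / reps for c in row] for row in counts]


def srswor(rng, N=5, n=2):
    """Hypothetical design used for the check: SRS without replacement."""
    return rng.sample(range(N), n)


pi_joint_hat = monte_carlo_joint_probs(srswor, N=5, reps=20000)

# Exact values for this design: pi_i = 2/5 = 0.4 and pi_ij = (2 * 1) / (5 * 4) = 0.1.
print(pi_joint_hat[0][0], pi_joint_hat[0][1])
```

For a design whose $π_{i j}$ are not available in closed form, only `draw_sample` would need to change. Pairs that never appear together in the simulated samples receive an approximation of zero, which would make (2.2) undefined for them, so the number of replicates has to be large relative to the smallest $π_{i j} .$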

For a stratified simple random sampling design, the conditional bias (2.1) associated with sampled unit $i$ in stratum $h$ is given by

$B_{1 i}^{HT} = \frac{N_{h}}{N_{h} - 1} \left(\frac{N_{h}}{n_{h}} - 1\right) (y_{i} - {\bar{y}}_{U h}), \qquad (2.3)$

where $n_{h}$ denotes the size of the sample selected in stratum $h,$ ${\bar{y}}_{U h} = N_{h}^{- 1} \sum_{j \in U_{h}} y_{j}$ is the population mean in stratum $h,$ and $U_{h}$ denotes the population of units in stratum $h,$ of size $N_{h},$ $h = 1, \dots, H .$ The estimator of the conditional bias (2.2) reduces to

${\hat{B}}_{1 i}^{HT} = \frac{n_{h}}{n_{h} - 1} (\frac{N_{h}}{n_{h}} - 1) (y_{i} - {\bar{y}}_{S h}),$

where ${\bar{y}}_{S h} = n_{h}^{- 1} \sum_{j \in S_{h}} y_{j}$ is the sample mean in stratum $h$ and $S_{h}$ is the sample in stratum $h .$
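As a numerical check of this reduction, the sketch below evaluates the general estimator (2.2) within one stratum, using the simple random sampling probabilities $π_{i} = n_{h} / N_{h}$ and $π_{i j} = n_{h} (n_{h} - 1) / \{N_{h} (N_{h} - 1)\},$ and compares it with the closed form. Units in other strata are selected independently, so that $π_{i j} = π_{i} π_{j}$ and they contribute nothing to (2.2). The function names and the data are hypothetical.

```python
def closed_form_bias_hat(y_sh, N_h):
    """Stratum-level closed form: n_h/(n_h - 1) * (N_h/n_h - 1) * (y_i - sample mean)."""
    n_h = len(y_sh)
    if n_h < 2:
        raise ValueError("at least two sampled units are needed in the stratum")
    y_bar = sum(y_sh) / n_h
    factor = n_h / (n_h - 1) * (N_h / n_h - 1)
    return [factor * (y - y_bar) for y in y_sh]


def general_bias_hat(y_sh, N_h):
    """Estimator (2.2) with the SRS inclusion probabilities of one stratum."""
    n_h = len(y_sh)
    pi_1 = n_h / N_h
    pi_2 = n_h * (n_h - 1) / (N_h * (N_h - 1))
    result = []
    for i in range(n_h):
        total = 0.0
        for j in range(n_h):
            pi_ij = pi_1 if i == j else pi_2  # pi_ii = pi_i on the diagonal
            total += (pi_ij - pi_1 * pi_1) / (pi_1 * pi_ij) * y_sh[j]
        result.append(total)
    return result


# Hypothetical stratum: N_h = 10 units, of which n_h = 3 are sampled.
y_sh = [2.0, 4.0, 30.0]
closed = closed_form_bias_hat(y_sh, N_h=10)
general = general_bias_hat(y_sh, N_h=10)
print(closed)
print(general)
```

The two lists agree up to rounding, and the estimated conditional biases sum to zero within the stratum because they are proportional to deviations from the stratum sample mean.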

For a Poisson design, the conditional bias of sampled unit $i$ is given by

$B_{1 i}^{HT} = (d_{i} - 1) y_{i} . \qquad (2.4)$

In contrast to the stratified simple random sampling design, where (2.3) involves the unknown stratum mean ${\bar{y}}_{U h},$ the conditional bias (2.4) is known for every unit in the sample, since it depends only on $d_{i}$ and $y_{i}$ and not on finite population parameters.
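A minimal sketch of (2.4), with a hypothetical function name and made-up values: under a Poisson design units are selected independently, so $π_{i j} = π_{i} π_{j}$ for $j \neq i$ and only the term $j = i$ of (2.1) remains.

```python
def poisson_conditional_bias(y_sample, pi_sample):
    """Conditional bias (2.4) for each sampled unit: (d_i - 1) * y_i, with d_i = 1 / pi_i."""
    return [(1.0 / p - 1.0) * y for y, p in zip(y_sample, pi_sample)]


# Hypothetical Poisson sample; the last unit is selected with certainty (pi_i = 1).
y_s = [10.0, 20.0, 400.0, 55.0]
pi_s = [0.5, 0.25, 0.125, 1.0]

bias = poisson_conditional_bias(y_s, pi_s)
print(bias)
```

The unit with $π_{i} = 1$ has a conditional bias of zero, in line with remark (ii), while the unit that combines a large $y$-value with a large weight has by far the largest one.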

Date modified: 2015-11-27