2. Measure of influence: Conditional bias
Cyril Favre Martinoz, David Haziza and Jean-François Beaumont
Consider a finite population of individuals, denoted by $U$, of size $N$. We want to estimate the total for the variable of interest $y$, denoted by $t_y = \sum_{i \in U} y_i$. From the population we select a sample $S$ of (expected) size $n$ using the sampling design $p(S)$. A classical estimator of $t_y$ is the expansion estimator, also known as the Horvitz-Thompson estimator,

$$\hat{t}_y = \sum_{i \in S} d_i y_i,$$

where $d_i = 1/\pi_i$ is the sampling weight of unit $i$ and $\pi_i$ denotes its probability of inclusion in the sample. Although the expansion estimator $\hat{t}_y$ is design-unbiased for $t_y$, it can be highly unstable in the presence of influential values.
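As a minimal sketch of the expansion estimator described above, the following Python snippet computes the weighted total from sampled values and their inclusion probabilities. The function name `horvitz_thompson_total` and the sample values are hypothetical and chosen only for illustration; the third unit combines a large value with a small inclusion probability, which is the kind of unit that can dominate the estimate.

```python
import numpy as np

def horvitz_thompson_total(y_sample, pi_sample):
    """Expansion (Horvitz-Thompson) estimate of a population total."""
    y_sample = np.asarray(y_sample, dtype=float)
    pi_sample = np.asarray(pi_sample, dtype=float)
    d = 1.0 / pi_sample  # sampling weights d_i = 1 / pi_i
    return float(np.sum(d * y_sample))

# Made-up sample of three units (illustration only).
y_s = np.array([10.0, 12.0, 500.0])
pi_s = np.array([0.5, 0.5, 0.01])

t_hat = horvitz_thompson_total(y_s, pi_s)
# Weighted contribution d_i * y_i of each unit to the estimate.
contributions = y_s / pi_s
```

In this made-up sample, almost all of the estimated total comes from the single unit with weight 100, which illustrates why the estimator can be unstable.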
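To measure the impact (or influence) that a sampled unit has on the expansion estimator, we use the concept of conditional bias of a unit; see Moreno-Rebollo, Muñoz-Reyes and Muñoz-Pichardo (1999), Moreno-Rebollo, Muñoz-Reyes, Jimenez-Gamero and Muñoz-Pichardo (2002) and Beaumont, Haziza and Ruiz-Gazen (2013). Let $I_i$ be the sample selection indicator variable for unit $i$, such that $I_i = 1$ if $i \in S$ and $I_i = 0$ otherwise. The conditional bias of the estimator $\hat{t}_y$ associated with a sampled unit $i$ is defined as

$$B_{1i}^{\mathrm{HT}}(I_i = 1) = E_p\left(\hat{t}_y - t_y \mid I_i = 1\right) = \sum_{j \in U} \left(\frac{\pi_{ij}}{\pi_i \pi_j} - 1\right) y_j, \qquad (2.1)$$

where $\pi_{ij}$ is the joint probability of inclusion of units $i$ and $j$ in the sample, with $\pi_{ii} = \pi_i$. In general, the conditional bias (2.1) is unknown, since the values of the variable of interest are observed only for the sampled units. In practice, the conditional bias must therefore be estimated. We consider the conditionally unbiased estimator (for example, see Beaumont et al. 2013):

$$\hat{B}_{1i}^{\mathrm{HT}}(I_i = 1) = \sum_{j \in S} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_j \pi_{ij}}\, y_j. \qquad (2.2)$$

This estimator is conditionally unbiased in the sense that $E_p\left\{\hat{B}_{1i}^{\mathrm{HT}}(I_i = 1) \mid I_i = 1\right\} = B_{1i}^{\mathrm{HT}}(I_i = 1)$.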
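The definition (2.1) and the estimator (2.2) can be checked numerically on a toy example. The sketch below assumes simple random sampling without replacement from a made-up population of five units, so that every sample containing unit $i$ is equally likely given $I_i = 1$ and the conditional expectations can be computed by exhaustive enumeration. The function names and the data are hypothetical.

```python
import itertools
import numpy as np

def conditional_bias(i, y, pi, pi2):
    """(2.1): sum over j in U of (pi_ij / (pi_i pi_j) - 1) y_j, with pi_ii = pi_i."""
    return float(np.sum((pi2[i] / (pi[i] * pi) - 1.0) * y))

def estimated_conditional_bias(i, sample, y, pi, pi2):
    """(2.2): sum over j in S of (pi_ij - pi_i pi_j) / (pi_j pi_ij) * y_j."""
    s = np.asarray(sample)
    num = pi2[i, s] - pi[i] * pi[s]
    return float(np.sum(num / (pi[s] * pi2[i, s]) * y[s]))

# Made-up population; the last unit is much larger than the others.
y = np.array([1.0, 2.0, 3.0, 4.0, 50.0])
N, n = 5, 3

# First- and second-order inclusion probabilities under SRSWOR.
pi = np.full(N, n / N)
pi2 = np.full((N, N), n * (n - 1) / (N * (N - 1)))
np.fill_diagonal(pi2, pi)  # pi_ii = pi_i

i = 4  # index of the large unit
samples_with_i = [s for s in itertools.combinations(range(N), n) if i in s]

true_b = conditional_bias(i, y, pi, pi2)

# Brute-force E_p(t_hat - t_y | I_i = 1): average over samples containing i.
brute_b = np.mean([np.sum(y[list(s)] / pi[list(s)]) for s in samples_with_i]) - y.sum()

# Conditional expectation of the estimator (2.2) over the same samples.
mean_est = np.mean(
    [estimated_conditional_bias(i, s, y, pi, pi2) for s in samples_with_i]
)
```

Under these assumptions, `true_b`, `brute_b` and `mean_est` agree up to rounding error, which matches the conditional unbiasedness property stated above.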
We make the following remarks on the conditional bias and its estimator:

(i) The conditional bias (2.1) and its estimator (2.2) depend on the inclusion probabilities $\pi_i$ and the joint inclusion probabilities $\pi_{ij}$. In other words, the conditional bias is a measure that takes the sampling design into account.

(ii) If $\pi_i = 1$, then $B_{1i}^{\mathrm{HT}}(I_i = 1) = 0$ and, similarly, $\hat{B}_{1i}^{\mathrm{HT}}(I_i = 1) = 0$. That is, when $\pi_i = 1$, unit $i$ is selected in all possible samples, and consequently $E_p(\hat{t}_y - t_y \mid I_i = 1) = E_p(\hat{t}_y - t_y) = 0$, since $\hat{t}_y$ is a design-unbiased estimator of $t_y$. A unit that is selected with certainty therefore has no influence and does not contribute to the variance of $\hat{t}_y$.

(iii) The estimated conditional bias (2.2) depends on the second-order inclusion probabilities, $\pi_{ij}$. For some designs, these probabilities may be difficult to calculate, in which case approximations will be used. For sampling designs that belong to the class of high-entropy designs (e.g., Berger 1998), a number of approximations of the second-order inclusion probabilities have been proposed in the literature; for example, see Haziza, Mecatti and Rao (2008). An alternative solution is to calculate approximations of the $\pi_{ij}$ using Monte Carlo methods; see Fattorini (2006) and Thompson and Wu (2008).
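The Monte Carlo idea in remark (iii) can be sketched as follows: draw many independent samples from the design and use the relative frequency with which two units appear together as an approximation of their joint inclusion probability. This is a generic simulation sketch, not the specific procedures of the cited papers. The helper names are hypothetical, and simple random sampling without replacement is used only because its exact joint probabilities are known and allow a check.

```python
import numpy as np

def monte_carlo_joint_inclusion(draw_sample, N, n_rep, rng):
    """Approximate the N x N matrix of pi_ij by co-inclusion frequencies."""
    counts = np.zeros((N, N))
    for _ in range(n_rep):
        ind = np.zeros(N)
        ind[draw_sample(rng)] = 1.0     # selection indicators I_1, ..., I_N
        counts += np.outer(ind, ind)    # entry (i, j) counts I_i * I_j
    return counts / n_rep

def draw_srswor(rng, N=6, n=3):
    """Stand-in design: simple random sampling without replacement."""
    return rng.choice(N, size=n, replace=False)

rng = np.random.default_rng(12345)
pi2_hat = monte_carlo_joint_inclusion(draw_srswor, N=6, n_rep=20000, rng=rng)

# Exact values for this design, used only to check the approximation.
exact_joint = 3 * 2 / (6 * 5)   # pi_ij for i != j
exact_first = 3 / 6             # pi_i, estimated on the diagonal
off_diag = pi2_hat[~np.eye(6, dtype=bool)]
```

For a design whose joint probabilities are unknown, `draw_srswor` would be replaced by a function that draws one sample from that design. The number of replicates needed depends on how small the $\pi_{ij}$ are.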
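For a stratified simple random sampling design, the conditional bias (2.1) associated with sampled unit $i$ in stratum $h$ is given by

$$B_{1i}^{\mathrm{HT}}(I_i = 1) = \frac{N_h}{N_h - 1}\left(\frac{N_h}{n_h} - 1\right)\left(y_i - \bar{Y}_h\right),$$

where $\bar{Y}_h = N_h^{-1} \sum_{j \in U_h} y_j$, $n_h$ denotes the size of the sample selected in stratum $h$, and $U_h$ denotes the population of units in stratum $h$, of size $N_h$. The estimator of the conditional bias (2.2) reduces to

$$\hat{B}_{1i}^{\mathrm{HT}}(I_i = 1) = \frac{n_h}{n_h - 1}\left(\frac{N_h}{n_h} - 1\right)\left(y_i - \bar{y}_h\right),$$

where $\bar{y}_h = n_h^{-1} \sum_{j \in S_h} y_j$ and $S_h$ is the sample in stratum $h$.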
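The two closed forms for stratified simple random sampling can be written as short functions operating on a single stratum. This is a sketch under the assumption that each stratum contains at least two population units and at least two sampled units; the function names and the numbers are hypothetical.

```python
import numpy as np

def srs_conditional_bias(y_i, y_stratum, n_h):
    """Conditional bias of unit i; needs every y-value in the stratum."""
    N_h = len(y_stratum)
    return N_h / (N_h - 1) * (N_h / n_h - 1) * (y_i - np.mean(y_stratum))

def srs_estimated_conditional_bias(y_i, y_sample_h, N_h):
    """Estimated conditional bias of unit i; uses only the stratum sample."""
    n_h = len(y_sample_h)
    return n_h / (n_h - 1) * (N_h / n_h - 1) * (y_i - np.mean(y_sample_h))

# Made-up stratum of N_h = 5 units, from which n_h = 3 are sampled.
y_stratum = np.array([1.0, 2.0, 3.0, 4.0, 50.0])
y_sample_h = np.array([2.0, 4.0, 50.0])

b_true = srs_conditional_bias(50.0, y_stratum, n_h=3)
b_hat = srs_estimated_conditional_bias(50.0, y_sample_h, N_h=5)
```

Both quantities are proportional to the distance between $y_i$ and a stratum mean, so a unit far from the mean of its stratum has a large conditional bias, and the factor $N_h/n_h - 1$ shrinks toward zero as the sampling fraction approaches one.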
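For a Poisson design, the conditional bias of sampled unit $i$ is given by

$$B_{1i}^{\mathrm{HT}}(I_i = 1) = \left(\pi_i^{-1} - 1\right) y_i. \qquad (2.4)$$

In contrast to the simple random sampling without-replacement design, the conditional bias (2.4) is known for all units in the sample, since it depends only on $y_i$ and $\pi_i$ and not on unknown finite population parameters such as a population mean.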
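Because units are selected independently under a Poisson design, $\pi_{ij} = \pi_i \pi_j$ for $j \neq i$, so only the term $j = i$ remains in the general sum and the conditional bias can be computed directly from the sample. A minimal sketch, with a hypothetical function name and made-up values:

```python
import numpy as np

def poisson_conditional_bias(y_sample, pi_sample):
    """Conditional bias (1 / pi_i - 1) * y_i for each sampled unit."""
    y_sample = np.asarray(y_sample, dtype=float)
    pi_sample = np.asarray(pi_sample, dtype=float)
    return (1.0 / pi_sample - 1.0) * y_sample

# Made-up sample; the last unit has a large value and a small
# inclusion probability, and the third unit is selected with certainty.
y_s = np.array([10.0, 12.0, 7.0, 500.0])
pi_s = np.array([0.5, 0.5, 1.0, 0.01])

b = poisson_conditional_bias(y_s, pi_s)
```

The unit with $\pi_i = 1$ has zero conditional bias, while the unit with a large $y_i$ and a small $\pi_i$ has by far the largest one.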