4. Application to winsorized estimators
Cyril Favre Martinoz, David Haziza and Jean-François Beaumont
Estimator (3.5) can be written in alternative forms, some of which are easier to implement in practice. Here we consider the winsorized form, which has been widely studied in the literature. As mentioned in Section 1, we distinguish between standard winsorization and Dalén-Tambay winsorization.
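Standard winsorization reduces the values of the units whose weighted value exceeds a given threshold. Let $\tilde{y}_i^{(s)}$ be the value of variable $y$ for unit $i$ after winsorization. We have

$$\tilde{y}_i^{(s)} = \begin{cases} y_i & \text{if } d_i y_i \le c, \\ \dfrac{c}{d_i} & \text{if } d_i y_i > c, \end{cases} \qquad (4.1)$$

where $d_i$ is the design weight of unit $i$ and $c > 0$ is the winsorization threshold.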
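The standard winsorized estimator of the total $Y$ is given by

$$\hat{Y}^{(s)}(c) = \sum_{i \in s} d_i \tilde{y}_i^{(s)} = \hat{Y}_{HT} - \Delta^{(s)}(c), \qquad (4.2)$$

where

$$\Delta^{(s)}(c) = \sum_{i \in s} \max\left(0,\, d_i y_i - c\right).$$

Hence, the estimator (4.2) can be written in the form (3.1). An alternative is to express $\hat{Y}^{(s)}(c)$ as a weighted sum of the initial $y_i$ values using modified weights:

$$\hat{Y}^{(s)}(c) = \sum_{i \in s} \tilde{d}_i^{(s)} y_i, \qquad (4.3)$$

where, for $y_i \neq 0$,

$$\tilde{d}_i^{(s)} = \frac{d_i \tilde{y}_i^{(s)}}{y_i} = \begin{cases} d_i & \text{if } d_i y_i \le c, \\ \dfrac{c}{y_i} & \text{if } d_i y_i > c. \end{cases}$$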
If $d_i y_i \le c$ (that is, if unit $i$ is not influential), then $\tilde{d}_i^{(s)} = d_i$. Thus, the weight of a non-influential unit is not modified. In contrast, the modified weight of an influential unit, $c/y_i$, is less than $d_i$ and may even be less than 1. A unit with $y_i = 0$ poses no particular problem, since its contribution to the estimated total, $\tilde{d}_i^{(s)} y_i$, is zero. In this case, an arbitrary value can be assigned to the modified weight $\tilde{d}_i^{(s)}$.
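To make the three equivalent expressions of the standard winsorized estimator concrete, here is a minimal Python sketch on a made-up sample of three units with common design weight $d_i = 5$ and threshold $c = 1000$. It evaluates (4.1), $\Delta^{(s)}(c)$ and the modified weights, and checks that $\sum_{i \in s} d_i \tilde{y}_i^{(s)}$, $\hat{Y}_{HT} - \Delta^{(s)}(c)$ and $\sum_{i \in s} \tilde{d}_i^{(s)} y_i$ coincide. The function names and data are hypothetical illustrations.

```python
def standard_winsorize(y, d, c):
    """Winsorized values (4.1): y_i is replaced by c / d_i when d_i * y_i > c."""
    return [yi if di * yi <= c else c / di for yi, di in zip(y, d)]


def delta_standard(y, d, c):
    """Delta^(s)(c): amount subtracted from the Horvitz-Thompson total in (4.2)."""
    return sum(max(0.0, di * yi - c) for yi, di in zip(y, d))


def modified_weights_standard(y, d, c):
    """Modified weights d_i * y~_i / y_i. A unit with y_i = 0 is never above the
    threshold (c > 0), so it keeps d_i; any value would do since it contributes 0."""
    return [di if di * yi <= c else c / yi for yi, di in zip(y, d)]


# Hypothetical sample: three units with design weight 5 and threshold c = 1000.
y = [10.0, 50.0, 4000.0]
d = [5.0, 5.0, 5.0]
c = 1000.0

ht = sum(di * yi for yi, di in zip(y, d))        # 20300.0
y_w = standard_winsorize(y, d, c)                # [10.0, 50.0, 200.0]
w = modified_weights_standard(y, d, c)           # [5.0, 5.0, 0.25]

# The three expressions of the winsorized total agree: 20300 - 19000 = 1300.
print(sum(di * yi for yi, di in zip(y_w, d)))    # 1300.0
print(ht - delta_standard(y, d, c))              # 1300.0
print(sum(wi * yi for yi, wi in zip(y, w)))      # 1300.0
```

The third unit is influential ($d_i y_i = 20{,}000 > c$); its modified weight, 0.25, shows that standard winsorization can push a weight below 1.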
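In the case of Dalén-Tambay winsorization, the values of the variable of interest after winsorization are defined by

$$\tilde{y}_i^{(DT)} = \begin{cases} y_i & \text{if } d_i y_i \le c, \\ \dfrac{c}{d_i} + \dfrac{1}{d_i}\left(y_i - \dfrac{c}{d_i}\right) & \text{if } d_i y_i > c. \end{cases} \qquad (4.4)$$

This leads to the winsorized estimator of the total

$$\hat{Y}^{(DT)}(c) = \sum_{i \in s} d_i \tilde{y}_i^{(DT)} = \hat{Y}_{HT} - \Delta^{(DT)}(c), \qquad (4.5)$$

where

$$\Delta^{(DT)}(c) = \sum_{i \in s} \left(1 - \frac{1}{d_i}\right) \max\left(0,\, d_i y_i - c\right).$$

Estimator (4.5) can also be written in the form (3.1). As in the case of $\hat{Y}^{(s)}(c)$, an alternative is to express $\hat{Y}^{(DT)}(c)$ as a weighted sum of the initial $y_i$ values using modified weights:

$$\hat{Y}^{(DT)}(c) = \sum_{i \in s} \tilde{d}_i^{(DT)} y_i, \qquad (4.6)$$

where, for $y_i \neq 0$,

$$\tilde{d}_i^{(DT)} = \frac{d_i \tilde{y}_i^{(DT)}}{y_i} = \begin{cases} d_i & \text{if } d_i y_i \le c, \\ 1 + \dfrac{c}{y_i}\left(1 - \dfrac{1}{d_i}\right) & \text{if } d_i y_i > c. \end{cases}$$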
As in the case of the standard winsorized estimator, the weight of a non-influential unit is not modified. Unlike standard winsorization, however, Dalén-Tambay winsorization guarantees that the modified weights are never less than 1: since $d_i \ge 1$, the modified weight of an influential unit lies between 1 and $d_i$. Once again, a unit with $y_i = 0$ poses no particular problem, since its contribution to the estimated total, $\tilde{d}_i^{(DT)} y_i$, is zero. In this case, an arbitrary value can be assigned to the modified weight $\tilde{d}_i^{(DT)}$.
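The same hypothetical three-unit sample ($d_i = 5$, $c = 1000$) can be run through (4.4) and (4.5) to see the difference with standard winsorization. The sketch below, with illustrative function names, computes the Dalén-Tambay values, $\Delta^{(DT)}(c)$ and the modified weights, and checks that the three expressions of the total agree up to floating-point rounding.

```python
def dt_winsorize(y, d, c):
    """Dalen-Tambay values (4.4): c/d_i + (y_i - c/d_i)/d_i when d_i * y_i > c."""
    return [yi if di * yi <= c else c / di + (yi - c / di) / di
            for yi, di in zip(y, d)]


def delta_dt(y, d, c):
    """Delta^(DT)(c) = sum over s of (1 - 1/d_i) * max(0, d_i * y_i - c)."""
    return sum((1.0 - 1.0 / di) * max(0.0, di * yi - c) for yi, di in zip(y, d))


def modified_weights_dt(y, d, c):
    """Modified weights d_i * y~_i / y_i = 1 + (c / y_i) * (1 - 1/d_i) for
    influential units; non-influential units (including y_i = 0) keep d_i."""
    return [di if di * yi <= c else 1.0 + (c / yi) * (1.0 - 1.0 / di)
            for yi, di in zip(y, d)]


# Hypothetical sample: three units with design weight 5 and threshold c = 1000.
y = [10.0, 50.0, 4000.0]
d = [5.0, 5.0, 5.0]
c = 1000.0

ht = sum(di * yi for yi, di in zip(y, d))
y_dt = dt_winsorize(y, d, c)         # third value is about 960
w_dt = modified_weights_dt(y, d, c)  # third weight is about 1.2, between 1 and d_i = 5

# All three expressions give (up to rounding) the same total, about 5100.
print(sum(di * yi for yi, di in zip(y_dt, d)))
print(ht - delta_dt(y, d, c))
print(sum(wi * yi for yi, wi in zip(y, w_dt)))
```

For the influential unit, the Dalén-Tambay weight is about 1.2, whereas the standard modified weight $c/y_i$ would be 0.25; only part of the excess $d_i y_i - c$ is removed, in the proportion $1 - 1/d_i$.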
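Since the standard and Dalén-Tambay winsorized estimators are of the form (3.1), the optimal constant $c_{opt}$ that minimizes (3.2) is obtained by solving

$$\hat{Y}_{HT} - \Delta(c) = \hat{Y}_{HT} - \frac{1}{2}\left(\hat{B}_{\min} + \hat{B}_{\max}\right)$$

or, equivalently,

$$\Delta(c) = \frac{1}{2}\left(\hat{B}_{\min} + \hat{B}_{\max}\right), \qquad (4.7)$$

where $\Delta(c) = \Delta^{(s)}(c)$ in the case of $\hat{Y}^{(s)}(c)$ and $\Delta(c) = \Delta^{(DT)}(c)$ in the case of $\hat{Y}^{(DT)}(c)$.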
It is shown in the Appendix that
a solution to equation (4.7) exists under the following conditions:
Condition 1 is satisfied for most one-stage designs used in practice, such as stratified simple random sampling and Poisson sampling. Condition 2 implies that $\hat{Y}^R$ must be less than or equal to $\hat{Y}_{HT}$, which is necessary since, by construction, a winsorized estimator cannot be greater than the Horvitz-Thompson estimator. Condition 2 is generally expected to hold in most of the skewed populations encountered in business surveys and social surveys. It is also
shown in the Appendix that the solution to equation (4.7) is unique if the
above conditions are met and if
for
The Appendix contains a brief
description of an algorithm for finding the solution to equation (4.7).
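It should be noted that while the value of $c_{opt}$ differs depending on the type of winsorized estimator used, the resulting robust estimators are identical. In other words, we have

$$\hat{Y}^{(s)}\big(c_{opt}^{(s)}\big) = \hat{Y}^{(DT)}\big(c_{opt}^{(DT)}\big) = \hat{Y}_{HT} - \frac{1}{2}\left(\hat{B}_{\min} + \hat{B}_{\max}\right) \equiv \hat{Y}^R, \qquad (4.8)$$

where $c_{opt}^{(s)}$ and $c_{opt}^{(DT)}$ denote the solutions of (4.7) for the standard and Dalén-Tambay estimators, respectively.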
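Equation (4.7) can be solved numerically because both $\Delta^{(s)}(c)$ and $\Delta^{(DT)}(c)$ are continuous and nonincreasing in $c$ and vanish for $c \ge \max_{i \in s} d_i y_i$. The sketch below uses plain bisection, a generic root-finding choice that is not necessarily the algorithm described in the Appendix. It assumes a hypothetical Poisson sample with $d_i = 4$, for which the estimated conditional bias of $\hat{Y}_{HT}$ for a sampled unit is taken to be $(d_i - 1) y_i$, and uses the right-hand side $\frac{1}{2}(\hat{B}_{\min} + \hat{B}_{\max})$ of (4.7). It illustrates (4.8): the two thresholds differ, yet both winsorized estimates equal $\hat{Y}_{HT} - \frac{1}{2}(\hat{B}_{\min} + \hat{B}_{\max})$. Function names are illustrative.

```python
def delta_standard(y, d, c):
    """Delta^(s)(c) from (4.2)."""
    return sum(max(0.0, di * yi - c) for yi, di in zip(y, d))


def delta_dt(y, d, c):
    """Delta^(DT)(c) from (4.5)."""
    return sum((1.0 - 1.0 / di) * max(0.0, di * yi - c) for yi, di in zip(y, d))


def solve_threshold(delta, y, d, target, iters=200):
    """Bisection for c >= 0 with delta(y, d, c) = target.

    Both Delta functions are continuous and nonincreasing in c and vanish
    once c >= max(d_i * y_i), so a root is bracketed by [0, max(d_i * y_i)]
    whenever 0 <= target <= delta(y, d, 0).
    """
    lo, hi = 0.0, max(0.0, max(di * yi for yi, di in zip(y, d)))
    if not 0.0 <= target <= delta(y, d, lo):
        raise ValueError("no threshold c >= 0 solves Delta(c) = target")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if delta(y, d, mid) > target:
            lo = mid  # Delta still too large: the threshold must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical Poisson sample with common design weight d_i = 4.
y = [120.0, 80.0, 95.0, 60.0, 2500.0]
d = [4.0] * len(y)

# Assumption: under Poisson sampling, the estimated conditional bias of the
# Horvitz-Thompson estimator for a sampled unit is (d_i - 1) * y_i.
b_hat = [(di - 1.0) * yi for yi, di in zip(y, d)]
target = 0.5 * (min(b_hat) + max(b_hat))  # right-hand side of (4.7)

ht = sum(di * yi for yi, di in zip(y, d))
c_s = solve_threshold(delta_standard, y, d, target)
c_dt = solve_threshold(delta_dt, y, d, target)

print(c_s, c_dt)  # roughly 6160 and 4880
print(ht - delta_standard(y, d, c_s), ht - delta_dt(y, d, c_dt), ht - target)
# all three are (up to rounding) 7580
```

Because each term of $\Delta^{(DT)}(c)$ is the corresponding term of $\Delta^{(s)}(c)$ multiplied by $1 - 1/d_i \le 1$, we have $\Delta^{(DT)}(c) \le \Delta^{(s)}(c)$ for every $c$; when the solutions are unique, the Dalén-Tambay threshold therefore cannot exceed the standard one.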
To compare the influence of each population unit on the (non-robust) expansion estimator, $\hat{Y}_{HT}$, and on its robust version (4.8), we carried out a simulation study. For that purpose, we generated two populations, each of size $N$. One population was generated according to a normal distribution with mean 4,108 and standard deviation 1,500, and the other according to a lognormal distribution with mean 4,108 and standard deviation 7,373. From each population, we selected 500,000 samples under each of two sampling designs: (i) simple random sampling without replacement of size $n$ and (ii) Bernoulli sampling with expected size $N\pi$, where $\pi$ is the common inclusion probability.
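The two populations described above could be generated as in the following sketch. A lognormal distribution is parameterized by the mean $\mu$ and standard deviation $\sigma$ of $\log Y$, so the stated mean (4,108) and standard deviation (7,373) must first be converted by moment matching: $\sigma^2 = \log\left(1 + \mathrm{sd}^2/\mathrm{mean}^2\right)$ and $\mu = \log(\mathrm{mean}) - \sigma^2/2$. The population size $N$ and the seed are placeholders, and the code is an illustration rather than the generator used in the study.

```python
import math
import random


def lognormal_params(mean, sd):
    """(mu, sigma) of log(Y) such that the lognormal Y has the given mean and sd:
    sigma^2 = log(1 + (sd/mean)^2) and mu = log(mean) - sigma^2 / 2."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - 0.5 * sigma2, math.sqrt(sigma2)


def generate_populations(N, seed=2015):
    """Draw one normal and one lognormal population of size N (placeholder N and seed)."""
    rng = random.Random(seed)
    mu, sigma = lognormal_params(4108.0, 7373.0)
    normal_pop = [rng.gauss(4108.0, 1500.0) for _ in range(N)]
    lognormal_pop = [rng.lognormvariate(mu, sigma) for _ in range(N)]
    return normal_pop, lognormal_pop


mu, sigma = lognormal_params(4108.0, 7373.0)
print(round(mu, 3), round(sigma, 3))  # prints: 7.601 1.2
```

With these moments, the underlying normal distribution of $\log Y$ has $\sigma \approx 1.2$ and $\mu \approx 7.60$, which reflects the strong right skew of the lognormal population compared with the normal one.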
First, we calculated the conditional bias of the Horvitz-Thompson estimator for simple random sampling without replacement, given in (2.3), and for Bernoulli sampling, given in (2.4). Note that the conditional bias of the Horvitz-Thompson estimator does not need to be approximated by simulation, since all the population parameters are known. The conditional bias associated with unit $i$ of the robust estimator given in (3.3) was approximated as follows. Out of the 500,000 selected samples, we identified those that contained unit $i$. In each of these samples, we calculated the error, $\hat{Y}^R - Y$. Finally, we calculated the average value of $\hat{Y}^R - Y$ over all the samples containing unit $i$.
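The Monte Carlo procedure just described can be sketched for Bernoulli sampling on a tiny hypothetical population (29 units equal to 100 and one outlier equal to 5,000), with far fewer replicates than the 500,000 used in the study. Several pieces are assumptions: (2.3) and (2.4) are taken to be the usual closed forms $\frac{N}{N-1}\left(\frac{N}{n} - 1\right)(y_i - \bar{Y})$ and $(1/\pi - 1) y_i$, the robust estimator is taken to be $\hat{Y}_{HT} - \frac{1}{2}(\hat{B}_{\min} + \hat{B}_{\max})$, and the estimated conditional bias of a sampled unit under Bernoulli sampling is taken to be $(1/\pi - 1) y_i$. Function names are illustrative.

```python
import random


def cond_bias_ht_srswor(y, n):
    """Assumed closed form for SRSWOR: N/(N-1) * (N/n - 1) * (y_i - Ybar)."""
    N = len(y)
    ybar = sum(y) / N
    return [N / (N - 1) * (N / n - 1) * (yi - ybar) for yi in y]


def cond_bias_ht_bernoulli(y, p):
    """Assumed closed form for Bernoulli sampling with inclusion probability p."""
    return [(1.0 / p - 1.0) * yi for yi in y]


def robust_estimate_bernoulli(sample_y, p):
    """Assumed robust estimator Y_HT - (B_min + B_max) / 2, where the
    conditional bias of a sampled unit is estimated by (1/p - 1) * y_i."""
    d = 1.0 / p
    b_hat = [(d - 1.0) * yi for yi in sample_y]
    return d * sum(sample_y) - 0.5 * (min(b_hat) + max(b_hat))


def mc_conditional_bias(y, p, n_sim, seed=0):
    """Monte Carlo approximation of E(Y_R - Y | unit i in s) for every unit i."""
    rng = random.Random(seed)
    Y = sum(y)
    err_sum = [0.0] * len(y)
    hits = [0] * len(y)
    for _ in range(n_sim):
        s = [i for i in range(len(y)) if rng.random() < p]
        if not s:
            continue  # an empty sample contains no unit, so no average is affected
        err = robust_estimate_bernoulli([y[i] for i in s], p) - Y
        for i in s:
            err_sum[i] += err
            hits[i] += 1
    return [e / h if h else float("nan") for e, h in zip(err_sum, hits)]


# Hypothetical population: 29 ordinary units and one outlier.
y = [100.0] * 29 + [5000.0]
p = 0.2
b_ht = cond_bias_ht_bernoulli(y, p)
b_r = mc_conditional_bias(y, p, n_sim=5000, seed=1)
print(b_ht[-1], round(b_r[-1]))  # exact 20000.0 vs. a Monte Carlo value near 9800
```

For the outlier, the exact conditional bias of $\hat{Y}_{HT}$ is 20,000, while the Monte Carlo conditional bias of the robust estimator is roughly half of that, the kind of reduction described below for highly influential units.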
The results for simple random sampling without replacement are shown in Figures 4.1 (a) and 4.1 (b) for the normal and lognormal distributions, respectively. The results for Bernoulli sampling are shown in Figures 4.1 (c) and 4.1 (d) for the normal and lognormal distributions, respectively. In each figure, the absolute value of the conditional bias of $\hat{Y}^R$ is plotted against the absolute value of the conditional bias of $\hat{Y}_{HT}$ for each population unit. Units above the first bisectrix (the diagonal on which the two absolute conditional biases are equal) have a conditional bias associated with $\hat{Y}^R$ whose absolute value is greater than that of the conditional bias associated with $\hat{Y}_{HT}$.
Looking first at the results for simple random sampling without replacement, we see that, for the normal population, the absolute value of the conditional bias of $\hat{Y}^R$ behaves similarly to that of $\hat{Y}_{HT}$, which indicates that robustifying the expansion estimator does not significantly alter the influence of the units. This result is not surprising, since this population does not contain any highly influential units. In the case of the lognormal distribution, the influence of the units with a large conditional bias associated with $\hat{Y}_{HT}$ has been reduced significantly. On the other hand, for the majority of the units, the absolute conditional bias of $\hat{Y}^R$ is slightly higher than that of $\hat{Y}_{HT}$.
Turning to the results for Bernoulli sampling, we see that, in the case of the normal population, the influence of most units has been reduced, since the absolute value of the conditional bias of $\hat{Y}^R$ is significantly lower than that of $\hat{Y}_{HT}$. In the case of the lognormal distribution, the results are similar to those obtained with simple random sampling without replacement for the same distribution.
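Figure 4.1 Absolute value of the conditional biases of the robust and non-robust estimators: (a) simple random sampling without replacement, normal population; (b) simple random sampling without replacement, lognormal population; (c) Bernoulli sampling, normal population; (d) Bernoulli sampling, lognormal population.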