5 Variance estimation and weight sharing
Anne Massiani
The calculations presented in this section are
adaptations of the techniques developed by Lavallée (2002, Chapter 8.5) for the
treatment of cluster non-response (CNR) in the context of indirect sampling.
The results are given in the fictitious case where the probabilities of
response at the different stages of the survey are known. The quantities as well as the quantities defined by (4.4) are also assumed to be known.
All these quantities will be replaced by estimates in Section 7. Let denote the indicator variable that is equal to
1 if household present in survey year is included in sample responding in the wave. Conditional upon the fact that an
attempt was made to contact it via the longitudinals that it contains,
household belongs to if it responded to the grid and then to the
questionnaire in survey year Therefore, we have:
(5.1)
Using Theorem 8.1 of Lavallée (2002, page 151) and
Note 1, we can easily verify that the estimator (4.7) can be rewritten in the
following form:
where we have noted, for any longitudinal belonging to household in the survey year:
The estimator is consequently reduced to a sum over
individuals selected directly. We now decompose the variance of in a standard way by conditioning on
(5.4)
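For reference, the standard conditional decomposition has the following generic shape. The notation here is an assumption made for illustration and is not taken from the original formula: $\hat{Y}$ stands for the estimator, $s$ for the selected sample, the subscript $s$ for moments taken over the sample selection, and the subscript $r$ for moments taken over the response mechanism given the sample.

```latex
\[
  V(\hat{Y})
  \;=\;
  V_{s}\!\left\{ E_{r}\!\left( \hat{Y} \mid s \right) \right\}
  \;+\;
  E_{s}\!\left\{ V_{r}\!\left( \hat{Y} \mid s \right) \right\}
\]
```

When the response probabilities are known and the weights are adjusted by their inverses, the inner expectation reduces to the full-response estimator, so the first term only reflects the selection of the sample.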
For any individual who is present in the population during the
year in which is drawn and who belongs to household during the survey year, let
Using (5.1), we verify that for all longitudinals and included in and belonging to households and respectively during the survey year, we have:
(5.6)
and
Formula (5.4) then becomes:
where designates the number of persons present in
the population during the year in which is drawn. The first term is the portion of the variance due to the
mechanism for selecting the longitudinals in while the second term is the portion due to households' non-response
to the grid and then to the household questionnaire in year which constitutes cluster non-response.
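The two-term split described above can be checked numerically on a toy example. The sketch below is only an illustration under simplifying assumptions that are not those of the paper: a hypothetical population of four individuals in two households, Poisson sampling of individuals, and a single known response probability per household (standing in for the combined grid and questionnaire response). All names and numbers are invented. It enumerates every sample and every response pattern, so the variance components are exact rather than simulated.

```python
from itertools import product

# Hypothetical toy population (all numbers are illustrative assumptions).
y = [3.0, 5.0, 2.0, 8.0]      # study variable for 4 individuals
hh = [0, 0, 1, 1]             # household of each individual
pi = [0.5, 0.4, 0.6, 0.3]     # inclusion probabilities, Poisson sampling assumed
p = [0.8, 0.6]                # known response probability of each household


def pattern_prob(bits, probs):
    """Probability of one 0/1 pattern under independent Bernoulli draws."""
    out = 1.0
    for b, q in zip(bits, probs):
        out *= q if b else 1.0 - q
    return out


def estimate(s, r):
    """Weighted total over sampled individuals whose household responded."""
    return sum(
        y[i] / (pi[i] * p[hh[i]])
        for i in range(len(y))
        if s[i] and r[hh[i]]
    )


def decompose():
    """Return (mean, total variance, sampling term, non-response term)."""
    mean = 0.0
    second_moment = 0.0
    e_cond_sq = 0.0       # E_s[(E_r[estimate | s])^2]
    nonresponse = 0.0     # E_s[V_r[estimate | s]]
    for s in product([0, 1], repeat=len(y)):
        ps = pattern_prob(s, pi)
        m1 = 0.0
        m2 = 0.0
        for r in product([0, 1], repeat=len(p)):
            pr = pattern_prob(r, p)
            e = estimate(s, r)
            m1 += pr * e
            m2 += pr * e * e
        mean += ps * m1
        second_moment += ps * m2
        e_cond_sq += ps * m1 * m1
        nonresponse += ps * (m2 - m1 * m1)
    total = second_moment - mean ** 2
    sampling = e_cond_sq - mean ** 2
    return mean, total, sampling, nonresponse
```

Calling `decompose()` returns the expectation of the estimator, which equals the population total because the response probabilities are known, together with the total variance and its two components. Because whole households respond or fail to respond together, the second component captures the cluster non-response effect.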
To obtain an estimator of the variance of we adapt the variance estimation formula
(8.37) of Lavallée (2002, page 154). The following differences should be noted.
First, we are ignoring the fact that in practice, the response probabilities will
have to be estimated, whereas Lavallée (2002) takes this into account. Second,
the estimation method proposed by Lavallée (2002) provides biased estimates,
even when it is applied in a situation where the response probabilities are
known. Consequently, we have adapted his method so as to obtain an unbiased
estimator of the variance. To justify our approach, we first explain below the
bias obtained by applying the method of Lavallée (2002) in a case where
response probabilities are known. His method consists in estimating by an unbiased estimator then estimating by where:
is the Horvitz-Thompson estimator of the variance
of
This leads to the variance estimator:
(5.11)
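As a reminder of the building block used here, the generic Horvitz-Thompson variance estimator for an estimated total can be sketched as follows. This is a textbook form and not the exact expression of the paper: the function name and the arguments `y`, `pi` and `pi_joint` are assumptions introduced for illustration, and the joint inclusion probabilities are assumed to be known and strictly positive for every pair of sampled units.

```python
def ht_variance_estimate(y, pi, pi_joint):
    """Generic Horvitz-Thompson variance estimator of an estimated total.

    y        : values observed for the sampled units
    pi       : first-order inclusion probabilities of the sampled units
    pi_joint : matrix of joint inclusion probabilities, with
               pi_joint[i][i] equal to pi[i]
    """
    n = len(y)
    total = 0.0
    for i in range(n):
        for j in range(n):
            # Covariance of the two sample membership indicators.
            delta = pi_joint[i][j] - pi[i] * pi[j]
            total += delta / pi_joint[i][j] * (y[i] / pi[i]) * (y[j] / pi[j])
    return total
```

For example, under Poisson sampling the joint probabilities factorise for distinct units, the cross terms vanish, and `ht_variance_estimate([2.0, 4.0], [0.5, 0.25], [[0.5, 0.125], [0.125, 0.25]])` reduces to the sum of the two diagonal terms, 8 + 192 = 200.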
The use of which is constructed by replacing the values that appear in (5.9) by is motivated by the fact that the values are not known for all the longitudinals
in but only for the longitudinals who, in the
survey year, belong to a household that responded to the questionnaire, that is,
a household belonging to The use of values makes it possible to assign more weight
to the longitudinals of for which the values are known. The problem is that the estimator thus constructed does not provide an unbiased
estimate of This may easily be seen by observing what
happens for the diagonal terms: for any longitudinal belonging to household during the survey year, the quantity appearing in (5.9) is replaced by while a weight increase of only a factor
of is probably a better choice. The same type of
problem occurs for the product when longitudinals and belong to the same household during the survey
year. More specifically, we have, for all longitudinals and belonging to
(5.12)
which implies
(5.13)
Since on the other hand is an unbiased estimator of we have:
(5.15)
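The mechanism behind the diagonal terms can be illustrated with a one-unit calculation. The sketch below rests on an assumption about the substitution, since the symbols are not reproduced here: a value `z` is taken to be replaced by `z / p` for a responding unit, where `p` is a known response probability, so that a squared term is inflated by `1 / p**2`. The function name and arguments are hypothetical.

```python
def expected_weighted_square(z, p, power):
    """Exact expectation of r * z**2 / p**power, where r is a
    Bernoulli(p) response indicator (an illustrative assumption)."""
    responding = z ** 2 / p ** power   # value contributed when r = 1
    not_responding = 0.0               # value contributed when r = 0
    return p * responding + (1.0 - p) * not_responding
```

With `z = 3.0` and `p = 0.5`, inflating the square by `1 / p**2` gives an expectation of 18, twice the target value of 9, whereas inflating it by `1 / p` gives exactly 9. Under this reading, the squared terms are overweighted by a factor of `1 / p`, which is consistent with the remark on the diagonal terms above.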
The bias depends in particular on the term defined by (5.7), and hence on the
probabilities and of responding to the grid and the
questionnaire in the survey year. Consider the simple case of the panel
responding in the first wave, in which the composition of the households has
not yet begun to evolve. The quantity defined by (5.7) is positive if longitudinals and belong to the same household and is zero otherwise. Also, for all
longitudinals and belonging to the same household we have, in accordance with the relation (2.4):
Since the latter quantity is also positive, the
expression of bias given by formula (5.16) means that is positive. On the other hand, the
probabilities of inclusion are very low, and therefore Using (5.16) and (5.17), we are therefore able
to make the following approximation:
where, as noted above, is defined by (5.8) and corresponds in the
first wave to the portion of the variance due to the non-response observed
between the grid and the questionnaire. Consequently, the estimator overestimates the variance of and the error committed is of the order of
magnitude of The bias may be relatively large if the
probabilities of response to the questionnaire are low. As regards the other
waves, the quantity that appears in (5.16) depends on the
households and to which the longitudinals and belonged during the year of their selection,
and it is no longer easy to obtain an order of magnitude of the bias.
We introduce a term correcting the bias and we give our variance estimation formula in
Proposition 1, below. Keeping in mind that designates all persons who comprise household during survey year (cf.
page 11), let be the set, of cardinality consisting of the longitudinals belonging to
Proposition 1: An unbiased estimate of the variance of is given by
(5.19)
where
and
The proof of Proposition 1 is provided in
Appendix B.
Note 3: The estimator is the sum of two biased estimators whose
biases are brought into balance by construction, with the result that gives an unbiased estimate of the variance of
Note 4: Proposition 1 is based on the assumption
that the values and the response probabilities are known,
which enables us to conclude that the estimator given by formula (5.19) is unbiased. In
practice, these quantities must be estimated. The consequence of this is that
the estimator of variance thus obtained is no longer unbiased but only
asymptotically unbiased, provided that the non-response models can be
considered correct and that their parameters are estimated by an appropriate
method.