Robust variance estimators for generalized regression estimators in cluster samples
Section 2. Theoretical results
Suppose that the population has $N$ clusters. In cluster $i$ there are $M_i$ elements, so that there are $M = \sum_{i=1}^{N} M_i$ elements in the population. The universe of clusters is denoted as $U$ and the universe of elements in cluster $i$ is $U_i$. An analysis variable $y_{ij}$ is associated with element $j$ in cluster $i$. The population total of $y$ is $T = \sum_{i\in U}\sum_{j\in U_i} y_{ij}$. Each population element also has a $p$-vector of auxiliary variables, $\mathbf x_{ij}$, that can be used in estimation. A two-stage sample is selected without replacement at the first and second stages. The selection probability of cluster $i$ is $\pi_i$, and $\pi_{j|i}$ is the conditional selection probability of element $j$ in cluster $i$. The overall selection probability of element $ij$ is $\pi_{ij} = \pi_i\pi_{j|i}$. Denote the set of sample clusters by $s$ and the set of sample elements within cluster $i$ by $s_i$. The number of sample clusters is $n$, while the number of sample elements selected from sample cluster $i$ is $m_i$. The total sample size of elements is $m = \sum_{i\in s} m_i$.
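As a working model, suppose that $\mathbf Y$, the $M$-vector of analysis variables, follows the linear model
$$E_M(\mathbf Y) = \mathbf X\boldsymbol\beta, \qquad \operatorname{var}_M(\mathbf Y) = \mathbf V, \tag{2.1}$$
where the subscript $M$ denotes expectation with respect to a model; $\mathbf X$ is the $M\times p$ matrix of auxiliaries, with $\mathbf X_i$ being the $M_i\times p$ matrix of auxiliaries for the $M_i$ elements in cluster $i$; and $\boldsymbol\beta$ is a parameter vector of length $p$. Elements within clusters are assumed to be correlated, while elements in different clusters are independent under the model. Thus, the covariance matrix $\mathbf V$ is an $M\times M$ block diagonal matrix with diagonal matrices $\mathbf V_i$, $i\in U$.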
A key feature of the variance estimators we propose is that the particular form of $\mathbf V$ does not have to be known to construct them. The proposed variance estimators will be consistent regardless of the form of $\mathbf V$.
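Särndal et al. (1992, Chapter 8) discuss three different GREG estimators that can be used in clustered samples; which one applies depends on the available data. We consider their case B, which occurs when unit-level data are available for the complete sample and control totals are available for the population. In this case, the GREG estimator is
$$\hat T = \hat T_{y\pi} + (\mathbf T_x - \hat{\mathbf T}_{x\pi})^T\hat{\mathbf B} = \mathbf g^T\boldsymbol\Pi_s^{-1}\mathbf Y_s, \tag{2.2}$$
where $\mathbf Y_s$ is the $m$-vector of $y$ for the sample elements, $\boldsymbol\Pi_s$ is the $m\times m$ diagonal matrix of the selection probabilities $\pi_{ij}$ of the sample elements, $\hat T_{y\pi} = \mathbf 1_m^T\boldsymbol\Pi_s^{-1}\mathbf Y_s$ is the $\pi$-estimator of the total of the $y$'s, $\mathbf T_x$ is the $p$-vector of population totals of the $x$'s, $\hat{\mathbf T}_{x\pi} = \mathbf X_s^T\boldsymbol\Pi_s^{-1}\mathbf 1_m$ is the $\pi$-estimator of $\mathbf T_x$, and (if $\mathbf V$ is known)
$$\hat{\mathbf B} = \mathbf A_s^{-1}\mathbf X_s^T\mathbf V_{ss}^{-1}\boldsymbol\Pi_s^{-1}\mathbf Y_s$$
with $\mathbf A_s = \mathbf X_s^T\mathbf V_{ss}^{-1}\boldsymbol\Pi_s^{-1}\mathbf X_s$, $\mathbf X_s$ the $m\times p$ matrix of sample auxiliaries, and $\mathbf V_{ss}$ the part of $\mathbf V$ associated with the sample elements; and
$$\mathbf g^T = (\mathbf T_x - \hat{\mathbf T}_{x\pi})^T\mathbf A_s^{-1}\mathbf X_s^T\mathbf V_{ss}^{-1} + \mathbf 1_m^T,$$
where $\mathbf 1_m$ is a vector of $m$ 1's.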
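The component of the $\mathbf g$-weight for sample cluster $i$ is
$$\mathbf g_i^T = (\mathbf T_x - \hat{\mathbf T}_{x\pi})^T\mathbf A_s^{-1}\mathbf X_{si}^T\mathbf V_{ssi}^{-1} + \mathbf 1_{m_i}^T,$$
with $\mathbf X_{si}$ being the $m_i\times p$ matrix of auxiliaries for sample elements in sample cluster $i$, $\mathbf V_{ssi}$ the $m_i\times m_i$ part of $\mathbf V_{ss}$ for sample elements in sample cluster $i$, and $\mathbf 1_{m_i}$ a vector of $m_i$ 1's. Since $\mathbf V$ is generally unknown, a surrogate value $\mathbf R$ may be used for $\mathbf V$; $\mathbf R = \mathbf I$ is a common choice. Below, we assume that a general $\mathbf R$ is used in the GREG rather than $\mathbf V$.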
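The following minimal Python/NumPy sketch makes the $\mathbf g$-weight representation concrete for the case-B GREG. It assumes the common working choice $\mathbf R = \mathbf I$, which makes $\mathbf A_s$ symmetric. It stores the diagonal of $\boldsymbol\Pi_s$ as a vector of $\pi_{ij}$ and uses small hypothetical data. The function name `greg` and all numeric values are illustrative only.

```python
import numpy as np


def greg(y, X, pi, Tx):
    """Case-B GREG total with working covariance R = I (an assumption).

    y  : (m,) sample values Y_s
    X  : (m, p) sample auxiliaries X_s
    pi : (m,) overall selection probabilities pi_ij (diagonal of Pi_s)
    Tx : (p,) known population totals of the auxiliaries
    Returns (T_hat, g, B_hat).
    """
    w = 1.0 / pi                                 # diagonal of Pi_s^{-1}
    A = X.T @ (w[:, None] * X)                   # A_s = X_s^T Pi_s^{-1} X_s
    B_hat = np.linalg.solve(A, X.T @ (w * y))    # A_s^{-1} X_s^T Pi_s^{-1} Y_s
    Tx_pi = X.T @ w                              # pi-estimator of Tx
    # g^T = (Tx - Tx_pi)^T A_s^{-1} X_s^T + 1^T; A_s is symmetric when R = I
    g = 1.0 + X @ np.linalg.solve(A, Tx - Tx_pi)
    T_hat = np.sum(g * w * y)                    # g^T Pi_s^{-1} Y_s
    return T_hat, g, B_hat


# Hypothetical two-stage sample: 5 clusters, self-weighting design
rng = np.random.default_rng(2024)
m_i = [3, 4, 2, 5, 3]
cl = np.repeat(np.arange(len(m_i)), m_i)         # cluster label of each element
X = np.column_stack([np.ones(cl.size), rng.uniform(1.0, 5.0, cl.size)])
y = X @ np.array([2.0, 3.0]) + rng.normal(0.0, 1.0, cl.size)
pi = np.full(cl.size, 0.05)
Tx = np.array([340.0, 1020.0])

T_hat, g, B_hat = greg(y, X, pi, Tx)
# Calibration check: the g-weighted pi-estimator of x reproduces Tx
print(np.allclose(X.T @ (g / pi), Tx))           # True
```

The final check reflects the calibration property $\mathbf X_s^T\boldsymbol\Pi_s^{-1}\mathbf g = \mathbf T_x$. This property follows directly from the definitions of $\mathbf g$ and $\mathbf A_s$.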
2.1 Current variance estimators
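Särndal et al. (1992, Result 8.9.1) present an estimator of the design variance of $\hat T$ which involves joint selection probabilities of clusters and elements within clusters. In the case of Poisson sampling at both stages, their estimator is
$$v_\pi(\hat T) = \sum_{i\in s}\frac{1-\pi_i}{\pi_i^2}\Big(\sum_{j\in s_i}\frac{g_{ij}e_{ij}}{\pi_{j|i}}\Big)^2 + \sum_{i\in s}\sum_{j\in s_i}\frac{1-\pi_{j|i}}{\pi_i\,\pi_{j|i}^2}\,g_{ij}^2e_{ij}^2, \tag{2.3}$$
where $g_{ij}$ is the $ij$ component of the $\mathbf g$ vector, and $e_{ij} = y_{ij} - \mathbf x_{ij}^T\hat{\mathbf B}$.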
This estimator is computationally simpler than
the general form that uses joint selection probabilities and may perform
reasonably well for
designs where the variance of estimators can
be approximated by formulas that assume independence between selections.
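The Poisson-at-both-stages estimator $v_\pi$ has two parts:

- a between-cluster term, $\sum_{i\in s}(1-\pi_i)\pi_i^{-2}\big(\sum_{j\in s_i}g_{ij}e_{ij}/\pi_{j|i}\big)^2$;
- a within-cluster term, $\sum_{i\in s}\sum_{j\in s_i}(1-\pi_{j|i})g_{ij}^2e_{ij}^2/(\pi_i\pi_{j|i}^2)$.

A direct NumPy transcription might look as follows. It is a sketch: the function name and the integer cluster labels $0,\dots,n-1$ are our own conventions, and the residuals $e_{ij}$ and weights $g_{ij}$ are assumed to have been computed beforehand.

```python
import numpy as np


def v_pi_poisson(e, g, pi_i, pi_j_i, cl):
    """Poisson-at-both-stages variance estimator from g-weighted residuals.

    e      : (m,) residuals e_ij = y_ij - x_ij^T B_hat
    g      : (m,) g-weights g_ij
    pi_i   : (n,) first-stage probabilities of the sample clusters
    pi_j_i : (m,) conditional second-stage probabilities pi_{j|i}
    cl     : (m,) cluster label (0..n-1) of each sample element
    """
    u = g * e / pi_j_i                                   # g_ij e_ij / pi_{j|i}
    t = np.bincount(cl, weights=u, minlength=pi_i.size)  # within-cluster sums
    between = np.sum((1.0 - pi_i) * t**2 / pi_i**2)
    within = np.sum((1.0 - pi_j_i) * u**2 / pi_i[cl])
    return between + within


# Tiny hand-checkable example: two clusters, three elements
e = np.array([1.0, -1.0, 2.0])
g = np.ones(3)
cl = np.array([0, 0, 1])
print(v_pi_poisson(e, g, np.array([0.5, 0.5]), np.array([0.5, 0.5, 1.0]), cl))  # 16.0
```

In the example, the between-cluster and within-cluster terms each equal 8. An element with $\pi_{j|i} = 1$ contributes nothing to the within-cluster term.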
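An estimator that is appropriate if the first-stage sample is selected with replacement is
$$v_R(\hat T) = \frac{n}{n-1}\sum_{i\in s}\big(\hat t_{ei} - \bar t_e\big)^2 \tag{2.4}$$
with $\hat t_{ei} = \sum_{j\in s_i} e_{ij}/\pi_{ij}$ and $\bar t_e = n^{-1}\sum_{i\in s}\hat t_{ei}$.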
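The jackknife linearization estimator is (Yung and Rao, 1996)
$$v_{JL}(\hat T) = \frac{n}{n-1}\sum_{i\in s}\big(\hat t_{gei} - \bar t_{ge}\big)^2, \tag{2.5}$$
where $\hat t_{gei} = \sum_{j\in s_i} g_{ij}e_{ij}/\pi_{ij}$ and $\bar t_{ge} = n^{-1}\sum_{i\in s}\hat t_{gei}$, with $g_{ij}$ being the $ij$ component of the $\mathbf g$ vector.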
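Both with-replacement formulas share an ultimate-cluster structure. Each forms one weighted residual total per sample cluster and then applies $\frac{n}{n-1}\sum_{i\in s}(t_i-\bar t)^2$. The sketch below rests on the following assumptions:

- the cluster totals are of $e_{ij}/\pi_{ij}$ for the unadjusted version;
- they are of $g_{ij}e_{ij}/\pi_{ij}$ when $\mathbf g$-weights are supplied.

The function name and the optional `g` argument are hypothetical conveniences, not notation from the paper.

```python
import numpy as np


def v_ultimate_cluster(e, pi, cl, g=None):
    """With-replacement (ultimate cluster) variance estimator.

    e  : (m,) residuals e_ij
    pi : (m,) overall selection probabilities pi_ij
    cl : (m,) cluster label (0..n-1) of each sample element
    g  : optional (m,) g-weights; if given, residuals are g-weighted
    """
    z = e / pi if g is None else g * e / pi
    t = np.bincount(cl, weights=z)          # weighted residual total per cluster
    n = t.size
    return n / (n - 1) * np.sum((t - t.mean()) ** 2)


e = np.array([1.0, 2.0, 3.0, 4.0])
pi = np.ones(4)
cl = np.array([0, 0, 1, 1])
print(v_ultimate_cluster(e, pi, cl))                      # 16.0
print(v_ultimate_cluster(e, pi, cl, g=np.full(4, 2.0)))   # 64.0
```

Doubling every $g_{ij}$ doubles each cluster total and so multiplies the estimate by four. This is a quick way to confirm that the weights enter squared.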
The jackknife is another popular variance estimation
technique. Krewski and Rao (1981) present several asymptotically equivalent
ways of writing the jackknife. The following form of the jackknife estimator is
a convenient starting point for the calculations that follow:
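$$v_J(\hat T) = \frac{n-1}{n}\sum_{i\in s}\big(\hat T_{(i)} - \bar{\hat T}\big)^2, \tag{2.6}$$
where $\hat T_{(i)}$ is the value of the GREG estimator after removing cluster $i$ and $\bar{\hat T}$ is the average of all $n$ estimates $\hat T_{(i)}$. Using (2.6) can be computationally demanding because $n$ different estimates of $\hat{\mathbf B}$ must be computed. The estimators $v_R$, $v_{JL}$ and $v_J$ are all design-consistent under the conditions in Krewski and Rao (1981) and Yung and Rao (1996). One of their key conditions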
is that clusters be selected with replacement. This assumption simplifies
theoretical calculations but is only a convenience since the theoretical
results have been shown in many empirical studies to be good predictors of
estimator performance in without-replacement designs as long as the first-stage
sampling fraction is small.
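Computing (2.6) directly means refitting the GREG once per deleted cluster. The brute-force sketch below assumes $\mathbf R = \mathbf I$ and the usual delete-one-cluster reweighting, in which the weights of the $n-1$ retained clusters are multiplied by $n/(n-1)$. `greg_total` and `v_jackknife` are hypothetical helper names, and the data are simulated.

```python
import numpy as np


def greg_total(y, X, w, Tx):
    """GREG total with R = I (assumed) and element weights w."""
    A = X.T @ (w[:, None] * X)
    B_hat = np.linalg.solve(A, X.T @ (w * y))
    return np.sum(w * y) + (Tx - X.T @ w) @ B_hat


def v_jackknife(y, X, pi, cl, Tx):
    """Delete-one-cluster jackknife; refits the GREG n times."""
    n = cl.max() + 1
    w = 1.0 / pi
    reps = np.empty(n)
    for i in range(n):
        keep = cl != i
        # retained clusters have their weights inflated by n / (n - 1)
        reps[i] = greg_total(y[keep], X[keep], w[keep] * n / (n - 1), Tx)
    return (n - 1) / n * np.sum((reps - reps.mean()) ** 2), reps


rng = np.random.default_rng(7)
cl = np.repeat(np.arange(6), 4)                      # 6 clusters of 4 elements
X = np.column_stack([np.ones(cl.size), rng.uniform(1.0, 5.0, cl.size)])
pi = np.full(cl.size, 0.1)
Tx = np.array([240.0, 720.0])
y_exact = X @ np.array([1.0, 2.0])                   # y exactly linear in x
v, reps = v_jackknife(y_exact, X, pi, cl, Tx)
print(round(v, 10))                                  # 0.0
```

When $y$ is exactly linear in $\mathbf x$, every replicate GREG reproduces $\mathbf T_x^T\boldsymbol\beta$. The jackknife variance is then zero up to rounding, which makes a convenient sanity check.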
2.2 New variance estimators
We use the model-based framework to construct new variance estimators. First, we derive the model-based variance of $\hat T - T$. Assume that model (2.1) holds and that sampling is ignorable in the sense that the probability of a unit's being in the sample given $\mathbf Y$ and $\mathbf X$ depends only on $\mathbf X$ (e.g., see discussion in Valliant, Dorfman and Royall, 2000, Section 2.6.2 and the additional references therein). Then,
we construct estimators of the model variance, using hat-matrix adjustments to
account for heterogeneity in the data. We evaluate the design-based properties
of the new variance estimators in a simulation.
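To calculate the model variance of $\hat T - T$, define $\mathbf Y_i$ as the population vector of analysis variables for cluster $i$ and $\mathbf Y_{si}$ as the vector for its sample elements. Write $\mathbf a_i = \boldsymbol\Pi_{si}^{-1}\mathbf g_i$, where $\boldsymbol\Pi_{si}$ is the part of $\boldsymbol\Pi_s$ for sample cluster $i$, so that $\hat T = \sum_{i\in s}\mathbf a_i^T\mathbf Y_{si}$. As shown in Appendix A.2, under model (2.1) the model-based variance of $\hat T - T$ is
$$\operatorname{var}_M(\hat T - T) = \sum_{i\in s}\mathbf a_i^T\mathbf V_{ssi}\mathbf a_i - 2\sum_{i\in s}\mathbf a_i^T\mathbf V_{si}\mathbf 1_{M_i} + \sum_{i\in U}\mathbf 1_{M_i}^T\mathbf V_i\mathbf 1_{M_i},$$
where $\mathbf V_{si}$ is the $m_i\times M_i$ part of $\mathbf V_i$ whose rows are associated with the elements in $s_i$, and $\mathbf 1_{M_i}$ is a vector of $M_i$ 1's.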
The model-based error variance of $\hat T$ requires knowledge of $\mathbf V$ for the full population. Without some strong assumptions that link the sample and nonsample covariance structures, components of $\mathbf V$ associated with the nonsample cannot be estimated from the sample. However, as shown in Appendix A.2, under some reasonable conditions the first term on the right-hand side is of order $O(N^2/n)$, while the second and third terms are of order $O(N)$. Thus $\sum_{i\in s}\mathbf a_i^T\mathbf V_{ssi}\mathbf a_i$ dominates the variance as the numbers of sample and population clusters increase with $n/N\to 0$, and
$$AV_M(\hat T - T) = \sum_{i\in s}\mathbf a_i^T\mathbf V_{ssi}\mathbf a_i, \tag{2.7}$$
where $AV_M$ denotes asymptotic model variance under the assumptions in Appendix A.1. A robust estimator of the right-hand side of (2.7) can be formed even when $\mathbf V$ is unknown. On the other hand, if the number of population clusters increases at the same rate as the number of sample clusters (i.e., $n/N$ converges to a non-zero constant), then all three terms may contribute importantly to the asymptotic variance. In this paper, we will only consider estimation of $\sum_{i\in s}\mathbf a_i^T\mathbf V_{ssi}\mathbf a_i$.
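Unless the true variance matrix of $\mathbf Y_{si}$ is known, $\mathbf V_{ssi}$ must be estimated. In Appendix A.3 we show that in large samples
$$E_M(\mathbf r_i\mathbf r_i^T) \approx \mathbf V_{ssi},$$
where $\mathbf r_i = \mathbf Y_{si} - \hat{\mathbf Y}_{si}$ with $\hat{\mathbf Y}_{si} = \mathbf X_{si}\hat{\mathbf B}$ and $\mathbf X_{si}$ being the $m_i\times p$ matrix of auxiliaries for sample elements in sample cluster $i$. Substituting $\mathbf r_i\mathbf r_i^T$ for $\mathbf V_{ssi}$ in (2.7) yields the sandwich estimator
$$v_S(\hat T) = \sum_{i\in s}\mathbf a_i^T\mathbf r_i\mathbf r_i^T\mathbf a_i. \tag{2.8}$$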
Based on results in Appendix A.3, $v_S$ is approximately unbiased for $AV_M(\hat T - T)$ in large samples. This sandwich estimator is also closely related to the design-based ultimate cluster estimator for a sample design in which clusters are selected with replacement. That estimator is, in turn, similar to both $v_{JL}$ and $v_J$ in with-replacement sampling. Consequently, $v_S$ has both desirable design-based and model-based properties.
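Because $\mathbf a_i^T\mathbf r_i\mathbf r_i^T\mathbf a_i = (\mathbf a_i^T\mathbf r_i)^2$, the sandwich estimator needs only one scalar per sample cluster. The minimal NumPy sketch below assumes $\mathbf R = \mathbf I$, a diagonal $\boldsymbol\Pi_s$ stored as a vector, and hypothetical data. The function name `v_sandwich` is ours.

```python
import numpy as np


def v_sandwich(y, X, pi, cl, Tx):
    """Sandwich estimator sum_i (a_i^T r_i)^2, with R = I assumed.

    Returns (v_S, u) where u[i] = a_i^T r_i for sample cluster i.
    """
    w = 1.0 / pi
    A = X.T @ (w[:, None] * X)
    B_hat = np.linalg.solve(A, X.T @ (w * y))
    g = 1.0 + X @ np.linalg.solve(A, Tx - X.T @ w)   # g-weights
    a = g * w                                       # stacked a_i = Pi_si^{-1} g_i
    r = y - X @ B_hat                               # stacked residuals r_i
    u = np.bincount(cl, weights=a * r)              # a_i^T r_i for each cluster
    return np.sum(u ** 2), u


rng = np.random.default_rng(11)
cl = np.repeat(np.arange(8), 3)                     # 8 hypothetical clusters
X = np.column_stack([np.ones(cl.size), rng.uniform(1.0, 5.0, cl.size)])
y = X @ np.array([2.0, 3.0]) + rng.normal(0.0, 1.0, cl.size)
pi = np.full(cl.size, 0.1)
Tx = np.array([240.0, 720.0])
v_S, u = v_sandwich(y, X, pi, cl, Tx)
```

Suppose the auxiliaries include an intercept and $\mathbf R = \mathbf I$. Then the weighted normal equations force the cluster scalars $u_i = \mathbf a_i^T\mathbf r_i$ to sum to zero. Hence $\frac{n}{n-1}v_S$ coincides with the with-replacement ultimate-cluster formula $\frac{n}{n-1}\sum_{i\in s}(u_i - \bar u)^2$. This is one concrete way to see the close link to the ultimate cluster estimator.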
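In small to moderate-sized samples, $v_S$ will be model-biased and will often underestimate the true variance. A hat-matrix adjustment can be made as a correction. As shown in Appendix A.3,
$$E_M(\mathbf r_i\mathbf r_i^T) = (\mathbf I_{m_i} - \mathbf H_{ii})\mathbf V_{ssi}(\mathbf I_{m_i} - \mathbf H_{ii})^T + \sum_{k\in s,\,k\ne i}\mathbf H_{ik}\mathbf V_{ssk}\mathbf H_{ik}^T,$$
where $\mathbf H_{ik} = \mathbf X_{si}\mathbf A_s^{-1}\mathbf X_{sk}^T\mathbf R_{ssk}^{-1}\boldsymbol\Pi_{sk}^{-1}$ and $\mathbf A_s = \mathbf X_s^T\mathbf R_{ss}^{-1}\boldsymbol\Pi_s^{-1}\mathbf X_s$. Here $\boldsymbol\Pi_{sk}$ and $\mathbf R_{ssk}$ are the $m_k\times m_k$ parts of $\boldsymbol\Pi_s$ and $\mathbf R_{ss}$ associated with sample cluster $k$. As in Li and Valliant (2009) and Valliant (2002), the $\mathbf H_{ik}$ can be collected into a survey weighted hat matrix:
$$\mathbf H = \mathbf X_s\mathbf A_s^{-1}\mathbf X_s^T\mathbf R_{ss}^{-1}\boldsymbol\Pi_s^{-1} = [\mathbf H_{ik}]_{i,k\in s}.$$
Based on the assumptions in Appendix A.1, $\mathbf H_{ik} = O(n^{-1})$, from which we conclude that $E_M(\mathbf r_i\mathbf r_i^T) \to \mathbf V_{ssi}$ as $n\to\infty$.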
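The diagonal submatrices $\mathbf H_{ii}$ are matrix analogs to leverages in single-stage sampling. In ordinary least squares regression, the vector of predicted values can be written as $\hat{\mathbf Y} = \mathbf H\mathbf Y$ with $\mathbf H = \mathbf X(\mathbf X^T\mathbf X)^{-1}\mathbf X^T$. Leverages are the diagonal elements $h_{kk}$ of the hat matrix. They can be used to correct for a small-sample bias in $e_k^2$, where $e_k = y_k - \hat y_k$, as an estimator of $\operatorname{var}_M(y_k)$. For example, if $\operatorname{var}_M(y_k) = \sigma^2$ for all $k$, then $E_M(e_k^2) = (1-h_{kk})\sigma^2$. We use the $\mathbf H_{ii}$ in an analogous way below.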
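For small samples, the survey weighted hat matrix and its diagonal blocks can be formed explicitly. The sketch assumes $\mathbf R = \mathbf I$, so that $\mathbf H = \mathbf X_s\mathbf A_s^{-1}\mathbf X_s^T\boldsymbol\Pi_s^{-1}$. `survey_hat` and `diag_block` are hypothetical names, and the data are simulated with unequal selection probabilities.

```python
import numpy as np


def survey_hat(X, pi):
    """Survey weighted hat matrix H = X_s A_s^{-1} X_s^T Pi_s^{-1} (R = I assumed)."""
    w = 1.0 / pi
    A = X.T @ (w[:, None] * X)                          # A_s
    return (X @ np.linalg.solve(A, X.T)) * w[None, :]   # scale column k by 1/pi_k


def diag_block(H, cl, i):
    """Extract H_ii, the block of H for the sample elements of cluster i."""
    idx = np.flatnonzero(cl == i)
    return H[np.ix_(idx, idx)]


rng = np.random.default_rng(3)
cl = np.repeat(np.arange(4), [2, 3, 3, 4])
X = np.column_stack([np.ones(cl.size), rng.uniform(1.0, 5.0, cl.size)])
pi = rng.uniform(0.05, 0.2, cl.size)                    # unequal probabilities
H = survey_hat(X, pi)
H_00 = diag_block(H, cl, 0)                             # 2 x 2 leverage block of cluster 0
print(np.allclose(H @ X, X), round(np.trace(H), 8))     # True 2.0
```

As with the OLS hat matrix, $\mathbf H$ reproduces the auxiliaries ($\mathbf H\mathbf X_s = \mathbf X_s$), is idempotent and has trace $p$. Unlike the OLS case, it is generally not symmetric when the $\pi_{ij}$ vary.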
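To adjust $v_S$ for the fact that $\mathbf r_i\mathbf r_i^T$ is model-biased for $\mathbf V_{ssi}$ in small to moderate samples, we make leverage-like adjustments to $\mathbf r_i$. If $\mathbf R = \mathbf V$ and the sample is self-weighting (i.e., $\pi_{ij} = \pi$ for some $\pi$), then
$$E_M(\mathbf r_i\mathbf r_i^T) = (\mathbf I_{m_i} - \mathbf H_{ii})\mathbf V_{ssi}$$
(see Appendix A.3). Solving for $\mathbf V_{ssi}$ and substituting the resulting estimate $(\mathbf I_{m_i} - \mathbf H_{ii})^{-1}\mathbf r_i\mathbf r_i^T$ for $\mathbf r_i\mathbf r_i^T$ in (2.8) gives the variance estimator
$$v_D(\hat T) = \sum_{i\in s}\mathbf a_i^T(\mathbf I_{m_i} - \mathbf H_{ii})^{-1}\mathbf r_i\mathbf r_i^T\mathbf a_i.$$
In this special case, $v_D$ is also approximately unbiased, since $E_M\big[(\mathbf I_{m_i} - \mathbf H_{ii})^{-1}\mathbf r_i\mathbf r_i^T\big] = \mathbf V_{ssi}$.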
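One undesirable feature of $v_D$ is that it can be negative, or can have negative contributions from some clusters, if $\mathbf a_i^T(\mathbf I_{m_i} - \mathbf H_{ii})^{-1}\mathbf r_i\mathbf r_i^T\mathbf a_i < 0$. For such clusters, replacing $(\mathbf I_{m_i} - \mathbf H_{ii})^{-1}$ with $\mathbf I_{m_i}$ yields the nonnegative contribution $(\mathbf a_i^T\mathbf r_i)^2$ and hence a nonnegative variance estimator. This adjustment is used in the simulation in Section 3.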
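The leverage-adjusted sandwich estimator is $\sum_{i\in s}\mathbf a_i^T(\mathbf I_{m_i}-\mathbf H_{ii})^{-1}\mathbf r_i\mathbf r_i^T\mathbf a_i$. It can be sketched together with the fallback to $\mathbf I_{m_i}$ for clusters whose contribution would be negative. The sketch assumes $\mathbf R = \mathbf I$ and simulated data, and the function name is hypothetical. Each cluster's term is computed as the product of the scalars $\mathbf a_i^T(\mathbf I_{m_i}-\mathbf H_{ii})^{-1}\mathbf r_i$ and $\mathbf r_i^T\mathbf a_i$.

```python
import numpy as np


def v_D_estimator(y, X, pi, cl, Tx):
    """Leverage-adjusted sandwich estimator with the nonnegativity fallback.

    R = I is assumed. Returns (estimate, number of clusters using the fallback).
    """
    w = 1.0 / pi
    A = X.T @ (w[:, None] * X)
    B_hat = np.linalg.solve(A, X.T @ (w * y))
    g = 1.0 + X @ np.linalg.solve(A, Tx - X.T @ w)
    a = g * w                                        # stacked a_i
    r = y - X @ B_hat                                # stacked r_i
    H = (X @ np.linalg.solve(A, X.T)) * w[None, :]   # survey weighted hat matrix
    total, n_fallback = 0.0, 0
    for i in range(cl.max() + 1):
        idx = np.flatnonzero(cl == i)
        a_i, r_i = a[idx], r[idx]
        I_minus_Hii = np.eye(idx.size) - H[np.ix_(idx, idx)]
        # a_i^T (I - H_ii)^{-1} r_i r_i^T a_i as a product of two scalars
        contrib = (a_i @ np.linalg.solve(I_minus_Hii, r_i)) * (r_i @ a_i)
        if contrib < 0:
            contrib = (a_i @ r_i) ** 2               # replace (I - H_ii)^{-1} by I
            n_fallback += 1
        total += contrib
    return total, n_fallback


rng = np.random.default_rng(11)
cl = np.repeat(np.arange(8), 3)
X = np.column_stack([np.ones(cl.size), rng.uniform(1.0, 5.0, cl.size)])
y = X @ np.array([2.0, 3.0]) + rng.normal(0.0, 1.0, cl.size)
pi = np.full(cl.size, 0.1)
Tx = np.array([240.0, 720.0])
v_D, n_fallback = v_D_estimator(y, X, pi, cl, Tx)
```

Reporting the number of clusters that needed the fallback is a design choice. It shows how often the leverage adjustment was overridden in a given sample.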
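In Appendices A.4 and A.5, we show that the jackknife variance estimator can be written exactly as (2.6). Here the weights of the retained clusters are multiplied by $n/(n-1)$, and the replicate estimates are
$$\hat T_{(i)} = \tfrac{n}{n-1}\big(\hat T_{y\pi} - t_{yi}\big) + \Big[\mathbf T_x - \tfrac{n}{n-1}\big(\hat{\mathbf T}_{x\pi} - \mathbf t_{xi}\big)\Big]^T\hat{\mathbf B}_{(i)},$$
where $t_{yi} = \mathbf 1_{m_i}^T\boldsymbol\Pi_{si}^{-1}\mathbf Y_{si}$, $\mathbf t_{xi} = \mathbf X_{si}^T\boldsymbol\Pi_{si}^{-1}\mathbf 1_{m_i}$, and
$$\hat{\mathbf B}_{(i)} = \hat{\mathbf B} - \mathbf A_s^{-1}\mathbf X_{si}^T\mathbf R_{ssi}^{-1}\boldsymbol\Pi_{si}^{-1}(\mathbf I_{m_i} - \mathbf H_{ii})^{-1}\mathbf r_i.$$
This form of $v_J$ results in a significant reduction in computations, since only one GREG estimate is needed rather than $n$ estimates. (Of course, recomputing the GREG for every jackknife replicate may still be advantageous if an elaborate nonresponse adjustment affects the size of the true variance.)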
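The computational saving rests on a leave-one-cluster-out identity. Suppose the weights of the retained clusters are multiplied by $n/(n-1)$; this scale factor cancels in the coefficient estimate. Then the coefficient vector after deleting cluster $i$ satisfies $\hat{\mathbf B}_{(i)} = \hat{\mathbf B} - \mathbf A_s^{-1}\mathbf X_{si}^T\mathbf R_{ssi}^{-1}\boldsymbol\Pi_{si}^{-1}(\mathbf I_{m_i}-\mathbf H_{ii})^{-1}\mathbf r_i$, a consequence of the Woodbury matrix identity. The NumPy sketch below builds every replicate from one full-sample fit and checks the result against brute-force refitting. It assumes $\mathbf R = \mathbf I$, and its function names and data are hypothetical.

```python
import numpy as np


def jackknife_one_fit(y, X, pi, cl, Tx):
    """Jackknife replicates from a single GREG fit (R = I assumed)."""
    n = cl.max() + 1
    c = n / (n - 1)                                  # reweighting of retained clusters
    w = 1.0 / pi
    A = X.T @ (w[:, None] * X)
    B_hat = np.linalg.solve(A, X.T @ (w * y))
    r = y - X @ B_hat
    Ty_pi, Tx_pi = np.sum(w * y), X.T @ w
    reps = np.empty(n)
    for i in range(n):
        idx = cl == i
        Xi, wi, ri = X[idx], w[idx], r[idx]
        H_ii = (Xi @ np.linalg.solve(A, Xi.T)) * wi[None, :]
        adj = np.linalg.solve(np.eye(Xi.shape[0]) - H_ii, ri)   # (I - H_ii)^{-1} r_i
        B_i = B_hat - np.linalg.solve(A, Xi.T @ (wi * adj))     # one-step deletion
        t_yi, t_xi = np.sum(wi * y[idx]), Xi.T @ wi
        reps[i] = c * (Ty_pi - t_yi) + (Tx - c * (Tx_pi - t_xi)) @ B_i
    return (n - 1) / n * np.sum((reps - reps.mean()) ** 2), reps


def jackknife_refit(y, X, pi, cl, Tx):
    """Brute-force check: refit the GREG after deleting each cluster."""
    n = cl.max() + 1
    reps = np.empty(n)
    for i in range(n):
        keep = cl != i
        w = n / (n - 1) / pi[keep]
        A = X[keep].T @ (w[:, None] * X[keep])
        B = np.linalg.solve(A, X[keep].T @ (w * y[keep]))
        reps[i] = np.sum(w * y[keep]) + (Tx - X[keep].T @ w) @ B
    return (n - 1) / n * np.sum((reps - reps.mean()) ** 2), reps


rng = np.random.default_rng(5)
cl = np.repeat(np.arange(7), [3, 2, 4, 3, 3, 2, 4])
X = np.column_stack([np.ones(cl.size), rng.uniform(1.0, 5.0, cl.size)])
y = X @ np.array([2.0, 3.0]) + rng.normal(0.0, 1.0, cl.size)
pi = rng.uniform(0.05, 0.2, cl.size)
Tx = np.array([150.0, 450.0])
v_fast, reps_fast = jackknife_one_fit(y, X, pi, cl, Tx)
v_slow, reps_slow = jackknife_refit(y, X, pi, cl, Tx)
print(np.allclose(reps_fast, reps_slow))             # True
```

The one-fit version inverts only an $m_i\times m_i$ matrix per cluster. It also reuses $\mathbf A_s^{-1}$, which is where the saving comes from when $n$ is large.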
In large samples $v_J$ can be approximated by $v_{J1}$ or by $v_{J2}$. The estimators $v_{J1}$ and $v_{J2}$ are clustered versions of the single-stage approximations to the jackknife in Valliant (2002, equations (3.5), (3.6)). As sketched in Appendix A.6, $v_S$, $v_D$, $v_J$, $v_{J1}$ and $v_{J2}$ are all asymptotically equivalent as $n\to\infty$. Since $v_{JL}$ and $v_J$ are design-consistent, the alternative estimators above can be expected to perform well over repeated samples when the first-stage sample is large and model (2.1) is approximately correct. One caveat is that the sampling fraction of clusters must be small, so that estimators computed from a without-replacement first-stage sample will perform as if the sample had been selected with replacement.
None of these sandwich-like estimators includes a finite population correction factor. Thus, they may tend to overestimate the sampling variance when a large proportion of the population clusters is selected. To account for this, we can further adjust all of the variance estimators in an ad hoc fashion by multiplying them by a finite population correction factor, denoted $f^*$, as developed by Kott (1988). This results in the adjusted estimators
$$v^*(\hat T) = f^*\,v(\hat T),$$
where $v$ denotes any of the variance estimators above. When a simple random sample of clusters is selected in the first stage, $f^* = 1 - n/N$. According to Kott (1988), an appropriate correction when the first stage is selected with varying probabilities is
$$f^* = 1 - \sum_{i\in s} p_i,$$
where $p_i$ is the single draw probability for cluster $i$, i.e., the probability that cluster $i$ would be selected in a sample of size 1.