Robust variance estimators for generalized regression estimators in cluster samples
Section 2. Theoretical results

Suppose that the population has $i =1, 2, \dots, M$ clusters. In cluster $i$ there are $N_{i}$ elements so that there are $N = \sum_{i =1}^{M} N_{i}$ elements in the population. The universe of clusters is denoted as $U$ and the universe of elements in cluster $i$ is $U_{i} .$ An analysis variable $y_{i k}$ is associated with element $k$ in cluster $i .$ The population total of $y$ is $t_{U y} = \sum_{i =1}^{M} \sum_{k =1}^{N_{i}} y_{i k} .$ Each population element also has a $p$ -vector of auxiliary variables, $x_{i k},$ that can be used in estimation. A two-stage sample is selected without replacement at the first and second stages. The selection probability of cluster $i$ is $π_{i},$ and $π_{k | i}$ is the conditional selection probability of element $k$ in cluster $i .$ The overall selection probability of element $i k$ is $π_{i k} = π_{i} π_{k | i} .$ Denote the set of sample clusters by $s$ and the set of sample elements within cluster $i$ by $s_{i} .$ The number of sample clusters is $m$ while the number of sample elements selected from sample cluster $i$ is $n_{i} .$ The total sample size of elements is $n = \sum_{i \in s} n_{i} .$

As a working model, suppose that $Y_{U},$ the $N$ -vector of analysis variables, follows the following linear model:

$\begin{array}{rl} E_{ξ} (Y_{U}) & = X β \\ {cov}_{ξ} (Y_{U}) & = Ψ \end{array} \qquad (2.1)$

where the subscript $ξ$ denotes expectation with respect to the model; $X = {[X_{1}^{⊤}, X_{2}^{⊤}, \dots, X_{M}^{⊤}]}^{⊤}$ is the $N \times p$ matrix of auxiliaries with $X_{i}$ being the $N_{i} \times p$ matrix of auxiliaries for the $N_{i}$ elements in cluster $i;$ and $β$ is a parameter vector of length $p .$ Elements within clusters are assumed to be correlated while elements in different clusters are independent under the model. Thus, the covariance matrix $Ψ$ is an $N \times N$ block diagonal matrix with diagonal blocks $Ψ_{i} = {[ψ_{i k}]}_{N_{i} \times N_{i}} .$ A key feature of the variance estimators we propose is that the particular form of $ψ_{i k}$ does not have to be known to construct them. The proposed variance estimators will be consistent regardless of the form of $Ψ .$

Särndal et al. (1992, Chapter 8) discuss three different GREG estimators that can be used in clustered samples. These three estimators depend on the available data. We consider their case B which occurs when unit-level data are available for the complete sample and control totals are available for the population. In this case, the GREG estimator is

$\begin{array}{rl} {\hat{t}}_{y}^{g r} & = {\hat{t}}_{y π} + {\hat{B}}^{⊤} (t_{U x} - {\hat{t}}_{x π}) \\ & = g^{⊤} Π^{- 1} y_{s} \end{array} \qquad (2.2)$

where $y_{s}$ is the $n$-vector of $y$'s for the sample elements, ${\hat{t}}_{y π}$ is the $π$-estimator of the total of the $y$'s, $t_{U x}$ is the $p$-vector of population totals of the $x$'s, ${\hat{t}}_{x π}$ is the $π$-estimator of $t_{U x},$ and (if $Ψ$ is known) $\hat{B} = A^{- 1} X_{s}^{⊤} Ψ_{s}^{- 1} Π^{- 1} y_{s}$ with $A = X_{s}^{⊤} Ψ_{s}^{- 1} Π^{- 1} X_{s},$ $X_{s}$ the matrix of sample auxiliaries, and $Π = diag [π_{i k}]$ $(i \in s, k \in s_{i});$ $Ψ_{s}$ is the part of $Ψ$ associated with the sample elements; and $g^{⊤} = 1_{n}^{⊤} + {(t_{U x} - {\hat{t}}_{x π})}^{⊤} A^{- 1} X_{s}^{⊤} Ψ_{s}^{- 1}$ where $1_{n}$ is a vector of $n$ 1's.

The component of the $g$ -weight for sample cluster $i$ is $g_{i}^{⊤} = 1_{n_{i}}^{⊤} + {(t_{U x} - {\hat{t}}_{x π})}^{⊤} A^{- 1} X_{s i}^{⊤} Ψ_{s i}^{- 1}$ with $X_{s i}^{⊤} = [x_{i 1}, \dots, x_{i n_{i}}]$ being the $p \times n_{i}$ matrix of auxiliaries for sample elements in sample cluster $i,$ $Ψ_{s i}$ is the $n_{i} \times n_{i}$ part of $Ψ_{i}$ for sample elements in sample cluster $i,$ and $1_{n_{i}}$ is a vector of $n_{i}$ 1’s. Since $Ψ$ is generally unknown, a surrogate value $Q$ may be used for $Ψ_{s}^{- 1};$ $Q = I$ is a common choice. Below, we assume that a general $Q$ is used in the GREG rather than $Ψ_{s}^{- 1} .$
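As an illustration of (2.2), the following is a minimal sketch of the GREG total and its $g$-weights, assuming a diagonal surrogate $Q$ (with $Q = I$ as the default). The function name `greg_total` and its argument names are hypothetical and are not taken from the paper.

```python
import numpy as np

def greg_total(y, X, pi, t_ux, q=None):
    """GREG total (2.2), g-weights, and B-hat for one sample.

    y    : (n,) sample values y_ik
    X    : (n, p) sample auxiliaries (rows are x_ik')
    pi   : (n,) overall selection probabilities pi_ik
    t_ux : (p,) known population totals of the auxiliaries
    q    : (n,) diagonal of the surrogate Q; defaults to Q = I (assumption)
    """
    y = np.asarray(y, float)
    X = np.asarray(X, float)
    pi = np.asarray(pi, float)
    t_ux = np.asarray(t_ux, float)
    q = np.ones_like(pi) if q is None else np.asarray(q, float)

    w = 1.0 / pi                                   # diagonal of Pi^{-1}
    A = X.T @ (X * (q * w)[:, None])               # A = X' Q Pi^{-1} X
    B_hat = np.linalg.solve(A, X.T @ (q * w * y))  # A^{-1} X' Q Pi^{-1} y
    t_x_pi = X.T @ w                               # pi-estimator of t_Ux
    # g' = 1' + (t_Ux - t_x_pi)' A^{-1} X' Q, one weight per sample element
    g = 1.0 + (X * q[:, None]) @ np.linalg.solve(A, t_ux - t_x_pi)
    t_greg = g @ (w * y)                           # g' Pi^{-1} y
    return t_greg, g, B_hat
```

With these weights the calibration property holds: the weighted sample totals of the auxiliaries, $\sum g_{i k} x_{i k} / π_{i k},$ reproduce $t_{U x},$ and both lines of (2.2) give the same value.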

2.1 Current variance estimators

Särndal et al. (1992, Result 8.9.1) present an estimator of the design variance of ${\hat{t}}_{y}^{g r},$ which involves joint selection probabilities of clusters and elements within clusters. In the case of Poisson sampling at both stages, their estimator is

$υ_{g} = \sum_{i \in s} \frac{(1 - π_{i})}{π_{i}^{2}} {({\hat{t}}_{e , i}^{g})}^{2} + \sum_{i \in s} \frac{1}{π_{i}} \sum_{k \in s_{i}} \frac{(1 - π_{k | i})}{π_{k | i}^{2}} g_{i k}^{2} e_{i k}^{2} \qquad (2.3)$

where ${\hat{t}}_{e , i}^{g} = \sum_{k \in s_{i}} g_{i k} e_{i k} / π_{k | i},$ $g_{i k}$ is the $k^{th}$ component of the $g_{i}$ vector, and $e_{i k} = y_{i k} - x_{i k}^{⊤} \hat{B} .$ This estimator is computationally simpler than the general form that uses joint selection probabilities and may perform reasonably well for $π$ps designs where the variance of estimators can be approximated by formulas that assume independence between selections.
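A minimal sketch of (2.3) follows, assuming Poisson sampling at both stages and element-level input arrays in which the cluster probability $π_{i}$ is repeated for every element of the cluster. The function name `v_g_poisson` and the array layout are assumptions made for illustration.

```python
import numpy as np

def v_g_poisson(cluster, pi_i, pi_k_given_i, g, e):
    """Variance estimator (2.3) under Poisson sampling at both stages.

    cluster      : (n,) cluster label of each sample element
    pi_i         : (n,) cluster selection probability, repeated per element
    pi_k_given_i : (n,) conditional element selection probability
    g, e         : (n,) g-weights and GREG residuals
    """
    cluster = np.asarray(cluster)
    pi_i, pi_ki, g, e = (np.asarray(a, float)
                         for a in (pi_i, pi_k_given_i, g, e))
    total = 0.0
    for c in np.unique(cluster):
        idx = cluster == c
        p1 = pi_i[idx][0]                      # constant within the cluster
        ge = g[idx] * e[idx]
        t_e = np.sum(ge / pi_ki[idx])          # t-hat^g_{e,i}
        first = (1.0 - p1) / p1**2 * t_e**2    # between-cluster component
        second = np.sum((1.0 - pi_ki[idx]) / pi_ki[idx]**2 * ge**2) / p1
        total += first + second
    return total
```

For example, with two sample clusters, `cluster=[1, 1, 2]`, `pi_i=[0.5, 0.5, 0.25]`, `pi_k_given_i=[0.5, 0.5, 1.0]`, unit $g$-weights and residuals `[1, -1, 2]`, the first cluster contributes only a within-cluster term (8) and the second only a between-cluster term (48).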

An estimator that is appropriate if the first-stage sample is selected with replacement is

$υ_{w r} = \frac{m}{m - 1} \sum_{i \in s} {(e_{1 i} - {\bar{e}}_{1})}^{2} \qquad (2.4)$

with $e_{1 i} = \sum_{k \in s_{i}} e_{i k} / π_{i k}$ and ${\bar{e}}_{1} = m^{- 1} \sum_{i \in s} e_{1 i} .$ The jackknife linearization estimator is (Yung and Rao, 1996)

$υ_{J L} = \frac{m - 1}{m} \sum_{i \in s} {(e_{2 i} - {\bar{e}}_{2})}^{2} \qquad (2.5)$

where $e_{2 i} = \sum_{k \in s_{i}} g_{i k} e_{i k} / π_{i k}$ and ${\bar{e}}_{2} = m^{- 1} \sum_{i \in s} e_{2 i}$ with $g_{i k}$ being the $k^{th}$ component of the $g_{i}$ vector.
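The two estimators differ only in the leading factor and in whether the residuals are multiplied by the $g$-weights. A minimal sketch, following (2.4) and (2.5) as written above; the function names are hypothetical.

```python
import numpy as np

def cluster_sums(cluster, values):
    """Sum element-level values within each sample cluster."""
    cluster = np.asarray(cluster)
    values = np.asarray(values, float)
    return np.array([values[cluster == c].sum() for c in np.unique(cluster)])

def v_wr(cluster, e, pi):
    """With-replacement estimator (2.4): e_1i = sum_k e_ik / pi_ik."""
    e1 = cluster_sums(cluster, np.asarray(e, float) / np.asarray(pi, float))
    m = e1.size
    return m / (m - 1) * np.sum((e1 - e1.mean()) ** 2)

def v_jl(cluster, e, pi, g):
    """Jackknife linearization estimator (2.5): e_2i = sum_k g_ik e_ik / pi_ik."""
    z = np.asarray(g, float) * np.asarray(e, float) / np.asarray(pi, float)
    e2 = cluster_sums(cluster, z)
    m = e2.size
    return (m - 1) / m * np.sum((e2 - e2.mean()) ** 2)
```

Both functions need at least two sample clusters; with $m = 1$ the factor in (2.4) is undefined and the sum of squares in (2.5) is zero.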

The jackknife is another popular variance estimation technique. Krewski and Rao (1981) present several asymptotically equivalent ways of writing the jackknife. The following form of the jackknife estimator is a convenient starting point for the calculations that follow:

$υ_{Jack} = \frac{m - 1}{m} \sum_{i \in s} {({\hat{t}}_{y (i)}^{g r} - {\hat{t}}_{y (\cdot)}^{g r})}^{2} \qquad (2.6)$

where ${\hat{t}}_{y (i)}^{g r}$ is the value of the GREG estimator after removing cluster $i$ and ${\hat{t}}_{y (\cdot)}^{g r}$ is the average of all ${\hat{t}}_{y (i)}^{g r}$ estimates. Using (2.6) can be computationally demanding because $m$ different estimates ${\hat{t}}_{y (i)}^{g r}$ must be computed. The estimators $υ_{Jack},$ $υ_{w r},$ and $υ_{J L}$ are all design-consistent under the conditions in Krewski and Rao (1981) and Yung and Rao (1996). One of their key conditions is that clusters be selected with replacement. This assumption simplifies theoretical calculations but is only a convenience since the theoretical results have been shown in many empirical studies to be good predictors of estimator performance in without-replacement designs as long as the first-stage sampling fraction is small.
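The brute-force computation in (2.6) can be sketched as follows. This is a sketch under stated assumptions rather than the procedure used in the paper: $Q = I,$ and the design weights of the retained clusters are rescaled by $m / (m - 1)$ when a cluster is dropped (controlled by the hypothetical `rescale` flag), a convention the text above does not specify.

```python
import numpy as np

def greg(y, X, w, t_ux):
    """GREG total with Q = I and design weights w = 1 / pi."""
    A = X.T @ (X * w[:, None])
    B = np.linalg.solve(A, X.T @ (w * y))
    return w @ y + B @ (t_ux - X.T @ w)

def v_jack(cluster, y, X, pi, t_ux, rescale=True):
    """Delete-one-cluster jackknife (2.6), recomputing the GREG m times."""
    cluster = np.asarray(cluster)
    y, X, pi, t_ux = (np.asarray(a, float) for a in (y, X, pi, t_ux))
    ids = np.unique(cluster)
    m = ids.size
    reps = []
    for c in ids:
        keep = cluster != c                  # drop every element of cluster c
        w = 1.0 / pi[keep]
        if rescale:                          # assumption: reweight survivors
            w = w * m / (m - 1)
        reps.append(greg(y[keep], X[keep], w, t_ux))
    reps = np.array(reps)
    return (m - 1) / m * np.sum((reps - reps.mean()) ** 2)
```

The loop makes the cost visible: each of the $m$ replicates solves its own $p \times p$ system. When $y$ is an exact linear function of the auxiliaries, every replicate returns the same total and the estimate is zero up to rounding.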

2.2 New variance estimators

We use the model-based framework to construct new variance estimators. First, we derive the model-based variance of ${\hat{t}}_{y}^{g r} .$ Assume that model (2.1) holds and that sampling is ignorable in the sense that the probability of a unit’s being in the sample given $Y_{U}$ and $X$ depends only on $X$ (e.g., see discussion in Valliant, Dorfman and Royall, 2000, Section 2.6.2 and the additional references therein). Then, we construct estimators of the model variance, using hat-matrix adjustments to account for heterogeneity in the data. We evaluate the design-based properties of the new variance estimators in a simulation.

To calculate the model variance of ${\hat{t}}_{y}^{g r},$ define $y_{i}$ as the population vector of analysis variables for cluster $i,$ and $y_{s i}$ as the vector for sample elements. As shown in Appendix A.2, under model (2.1) the model-based variance of ${\hat{t}}_{y}^{g r}$ is

$\begin{array}{rl} {var}_{ξ} ({\hat{t}}_{y}^{g r} - t_{U y}) & = \sum_{i \in s} g_{i}^{⊤} Π_{i}^{- 1} Ψ_{s i} Π_{i}^{- 1} g_{i} - 2 \sum_{i \in s} [g_{i}^{⊤} Π_{i}^{- 1} {cov}_{ξ} (y_{s i}, y_{i}) 1_{N_{i}}] + 1_{N}^{⊤} Ψ 1_{N} \\ & = L_{1} - 2 L_{2} + L_{3} \end{array}$

where ${var}_{ξ} (y_{s i}) = Ψ_{s i},$ the part of $Ψ$ associated with elements in $s_{i},$ and $1_{N_{i}}$ and $1_{N}$ are vectors of $N_{i}$ and $N$ 1’s.

The model-based error variance of ${\hat{t}}_{y}^{g r}$ requires knowledge of $Ψ$ for the full population. Without some strong assumptions that link the sample and nonsample covariance structures, components of $Ψ$ associated with the nonsample cannot be estimated from the sample. However, as shown in Appendix A.2, under some reasonable conditions the orders of the terms are $L_{1} = O (M^{2} / m)$ and $L_{2} = L_{3} = O (M)$ so that $L_{1}$ dominates the variance as the numbers of sample and population clusters increase. Thus,

${av}_{ξ} ({\hat{t}}_{y}^{g r} - t_{U y}) = \sum_{i \in s} g_{i}^{⊤} Π_{i}^{- 1} Ψ_{s i} Π_{i}^{- 1} g_{i} \qquad (2.7)$

where ${av}_{ξ}$ denotes asymptotic model variance under the assumptions in Appendix A.1. A robust estimator of the right-hand side of (2.7) can be formed even when $Ψ_{s i}$ is unknown. On the other hand, if the number of population clusters increases at the same rate as the number of sample clusters (i.e., $f = m / M$ converges to a non-zero constant), then $L_{1},$ $L_{2},$ and $L_{3}$ may all contribute importantly to the asymptotic variance. In this paper, we will only consider estimation of $L_{1} .$

Unless the true variance matrix of $y_{s}$ is known, $Ψ_{s i}$ must be estimated. In Appendix A.3 we show that in large samples ${var}_{ξ} (e_{i}) \approx Ψ_{s i}$ where $e_{i} = y_{s i} - {\hat{y}}_{s i}$ with ${\hat{y}}_{s i} = X_{s i} \hat{B}$ and $X_{s i}$ being the $n_{i} \times p$ matrix of auxiliaries for sample elements in sample cluster $i .$ Substituting $e_{i} e_{i}^{⊤}$ for $Ψ_{s i}$ in (2.7) yields the sandwich estimator

$υ_{R} = \sum_{i \in s} g_{i}^{⊤} Π_{i}^{- 1} e_{i} e_{i}^{⊤} Π_{i}^{- 1} g_{i} . \qquad (2.8)$

Based on results in Appendix A.3, $υ_{R}$ is approximately unbiased for ${av}_{ξ} ({\hat{t}}_{y}^{g r} - t_{U y})$ in large samples. This sandwich estimator is also closely related to the design-based, ultimate cluster estimator for a sample design in which clusters are selected with replacement, which is, in turn, similar to both $υ_{g}$ and $υ_{J L}$ in with-replacement sampling. Consequently, $υ_{R}$ has both desirable design-based and model-based properties.
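Because $g_{i}^{⊤} Π_{i}^{- 1} e_{i}$ is a scalar, each summand of (2.8) is simply the square of a weighted cluster total of residuals. A minimal sketch, with the hypothetical function name `v_r`:

```python
import numpy as np

def v_r(cluster, g, e, pi):
    """Sandwich estimator (2.8): sum over clusters of (g_i' Pi_i^{-1} e_i)^2."""
    cluster = np.asarray(cluster)
    # element-level terms g_ik * e_ik / pi_ik
    z = np.asarray(g, float) * np.asarray(e, float) / np.asarray(pi, float)
    return sum(z[cluster == c].sum() ** 2 for c in np.unique(cluster))
```

Written this way, the link to the ultimate cluster form is direct: $υ_{R}$ is the uncentered sum of squares of the same cluster totals $e_{2 i}$ that appear, centered, in $υ_{J L} .$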

In small to moderate-sized samples, $υ_{R}$ will be model-biased and will often underestimate the true variance. A hat-matrix adjustment can be made as a correction. As shown in Appendix A.3,

$E_{ξ} (e_{i} e_{i}^{⊤}) = {var}_{ξ} (e_{i}) = (I_{n_{i}} - H_{i i}) Ψ_{s i} {(I_{n_{i}} - H_{i i})}^{⊤} + \sum_{j \neq i; i , j \in s} H_{i j} Ψ_{s j} H_{i j}^{⊤} \qquad (2.9)$

where $H_{i j} = X_{s i} A^{- 1} X_{s j}^{⊤} Q_{j} Π_{j}^{- 1}$ $(i, j = 1, \dots, m)$ with $Q_{j}$ and $Π_{j}$ being the $n_{j} \times n_{j}$ parts of $Q$ and $Π$ associated with sample cluster $j .$ As in Li and Valliant (2009) and Valliant (2002), the $H_{i j}$ can be collected into a survey weighted hat matrix:

$\begin{array}{rl} H & = X_{s} A^{- 1} X_{s}^{⊤} Q Π^{- 1} \\ & = \begin{bmatrix} X_{s 1} A^{- 1} X_{s 1}^{⊤} Q_{1} Π_{1}^{- 1} & \cdots & X_{s 1} A^{- 1} X_{s m}^{⊤} Q_{m} Π_{m}^{- 1} \\ \vdots & \ddots & \vdots \\ X_{s m} A^{- 1} X_{s 1}^{⊤} Q_{1} Π_{1}^{- 1} & \cdots & X_{s m} A^{- 1} X_{s m}^{⊤} Q_{m} Π_{m}^{- 1} \end{bmatrix} . \end{array} \qquad (2.10)$

Based on the assumptions in Appendix A.1, $H = O (m^{- 1}),$ from which we conclude that ${var}_{ξ} (e_{i}) \approx Ψ_{s i} .$ The diagonal submatrices $H_{i i}$ are matrix analogs to leverages in single-stage sampling. In ordinary least squares regression, the vector of predicted values can be written as $\hat{y} = H_{OLS} y$ with $H_{OLS} = X {(X^{T} X)}^{- 1} X^{T} .$ Leverages are diagonals of the hat matrix, $H_{OLS},$ and can be used to correct for a small sample bias in $e_{i}^{2} = {(y_{i} - {\hat{y}}_{i})}^{2}$ as an estimator of ${var}_{ξ} (y_{i}) .$ We use the $H_{i i}$ in an analogous way below.
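The ordinary least squares analogy can be made concrete with a short sketch. It assumes independent, homoscedastic errors, under which $E (e_{i}^{2}) = σ^{2} (1 - h_{i i})$ with $h_{i i}$ the $i^{th}$ diagonal of $H_{OLS},$ so that dividing the squared residual by $1 - h_{i i}$ removes the small-sample bias. The function names are hypothetical.

```python
import numpy as np

def ols_hat_matrix(X):
    """H_OLS = X (X'X)^{-1} X' and its diagonal (the leverages)."""
    X = np.asarray(X, float)
    H = X @ np.linalg.solve(X.T @ X, X.T)
    return H, np.diag(H)

def leverage_adjusted_sq_residuals(y, X):
    """Squared OLS residuals divided by (1 - h_ii)."""
    y = np.asarray(y, float)
    H, h = ols_hat_matrix(X)
    e = y - H @ y                      # residuals y - y-hat
    return e**2 / (1.0 - h)
```

The leverages sum to $p,$ the number of columns of $X,$ so on average each is $p / n$ and the adjustment fades as the sample grows, which mirrors the role of $H_{i i} = O (m^{- 1})$ in the clustered case.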

To adjust for the fact that $e_{i} e_{i}^{⊤}$ is model-biased for small to moderate samples, we make leverage-like adjustments to $e_{i} e_{i}^{⊤} .$ If $Q = I$ and the sample is self-weighting (i.e., $Π = c I$ for some $0< c <1),$ then ${var}_{ξ} (e_{i}) = (I_{n_{i}} - H_{i i}) Ψ_{s i}$ (see Appendix A.3). Solving for $Ψ_{s i}$ and substituting into (2.8) gives the variance estimator:

$υ_{D} = \sum_{i \in s} g_{i}^{⊤} Π_{i}^{- 1} {(I_{n_{i}} - H_{i i})}^{- 1} e_{i} e_{i}^{⊤} Π_{i}^{- 1} g_{i} \qquad (2.11)$

which, in this special case, is also approximately unbiased since $H_{i i} = O (m^{- 1}) .$ One undesirable feature of $υ_{D}$ is that it can be negative or can have negative contributions from some clusters if $υ_{D i} = g_{i}^{⊤} Π_{i}^{- 1} {(I_{n_{i}} - H_{i i})}^{- 1} e_{i} e_{i}^{⊤} Π_{i}^{- 1} g_{i} <0.$ For such clusters, replacing $υ_{D i}$ with $υ_{R i} = g_{i}^{⊤} Π_{i}^{- 1} e_{i} e_{i}^{⊤} Π_{i}^{- 1} g_{i}$ ensures a nonnegative variance estimator. This adjustment is used in the simulation in Section 3.
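A minimal sketch of (2.11) with the cluster-by-cluster fallback to $υ_{R i}$ follows. It assumes $Q = I$ and computes the residuals and $g$-weights internally so that the block is self-contained; the function name `v_d` is hypothetical.

```python
import numpy as np

def v_d(cluster, y, X, pi, t_ux):
    """Leverage-adjusted estimator (2.11), Q = I, negative terms replaced."""
    cluster = np.asarray(cluster)
    y, X, pi, t_ux = (np.asarray(a, float) for a in (y, X, pi, t_ux))
    w = 1.0 / pi                                   # diagonal of Pi^{-1}
    A = X.T @ (X * w[:, None])                     # A = X' Pi^{-1} X
    e = y - X @ np.linalg.solve(A, X.T @ (w * y))  # GREG residuals
    g = 1.0 + X @ np.linalg.solve(A, t_ux - X.T @ w)
    total = 0.0
    for c in np.unique(cluster):
        idx = cluster == c
        Xi, wi = X[idx], w[idx]
        # H_ii = X_si A^{-1} X_si' Pi_i^{-1}: scale columns by the weights
        H_ii = (Xi @ np.linalg.solve(A, Xi.T)) * wi[None, :]
        a = g[idx] * wi                            # Pi_i^{-1} g_i
        adj = np.linalg.solve(np.eye(Xi.shape[0]) - H_ii, e[idx])
        r_i = a @ e[idx]                           # g_i' Pi_i^{-1} e_i
        v_di = (a @ adj) * r_i                     # cluster term of (2.11)
        total += v_di if v_di >= 0 else r_i**2     # fall back to v_Ri
    return total
```

Each cluster term is the product of two scalars, the leverage-adjusted total $g_{i}^{⊤} Π_{i}^{- 1} {(I_{n_{i}} - H_{i i})}^{- 1} e_{i}$ and the unadjusted total $g_{i}^{⊤} Π_{i}^{- 1} e_{i},$ which is why a term can be negative when the adjustment flips the sign of the total.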

In Appendices A.4 and A.5, we show that the jackknife variance estimator can be written exactly as

$υ_{Jack} = \frac{m - 1}{m} [\sum_{i \in s} {(D_{i} - \bar{D})}^{2} - 2 \sum_{i \in s} (D_{i} - \bar{D}) F_{i} + \sum_{i \in s} F_{i}^{2}] \qquad (2.12)$

where

$\begin{array}{rl} F_{i} & = (G_{i} - \bar{G}) - \frac{1}{n} (K_{i} - \bar{K}) \\ D_{i} & = g_{i}^{⊤} Π_{i}^{- 1} {(I_{n_{i}} - H_{i i})}^{- 1} e_{i} \\ K_{i} & = (1_{N}^{⊤} X_{U} - m 1_{n_{i}}^{⊤} Π_{i}^{- 1} X_{s i}) (\hat{B} - R_{i}); \quad \bar{K} = m^{- 1} \sum_{i \in s} K_{i} \\ G_{i} & = 1_{n_{i}}^{⊤} Π_{i}^{- 1} {(I_{n_{i}} - H_{i i})}^{- 1} [H_{i i} y_{s i} - {\hat{y}}_{s i}]; \quad \bar{G} = m^{- 1} \sum_{i \in s} G_{i} \\ R_{i} & = A^{- 1} X_{s i}^{⊤} Q_{i} Π_{i}^{- 1} {(I_{n_{i}} - H_{i i})}^{- 1} e_{i} . \end{array}$

This form of $υ_{Jack}$ results in a significant reduction in computations since only one GREG estimate is needed, rather than $m$ estimates. (Of course, recomputing the GREG for every jackknife replicate may still be advantageous if an elaborate nonresponse adjustment affects the size of the true variance.)

In large samples $υ_{Jack}$ can be approximated by

$υ_{J 1} = \frac{m - 1}{m} \sum_{i \in s} {(D_{i} - \bar{D})}^{2} \qquad (2.13)$

or by

$\begin{array}{rl} υ_{J 2} & = \frac{m - 1}{m} \sum_{i \in s} D_{i}^{2} \\ & = \frac{m - 1}{m} \sum_{i \in s} g_{i}^{⊤} Π_{i}^{- 1} {(I_{n_{i}} - H_{i i})}^{- 1} e_{i} e_{i}^{⊤} {[{(I_{n_{i}} - H_{i i})}^{- 1}]}^{⊤} Π_{i}^{- 1} g_{i} . \end{array} \qquad (2.14)$

The estimators $υ_{J 1}$ and $υ_{J 2}$ are clustered versions of the single-stage approximations to the jackknife in Valliant (2002, equations (3.5), (3.6)).
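Given the leverage-adjusted cluster totals $D_{i},$ both approximations are one-line computations. The sketch below assumes the $D_{i}$ have already been computed and are passed as an array; the function names are hypothetical.

```python
import numpy as np

def v_j1(D):
    """Centered approximation (2.13)."""
    D = np.asarray(D, float)
    m = D.size
    return (m - 1) / m * np.sum((D - D.mean()) ** 2)

def v_j2(D):
    """Uncentered approximation (2.14)."""
    D = np.asarray(D, float)
    m = D.size
    return (m - 1) / m * np.sum(D ** 2)
```

The two differ by exactly $(m - 1) {\bar{D}}^{2},$ so $υ_{J 2} \geq υ_{J 1}$ always, with equality only when the $D_{i}$ average to zero.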

As sketched in Appendix A.6, $υ_{Jack},$ $υ_{J L},$ $υ_{J 1},$ $υ_{J 2},$ $υ_{D},$ and $υ_{R}$ are all asymptotically equivalent as $m \to \infty .$ Since $υ_{Jack}$ and $υ_{J L}$ are design-consistent, the alternative estimators above can be expected to perform well over repeated samples when the size of the first-stage sample is large, and when model (2.1) is approximately correct. One caveat is that the sampling fraction of clusters must be small so that estimators made from a without-replacement, first-stage sample will perform as if the sample had been selected with-replacement.

None of these sandwich-like estimators includes finite population correction factors. Thus, they may tend to overestimate the sampling variance when a large proportion of the sample clusters is selected. To account for this, we can further adjust all of the variance estimators in an ad hoc fashion by multiplying the variance estimators by a finite population correction factor, denoted $f_{p c},$ as developed by Kott (1988). This results in the following adjusted estimators:

$\begin{array}{rl} υ_{R}^{*} & = f_{p c} \sum_{i \in s} g_{i}^{⊤} Π_{i}^{- 1} e_{i} e_{i}^{⊤} Π_{i}^{- 1} g_{i} \\ υ_{D}^{*} & = f_{p c} \sum_{i \in s} g_{i}^{⊤} Π_{i}^{- 1} {(I_{n_{i}} - H_{i i})}^{- 1} e_{i} e_{i}^{⊤} Π_{i}^{- 1} g_{i} \\ υ_{Jack}^{*} & = f_{p c} \frac{m - 1}{m} [\sum_{i \in s} {(D_{i} - \bar{D})}^{2} - 2 \sum_{i \in s} (D_{i} - \bar{D}) F_{i} + \sum_{i \in s} F_{i}^{2}] \\ υ_{J 1}^{*} & = f_{p c} \frac{m - 1}{m} \sum_{i \in s} {(D_{i} - \bar{D})}^{2} \\ υ_{J 2}^{*} & = f_{p c} \frac{m - 1}{m} \sum_{i \in s} g_{i}^{⊤} Π_{i}^{- 1} {(I_{n_{i}} - H_{i i})}^{- 1} e_{i} e_{i}^{⊤} {[{(I_{n_{i}} - H_{i i})}^{- 1}]}^{⊤} Π_{i}^{- 1} g_{i} . \end{array}$

When a simple random sample is selected in the first stage, $f_{p c} = 1 - m / M .$ According to Kott (1988), an appropriate correction when the first stage is selected with varying probabilities is $f_{p c} =1 - m \sum_{i =1}^{M} p_{i}^{2}$ where $p_{i}$ is the single draw probability for cluster $i,$ i.e., the probability that cluster $i$ would be selected in a sample of size 1.
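The two correction factors can be sketched as below; the function names are hypothetical. Setting $p_{i} = 1 / M$ for every cluster in the varying-probability form gives $1 - m M / M^{2} = 1 - m / M,$ so it reduces to the simple random sampling factor.

```python
def fpc_srs(m, M):
    """f_pc = 1 - m / M for a simple random first-stage sample."""
    return 1.0 - m / M

def fpc_unequal(m, p):
    """f_pc = 1 - m * sum(p_i^2), with p the single-draw probabilities
    of all M population clusters (assumed to sum to 1)."""
    return 1.0 - m * sum(p_i ** 2 for p_i in p)
```

For instance, with $m = 10$ sample clusters out of $M = 50$ and equal single-draw probabilities, both functions return 0.8, so each adjusted estimator is 80 percent of its unadjusted value.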

ISSN : 1492-0921

Copyright

Published by authority of the Minister responsible for Statistics Canada.

Use of this publication is governed by the Statistics Canada Open Licence Agreement.

Catalogue No. 12-001-X

Frequency: Semi-annual

Ottawa

Date modified: 2019-12-17
