Robust variance estimators for generalized regression estimators in cluster samples
Section 1. Introduction

Generalized regression (GREG) estimation is a common technique used to calibrate estimates, reduce sampling errors, and correct for nonsampling errors. Official household surveys often use GREG estimation to calibrate sample-based estimates to population controls, ensure consistent estimates of demographic characteristics across surveys, and reduce nonresponse and undercoverage errors. GREG estimation is also popular because it draws strength from auxiliary data, which often yields smaller sampling errors than design-based estimators that ignore auxiliary information.
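The calibration idea can be made concrete with a minimal Python sketch. It rests on simplifying assumptions that are ours, not the paper's: a single-stage sample with design weights w_i, a linear working model with constant variance (v_i = 1), and known population totals T_x of the auxiliary variables x_i. The helper names `solve` and `greg_weights` are hypothetical. The sketch computes g-weights g_i = 1 + (T_x - t_x)' A^{-1} x_i, where t_x = sum w_i x_i is the Horvitz-Thompson estimate and A = sum w_i x_i x_i', so that the calibrated weights w_i g_i reproduce the control totals exactly.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (m[r][n] - s) / m[r][r]
    return sol


def greg_weights(w: Sequence[float], x: Sequence[Sequence[float]],
                 totals: Sequence[float]) -> List[float]:
    """Hypothetical helper: calibrated weights w_i * g_i, assuming v_i = 1."""
    p = len(totals)
    # A = sum_i w_i x_i x_i'
    a = [[sum(wi * xi[j] * xi[k] for wi, xi in zip(w, x)) for k in range(p)]
         for j in range(p)]
    # Horvitz-Thompson estimates of the auxiliary totals
    t_hat = [sum(wi * xi[j] for wi, xi in zip(w, x)) for j in range(p)]
    lam = solve(a, [totals[j] - t_hat[j] for j in range(p)])
    # g_i = 1 + (T_x - t_x)' A^{-1} x_i
    return [wi * (1.0 + sum(lk * xij for lk, xij in zip(lam, xi)))
            for wi, xi in zip(w, x)]


w = [10.0, 10.0, 10.0, 10.0]
x = [[1.0, 2.0], [1.0, 3.0], [1.0, 5.0], [1.0, 6.0]]  # intercept + one covariate
y = [3.0, 4.0, 7.0, 8.0]
cw = greg_weights(w, x, totals=[50.0, 220.0])
t_greg = sum(c * yi for c, yi in zip(cw, y))
print([round(c, 6) for c in cw], round(t_greg, 6))  # [8.5, 10.5, 14.5, 16.5] 301.0
```

In this toy example the calibrated weights sum to the control total of 50 and reproduce the covariate total of 220, and the GREG total of 301 equals the Horvitz-Thompson total (220) plus the regression correction (T_x - t_x)'B, with B the weighted least-squares coefficients (0.3, 1.3).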

Popular techniques for estimating the sampling errors of calibrated estimators from complex samples either require extensive computational resources or tend to underestimate the true sampling errors, especially with small to moderate sample sizes. Two widely used techniques for estimating the sampling variance of GREG estimators are linearization and replication. Linearization estimators (Särndal, Swensson and Wretman, 1989) may not converge to the true sampling error quickly enough to produce accurate results in small to moderate samples. Särndal, Swensson and Wretman (1992, page 176) remark that “For complex statistics such as an estimator of a population variance, covariance, or correlation coefficient, fairly large samples may be required before the bias is negligible.” On the other hand, replication techniques such as the jackknife and the bootstrap, which generally produce larger variance estimates, can be computationally demanding.

Leverage-adjusted sandwich estimators offer an alternative approach to estimating design-based sampling errors, one that also has model-based justifications. Royall and Cumberland (1978) applied this approach to develop estimators of the prediction variance of estimators of finite population totals. Working in a model-based framework, Long and Ervin (2000) and MacKinnon and White (1985) demonstrated how the sandwich estimator can be used to estimate the variance of estimators of regression parameters even when the variance component of the working model is misspecified. Valliant (2002) used this approach to estimate the design-based variance of GREG estimators under one stage of sampling. This paper extends Valliant’s work to clustered sample designs.
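How a leverage adjustment enters a residual-based variance estimator can be illustrated in the single-stage setting. The Python sketch below is an assumption-laden illustration, not the clustered estimator developed in this paper and not a reproduction of Valliant’s (2002) formulas. It assumes v_i = 1, approximates the design variance with the usual with-replacement formula applied to z_i = w_i g_i e_i, and optionally applies an HC2-style adjustment in the spirit of MacKinnon and White (1985), dividing each residual e_i by sqrt(1 - h_ii), where h_ii = w_i x_i' A^{-1} x_i is the leverage from the weighted least-squares fit and A = sum w_i x_i x_i'. The names `greg_variance` and `leverage_adjust` are hypothetical.

```python
import math
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (m[r][n] - s) / m[r][r]
    return sol


def greg_variance(w: Sequence[float], x: Sequence[Sequence[float]],
                  y: Sequence[float], totals: Sequence[float],
                  leverage_adjust: bool = True) -> float:
    """Hypothetical single-stage, with-replacement variance sketch for a GREG total.

    With leverage_adjust=True each residual e_i is divided by sqrt(1 - h_ii)
    (HC2-style); this requires every leverage h_ii < 1.
    """
    n, p = len(w), len(totals)
    # A = sum_i w_i x_i x_i'
    a = [[sum(wi * xi[j] * xi[k] for wi, xi in zip(w, x)) for k in range(p)]
         for j in range(p)]
    t_hat = [sum(wi * xi[j] for wi, xi in zip(w, x)) for j in range(p)]
    lam = solve(a, [totals[j] - t_hat[j] for j in range(p)])  # A^{-1}(T_x - t_x)
    # Weighted least-squares coefficients B = A^{-1} sum_i w_i x_i y_i
    beta = solve(a, [sum(wi * xi[j] * yi for wi, xi, yi in zip(w, x, y))
                     for j in range(p)])
    z = []
    for wi, xi, yi in zip(w, x, y):
        g = 1.0 + sum(lk * xij for lk, xij in zip(lam, xi))   # g-weight g_i
        e = yi - sum(bk * xij for bk, xij in zip(beta, xi))   # residual e_i
        a_inv_x = solve(a, list(xi))
        h = wi * sum(xij * c for xij, c in zip(xi, a_inv_x))  # leverage h_ii
        if leverage_adjust:
            e /= math.sqrt(1.0 - h)
        z.append(wi * g * e)
    z_bar = sum(z) / n
    return n / (n - 1) * sum((zi - z_bar) ** 2 for zi in z)


w = [10.0, 10.0, 10.0, 10.0]
x = [[1.0, 2.0], [1.0, 3.0], [1.0, 5.0], [1.0, 6.0]]  # intercept + one covariate
y = [3.0, 4.0, 7.0, 8.0]
v_plain = greg_variance(w, x, y, [50.0, 220.0], leverage_adjust=False)
v_lev = greg_variance(w, x, y, [50.0, 220.0])
print(round(v_plain, 4), round(v_lev, 4))
```

The motivation for the adjustment is that fitted residuals understate the model errors: under a homoscedastic linear model fitted by ordinary least squares, Var(e_i) = σ²(1 - h_ii). With the four-unit toy data the leverages are 0.65, 0.35, 0.35 and 0.65, so the adjustment inflates every residual and raises the variance estimate from about 21.69 to about 39.38. Because h_ii can approach 1 in very small samples, a practical implementation would need to guard against division by values near zero.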

Section 2 introduces the GREG estimator and presents several alternative variance estimators for it; all derivations are in the Appendix. Section 3 shows how the new variance estimators perform in several simulations. Section 4 summarizes our findings.

