1 Introduction
J.N.K. Rao, F. Verret and M.A. Hidiroglou
Data collected from large-scale socio-economic, health and other surveys are extensively used for analysis purposes, such as inference on the regression parameters of linear and logistic linear regression population models. Ignoring the survey design features (such as stratification, clustering and unequal selection probabilities) can lead to erroneous inferences on model parameters because of sample selection bias caused by informative sampling. It is tempting to expand the models by including among the auxiliary variables all the design variables that define the selection process at the various levels and then ignore the design and apply standard methods to the expanded model. The main difficulties with this approach are the following (Pfeffermann and Sverchkov 2003): (1) Not all design variables may be known or accessible to the analyst; (2) Too many design variables can lead to difficulties in making inference from the expanded model; (3) The expanded model may no longer be of scientific interest to the analyst. On the other hand, the design-based approach can provide asymptotically valid repeated sampling inferences without changing the analyst's model. A unified approach based on the survey weighted estimating equations leads to design-consistent estimators of the "census" or finite population parameters which in turn estimate the associated model parameters. Further, re-sampling methods, such as the jackknife and the bootstrap for survey data, can provide valid variance estimators and associated inferences on the census parameters. The same methods may also be applicable to inference on the model parameters, in many cases of large-scale surveys. In other cases, it is necessary to estimate the model variance of the census parameters from the sample. The estimator of the total variance is then given by the sum of this estimator and the re-sampling variance estimator.
Beaumont and Charest (2010) extended the bootstrap to estimate the total variance associated with the model parameters. We refer the reader to Rao et al. (2010) for an overview of methods for making inference on regression parameters from complex survey data.
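The survey weighted estimating equation approach and the total variance decomposition described above can be sketched as follows. The notation is assumed here for illustration and is not taken from the paper: $U$ denotes the finite population, $s$ the sample, $w_k$ the survey weight of unit $k$, $u_k(\theta)$ the estimating function contributed by unit $k$, $\theta_N$ the census parameter and $\hat{\theta}$ its survey weighted estimator.

```latex
% Illustrative notation only (assumed, not the paper's):
%   U = finite population, s = sample, w_k = survey weight of unit k,
%   u_k(\theta) = estimating function contributed by unit k.
\begin{align*}
S_N(\theta) &= \sum_{k \in U} u_k(\theta) = 0
  && \text{(census estimating equation, solved by } \theta_N \text{)} \\
\hat{S}(\theta) &= \sum_{k \in s} w_k \, u_k(\theta) = 0
  && \text{(survey weighted version, solved by } \hat{\theta} \text{)} \\
u_k(\beta) &= x_k \left( y_k - x_k^{\top} \beta \right)
  && \text{(example: linear regression)} \\
\hat{V}_{\mathrm{total}} &= \hat{v}_{\mathrm{design}}(\hat{\theta}) + \hat{v}_{\mathrm{model}}(\theta_N)
  && \text{(re-sampling estimator plus estimated model variance of } \theta_N \text{)}
\end{align*}
```

In this sketch, the first term of the total variance would come from a re-sampling method such as the jackknife or the bootstrap, and the second term is needed only in the cases where the model variance of the census parameter cannot be neglected.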
In this paper, our focus is on making design-based inference on the variance component parameters and regression parameters of multi-level models from data obtained from multi-stage sampling designs corresponding to the levels of the model. For example, in an education study of students, schools (first-stage sampling units) may be selected with probabilities proportional to school size and students (second-stage units) within selected schools by stratified random sampling. Again, ignoring the survey design and using traditional methods for multi-level models can lead to erroneous inferences in the presence of sample selection bias. In the design-based approach, estimation of variance component parameters of the model is more difficult than that of regression parameters. Past work on multi-level models for survey data is summarized in Section 2. Our main purpose is to present a unified approach to making inference for general multi-level models from survey data, based on a weighted log composite likelihood approach (Section 4). The proposed methods lead to asymptotically valid inferences on the variance component parameters even when the within-cluster sample sizes are small, provided the number of sample clusters is large, unlike some of the existing methods summarized in Section 2. Limited simulation results are presented in Section 5.