Analysis of complex survey data using inverse sampling
Practitioners often use data collected from complex surveys (such as labour force and health surveys involving stratified cluster sampling) to fit logistic regression and other models of interest. A great deal of effort over the last two decades has been spent on developing methods to analyse survey data that take account of design features. This paper looks at an alternative method known as inverse sampling.
Specialized programs, such as SUDAAN and WESVAR, are also available to implement some of the methods developed to take into account the design features. However, these methods require additional information such as survey weights, design effects or cluster identification of microdata and thus, another method is necessary.
Inverse sampling (Hinkins et al., Survey Methodology, 1977) provides an alternative approach by undoing the complex data structures so that standard methods can be applied. Repeated subsamples with simple random structure are drawn and each subsample is analysed by standard methods and is combined to increase the efficiency. Although computer-intensive, this method has the potential to preserve confidentiality of microdata files. A drawback of the method is that it can lead to biased estimates of regression parameters when the subsample sizes are small (as in the case of stratified cluster sampling).
In this paper, we propose using the estimating equation approach that combines the subsamples before estimation and thus leads to nearly unbiased estimates of regression parameters regardless of subsample sizes. This method is computationally less intensive than the original method. We apply the method to cluster-correlated data generated from a nested error linear regression model to illustrate its advantages. A real dataset from a Statistics Canada survey will also be analysed using the estimating equation method.
| Format | Release date | More information |
|---|---|---|
| CD-ROM | September 13, 2004 | |
| September 13, 2004 |