Survey Methodology

Release date: June 24, 2021

The journal Survey Methodology Volume 47, Number 1 (June 2021) contains the following nine papers:

Waksberg Invited Paper Series

Science and survey management

by Roger Tourangeau

It is now possible to manage surveys using statistical models and other tools that can be applied in real time. This paper focuses on three developments that reflect the attempt to take a more scientific approach to the management of survey field work: 1) the use of responsive and adaptive designs to reduce nonresponse bias, other sources of error, or costs; 2) optimal routing of interviewer travel to reduce costs; and 3) rapid feedback to interviewers to reduce measurement error. The article begins by reviewing experiments and simulation studies that examine the effectiveness of responsive and adaptive designs. These studies suggest that such designs can produce modest gains in the representativeness of survey samples or modest cost savings, but they can also backfire. The next section of the paper examines efforts to provide interviewers with a recommended route for their next trip to the field. The aim is to bring interviewers’ field work into closer alignment with research priorities while reducing travel time; however, a study testing this strategy found that interviewers often ignore such instructions. The paper then describes attempts to give interviewers rapid feedback based on automated recordings of their interviews. Interviewers often read questions in ways that affect respondents’ answers, and correcting these problems quickly yielded marked improvements in data quality. All of these methods are efforts to replace the judgment of interviewers, field supervisors, and survey managers with statistical models and scientific findings.


Regular Papers

Integration of data from probability surveys and big found data for finite population inference using mass imputation

by Shu Yang, Jae Kwang Kim and Youngdeok Hwang

Multiple data sources are becoming increasingly available for statistical analyses in the era of big data. As an important example in finite-population inference, we consider an imputation approach to combining data from a probability survey and big found data. We focus on the case where the study variable is observed only in the big data, while the auxiliary variables are observed in both data sources. Unlike the usual imputation for missing data analysis, we create imputed values for all units in the probability sample. Such mass imputation is attractive in the context of survey data integration (Kim and Rao, 2012). We extend mass imputation as a tool for integrating survey data with big non-survey data. The mass imputation methods and their statistical properties are presented. The matching estimator of Rivers (2007) is also covered as a special case. Variance estimation with mass-imputed data is discussed. The simulation results demonstrate that the proposed estimators outperform existing competitors in terms of robustness and efficiency.
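
As a rough illustration of the mass-imputation idea described in this abstract (a sketch, not the authors’ estimator), the following Python snippet fits a working outcome model on a large non-probability data set where the study variable y is observed, imputes y for every unit of a probability sample where only the auxiliary variable x and the design weights are available, and then applies the design weights to the imputed values. The variable names, the linear working model and the weights are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2021)

# Big (non-probability) data source: both the study variable y and the auxiliary x are observed
N_big = 50_000
x_big = rng.normal(5.0, 2.0, N_big)
y_big = 2.0 + 1.5 * x_big + rng.normal(0.0, 1.0, N_big)

# Probability sample: only x and the design weights are observed
n = 500
x_srs = rng.normal(5.0, 2.0, n)
w_srs = np.full(n, 1_000_000 / n)  # assumed design weights (SRS from a population of one million)

# Step 1: fit a working outcome model on the big data (a simple linear regression here)
X = np.column_stack([np.ones(N_big), x_big])
beta, *_ = np.linalg.lstsq(X, y_big, rcond=None)

# Step 2: mass imputation -- create an imputed value of y for every unit of the probability sample
y_imputed = beta[0] + beta[1] * x_srs

# Step 3: design-weighted (Hajek-type) estimate of the population mean from the imputed values
y_bar_mi = np.sum(w_srs * y_imputed) / np.sum(w_srs)
print(f"Mass-imputation estimate of the population mean: {y_bar_mi:.3f}")
```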


Sample empirical likelihood approach under complex survey design with scrambled responses

by Sixia Chen, Yichuan Zhao and Yuke Wang

One effective way to conduct statistical disclosure control is to use scrambled responses, which can be generated with a controlled random device. In this paper, we propose using the sample empirical likelihood approach to conduct statistical inference under a complex survey design with scrambled responses. Specifically, we propose a Wilks-type confidence interval for statistical inference. The proposed method can serve as a general tool for inference with confidential public-use survey data files. Asymptotic properties are derived, and a limited simulation study verifies the validity of the theory. We further illustrate the proposed method with some real applications.
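
For readers unfamiliar with scrambled responses, the short Python sketch below simulates one common additive scrambling device and a simple design-weighted moment estimator. It is only meant to show what scrambled data look like; it is not the sample empirical likelihood procedure proposed in the paper, and the scrambling distribution, weights and sample size are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(47)

# True values of the sensitive variable for n respondents (never seen by the analyst)
n = 2_000
y = rng.gamma(shape=2.0, scale=10.0, size=n)

# Additive scrambling: each respondent reports z = y + s, where s is drawn from a
# known scrambling distribution generated by a controlled random device
mu_s, sigma_s = 50.0, 5.0
z = y + rng.normal(mu_s, sigma_s, n)

# Unequal design weights, assumed known from the complex survey design
w = rng.uniform(50.0, 150.0, n)

# Moment-type estimator: weighted mean of the scrambled reports minus the known mean of s
y_bar_hat = np.sum(w * z) / np.sum(w) - mu_s
print(f"Estimated mean of the sensitive variable: {y_bar_hat:.2f}")
print(f"Sample mean of the (unobserved) true values: {y.mean():.2f}")
```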


A method to find an efficient and robust sampling strategy under model uncertainty

by Edgar Bueno and Dan Hedlin

We consider the problem of deciding on a sampling strategy and, in particular, on a sampling design. We propose a risk measure; the strategy that minimizes it guides the choice. The method makes use of a superpopulation model and accounts for uncertainty about its parameters through a prior distribution. The method is illustrated with a real dataset, yielding satisfactory results. As a baseline, we use the strategy that couples probability-proportional-to-size sampling with the difference estimator, as it is known to be optimal when the superpopulation model is fully known. We show that, even under moderate misspecification of the model, this strategy is not robust and can be outperformed by alternatives.
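
For context on why the pps-plus-difference-estimator strategy serves as the baseline, the classical Godambe-Joshi lower bound can be written as follows. The notation is generic and is not taken from the paper.

```latex
% For any design-unbiased strategy for the total Y under a superpopulation model \xi
% with \mathrm{Var}_{\xi}(y_k) = \sigma_k^2, the anticipated variance satisfies
\mathrm{E}_{\xi}\,\mathrm{E}_{p}\bigl(\hat{Y} - Y\bigr)^{2}
  \;\geq\; \sum_{k \in U} \Bigl(\tfrac{1}{\pi_k} - 1\Bigr)\,\sigma_k^{2}.
% For a fixed expected sample size n, the right-hand side is minimized by taking
% \pi_k \propto \sigma_k, and the bound is attained asymptotically by probability
% proportional-to-size sampling combined with the difference estimator when the
% model is correctly specified -- hence its role as the baseline strategy.
```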


Bayesian predictive inference of small area proportions under selection bias

by Seongmi Choi, Balgobin Nandram and Dalho Kim

In a previous paper, we developed a model for making inference about small area proportions under selection bias, in which the binary responses and the selection probabilities are correlated. We call this the homogeneous nonignorable selection model; nonignorable means precisely that the selection probabilities and the binary responses are correlated. The homogeneous nonignorable selection model was shown to perform better than a baseline ignorable selection model. However, one limitation of the homogeneous model is that the distributions of the selection probabilities are assumed to be identical across areas. We therefore introduce a more general model, the heterogeneous nonignorable selection model, in which the selection probabilities are not identically distributed over areas. Markov chain Monte Carlo methods are used to fit the three models. We illustrate our methodology and compare our models using an example on severe activity limitation from the U.S. National Health Interview Survey. We also perform a simulation study to demonstrate that the heterogeneous nonignorable selection model is needed when there is moderate to strong selection bias.
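
To make the homogeneous/heterogeneous distinction concrete, a schematic of a nonignorable selection setup for unit j in area i could look as follows. This is a generic illustration only, not the authors’ exact hierarchical specification.

```latex
% Schematic only; the paper's actual model may differ.
y_{ij} \mid p_i \sim \mathrm{Bernoulli}(p_i), \qquad
\pi_{ij} \mid y_{ij} \sim f(\pi_{ij} \mid y_{ij}, \phi_i).
% Nonignorable selection: the selection probabilities \pi_{ij} depend on the responses y_{ij}.
% Homogeneous model: \phi_i = \phi for every area i (identical selection distributions across areas).
% Heterogeneous model: the \phi_i are allowed to differ across areas.
```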


Small area benchmarked estimation under the basic unit level model when the sampling rates are non-negligible

by Marius Stefan and Michael A. Hidiroglou

We consider the estimation of a small area mean under the basic unit-level model. The sum of the resulting model-dependent estimators may not add up to the estimate obtained with a direct survey estimator that is deemed accurate for the union of these small areas. Benchmarking forces the model-based estimators to agree with the direct estimator at the aggregated area level; the generalized regression estimator is the direct estimator that we benchmark to. In this paper we compare small area benchmarked estimators based on four procedures. The first procedure produces benchmarked estimators by ratio adjustment. The second is based on the empirical best linear unbiased estimator obtained under the unit-level model augmented with a suitable variable that ensures benchmarking. The third uses pseudo-empirical estimators constructed with suitably chosen sampling weights so that, when aggregated, they agree with the reliable direct estimator for the larger area. The fourth produces benchmarked estimators as the solution to a minimization problem subject to the constraint given by the benchmark condition. These benchmarking procedures are applied to the small area estimators when the sampling rates are non-negligible, and the resulting benchmarked estimators are compared in terms of relative bias and mean squared error using both a design-based simulation study and an example with real survey data.
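
As a point of reference for the first of these procedures, the familiar ratio-adjustment form of benchmarking can be written as below; the notation is generic and the aggregation weights are an assumption for illustration.

```latex
% Ratio-adjusted benchmarked estimator for small area i:
\hat{\theta}_i^{\mathrm{bench}}
  \;=\; \hat{\theta}_i \,
  \frac{\hat{\theta}^{\mathrm{GREG}}}{\sum_{j} W_j \,\hat{\theta}_j},
% where \hat{\theta}_i is the model-based estimator for area i, \hat{\theta}^{\mathrm{GREG}} is the
% reliable direct (generalized regression) estimator for the union of the areas, and the
% W_j are aggregation weights (e.g., population shares), so that
% \sum_i W_i \hat{\theta}_i^{\mathrm{bench}} = \hat{\theta}^{\mathrm{GREG}}.
```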


Estimation of domain discontinuities using Hierarchical Bayesian Fay-Herriot models

by Jan A. van den Brakel and Harm-Jan Boonstra

Changes in the design of a repeated survey generally produce systematic effects in the sample estimates, further referred to as discontinuities. To avoid confounding real period-to-period change with the effects of a redesign, discontinuities are often quantified by running the old and the new design in parallel for some period of time. The sample sizes of such parallel runs are generally too small to apply direct estimators for domain discontinuities. A bivariate hierarchical Bayesian Fay-Herriot (FH) model is proposed to obtain more precise predictions of domain discontinuities and is applied to a redesign of the Dutch Crime Victimization Survey. This method is compared with two alternatives: a univariate FH model in which the direct estimates under the regular approach are used as covariates in an FH model for the alternative approach conducted on a reduced sample size, and a univariate FH model in which the direct estimates of the discontinuities are modeled directly. An adjusted step-forward selection procedure is proposed that minimizes the widely applicable information criterion (WAIC) until the reduction in WAIC is smaller than the standard error of this criterion. With this approach, more parsimonious models are selected, which prevents choosing complex models that tend to overfit the data.
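
For context, the generic area-level Fay-Herriot building block underlying such models can be written as follows; this is the standard two-stage form, not the paper’s full bivariate specification.

```latex
% Sampling model and linking model for domain i:
\hat{\theta}_i \;=\; \theta_i + e_i, \qquad e_i \sim N(0, \psi_i),
\qquad
\theta_i \;=\; \mathbf{x}_i^{\top}\boldsymbol{\beta} + v_i, \qquad v_i \sim N(0, \sigma_v^{2}).
% In a bivariate version, the direct domain estimates under the regular and the alternative
% design are modeled jointly, and the domain discontinuity is the difference between the
% two underlying domain parameters.
```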


Bayesian pooling for analyzing categorical data from small areas

by Aejeong Jo, Balgobin Nandram and Dal Ho Kim

Bayesian pooling strategies are used to address precision problems in statistical analyses of data from small areas, where the subpopulation samples are usually small even though the population might not be. As an alternative, similar data can be pooled in order to reduce the number of parameters in the model. Many surveys collect categorical data on each area in the form of a contingency table. We consider hierarchical Bayesian pooling models with a Dirichlet process prior for analyzing categorical data from small areas. However, the prior used to pool such data frequently leads to overshrinkage. To mitigate this problem, the parameters are separated into global and local effects. We compare the pooling models using bone mineral density (BMD) data from the Third National Health and Nutrition Examination Survey, conducted in the United States from 1988 to 1994. Posterior computations for the BMD data are carried out with a Gibbs sampler and slice sampling.
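
As an illustration of how a Dirichlet process prior pools categorical data across areas, one generic hierarchical setup is sketched below. It is not necessarily the authors’ exact model; the base measure and hyperparameters are assumptions for illustration.

```latex
% Generic Dirichlet process pooling of multinomial cell probabilities across areas i:
\mathbf{n}_i \mid \mathbf{p}_i \sim \mathrm{Multinomial}(N_i, \mathbf{p}_i), \qquad
\mathbf{p}_i \mid G \sim G, \qquad
G \sim \mathrm{DP}(\alpha, G_0), \quad G_0 = \mathrm{Dirichlet}(\boldsymbol{\mu}).
% The discreteness of G ties areas with similar cell probabilities together (pooling),
% and the concentration parameter \alpha controls how strongly they are pooled;
% separating global and local effects is one way to limit the resulting shrinkage.
```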


Short note

A note on multiply robust predictive mean matching imputation with complex survey data

by Sixia Chen, David Haziza and Alexander Stubblefield

Predictive mean matching is a commonly used imputation procedure for addressing item nonresponse in surveys. The customary approach relies on the specification of a single outcome regression model. In this note, we propose a novel predictive mean matching procedure that allows the user to specify multiple outcome regression models. The resulting estimator is multiply robust in the sense that it remains consistent if at least one of the specified outcome regression models is correctly specified. Results from a simulation study suggest that the proposed method performs well in terms of bias and efficiency.
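
The toy Python sketch below conveys the general flavour of predictive mean matching with several candidate outcome models: each model’s predictions form one coordinate of a matching score, the nearest respondent in that score space donates its observed value, and the completed data are combined with the survey weights. It is a schematic under assumed variable names and models, not the authors’ multiply robust procedure.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated sample: x observed for everyone, y treated as missing for nonrespondents
n = 1_000
x = rng.normal(0.0, 1.0, n)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(0.0, 1.0, n)
respond = rng.random(n) < 1.0 / (1.0 + np.exp(-x))  # response indicator
w = np.full(n, 100.0)                               # assumed design weights

# Two candidate outcome regression models fitted on the respondents: linear and quadratic in x
def fit_predict(design_resp, design_all, y_resp):
    beta, *_ = np.linalg.lstsq(design_resp, y_resp, rcond=None)
    return design_all @ beta

X1 = np.column_stack([np.ones(n), x])        # model 1: linear
X2 = np.column_stack([np.ones(n), x, x**2])  # model 2: quadratic
preds = np.column_stack([
    fit_predict(X1[respond], X1, y[respond]),
    fit_predict(X2[respond], X2, y[respond]),
])

# Predictive mean matching: each nonrespondent borrows the observed y of the respondent
# whose vector of predicted values is closest (Euclidean distance)
donors = np.where(respond)[0]
y_imp = y.copy()
for i in np.where(~respond)[0]:
    d = np.sum((preds[donors] - preds[i]) ** 2, axis=1)
    y_imp[i] = y[donors[np.argmin(d)]]

# Design-weighted mean from the completed data
print(f"Imputed estimate of the mean: {np.sum(w * y_imp) / np.sum(w):.3f}")
print(f"Full-sample mean (benchmark): {y.mean():.3f}")
```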


