Survey Methodology

Warning View the most recent version.

Archived Content

Information identified as archived is provided for reference, research or recordkeeping purposes. It is not subject to the Government of Canada Web Standards and has not been altered or updated since it was archived. Please "contact us" to request a format other than those available.

December 2012

Content note: At this moment, full content is available in PDF only. A print version can also be ordered.

To access the PDF publication, please use the "Full content in PDF" link on the sidebar (on the left-hand side of this page).

Accessibility notice: If the following information is not accessible to you, please contact us for assistance or to obtain it in alternate formats.

The journal Survey Methodology Volume 38, Number 2 (December 2012) contains the following 9 papers:

Waksberg Invited Paper Series:

Survey Quality

Lars Lyberg


Survey quality is a multi-faceted concept that originates from two different development paths. One path is the total survey error paradigm that rests on four pillars providing principles that guide survey design, survey implementation, survey evaluation, and survey data analysis. We should design surveys so that the mean squared error of an estimate is minimized given budget and other constraints. It is important to take all known error sources into account, to monitor major error sources during implementation, to periodically evaluate major error sources and combinations of these sources after the survey is completed, and to study the effects of errors on the survey analysis. In this context survey quality can be measured by the mean squared error and controlled by observations made during implementation and improved by evaluation studies. The paradigm has both strengths and weaknesses. One strength is that research can be defined by error sources and one weakness is that most total survey error assessments are incomplete in the sense that it is not possible to include the effects of all the error sources. The second path is influenced by ideas from the quality management sciences. These sciences concern business excellence in providing products and services with a focus on customers and competition from other providers. These ideas have had a great influence on many statistical organizations. One effect is the acceptance among data providers that product quality cannot be achieved without a sufficient underlying process quality and process quality cannot be achieved without a good organizational quality. These levels can be controlled and evaluated by service level agreements, customer surveys, paradata analysis using statistical process control, and organizational assessment using business excellence models or other sets of criteria. All levels can be improved by conducting improvement projects chosen by means of priority functions. The ultimate goal of improvement projects is that the processes involved should gradually approach a state where they are error-free. Of course, this might be an unattainable goal, albeit one to strive for. It is not realistic to hope for continuous measurements of the total survey error using the mean squared error. Instead one can hope that continuous quality improvement using management science ideas and statistical methods can minimize biases and other survey process problems so that the variance becomes an approximation of the mean squared error. If that can be achieved we have made the two development paths approximately coincide.

Regular Papers:

Data collection: Experiences and lessons learned by asking sensitive questions in a remote coca growing region in Peru

Jaqueline Garcia-Yi and Ulrike Grote


Coca is a native bush from the Amazon rainforest from which cocaine, an illegal alkaloid, is extracted. Asking farmers about the extent of their coca cultivation areas is considered a sensitive question in remote coca growing regions in Peru. As a consequence, farmers tend not to participate in surveys, do not respond to the sensitive question(s), or underreport their individual coca cultivation areas. There is a political and policy concern in accurately and reliably measuring coca growing areas, therefore survey methodologists need to determine how to encourage response and truthful reporting of sensitive questions related to coca growing. Specific survey strategies applied in our case study included establishment of trust with farmers, confidentiality assurance, matching interviewer-respondent characteristics, changing the format of the sensitive question(s), and non enforcement of absolute isolation of respondents during the survey. The survey results were validated using satellite data. They suggest that farmers tend to underreport their coca areas to 35 to 40% of their true extent.

Imputation for nonmonotone nonresponse in the survey of industrial research and development

Jun Shao, Martin Klein and Jing Xu


Nonresponse in longitudinal studies often occurs in a nonmonotone pattern. In the Survey of Industrial Research and Development (SIRD), it is reasonable to assume that the nonresponse mechanism is past-value-dependent in the sense that the response propensity of a study variable at time point t depends on response status and observed or missing values of the same variable at time points prior to t. Since this nonresponse is nonignorable, the parametric likelihood approach is sensitive to the specification of parametric models on both the joint distribution of variables at different time points and the nonresponse mechanism. The nonmonotone nonresponse also limits the application of inverse propensity weighting methods. By discarding all observed data from a subject after its first missing value, one can create a dataset with a monotone ignorable nonresponse and then apply established methods for ignorable nonresponse. However, discarding observed data is not desirable and it may result in inefficient estimators when many observed data are discarded. We propose to impute nonrespondents through regression under imputation models carefully created under the past-value-dependent nonresponse mechanism. This method does not require any parametric model on the joint distribution of the variables across time points or the nonresponse mechanism. Performance of the estimated means based on the proposed imputation method is investigated through some simulation studies and empirical analysis of the SIRD data.

Some theory for propensity-score-adjustment estimators in survey sampling

Jae Kwang Kim and Minsun Kim Riddles


The propensity-scoring-adjustment approach is commonly used to handle selection bias in survey sampling applications, including unit nonresponse and undercoverage. The propensity score is computed using auxiliary variables observed throughout the sample. We discuss some asymptotic properties of propensity-score-adjusted estimators and derive optimal estimators based on a regression model for the finite population. An optimal propensity-score-adjusted estimator can be implemented using an augmented propensity model. Variance estimation is discussed and the results from two simulation studies are presented.

Assessing the accuracy of response propensity models in longitudinal studies

Ian Plewis, Sosthenes Ketende and Lisa Calderwood


Non-response in longitudinal studies is addressed by assessing the accuracy of response propensity models constructed to discriminate between and predict different types of non-response. Particular attention is paid to summary measures derived from receiver operating characteristic (ROC) curves and logit rank plots. The ideas are applied to data from the UK Millennium Cohort Study. The results suggest that the ability to discriminate between and predict non-respondents is not high. Weights generated from the response propensity models lead to only small adjustments in employment transitions. Conclusions are drawn in terms of the potential of interventions to prevent non-response.

Confidence interval estimation of small area parameters shrinking both means and variances

Sarat C. Dass, Tapabrata Maiti, Hao Ren and Samiran Sinha


We propose a new approach to small area estimation based on joint modelling of means and variances. The proposed model and methodology not only improve small area estimators but also yield "smoothed" estimators of the true sampling variances. Maximum likelihood estimation of model parameters is carried out using EM algorithm due to the non-standard form of the likelihood function. Confidence intervals of small area parameters are derived using a more general decision theory approach, unlike the traditional way based on minimizing the squared error loss. Numerical properties of the proposed method are investigated via simulation studies and compared with other competitive methods in the literature. Theoretical justification for the effective performance of the resulting estimators and confidence intervals is also provided.

Condition indexes and variance decompositions for diagnosing collinearity in linear model analysis of survey data

Dan Liao and Richard Valliant


Collinearities among explanatory variables in linear regression models affect estimates from survey data just as they do in non-survey data. Undesirable effects are unnecessarily inflated standard errors, spuriously low or high t-statistics, and parameter estimates with illogical signs. The available collinearity diagnostics are not generally appropriate for survey data because the variance estimators they incorporate do not properly account for stratification, clustering, and survey weights. In this article, we derive condition indexes and variance decompositions to diagnose collinearity problems in complex survey data. The adapted diagnostics are illustrated with data based on a survey of health characteristics.

Bayesian inference for finite population quantiles from unequal probability samples

Qixuan Chen, Michael R. Elliott and Roderick J.A. Little


This paper develops two Bayesian methods for inference about finite population quantiles of continuous survey variables from unequal probability sampling. The first method estimates cumulative distribution functions of the continuous survey variable by fitting a number of probit penalized spline regression models on the inclusion probabilities. The finite population quantiles are then obtained by inverting the estimated distribution function. This method is quite computationally demanding. The second method predicts non-sampled values by assuming a smoothly-varying relationship between the continuous survey variable and the probability of inclusion, by modeling both the mean function and the variance function using splines. The two Bayesian spline-model-based estimators yield a desirable balance between robustness and efficiency. Simulation studies show that both methods yield smaller root mean squared errors than the sample-weighted estimator and the ratio and difference estimators described by Rao, Kovar, and Mantel (RKM 1990), and are more robust to model misspecification than the regression through the origin model-based estimator described in Chambers and Dunstan (1986). When the sample size is small, the 95% credible intervals of the two new methods have closer to nominal confidence coverage than the sample-weighted estimator.

Short Notes:

Multiple imputation with census data

Satkartar K. Kinney


A benefit of multiple imputation is that it allows users to make valid inferences using standard methods with simple combining rules. Existing combining rules for multivariate hypothesis tests fail when the sampling error is zero. This paper proposes modified tests for use with finite population analyses of multiply imputed census data for the applications of disclosure limitation and missing data and evaluates their frequentist properties through simulation.

Date modified: