Survey Methodology
Release date: December 20, 2018
In this issue
Dear readers,
We are pleased to be the co-editors of this special issue of Survey Methodology. It contains 10 articles selected from the presentations given at the 9th Colloque francophone sur les sondages, held in Gatineau from October 11 to 14, 2016.
The first three articles of this issue discuss various aspects of small area estimation. The article by Rao, Rubin-Bleuer and Estevao proposes an estimator of the design mean squared error and studies its properties. In their article, Bertarelli, Ranalli, Bartolucci, D’Alò and Solari consider a latent Markov model to estimate the number of employed and unemployed people for various small areas and apply their model to data from the Italian Labour Force Survey. Finally, the article by De Moliner and Goga compares four methods for estimating mean electricity consumption curves for small areas.
The next three articles deal with sampling problems. The article by Grafström and Matei introduces sample coordination procedures for spatially balanced sampling designs. The article by Ida, Rivest and Daigle reviews two balanced sampling methods and compares them by means of a simulation study. Rebecq and Merly-Alpa study the problem of sample allocation for stratified sampling designs with simple random sampling in each stratum. The authors propose a compromise between optimal allocation and proportional allocation that leads to weights with low dispersion.
The last four articles in this issue examine different aspects of survey sampling methods. The article by Juillard and Chauvet studies the problem of point and variance estimation in the presence of unit non-response in panel surveys. In their article, Bosa, Godbout, Mills and Picard propose a decomposition of the variance in the presence of imputation, which is used to quantify the effect of converting a non-respondent into a respondent. They also evaluate their method through a simulation. Deroyon and Favre-Martinoz extend two methods for determining the winsorization threshold to the case of Poisson sampling designs and compare them empirically. Finally, the article by Tirari and Hdioud proposes a weighting effect to quantify the impact of calibration on precision using an approach based on both the design and the model.
We hope you enjoy this issue!
Jean-François Beaumont and David Haziza
Guest co-editors of this special issue
Invited papers
Measuring uncertainty associated with model-based small area estimators
by J.N.K. Rao, Susana Rubin-Bleuer and Victor M. Estevao
Domains (or subpopulations) with small sample sizes are called small areas. Traditional direct estimators for small areas do not provide adequate precision because the area-specific sample sizes are small. On the other hand, demand for reliable small area statistics has greatly increased. Model-based indirect estimators of small area means or totals are currently used to address difficulties with direct estimation. These estimators are based on linking models that borrow information across areas to increase efficiency. In particular, empirical best (EB) estimators under area level and unit level linear regression models with random small area effects have received a lot of attention in the literature. The model mean squared error (MSE) of EB estimators is often used to measure their variability. Linearization-based estimators of the model MSE, as well as jackknife and bootstrap estimators, are widely used. However, National Statistical Agencies are often interested in estimating the design MSE of EB estimators, in line with the traditional design MSE estimators associated with direct estimators for large areas with adequate sample sizes. Estimators of the design MSE of EB estimators can be obtained for area level models, but they tend to be unstable when the area sample size is small. Composite MSE estimators are proposed in this paper; they are obtained by taking a weighted sum of the design MSE estimator and the model MSE estimator. Properties of the MSE estimators under the area level model are studied in terms of design bias, relative root mean squared error and coverage rate of confidence intervals. The case of a unit level model is also examined under simple random sampling within each area. Results of a simulation study show that the proposed composite MSE estimators provide a good compromise in estimating the design MSE.
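As a rough illustration of this composite idea (generic notation, not necessarily the authors'), a composite MSE estimator for area \(i\) can be written as

\[
\widehat{\mathrm{mse}}_{c}(\hat{\theta}_i^{\mathrm{EB}}) \;=\; \phi_i\,\widehat{\mathrm{mse}}_{d}(\hat{\theta}_i^{\mathrm{EB}}) \;+\; (1-\phi_i)\,\widehat{\mathrm{mse}}_{m}(\hat{\theta}_i^{\mathrm{EB}}), \qquad 0 \le \phi_i \le 1,
\]

where \(\widehat{\mathrm{mse}}_{d}\) and \(\widehat{\mathrm{mse}}_{m}\) denote the design-based and model-based MSE estimators of the EB estimator for area \(i\), and \(\phi_i\) is a weight to be chosen; the particular choice of \(\phi_i\) studied by the authors is not reproduced here.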
Small area estimation for unemployment using latent Markov models
by Gaia Bertarelli, M. Giovanna Ranalli, Francesco Bartolucci, Michele D’Alò and Fabrizio Solari
In Italy, the Labour Force Survey (LFS) is conducted quarterly by the National Statistical Institute (ISTAT) to produce estimates of the labour force status of the population at different geographical levels. In particular, ISTAT provides LFS estimates of employed and unemployed counts for local Labour Market Areas (LMAs). LMAs are 611 sub-regional clusters of municipalities and are unplanned domains for which direct estimates have overly large sampling errors. This implies the need for Small Area Estimation (SAE) methods. In this paper we develop a new area level SAE method that uses a Latent Markov Model (LMM) as the linking model. In LMMs, the characteristic of interest, and its evolution over time, is represented by a latent process that follows a Markov chain, usually of first order. Areas are therefore allowed to change their latent state over time. The proposed model is applied to quarterly data from the LFS for the period 2004 to 2014 and fitted within a hierarchical Bayesian framework using a data augmentation Gibbs sampler. Estimates are compared with those obtained from the classical Fay-Herriot model, from a time-series area level SAE model, and with data from the 2011 Population Census.
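As a schematic sketch of the linking model (generic notation that may differ from the authors' exact specification), an area level model with a latent Markov structure can be written as

\[
\hat{\theta}_{it} = \theta_{it} + e_{it}, \qquad \theta_{it} = \mathbf{z}_{it}'\boldsymbol{\beta} + \lambda_{u_{it}},
\]

where \(\hat{\theta}_{it}\) is the direct estimate for area \(i\) in quarter \(t\), \(e_{it}\) is its sampling error, and \(u_{it} \in \{1,\dots,K\}\) is a latent state that evolves according to a first-order Markov chain with initial and transition probabilities to be estimated, so that an area can move between latent states over time.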
Sample-based estimation of mean electricity consumption curves for small domains
by Anne De Moliner and Camelia Goga
Many studies conducted by various electric utilities around the world are based on the analysis of mean electricity consumption curves for various subpopulations, particularly geographic in nature. Those mean curves are estimated from samples of thousands of curves measured at very short intervals over long periods. Estimation for small subpopulations, also called small domains, is a very timely topic in sampling theory.
In this article, we examine this problem for functional data and estimate the mean curves for small domains. To do so, we propose four methods: functional linear regression; modelling the scores of a principal component analysis with unit-level linear mixed models; and two non-parametric estimators, one based on regression trees and the other on random forests, both adapted to curves. All these methods have been tested and compared using real electricity consumption data for households in France.
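The second of these methods can be sketched with a generic Karhunen-Loève type decomposition (our notation, given only as an illustration): each consumption curve is approximated by

\[
Y_k(t) \;\approx\; \mu(t) + \sum_{j=1}^{J} \xi_{kj}\,\phi_j(t),
\]

where the \(\phi_j\) are the first \(J\) principal components of the curves and the scores \(\xi_{kj}\) are then modelled, component by component, with unit-level linear mixed models that include small-domain random effects; the domain mean curves are reconstructed from the predicted scores.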
Coordination of spatially balanced samples
by Anton Grafström and Alina Matei
Sample coordination seeks to create a probabilistic dependence between the selection of two or more samples drawn from the same population or from overlapping populations. Positive coordination increases the expected sample overlap, while negative coordination decreases it. There are numerous applications for sample coordination with varying objectives. A spatially balanced sample is a sample that is well spread in some space. Forcing a spread within the selected samples is a general and very efficient variance reduction technique for the Horvitz-Thompson estimator. The local pivotal method and spatially correlated Poisson sampling are two general schemes for achieving well-spread samples. We aim to introduce coordination for these sampling methods based on the concept of permanent random numbers. The goal is to coordinate such samples while preserving spatial balance. The proposed methods are motivated by examples from forestry, environmental studies, and official statistics.
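To fix ideas about coordination through permanent random numbers (PRNs), consider the simpler case of Poisson sampling, which these spatial methods generalize (a textbook illustration, not the algorithm proposed in the paper). Each unit \(i\) carries a permanent random number \(u_i \sim U(0,1)\), and on occasion \(t\) the unit is selected if

\[
u_i < \pi_i^{(t)},
\]

where \(\pi_i^{(t)}\) is its inclusion probability on that occasion. Reusing the same \(u_i\) on every occasion yields positive coordination, while shifting them, for example \(u_i' = (u_i + c) \bmod 1\), reduces the expected overlap. The contribution of the article is to carry this PRN idea over to the local pivotal method and to spatially correlated Poisson sampling while preserving the spatial balance of the samples.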
Using balanced sampling in creel surveys
by Ibrahima Ousmane Ida, Louis-Paul Rivest and Gaétan Daigle
In recent years, balanced sampling techniques have seen a resurgence of interest. They constrain the Horvitz-Thompson estimators of the totals of auxiliary variables to be equal, at least approximately, to the corresponding true totals, in order to avoid bad samples. Several procedures are available to carry out balanced sampling, including the cube method (see Deville and Tillé, 2004) and an alternative, the rejective algorithm introduced by Hájek (1964). After a brief review of these sampling methods, and motivated by the planning of an angler survey, we use Monte Carlo simulations to investigate the survey designs produced by these two sampling algorithms.
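In the usual notation for balanced sampling (not necessarily the authors'), a design with inclusion probabilities \(\pi_i\) is balanced on a vector of auxiliary variables \(\mathbf{x}_i\) if every sample \(s\) it can select satisfies, at least approximately,

\[
\sum_{i \in s} \frac{\mathbf{x}_i}{\pi_i} \;=\; \sum_{i \in U} \mathbf{x}_i ,
\]

that is, the Horvitz-Thompson estimators of the auxiliary totals reproduce the known population totals. The cube method and the rejective algorithm are two ways of selecting samples that satisfy these balancing equations exactly or approximately.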
Optimizing a mixed allocation
by Antoine Rebecq and Thomas Merly-Alpa
This article proposes a criterion for calculating the trade-off in so-called “mixed” allocations, which combine two classic allocations in sampling theory. In the business surveys of INSEE (the French National Institute of Statistics and Economic Studies), it is common to use the arithmetic mean of a proportional allocation and a Neyman allocation (corresponding to a trade-off of 0.5). It is possible to obtain a trade-off value that gives the estimators better properties; this value belongs to a region obtained by solving an optimization program. Different methods for calculating the trade-off are presented, along with an application to business surveys and a comparison with other common trade-off allocations.
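Written out in standard notation (ours, for illustration only), a mixed allocation is a convex combination of the proportional and Neyman allocations,

\[
n_h(\alpha) \;=\; \alpha\, n\,\frac{N_h}{N} \;+\; (1-\alpha)\, n\,\frac{N_h S_h}{\sum_{k} N_k S_k}, \qquad 0 \le \alpha \le 1,
\]

where \(n_h\) is the sample size allocated to stratum \(h\), \(N_h\) and \(S_h\) are the stratum size and the standard deviation of the study variable in that stratum, and \(\alpha = 0.5\) gives the arithmetic mean mentioned above; the article's criterion is a way of choosing \(\alpha\).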
Variance estimation under monotone non-response for a panel survey
by Hélène Juillard and Guillaume Chauvet
Panel surveys are frequently used to measure the evolution of parameters over time. Panel samples may suffer from different types of unit non-response, which is commonly handled by estimating the response probabilities and reweighting the respondents. In this work, we consider estimation and variance estimation under unit non-response for panel surveys. Extending the work of Kim and Kim (2007) to several time points, we consider a propensity score adjusted estimator that accounts for initial non-response and attrition, and we propose a suitable variance estimator. It is then extended to cover most estimators encountered in surveys, including calibrated estimators, complex parameters and longitudinal estimators. The properties of the proposed variance estimator and of a simplified variance estimator are evaluated through a simulation study. An illustration of the proposed methods on data from the ELFE survey is also presented.
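As a simplified illustration of the propensity score adjustment described here (reduced to a single wave and the estimation of a total; our notation), the adjusted estimator has the familiar form

\[
\hat{t}_{y} \;=\; \sum_{i \in s_r} \frac{y_i}{\pi_i\, \hat{p}_i},
\]

where \(s_r\) is the set of respondents, \(\pi_i\) the inclusion probability of unit \(i\) and \(\hat{p}_i\) its estimated response probability; in the panel setting the article combines an adjustment for initial non-response with further adjustments for attrition at each wave, and the proposed variance estimator is built around that construction.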
How to decompose the non-response variance: A total survey error approach
by Keven Bosa, Serge Godbout, Fraser Mills and Frédéric Picard
When a linear imputation method is used to correct for non-response, the total variance can, under certain assumptions, be apportioned among the non-responding units. Linear imputation is not as limited as it seems, given that the most common methods – ratio, donor, mean and auxiliary value imputation – are all linear imputation methods. We discuss the inference framework and the unit-level decomposition of the variance due to non-response, and present simulation results. This decomposition can be used to prioritize non-response follow-up or manual corrections, or simply to guide data analysis.
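The linearity referred to above can be made explicit with a generic formulation (ours, not necessarily the one used in the paper): an imputed value for a non-respondent is a linear combination of respondent values,

\[
y_i^{*} \;=\; a_i + \sum_{j \in s_r} b_{ij}\, y_j, \qquad i \in s_m,
\]

where \(s_r\) and \(s_m\) denote the respondents and the non-respondents and the constants \(a_i\) and \(b_{ij}\) depend on the imputation method; ratio, donor, mean and auxiliary value imputation all fit this form, which is what makes it possible to apportion the non-response variance unit by unit.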
Comparison of the conditional bias and Kokic and Bell methods for Poisson and stratified sampling
by Thomas Deroyon and Cyril Favre-Martinoz
In business surveys, it is common to collect economic variables with highly skewed distributions. In this context, winsorization is frequently used to address the problem of influential values. In stratified simple random sampling, there are two methods for selecting the thresholds involved in winsorization. This article comprises two parts. The first reviews the notation and the concept of a winsorized estimator. The second details the two methods, extends them to the case of Poisson sampling, and then compares them on simulated data sets and on the labour cost and structure of earnings survey carried out by INSEE.
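For readers less familiar with the technique, the simplest (type I) form of winsorization replaces each value above a threshold \(K\) by the threshold before estimation (a standard textbook definition, not necessarily the exact variant studied in the article):

\[
\tilde{y}_i \;=\; \min(y_i,\; K).
\]

The choice of \(K\) governs the trade-off between the bias introduced and the reduction in variance; the conditional bias method and the Kokic and Bell method compared in the article are two ways of determining this threshold, extended here to Poisson sampling.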
Criteria for choosing between calibration weighting and survey weighting
by Mohammed El Haj Tirari and Boutaina Hdioud
Based on auxiliary information, calibration is often used to improve the precision of estimates. However, calibration weighting may not be appropriate for all variables of interest in the survey, particularly those not related to the auxiliary variables used in calibration. In this paper, we propose a criterion to assess, for any variable of interest, the impact of calibration weighting on the precision of the estimated total. This criterion can be used to decide which weights to associate with each variable of interest in the survey and to determine the variables for which calibration weighting is appropriate.
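As background for the criterion (standard calibration notation, not specific to this paper), calibration with the chi-square distance replaces the design weights \(d_i = 1/\pi_i\) by weights \(w_i\) that stay as close as possible to them while reproducing known auxiliary totals:

\[
\min_{w} \sum_{i \in s} \frac{(w_i - d_i)^2}{d_i} \quad \text{subject to} \quad \sum_{i \in s} w_i \mathbf{x}_i \;=\; \sum_{i \in U} \mathbf{x}_i .
\]

Whether a given variable of interest gains or loses precision under these calibrated weights depends on how strongly it is related to the auxiliary variables \(\mathbf{x}_i\), which is what the proposed criterion seeks to quantify.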