Survey Methodology

Release date: June 30, 2025

The journal Survey Methodology Volume 51, Number 1 (June 2025) contains the following twenty-three papers:

Special issue for the 50th anniversary of Survey Methodology

In this issue: A celebration of the 50th anniversary of Survey Methodology

by Jean-François Beaumont

Abstract

This paper is an introduction to the eleven papers included in this special issue for the celebration of the 50th anniversary of the Survey Methodology Journal.

Invited discussion papers

Progress in survey science and practice: Yesterday-today-tomorrow

by Carl-Erik Särndal

Abstract

This article confronts survey science with important notions in philosophy of science: progress, paradigm, research tradition, research programmes. The article is conceptual and exploratory, rather than mathematical/technical.

This is against a background where survey science must evolve under unfamiliar and challenging conditions. Society is changing. Survey nonresponse is high. Probability sampling surveys are being questioned, considered too expensive. Low-cost alternative data sources (big data and others) must, in the opinion of some, be incorporated into statistics production at the national statistical offices.

A lively research tradition has brought progress in survey science over more than one hundred years. The article recalls some of that progress and tries to foresee how the tradition may survive and face the coming decades.

Comments on “Progress in survey science and practice: Yesterday-today-tomorrow”

by J. Michael Brick

Abstract

Survey science appears to be in a critical condition and its future direction is unclear. This paper diagnoses the situation and poses important questions to researchers and users of surveys. My discussion emphasizes the role of design in survey science and the implications of data collected without design considerations.

Comments on “Progress in survey science and practice: Yesterday-today-tomorrow”

by Constance F. Citro

Abstract

Carl-Erik Särndal’s essay on the challenges to the probability sample survey paradigm (or research tradition) quotes my 2014 article in this journal, which “impatiently” called for a move to a mixed data (or blended data) sources paradigm. I explain my intent not to downgrade probability surveys but to blend them with administrative records and other sources to improve data quality and relevance. The United States has made strides toward blended data since I wrote my article.

Comments on “Progress in survey science and practice: Yesterday-today-tomorrow”

by Robert E. Fay

Abstract

The attempt to set the current concerns over the future of survey science in the context of the history and philosophy of science offers little specific guidance on the path forward. But the author is to be thanked for sharing his thoughts and encouraging new solutions.

Comments on “Progress in survey science and practice: Yesterday-today-tomorrow”

by Risto Lehtonen

Abstract

In his article, Professor Carl-Erik Särndal presents a new conceptual framework for sample-based statistics that rests on only a few key assumptions. My comments briefly discuss selected aspects of the research tradition in Survey Science.

Comments on “Progress in survey science and practice: Yesterday-today-tomorrow”

by Eric Rancourt

Abstract

In his paper, Särndal reviews the scientific aspects of the development of survey sampling theory. In light of multiple changes in this field, some have called for a new paradigm. Upon careful analysis, Särndal concludes that there has been a strong research tradition, anchored in assumptions about finite populations and the feasibility of characterizing them with only a sample. Within this framework there can still be research and change, but the paradigm would essentially remain. In my discussion of this article, after clarifying the context of National Statistical Offices (mainly Statistics Canada), I agree on many points, ask whether it is a change in methodological paradigm rather than statistical paradigm that we are witnessing, and point to some possible ways forward.

Comments on “Progress in survey science and practice: Yesterday-today-tomorrow”

by Mary E. Thompson

Abstract

These comments on C.-E. Särndal’s paper, “Progress in survey science and practice: Yesterday-today-tomorrow”, will touch on probability sampling fundamentals, progress through competing approaches to inference, connections with other parts of statistics, and data in the twenty-first century.

Author’s response to comments on “Progress in survey science and practice: Yesterday-today-tomorrow”

by Carl-Erik Särndal

Abstract

This rejoinder is arranged as a series of themes or issues, inspired by the original article, and addressed, to varying degrees, in the six discussions. Among the themes: probability sampling and other paradigms in survey science; the role of the national statistical institutes in the growth of survey science; recent breakthroughs in the use of administrative data in statistics production, with multiple data inputs; the research tradition: a finite population and a well-behaved sample; deepened awareness, in recent decades, of the tradition and its ramifications; the theory track and the role of the academic sector; attempts, over time, at resolving problems; imperfections in the data collection, in the realized sample; nonresponse treatment, responsive design, panel surveys; realpolitik in national statistics production: a realistic approach to meet urgent demands for statistical information.

Trends and directions in sample survey theory and methods

by J.N.K. Rao and Sharon L. Lohr

Abstract

Rao (1999) summarized trends in sample survey theory and methods at the turn of the millennium. We provide an updated discussion of some current trends in survey design and estimation methods for the 50th anniversary of Survey Methodology. Recent innovations in survey design include research on anticipating nonsampling errors at the design stage and development of balanced and adaptive sampling designs to take advantage of detailed sampling frame information or data gathered during the survey process. Nonparametric and machine learning methods are increasingly used for data editing as well as for model-assisted estimation and nonresponse adjustments. Small area models have been expanded to incorporate spatial and time series information, increase the flexibility and robustness of the linking and variance models, benchmark to large-area direct estimators, and (for unit level models) account for informative sampling designs. The increasing availability of large administrative datasets, sensor and satellite data, and convenience samples has spurred research on how to use these sources, on their own and when integrated with probability samples. We conclude by discussing some frontiers for survey research.

Comments on “Trends and directions in sample survey theory and methods”

by David Haziza

Abstract

This discussion of the paper by Rao and Lohr focuses on the use of machine learning procedures for estimating finite population parameters. While there is growing interest in these methods within national statistical offices, several areas remain largely unexplored and warrant significant attention in the coming years. In this discussion, I highlight potential topics for future research and development in this rapidly evolving field.

Comments on “Trends and directions in sample survey theory and methods”

by Jean D. Opsomer, Daifeng Han and Medha Uppala

Abstract

In this discussion, we complement the excellent overview by Profs. Lohr and Rao with some additional topics. The first topic is a call for more recognition of the central role of modeling in survey estimation. The second is a brief discussion of the use of partial frame information in survey design. Finally, we draw attention to the recent rise of synthetic methods, in particular multilevel regression and poststratification (MRP), in small area estimation applications.

Comments on “Trends and directions in sample survey theory and methods”

by M. Giovanna Ranalli

Abstract

This discussion examines some advancements in survey design and estimation, inspired by the comprehensive appraisal by Professors Jon Rao and Sharon Lohr of current trends in the field. It delves into three specific areas: balanced sampling, calibration, and small area estimation. Probabilistic balanced sampling methods, such as the cube method and penalized balanced sampling, are explored, with an emphasis on addressing emerging challenges, including extensions to linear mixed models, nonparametric regression models, and spatially balanced designs. Calibration is discussed within a modular framework that incorporates modern regression techniques, highlighting innovative uses of model calibration for data editing and causal inference. Small area estimation is considered in the context of latent variable modeling and data integration, emphasizing its role when the variable(s) of interest cannot be measured either directly or without error. Applications in integrating probability and non-probability data and conducting causal analysis at the local level are also discussed.

Authors’ response to comments on “Trends and directions in sample survey theory and methods”

by J.N.K. Rao and Sharon L. Lohr

Abstract

The discussants highlight promising research topics for improving the quality and granularity of estimates from surveys. We agree that continued research is needed to evaluate models used for inference, and suggest development of measures of model dependence.

Invited papers

Bridging BigData and sampling methodology: What is big and where is the bridge?

by Fulvia Mecatti

Abstract

BigData users and the BigData research community are expanding rapidly, while statisticians at large are seemingly becoming divided between those who are enthusiastic and those who are concerned, if not downright hostile. Is BigData also a big step ahead, truly advancing our ability to extract meaningful information and actual knowledge from data? Is BigData underplaying traditional statistical inference as we know it, supplanting Survey Methodology as a low-cost futuristic option? In this paper I will attempt to unravel the multifaceted relationship bridging BigData to sampling methodology. Starting by reasoning why it should be interesting to look at BigData from a sampling statistician’s perspective, I will delve deeper into the somewhat ambiguous definition of BigData and share some very personal considerations and views on the matter. In the process, several open questions will arise while discussing a personal selection of insights that are traceable through the vast body of statistical literature around BigData and sampling methodology. The discussion will take various angles explored across nine key points, and it will conclude with a forward-looking perspective on a main challenge for future research: addressing the strong assumptions needed to manage deviations from purely randomized data collection.

Use of nonprobability samples for official statistics, state of the art

by Danny Pfeffermann and Michael Sverchkov

Abstract

Tightened budgets, the continuing decrease of response rates in traditional probability surveys and increasing pressure from users for more timely data have stimulated research on the use of nonprobability sample data, such as administrative records, web scraping, mobile phone data and voluntary internet surveys, for inference on finite population parameters like means and totals. These data are often easier, faster and cheaper to collect than traditional probability samples. However, a major concern with the use of this kind of data for official statistics is their nonrepresentativeness due to possible selection bias, which, if not accounted for properly, could bias the inference. In this article, we review and discuss methods considered in the literature to deal with this problem and propose new methods, distinguishing between methods based on integration of the nonprobability sample with an appropriate probability sample, and methods that base the inference solely on the nonprobability sample. Empirical illustrations based on simulated data are provided.
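
A minimal sketch of one family of methods reviewed here: estimating the participation propensities of nonprobability units by pooling them with a reference probability sample, then weighting outcomes by the inverse estimated propensities. This conveys the general idea only, not the authors' proposal; the function and variable names are illustrative assumptions, and refinements discussed in the literature (pseudo-likelihood adjustments, variance estimation) are omitted.

# Sketch: inverse-propensity weighting for a nonprobability sample using a
# reference probability sample. Illustrative only; not the paper's method.
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_mean(x_np, y_np, x_ps, d_ps):
    """x_np, y_np: covariates and outcome in the nonprobability sample.
    x_ps, d_ps: covariates and design weights in the probability sample."""
    X = np.vstack([x_np, x_ps])
    z = np.r_[np.ones(len(x_np)), np.zeros(len(x_ps))]   # 1 = nonprob unit
    # Probability-sample units carry their design weights so the pooled
    # file approximates the population; nonprob units get weight 1.
    w = np.r_[np.ones(len(x_np)), d_ps]
    fit = LogisticRegression().fit(X, z, sample_weight=w)
    p = fit.predict_proba(x_np)[:, 1]                    # estimated propensities
    return np.sum(y_np / p) / np.sum(1.0 / p)            # Hajek-type estimate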

Model-assisted calibration estimation using generalized entropy calibration in survey sampling

by Jae Kwang Kim, Yonghyun Kwon, Yumou Qiu and Junyong Park

Abstract

We introduce a novel approach to model-assisted calibration estimation in survey sampling using generalized entropy. The method builds upon recent work by Kwon, Kim and Qiu (2024) and extends it to a model-assisted framework. Unlike traditional calibration techniques, this approach employs a generalized entropy function as the objective for optimization and incorporates a debiasing calibration constraint to ensure design consistency. The proposed estimator is shown to be asymptotically equivalent to an augmented generalized regression (GREG) estimator. It allows for unequal model variance, potentially improving efficiency when the sampling design is informative. The paper presents both design-based and model-based justifications for the method, along with asymptotic properties and variance estimation techniques. Computational aspects are discussed, including an unconstrained optimization approach that facilitates implementation, especially for high-dimensional auxiliary variables. The method’s performance is evaluated through a simulation study, demonstrating its effectiveness in improving estimation efficiency, particularly when the sampling design is informative.
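
To convey the basic mechanics of calibration with an entropy-type objective, here is a minimal sketch using the familiar exponential (raking) entropy, solved through the dual as an unconstrained Newton iteration. It is not the paper's generalized entropy estimator with its debiasing constraint; all names below are illustrative assumptions.

# Sketch: entropy-type calibration weighting, illustrated with the familiar
# exponential (raking) entropy. The paper's generalized entropy calibration
# is more general; this shows only the basic mechanics.
import numpy as np

def raking_weights(d, X, totals, n_iter=50, tol=1e-10):
    """d: design weights (n,); X: auxiliaries (n, p); totals: population totals (p,)."""
    lam = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = d * np.exp(X @ lam)           # candidate calibrated weights
        resid = totals - X.T @ w          # calibration-equation residual
        if np.max(np.abs(resid)) < tol:
            break
        H = (X * w[:, None]).T @ X        # Jacobian of the calibration equations
        lam += np.linalg.solve(H, resid)  # Newton step on the unconstrained dual
    return d * np.exp(X @ lam)

One design feature this simple case shares with the generalized approach: weights of the form d_i exp(x_i'lambda) remain positive by construction.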

sCHAID: A tool for constructing nonresponse adjustment cells under a design-based framework

by Jean D. Opsomer and Minsun K. Riddles

Abstract

Survey practitioners have increasingly embraced the benefits of modern machine learning techniques, including classification and regression tree algorithms, in the development of nonresponse adjustments. These methods, which do not require a predefined functional relationship between outcomes and predictors, offer a practical means of conducting variable selection and deriving interpretable structures that link response propensity with explanatory variables. However, when applying these algorithms to survey data, it is common to overlook crucial factors like sampling weights, as well as sample design features such as stratification and clustering. To address this shortcoming, we propose an extension of the Chi-square Automatic Interaction Detector (CHAID) approach, and we describe the design-based asymptotic properties of the resulting “survey CHAID” (sCHAID) method. To facilitate the practical use of sCHAID, we incorporate a Rao-Scott correction into the splitting criterion, accounting for the survey design. Using data from the U.S. American Community Survey, we illustrate the use of the method and evaluate its performance through comparisons with existing weighted and unweighted algorithms.
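
As a rough illustration of what a design-adjusted splitting criterion looks like, the sketch below computes a weighted Pearson chi-square for a candidate split and applies a first-order Rao-Scott-type correction through a user-supplied design effect. The paper's sCHAID criterion and its asymptotic justification are more refined; everything named here is an illustrative simplification.

# Sketch: a design-adjusted chi-square for evaluating a candidate split on a
# categorical predictor, in the spirit of a first-order Rao-Scott correction.
# The design effect `deff` is supplied by the user here, an illustrative
# simplification; sCHAID estimates the correction from the survey design.
import numpy as np

def adjusted_chisq(x, r, w, deff):
    """x: predictor categories; r: 0/1 response indicator; w: sampling weights."""
    cats = np.unique(x)
    n = len(x)
    # Weighted cell proportions for the categories-by-response table.
    tab = np.array([[w[(x == c) & (r == v)].sum() for v in (0, 1)] for c in cats])
    p = tab / w.sum()
    row, col = p.sum(axis=1), p.sum(axis=0)
    expected = np.outer(row, col)
    x2 = n * np.sum((p - expected) ** 2 / expected)  # weighted Pearson statistic
    return x2 / deff                                 # first-order design adjustment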

Mean squared prediction error estimators of the empirical best linear unbiased predictor of a small area mean under a semi-parametric Fay-Herriot model

by Shijie Chen, Partha Lahiri and J.N.K. Rao

Abstract

In this paper, we derive a second-order unbiased (or nearly unbiased) mean squared prediction error (MSPE) estimator of the empirical best linear unbiased predictor (EBLUP) of a small area mean for a semi-parametric extension to the well-known Fay-Herriot model. Specifically, we derive our MSPE estimator essentially assuming certain moment conditions on both the sampling errors and random effects distributions. The normality-based Prasad-Rao MSPE estimator has a surprising robustness property in that it remains second-order unbiased under the non-normality of random effects when a simple Prasad-Rao method-of-moments estimator is used for the variance component and the sampling error distribution is normal. We show that the normality-based MSPE estimator is no longer second-order unbiased when the sampling error distribution has non-zero kurtosis or when the Fay-Herriot moment method is used to estimate the variance component, even when the sampling error distribution is normal. Interestingly, when the simple method-of-moments estimator is used for the variance component, our proposed MSPE estimator does not require the estimation of kurtosis of the random effects. Results of a simulation study on the accuracy of the proposed MSPE estimator, under non-normality of both sampling and random effects distributions, are also presented.
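
For orientation, the setting can be stated compactly (these are the standard forms, not the paper's new results). The Fay-Herriot model for direct estimates $\hat{\theta}_i$ of small area means $\theta_i$, $i = 1, \dots, m$, is

$$\hat{\theta}_i = \theta_i + e_i, \qquad \theta_i = \mathbf{x}_i^\top \boldsymbol{\beta} + v_i,$$

with sampling errors $e_i \sim (0, \psi_i)$, $\psi_i$ known, and random effects $v_i \sim (0, \sigma_v^2)$. Writing $\gamma_i = \sigma_v^2 / (\sigma_v^2 + \psi_i)$, the EBLUP is $\hat{\theta}_i^{\mathrm{EB}} = \hat{\gamma}_i \hat{\theta}_i + (1 - \hat{\gamma}_i)\, \mathbf{x}_i^\top \hat{\boldsymbol{\beta}}$, and the normality-based Prasad-Rao MSPE estimator discussed above has the familiar decomposition

$$\widehat{\mathrm{MSPE}}(\hat{\theta}_i^{\mathrm{EB}}) = g_{1i}(\hat{\sigma}_v^2) + g_{2i}(\hat{\sigma}_v^2) + 2\, g_{3i}(\hat{\sigma}_v^2),$$

where $g_{1i} = \gamma_i \psi_i$ is the leading term and $g_{2i}$ and $g_{3i}$ account for the estimation of $\boldsymbol{\beta}$ and $\sigma_v^2$, respectively.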

Imputation of nonignorable missing data in surveys using auxiliary margins via hot deck and sequential imputation

by Yanjiao Yang and Jerome P. Reiter

Abstract

Survey data collection often is plagued by unit and item nonresponse. To reduce reliance on strong assumptions about the missingness mechanisms, statisticians can use information about population marginal distributions known, for example, from censuses or administrative databases. One approach that does so is the Missing Data with Auxiliary Margins, or MD-AM, framework, which uses multiple imputation for both unit and item nonresponse so that survey-weighted estimates accord with the known marginal distributions. However, this framework relies on specifying and estimating a joint distribution for the survey data and nonresponse indicators, which can be computationally and practically daunting in data with many variables of mixed types. We propose two adaptations to the MD-AM framework to simplify the imputation task. First, rather than specifying a joint model for unit respondents’ data, we use random hot deck imputation while still leveraging the known marginal distributions. Second, instead of sampling from conditional distributions implied by the joint model for the missing data due to item nonresponse, we apply multiple imputation by chained equations for item nonresponse before imputation for unit nonresponse. Using simulation studies with nonignorable missingness mechanisms, we demonstrate that the proposed approach can provide more accurate point and interval estimates than models that do not leverage the auxiliary information. We illustrate the approach using data on voter turnout from the U.S. Current Population Survey.
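
The random hot deck step that the authors substitute for a fully specified joint model can be sketched as follows: within each adjustment cell, every missing value is replaced by the value of a randomly drawn respondent. Cell construction and the role of the auxiliary margins are specific to the paper; the names below are illustrative.

# Sketch: random hot deck imputation within adjustment cells. Illustrative
# only; the MD-AM adaptation proposed in the paper layers this with known
# auxiliary margins and chained-equations imputation for item nonresponse.
import numpy as np

def hot_deck(y, cell, seed=None):
    """y: outcomes with np.nan for missing values; cell: adjustment-cell labels."""
    rng = np.random.default_rng(seed)
    y = np.array(y, dtype=float)                  # work on a copy
    cell = np.asarray(cell)
    for c in np.unique(cell):
        in_cell = cell == c
        donors = y[in_cell & ~np.isnan(y)]
        recipients = in_cell & np.isnan(y)
        if donors.size and recipients.any():
            # Each recipient gets the value of a donor drawn at random,
            # with replacement, from respondents in the same cell.
            y[recipients] = rng.choice(donors, size=recipients.sum(), replace=True)
    return y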

On the use of machine learning methods for the treatment of unit nonresponse in surveys

by Khaled Larbi, John Tsang, David Haziza and Mehdi Dagdoug

Abstract

In recent years, there has been significant interest in machine learning at national statistical offices. Thanks to their flexibility, these methods may prove useful at the nonresponse treatment stage. In this article, we conduct an empirical investigation to compare several machine learning procedures in terms of bias and efficiency. In addition to classical machine learning procedures, we assess the performance of ensemble approaches that combine different machine learning procedures to produce a set of weights adjusted for nonresponse.
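
A minimal sketch of the kind of procedure being compared: fit one or more machine learning models for the response propensity and inflate respondents' design weights by the inverse estimated propensities, averaging propensities across learners for a simple ensemble. The specific learners and names are illustrative assumptions, not the paper's protocol.

# Sketch: nonresponse-adjusted weights from machine-learning response
# propensities, with a two-learner average echoing ensemble approaches.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

def adjusted_weights(X, responded, d):
    """X: auxiliary data; responded: 0/1 response indicator; d: design weights."""
    models = [GradientBoostingClassifier(), RandomForestClassifier()]
    # Average the estimated response propensities across the learners.
    p = np.mean([m.fit(X, responded).predict_proba(X)[:, 1] for m in models], axis=0)
    # Respondents' weights are inflated by the inverse estimated propensity.
    return d[responded == 1] / p[responded == 1]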

Interviews

A conversation with Dr. Ivan P. Fellegi

by Edward J. Chen and Erin R. Lundy

Abstract

Ivan Fellegi is an expert in statistical science and a public servant who served as Chief Statistician of Canada from 1985 to 2008. This article briefly recounts his early life, long career and influential research contributions. It includes an interview conducted in February 2017 to mark Ivan Fellegi's 60th year of service at Statistics Canada.

A conversation with Geoffrey Hole

by Christian Genest

Abstract

Geoffrey J.C. Hole (or Geoff, as he likes to be called) was born on January 24, 1940 at Shardeloes, Amersham, Buckinghamshire, England, to Charles William Hole and Sybil Winifred Hole, formerly Morge. He completed a BSc Honours in Mathematics in 1961, and a Postgraduate Diploma in Statistics at Manchester University the following year. He started his career as a mathematical statistician in London, England, working successively for the National Coal Board (1962-63), the Central Electricity Generating Board (1963-66), and the Electricity Council (1966-67), where his title was Economist.

He moved to Canada in 1967 to join the Dominion Bureau of Statistics (DBS) as a survey methodologist. In 1971-72, he was Chief of Census Operations, Methodology and Quality Control Section, and Assistant Coordinator, Socio-Economic Survey Methods Section. He then took a one-year leave of absence to complete an MSc (Econ) in Statistics at the London School of Economics. In 1973, Geoff returned to the DBS, which had become Statistics Canada, as Chief, Methodology Group V, Business Survey Methods Division. In 1974, he was appointed Director, Institutions and Agriculture Survey Methods Division, and, as of 1986, Director, Business Survey Methods Division. His career culminated when he became Director, Social Survey Methods Division, in 1987. He held that position until his retirement, on September 29, 2004.

In addition to his long-term involvement at Statistics Canada, including as a member of the Editorial Board of Survey Methodology between 1983 and 1987, Geoff was very active in the Statistical Society of Canada (SSC), serving, among other roles, as Chair of the Program Committee for the 1986 Annual Meeting at the Banff Centre, in Alberta, and as President of the SSC in 1989-90. He was also Program Chair for a joint conference of the International Association of Survey Statisticians and the International Association for Official Statistics, held in Aguascalientes, Mexico, in 1998.
