Survey design

Results

All (334) (0 to 10 of 334 results)

1. Comparing two common approaches to within-household sampling: A field experiment in Costa Rica
Articles and reports: 12-001-X202600100004
Description: We test the notion that a quasi-probabilistic method of selecting individuals within households (last birthday, LB) draws in a different sample compared to a non-probabilistic approach that selects respondents according to known parameters on age and gender (frequency matching, FM). With data from an original field experiment, we evaluate fieldwork efficiency (time and completed cases), economy (cost), success in recruiting a representative sample, and differences across a set of attitudinal and behavioral measures. We find that the FM approach performs better on efficiency and cost and achieves a comparable sample; importantly, this comparability extends across measures of personality traits and public opinion. With appropriate caveats, we conclude that researchers’ choice of selection methods should be guided by both theoretical benefits and practical tradeoffs.
Release date: 2026-06-29
2. Master samples with optimized panels
Articles and reports: 12-001-X202600100006
Description: We introduce a general framework for constructing master samples that preserve desirable design properties across panels. The core procedure is to order an initial probability sample. Since the final sequence must be robust to a uniform random rotation, we define and minimize an objective that aggregates panel-level performance across all possible circular panels. A final random rotation is applied to ensure design validity. The framework is flexible with respect to the choice of design criteria, such as spatial balance or marginal balance, and can be implemented efficiently using simulated annealing to obtain high-quality approximate solutions. By construction, the approach supports both positive and negative sample coordination for spatially balanced, marginally balanced, and doubly balanced samples. The method’s versatility is demonstrated through three applications: constructing a master sample with spatially balanced panels, marginally balanced panels, and doubly balanced panels.
Release date: 2026-06-29
3. Spreading response burden in business surveys at Statistics Netherlands: Evaluating sample coordination methods targeting highly burdened businesses
Articles and reports: 12-001-X202600100007
Description: National statistical institutes operate sample coordination systems to spread the response burden in business surveys. Despite sample coordination and monitoring of the response burden, some businesses might still be heavily sampled within a short period. This may lead to a peaking response burden for individual businesses, which could affect response rates and response quality. This paper proposes a new sample coordination method based on Adapted Spatially Correlated Poisson (ASCP) sampling that focuses on businesses with a high response burden. The effects on the response burden will be evaluated in two simulation studies and compared with a stratified approach, a pragmatic method in which sampling fractions are manually adjusted, and with the baseline method of ignoring the response burden. For the simulations, real-world scenarios and data from Statistics Netherlands are used. The first simulation study considers a practical situation in which a given sample is adjusted with the aim of avoiding the occurrence of businesses with a peaking response burden. The second simulation study analyzes the longer-term effects of the different sample coordination methods and focuses on both the reduction and the spread of the response burden. The advantages and disadvantages of the different methods will be explained and discussed in detail, and recommendations for applying these methods at national statistical institutes and other survey agencies will be given.
Release date: 2026-06-29
4. Graphical finite population sampling
Articles and reports: 12-001-X202600100008
Description: This paper introduces an innovative and intuitive finite population sampling method that has been developed using a unique graphical framework. In this approach, first-order inclusion probabilities are represented as bars on a two-dimensional graph. By manipulating the positions of these bars, researchers can create a wide range of different sampling designs. This graphical visualization of sampling designs facilitates the exploration of alternative designs and may simplify certain aspects of the implementation compared to traditional mathematical algorithms. This novel approach holds significant promise for tackling complex challenges in sampling, such as achieving an optimal design. By applying a version of the greedy best-first search algorithm to this graphical approach, the potential for integrating intelligent algorithms into finite population sampling is demonstrated.
Release date: 2026-06-29
5. Income Research Paper Series
Journals and periodicals: 75F0002M
Description: This series provides detailed documentation on income developments, including survey design issues, data quality evaluation and exploratory research.
Release date: 2026-05-20
6. Sampling for business surveys at Statistics Canada
Articles and reports: 12-001-X202500200013
Description: This article examines the methodological complexities associated with the design of business surveys, with particular emphasis on sampling strategies implemented by National Statistical Offices (NSOs). It addresses the inherent challenges posed by the dynamic nature of the business population, which necessitates continual updates to the sampling frame to ensure representativeness and relevance. Critical design considerations include the determination of optimal sample sizes, stratification across key dimensions such as industry, geographic region, and enterprise size, as well as the treatment of business births and the exclusion of inactive (or “dead”) units. The article applies Bankier’s (1988) power allocation method to a two-way stratification scheme defined by industry and geography, evaluating its performance by comparing the resulting coefficients of variation with those obtained via a raking algorithm applied to the marginal coefficients. Furthermore, the approach is extended to a multivariate context to accommodate multiple estimation domains. The discussion also encompasses practical issues related to sample rotation and coordination, which are critical for maintaining data quality and minimizing respondent burden over time.
Release date: 2025-12-23
7. Adapting to change: Online first collection initiatives to improve the Labour Force Survey response rate
Articles and reports: 75-005-M2025001
Description: Since 2010, engaging Canadians to participate in the LFS has become more challenging due to a variety of social and technological changes. The decline in the LFS response rate accelerated in 2020, exacerbated by public health measures during the COVID-19 pandemic. This technical paper presents preliminary results of two collection initiatives implemented using an online first strategy to improve the LFS response rates by confirming respondent contact information and expanding the availability of online response. Through these and other planned initiatives, Statistics Canada is working to ensure that the LFS estimates continue to provide an accurate and representative portrait of the Canadian labour market.
Release date: 2025-10-21
8. Improving the Automated Capture of Survey of Household Spending Receipts using advanced Machine Learning Techniques Archived
Articles and reports: 11-522-X202500100004
Description: The Survey of Household Spending (SHS) conducted by Statistics Canada collects paper diaries and shopping receipts as a source of household expenditure data. An auto-capturing algorithm was created for SHS 2023 to reduce statistical clerks' manual work of extracting important information from scanned receipts of common store brands. The algorithm used Tesseract optical character recognition (OCR) to extract text characters from images of receipts, and it identified store and product entities using regular expressions, also known as regex. The goal of this study was to enhance the current auto-capture algorithm by experimenting with more advanced OCR and machine learning methods. As a result, PaddleOCR, an open-source OCR toolkit, was selected as the new default OCR engine due to its overall performance in recognizing texts, especially digits, accurately across receipts of various qualities. Additionally, entity classifiers based on support vector machines were trained on historical SHS records and existing regex patterns. By using classifiers to categorize different elements present on receipts instead of relying solely on regex patterns, product and store recognition improved. It is expected that this new algorithm will be used for SHS 2025 to improve the auto-capture quality and reduce the manual burden associated with capturing receipt variables.
Release date: 2025-09-08
9. Data-driven Imputation Strategies and their Associated Quality Indicators in Economic Surveys Archived
Articles and reports: 11-522-X202500100011
Description: The use of modern data-driven imputation methods to treat non-response in the context of surveys processed in the Integrated Business Statistics Program at Statistics Canada has previously been explored. It was observed that these methods can lead to high quality imputation and further have the potential to result in broad efficiencies when setting up a particular survey's edit and imputation strategy. However, estimation of the associated total variance, more specifically the component due to imputation, remains a challenge. In this article, we propose two methods for estimating the total variance and present preliminary results that have motivated us to pursue further research in this area.
Release date: 2025-09-08
10. Ahead of the Trends: J.N.K. Rao's Contributions to Survey Research Archived
Articles and reports: 11-522-X202500100029
Description: J.N.K. Rao has contributed to almost every subdiscipline of survey research, including unequal-probability and two-phase sampling, variance estimation, regression and categorical data analysis, small area estimation, and data integration. For each of these topics, Rao's work anticipated and led future research directions. His contributions will be discussed in the context of broader research trends as seen in the articles of Survey Methodology over the journal's 50-year history.
Release date: 2025-09-08

Data (0) (0 results)

No content available at this time.

Analysis (305) (30 to 40 of 305 results)

31. Multiple-frame surveys for a multiple-data-source world
Articles and reports: 12-001-X202100200008
Description:
Multiple-frame surveys, in which independent probability samples are selected from each of Q sampling frames, have long been used to improve coverage, to reduce costs, or to increase sample sizes for subpopulations of interest. Much of the theory has been developed assuming that (1) the union of the frames covers the population of interest, (2) a full-response probability sample is selected from each frame, (3) the variables of interest are measured in each sample with no measurement error, and (4) sufficient information exists to account for frame overlap when computing estimates. After reviewing design, estimation, and calibration for traditional multiple-frame surveys, I consider modifications of the assumptions that allow a multiple-frame structure to serve as an organizing principle for other data combination methods such as mass imputation, sample matching, small area estimation, and capture-recapture estimation. Finally, I discuss how results from multiple-frame survey research can be used when designing and evaluating data collection systems that integrate multiple sources of data.
Release date: 2022-01-06
32. Growing Regression Trees that Use Sampling Frame Covariates to Explore Response Burden for Use in Survey Design Archived
Articles and reports: 11-522-X202100100024
Description: The Economic Directorate of the U.S. Census Bureau is developing coordinated design and sample selection procedures for the Annual Integrated Economic Survey. The unified sample will replace the directorate’s existing practice of independently developing sampling frames and sampling procedures for a suite of separate annual surveys, which optimizes sample design features at the cost of increased response burden. Size attributes of business populations, e.g., revenues and employment, are highly skewed. A high percentage of companies operate in more than one industry. Therefore, many companies are sampled into multiple surveys compounding the response burden, especially for “medium sized” companies.
This component of response burden is reduced by selecting a single coordinated sample but will not be completely alleviated. Response burden is a function of several factors, including (1) questionnaire length and complexity, (2) accessibility of data, (3) expected number of repeated measures, and (4) frequency of collection. The sample design can have profound effects on the third and fourth factors. To help inform decisions about the integrated sample design, we use regression trees to identify covariates from the sampling frame that are related to response burden. Using historic frame and response data from four independently sampled surveys, we test a variety of algorithms, then grow regression trees that explain relationships between expected levels of response burden (as measured by response rate) and frame covariates common to more than one survey. We validate initial findings by cross-validation, examining results over time. Finally, we make recommendations on how to incorporate our robust findings into the coordinated sample design.
Release date: 2021-10-29
33. Physician experiences during the COVID-19 pandemic in the United States: Adapting an annual survey to assess pandemic-related challenges Archived
Articles and reports: 11-522-X202100100007
Description: The National Center for Health Statistics (NCHS) annually administers the National Ambulatory Medical Care Survey (NAMCS) to assess practice characteristics and ambulatory care provided by office-based physicians in the United States, including interviews with sampled physicians. After the onset of the COVID-19 pandemic, NCHS adapted NAMCS methodology to assess the impacts of COVID-19 on office-based physicians, including: shortages of personal protective equipment; COVID-19 testing in physician offices; providers testing positive for COVID-19; and telemedicine use during the pandemic. This paper describes challenges and opportunities in administering the 2020 NAMCS and presents key findings regarding physician experiences during the COVID-19 pandemic.

Key Words: National Ambulatory Medical Care Survey (NAMCS); Office-based physicians; Telemedicine; Personal protective equipment.
Release date: 2021-10-22
34. Harnessing Natural Language Processing and Machine Learning to Enhance Identification of Opioid-involved Health Outcomes in the National Hospital Care Survey Archived
Articles and reports: 11-522-X202100100016
Description: To build data capacity and address the U.S. opioid public health emergency, the National Center for Health Statistics received funding for two projects. The projects involve development of algorithms that use all available structured and unstructured data submitted for the 2016 National Hospital Care Survey (NHCS) to enhance identification of opioid-involvement and the presence of co-occurring disorders (coexistence of a substance use disorder and a mental health issue). A description of the algorithm development process is provided, and lessons learned from integrating data science methods like natural language processing to produce official statistics are presented. Efforts to make the algorithms and analytic datafiles accessible to researchers are also discussed.

Key Words: Opioids; Co-Occurring Disorders; Data Science; Natural Language Processing; Hospital Care
Release date: 2021-10-22
35. A method to find an efficient and robust sampling strategy under model uncertainty Archived
Articles and reports: 12-001-X202100100002
Description:
We consider the problem of deciding on sampling strategy, in particular sampling design. We propose a risk measure, whose minimizing value guides the choice. The method makes use of a superpopulation model and takes into account uncertainty about its parameters through a prior distribution. The method is illustrated with a real dataset, yielding satisfactory results. As a baseline, we use the strategy that couples probability proportional-to-size sampling with the difference estimator, as it is known to be optimal when the superpopulation model is fully known. We show that, even under moderate misspecifications of the model, this strategy is not robust and can be outperformed by some alternatives.
Release date: 2021-06-24
36. Probability-proportional-to-size ranked-set sampling from stratified populations Archived
Articles and reports: 12-001-X202000200001
Description:
This paper constructs a probability-proportional-to-size (PPS) ranked-set sample from a stratified population. A PPS-ranked-set sample partitions the units in a PPS sample into groups of similar observations. The construction of similar groups relies on relative positions (ranks) of units in small comparison sets. Hence, the ranks induce more structure (stratification) in the sample in addition to the data structure created by unequal selection probabilities in a PPS sample. This added data structure makes the PPS-ranked-set sample more informative than a PPS sample. The stratified PPS-ranked-set sample is constructed by selecting a PPS-ranked-set sample from each stratum population. The paper constructs unbiased estimators for the population mean, total and their variances. The new sampling design is applied to apple production data to estimate the total apple production in Turkey.
Release date: 2020-12-15
37. Local polynomial estimation for a small area mean under informative sampling Archived
Articles and reports: 12-001-X202000100002
Description:
Model-based methods are required to estimate small area parameters of interest, such as totals and means, when traditional direct estimation methods cannot provide adequate precision. Unit level and area level models are the most commonly used ones in practice. In the case of the unit level model, efficient model-based estimators can be obtained if the sample design is such that the sample and population models coincide: that is, the sampling design is non-informative for the model. If, on the other hand, the sampling design is informative for the model, the selection probabilities will be related to the variable of interest, even after conditioning on the available auxiliary data. This implies that the population model no longer holds for the sample. Pfeffermann and Sverchkov (2007) used the relationships between the population and sample distribution of the study variable to obtain approximately unbiased semi-parametric predictors of the area means under informative sampling schemes. Their procedure is valid for both sampled and non-sampled areas.
Release date: 2020-06-30
38. Considering interviewer and design effects when planning sample sizes Archived
Articles and reports: 12-001-X202000100005
Description:
Selecting the right sample size is central to ensuring the quality of a survey. The state of the art is to account for complex sampling designs by calculating effective sample sizes. These effective sample sizes are determined using the design effect of central variables of interest. However, in face-to-face surveys empirical estimates of design effects are often suspected to be conflated with the impact of the interviewers. This typically leads to an over-estimation of design effects and consequently risks misallocating resources towards a higher sample size instead of using more interviewers or improving measurement accuracy. Therefore, we propose a corrected design effect that separates the interviewer effect from the effects of the sampling design on the sampling variance. The ability to estimate the corrected design effect is tested using a simulation study. In this respect, we address disentangling cluster and interviewer variance. Corrected design effects are estimated for data from the European Social Survey (ESS) round 6 and compared with conventional design effect estimates. Furthermore, we show that for some countries in the ESS round 6 the estimates of conventional design effect are indeed strongly inflated by interviewer effects.
Release date: 2020-06-30
39. Robust variance estimators for generalized regression estimators in cluster samples Archived
Articles and reports: 12-001-X201900300001
Description:
Standard linearization estimators of the variance of the general regression estimator are often too small, leading to confidence intervals that do not cover at the desired rate. Hat matrix adjustments can be used in two-stage sampling that help remedy this problem. We present theory for several new variance estimators and compare them to standard estimators in a series of simulations. The proposed estimators correct negative biases and improve confidence interval coverage rates in a variety of situations that mirror ones that are met in practice.
Release date: 2019-12-17
40. Cost optimal sampling for the integrated observation of different populations Archived
Articles and reports: 12-001-X201900300004
Description:
Social or economic studies often need to have a global view of society. For example, in agricultural studies, the characteristics of farms can be linked to the social activities of individuals. Hence, studies of a given phenomenon should be done by considering variables of interest referring to different target populations that are related to each other. In order to get an insight into an underlying phenomenon, the observations must be carried out in an integrated way, in which the units of a given population have to be observed jointly with related units of the other population. In the agricultural example, this means that a sample of rural households should be selected that have some relationship with the farm sample to be used for the study. There are several ways to select integrated samples. This paper studies the problem of defining an optimal sampling strategy for this situation: the solution proposed minimizes the sampling cost, ensuring a predefined estimation precision for the variables of interest (of either one or both populations) describing the phenomenon. Indirect sampling provides a natural framework for this setting since the units belonging to a population can become carriers of information on another population that is the object of a given survey. The problem is studied for different contexts which characterize the information concerning the links available in the sampling design phase, ranging from situations in which the links among the different units are known in the design phase to a situation in which the available information on links is very poor. An empirical study of agricultural data for a developing country is presented. It shows that controlling the inclusion probabilities at the design phase using the available information (namely the links) is effective and can significantly reduce the errors of the estimates for the indirectly observed population.
The need for good models for predicting the unknown variables or the links is also demonstrated.
Release date: 2019-12-17

Reference (29) (0 to 10 of 29 results)

1. 2019 Census Content Test: Design and methodology Archived
Surveys and statistical programs – Documentation: 98-20-00012020020
Description:
This fact sheet provides detailed insight into the design and methodology of the content test component of the 2019 Census Test. This test evaluated changes to the wording and flow of some questions, as well as the potential addition of new questions, to help determine the content of the 2021 Census of Population.

Release date: 2020-07-20
2. Use of Administrative Data to Increase the Efficiency of the Sample Design for the New National Travel Survey Archived
Surveys and statistical programs – Documentation: 11-522-X201700014749
Description:
As part of the Tourism Statistics Program redesign, Statistics Canada is developing the National Travel Survey (NTS) to collect travel information from Canadian travellers. This new survey will replace the Travel Survey of Residents of Canada and the Canadian resident component of the International Travel Survey. The NTS will take advantage of Statistics Canada’s common sampling frames and common processing tools while maximizing the use of administrative data. This paper discusses the potential uses of administrative data, such as Passport Canada files, Canada Border Services Agency files and Canada Revenue Agency files, to increase the efficiency of the NTS sample design.
Release date: 2016-03-24
3. The General Social Survey: New Data Overview Archived
Surveys and statistical programs – Documentation: 89-631-X
Description:
This report highlights the latest developments and rationale behind recent cycles of the General Social Survey (GSS). Starting with an overview of the GSS mandate and historic cycle topics, we then focus on two recent cycles related to families in Canada: Family Transitions (2006) and Family, Social Support and Retirement (2007). Finally, we give a summary of what is to come in the 2008 GSS on Social Networks, and describe a special project to mark 'Twenty Years of GSS'.
The survey collects data over a twelve month period from the population living in private households in the 10 provinces. For all cycles except Cycles 16 and 21, the population aged 15 and older has been sampled. Cycles 16 and 21 sampled persons aged 45 and older.
Cycle 20 (GSS 2006) is the fourth cycle of the GSS to collect data on families (the first three cycles on the family were in 1990, 1995 and 2001). Cycle 20 covers much the same content as previous cycles on families with some sections revised and expanded. The data enable analysts to measure conjugal and fertility history (chronology of marriages, common-law unions, and children), family origins, children's home leaving, fertility intentions, child custody as well as work history and other socioeconomic characteristics. Questions on financial support agreements or arrangements (for children and the ex-spouse or ex-partner) for separated and divorced families have been modified. Also, sections on social networks, well-being and housing characteristics have been added.
Release date: 2008-05-27
4. Content of the Survey of Labour and Income Dynamics Part B - Income and Wealth Content Archived
Surveys and statistical programs – Documentation: 75F0002M1992001
Description:
Starting in 1994, the Survey of Labour and Income Dynamics (SLID) will follow individuals and families for at least six years, tracking their labour market experiences, changes in income and family circumstances. An initial proposal for the content of SLID, entitled "Content of the Survey of Labour and Income Dynamics: Discussion Paper", was distributed in February 1992.
That paper served as a background document for consultation with and a review by interested users. The content underwent significant change during this process. Based upon the revised content, a large-scale test of SLID will be conducted in February and May 1993.
The present document outlines the income and wealth content to be tested in May 1993. This document is a continuation of SLID Research Paper Series 92-01A, which outlines the demographic and labour content used in the January/February 1993 test.
Release date: 2008-02-29
5. Objectives and Content of the Preliminary Interview Archived
Surveys and statistical programs – Documentation: 75F0002M1992007
Description:
A Preliminary Interview will be conducted on the first panel of SLID, in January 1993, as a supplement to the Labour Force Survey. The first panel is made up of about 20,000 households that are rotating out of the Labour Force Survey in January and February, 1993.
This document describes the purpose of the SLID Preliminary Interview and the question wordings to be used.
Release date: 2008-02-29
6. Environment Surveys of Establishments: The Canadian Experience Archived
Surveys and statistical programs – Documentation: 16-001-M2007004
Description:
Statistics Canada administers a number of environmental surveys that fill important data gaps but also pose numerous challenges to administer. This paper focuses on two on-going environment surveys - one newly initiated and one in the process of a redesign.
Release date: 2007-11-23
7. Recent Changes in Geography Content in the Survey of Labour and Income Dynamics (SLID) Archived
Surveys and statistical programs – Documentation: 75F0002M2005002
Description:
This paper describes the changes made to the structure of geography information on SLID from reference year 1999 onwards. It explains the reasons for changing to the 2001 Census-based geography, shows how the overlap between the 1991 and 2001 Census-based concepts is handled, provides detail on how the geographic concepts are implemented, discusses a new imputation procedure and finishes with an illustration of the impact of these changes on selected tables.
Release date: 2005-03-31
8. Improvements in 2005 to the Labour Force Survey (LFS) Archived
Surveys and statistical programs – Documentation: 71F0031X2005002
Description:
This paper introduces and explains modifications made to the Labour Force Survey estimates in January 2005. Some of these modifications include the adjustment of all LFS estimates to reflect population counts based on the 2001 Census, updates to industry and occupation classification systems and sample redesign changes.
Release date: 2005-01-26
9. Entry-Exit Component of Labour Interview for January 2003 and Income Interview for May 2003: Survey of Labour and Income Dynamics Archived
Surveys and statistical programs – Documentation: 75F0002M2004006
Description:
This document presents information about the entry-exit portion of the annual labour and the income interviews of the Survey of Labour and Income Dynamics (SLID).
Release date: 2004-06-21
10. Issues in the Design of Canada's Adult Education and Training Survey Archived
Surveys and statistical programs – Documentation: 81-595-M2003009
Geography: Canada
Description:
This paper examines how the Canadian Adult Education and Training Survey (AETS) can be used to study participation in and impacts of education and training activities for adults.
Release date: 2003-10-15

Date modified: 2026-07-19