Survey design

Results

All (330) (0 to 10 of 330 results)

1. Income Research Paper Series
Journals and periodicals: 75F0002M
Description: This series provides detailed documentation on income developments, including survey design issues, data quality evaluation and exploratory research.
Release date: 2026-05-20
2. Sampling for business surveys at Statistics Canada
Articles and reports: 12-001-X202500200013
Description: This article examines the methodological complexities associated with the design of business surveys, with particular emphasis on sampling strategies implemented by National Statistical Offices (NSOs). It addresses the inherent challenges posed by the dynamic nature of the business population, which necessitates continual updates to the sampling frame to ensure representativeness and relevance. Critical design considerations include the determination of optimal sample sizes, stratification across key dimensions such as industry, geographic region, and enterprise size, as well as the treatment of business births and the exclusion of inactive (or “dead”) units. The article applies Bankier’s (1988) power allocation method to a two-way stratification scheme defined by industry and geography, evaluating its performance by comparing the resulting coefficients of variation with those obtained via a raking algorithm applied to the marginal coefficients. Furthermore, the approach is extended to a multivariate context to accommodate multiple estimation domains. The discussion also encompasses practical issues related to sample rotation and coordination, which are critical for maintaining data quality and minimizing respondent burden over time.
Release date: 2025-12-23
3. Adapting to change: Online first collection initiatives to improve the Labour Force Survey response rate
Articles and reports: 75-005-M2025001
Description: Since 2010, engaging Canadians to participate in the Labour Force Survey (LFS) has become more challenging due to a variety of social and technological changes. The decline in the LFS response rate accelerated in 2020, exacerbated by public health measures during the COVID-19 pandemic. This technical paper presents preliminary results of two collection initiatives implemented using an online first strategy to improve the LFS response rate by confirming respondent contact information and expanding the availability of online response. Through these and other planned initiatives, Statistics Canada is working to ensure that the LFS estimates continue to provide an accurate and representative portrait of the Canadian labour market.
Release date: 2025-10-21
4. Improving the Automated Capture of Survey of Household Spending Receipts using advanced Machine Learning Techniques Archived
Articles and reports: 11-522-X202500100004
Description: The Survey of Household Spending (SHS) conducted by Statistics Canada collects paper diaries and shopping receipts as a source of household expenditure data. An auto-capture algorithm was created for SHS 2023 to reduce the manual work statistical clerks spend extracting important information from scanned receipts of common store brands. The algorithm used Tesseract optical character recognition (OCR) to extract text from images of receipts, and it identified store and product entities using regular expressions (regex). The goal of this study was to enhance the current auto-capture algorithm by experimenting with more advanced OCR and machine learning methods. As a result, PaddleOCR, an open-source OCR toolkit, was selected as the new default OCR engine because of its overall accuracy in recognizing text, especially digits, across receipts of various qualities. Additionally, entity classifiers based on support vector machines were trained on historical SHS records and existing regex patterns. By using classifiers to categorize different elements present on receipts instead of relying solely on regex patterns, product and store recognition improved. It is expected that this new algorithm will be used for SHS 2025 to improve the auto-capture quality and reduce the manual burden associated with capturing receipt variables.
Release date: 2025-09-08
5. Data-driven Imputation Strategies and their Associated Quality Indicators in Economic Surveys Archived
Articles and reports: 11-522-X202500100011
Description: The use of modern data-driven imputation methods to treat non-response in surveys processed in the Integrated Business Statistics Program at Statistics Canada has previously been explored. It was observed that these methods can lead to high-quality imputation and have the potential to yield broad efficiencies when setting up a particular survey's edit and imputation strategy. However, estimation of the associated total variance, more specifically the component due to imputation, remains a challenge. In this article, two methods for estimating the total variance are proposed, and preliminary results are presented that have motivated us to pursue further research in this area.
Release date: 2025-09-08
6. Ahead of the Trends: J.N.K. Rao's Contributions to Survey Research Archived
Articles and reports: 11-522-X202500100029
Description: J.N.K. Rao has contributed to almost every subdiscipline of survey research, including unequal-probability and two-phase sampling, variance estimation, regression and categorical data analysis, small area estimation, and data integration. For each of these topics, Rao's work anticipated and led future research directions. His contributions will be discussed in the context of broader research trends as seen in the articles of Survey Methodology over the journal's 50-year history.
Release date: 2025-09-08
7. Contributions of J.N.K. Rao to Complex Survey Multilevel Models and Composite Likelihood Archived
Articles and reports: 11-522-X202500100030
Description: In the setting of multilevel models to be estimated using data from surveys with complex sampling designs, this paper outlines some contributions of the landmark paper by Rao, Verret and Hidiroglou (Survey Methodology, 2013) and subsequent related work.
Release date: 2025-09-08
8. Propensity Score Estimation and Optimal Sampling Design when Integrating Probability Samples with Non-probability Data Archived
Articles and reports: 11-522-X202500100032
Description: Although non-probability data sources are not new to official statistics, a revived interest in the topic has emerged from pressures due to falling survey response rates, increasing data collection costs and a desire to take advantage of new data source opportunities from the ongoing societal digitalisation. Due to the exclusion of certain segments of the target population, inference derived solely from a non-probability data source is likely to result in bias. This work approaches the challenge of addressing the bias by integrating non-probability data with reference probability samples. The focus will be on methods to model the propensity of inclusion in the non-probability dataset with the help of the accompanying reference sample, with the modelled propensities then applied in an inverse probability weighting approach to produce population estimates. The reference sample is sometimes assumed as given. In this presentation, however, the objective is to find an optimal strategy, that is, an optimal combination of a data integration-based estimator and a sample design for the reference probability sample. Recent work is discussed in which advantage is taken of the good unit identification possibilities in business surveys to study an estimator based on propensities and derive optimal (unequal) selection probabilities for the reference sample.
Release date: 2025-09-08
9. Including Non-binary Gender in the Calibration Strategy for the Canadian Long-Form Sample Survey Weights Archived
Articles and reports: 11-522-X202500100033
Description: Aligning with recent needs for increased disaggregated data, in 2021 Canada became the first country to collect and disseminate data on gender diversity in a national census, giving Canadians the option to select male, female, or non-binary. Because of their small size, non-binary population counts were not used in the 2021 Census long-form sample calibration procedure, given the risk of increasing the variance of estimates. This paper presents an alternative long-form calibration strategy which allows for small populations, such as the non-binary group, to be incorporated while mitigating methodological concerns. The strategy put forward can incorporate multiple small populations simultaneously while also being flexible enough to fit the calibration systems of other National Statistical Offices (NSOs). The results of a Monte Carlo (MC) simulation are presented showing improved data quality for the non-binary population under the alternative calibration strategy.
Release date: 2025-09-08
10. Authors’ response to comments on “Trends and directions in sample survey theory and methods”
Articles and reports: 12-001-X202500100010
Description: The discussants highlight promising research topics for improving the quality and granularity of estimates from surveys. We agree that continued research is needed to evaluate models used for inference, and suggest development of measures of model dependence.
Release date: 2025-06-30

Data (0) (0 results)

No content available at this time.

Analysis (301) (60 to 70 of 301 results)

61. Requirement: Collect less. Our mission: Do the best we can. Archived
Articles and reports: 11-522-X201300014276
Description:
In France, budget restrictions are making it more difficult to hire casual interviewers to deal with collection problems. As a result, it has become necessary to adhere to a predetermined annual work quota. For surveys of the National Institute of Statistics and Economic Studies (INSEE), which use a master sample, problems arise when an interviewer is on extended leave throughout the entire collection period of a survey. When that occurs, an area may cease to be covered by the survey, and this effectively generates a bias. In response to this new problem, we have implemented two methods, depending on when the problem is identified:
If an area is ‘abandoned’ before or at the very beginning of collection, we carry out a ‘sub-allocation’ procedure. The procedure involves interviewing a minimum number of households in each collection area at the expense of other areas in which no collection problems have been identified. The idea is to minimize the dispersion of weights while meeting collection targets.
If an area is ‘abandoned’ during collection, we prioritize the remaining surveys. Prioritization is based on a representativeness indicator (R indicator) that measures the degree of similarity between a sample and the base population. The goal of this prioritization process during collection is to get as close as possible to equal response probability for respondents. The R indicator is based on the dispersion of the estimated response probabilities of the sampled households, and it is composed of partial R indicators that measure representativeness variable by variable. These R indicators are tools that we can use to analyze collection by isolating underrepresented population groups. We can increase collection efforts for groups that have been identified beforehand.
In the oral presentation, we covered these two points concisely. By contrast, this paper deals exclusively with the first point: sub-allocation. Prioritization is being implemented for the first time at INSEE for the assets survey, and it will be covered in a specific paper by A. Rebecq.
Release date: 2014-10-31
62. Study of the “product” sampling scheme as illustrated by the ELFE survey Archived
Articles and reports: 11-522-X201300014286
Description: The Étude Longitudinale Française depuis l’Enfance (ELFE) [French longitudinal study from childhood on], which began in 2011, involves over 18,300 infants whose parents agreed to participate when they were in the maternity hospital. This cohort survey, which will track the children from birth to adulthood, covers the many aspects of their lives from the perspective of social science, health and environmental health. In randomly selected maternity hospitals, all infants in the target population, who were born on one of 25 days distributed across the four seasons, were chosen. This sample is the outcome of a non-standard sampling scheme that we call product sampling. In this survey, it takes the form of the cross-tabulation between two independent samples: a sampling of maternity hospitals and a sampling of days. While it is easy to imagine a cluster effect due to the sampling of maternity hospitals, one can also imagine a cluster effect due to the sampling of days. The scheme’s time dimension therefore cannot be ignored if the desired estimates are subject to daily or seasonal variation. While this non-standard scheme can be viewed as a particular kind of two-phase design, it needs to be defined within a more specific framework. Following a comparison of the product scheme with a conventional two-stage design, we propose variance estimators specially formulated for this sampling scheme. Our ideas are illustrated with a simulation study.
Release date: 2014-10-31
63. Weighted estimation and bootstrap variance estimation for analyzing survey data: How to implement in selected software Archived
Articles and reports: 12-002-X201400111901
Description:
This document is for analysts/researchers who are considering doing research with data from a survey where both survey weights and bootstrap weights are provided in the data files. This document gives directions, for some selected software packages, about how to get started in using survey weights and bootstrap weights for an analysis of survey data. We give brief directions for obtaining survey-weighted estimates, bootstrap variance estimates (and other desired error quantities) and some typical test statistics for each software package in turn. While these directions are provided just for the chosen examples, there will be information about the range of weighted and bootstrapped analyses that can be carried out by each software package.
Release date: 2014-08-07
64. The influence of sampling method and interviewers on sample realization in the European Social Survey Archived
Articles and reports: 12-001-X201400114001
Description:
This article addresses the impact of different sampling procedures on realised sample quality in the case of probability samples. This impact was expected to result from varying degrees of freedom on the part of interviewers to interview easily available or cooperative individuals (thus producing substitutions). The analysis was conducted in a cross-cultural context using data from the first four rounds of the European Social Survey (ESS). Substitutions are measured as deviations from a 50/50 gender ratio in subsamples with heterosexual couples. Significant deviations were found in numerous countries of the ESS. They were also found to be lowest in cases of samples with official registers of residents as sample frame (individual person register samples) if one partner was more difficult to contact than the other. This scope of substitutions did not differ across the ESS rounds and it was weakly correlated with payment and control procedures. It can be concluded from the results that individual person register samples are associated with higher sample quality.
Release date: 2014-06-27
65. A nonparametric method to generate synthetic populations to adjust for complex sampling design features Archived
Articles and reports: 12-001-X201400114003
Description:
Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to developing statistical methods to analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in the finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion that inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, adjusting the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered unequal-probability of selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS), and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered unequal-probability of selection sample designs.
Release date: 2014-06-27
66. Joint determination of optimal stratification and sample allocation using genetic algorithm Archived
Articles and reports: 12-001-X201300211884
Description:
This paper offers a solution to the problem of finding the optimal stratification of the available population frame, so as to minimize the cost of the sample required to satisfy precision constraints on a set of different target estimates. The solution is sought by exploring the universe of all possible stratifications obtainable by cross-classifying the categorical auxiliary variables available in the frame (continuous auxiliary variables can be transformed into categorical ones by means of suitable methods). The approach is therefore multivariate with respect to both target and auxiliary variables. The proposed algorithm is based on a non-deterministic evolutionary approach, making use of the genetic algorithm paradigm. The key feature of the algorithm is that it treats each possible stratification as an individual subject to evolution, whose fitness is given by the cost of the associated sample required to satisfy a set of precision constraints, the cost being calculated by applying the Bethel algorithm for multivariate allocation. This optimal stratification algorithm, implemented in an R package (SamplingStrata), has so far been applied to a number of current surveys at the Italian National Institute of Statistics: the results obtained always show significant improvements in the efficiency of the samples, with respect to previously adopted stratifications.
Release date: 2014-01-15
67. Optimizing quality of response through adaptive survey designs Archived
Articles and reports: 12-001-X201300111824
Description:
In most surveys all sample units receive the same treatment and the same design features apply to all selected people and households. In this paper, it is explained how survey designs may be tailored to optimize quality given constraints on costs. Such designs are called adaptive survey designs. The basic ingredients of such designs are introduced, discussed and illustrated with various examples.
Release date: 2013-06-28
68. Indirect sampling applied to skewed populations Archived
Articles and reports: 12-001-X201300111829
Description:
Indirect Sampling is used when the sampling frame is not the same as the target population, but related to the latter. The estimation process for Indirect Sampling is carried out using the Generalised Weight Share Method (GWSM), which is an unbiased procedure (see Lavallée 2002, 2007). For business surveys, Indirect Sampling is applied as follows: the sampling frame is one of establishments, while the target population is one of enterprises. Enterprises are selected through their establishments. This allows stratifying according to the establishment characteristics, rather than those associated with enterprises. Because the variables of interest of establishments are generally highly skewed (a small portion of the establishments covers the major portion of the economy), the GWSM results in unbiased estimates, but their variance can be large. The purpose of this paper is to suggest some adjustments to the weights to reduce the variance of the estimates in the context of skewed populations, while keeping the method unbiased. After a brief overview of Indirect Sampling and the GWSM, we describe the required adjustments to the GWSM. The estimates produced with these adjustments are compared to those from the original GWSM, via a small numerical example, and using real data originating from Statistics Canada's Business Register.
Release date: 2013-06-28
69. On sample allocation for efficient domain estimation Archived
Articles and reports: 12-001-X201200111682
Description:
Sample allocation issues are studied in the context of estimating sub-population (stratum or domain) means as well as the aggregate population mean under stratified simple random sampling. A non-linear programming method is used to obtain "optimal" sample allocation to strata that minimizes the total sample size subject to specified tolerances on the coefficient of variation of the estimators of strata means and the population mean. The resulting total sample size is then used to determine sample allocations for the methods of Costa, Satorra and Ventura (2004) based on compromise allocation and Longford (2006) based on specified "inferential priorities". In addition, we study sample allocation to strata when reliability requirements for domains, cutting across strata, are also specified. Performance of the three methods is studied using data from Statistics Canada's Monthly Retail Trade Survey (MRTS) of single establishments.
Release date: 2012-06-27
70. Alternative demographic sample designs being explored at the U.S. Census Bureau Archived
Articles and reports: 12-001-X201100211606
Description:
This paper introduces a U.S. Census Bureau special compilation by presenting four other papers of the current issue: three papers from authors Tillé, Lohr and Thompson as well as a discussion paper from Opsomer.
Release date: 2011-12-21

Reference (29) (10 to 20 of 29 results)

11. Entry Exit Component for Income, May 2002: Survey of Labour and Income Dynamics Archived
Surveys and statistical programs – Documentation: 75F0002M2003004
Description:
This paper presents the information for the Entry Exit portion of the Survey of Labour and Income Dynamics (SLID) Income interview.
Release date: 2003-09-09
12. The EU Labour Force Survey: on the way to convergence and quality Archived
Surveys and statistical programs – Documentation: 11-522-X20010016225
Description:
The European Union Labour Force Survey (LFS) is based on national surveys that were originally very different. For the past decade, under pressure from increasingly demanding users (particularly with respect to timeliness, comparability and flexibility), the LFS has been subjected to a constant process of quality improvement.
The following topics are presented in this paper:
A. the quality improvement process, which comprises screening national survey methods, target structure, legal foundations, quality reports, more accurate and more explicit definitions of components, etc.;
B. expected or achieved results, which include an ongoing survey producing quarterly results within reasonable time frames, comparable employment and unemployment rates over time and space in more than 25 countries, specific information on current political topics, etc.;
C. continuing shortcomings, such as implementation delays in certain countries, possibilities of longitudinal analysis, public access to microdata, etc.;
D. future tasks envisioned, such as adaptation of the list of ISCO and ISCED variables and nomenclatures (to take into account evolution in employment and teaching methods), differential treatment of structural variables and increased recourse to administrative files (to limit respondent burden), harmonization of questionnaires, etc.
Release date: 2002-09-12
13. Nonresponse bias analyses at the National Center for Education Statistics Archived
Surveys and statistical programs – Documentation: 11-522-X20010016269
Description:
This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.
In surveys with low response rates, non-response bias can be a major concern. While it is not always possible to measure the actual bias due to non-response, there are different approaches that help identify potential sources of non-response bias. At the National Center for Education Statistics (NCES), a non-response bias analysis must be conducted for any survey with a response rate lower than 70%. This paper discusses the different approaches to non-response bias analyses using examples from NCES.
Release date: 2002-09-12
14. Summit of the Americas Regional Education Indicators Project: data quality challenges Archived
Surveys and statistical programs – Documentation: 11-522-X20010016293
Description:
This paper presents the Second Summit of the Americas Regional Education Indicators Project (PRIE), whose basic goal is to develop a set of comparable indicators for the Americas. This project is led by the Ministry of Education of Chile and has been developed in response to the countries' needs to improve their information systems and statistics. The countries need to construct reliable and relevant indicators to support decisions in education, both within their individual countries and the region as a whole. The first part of the paper analyses the importance of statistics and indicators in supporting educational policies and programs, and describes the present state of the information and statistics systems in these countries. It also discusses the major problems faced by the countries and reviews the countries' experiences in participating in other education indicators' projects or programs, such as the INES Program, WEI Project, MERCOSUR and CREMIS. The second part of the paper examines PRIE's technical co-operation program, its purpose and implementation. The second part also emphasizes how technical co-operation responds to the needs of the countries, and supports them in filling in the gaps in available and reliable data.
Release date: 2002-09-12
15. Response error reinterview of the 1999-2000 Schools and Staffing Survey Archived
Surveys and statistical programs – Documentation: 11-522-X20010016308
Description:
The Census Bureau uses response error analysis to evaluate the effectiveness of survey questions. For a given survey, questions that are deemed critical to the survey or considered problematic from past examination are selected for analysis. New or revised questions are prime candidates for re-interview. Re-interview is a new interview in which a subset of questions from the original interview is re-asked of a sample of the survey respondents. For each re-interview question, the proportion of respondents who give inconsistent responses is evaluated. The "Index of Inconsistency" is used as the measure of response variance. Each question is labelled low, moderate, or high in response variance. In high response variance cases, the questions are put through cognitive testing, and modifications to the question are recommended.
The Schools and Staffing Survey (SASS), sponsored by the National Center for Education Statistics (NCES), is also examined through response error analysis, including the possible relationships between inconsistent responses and characteristics of the schools and teachers in that survey. Results of this analysis can be used to change survey procedures and improve data quality.
Release date: 2002-09-12
16. Entry Exit Component for Income Interview: May 2000, Survey of Labour and Income Dynamics Archived
Surveys and statistical programs – Documentation: 75F0002M2000012
Description:
This document presents the information for the new entry exit portion of the Survey of Labour and Income Dynamics (SLID) income interview.
Release date: 2001-03-27
17. Note to Former Users of Data from the Household Facilities and Equipment Survey Archived
Surveys and statistical programs – Documentation: 62F0026M2000003
Description:
Starting with the 1997 survey year, the Household Facilities and Equipment Survey was replaced by the Survey of Household Spending (SHS). This note provides information to users and prospective users of data from the SHS about the differences between the SHS and the former Household Facilities and Equipment Survey. Topics covered include sample size, weighting, collection method, reference period, and concepts.
Release date: 2000-07-19
18. Sampling and Weighting (Reference Products: Technical Reports: 1996 Census of Population) Archived
Surveys and statistical programs – Documentation: 92-371-X
Description:
This report deals with sampling and weighting, a process whereby certain characteristics are collected and processed for a random sample of dwellings and persons identified in the complete census enumeration. Data for the whole population are then obtained by scaling up the results for the sample to the full population level. The use of sampling may lead to substantial reductions in costs and respondent burden, or alternatively, can allow the scope of a census to be broadened at the same cost.
Release date: 1999-12-07
19. Random effects models for longitudinal data from complex samples Archived
Surveys and statistical programs – Documentation: 11-522-X19980015017
Description:
Longitudinal studies with repeated observations on individuals permit better characterizations of change and assessment of possible risk factors, but there has been little experience applying sophisticated models for longitudinal data to the complex survey setting. We present results from a comparison of different variance estimation methods for random effects models of change in cognitive function among older adults. The sample design is a stratified sample of people 65 and older, drawn as part of a community-based study designed to examine risk factors for dementia. The model summarizes the population heterogeneity in overall level and rate of change in cognitive function using random effects for intercept and slope. We discuss an unweighted regression including covariates for the stratification variables, a weighted regression, and bootstrapping; we also carried out preliminary work on using balanced repeated replication and jackknife repeated replication.
Release date: 1999-10-22
20. Estimation of gross flows from complex surveys adjusting for missing data, classification errors and informative sampling Archived
Surveys and statistical programs – Documentation: 11-522-X19980015022
Description:
This article extends and further develops the method proposed by Pfeffermann, Skinner and Humphreys (1998) for the estimation of gross flows in the presence of classification errors. The main feature of that method is the use of auxiliary information at the individual level, which circumvents the need for validation data for estimating the misclassification rates. The new developments in this article are the establishment of conditions for model identification, a study of the properties of a model goodness-of-fit statistic and modifications to the sample likelihood to account for missing data and informative sampling. The new developments are illustrated by a small Monte Carlo simulation study.
Release date: 1999-10-22

Date modified: 2026-05-30