Survey design

Results

All (results 1 to 10 of 330)

1. Income Research Paper Series
Journals and periodicals: 75F0002M
Description: This series provides detailed documentation on income developments, including survey design issues, data quality evaluation and exploratory research.
Release date: 2026-05-20
2. Sampling for business surveys at Statistics Canada
Articles and reports: 12-001-X202500200013
Description: This article examines the methodological complexities associated with the design of business surveys, with particular emphasis on sampling strategies implemented by National Statistical Offices (NSOs). It addresses the inherent challenges posed by the dynamic nature of the business population, which necessitates continual updates to the sampling frame to ensure representativeness and relevance. Critical design considerations include the determination of optimal sample sizes, stratification across key dimensions such as industry, geographic region, and enterprise size, as well as the treatment of business births and the exclusion of inactive (or “dead”) units. The article applies Bankier’s (1988) power allocation method to a two-way stratification scheme defined by industry and geography, evaluating its performance by comparing the resulting coefficients of variation with those obtained via a raking algorithm applied to the marginal coefficients. Furthermore, the approach is extended to a multivariate context to accommodate multiple estimation domains. The discussion also encompasses practical issues related to sample rotation and coordination, which are critical for maintaining data quality and minimizing respondent burden over time.
Release date: 2025-12-23
3. Adapting to change: Online first collection initiatives to improve the Labour Force Survey response rate
Articles and reports: 75-005-M2025001
Description: Since 2010, engaging Canadians to participate in the LFS has become more challenging due to a variety of social and technological changes. The decline in the LFS response rate accelerated in 2020, exacerbated by public health measures during the COVID-19 pandemic. This technical paper presents preliminary results of two collection initiatives implemented using an online first strategy to improve the LFS response rates by confirming respondent contact information and expanding the availability of online response. Through these and other planned initiatives, Statistics Canada is working to ensure that the LFS estimates continue to provide an accurate and representative portrait of the Canadian labour market.
Release date: 2025-10-21
4. Improving the Automated Capture of Survey of Household Spending Receipts using advanced Machine Learning Techniques Archived
Articles and reports: 11-522-X202500100004
Description: The Survey of Household Spending (SHS) conducted by Statistics Canada collects paper diaries and shopping receipts as a source of household expenditure data. An auto-capture algorithm was created for SHS 2023 to reduce the manual work statistical clerks perform in extracting key information from scanned receipts of common store brands. The algorithm used Tesseract optical character recognition (OCR) to extract text characters from images of receipts, and it identified store and product entities using regular expressions (regex). The goal of this study was to enhance the current auto-capture algorithm by experimenting with more advanced OCR and machine learning methods. As a result, PaddleOCR, an open-source OCR toolkit, was selected as the new default OCR engine because of its overall performance in accurately recognizing text, especially digits, across receipts of varying quality. Additionally, entity classifiers based on support vector machines were trained on historical SHS records and existing regex patterns. Using classifiers, rather than relying solely on regex patterns, to categorize the different elements on receipts improved product and store recognition. This new algorithm is expected to be used for SHS 2025 to improve auto-capture quality and reduce the manual burden of capturing receipt variables.
Release date: 2025-09-08
5. Data-driven Imputation Strategies and their Associated Quality Indicators in Economic Surveys Archived
Articles and reports: 11-522-X202500100011
Description: The use of modern data-driven imputation methods to treat non-response in surveys processed in the Integrated Business Statistics Program at Statistics Canada has previously been explored. These methods were observed to produce high-quality imputation and have the potential to yield broad efficiencies when setting up a survey's edit and imputation strategy. However, estimating the associated total variance, and more specifically its imputation component, remains a challenge. In this article, two methods for estimating total variance are proposed, along with preliminary results that have motivated us to pursue further research in this area.
Release date: 2025-09-08
6. Ahead of the Trends: J.N.K. Rao's Contributions to Survey Research Archived
Articles and reports: 11-522-X202500100029
Description: J.N.K. Rao has contributed to almost every subdiscipline of survey research, including unequal-probability and two-phase sampling, variance estimation, regression and categorical data analysis, small area estimation, and data integration. For each of these topics, Rao's work anticipated and led future research directions. His contributions will be discussed in the context of broader research trends as seen in the articles of Survey Methodology over the journal's 50-year history.
Release date: 2025-09-08
7. Contributions of J.N.K. Rao to Complex Survey Multilevel Models and Composite Likelihood Archived
Articles and reports: 11-522-X202500100030
Description: In the setting of multilevel models to be estimated using data from surveys with complex sampling designs, this paper outlines some contributions of the landmark paper by Rao, Verret and Hidiroglou (Survey Methodology, 2013) and subsequent related work.
Release date: 2025-09-08
8. Propensity Score Estimation and Optimal Sampling Design when Integrating Probability Samples with Non-probability Data Archived
Articles and reports: 11-522-X202500100032
Description: Although non-probability data sources are not new to official statistics, interest in the topic has revived because of falling survey response rates, increasing data collection costs and a desire to take advantage of new data sources arising from ongoing societal digitalisation. Because certain segments of the target population are excluded, inference derived solely from a non-probability data source is likely to be biased. This work addresses the bias by integrating non-probability data with reference probability samples. The focus is on methods that model the propensity of inclusion in the non-probability dataset with the help of the accompanying reference sample; the modelled propensities are then applied in an inverse probability weighting approach to produce population estimates. The reference sample is sometimes taken as given. In this presentation, however, the objective is to find an optimal strategy, that is, the combination of a data integration-based estimator and a sample design for the reference probability sample. Recent work is discussed that takes advantage of the good unit identification possibilities in business surveys to study a propensity-based estimator and derive optimal (unequal) selection probabilities for the reference sample.
Release date: 2025-09-08
9. Including Non-binary Gender in the Calibration Strategy for the Canadian Long-Form Sample Survey Weights Archived
Articles and reports: 11-522-X202500100033
Description: Aligning with recent needs for more disaggregated data, in 2021 Canada became the first country to collect and disseminate data on gender diversity in a national census, giving Canadians the option to select male, female, or non-binary. Non-binary population counts were not used in the 2021 Census long-form sample calibration procedure because, given their small size, including them risked increasing the variance of estimates. This paper presents an alternative long-form calibration strategy that allows small populations, such as the non-binary group, to be incorporated while mitigating methodological concerns. The proposed strategy can incorporate multiple small populations simultaneously and is flexible enough to fit the calibration systems of other National Statistical Offices (NSOs). The results of a Monte Carlo (MC) simulation are presented, showing improved data quality for the non-binary population under the alternative calibration strategy.
Release date: 2025-09-08
10. Authors’ response to comments on “Trends and directions in sample survey theory and methods”
Articles and reports: 12-001-X202500100010
Description: The discussants highlight promising research topics for improving the quality and granularity of estimates from surveys. We agree that continued research is needed to evaluate models used for inference, and suggest development of measures of model dependence.
Release date: 2025-06-30
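Entry 2 applies Bankier's (1988) power allocation. The sketch below shows only a one-way version, under the assumption that the allocation takes the usual form n_h proportional to S_h * X_h^q / Ybar_h, where S_h is the stratum standard deviation of y, Ybar_h the stratum mean and X_h a measure of stratum importance such as the stratum total. It is an illustration, not the article's two-way or multivariate procedure; the function name and arguments are hypothetical.

```python
def power_allocation(n, sd, mean, size, q):
    """Hypothetical one-way power allocation (assumed form n_h ~ S_h * X_h**q / Ybar_h).

    n    -- total sample size to allocate
    sd   -- stratum standard deviations S_h
    mean -- stratum means Ybar_h
    size -- stratum importance measures X_h (e.g. stratum totals)
    q    -- power parameter between 0 and 1

    Returns unrounded stratum sample sizes that sum to n.
    """
    # Score of each stratum; allocation is proportional to these scores.
    scores = [s * x**q / m for s, m, x in zip(sd, mean, size)]
    total = sum(scores)
    return [n * score / total for score in scores]
```

With q = 1 and X_h set to the stratum total N_h * Ybar_h, each score reduces to N_h * S_h, i.e. Neyman allocation for the overall total; with q = 0 the allocation minimizes the sum of squared stratum CVs (ignoring finite population corrections). Rounding to integers, minimum stratum sizes and the raking comparison are not shown.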
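Entry 4 notes that the original SHS auto-capture identified store and product entities by applying regular expressions to OCR output. As a hedged illustration of that idea only, the sketch below uses two made-up patterns (not the patterns used for the SHS) to pull product/price pairs and a total from receipt text; the OCR step and the SVM classifiers are not shown, and all names are hypothetical.

```python
import re

# Hypothetical patterns for illustration only; they are not the SHS patterns.
# An item line is a product description, a price with two decimals and an
# optional one-letter tax flag, e.g. "BREAD 3.49 H".
ITEM_LINE = re.compile(
    r"^(?P<product>[A-Z0-9][A-Z0-9 %&.'/-]*?)\s+(?P<price>\d+\.\d{2})\s*(?P<tax>[A-Z])?$"
)
# Subtotal or total line, e.g. "TOTAL 9.48".
TOTAL_LINE = re.compile(r"^(?:SUB)?TOTAL\s+(?P<amount>\d+\.\d{2})$")


def parse_receipt(lines):
    """Return (items, total) from the OCR text lines of one receipt.

    items is a list of (product, price) pairs; total is the last total or
    subtotal amount found, or None if there is none.
    """
    items, total = [], None
    for raw in lines:
        line = raw.strip().upper()
        # Check totals first so "TOTAL 9.48" is not read as a product line.
        m_total = TOTAL_LINE.match(line)
        if m_total:
            total = float(m_total.group("amount"))
            continue
        m_item = ITEM_LINE.match(line)
        if m_item:
            items.append((m_item.group("product"), float(m_item.group("price"))))
    return items, total
```

Hand-written patterns like these are fragile on store-specific layouts (a tax line such as "GST 0.45" would be read as a product), which illustrates why relying solely on regex patterns can limit recognition quality.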
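Entry 8 describes modelling the propensity of inclusion in a non-probability dataset with the help of a reference probability sample and then applying inverse probability weighting. A minimal sketch follows, assuming the simplest possible model: a saturated model on one categorical covariate, so each cell's propensity is the non-probability count divided by the reference-sample estimate of the cell's population size. This is a simplification for illustration, not the estimator or the optimal reference design studied in the work; function names are hypothetical.

```python
from collections import defaultdict


def cell_propensities(np_cells, ref_cells, ref_weights):
    """Estimate inclusion propensities of the non-probability sample by cell.

    np_cells    -- cell label of each non-probability unit
    ref_cells   -- cell label of each reference-sample unit
    ref_weights -- design weights of the reference-sample units
    """
    # Estimated population size of each cell from the weighted reference sample.
    est_pop = defaultdict(float)
    for cell, d in zip(ref_cells, ref_weights):
        est_pop[cell] += d
    # Number of non-probability units observed in each cell.
    np_count = defaultdict(int)
    for cell in np_cells:
        np_count[cell] += 1
    # Assumption: every non-probability cell is represented in the reference sample.
    missing = [c for c in np_count if est_pop.get(c, 0.0) <= 0.0]
    if missing:
        raise ValueError(f"cells absent from the reference sample: {missing}")
    return {c: np_count[c] / est_pop[c] for c in np_count}


def ipw_total(np_cells, np_y, propensities):
    """Inverse-propensity-weighted estimate of the population total of y."""
    return sum(y / propensities[c] for c, y in zip(np_cells, np_y))
```

With this saturated model, the IPW estimate equals the sum over cells of the estimated cell size times the non-probability mean of y in that cell (a post-stratified estimator); it removes bias only if inclusion in the non-probability data is unrelated to y within cells.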
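Entry 9 concerns calibrating long-form sample weights when a small group, such as the non-binary population, is added as a control. The alternative strategy is not described in enough detail to reproduce, so the sketch below shows only generic raking (iterative proportional fitting), one common calibration method, on made-up data with a small third gender category "N". The function and data are hypothetical.

```python
def rake(weights, margins, targets, max_iter=1000, tol=1e-10):
    """Rake weights so weighted counts match control totals on each margin.

    weights -- initial weights, one per respondent
    margins -- list of margins; each is a list giving every respondent's category
    targets -- list of dicts (one per margin) mapping category -> control total
    """
    w = list(weights)
    for _ in range(max_iter):
        largest_change = 0.0
        for margin, target in zip(margins, targets):
            # Current weighted count of each category on this margin.
            current = {}
            for cat, wi in zip(margin, w):
                current[cat] = current.get(cat, 0.0) + wi
            # Scale weights so this margin matches its control totals exactly.
            factors = {cat: target[cat] / current[cat] for cat in current}
            for i, cat in enumerate(margin):
                w[i] *= factors[cat]
            largest_change = max(
                largest_change, max(abs(f - 1.0) for f in factors.values())
            )
        # Stop once a full pass over all margins barely changes the weights.
        if largest_change < tol:
            break
    return w
```

For example, raking four respondents with equal starting weights of 10 to gender controls {"M": 12, "F": 22, "N": 6} and age controls {"Y": 20, "O": 20} converges to weights of 12, 8, 6 and 14. Nothing here addresses the variance concern the paper raises for small populations; that is what its alternative strategy is designed for.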

Data (0 results)

No content available at this time.

Analysis (results 61 to 70 of 301)

61. Requirement: Collect less. Our mission: Do the best we can. Archived
Articles and reports: 11-522-X201300014276
Description:
In France, budget restrictions are making it more difficult to hire casual interviewers to deal with collection problems. As a result, it has become necessary to adhere to a predetermined annual work quota. For surveys of the National Institute of Statistics and Economic Studies (INSEE), which use a master sample, problems arise when an interviewer is on extended leave throughout the entire collection period of a survey. When that occurs, an area may cease to be covered by the survey, which effectively generates a bias. In response to this new problem, we have implemented two methods, depending on when the problem is identified. If an area is ‘abandoned’ before or at the very beginning of collection, we carry out a ‘sub-allocation’ procedure: a minimum number of households is interviewed in each collection area, at the expense of other areas in which no collection problems have been identified. The idea is to minimize the dispersion of weights while meeting collection targets. If an area is ‘abandoned’ during collection, we prioritize the remaining interviews. Prioritization is based on a representativeness indicator (R indicator) that measures the degree of similarity between a sample and the base population. The goal of prioritizing during collection is to come as close as possible to equal response probabilities for respondents. The R indicator is based on the dispersion of the estimated response probabilities of the sampled households, and it is composed of partial R indicators that measure representativeness variable by variable. These R indicators are tools for analyzing collection by isolating underrepresented population groups, so that collection efforts can be increased for groups identified beforehand. The oral presentation covered both points concisely; this paper, by contrast, deals exclusively with the first point, sub-allocation.
Prioritization is being implemented for the first time at INSEE for the assets survey, and it will be covered in a separate paper by A. Rebecq.
Release date: 2014-10-31
62. Study of the “product” sampling scheme as illustrated by the ELFE survey Archived
Articles and reports: 11-522-X201300014286
Description: The Étude Longitudinale Française depuis l’Enfance (ELFE) [French longitudinal study from childhood on], which began in 2011, involves over 18,300 infants whose parents agreed to participate when they were in the maternity hospital. This cohort survey, which will track the children from birth to adulthood, covers the many aspects of their lives from the perspective of social science, health and environmental health. In randomly selected maternity hospitals, all infants in the target population born on one of 25 days distributed across the four seasons were selected. This sample is the outcome of a non-standard sampling scheme that we call product sampling. In this survey, it takes the form of the crossing of two independent samples: a sample of maternity hospitals and a sample of days. While a cluster effect due to the sampling of maternity hospitals is easy to imagine, a cluster effect due to the sampling of days is also plausible. The scheme’s time dimension therefore cannot be ignored if the desired estimates are subject to daily or seasonal variation. While this non-standard scheme can be viewed as a particular kind of two-phase design, it needs to be defined within a more specific framework. Following a comparison of the product scheme with a conventional two-stage design, we propose variance estimators specially formulated for this sampling scheme. Our ideas are illustrated with a simulation study.
Release date: 2014-10-31
63. Weighted estimation and bootstrap variance estimation for analyzing survey data: How to implement in selected software Archived
Articles and reports: 12-002-X201400111901
Description:
This document is for analysts and researchers who are considering doing research with data from a survey where both survey weights and bootstrap weights are provided in the data files. It gives directions, for selected software packages, on how to get started using survey weights and bootstrap weights in an analysis of survey data. For each package in turn, we give brief directions for obtaining survey-weighted estimates, bootstrap variance estimates (and other desired error quantities) and some typical test statistics. Although these directions cover only the chosen examples, the document also indicates the range of weighted and bootstrapped analyses that each software package can carry out.
Release date: 2014-08-07
64. The influence of sampling method and interviewers on sample realization in the European Social Survey Archived
Articles and reports: 12-001-X201400114001
Description:
This article addresses the impact of different sampling procedures on realised sample quality in the case of probability samples. This impact was expected to result from interviewers having varying degrees of freedom to interview easily available or cooperative individuals (thus producing substitutions). The analysis was conducted in a cross-cultural context using data from the first four rounds of the European Social Survey (ESS). Substitutions are measured as deviations from a 50/50 gender ratio in subsamples with heterosexual couples. Significant deviations were found in numerous ESS countries. Deviations were lowest for samples that used official registers of residents as the sampling frame (individual person register samples) when one partner was more difficult to contact than the other. The extent of substitution did not differ across the ESS rounds and was weakly correlated with payment and control procedures. It can be concluded from the results that individual person register samples are associated with higher sample quality.
Release date: 2014-06-27
65. A nonparametric method to generate synthetic populations to adjust for complex sampling design features Archived
Articles and reports: 12-001-X201400114003
Description:
Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Applying these methods to data from complex sample surveys without allowing for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to developing statistical methods that analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in the finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion that inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, adjusting the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered, unequal-probability-of-selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS) and the Medical Expenditure Panel Survey (MEPS), which also use stratified, clustered, unequal-probability-of-selection sample designs.
Release date: 2014-06-27
66. Joint determination of optimal stratification and sample allocation using genetic algorithm Archived
Articles and reports: 12-001-X201300211884
Description:
This paper offers a solution to the problem of finding the optimal stratification of the available population frame, so as to minimize the cost of the sample required to satisfy precision constraints on a set of different target estimates. The solution is sought by exploring the universe of all possible stratifications obtainable by cross-classifying the categorical auxiliary variables available in the frame (continuous auxiliary variables can be transformed into categorical ones by means of suitable methods). The approach is therefore multivariate with respect to both target and auxiliary variables. The proposed algorithm is based on a non-deterministic evolutionary approach that uses the genetic algorithm paradigm. Its key feature lies in treating each possible stratification as an individual subject to evolution, whose fitness is given by the cost of the associated sample required to satisfy a set of precision constraints, the cost being calculated by applying the Bethel algorithm for multivariate allocation. This optimal stratification algorithm, implemented in an R package (SamplingStrata), has so far been applied to a number of current surveys at the Italian National Institute of Statistics: the results always show significant improvements in the efficiency of the samples obtained, compared with previously adopted stratifications.
Release date: 2014-01-15
67. Optimizing quality of response through adaptive survey designs Archived
Articles and reports: 12-001-X201300111824
Description:
In most surveys, all sample units receive the same treatment, and the same design features apply to all selected people and households. This paper explains how survey designs may be tailored to optimize quality under cost constraints. Such designs are called adaptive survey designs. The basic ingredients of such designs are introduced, discussed and illustrated with various examples.
Release date: 2013-06-28
68. Indirect sampling applied to skewed populations Archived
Articles and reports: 12-001-X201300111829
Description:
Indirect Sampling is used when the sampling frame is not the same as the target population but is related to it. The estimation process for Indirect Sampling is carried out using the Generalised Weight Share Method (GWSM), which is an unbiased procedure (see Lavallée 2002, 2007). For business surveys, Indirect Sampling is applied as follows: the sampling frame is one of establishments, while the target population is one of enterprises. Enterprises are selected through their establishments. This allows stratification according to establishment characteristics rather than those associated with enterprises. Because the variables of interest of establishments are generally highly skewed (a small portion of the establishments covers the major portion of the economy), the GWSM results in unbiased estimates, but their variance can be large. The purpose of this paper is to suggest some adjustments to the weights to reduce the variance of the estimates in the context of skewed populations, while keeping the method unbiased. After a brief overview of Indirect Sampling and the GWSM, we describe the required adjustments to the GWSM. The estimates produced with these adjustments are compared with those from the original GWSM, via a small numerical example and using real data from Statistics Canada's Business Register.
Release date: 2013-06-28
69. On sample allocation for efficient domain estimation Archived
Articles and reports: 12-001-X201200111682
Description:
Sample allocation issues are studied in the context of estimating sub-population (stratum or domain) means as well as the aggregate population mean under stratified simple random sampling. A non-linear programming method is used to obtain an "optimal" sample allocation to strata that minimizes the total sample size subject to specified tolerances on the coefficient of variation of the estimators of strata means and the population mean. The resulting total sample size is then used to determine sample allocations for the methods of Costa, Satorra and Ventura (2004), based on compromise allocation, and Longford (2006), based on specified "inferential priorities". In addition, we study sample allocation to strata when reliability requirements for domains cutting across strata are also specified. The performance of the three methods is studied using data from Statistics Canada's Monthly Retail Trade Survey (MRTS) of single establishments.
Release date: 2012-06-27
70. Alternative demographic sample designs being explored at the U.S. Census Bureau Archived
Articles and reports: 12-001-X201100211606
Description:
This paper introduces a U.S. Census Bureau special compilation by presenting four other papers of the current issue: three papers from authors Tillé, Lohr and Thompson as well as a discussion paper from Opsomer.
Release date: 2011-12-21
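Entry 61 relies on a representativeness indicator (R indicator) based on the dispersion of estimated response probabilities. One common definition, assumed here, is R = 1 - 2 * S(rho), where S(rho) is the design-weighted standard deviation of the estimated response propensities. The sketch takes the propensities as given (in practice they would come from a response model, for instance a logistic regression on auxiliary variables) and does not compute the partial R indicators; the function name is hypothetical.

```python
import math


def r_indicator(propensities, weights=None):
    """R indicator R = 1 - 2 * S(rho), with S the weighted standard deviation
    of the estimated response propensities.

    R = 1 when every unit has the same estimated propensity; lower values
    indicate a less representative response.
    """
    if weights is None:
        weights = [1.0] * len(propensities)
    total_w = sum(weights)  # estimated population size N
    mean = sum(w * p for w, p in zip(weights, propensities)) / total_w
    # Weighted variance with an (N - 1) denominator.
    var = sum(w * (p - mean) ** 2 for w, p in zip(weights, propensities)) / (total_w - 1)
    return 1.0 - 2.0 * math.sqrt(var)
```

For example, two units with propensities 0.2 and 0.8 give S = sqrt(0.18), about 0.42, so R is about 0.15, far from the value of 1 that equal propensities would give.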
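Entry 62 describes a "product" sample formed by crossing an independent sample of maternity hospitals with an independent sample of days, taking every infant in each selected hospital-day cell. Under that reading, an infant's inclusion probability is the product of the hospital's and the day's inclusion probabilities, which gives the Horvitz-Thompson point estimator sketched below. The variance estimators proposed in the paper are not reproduced; the function and data layout are hypothetical.

```python
def product_design_total(cells, hospital_probs, day_probs):
    """Horvitz-Thompson estimate of a total under a product sampling scheme.

    cells          -- dict mapping (hospital, day) -> sum of y over all infants
                      born in that sampled hospital on that sampled day
    hospital_probs -- dict: inclusion probability of each sampled hospital
    day_probs      -- dict: inclusion probability of each sampled day
    """
    # Independent samples of hospitals and days: pi_hd = pi_h * pi_d.
    return sum(
        y_hd / (hospital_probs[h] * day_probs[d])
        for (h, d), y_hd in cells.items()
    )
```

Because every infant in a selected cell shares the same hospital and the same day, infants are clustered along both dimensions, which is why a variance estimator for this design has to account for both samples.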
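Entry 63 covers estimation with survey weights and bootstrap weights. The sketch below shows the basic calculation, assuming the replicate estimates are compared with the full-sample estimate as v = (1/B) * sum over b of (theta_b - theta_hat)^2. Some survey files call for a different centring or an extra multiplier (for example when mean bootstrap weights are supplied), so the survey's own documentation should take precedence; the function names are hypothetical.

```python
def weighted_mean(y, w):
    """Survey-weighted mean of y."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def bootstrap_variance(y, full_weights, bootstrap_weights, estimator=weighted_mean):
    """Return (estimate, bootstrap variance) using replicate weights.

    y                 -- variable of interest, one value per respondent
    full_weights      -- final survey weights
    bootstrap_weights -- list of B replicate weight vectors aligned with y
    """
    theta_hat = estimator(y, full_weights)
    # Re-run the same estimator once per set of bootstrap weights.
    reps = [estimator(y, wb) for wb in bootstrap_weights]
    B = len(reps)
    return theta_hat, sum((t - theta_hat) ** 2 for t in reps) / B
```

The same pattern works for any estimator that accepts a weight vector (ratios, regression coefficients), which is the main appeal of supplying bootstrap weights with the data file.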
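Entry 64 measures substitution as a deviation from a 50/50 gender ratio among respondents from heterosexual-couple households: if the selected person is actually interviewed, each partner is equally likely to be the respondent. A simple way to check one subsample, assumed here and not necessarily the article's exact procedure, is a normal-approximation test of whether the share of women differs from 0.5. It ignores clustering and other design effects; the function name is hypothetical.

```python
import math


def gender_ratio_z(n_women, n_respondents):
    """z statistic for H0: the share of women among respondents is 0.5.

    Uses the normal approximation to the binomial; no design effect is applied.
    """
    share = n_women / n_respondents
    return (share - 0.5) / math.sqrt(0.25 / n_respondents)
```

For example, 550 women among 1,000 couple-household respondents gives z = sqrt(10), about 3.16, so |z| exceeds 1.96 and the deviation would be flagged at roughly the 5% level.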
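Entry 66 evaluates each candidate stratification by the cost of the smallest sample that meets the precision constraints, computed with the Bethel algorithm for multivariate allocation. The sketch below shows only the univariate special case, assuming equal unit costs, stratified simple random sampling without replacement and a single CV target on an estimated total Y. Under Neyman allocation the variance is (sum N_h S_h)^2 / n - sum N_h S_h^2, so setting it equal to V0 = (CV target * Y)^2 gives n = (sum N_h S_h)^2 / (V0 + sum N_h S_h^2). It is a simplified stand-in for the fitness function, not the SamplingStrata implementation; the function name is hypothetical.

```python
import math


def neyman_required_n(N, S, Ybar, cv_target):
    """Smallest total sample size meeting a CV target for one estimated total
    under Neyman allocation (stratified SRS without replacement).

    N, S, Ybar -- per-stratum population sizes, standard deviations and means
    cv_target  -- maximum acceptable coefficient of variation, e.g. 0.05
    Ignores the constraint n_h <= N_h.
    """
    total = sum(n * m for n, m in zip(N, Ybar))
    v0 = (cv_target * total) ** 2              # target variance of the total
    a = sum(n * s for n, s in zip(N, S))       # sum of N_h * S_h
    b = sum(n * s * s for n, s in zip(N, S))   # sum of N_h * S_h^2
    return math.ceil(a * a / (v0 + b))
```

In a genetic algorithm, a stratification whose strata are more homogeneous (smaller S_h) yields a smaller required n, i.e. a lower cost and a fitter individual.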
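Entry 68 relies on the Generalised Weight Share Method. In its basic form (assumed here, following the business-survey set-up in the description), establishments are sampled from the frame, every enterprise linked to a sampled establishment is included, and each such enterprise i receives the weight w_i = (1/L_i) * sum over its establishments j of t_j/pi_j, where t_j flags whether establishment j was sampled, pi_j is its selection probability and L_i is the enterprise's total number of establishments on the frame. The variance-reducing adjustments for skewed populations proposed in the paper are not shown; names are hypothetical.

```python
from collections import defaultdict


def gwsm_weights(frame_links, sampled, pi):
    """Basic GWSM weights for enterprises reached through sampled establishments.

    frame_links -- (establishment, enterprise) pairs for the whole frame
    sampled     -- set of sampled establishments
    pi          -- dict: selection probability of each sampled establishment
    """
    n_links = defaultdict(int)   # L_i: establishments per enterprise on the frame
    share = defaultdict(float)   # sum of 1/pi_j over sampled establishments of i
    for est, ent in frame_links:
        n_links[ent] += 1
        if est in sampled:
            share[ent] += 1.0 / pi[est]
    return {ent: s / n_links[ent] for ent, s in share.items()}


def gwsm_total(weights, y):
    """Estimate a total over enterprises; y maps enterprise -> value."""
    return sum(w * y[ent] for ent, w in weights.items())
```

Dividing by all L_i links, sampled or not, is what keeps the estimator unbiased: the expected weight of every enterprise is 1 as long as each establishment has a positive selection probability.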

Reference (results 1 to 10 of 29)

1. 2019 Census Content Test: Design and methodology Archived
Surveys and statistical programs – Documentation: 98-20-00012020020
Description:
This fact sheet provides detailed insight into the design and methodology of the content test component of the 2019 Census Test. This test evaluated changes to the wording and flow of some questions, as well as the potential addition of new questions, to help determine the content of the 2021 Census of Population.
Release date: 2020-07-20
2. Use of Administrative Data to Increase the Efficiency of the Sample Design for the New National Travel Survey Archived
Surveys and statistical programs – Documentation: 11-522-X201700014749
Description:
As part of the Tourism Statistics Program redesign, Statistics Canada is developing the National Travel Survey (NTS) to collect travel information from Canadian travellers. This new survey will replace the Travel Survey of Residents of Canada and the Canadian resident component of the International Travel Survey. The NTS will take advantage of Statistics Canada’s common sampling frames and common processing tools while maximizing the use of administrative data. This paper discusses the potential uses of administrative data, such as Passport Canada files, Canada Border Services Agency files and Canada Revenue Agency files, to increase the efficiency of the NTS sample design.
Release date: 2016-03-24
3. The General Social Survey: New Data Overview Archived
Surveys and statistical programs – Documentation: 89-631-X
Description:
This report highlights the latest developments and rationale behind recent cycles of the General Social Survey (GSS). Starting with an overview of the GSS mandate and historic cycle topics, we then focus on two recent cycles related to families in Canada: Family Transitions (2006) and Family, Social Support and Retirement (2007). Finally, we give a summary of what is to come in the 2008 GSS on Social Networks, and describe a special project to mark 'Twenty Years of GSS'.
The survey collects data over a twelve-month period from the population living in private households in the 10 provinces. For all cycles except Cycles 16 and 21, the population aged 15 and older has been sampled. Cycles 16 and 21 sampled persons aged 45 and older.
Cycle 20 (GSS 2006) is the fourth cycle of the GSS to collect data on families (the first three cycles on the family were in 1990, 1995 and 2001). Cycle 20 covers much the same content as previous cycles on families with some sections revised and expanded. The data enable analysts to measure conjugal and fertility history (chronology of marriages, common-law unions, and children), family origins, children's home leaving, fertility intentions, child custody as well as work history and other socioeconomic characteristics. Questions on financial support agreements or arrangements (for children and the ex-spouse or ex-partner) for separated and divorced families have been modified. Also, sections on social networks, well-being and housing characteristics have been added.
Release date: 2008-05-27
4. Content of the Survey of Labour and Income Dynamics Part B - Income and Wealth Content Archived
Surveys and statistical programs – Documentation: 75F0002M1992001
Description:
Starting in 1994, the Survey of Labour and Income Dynamics (SLID) will follow individuals and families for at least six years, tracking their labour market experiences, changes in income and family circumstances. An initial proposal for the content of SLID, entitled "Content of the Survey of Labour and Income Dynamics: Discussion Paper", was distributed in February 1992.
That paper served as a background document for consultation with and a review by interested users. The content underwent significant change during this process. Based upon the revised content, a large-scale test of SLID will be conducted in February and May 1993.
The present document outlines the income and wealth content to be tested in May 1993. It is a continuation of SLID Research Paper Series 92-01A, which outlines the demographic and labour content used in the January/February 1993 test.
Release date: 2008-02-29
5. Objectives and Content of the Preliminary Interview Archived
Surveys and statistical programs – Documentation: 75F0002M1992007
Description:
A Preliminary Interview will be conducted on the first panel of SLID in January 1993, as a supplement to the Labour Force Survey. The first panel is made up of about 20,000 households that are rotating out of the Labour Force Survey in January and February 1993.
This document describes the purpose of the SLID Preliminary Interview and the question wordings to be used.
Release date: 2008-02-29
6. Environment Surveys of Establishments: The Canadian Experience Archived
Surveys and statistical programs – Documentation: 16-001-M2007004
Description:
Statistics Canada administers a number of environmental surveys that fill important data gaps but also pose numerous administrative challenges. This paper focuses on two ongoing environment surveys: one newly initiated and one being redesigned.
Release date: 2007-11-23
7. Recent Changes in Geography Content in the Survey of Labour and Income Dynamics (SLID) Archived
Surveys and statistical programs – Documentation: 75F0002M2005002
Description:
This paper describes the changes made to the structure of geography information on SLID from reference year 1999 onwards. It explains the reasons for changing to the 2001 Census-based geography, shows how the overlap between the 1991 and 2001 Census-based concepts is handled, details how the geographic concepts are implemented, discusses a new imputation procedure and concludes with an illustration of the impact of these changes on selected tables.
Release date: 2005-03-31
8. Improvements in 2005 to the Labour Force Survey (LFS) Archived
Surveys and statistical programs – Documentation: 71F0031X2005002
Description:
This paper introduces and explains modifications made to the Labour Force Survey estimates in January 2005. Some of these modifications include the adjustment of all LFS estimates to reflect population counts based on the 2001 Census, updates to industry and occupation classification systems and sample redesign changes.
Release date: 2005-01-26
9. Entry-Exit Component of Labour Interview for January 2003 and Income Interview for May 2003: Survey of Labour and Income Dynamics Archived
Surveys and statistical programs – Documentation: 75F0002M2004006
Description:
This document presents information about the entry-exit portion of the annual labour and the income interviews of the Survey of Labour and Income Dynamics (SLID).
Release date: 2004-06-21
10. Issues in the Design of Canada's Adult Education and Training Survey Archived
Surveys and statistical programs – Documentation: 81-595-M2003009
Geography: Canada
Description:
This paper examines how the Canadian Adult Education and Training Survey (AETS) can be used to study participation in and impacts of education and training activities for adults.
Release date: 2003-10-15

Date modified: 2026-05-30