Survey design

Results

All (330) (0 to 10 of 330 results)

1. Income Research Paper Series
Journals and periodicals: 75F0002M
Description: This series provides detailed documentation on income developments, including survey design issues, data quality evaluation and exploratory research.
Release date: 2026-05-20
2. Sampling for business surveys at Statistics Canada
Articles and reports: 12-001-X202500200013
Description: This article examines the methodological complexities associated with the design of business surveys, with particular emphasis on sampling strategies implemented by National Statistical Offices (NSOs). It addresses the inherent challenges posed by the dynamic nature of the business population, which necessitates continual updates to the sampling frame to ensure representativeness and relevance. Critical design considerations include the determination of optimal sample sizes, stratification across key dimensions such as industry, geographic region, and enterprise size, as well as the treatment of business births and the exclusion of inactive (or “dead”) units. The article applies Bankier’s (1988) power allocation method to a two-way stratification scheme defined by industry and geography, evaluating its performance by comparing the resulting coefficients of variation with those obtained via a raking algorithm applied to the marginal coefficients. Furthermore, the approach is extended to a multivariate context to accommodate multiple estimation domains. The discussion also encompasses practical issues related to sample rotation and coordination, which are critical for maintaining data quality and minimizing respondent burden over time.
Release date: 2025-12-23
3. Adapting to change: Online first collection initiatives to improve the Labour Force Survey response rate
Articles and reports: 75-005-M2025001
Description: Since 2010, engaging Canadians to participate in the LFS has become more challenging due to a variety of social and technological changes. The decline in the LFS response rate accelerated in 2020, exacerbated by public health measures during the COVID-19 pandemic. This technical paper presents preliminary results of two collection initiatives implemented using an online first strategy to improve the LFS response rates by confirming respondent contact information and expanding the availability of online response. Through these and other planned initiatives, Statistics Canada is working to ensure that the LFS estimates continue to provide an accurate and representative portrait of the Canadian labour market.
Release date: 2025-10-21
4. Improving the Automated Capture of Survey of Household Spending Receipts using advanced Machine Learning Techniques Archived
Articles and reports: 11-522-X202500100004
Description: The Survey of Household Spending (SHS) conducted by Statistics Canada collects paper diaries and shopping receipts as a source of household expenditure data. An auto-capturing algorithm was created for SHS 2023 to reduce statistical clerks' manual work of extracting important information from scanned receipts of common store brands. The algorithm used Tesseract optical character recognition (OCR) to extract text characters from images of receipts, and it identified store and product entities using regular expressions, also known as regex. The goal of this study was to enhance the current auto-capture algorithm by experimenting with more advanced OCR and machine learning methods. As a result, PaddleOCR, an open-source OCR toolkit, was selected as the new default OCR engine due to its overall performance in recognizing texts, especially digits, accurately across receipts of various qualities. Additionally, entity classifiers based on support vector machines were trained on historical SHS records and existing regex patterns. By using classifiers to categorize different elements present on receipts instead of relying solely on regex patterns, product and store recognition improved. It is expected that this new algorithm will be used for SHS 2025 to improve the auto-capture quality and reduce the manual burden associated with capturing receipt variables.
Release date: 2025-09-08
5. Data-driven Imputation Strategies and their Associated Quality Indicators in Economic Surveys Archived
Articles and reports: 11-522-X202500100011
Description: The use of modern data-driven imputation methods to treat non-response in surveys processed in the Integrated Business Statistics Program at Statistics Canada has previously been explored. It was observed that these methods can lead to high-quality imputation and have the potential to yield broad efficiencies when setting up a particular survey's edit and imputation strategy. However, estimation of the associated total variance, more specifically the component due to imputation, remains a challenge. In this article, two methods for estimating the total variance are proposed, and preliminary results are shown that have motivated us to pursue further research in this area.
Release date: 2025-09-08
6. Ahead of the Trends: J.N.K. Rao's Contributions to Survey Research Archived
Articles and reports: 11-522-X202500100029
Description: J.N.K. Rao has contributed to almost every subdiscipline of survey research, including unequal-probability and two-phase sampling, variance estimation, regression and categorical data analysis, small area estimation, and data integration. For each of these topics, Rao's work anticipated and led future research directions. His contributions will be discussed in the context of broader research trends as seen in the articles of Survey Methodology over the journal's 50-year history.
Release date: 2025-09-08
7. Contributions of J.N.K. Rao to Complex Survey Multilevel Models and Composite Likelihood Archived
Articles and reports: 11-522-X202500100030
Description: In the setting of multilevel models to be estimated using data from surveys with complex sampling designs, this paper outlines some contributions of the landmark paper by Rao, Verret and Hidiroglou (Survey Methodology, 2013) and subsequent related work.
Release date: 2025-09-08
8. Propensity Score Estimation and Optimal Sampling Design when Integrating Probability Samples with Non-probability Data Archived
Articles and reports: 11-522-X202500100032
Description: Although non-probability data sources are not new to official statistics, interest in the topic has revived under pressure from falling survey response rates, increasing data collection costs and a desire to take advantage of new data sources arising from ongoing societal digitalisation. Because certain segments of the target population are excluded, inference derived solely from a non-probability data source is likely to be biased. This work addresses the bias by integrating non-probability data with reference probability samples. The focus is on methods that model the propensity of inclusion in the non-probability dataset with the help of the accompanying reference sample, with the modelled propensities then applied in an inverse probability weighting approach to produce population estimates. The reference sample is sometimes assumed as given. In this presentation, however, the objective is to find an optimal strategy, that is, the combination of a data integration-based estimator and a sample design for the reference probability sample. Recent work is discussed in which advantage is taken of the good unit identification possibilities in business surveys to study an estimator based on propensities and derive optimal (unequal) selection probabilities for the reference sample.
Release date: 2025-09-08
9. Including Non-binary Gender in the Calibration Strategy for the Canadian Long-Form Sample Survey Weights Archived
Articles and reports: 11-522-X202500100033
Description: Aligning with recent needs for increased disaggregated data, in 2021 Canada became the first country to collect and disseminate data on gender diversity in a national census, giving Canadians the option to select male, female, or non-binary. Because of their small size, non-binary population counts were not used in the 2021 Census long-form sample calibration procedure, owing to the risk of increasing the variance of estimates. This paper presents an alternative long-form calibration strategy which allows for small populations, such as the non-binary group, to be incorporated while mitigating methodological concerns. The strategy put forward can incorporate multiple small populations simultaneously while also being flexible enough to fit the calibration systems of other National Statistical Offices (NSOs). The results of a Monte Carlo (MC) simulation are presented showing improved data quality for the non-binary population under the alternative calibration strategy.
Release date: 2025-09-08
10. Authors’ response to comments on “Trends and directions in sample survey theory and methods”
Articles and reports: 12-001-X202500100010
Description: The discussants highlight promising research topics for improving the quality and granularity of estimates from surveys. We agree that continued research is needed to evaluate models used for inference, and suggest development of measures of model dependence.
Release date: 2025-06-30

Data (0) (0 results)

No content available at this time.

Analysis (301) (40 to 50 of 301 results)

41. Coordination of spatially balanced samples Archived
Articles and reports: 12-001-X201800254953
Description:
Sample coordination seeks to create a probabilistic dependence between the selection of two or more samples drawn from the same population or from overlapping populations. Positive coordination increases the expected sample overlap, while negative coordination decreases it. There are numerous applications for sample coordination with varying objectives. A spatially balanced sample is a sample that is well-spread in some space. Forcing a spread within the selected samples is a general and very efficient variance reduction technique for the Horvitz-Thompson estimator. The local pivotal method and the spatially correlated Poisson sampling are two general schemes for achieving well-spread samples. We aim to introduce coordination for these sampling methods based on the concept of permanent random numbers. The goal is to coordinate such samples while preserving spatial balance. The proposed methods are motivated by examples from forestry, environmental studies, and official statistics.
Release date: 2018-12-20
42. Using balanced sampling in creel surveys Archived
Articles and reports: 12-001-X201800254954
Description:
In recent years, balanced sampling techniques have seen renewed interest. They constrain the Horvitz-Thompson estimators of the totals of auxiliary variables to be equal, at least approximately, to the corresponding true totals, to avoid the occurrence of bad samples. Several procedures are available to carry out balanced sampling: the cube method (see Deville and Tillé, 2004) and an alternative, the rejective algorithm introduced by Hájek (1964). After a brief review of these sampling methods, motivated by the planning of an angler survey, we investigate, using Monte Carlo simulations, the survey designs produced by these two sampling algorithms.
Release date: 2018-12-20
43. Optimizing a mixed allocation Archived
Articles and reports: 12-001-X201800254959
Description:
This article proposes a criterion for calculating the trade-off in so-called “mixed” allocations, which combine two classic allocations in sampling theory. In INSEE (National Institute of Statistics and Economic Studies) business surveys, it is common to use the arithmetic mean of a proportional allocation and a Neyman allocation (corresponding to a trade-off of 0.5). It is possible to obtain a trade-off value resulting in better properties for the estimators. This value belongs to a region that is obtained by solving an optimization program. Different methods for calculating the trade-off will be presented. An application for business surveys is presented, as well as a comparison with other usual trade-off allocations.
Release date: 2018-12-20
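Illustration (not taken from the article): the "mixed" allocation described in item 43 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. The function names, the example stratum sizes and standard deviations, and the convention that the trade-off weights the proportional allocation (with the remainder going to Neyman) are all hypothetical choices made here; the article's own criterion for choosing the trade-off is not reproduced. Allocations are left fractional, with no rounding and no cap at the stratum size.

```python
def proportional_allocation(n, sizes):
    """Allocate n sample units in proportion to stratum sizes N_h."""
    total = sum(sizes)
    return [n * N_h / total for N_h in sizes]


def neyman_allocation(n, sizes, stds):
    """Allocate n sample units in proportion to N_h * S_h (Neyman allocation)."""
    weights = [N_h * S_h for N_h, S_h in zip(sizes, stds)]
    total = sum(weights)
    return [n * w / total for w in weights]


def mixed_allocation(n, sizes, stds, tradeoff=0.5):
    """Weighted mean of the two allocations.

    Assumption: `tradeoff` is the weight on the proportional allocation,
    so tradeoff=0.5 gives the arithmetic mean mentioned in the description.
    """
    prop = proportional_allocation(n, sizes)
    ney = neyman_allocation(n, sizes, stds)
    return [tradeoff * p + (1 - tradeoff) * q for p, q in zip(prop, ney)]


# Hypothetical strata: sizes N_h and standard deviations S_h.
sizes = [100, 300, 600]
stds = [10.0, 20.0, 5.0]
allocation = mixed_allocation(100, sizes, stds, tradeoff=0.5)
```

Because both component allocations sum to n, any trade-off between 0 and 1 also yields an allocation that sums to n.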
44. Model based inference using ranked set samples Archived
Articles and reports: 12-001-X201800154925
Description:
This paper develops statistical inference based on a superpopulation model in a finite population setting using ranked set samples (RSS). The samples are constructed without replacement. It is shown that the sample mean of RSS is model unbiased and has a smaller mean square prediction error (MSPE) than a simple random sample mean. Using an unbiased estimator of MSPE, the paper also constructs a prediction confidence interval for the population mean. A small-scale simulation study shows that the estimator is as good as a simple random sample (SRS) estimator when ranking information is poor. On the other hand, it has higher efficiency than the SRS estimator when the quality of ranking information is good and the cost ratio of obtaining a single unit in RSS and SRS is not very high. The simulation study also indicates that coverage probabilities of prediction intervals are very close to the nominal coverage probabilities. The proposed inferential procedure is applied to a real data set.
Release date: 2018-06-21
45. Strategies for subsampling nonrespondents for economic programs Archived
Articles and reports: 12-001-X201800154929
Description:
The U.S. Census Bureau is investigating nonrespondent subsampling strategies for usage in the 2017 Economic Census. Design constraints include a mandated lower bound on the unit response rate, along with targeted industry-specific response rates. This paper presents research on allocation procedures for subsampling nonrespondents, conditional on the subsampling being systematic. We consider two approaches: (1) equal-probability sampling and (2) optimized allocation with constraints on unit response rates and sample size with the objective of selecting larger samples in industries that have initially lower response rates. We present a simulation study that examines the relative bias and mean squared error for the proposed allocations, assessing each procedure’s sensitivity to the size of the subsample, the response propensities, and the estimation procedure.
Release date: 2018-06-21
46. Sample allocation for efficient model-based small area estimation Archived
Articles and reports: 12-001-X201700114817
Description:
We present research results on sample allocations for efficient model-based small area estimation in cases where the areas of interest coincide with the strata. Although model-assisted and model-based estimation methods are common in the production of small area statistics, utilization of the underlying model and estimation method is rarely included in the sample area allocation scheme. Therefore, we have developed a new model-based allocation named g1-allocation. For comparison, one recently developed model-assisted allocation is presented. These two allocations are based on an adjusted measure of homogeneity which is computed using an auxiliary variable and is an approximation of the intra-class correlation within areas. Five model-free area allocation solutions presented in the past are selected from the literature as reference allocations. Equal and proportional allocations need the number of areas and area-specific numbers of basic statistical units. The Neyman, Bankier and NLP (Non-Linear Programming) allocations need values for the study variable concerning area level parameters such as standard deviation, coefficient of variation or totals. In general, allocation methods can be classified according to the optimization criteria and use of auxiliary data. Statistical properties of the various methods are assessed through sample simulation experiments using real population register data. It can be concluded from simulation results that inclusion of the model and estimation method into the allocation method improves estimation results.
Release date: 2017-06-22
47. Unequal probability inverse sampling Archived
Articles and reports: 12-001-X201600214660
Description:
In an economic survey of a sample of enterprises, occupations are randomly selected from a list until a number r of occupations in a local unit has been identified. This is an inverse sampling problem for which we are proposing a few solutions. Simple designs with and without replacement are processed using negative binomial distributions and negative hypergeometric distributions. We also propose estimators for when the units are selected with unequal probabilities, with or without replacement.
Release date: 2016-12-20
48. A note on the concept of invariance in two-phase sampling designs Archived
Articles and reports: 12-001-X201600214662
Description:
Two-phase sampling designs are often used in surveys when the sampling frame contains little or no auxiliary information. In this note, we shed some light on the concept of invariance, which is often mentioned in the context of two-phase sampling designs. We define two types of invariant two-phase designs: strongly invariant and weakly invariant two-phase designs. Some examples are given. Finally, we describe the implications of strong and weak invariance from an inference point of view.
Release date: 2016-12-20
49. Adaptive rectangular sampling: An easy, incomplete, neighbourhood-free adaptive cluster sampling design Archived
Articles and reports: 12-001-X201600214684
Description:
This paper introduces an incomplete adaptive cluster sampling design that is easy to implement, controls the sample size well, and does not need to follow the neighbourhood. In this design, an initial sample is first selected, using one of the conventional designs. If a cell satisfies a prespecified condition, a specified radius around the cell is sampled completely. The population mean is estimated using the π-estimator. If all the inclusion probabilities are known, then an unbiased π-estimator is available; if, depending on the situation, the inclusion probabilities are not known for some of the final sample units, then they are estimated. To estimate the inclusion probabilities, a biased estimator is constructed. However, the simulations show that if the sample size is large enough, the error of the inclusion probabilities is negligible, and the relative π-estimator is almost unbiased. This design rivals adaptive cluster sampling because it controls the final sample size and is easy to manage. It rivals adaptive two-stage sequential sampling because it considers the cluster form of the population and reduces the cost of moving across the area. Using real data on a bird population and simulations, the paper compares the design with adaptive two-stage sequential sampling. The simulations show that the design has significant efficiency in comparison with its rival.
Release date: 2016-12-20
50. An Overview of Selected International Business Record Linkage Programs Archived
Articles and reports: 18-001-X2016001
Description:
Although the record linkage of business data is not a completely new topic, the fact remains that the public and many data users are unaware of the programs and practices commonly used by statistical agencies across the world.
This report is a brief overview of the main practices, programs and challenges of record linkage at statistical agencies across the world that answered a short survey on this subject, supplemented by publicly available documentation produced by these agencies. The document shows that linkage practices are similar across these statistical agencies; the main differences lie in the procedures in place to access data and in the regulatory policies that govern record linkage permissions and the dissemination of data.
Release date: 2016-10-27

Reference (29) (20 to 30 of 29 results)

21. Calculation of change for annual business surveys Archived
Surveys and statistical programs – Documentation: 11-522-X19980015027
Description:
The disseminated results of annual business surveys inevitably contain statistics that are changing. Since the economic sphere is increasingly dynamic, a simple difference of aggregates between n-1 and n is no longer sufficient to provide an overall description of what has happened. The change calculation module in the new generation of annual business surveys divides overall change into various components (births, deaths, inter-industry migration) and calculates change on the basis of a constant field, assigning special importance to restructurings. The main difficulties lie in establishing subsamples, reweighting, calibrating according to calculable changes, and taking account of restructuring.
Release date: 1999-10-22
22. Marginal models for repeated observations: Inference with survey data Archived
Surveys and statistical programs – Documentation: 11-522-X19980015029
Description:
In longitudinal surveys, sample subjects are observed over several time points. This feature typically leads to dependent observations on the same subject, in addition to the customary correlations across subjects induced by the sample design. Much research in the literature has focussed on modeling the marginal mean of a response as a function of covariates. Liang and Zeger (1986) used generalized estimating equations (GEE), requiring only correct specification of the marginal mean, and obtained standard errors of regression parameter estimates and associated Wald tests, assuming a "working" correlation structure for the repeated measurements on a sample subject. Rotnitzky and Jewell (1990) developed quasi-score tests and Rao-Scott adjustments to "working" quasi-score tests under marginal models. These methods are asymptotically robust to misspecification of the within-subject correlation structure, but assume independence of sample subjects, which is not satisfied for complex longitudinal survey data based on stratified multi-stage sampling. We propose asymptotically valid Wald and quasi-score tests for longitudinal survey data, using the Taylor linearization and jackknife methods. Alternative tests, based on Rao-Scott adjustments to naive tests that ignore survey design features and on Bonferroni-t, are also developed. These tests are particularly useful when the effective degrees of freedom, usually taken as the total number of sample primary units (clusters) minus the number of strata, is small.
Release date: 1999-10-22
23. Estimating the incidence of dementia from longitudinal two-phase sampling with nonignorable missing data Archived
Surveys and statistical programs – Documentation: 11-522-X19980015030
Description:
Two-phase sampling designs have been conducted in waves to estimate the incidence of a rare disease such as dementia. Estimation of disease incidence from a longitudinal dementia study has to appropriately adjust for data missing by death as well as the sampling design used at each study wave. In this paper we adopt a selection model approach to model the missing data by death and use a likelihood approach to derive incidence estimates. A modified EM algorithm is used to deal with data missing by sampling selection. The nonparametric jackknife variance estimator is used to derive variance estimates for the model parameters and the incidence estimates. The proposed approaches are applied to data from the Indianapolis-Ibadan Dementia Study.
Release date: 1999-10-22
24. Estimation with partial overlap longitudinal samples Archived
Surveys and statistical programs – Documentation: 11-522-X19980015035
Description:
In a longitudinal survey conducted for k periods, some units may be observed for fewer than k of the periods. Examples include surveys designed with partially overlapping subsamples, a pure panel survey with nonresponse, and a panel survey supplemented with additional samples for some of the time periods. Estimators of the regression type are exhibited for such surveys. An application to special studies associated with the National Resources Inventory is discussed.
Release date: 1999-10-22
25. Towards a New Canadian Asset and Debt Survey - a Content Discussion Paper Archived
Notices and consultations: 13F0026M1999001
Description:
The main objectives of a new Canadian survey measuring asset and debt holding of families and individuals will be to update wealth information that is over one decade old; to improve the reliability of the wealth estimates; and, to provide a primary tool for analysing many important policy issues related to the distribution of assets and debts, future consumption possibilities, and savings behaviour that is of interest to governments, business and communities.
This paper is the document that launched the development of the new asset and debt survey, subsequently renamed the Survey of Financial Security. It looks at the conceptual framework for the survey, including the appropriate unit of measurement (family, household or person) and discusses measurement issues such as establishing an accounting framework for assets and debts. The variables proposed for inclusion are also identified. The paper poses several questions to readers and asks for comments and feedback.
Release date: 1999-03-23
26. Asset and Debt Survey: Findings of the Content Consultation Process Archived
Notices and consultations: 13F0026M1999002
Description:
This document summarizes the comments and feedback received on an earlier document: Towards a new Canadian asset and debt survey - A content discussion paper. The new asset and debt survey (now called the Survey of Financial Security) is to update the wealth information on Canadian families and unattached individuals. Since the last data collection was conducted in 1984, it was essential to include a consultative process in the development of the survey in order to obtain feedback on issues of concern and to define the conceptual framework for the survey.
Comments on the content discussion paper are summarized by major theme and sections indicate how the suggestions are being incorporated into the survey or why they could not be incorporated. This paper also mentions the main objectives of the survey and provides an overview of the survey content, revised according to the feedback from the discussion paper.
Release date: 1999-03-23
27. Proposal for an Asset and Debt Survey Archived
Surveys and statistical programs – Documentation: 13F0026M1999003
Description:
This paper presents a proposal for conducting a Canadian asset and debt survey. The first step in preparing this proposal was the release, in February 1997, of a document entitled Towards a new Canadian asset and debt survey whose intent was to elicit feedback on the initial thinking regarding the content of the survey.
This paper reviews the conceptual framework for a new asset and debt survey, data requirements, survey design, collection methodology and testing. It also provides an overview of the anticipated data processing system, describes the analysis and dissemination plan (analytical products and microdata files), and identifies the survey costs and major milestones. Finally, it presents the management/coordination approach used.
Release date: 1999-03-23
28. Sample Representativeness for the Survey of Labour and Income Dynamics Archived
Surveys and statistical programs – Documentation: 75F0002M1993019
Description:
This paper examines the issues and the procedures designed to maintain a representative sample of the population for the Survey of Labour and Income Dynamics (SLID).
Release date: 1995-12-30
29. SLID Following Rules: Who to Trace and Who to Interview Archived
Surveys and statistical programs – Documentation: 75F0002M1994001
Description:
This paper describes the Survey of Labour and Income Dynamics (SLID) following rules, which govern who is traced and who is interviewed. It also outlines the conceptual basis for these procedures.
Release date: 1995-12-30

Date modified: 2026-05-30