Survey design

Results

All (330) (0 to 10 of 330 results)

1. Income Research Paper Series
Journals and periodicals: 75F0002M
Description: This series provides detailed documentation on income developments, including survey design issues, data quality evaluation and exploratory research.
Release date: 2026-05-20
2. Sampling for business surveys at Statistics Canada
Articles and reports: 12-001-X202500200013
Description: This article examines the methodological complexities associated with the design of business surveys, with particular emphasis on sampling strategies implemented by National Statistical Offices (NSOs). It addresses the inherent challenges posed by the dynamic nature of the business population, which necessitates continual updates to the sampling frame to ensure representativeness and relevance. Critical design considerations include the determination of optimal sample sizes, stratification across key dimensions such as industry, geographic region, and enterprise size, as well as the treatment of business births and the exclusion of inactive (or “dead”) units. The article applies Bankier’s (1988) power allocation method to a two-way stratification scheme defined by industry and geography, evaluating its performance by comparing the resulting coefficients of variation with those obtained via a raking algorithm applied to the marginal coefficients. Furthermore, the approach is extended to a multivariate context to accommodate multiple estimation domains. The discussion also encompasses practical issues related to sample rotation and coordination, which are critical for maintaining data quality and minimizing respondent burden over time.
Release date: 2025-12-23
3. Adapting to change: Online first collection initiatives to improve the Labour Force Survey response rate
Articles and reports: 75-005-M2025001
Description: Since 2010, engaging Canadians to participate in the LFS has become more challenging due to a variety of social and technological changes. The decline in the LFS response rate accelerated in 2020, exacerbated by public health measures during the COVID-19 pandemic. This technical paper presents preliminary results of two collection initiatives implemented using an online first strategy to improve the LFS response rates by confirming respondent contact information and expanding the availability of online response. Through these and other planned initiatives, Statistics Canada is working to ensure that the LFS estimates continue to provide an accurate and representative portrait of the Canadian labour market.
Release date: 2025-10-21
4. Improving the Automated Capture of Survey of Household Spending Receipts using advanced Machine Learning Techniques Archived
Articles and reports: 11-522-X202500100004
Description: The Survey of Household Spending (SHS) conducted by Statistics Canada collects paper diaries and shopping receipts as a source of household expenditure data. An auto-capturing algorithm was created for SHS 2023 to reduce statistical clerks' manual work of extracting important information from scanned receipts of common store brands. The algorithm used Tesseract optical character recognition (OCR) to extract text characters from images of receipts, and it identified store and product entities using regular expressions, also known as regex. The goal of this study was to enhance the current auto-capture algorithm by experimenting with more advanced OCR and machine learning methods. As a result, PaddleOCR, an open-source OCR toolkit, was selected as the new default OCR engine due to its overall performance in recognizing texts, especially digits, accurately across receipts of various qualities. Additionally, entity classifiers based on support vector machines were trained on historical SHS records and existing regex patterns. By using classifiers to categorize different elements present on receipts instead of relying solely on regex patterns, product and store recognition improved. It is expected that this new algorithm will be used for SHS 2025 to improve the auto-capture quality and reduce the manual burden associated with capturing receipt variables.
Release date: 2025-09-08
5. Data-driven Imputation Strategies and their Associated Quality Indicators in Economic Surveys Archived
Articles and reports: 11-522-X202500100011
Description: The use of modern data-driven imputation methods to treat non-response in the context of surveys processed in the Integrated Business Statistics Program at Statistics Canada has previously been explored. It was observed that these methods can lead to high-quality imputation and have the potential to result in broad efficiencies when setting up a particular survey's edit and imputation strategy. However, estimation of the associated total variance, more specifically the component due to imputation, remains a challenge. In this article, we propose two methods for estimating total variance and present preliminary results that have motivated us to pursue further research in this area.
Release date: 2025-09-08
6. Ahead of the Trends: J.N.K. Rao's Contributions to Survey Research Archived
Articles and reports: 11-522-X202500100029
Description: J.N.K. Rao has contributed to almost every subdiscipline of survey research, including unequal-probability and two-phase sampling, variance estimation, regression and categorical data analysis, small area estimation, and data integration. For each of these topics, Rao's work anticipated and led future research directions. His contributions will be discussed in the context of broader research trends as seen in the articles of Survey Methodology over the journal's 50-year history.
Release date: 2025-09-08
7. Contributions of J.N.K. Rao to Complex Survey Multilevel Models and Composite Likelihood Archived
Articles and reports: 11-522-X202500100030
Description: In the setting of multilevel models to be estimated using data from surveys with complex sampling designs, this paper outlines some contributions of the landmark paper by Rao, Verret and Hidiroglou (Survey Methodology, 2013) and subsequent related work.
Release date: 2025-09-08
8. Propensity Score Estimation and Optimal Sampling Design when Integrating Probability Samples with Non-probability Data Archived
Articles and reports: 11-522-X202500100032
Description: Although non-probability data sources are not new to official statistics, a revived interest in the topic has emerged from pressures due to falling survey response rates, increasing data collection costs and a desire to take advantage of new data source opportunities from the ongoing societal digitalisation. Due to the exclusion of certain segments of the target population, inference derived solely from a non-probability data source is likely to result in bias. This work approaches the challenge of addressing the bias by integrating non-probability data with reference probability samples. The focus will be on methods to model the propensity of inclusion in the non-probability dataset with the help of the accompanying reference sample, with the modelled propensities then applied in an inverse probability weighting approach to produce population estimates. The reference sample is sometimes assumed as given. In this presentation, however, the objective is to find an optimal strategy, that is, the combination of a data integration-based estimator and a sample design for the reference probability sample. Recent work is discussed in which advantage is taken of the good unit identification possibilities in business surveys to study an estimator based on propensities and derive optimal (unequal) selection probabilities for the reference sample.
Release date: 2025-09-08
9. Including Non-binary Gender in the Calibration Strategy for the Canadian Long-Form Sample Survey Weights Archived
Articles and reports: 11-522-X202500100033
Description: Aligning with recent needs for increased disaggregated data, in 2021 Canada became the first country to collect and disseminate data on gender diversity in a national census, giving Canadians the option to select male, female, or non-binary. Because of their small size, non-binary population counts were not used in the 2021 Census long-form sample calibration procedure, as including them risked increasing the variance of estimates. This paper presents an alternative long-form calibration strategy which allows for small populations, such as the non-binary group, to be incorporated while mitigating methodological concerns. The strategy put forward can incorporate multiple small populations simultaneously while also being flexible enough to fit the calibration systems of other National Statistical Offices (NSOs). The results of a Monte Carlo (MC) simulation are presented showing improved data quality for the non-binary population under the alternative calibration strategy.
Release date: 2025-09-08
10. Authors’ response to comments on “Trends and directions in sample survey theory and methods”
Articles and reports: 12-001-X202500100010
Description: The discussants highlight promising research topics for improving the quality and granularity of estimates from surveys. We agree that continued research is needed to evaluate models used for inference, and suggest development of measures of model dependence.
Release date: 2025-06-30
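
Illustration for entry 2 (Sampling for business surveys at Statistics Canada): the power allocation named in that description can be sketched in a few lines of Python. This is a minimal sketch, not the article's implementation. It assumes the commonly cited form of Bankier's (1988) rule, in which the stratum sample size is proportional to X_h^q * S_h / Ybar_h; the function name, the stratum figures and the value of q are hypothetical.

```python
def power_allocation(n, x, s, ybar, q):
    """Split a total sample size n across strata (assumed form of the rule).

    x    : stratum size measures X_h (for example stratum totals), all > 0
    s    : stratum standard deviations S_h of the study variable
    ybar : stratum means of the study variable, all > 0
    q    : power between 0 and 1
    Returns unrounded stratum sample sizes that sum to n.
    """
    # Each stratum's share is proportional to X_h^q times its unit-level CV.
    weights = [(xh ** q) * sh / yh for xh, sh, yh in zip(x, s, ybar)]
    total = sum(weights)
    return [n * w / total for w in weights]


# Hypothetical strata: population counts, means and standard deviations.
counts = [1000.0, 400.0, 50.0]
means = [10.0, 50.0, 400.0]
sds = [5.0, 30.0, 200.0]
totals = [c * m for c, m in zip(counts, means)]  # X_h taken as stratum totals

alloc = power_allocation(300, totals, sds, means, q=0.5)
```

Under this assumed form, q = 1 with X_h set to the stratum total reduces to Neyman allocation (shares proportional to N_h * S_h), while q = 0 makes the shares proportional to the stratum coefficients of variation alone.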

Data (0) (0 results)

No content available at this time.

Analysis (301) (30 to 40 of 301 results)

31. A method to find an efficient and robust sampling strategy under model uncertainty
Articles and reports: 12-001-X202100100002
Description:
We consider the problem of deciding on sampling strategy, in particular sampling design. We propose a risk measure, whose minimizing value guides the choice. The method makes use of a superpopulation model and takes into account uncertainty about its parameters through a prior distribution. The method is illustrated with a real dataset, yielding satisfactory results. As a baseline, we use the strategy that couples probability proportional-to-size sampling with the difference estimator, as it is known to be optimal when the superpopulation model is fully known. We show that, even under moderate misspecifications of the model, this strategy is not robust and can be outperformed by some alternatives.
Release date: 2021-06-24
32. Probability-proportional-to-size ranked-set sampling from stratified populations Archived
Articles and reports: 12-001-X202000200001
Description:
This paper constructs a probability-proportional-to-size (PPS) ranked-set sample from a stratified population. A PPS-ranked-set sample partitions the units in a PPS sample into groups of similar observations. The construction of similar groups relies on relative positions (ranks) of units in small comparison sets. Hence, the ranks induce more structure (stratification) in the sample in addition to the data structure created by unequal selection probabilities in a PPS sample. This added data structure makes the PPS-ranked-set sample more informative than a PPS sample. The stratified PPS-ranked-set sample is constructed by selecting a PPS-ranked-set sample from each stratum population. The paper constructs unbiased estimators for the population mean, total and their variances. The new sampling design is applied to apple production data to estimate the total apple production in Turkey.
Release date: 2020-12-15
33. Local polynomial estimation for a small area mean under informative sampling Archived
Articles and reports: 12-001-X202000100002
Description:
Model-based methods are required to estimate small area parameters of interest, such as totals and means, when traditional direct estimation methods cannot provide adequate precision. Unit level and area level models are the most commonly used ones in practice. In the case of the unit level model, efficient model-based estimators can be obtained if the sample design is such that the sample and population models coincide: that is, the sampling design is non-informative for the model. If on the other hand, the sampling design is informative for the model, the selection probabilities will be related to the variable of interest, even after conditioning on the available auxiliary data. This will imply that the population model no longer holds for the sample. Pfeffermann and Sverchkov (2007) used the relationships between the population and sample distribution of the study variable to obtain approximately unbiased semi-parametric predictors of the area means under informative sampling schemes. Their procedure is valid for both sampled and non-sampled areas.
Release date: 2020-06-30
34. Considering interviewer and design effects when planning sample sizes Archived
Articles and reports: 12-001-X202000100005
Description:
Selecting the right sample size is central to ensure the quality of a survey. The state of the art is to account for complex sampling designs by calculating effective sample sizes. These effective sample sizes are determined using the design effect of central variables of interest. However, in face-to-face surveys empirical estimates of design effects are often suspected to be conflated with the impact of the interviewers. This typically leads to an over-estimation of design effects and consequently risks misallocating resources towards a higher sample size instead of using more interviewers or improving measurement accuracy. Therefore, we propose a corrected design effect that separates the interviewer effect from the effects of the sampling design on the sampling variance. The ability to estimate the corrected design effect is tested using a simulation study. In this respect, we address disentangling cluster and interviewer variance. Corrected design effects are estimated for data from the European Social Survey (ESS) round 6 and compared with conventional design effect estimates. Furthermore, we show that for some countries in the ESS round 6 the estimates of conventional design effect are indeed strongly inflated by interviewer effects.
Release date: 2020-06-30
35. Robust variance estimators for generalized regression estimators in cluster samples Archived
Articles and reports: 12-001-X201900300001
Description:
Standard linearization estimators of the variance of the general regression estimator are often too small, leading to confidence intervals that do not cover at the desired rate. Hat matrix adjustments can be used in two-stage sampling that help remedy this problem. We present theory for several new variance estimators and compare them to standard estimators in a series of simulations. The proposed estimators correct negative biases and improve confidence interval coverage rates in a variety of situations that mirror ones that are met in practice.
Release date: 2019-12-17
36. Cost optimal sampling for the integrated observation of different populations Archived
Articles and reports: 12-001-X201900300004
Description:
Social or economic studies often need to have a global view of society. For example, in agricultural studies, the characteristics of farms can be linked to the social activities of individuals. Hence, studies of a given phenomenon should be done by considering variables of interest referring to different target populations that are related to each other. In order to get an insight into an underlying phenomenon, the observations must be carried out in an integrated way, in which the units of a given population have to be observed jointly with related units of the other population. In the agricultural example, this means that a sample of rural households should be selected that have some relationship with the farm sample to be used for the study. There are several ways to select integrated samples. This paper studies the problem of defining an optimal sampling strategy for this situation: the solution proposed minimizes the sampling cost, ensuring a predefined estimation precision for the variables of interest (of either one or both populations) describing the phenomenon. Indirect sampling provides a natural framework for this setting since the units belonging to a population can become carriers of information on another population that is the object of a given survey. The problem is studied for different contexts which characterize the information concerning the links available in the sampling design phase, ranging from situations in which the links among the different units are known in the design phase to a situation in which the available information on links is very poor. An empirical study of agricultural data for a developing country is presented. It shows that controlling the inclusion probabilities at the design phase using the available information (namely the links) is effective and can significantly reduce the errors of the estimates for the indirectly observed population.
The need for good models for predicting the unknown variables or the links is also demonstrated.
Release date: 2019-12-17
37. A grouping genetic algorithm for joint stratification and sample allocation designs Archived
Articles and reports: 12-001-X201900300007
Description:
Finding the optimal stratification and sample size in univariate and multivariate sample design is hard when the population frame is large. There are alternative ways of modelling and solving this problem, and one of the most natural uses genetic algorithms (GA) combined with the Bethel-Chromy evaluation algorithm. The GA iteratively searches for the minimum sample size necessary to meet precision constraints in partitionings of atomic strata created by the Cartesian product of auxiliary variables. We point out a drawback with classical GAs when applied to the grouping problem, and propose a new GA approach using “grouping” genetic operators instead of traditional operators. Experiments show a significant improvement in solution quality for similar computational effort.
Release date: 2019-12-17
38. On combining independent probability samples Archived
Articles and reports: 12-001-X201900200003
Description:
Merging available sources of information is becoming increasingly important for improving estimates of population characteristics in a variety of fields. In the presence of several independent probability samples from a finite population, we investigate options for a combined estimator of the population total, based either on a linear combination of the separate estimators or on the combined sample approach. A linear combination estimator based on estimated variances can be biased, as the separate estimators of the population total can be highly correlated with their respective variance estimators. We illustrate how the combined sample can be used to estimate the variances of the separate estimators, which results in general pooled variance estimators. These pooled variance estimators use all available information and have the potential to significantly reduce the bias of a linear combination of separate estimators.
Release date: 2019-06-27
39. An optimisation algorithm applied to the one-dimensional stratification problem Archived
Articles and reports: 12-001-X201900200006
Description:
This paper presents a new algorithm to solve the one-dimensional optimal stratification problem, which reduces to just determining stratum boundaries. When the number of strata H and the total sample size n are fixed, the stratum boundaries are obtained by minimizing the variance of the estimator of a total for the stratification variable. This algorithm uses the Biased Random Key Genetic Algorithm (BRKGA) metaheuristic to search for the optimal solution. This metaheuristic has been shown to produce good quality solutions for many optimization problems in modest computing times. The algorithm is implemented in the R package stratbr available from CRAN (de Moura Brito, do Nascimento Silva and da Veiga, 2017a). Numerical results are provided for a set of 27 populations, enabling comparison of the new algorithm with some competing approaches available in the literature. The algorithm outperforms simpler approximation-based approaches as well as a couple of other optimization-based approaches. It also matches the performance of the best available optimization-based approach due to Kozak (2004). Its main advantage over Kozak’s approach is the coupling of the optimal stratification with the optimal allocation proposed by de Moura Brito, do Nascimento Silva, Silva Semaan and Maculan (2015), thus ensuring that if the stratification bounds obtained achieve the global optimum, then the overall solution will be the global optimum for the stratification bounds and sample allocation.
Release date: 2019-06-27
40. An alternative way of estimating a cumulative logistic model with complex survey data Archived
Articles and reports: 12-001-X201900200007
Description:
When fitting an ordered categorical variable with L > 2 levels to a set of covariates onto complex survey data, it is common to assume that the elements of the population fit a simple cumulative logistic regression model (proportional-odds logistic-regression model). This means the probability that the categorical variable is at or below some level is a binary logistic function of the model covariates. Moreover, except for the intercept, the values of the logistic-regression parameters are the same at each level. The conventional “design-based” method used for fitting the proportional-odds model is based on pseudo-maximum likelihood. We compare estimates computed using pseudo-maximum likelihood with those computed by assuming an alternative design-sensitive robust model-based framework. We show with a simple numerical example how estimates using the two approaches can differ. The alternative approach is easily extended to fit a general cumulative logistic model, in which the parallel-lines assumption can fail. A test of that assumption easily follows.
Release date: 2019-06-27
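
Illustration for entry 34 (Considering interviewer and design effects when planning sample sizes): the relationship between a design effect and an effective sample size can be sketched as follows. This is a minimal sketch that assumes the standard Kish approximation for cluster samples, deff = 1 + (b - 1) * rho, where b is the average cluster size and rho the intra-cluster correlation. It does not implement the corrected design effect proposed in that article, and the function names and input values are hypothetical.

```python
def design_effect(avg_cluster_size, icc):
    """Kish approximation: variance inflation due to clustering."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def effective_sample_size(n, deff):
    """Size of a simple random sample with the same sampling variance."""
    return n / deff


# Hypothetical planning inputs: 11 respondents per cluster on average,
# intra-cluster correlation of 0.05, nominal sample of 1,500.
deff = design_effect(avg_cluster_size=11, icc=0.05)
n_eff = effective_sample_size(1500, deff)
```

If interviewer variance is mixed into the estimate of rho, deff is overstated and n_eff understated, which is the planning risk the article describes.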

Reference (29) (20 to 30 of 29 results)

21. Calculation of change for annual business surveys Archived
Surveys and statistical programs – Documentation: 11-522-X19980015027
Description:
The disseminated results of annual business surveys inevitably contain statistics that are changing. Since the economic sphere is increasingly dynamic, a simple difference of aggregates between n-1 and n is no longer sufficient to provide an overall description of what has happened. The change calculation module in the new generation of annual business surveys divides overall change into various components (births, deaths, inter-industry migration) and calculates change on the basis of a constant field, assigning special importance to restructurings. The main difficulties lie in establishing subsamples, reweighting, calibrating according to calculable changes, and taking account of restructuring.
Release date: 1999-10-22
22. Marginal models for repeated observations: Inference with survey data Archived
Surveys and statistical programs – Documentation: 11-522-X19980015029
Description:
In longitudinal surveys, sample subjects are observed over several time points. This feature typically leads to dependent observations on the same subject, in addition to the customary correlations across subjects induced by the sample design. Much research in the literature has focussed on modeling the marginal mean of a response as a function of covariates. Liang and Zeger (1986) used generalized estimating equations (GEE), requiring only correct specification of the marginal mean, and obtained standard errors of regression parameter estimates and associated Wald tests, assuming a "working" correlation structure for the repeated measurements on a sample subject. Rotnitzky and Jewell (1990) developed quasi-score tests and Rao-Scott adjustments to "working" quasi-score tests under marginal models. These methods are asymptotically robust to misspecification of the within-subject correlation structure, but assume independence of sample subjects which is not satisfied for complex longitudinal survey data based on stratified multi-stage sampling. We proposed asymptotically valid Wald and quasi-score tests for longitudinal survey data, using the Taylor Linearization and jackknife methods. Alternative tests, based on Rao-Scott adjustments to naive tests that ignore survey design features and on Bonferroni-t, are also developed. These tests are particularly useful when the effective degrees of freedom, usually taken as the total number of sample primary units (clusters) minus the number of strata, is small.
Release date: 1999-10-22
23. Estimating the incidence of dementia from longitudinal two-phase sampling with nonignorable missing data Archived
Surveys and statistical programs – Documentation: 11-522-X19980015030
Description:
Two-phase sampling designs have been conducted in waves to estimate the incidence of a rare disease such as dementia. Estimation of disease incidence from a longitudinal dementia study has to appropriately adjust for data missing by death as well as the sampling design used at each study wave. In this paper we adopt a selection model approach to model the missing data by death and use a likelihood approach to derive incidence estimates. A modified EM algorithm is used to deal with data missing by sampling selection. The non-parametric jackknife variance estimator is used to derive variance estimates for the model parameters and the incidence estimates. The proposed approaches are applied to data from the Indianapolis-Ibadan Dementia Study.
Release date: 1999-10-22
24. Estimation with partial overlap longitudinal samples Archived
Surveys and statistical programs – Documentation: 11-522-X19980015035
Description:
In a longitudinal survey conducted for k periods, some units may be observed for fewer than k of the periods. Examples include surveys designed with partially overlapping subsamples, a pure panel survey with nonresponse, and a panel survey supplemented with additional samples for some of the time periods. Estimators of the regression type are exhibited for such surveys. An application to special studies associated with the National Resources Inventory is discussed.
Release date: 1999-10-22
25. Towards a New Canadian Asset and Debt Survey - a Content Discussion Paper Archived
Notices and consultations: 13F0026M1999001
Description:
The main objectives of a new Canadian survey measuring asset and debt holding of families and individuals will be to update wealth information that is over one decade old; to improve the reliability of the wealth estimates; and, to provide a primary tool for analysing many important policy issues related to the distribution of assets and debts, future consumption possibilities, and savings behaviour that is of interest to governments, business and communities.
This paper is the document that launched the development of the new asset and debt survey, subsequently renamed the Survey of Financial Security. It looks at the conceptual framework for the survey, including the appropriate unit of measurement (family, household or person) and discusses measurement issues such as establishing an accounting framework for assets and debts. The variables proposed for inclusion are also identified. The paper poses several questions to readers and asks for comments and feedback.
Release date: 1999-03-23
26. Asset and Debt Survey: Findings of the Content Consultation Process Archived
Notices and consultations: 13F0026M1999002
Description:
This document summarizes the comments and feedback received on an earlier document: Towards a new Canadian asset and debt survey - A content discussion paper. The new asset and debt survey (now called the Survey of Financial Security) is to update the wealth information on Canadian families and unattached individuals. Since the last data collection was conducted in 1984, it was essential to include a consultative process in the development of the survey in order to obtain feedback on issues of concern and to define the conceptual framework for the survey.
Comments on the content discussion paper are summarized by major theme and sections indicate how the suggestions are being incorporated into the survey or why they could not be incorporated. This paper also mentions the main objectives of the survey and provides an overview of the survey content, revised according to the feedback from the discussion paper.
Release date: 1999-03-23
27. Proposal for an Asset and Debt Survey Archived
Surveys and statistical programs – Documentation: 13F0026M1999003
Description:
This paper presents a proposal for conducting a Canadian asset and debt survey. The first step in preparing this proposal was the release, in February 1997, of a document entitled Towards a new Canadian asset and debt survey whose intent was to elicit feedback on the initial thinking regarding the content of the survey.
This paper reviews the conceptual framework for a new asset and debt survey, data requirements, survey design, collection methodology and testing. It also provides an overview of the anticipated data processing system, describes the analysis and dissemination plan (analytical products and microdata files), and identifies the survey costs and major milestones. Finally, it presents the management/coordination approach used.
Release date: 1999-03-23
28. Sample Representativeness for the Survey of Labour and Income Dynamics Archived
Surveys and statistical programs – Documentation: 75F0002M1993019
Description:
This paper examines the issues and the procedures designed to maintain a representative sample of the population for the Survey of Labour and Income Dynamics (SLID).
Release date: 1995-12-30
29. SLID Following Rules: Who to Trace and Who to Interview Archived
Surveys and statistical programs – Documentation: 75F0002M1994001
Description:
This paper describes the Survey of Labour and Income Dynamics (SLID) following rules, which govern who is traced and who is interviewed. It also outlines the conceptual basis for these procedures.
Release date: 1995-12-30

Date modified: 2026-05-30