Statistical methods

Key indicators

Selected geographical area: Canada

Investment in new housing construction - Canada
(August 2018)

$5,106.5 million

-2.2%

(12-month change)
Residential construction investment - Canada
(Second quarter 2018)

$36,023.7 million

7.8%

(year-over-year change)

Results

All (2,478) (50 to 60 of 2,478 results)

51. Efficient Record Linkage for Large Datasets by Business Names Archived
Articles and reports: 11-522-X202500100019
Description: Accurate and efficient record linkage is crucial for maintaining a comprehensive and current Statistical Business Register (SBR) at Statistics Canada. Linking external business lists to the SBR by name presents computational and methodological challenges, especially as data volumes grow. This paper describes a scalable methodology that employs blocking techniques to constrain the computational search space and integrates multiple similarity measures—from edit distances and n-gram overlaps to embedding-based methods using Sentence-BERT (SBERT)—to identify likely matches. By combining simple character-level comparisons with more advanced semantic embedding methods, the approach can adapt to various naming conventions and complexities. While it does not guarantee superior accuracy in all circumstances, it offers a pragmatic balance between computational feasibility and linkage quality.
Release date: 2025-09-08
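The blocking-plus-similarity idea in the description above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the blocking key (first letter of the name), the character-trigram Jaccard measure, the 0.5 threshold and all function names are assumptions chosen for illustration, and the embedding-based (SBERT) step is left out.

```python
from collections import defaultdict


def ngrams(name: str, n: int = 3) -> set:
    """Return the set of character n-grams of a lower-cased, space-normalized name."""
    s = " ".join(name.lower().split())
    if len(s) < n:
        return {s}
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap between two n-gram sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def block_key(name: str) -> str:
    """Hypothetical blocking key: first character of the normalized name."""
    return name.strip().lower()[:1]


def link(external: list, register: list, threshold: float = 0.5) -> list:
    """Compare only names that share a block key; keep pairs scoring at or above the threshold."""
    blocks = defaultdict(list)
    for r in register:
        blocks[block_key(r)].append(r)
    matches = []
    for e in external:
        for r in blocks.get(block_key(e), []):
            score = jaccard(ngrams(e), ngrams(r))
            if score >= threshold:
                matches.append((e, r, round(score, 3)))
    return matches


pairs = link(["Maple Leaf Bakery Inc"],
             ["Maple Leaf Bakery Inc.", "Northern Lights Ltd"])
```

Blocking means the external name is never compared with "Northern Lights Ltd", which is how the search space is constrained; a coarser or finer key trades missed matches against computation.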
52. Evaluating the Accuracy when Linking Records in Waves Archived
Articles and reports: 11-522-X202500100020
Description: At Statistics Canada, many data sets are linked with quasi-identifiers such as the first name, last name, or address. In such cases, linkage errors are a potential concern and must be measured. In that regard, previous studies have shown that the evaluation may be based on modeling the number of links from a given record while accounting for all the interactions among the linkage variables and dispensing with clerical reviews, so long as the decision to link two records does not involve other records. In this communication, the methodology is adapted for a class of practical strategies, which violate this constraint by linking the records in consecutive waves, where a given wave links a subset of the records that are not linked in previous waves. In particular, the linkage may be based on a deterministic wave followed by a probabilistic one.
Release date: 2025-09-08
53. Model-Based Threshold Selection for Agricultural Linkages Archived
Articles and reports: 11-522-X202500100021
Description: Optimal threshold selection is a critical challenge in probabilistic linkage, with significant implications for the accuracy and reliability of linked datasets. This paper analyzes the performance of the neighbour model, a recently proposed error model which models linkage errors by the number of links from each record. Three threshold selection algorithms utilizing the neighbour model were assessed, highlighting the strengths and limitations of each. Their performance was assessed through simulation studies, which demonstrated that methods using the neighbour model achieved lower relative bias compared to two established methods for threshold selection. Additionally, the practical utility was validated through goodness-of-fit tests conducted on four agricultural datasets, showing the potential of the model for use in real-world applications.
Release date: 2025-09-08
54. T1 Redesign: T1 Partnership Identification Process Archived
Articles and reports: 11-522-X202500100022
Description: In Canada, T1 tax forms are used to report personal income, whether earned as an employee or through self-employment. Income from self-employment, or "T1 Business Income", is reported by sole proprietorships or partnerships. A T1 partnership involves two or more legal entities jointly filing for a shared business. T1 business data is received as individual filings, meaning a partnership's data is received separately for each partner. Internal record linkage within the T1 business database is performed to identify partnerships and prevent overcoverage within the final population of T1 businesses. This new T1 partnership identification process takes advantage of newer algorithms, such as DBSCAN, numerical clustering and fuzzy matching, to identify internal linkages. Graph theory is used to construct the list of partnerships from the row-pairs identified in the linkage process.
Release date: 2025-09-08
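One common way to turn linked row-pairs into groups is to treat each row as a graph node, each pair as an edge, and take the connected components. The sketch below does this with a union-find structure; the description does not say which graph method the process actually uses, so this is an assumption, and the row identifiers and function name are hypothetical.

```python
def group_partnerships(pairs: list) -> list:
    """Group row identifiers into connected components given linked row-pairs."""
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# Hypothetical row-pairs produced by a linkage step.
groups = group_partnerships([("r1", "r2"), ("r2", "r3"), ("r7", "r8")])
```

Here rows r1, r2 and r3 end up in one group even though r1 and r3 were never directly paired, which is the reason for a graph step after pairwise linkage.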
55. Development of Linkage-Adjusted Weights Accounting for Gender for the 2021 Canadian Census Health and Environment Cohort Archived
Articles and reports: 11-522-X202500100023
Description: The latest Canadian Census Health and Environment Cohort (CanCHEC) continues a series of population-based microdata linkages focused on population health research by demographic, social and economic characteristics. The 2021 CanCHEC consists of 95.5% of the 2021 Census long-form sample survey records. The records of survey respondents that could not be linked to the Derived Record Depository and those presumed to be duplicates account for the remaining 4.5%. Linkage-adjusted main and replicate weights allow researchers to estimate and evaluate the variance of summary measures about population health in the presence of missed linked pairs to better understand the experiences of diverse population groups.
Release date: 2025-09-08
56. The Future of National Statistical Organisations: The Longer-Term Role and Shape of NSOs Archived
Articles and reports: 11-522-X202500100024
Description: This paper explores a vision for the future of National Statistics Offices (NSOs). It analyses the history and role of NSOs, then explores their current and future challenges and opportunities, and finally outlines a future in which NSOs become more agile, open, and collaborative while maintaining their high level of trust in the community, thereby allowing them to fulfil their new role as data stewards in a rapidly evolving data landscape.
Release date: 2025-09-08
57. Statistical Inference for a Finite Population Mean with Machine Learning-Based Imputation for Missing Survey Data Archived
Articles and reports: 11-522-X202500100025
Description: National statistical offices have increasingly adopted machine learning (ML) for its potential to improve survey estimates. ML techniques offer significant advantages, notably the ability to manage high-dimensional data and to capture complex, nonlinear relationships, thereby enhancing the overall quality of survey statistics. In this article, following the approach of Chernozhukov et al. (2018), we describe a double debiased machine learning framework that enables valid statistical inference when imputed estimators are derived from ML procedures. Simulation results suggest that the proposed framework performs well in a wide range of scenarios.
Release date: 2025-09-08
58. A Safe and Inclusive Approach to Disseminating Statistical Information about the Non-binary Population in Canada Archived
Articles and reports: 11-522-X202500100026
Description: In 2022, Canada became the first country to release statistical information about its transgender and non-binary populations based on census data. Moreover, following a 2018 government-wide policy direction, Statistics Canada's surveys have been collecting and disseminating information about gender by default rather than sex at birth. Due to the small size of the transgender and non-binary populations, disseminating safe statistical information about them at detailed geographical levels poses a challenge.
Release date: 2025-09-08
59. One-Stop-Shop for Artificial Intelligence and Machine Learning for Official Statistics Archived
Articles and reports: 11-522-X202500100027
Description: Several challenges encountered when constructing U.S. administrative record-based (AR-based) population estimates for 2020 are identified. They include locational accuracy, person coverage and its consistency over time, filtering out non-residents and people not alive on the reference date, uncovering missing links across person and address records, and predicting demographic characteristics. Several ways to address these issues are discussed. Regression results illustrate how the challenges and solutions affect the AR-based county population estimates.
Release date: 2025-09-08
60. Exploration of Approaches to Small Area Estimation with Measurement Errors and their Application to Indonesian Household Surveys Archived
Articles and reports: 11-522-X202500100028
Description: The United Nations Sustainable Development Goals require detailed, disaggregated data, typically obtained through household surveys. However, surveys alone cannot meet these needs for granular statistics. To address this, National Statistical Institutes adopt small area methods, but these face challenges as auxiliary variables, often derived from surveys, introduce measurement errors into the models. The aim is to apply measurement error correction in the classic Fay-Herriot area-level model. The results demonstrate the robustness of the standard approach of ignoring measurement error, but show that there are specific scenarios where correcting for measurement errors is beneficial. The approach is applied to a case study utilizing Indonesian household survey data.
Release date: 2025-09-08

Data (10) (10 results)

1. Social Policy Simulation Database and Model (SPSD/M)
Public use microdata: 89F0002X
Description: The SPSD/M is a static microsimulation model designed to analyse financial interactions between governments and individuals in Canada. It can compute taxes paid to and cash transfers received from government. It comprises a database, a series of tax/transfer algorithms and models, analytical software and user documentation.
Release date: 2026-02-12
2. National Address Register
Profile of a community or region: 46-26-0002
Description: The National Address Register (NAR) is a list of commercial and residential addresses in Canada that are extracted from Statistics Canada's Building Register and deemed non-confidential.
Release date: 2025-12-19
3. PASSAGES microsimulation model
Table: 89-26-0006
Description: PASSAGES is an open-source dynamic microsimulation model aimed at supporting policy analysis and research relating to Canadian retirement income system outcomes at the individual and family level. The publicly available version includes a synthetic starting database, a model, and documentation. A confidential starting database is also available.
Release date: 2025-03-12
4. Canadian Statistical Geospatial Explorer Hub Archived
Data Visualization: 71-607-X2020010
Description: The Canadian Statistical Geospatial Explorer empowers users to discover geo-enabled data holdings of Statistics Canada at various levels of geography, including the neighbourhood level. Users are able to visualize, thematically map, spatially explore and analyze, export and consume data in various formats. Users can also view the data superimposed on satellite imagery, topographic and street layers.
Release date: 2024-08-21
5. Income divergence index (D-index) by census tract
Table: 11-10-0074-01
Geography: Census tract
Frequency: Occasional
Description: The divergence index (D-index) describes the degree that families with different income levels are mixing together in neighbourhoods. It compares neighbourhood (census tract, CT) discrete income distributions to a base distribution, which is the income quintiles of the neighbourhood’s census metropolitan area (CMA).
Release date: 2020-06-22
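As a rough illustration of comparing a neighbourhood's income distribution with a quintile base, the Python sketch below computes a Kullback-Leibler-style divergence. The exact formula behind the published D-index is not given in the description, so the functional form, the function name and the example shares are assumptions. Because the base is defined by CMA income quintiles, each base share is 0.2.

```python
import math


def divergence_index(local_shares: list, base_shares: list) -> float:
    """Sum of p * ln(p / q) over income groups (assumed form, not the official definition)."""
    total = 0.0
    for p, q in zip(local_shares, base_shares):
        if p > 0:  # a group with zero local share contributes nothing
            total += p * math.log(p / q)
    return total


base = [0.2] * 5  # CMA quintiles: one fifth of families in each income group

# Hypothetical census tracts: one mirrors the CMA, one is concentrated in the lowest group.
mixed = divergence_index([0.2, 0.2, 0.2, 0.2, 0.2], base)
skewed = divergence_index([0.5, 0.2, 0.1, 0.1, 0.1], base)
```

Under this assumed form, a tract whose income mix matches the CMA scores zero, and the score grows as the tract's mix departs from the base distribution.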
6. Housing Data Viewer Archived
Data Visualization: 71-607-X2019010
Description: The Housing Data Viewer is a visualization tool that allows users to explore Statistics Canada data on a map. Users can use the tool to navigate, compare and export data.
Release date: 2019-10-30
7. Findings of the Canadian Vehicle Fuel Pilot Survey Archived
Table: 53-500-X
Description: This report presents the results of a pilot survey conducted by Statistics Canada to measure the fuel consumption of on-road motor vehicles registered in Canada. This study was carried out in connection with the Canadian Vehicle Survey (CVS) which collects information on road activity such as distance traveled, number of passengers and trip purpose.
Release date: 2004-10-21
8. National Tourism Indicators, Historical Estimates Archived
Table: 13-220-X
Description: In the 1997 edition, new and revised benchmarks were introduced for 1992 and 1988. The indicators are used to monitor supply, demand and employment for tourism in Canada on a timely basis. The annual tables are derived using the National Income and Expenditure Accounts (NIEA) and various industry and travel surveys. Tables providing actual data and percentage changes, for seasonally adjusted current and constant price estimates are included. In addition, an analytical section provides graphs, and time series of first differences, percentage changes, and seasonal factors for selected indicators. Data are published from 1987 and the publication will be available on the day of release. New data are included in the demand tables for non-tourism commodities produced by non-tourism industries and in the employment tables covering direct tourism employment generated by non-tourism industries. This product was commissioned by the Canadian Tourism Commission to provide annual updates for the Tourism Satellite Account.
Release date: 2003-01-08
9. Historical Statistics of Canada Archived
Table: 11-516-X
Description:
The second edition of Historical statistics of Canada was jointly produced by the Social Science Federation of Canada and Statistics Canada in 1983. This volume contains about 1,088 statistical tables on the social, economic and institutional conditions of Canada from the start of Confederation in 1867 to the mid-1970s. The tables are arranged in sections with an introduction explaining the content of each section, the principal sources of data for each table, and general explanatory notes regarding the statistics. In most cases, there is sufficient description of the individual series to enable the reader to use them without consulting the numerous basic sources referenced in the publication.
The electronic version of this historical publication is accessible on the Internet site of Statistics Canada as a free downloadable document: text as HTML pages and all tables as individual spreadsheets in a comma delimited format (CSV) (which allows online viewing or downloading).
Release date: 1999-07-29
10. National Population Health Survey Overview Archived
Table: 82-567-X
Description:
The National Population Health Survey (NPHS) is designed to enhance the understanding of the processes affecting health. The survey collects cross-sectional as well as longitudinal data. In 1994/95 the survey interviewed a panel of 17,276 individuals, then returned to interview them a second time in 1996/97. The response rate for these individuals was 96% in 1996/97. Data collection from the panel will continue for up to two decades. For cross-sectional purposes, data were collected for a total of 81,000 household residents in all provinces (except people on Indian reserves or on Canadian Forces bases) in 1996/97.
This overview illustrates the variety of information available by presenting data on perceived health, chronic conditions, injuries, repetitive strains, depression, smoking, alcohol consumption, physical activity, consultations with medical professionals, use of medications and use of alternative medicine.
Release date: 1998-07-29

Analysis (2,036) (2,020 to 2,030 of 2,036 results)

2,021. Some variance estimators for multistage sampling Archived
Articles and reports: 12-001-X197500254829
Description: J.N.K. Rao (1975) derived a general formula for estimating the variance in multistage sample designs. This general formula extends the previous results by Des Raj (1966) to the case where the conditional variance from a given primary sampling unit is a random variable. The authors reviewed Rao's paper for its application to Horvitz-Thompson and Yates-Grundy variance estimators as well as the variance estimator for the random group method by Rao, Hartley and Cochran (1962). The authors present an altered version of the Yates-Grundy variance estimators as a result of Rao's paper.
Release date: 1975-12-15
2,022. On the improvement of sample survey estimates Archived
Articles and reports: 12-001-X197500254830
Description: This paper focuses on the improvement of sample survey estimates in the particular situation where the survey sample, or part of it, is included in a larger sample from which auxiliary information is available. The properties of a method of estimation - sometimes applied in specific circumstances - are investigated and the limitations of its application are found. The application of the method to rotation designs in continuing surveys is more closely studied in the context of composite estimation.
Release date: 1975-12-15
2,023. The telephone experiment in the Canadian Labour Force Survey Archived
Articles and reports: 12-001-X197500254831
Description: This paper summarizes the results of a telephone experiment conducted in conjunction with the Canadian Labour Force Survey over the period June 1972 to November 1973. Included in the paper is a detailed outline of the purpose and design of the experiment. A discussion of the impact telephone interviewing had on the cost of enumeration, non-response and participation and unemployment rates is given. In addition, interviewer and respondent attitudes toward telephone interviewing are described. Finally, the paper summarizes the experiences gained from this experiment and indicates some areas where further examinations related to telephone interviewing can be carried out.
Release date: 1975-12-15
2,024. On a ratio estimate with post-stratified weighting Archived
Articles and reports: 12-001-X197500254832
Description: A ratio estimate based on an auxiliary variable is considered for the case when the sample is post-stratified using information on another auxiliary variable. The variance of the ratio estimate is derived by the method of linearization [3,4]. An application to subprovincial estimation in the Canadian Labour Force Survey is discussed.
Release date: 1975-12-15
2,025. Analytic studies of sample survey data Archived
Articles and reports: 12-001-X197500300001
Description: Most sample surveys in the past have been "descriptive" in the sense that the main objective is the computation of means or totals of a number of characters of interest along with their standard errors. However, in recent years data produced from "descriptive" surveys are also being increasingly used for "analytical" purposes, i.e., for investigating relationships among variables. Also some sample surveys might have primary "analytical goals" in which case the "optimal" designing of such "analytical surveys" becomes important.
These lecture notes present an account of some recent developments in the analytical studies of sample survey data. Many challenging problems remain to be solved and I hope these notes will provide stimulation for further research in this important area.
Release date: 1975-12-15
2,026. Measurement of response errors in Censuses and sample surveys Archived
Articles and reports: 12-001-X197500254824
Description: Madow [1968] has proposed a two-phase sampling scheme under which response bias can be eliminated from sample surveys by obtaining “true” values for a subsample of the original sample. Often in cases of Censuses or ongoing surveys, the subsample data are not used to correct the main survey estimates but to assess their reliability. The main purpose of this paper is to present methods by which reliability estimates can be obtained when true values can be determined for a subsample of units.
Release date: 1975-12-15
2,027. Controlled random rounding Archived
Articles and reports: 12-001-X197500254825
Description: Random rounding is a technique to ensure confidentiality of aggregate statistics. When all the components of a total are randomly rounded independently, and the total itself is also randomly rounded, substantial discrepancies may arise when aggregating the published data. This paper presents a procedure which avoids substantial discrepancies while still protecting the concept of confidentiality.
Release date: 1975-12-15
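The discrepancy problem described above can be seen in a short Python sketch of plain (uncontrolled) unbiased random rounding. This is not the controlled procedure proposed in the paper; the rounding base of 5, the function name and the example counts are assumptions for illustration.

```python
import random
from typing import Optional


def random_round(value: int, base: int = 5, rng: Optional[random.Random] = None) -> int:
    """Round a non-negative count to an adjacent multiple of `base`, unbiased in expectation."""
    rng = rng or random.Random()
    remainder = value % base
    if remainder == 0:
        return value  # already a multiple of the base
    lower = value - remainder
    # Round up with probability remainder/base so the expected result equals `value`.
    return lower + base if rng.random() < remainder / base else lower


rng = random.Random(0)  # fixed seed so the run is repeatable
components = [12, 7, 3]  # hypothetical cell counts

# Each component and the total are rounded independently of one another.
rounded_parts = [random_round(c, rng=rng) for c in components]
rounded_total = random_round(sum(components), rng=rng)

# Difference between the sum of published components and the published total.
discrepancy = sum(rounded_parts) - rounded_total
```

Because every cell is rounded on its own, the sum of the rounded components need not equal the rounded total; a controlled scheme constrains the rounding so that such gaps stay small.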
2,028. The development of an automated estimation system Archived
Articles and reports: 12-001-X197500100001
Description: Although a survey is designed to satisfy a specific set of survey constraints, some steps involved in designing a survey, such as stratification, sample allocation and sample selection are common to all surveys. The steps involved in the creation of survey design systems are to identify, develop and implement common methods and procedures for such stages which, when taken together, constitute a survey design. The paper describes some methodological considerations in the development of an automated system for three methods of ratio estimation.
Release date: 1975-06-16
2,029. A computer algorithm for joint probabilities of selection (Systematic PPS sampling) Archived
Articles and reports: 12-001-X197500100002
Description: In 1962, Hartley and Rao derived an asymptotic formula for the joint probability of selection for samples selected with unequal probability sampling. In 1966, Connor derived an exact formula for this joint probability; however, his formulae were very involved. In the present paper, the authors, using a modification of Connor's formula, derive the exact joint probabilities with a specially designed computer algorithm.
Release date: 1975-06-16
2,030. Sample design of the Family Expenditure Survey (1974) Archived
Articles and reports: 12-001-X197500100003
Description: In order to monitor changes in expenditure patterns and, if necessary, provide information for a reweighting of the Consumer Price Index, family expenditure surveys have been carried out at approximately two-year intervals since 1953.
While all of the Family Expenditure Surveys have utilized the Canadian Labour Force Survey [1] frame, the particular survey in 1974 was designed somewhat differently from earlier surveys in that segments or city blocks were specially selected for the survey and there was strict control on the sample size not adhered to in earlier surveys.
The sample design, from the considerations based on the broad requirements of the survey to the details of the sampling procedures, is described in this article.
Release date: 1975-06-16

Reference (380) (20 to 30 of 380 results)

21. Methods for Constructing Life Tables for Canada, Provinces and Territories
Surveys and statistical programs – Documentation: 84-538-X
Geography: Canada
Description: This electronic publication presents the methodology underlying the production of the life tables for Canada, provinces and territories.
Release date: 2023-08-28
22. Agriculture–Population Linkage: Data quality report
Surveys and statistical programs – Documentation: 32-26-0006
Description: This report provides data quality information pertaining to the Agriculture–Population Linkage, such as sources of error, matching process, response rates, imputation rates, sampling, weighting, disclosure control methods and data quality indicators.
Release date: 2023-08-25
23. Aggregated and derived income concepts and income statistics, 2021 Census of Population
Surveys and statistical programs – Documentation: 98-20-00032021011
Description: This video explains the key concepts of different levels of aggregation of income data such as household and family income; income concepts derived from key income variables such as adjusted income and equivalence scale; and statistics used for income data such as median and average income, quartiles, quintiles, deciles and percentiles.
Release date: 2023-03-29
24. Low-income concepts and statistics, 2021 Census of Population
Surveys and statistical programs – Documentation: 98-20-00032021012
Description: This video builds on concepts introduced in the other videos on income. It explains key low-income concepts, namely the Market Basket Measure (MBM), the low-income measure (LIM) and the low-income cut-offs (LICO), and the indicators associated with these concepts, such as the low-income gap and the low-income ratio. These concepts are used in analysis of the economic well-being of the population.
Release date: 2023-03-29
25. Longitudinal Immigration Database (IMDB) Technical Report, 2021
Surveys and statistical programs – Documentation: 11-633-X2022009
Description: The Longitudinal Immigration Database (IMDB) is a comprehensive source of data that plays a key role in the understanding of the economic behaviour of immigrants. It is the only annual Canadian dataset that allows users to study the characteristics of immigrants to Canada at the time of admission and their economic outcomes and regional (inter-provincial) mobility over a time span of more than 35 years.

This report will discuss the IMDB data sources, concepts and variables, record linkage, data processing, dissemination, data evaluation and quality indicators, comparability with other immigration datasets, and the analyses possible with the IMDB.
Release date: 2022-12-05
26. Guide to the Census of Agriculture
Surveys and statistical programs – Documentation: 32-26-0002
Description: This reference guide may be useful to both new and experienced users who wish to familiarize themselves with and find specific information about the Census of Agriculture.

It provides an overview of the Census of Agriculture communications, content determination, collection, processing, data quality evaluation and dissemination activities. It also summarizes the key changes to the census and other useful information.
Release date: 2022-04-14
27. Standard Geographical Classification (SGC) Volume II. Reference Maps
Geographic files and documentation: 12-572-X
Description:
The Standard Geographical Classification (SGC) provides a systematic classification structure that categorizes all of the geographic area of Canada. The SGC is the official classification used in the Census of Population and other Statistics Canada surveys.
The classification is organized in two volumes: Volume I, The Classification and Volume II, Reference Maps.
Volume II contains reference maps showing boundaries, names, codes and locations of the geographic areas in the classification. The reference maps show census subdivisions, census divisions, census metropolitan areas, census agglomerations, census metropolitan influenced zones and economic regions. Definitions for these terms are found in Volume I, The Classification. Volume I describes the classification and related standard geographic areas and place names.
The maps in Volume II can be downloaded in PDF format from our website.
Release date: 2022-02-09
28. Longitudinal Immigration Database (IMDB) Technical Report, 2020
Surveys and statistical programs – Documentation: 11-633-X2021008
Description: The Longitudinal Immigration Database (IMDB) is a comprehensive source of data that plays a key role in the understanding of the economic behaviour of immigrants. It is the only annual Canadian dataset that allows users to study the characteristics of immigrants to Canada at the time of admission and their economic outcomes and regional (inter-provincial) mobility over a time span of more than 35 years. The IMDB includes Immigration, Refugees and Citizenship Canada (IRCC) administrative records which contain exhaustive information about immigrants who were admitted to Canada since 1952. It also includes data about non-permanent residents who have been issued temporary resident permits since 1980. This report will discuss the IMDB data sources, concepts and variables, record linkage, data processing, dissemination, data evaluation and quality indicators, comparability with other immigration datasets, and the analyses possible with the IMDB.
Release date: 2021-12-06
29. Statistics: Power from Data!
Surveys and statistical programs – Documentation: 12-004-X
Description: Statistics: Power from Data! is a web resource that was created in 2001 to assist secondary students and teachers of Mathematics and Information Studies in getting the most from statistics. Over the past 20 years, this product has become one of Statistics Canada's most popular references for students, teachers, and many other members of the general population. This product was last updated in 2021.
Release date: 2021-09-02
30. Multi-year Consolidated Plan for Research, Modelling and Data Development, 2021 to 2023 Archived
Surveys and statistical programs – Documentation: 11-633-X2021005
Description:
The Analytical Studies and Modelling Branch (ASMB) is the research arm of Statistics Canada mandated to provide high-quality, relevant and timely information on economic, health and social issues that are important to Canadians. The branch strategically makes use of expert knowledge and a broad range of data sources and modelling techniques to address the information needs of a broad range of government, academic and public sector partners and stakeholders through analysis and research, modelling and predictive analytics, and data development. The branch strives to deliver relevant, high-quality, timely, comprehensive, horizontal and integrated research and to enable the use of its research through capacity building and strategic dissemination to meet the user needs of policy makers, academics and the general public.
This Multi-year Consolidated Plan for Research, Modelling and Data Development outlines the priorities for the branch over the next two years.
Release date: 2021-08-12

Date modified: 2026-05-14