Data analysis
Results
All (289) (0 to 10 of 289 results)
- Articles and reports: 36-28-0001202600500003 Description: This spotlight article outlines practical methods for assessing the economic impacts of public programs delivered by federal agencies and Crown corporations. It summarizes key steps in conducting quantitative impact analysis, including data linkage, cohort construction and implementation of quasi-causal estimators. Release date: 2026-05-27
- Journals and periodicals: 11-633-X Description: Papers in this series provide background discussions of the methods used to develop data for economic, health, and social analytical studies at Statistics Canada. They are intended to provide readers with information on the statistical methods, standards and definitions used to develop databases for research purposes. All papers in this series have undergone peer and institutional review to ensure that they conform to Statistics Canada's mandate and adhere to generally accepted standards of good professional practice. Release date: 2026-05-27
- Surveys and statistical programs – Documentation: 11-633-X2026001 Description: This report defines key concepts related to area-level analysis and introduces area-level measures developed and utilized at Statistics Canada for health analysis. It also provides a decision-making framework and practical recommendations to help researchers select appropriate methods. The goal is to guide readers on when area-level analysis is appropriate and what type of area-level measure is suitable to achieve research objectives. Release date: 2026-03-05
- Articles and reports: 12-001-X202500200004 Description: The class of generalized linear models (GLM) is a flexible generalization of ordinary least squares regression that allows the linear model to be related to the response variable via a link function and assumes the magnitude of the variance of each measurement to be a function of its predicted value. Multicollinearity in GLMs can inflate variances of the estimated coefficients and cause poor prediction in certain regions of the regression space. It may also cause a nonsignificant Wald statistic even when the predictors are highly predictive in a model of the family of GLMs. Little previous research has closely investigated the diagnostics of multicollinearity in GLMs, especially when complex survey data are used. In this paper, we develop variance inflation factors (VIFs) that measure the amount that the variance of a parameter estimator is increased due to multicollinearity in GLMs. We also extend VIFs and condition indexes to apply to complex survey data, accounting for design features, e.g. weights, clusters, and strata. Illustrations of these methods are given using data from a household survey of health and nutrition. Release date: 2025-12-23
- Stats in brief: 89-20-00062025001 Description: This video is designed to help you critically assess the data presented to you. No data is perfect. By understanding the strengths and limitations of the data, you can avoid being misled and make smarter, more informed decisions. Release date: 2025-12-15
- Articles and reports: 11-522-X202500100010 Description: Statistics Canada's Labour Force Survey (LFS) plays an essential role in the estimation of labour market conditions in Canada. Periodically, the LFS revises its data to the most recent industry and occupational classification versions. Differences between versions can be extensive, including high-level and unit-group structural changes, creations, deletions, split-offs and combinations of classification units (classes). Historically, to reconcile split-off classes, where one class splits into multiple classes, a sample of LFS split-off records would be manually recoded to the new classification version. Based on the split-off proportion observed in the recoded sample, a random allocation method would be applied to all data to reflect the changing Canadian labour market over time. This article proposes using machine learning (fastText), constrained to split-off proportions using linear programming, to revise industry and occupation classifications in the LFS. The hybrid framework benefits from a text-based revision mechanism while adhering to traditional proportion-driven estimates, thus ensuring a minimal impact on the comparability of published labour market indicators. Release date: 2025-09-08
- Articles and reports: 36-28-0001202500300002 Description: Government programs are evaluated to measure their effectiveness. This article discusses the benefits of using Statistics Canada data combined with the data collected from the government program to provide a far more comprehensive evaluation than program data alone can offer. The article also summarizes a recent example of a program evaluation that benefited from Statistics Canada data and the expertise of Statistics Canada researchers in analyzing the data. Release date: 2025-03-26
- Articles and reports: 12-001-X202400200004 Description: While we avoid specifying the parametric relationship between the study variable and covariates, we illustrate the advantage of including a spatial component to better account for the covariates in our models to make Bayesian predictive inference. We treat each unique covariate combination as an individual stratum, then we use small area estimation techniques to make inference about the finite population mean of the continuous response variable. The two spatial models used are the conditional autoregressive and simple conditional autoregressive models. We include the spatial effects by creating the adjacency matrix via the Mahalanobis distance between covariates. We also show how to incorporate survey weights into the spatial models when dealing with probability survey data. We compare the results of two non-spatial models including the Scott-Smith model and the Battese, Harter, and Fuller model to the spatial models. We illustrate the comparison between the aforementioned models with an application using BMI data from eight counties in California. Our goal is to have neighboring strata yield similar predictions, and to increase the difference between strata that are not neighbors. Ultimately, using the spatial models shows less global pooling compared to the non-spatial models, which was the desired outcome. Release date: 2024-12-20
- Articles and reports: 12-001-X202400200005 Description: Adaptive survey designs (ASDs) tailor recruitment protocols to population subgroups that are relevant to a survey. In recent years, effective ASD optimization has been the topic of research and several applications. However, the performance of an optimized ASD over time is sensitive to time changes in response propensities. How adaptation strategies can adjust to such variation over time is not yet fully understood. In this paper, we propose a robust optimization approach in the context of sequential mixed-mode surveys employing Bayesian analysis. The approach is formulated as a mathematical programming problem that explicitly accounts for uncertainty due to time change. ASD decisions can then be made by considering time-dependent variation in conditional mode response propensities and between-mode correlations in response propensities. The approach is demonstrated using a case study: the 2014-2017 Dutch Health Survey. We evaluate the sensitivity of ASD performance to 1) the budget level and 2) the length of applicable historic time-series data. We find there is only a moderate dependence on the budget level and the dependence on historic data is moderated by the amount of seasonality during the year. Release date: 2024-12-20
- Surveys and statistical programs – Documentation: 11-633-X2024004 Description: The Longitudinal Immigration Database (IMDB) is a comprehensive source of data that plays a key role in the understanding of the economic behaviour of immigrants. It is the only annual Canadian dataset that allows users to study the characteristics of immigrants to Canada at the time of admission and their economic outcomes and regional (inter-provincial) mobility over a time span of more than 40 years. Release date: 2024-12-09
Data (2) (2 results)
- 1. Canadian Statistical Geospatial Explorer Hub (Archived). Data Visualization: 71-607-X2020010 Description: The Canadian Statistical Geospatial Explorer empowers users to discover geo-enabled data holdings of Statistics Canada at various levels of geography, including at the neighbourhood level. Users are able to visualize, thematically map, spatially explore and analyze, export and consume data in various formats. Users can also view the data superimposed on satellite imagery, topographic and street layers. Release date: 2024-08-21
- 2. Housing Data Viewer (Archived). Data Visualization: 71-607-X2019010 Description: The Housing Data Viewer is a visualization tool that allows users to explore Statistics Canada data on a map. Users can use the tool to navigate, compare and export data. Release date: 2019-10-30
Analysis (256) (40 to 50 of 256 results)
- Articles and reports: 11-522-X202100100018 Description: Statistics Finland started publishing nowcasts of the trend indicator of output (TIO), the monthly indicator of real economic activity, to answer users' needs during the Covid-19 pandemic. The indicator was first published in April 2020, at the very beginning of the pandemic in Finland, and had a monthly release schedule until June 2021. The TIO nowcasts are produced using open-source data on truck traffic volumes at about 100 automatic measuring points in the Helsinki/Uusimaa region and the Economic Sentiment Indicator for Finland. Estimation is done using a machine learning approach and the methodology is based on previous work done by Statistics Finland and ETLA Economic Research.
Key Words: nowcasting; flash estimates; machine learning; experimental statistics.
Release date: 2021-10-29
- Articles and reports: 11-522-X202100100025 Description: We propose a longitudinal analysis with a point of view connected to the organizational changes that have taken place in the Italian National Institute of Statistics in recent years. In 2016 the Institute introduced a new Directorate, intending to standardize and generalize the business process of Data Collection according to the European standard of the GAMSO model. The paper discusses the pros and cons of this change from the perspective of survey participation. The ICT survey response rate analysis demonstrates an increase of around 20% since the beginning of the new organization: the paper tries to focus on the impact of the changes introduced with the new organization. We focused our attention on two specific subsets of respondents: the so-called "wanted", who have never answered an ICT survey or any other Istat survey, and the so-called "lost", who were included in two consecutive survey samples and answered in the previous edition but not in the current one. The paper aims to illustrate how an efficient organization of data collection reflects its benefits on survey results and what kind of actions should be taken to catch the attention of the "wanted". Finally, we apply a logistic model measuring the probability that an enterprise responding in 2018 (t-1) also answered in 2019 (t). Taken together, the analyses suggest some actions that could be taken to improve respondents' participation, data quality, and respondents' perception of official statistics.
Key Words: data collection strategy, response rate, paradata, response burden, ICT Survey.
Release date: 2021-10-29
- Articles and reports: 11-522-X202100100026 Description: The Government of Canada's Directive on Open Government aims to ensure that Canadians have greater access to government data and information. One solution for open data is smart synthetic files, which retain as much analytical value as possible and take into account confidentiality issues that arise from collecting personal information. In recent years, Statistics Canada has acquired a recognized expertise in producing synthetic data files of high analytical value. In a current project, Statistics Canada is tackling a new challenge to synthesize a database and preserve hierarchical structures in the form of families, where records are linked and share common traits that must be maintained. These challenges are also encountered when synthesizing structured data such as business data. This paper presents the challenges and solutions for building synthetic data with such hierarchical structures. Application of this strategy will be illustrated with the development of a synthetic database that supports the development of retirement income policies. This database includes over 20 variables and 8 million records structured into approximately 4 million family units. We will present how family structures have been preserved, discuss the practical and technical challenges inherent in developing such a large and complex database, present the risk and utility of the data, and propose avenues for future research.
Keywords: synthetic data of high analytical value; family structures; modern data access solution.
Release date: 2021-10-29
- Articles and reports: 11-522-X202100100027 Description: Privacy concerns are a barrier to applying remote analytics, including machine learning, on sensitive data via the cloud. In this work, we use a leveled fully Homomorphic Encryption scheme to train an end-to-end supervised machine learning algorithm to classify texts while protecting the privacy of the input data points. We train our single-layer neural network on a large simulated dataset, providing a practical solution to a real-world multi-class text classification task. To improve both accuracy and training time, we train an ensemble of such classifiers in parallel using ciphertext packing.
Key Words: Privacy Preservation, Machine Learning, Encryption
Release date: 2021-10-29
- 45. Statistical Disclosure Control and Developments in Formal Privacy: In Memoriam to Chris Skinner (Archived). Articles and reports: 11-522-X202100100022 Description: I provide an overview of the evolution of Statistical Disclosure Control (SDC) research over the last decades and how it has evolved to handle the data revolution with more formal definitions of privacy. I emphasize the many contributions by Chris Skinner in the research areas of SDC. I will review his seminal research, starting in the 1990s with his work on the release of UK Census sample microdata. This led to a wide range of research on measuring the risk of re-identification in survey microdata through probabilistic models. I also focus on other aspects of Chris' research in SDC. Chris was the recipient of the 2019 Waksberg Award and sadly never got a chance to present his Waksberg Lecture at the Statistics Canada International Methodology Symposium. This paper follows the outline that Chris had prepared for that lecture, which was provided to me by his son, Tom Skinner.
Keywords: Risk of Re-identification, Data Revolution, Privacy Models, Differential Privacy
Release date: 2021-10-22
- Articles and reports: 11-522-X202100100021 Description: Istat has started a new project for the Short Term statistical processes, to satisfy the coming new EU Regulation to release estimates in a shorter time. The assessment and analysis of the current Short Term Survey on Turnover in Services (FAS) survey process aims at identifying how the best features of the current methods and practices can be exploited to design a more "efficient" process. In particular, the project is expected to release methods that would allow important economies of scale, scope and knowledge to be applied in general to the STS productive context, usually working with a limited number of resources. The analysis of the AS-IS process revealed that the FAS survey incurs substantial E&I costs, especially due to intensive follow-up and interactive editing that is used for every type of detected error. In this view, we tried to exploit the lessons learned by participating in the High-Level Group for the Modernisation of Official Statistics (HLG-MOS, UNECE) about the Use of Machine Learning in Official Statistics. In this work, we present a first experiment using Random Forest models to: (i) predict which units represent "suspicious" data, (ii) assess the prediction potential use over new data and (iii) explore data to identify hidden rules and patterns. In particular, we focus on the use of Random Forest modelling to compare some alternative methods in terms of error prediction efficiency and to address the major aspects for the new design of the E&I scheme.
Release date: 2021-10-15
- Articles and reports: 12-001-X202100100003 Description: One effective way to conduct statistical disclosure control is to use scrambled responses. Scrambled responses can be generated by using a controlled random device. In this paper, we propose using the sample empirical likelihood approach to conduct statistical inference under complex survey design with scrambled responses. Specifically, we propose using a Wilk-type confidence interval for statistical inference. Our proposed method can be used as a general tool for inference with confidential public use survey data files. Asymptotic properties are derived, and a limited simulation study verifies the validity of the theory. We further apply the proposed method to some real applications.
Release date: 2021-06-24
- 48. Statistics 101: Correlation and Causality (Archived). Stats in brief: 89-20-00062021002 Description: This video is intended for viewers who wish to gain a basic understanding of correlation and causality. As a prerequisite, before beginning this video, we highly recommend having already completed our videos titled "What is Data? An Introduction to Data Terminology and Concepts" and "Types of Data: Understanding and Exploring Data".
Release date: 2021-05-03
- Articles and reports: 11-633-X2021003 Description: Canada continues to experience an opioid crisis. While there is solid information on the demographic and geographic characteristics of people experiencing fatal and non-fatal opioid overdoses in Canada, there is limited information on the social and economic conditions of those who experience these events. To fill this information gap, Statistics Canada collaborated with existing partnerships in British Columbia, including the BC Coroners Service, BC Stats, the BC Centre for Disease Control and the British Columbia Ministry of Health, to create the Statistics Canada British Columbia Opioid Overdose Analytical File (BC-OOAF).
Release date: 2021-02-17
- Articles and reports: 11-633-X2021001 Description: Using data from the Canadian Housing Survey, this project aimed to construct a measure of social inclusion, using indicators identified by the Canada Mortgage and Housing Corporation (CMHC), to report a social inclusion score for each geographic stratum separately for dwellings that are and are not in social and affordable housing. This project also sought to examine associations between social inclusion and a set of economic, social and health variables.
Release date: 2021-01-05
Reference (26) (0 to 10 of 26 results)
- Surveys and statistical programs – Documentation: 11-633-X2026001 Description: This report defines key concepts related to area-level analysis and introduces area-level measures developed and utilized at Statistics Canada for health analysis. It also provides a decision-making framework and practical recommendations to help researchers select appropriate methods. The goal is to guide readers on when area-level analysis is appropriate and what type of area-level measure is suitable to achieve research objectives. Release date: 2026-03-05
- Surveys and statistical programs – Documentation: 11-633-X2024004 Description: The Longitudinal Immigration Database (IMDB) is a comprehensive source of data that plays a key role in the understanding of the economic behaviour of immigrants. It is the only annual Canadian dataset that allows users to study the characteristics of immigrants to Canada at the time of admission and their economic outcomes and regional (inter-provincial) mobility over a time span of more than 40 years. Release date: 2024-12-09
- Surveys and statistical programs – Documentation: 11-633-X2024001 Description: The Longitudinal Immigration Database (IMDB) is a comprehensive source of data that plays a key role in the understanding of the economic behaviour of immigrants. It is the only annual Canadian dataset that allows users to study the characteristics of immigrants to Canada at the time of admission and their economic outcomes and regional (inter-provincial) mobility over a time span of more than 35 years. Release date: 2024-01-22
- Surveys and statistical programs – Documentation: 32-26-0006 Description: This report provides data quality information pertaining to the Agriculture–Population Linkage, such as sources of error, matching process, response rates, imputation rates, sampling, weighting, disclosure control methods and data quality indicators. Release date: 2023-08-25
- Surveys and statistical programs – Documentation: 98-20-00032021011 Description: This video explains the key concepts of different levels of aggregation of income data such as household and family income; income concepts derived from key income variables such as adjusted income and equivalence scale; and statistics used for income data such as median and average income, quartiles, quintiles, deciles and percentiles. Release date: 2023-03-29
- Surveys and statistical programs – Documentation: 98-20-00032021012 Description: This video builds on concepts introduced in the other videos on income. It explains key low-income concepts, namely the Market Basket Measure (MBM), the Low-income measure (LIM) and the Low-income cut-offs (LICO), and the indicators associated with these concepts, such as the low-income gap and the low-income ratio. These concepts are used in analysis of the economic well-being of the population. Release date: 2023-03-29
- Surveys and statistical programs – Documentation: 11-633-X2022009 Description: The Longitudinal Immigration Database (IMDB) is a comprehensive source of data that plays a key role in the understanding of the economic behaviour of immigrants. It is the only annual Canadian dataset that allows users to study the characteristics of immigrants to Canada at the time of admission and their economic outcomes and regional (inter-provincial) mobility over a time span of more than 35 years.
This report will discuss the IMDB data sources, concepts and variables, record linkage, data processing, dissemination, data evaluation and quality indicators, comparability with other immigration datasets, and the analyses possible with the IMDB.
Release date: 2022-12-05
- Notices and consultations: 98-26-0001 Description: This white paper presents Statistics Canada's planned approach to the 2021 Census of Population and provides a clear explanation of the processes behind the census program, touching on historical, legal, operational and content aspects. Statistics Canada recognizes that it is important to not only successfully conduct the census, but also to be transparent and informative about the way in which those efforts are accomplished. Painting a Portrait of Canada: The 2021 Census of Population gives readers an exclusive, detailed look at how census data is collected, analyzed and given back to Canadians, in the form of high-quality statistical information, used to make evidence-based decisions in Canadian society.
Release date: 2020-07-20
- Surveys and statistical programs – Documentation: 91F0015M2016012 Description: This article provides information on using family-related variables from the microdata files of Canada's Census of Population. These files exist internally at Statistics Canada, in the Research Data Centres (RDCs), and as public-use microdata files (PUMFs). This article explains certain technical aspects of all three versions, including the creation of multi-level variables for analytical purposes.
Release date: 2016-12-22
- 10. The Data Warehouse and analytical tools to facilitate the integration of the Canadian Macroeconomic Accounts (Archived). Surveys and statistical programs – Documentation: 11-522-X201700014710 Description: The Data Warehouse has modernized the way the Canadian System of Macroeconomic Accounts (MEA) are produced and analyzed today. Its continuing evolution facilitates the amounts and types of analytical work that are done within the MEA. It brings in the needed element of harmonization and confrontation as the macroeconomic accounts move toward full integration. The improvements in quality, transparency, and timeliness have strengthened the statistics that are being disseminated.
Release date: 2016-03-24