Data analysis

Results

All (289)

All (289) (0 to 10 of 289 results)

1. Quantitative impact analysis: A practical overview
Articles and reports: 36-28-0001202600500003
Description: This spotlight article outlines practical methods for assessing the economic impacts of public programs delivered by federal agencies and Crown corporations. It summarizes key steps in conducting quantitative impact analysis, including data linkage, cohort construction and implementation of quasi causal estimators.
Release date: 2026-05-27
2. Analytical Studies: Methods and References
Journals and periodicals: 11-633-X
Description: Papers in this series provide background discussions of the methods used to develop data for economic, health, and social analytical studies at Statistics Canada. They are intended to provide readers with information on the statistical methods, standards and definitions used to develop databases for research purposes. All papers in this series have undergone peer and institutional review to ensure that they conform to Statistics Canada's mandate and adhere to generally accepted standards of good professional practice.
Release date: 2026-05-27
3. When and How to Use Area-Level Measures for Health Analysis: A Review and Recommendation Report
Surveys and statistical programs – Documentation: 11-633-X2026001
Description: This report defines key concepts related to area-level analysis and introduces area-level measures developed and utilized at Statistics Canada for health analysis. It also provides a decision-making framework and practical recommendations to help researchers select appropriate methods. The goal is to guide readers on when area-level analysis is appropriate and what type of area-level measure is suitable to achieve research objectives.
Release date: 2026-03-05
4. Collinearity diagnostics in generalized linear models fitted with survey data
Articles and reports: 12-001-X202500200004
Description: The class of generalized linear models (GLM) is a flexible generalization of ordinary least squares regression that allows the linear model to be related to the response variable via a link function and assumes the magnitude of the variance of each measurement to be a function of its predicted value. Multicollinearity in GLMs can inflate variances of the estimated coefficients and cause poor prediction in certain regions of the regression space. It may also cause a nonsignificant Wald statistic even when the predictors are highly predictive in a model of the family of GLMs. Little previous research has closely investigated the diagnostics of multicollinearity in GLMs, especially when complex survey data are used. In this paper, we develop variance inflation factors (VIFs) that measure the amount that the variance of a parameter estimator is increased due to multicollinearity in GLMs. We also extend VIFs and condition indexes to apply to complex survey data, accounting for design features, e.g. weights, clusters, and strata. Illustrations of these methods are given using data from a household survey of health and nutrition.
Release date: 2025-12-23
5. Fighting Misinformation
Stats in brief: 89-20-00062025001
Description: This video is designed to help you critically assess the data presented to you. No data is perfect. By understanding the strengths and limitations of the data, you can avoid being misled—and make smarter, more informed decisions.
Release date: 2025-12-15
6. Life in the FastText Lane: Harnessing Linear Programming Constrained Machine Learning for Classifications Revision Archived
Articles and reports: 11-522-X202500100010
Description: Statistics Canada's Labour Force Survey (LFS) plays an essential role in the estimation of labour market conditions in Canada. Periodically, LFS revises its data to the most recent industry and occupational classification versions. Differences in versions can be extensive, including high-level and unit-group structural changes, creations, deletions, split-offs and combination of classification units (classes). Historically, to reconcile split-off classes - where one class splits into multiple classes - a sample of LFS split-off records would be manually recoded to the new classification version. Based on the split-off proportion observed in the recoded sample, a random allocation method would be applied on all data to reflect the changing Canadian labour market over time. This article proposes using machine learning (fastText), constrained to split-off proportions using linear programming, to revise industry and occupation classifications in LFS. The hybrid framework benefits from a text-based revision mechanism while adhering to traditional proportions driven estimates, thus ensuring a minimal impact on the comparability of published labour market indicators.
Release date: 2025-09-08
7. Leveraging Statistics Canada data integration opportunities for program evaluation
Articles and reports: 36-28-0001202500300002
Description: Government programs are evaluated to measure their effectiveness. This article discusses the benefits of using Statistics Canada data combined with the data collected from the government program to provide a far more comprehensive evaluation than program data alone can offer. The article also summarizes a recent example of a program evaluation that benefited from Statistics Canada data and the expertise of Statistics Canada researchers in analyzing the data.
Release date: 2025-03-26
8. Bayesian predictive inference of a finite population mean without specifying the relation between the study variable and the covariates
Articles and reports: 12-001-X202400200004
Description: While we avoid specifying the parametric relationship between the study variable and covariates, we illustrate the advantage of including a spatial component to better account for the covariates in our models to make Bayesian predictive inference. We treat each unique covariate combination as an individual stratum, then we use small area estimation techniques to make inference about the finite population mean of the continuous response variable. The two spatial models used are the conditional autoregressive and simple conditional autoregressive models. We include the spatial effects by creating the adjacency matrix via the Mahalanobis distance between covariates. We also show how to incorporate survey weights into the spatial models when dealing with probability survey data. We compare the results of two non-spatial models including the Scott-Smith model and the Battese, Harter, and Fuller model to the spatial models. We illustrate the comparison between the aforementioned models with an application using BMI data from eight counties in California. Our goal is to have neighboring strata yield similar predictions, and to increase the difference between strata that are not neighbors. Ultimately, using the spatial models shows less global pooling compared to the non-spatial models, which was the desired outcome.
Release date: 2024-12-20
9. Robust adaptive survey design for time changes in mixed-mode response propensities
Articles and reports: 12-001-X202400200005
Description: Adaptive survey designs (ASDs) tailor recruitment protocols to population subgroups that are relevant to a survey. In recent years, effective ASD optimization has been the topic of research and several applications. However, the performance of an optimized ASD over time is sensitive to time changes in response propensities. How adaptation strategies can adjust to such variation over time is not yet fully understood. In this paper, we propose a robust optimization approach in the context of sequential mixed-mode surveys employing Bayesian analysis. The approach is formulated as a mathematical programming problem that explicitly accounts for uncertainty due to time change. ASD decisions can then be made by considering time-dependent variation in conditional mode response propensities and between-mode correlations in response propensities. The approach is demonstrated using a case study: the 2014-2017 Dutch Health Survey. We evaluate the sensitivity of ASD performance to 1) the budget level and 2) the length of applicable historic time-series data. We find there is only a moderate dependence on the budget level and the dependence on historic data is moderated by the amount of seasonality during the year.
Release date: 2024-12-20
10. Longitudinal Immigration Database (IMDB) Technical Report, 2023
Surveys and statistical programs – Documentation: 11-633-X2024004
Description: The Longitudinal Immigration Database (IMDB) is a comprehensive source of data that plays a key role in the understanding of the economic behaviour of immigrants. It is the only annual Canadian dataset that allows users to study the characteristics of immigrants to Canada at the time of admission and their economic outcomes and regional (inter-provincial) mobility over a time span of more than 40 years.
Release date: 2024-12-09

Data (2)

Data (2) ((2 results))

1. Canadian Statistical Geospatial Explorer Hub Archived
Data Visualization: 71-607-X2020010
Description: The Canadian Statistical Geospatial Explorer empowers users to discover geo enabled data holdings of Statistics Canada at various levels of geography including at the neighbourhood level. Users are able to visualize, thematically map, spatially explore and analyze, export and consume data in various formats. Users can also view the data superimposed on satellite imagery, topographic and street layers.
Release date: 2024-08-21
2. Housing Data Viewer Archived
Data Visualization: 71-607-X2019010
Description: The Housing Data Viewer is a visualization tool that allows users to explore Statistics Canada data on a map. Users can use the tool to navigate, compare and export data.
Release date: 2019-10-30

Analysis (256)

Analysis (256) (180 to 190 of 256 results)

181. Perfoming logistic regression on survey data with the new surveylogistic procedure Archived
Articles and reports: 11-522-X20020016723
Description:
Categorical outcomes, such as binary, ordinal and nominal responses, occur often in survey research. Logistic regression investigates the relationship between such categorical responses variables and a set of explanatory variables. The LOGISTIC procedure can be used to perform a logistic analysis on data from a random sample. However, this approach is not valid if the data come from other sample designs, such as complex survey designs with stratification, clustering and/or unequal weighting. In these cases, specialized techniques must be applied in order to produce the appropriate estimates and standard errors.
The SURVEYLOGISTIC procedure, experimental in Version 9, brings logistic regression for survey data to the SAS System and delivers much of the functionality of the LOGISTIC procedure. This paper describes the methodological approach and applications for this new software.
Release date: 2004-09-13
182. The analysis of survey data using Stata: Some recent developments Archived
Articles and reports: 11-522-X20020016724
Description:
Some of the most commonly used statistical models are fitted using maximum likelihood (ML) or some extension of ML. Stata's ML command provides researchers and data analysts with a tool to develop estimation commands to fit their models using their data. Such models may include multiple equations, clustered observations, sampling weights and other survey design characteristics. These elements are discussed in this paper.
Release date: 2004-09-13
183. Bridging multiple-race responses in the U.S. Census to single-race categories for the calculation of vital rates Archived
Articles and reports: 11-522-X20020016725
Description:
In 1997, the US Office of Management and Budget issued revised standards for the collection of race information within the federal statistical system. One revision allows individuals to choose more than one race group when responding to federal surveys and other federal data collections. This change presents challenges for analyses that involve data collected under both the old and new race-reporting systems, since the data on race are not comparable. The following paper discusses the problems encountered by these changes and methods developed to overcome them.
Since most people under both systems report only a single race, a common proposed solution is to try to bridge the transition by assigning a single-race category to each multiple-race reporter under the new system, and to conduct analyses using just the observed and assigned single-race categories. Thus, the problem can be viewed as a missing-data problem, in which single-race responses are missing for multiple-race reporters and needing to be imputed.
The US Office of Management and Budget suggested several simple bridging methods to handle this missing-data problem. Schenker and Parker (Statistics in Medicine, forthcoming) analysed data from the National Health Interview Survey of the US National Center for Health Statistics, which allows multiple-race reporting but also asks multiple-race reporters to specify a primary race, and found that improved bridging methods could result from incorporating individual-level and contextual covariates into the bridging models.
While Schenker and Parker discussed only three large multiple-race groups, the current application requires predicting single-race categories for several small multiple-race groups as well. Thus, problems of sparse data arise in fitting the bridging models. We address these problems by building combined models for several multiple-race groups, thus borrowing strength across them. These and other methodological issues are discussed.
Release date: 2004-09-13
184. WesVar: Software for analyzing data from complex surveys Archived
Articles and reports: 11-522-X20020016728
Description:
Nearly all surveys use complex sampling designs to collect data and these data are frequently used for statistical analyses beyond the estimation of simple descriptive parameters of the target population. Many procedures available in popular statistical software packages are not appropriate for this purpose because the analyses are based on the assumption that the sample has been drawn with simple random sampling. Therefore, the results of the analyses conducted using these software packages would not be valid when the sample design incorporates multistage sampling, stratification, or clustering. Two commonly used methods for analysing data from complex surveys are replication and Taylor linearization techniques. We discuss the use of WESVAR software to compute estimates and replicate variance estimates by properly reflecting complex sampling and estimation procedures. We also illustrate the WESVAR features by using data from two Westat surveys that employ complex survey designs: the Third International Mathematics and Science Study (TIMSS) and the National Health and Nutrition Examination Survey (NHANES).
Release date: 2004-09-13
185. Using IRT and factor scores in regression and other analyses: A review Archived
Articles and reports: 11-522-X20020016731
Description:
Behavioural researchers use a variety of techniques to predict respondent scores on constructs that are not directly observable. Examples of such constructs include job satisfaction, work stress, aptitude for graduate study, children's mathematical ability, etc. The techniques commonly used for modelling and predicting scores on such constructs include factor analysis, classical psychometric scaling and item response theory (IRT), and for each technique there are often several different strategies that can be used to generate individual scores. However, researchers are seldom satisfied with simply measuring these constructs. They typically use the derived scores in multiple regression, analysis of variance and numerous multivariate procedures. Though using predicted scores in this way can result in biased estimates of model parameters, not all researchers are aware of this difficulty. The paper will review the literature on this issue, with particular emphasis on IRT methods. Problems will be illustrated, some remedies suggested, and areas for further research will be identified.
Release date: 2004-09-13
186. Analysis of dose-response relationships on complex survey data Archived
Articles and reports: 11-522-X20020016732
Description:
Analysis of dose-response relationships has long been important in toxicology. More recently, this type of analysis has been employed to evaluate public education campaigns. The data that are collected in such evaluations are likely to come from standard household survey designs with all the usual complexities of multiple stages, stratification and variable selection probabilities. On a recent evaluation, a system was developed with the following features: categorization of doses into three or four levels, propensity scoring of dose selection and a new jack-knifed Jonckheere-Terpstra test for a monotone dose-response relationship. This system allows rapid production of tests for monotone dose-response relationships that are corrected both for sample design and for confounding. The focus of this paper will be the results of a Monte-Carlo simulation of the properties of the jack-knifed Jonckheere-Terpstra.
Moreover, there is no experimental control over dosages and the possibility of confounding variables must be considered. Standard regressions in WESVAR and SUDAAN could be used to determine if there is a linear dose-response relationship while controlling on confounders, but such an approach obviously has low power to detect nonlinear but monotone dose-response relationships and is time-consuming to implement if there are a large number of possible outcomes of interest.
Release date: 2004-09-13
187. Longitudinal analysis of LabLongitudinal analysis of Labour Force Survey Data Archived
Articles and reports: 11-522-X20020016739
Description:
The Labour Force Survey (LFS) was not designed to be a longitudinal survey. However, given that respondent households typically remain in the sample for six consecutive months, it is possible to reconstruct six-month fragments of longitudinal data from the monthly records of household members. Such longitudinal data (altogether consisting of millions of person-months of individual- and family-level data) is useful for analyses of monthly labour market dynamics over relatively long periods of time, 20 years and more.
We make use of these data to estimate hazard functions describing transitions among the labour market states: self-employed, paid employee and not employed. Data on job tenure for the employed, and data on the date last worked for the not employed - together with the date of survey responses - permit the estimated models to include terms reflecting seasonality and macro-economic cycles, as well as the duration dependence of each type of transition. In addition, the LFS data permit spouse labour market activity and family composition variables to be included in the hazard models as time-varying covariates. The estimated hazard equations have been included in the LifePaths socio-economic microsimulation model. In this setting, the equations may be used to simulate lifetime employment activity from past, present and future birth cohorts. Cross-sectional simulation results have been used to validate these models by comparisons with census data from the period 1971 to 1996.
Release date: 2004-09-13
188. A causal event history approach to interrelated family events Archived
Articles and reports: 11-522-X20020016742
Geography: Canada
Description:
One of the most important advances brought about by life course and event history studies is the use of parallel or interdependent processes as explaining factors in transition rate models. The purpose of this paper is to demonstrate a causal approach to the study of interrelated family events. Various types of interdependent processes are described first, followed by two event history perspectives: the 'system' and 'causal' approaches. The authors assert that the causal approach is more appropriate from an analytical point of view as it provides a straightforward solution to simultaneity, cause-effect lags, and temporal shapes of effects. Based on comparative cross-national applications in West and East Germany, Canada, Latvia and the Netherlands, the usefulness of the causal approach is demonstrated by analysing two highly interdependent family processes: entry into marriage (for individuals who are in a consensual union) as the dependent process, and first pregnancy/childbirth as the explaining one. Both statistical and theoretical explanations are explored emphasizing the need for conceptual reasoning.
Release date: 2004-09-13
189. Modelling and analysis of duration data from longitudinal surveys Archived
Articles and reports: 11-522-X20020016743
Description:
There is much interest in using data from longitudinal surveys to help understand life history processes such as education, employment, fertility, health and marriage. The analysis of data on the durations of spells or sojourns that individuals spend in certain states (e.g., employment, marriage) is a primary tool in studying such processes. This paper examines methods for analysing duration data that address important features associated with longitudinal surveys: the use of complex survey designs in heterogeneous populations; missing or inaccurate information about the timing of events; and the possibility of non-ignorable dropout or censoring mechanisms. Parametric and non-parametric techniques for estimation and for model checking are considered. Both new and existing methodology are proposed and applied to duration data from Canada's Survey of Labour and Income Dynamics (SLID).
Release date: 2004-09-13
190. Analyzing developmental trajectories: An overview of a group-based approach Archived
Articles and reports: 11-522-X20020016744
Description:
A developmental trajectory describes the course of a behaviour over age or time. This technical paper provides an overview of a semi-parametric, group-based method for analysing developmental trajectories. This methodology provides an alternative to assuming a homogenous population of trajectories as is done in standard growth modelling.
Four capabilities are described: (1) the capability to identify, rather than assume, distinctive groups of trajectories; (2) the capability to estimate the proportion of the population following each such trajectory group; (3) the capability to relate group membership probability to individual characteristics and circumstances; and (4) the capability to use the group membership probabilities for various other purposes, such as creating profiles of group members.
In addition, two important extensions of the method are described: the capability to add time-varying covariates to trajectory models and the capability to estimate joint trajectory models of distinct but related behaviours. The former provides the statistical capacity for testing if a contemporary factor, such as an experimental intervention or a non-experimental event like pregnancy, deflects a pre-existing trajectory. The latter provides the capability to study the unfolding of distinct but related behaviours such as problematic childhood behaviour and adolescent drug abuse.
Release date: 2004-09-13

Reference (26)

Reference (26) (0 to 10 of 26 results)

1. When and How to Use Area-Level Measures for Health Analysis: A Review and Recommendation Report
Surveys and statistical programs – Documentation: 11-633-X2026001
Description: This report defines key concepts related to area-level analysis and introduces area-level measures developed and utilized at Statistics Canada for health analysis. It also provides a decision-making framework and practical recommendations to help researchers select appropriate methods. The goal is to guide readers on when area-level analysis is appropriate and what type of area-level measure is suitable to achieve research objectives.
Release date: 2026-03-05
2. Longitudinal Immigration Database (IMDB) Technical Report, 2023
Surveys and statistical programs – Documentation: 11-633-X2024004
Description: The Longitudinal Immigration Database (IMDB) is a comprehensive source of data that plays a key role in the understanding of the economic behaviour of immigrants. It is the only annual Canadian dataset that allows users to study the characteristics of immigrants to Canada at the time of admission and their economic outcomes and regional (inter-provincial) mobility over a time span of more than 40 years.
Release date: 2024-12-09
3. Longitudinal Immigration Database (IMDB) Technical Report, 2022
Surveys and statistical programs – Documentation: 11-633-X2024001
Description: The Longitudinal Immigration Database (IMDB) is a comprehensive source of data that plays a key role in the understanding of the economic behaviour of immigrants. It is the only annual Canadian dataset that allows users to study the characteristics of immigrants to Canada at the time of admission and their economic outcomes and regional (inter-provincial) mobility over a time span of more than 35 years.
Release date: 2024-01-22
4. Agriculture–Population Linkage: Data quality report
Surveys and statistical programs – Documentation: 32-26-0006
Description: This report provides data quality information pertaining to the Agriculture–Population Linkage, such as sources of error, matching process, response rates, imputation rates, sampling, weighting, disclosure control methods and data quality indicators.
Release date: 2023-08-25
5. Aggregated and derived income concepts and income statistics, 2021 Census of Population
Surveys and statistical programs – Documentation: 98-20-00032021011
Description: This video explains the key concepts of different levels of aggregation of income data such as household and family income; income concepts derived from key income variables such as adjusted income and equivalence scale; and statistics used for income data such as median and average income, quartiles, quintiles, deciles and percentiles.
Release date: 2023-03-29
6. Low-income concepts and statistics, 2021 Census of Population
Surveys and statistical programs – Documentation: 98-20-00032021012
Description: This video builds on concepts introduced in the other videos on income. It explains key low-income concepts - Market Basket Measure (MBM), Low income measure (LIM) and Low-income cut-offs (LICO) and the indicators associated with these concepts such as the low-income gap and the low-income ratio. These concepts are used in analysis of the economic well-being of the population.
Release date: 2023-03-29
7. Longitudinal Immigration Database (IMDB) Technical Report, 2021
Surveys and statistical programs – Documentation: 11-633-X2022009
Description: The Longitudinal Immigration Database (IMDB) is a comprehensive source of data that plays a key role in the understanding of the economic behaviour of immigrants. It is the only annual Canadian dataset that allows users to study the characteristics of immigrants to Canada at the time of admission and their economic outcomes and regional (inter-provincial) mobility over a time span of more than 35 years.

This report will discuss the IMDB data sources, concepts and variables, record linkage, data processing, dissemination, data evaluation and quality indicators, comparability with other immigration datasets, and the analyses possible with the IMDB.
Release date: 2022-12-05
8. Painting a Portrait of Canada: The 2021 Census of Population
Notices and consultations: 98-26-0001
Description:
This white paper presents Statistics Canada’s planned approach to the 2021 Census of Population and provides a clear explanation of the processes behind the census program, touching on historical, legal, operational and content aspects. Statistics Canada recognizes that it is important to not only successfully conduct the census, but also to be transparent and informative about the way in which those efforts are accomplished. Painting a Portrait of Canada: The 2021 Census of Population gives readers an exclusive, detailed look at how census data is collected, analyzed and given back to Canadians, in the form of high-quality statistical information, used to make evidence-based decisions in Canadian society.
Release date: 2020-07-20
9. Using family-related variables from the Census of Population and the National Household Survey microdata files Archived
Surveys and statistical programs – Documentation: 91F0015M2016012
Description:
This article provides information on using family-related variables from the microdata files of Canada’s Census of Population. These files exist internally at Statistics Canada, in the Research Data Centres (RDCs), and as public-use microdata files (PUMFs). This article explains certain technical aspects of all three versions, including the creation of multi-level variables for analytical purposes.
Release date: 2016-12-22
10. The Data Warehouse and analytical tools to facilitate the integration of the Canadian Macroeconomic Accounts Archived
Surveys and statistical programs – Documentation: 11-522-X201700014710
Description:
The Data Warehouse has modernized the way the Canadian System of Macroeconomic Accounts (MEA) are produced and analyzed today. Its continuing evolution facilitates the amounts and types of analytical work that is done within the MEA. It brings in the needed element of harmonization and confrontation as the macroeconomic accounts move toward full integration. The improvements in quality, transparency, and timeliness have strengthened the statistics that are being disseminated.
Release date: 2016-03-24

Date modified:: 2026-06-22

Language selection

WxT Language switcher

Search and menus

WxT Search form

Data analysis

Filter results by

Keyword(s)

Type

Geography

Survey or statistical program

Content

Results

All (289) (0 to 10 of 289 results)

Data (2) ((2 results))

Analysis (256) (180 to 190 of 256 results)

Reference (26) (0 to 10 of 26 results)

Data analysis

Filter results by

Keyword(s)

Type

Geography

Survey or statistical program

Content

Results

All (289) (0 to 10 of 289 results)

Data (2) ((2 results))

Analysis (256) (180 to 190 of 256 results)

Reference (26) (0 to 10 of 26 results)

How are the results ordered?

How are the results ordered?

How do I use the filters and the search box?

How do I refine my search?

How does the search work?