Data analysis

Results

All (289) (0 to 10 of 289 results)

1. Quantitative impact analysis: A practical overview
Articles and reports: 36-28-0001202600500003
Description: This spotlight article outlines practical methods for assessing the economic impacts of public programs delivered by federal agencies and Crown corporations. It summarizes key steps in conducting quantitative impact analysis, including data linkage, cohort construction and implementation of quasi-causal estimators.
Release date: 2026-05-27
2. Analytical Studies: Methods and References
Journals and periodicals: 11-633-X
Description: Papers in this series provide background discussions of the methods used to develop data for economic, health, and social analytical studies at Statistics Canada. They are intended to provide readers with information on the statistical methods, standards and definitions used to develop databases for research purposes. All papers in this series have undergone peer and institutional review to ensure that they conform to Statistics Canada's mandate and adhere to generally accepted standards of good professional practice.
Release date: 2026-05-27
3. When and How to Use Area-Level Measures for Health Analysis: A Review and Recommendation Report
Surveys and statistical programs – Documentation: 11-633-X2026001
Description: This report defines key concepts related to area-level analysis and introduces area-level measures developed and utilized at Statistics Canada for health analysis. It also provides a decision-making framework and practical recommendations to help researchers select appropriate methods. The goal is to guide readers on when area-level analysis is appropriate and what type of area-level measure is suitable to achieve research objectives.
Release date: 2026-03-05
4. Collinearity diagnostics in generalized linear models fitted with survey data
Articles and reports: 12-001-X202500200004
Description: The class of generalized linear models (GLM) is a flexible generalization of ordinary least squares regression that allows the linear model to be related to the response variable via a link function and assumes the magnitude of the variance of each measurement to be a function of its predicted value. Multicollinearity in GLMs can inflate variances of the estimated coefficients and cause poor prediction in certain regions of the regression space. It may also cause a nonsignificant Wald statistic even when the predictors are highly predictive in a model of the family of GLMs. Little previous research has closely investigated the diagnostics of multicollinearity in GLMs, especially when complex survey data are used. In this paper, we develop variance inflation factors (VIFs) that measure the amount that the variance of a parameter estimator is increased due to multicollinearity in GLMs. We also extend VIFs and condition indexes to apply to complex survey data, accounting for design features, e.g. weights, clusters, and strata. Illustrations of these methods are given using data from a household survey of health and nutrition.
Release date: 2025-12-23
5. Fighting Misinformation
Stats in brief: 89-20-00062025001
Description: This video is designed to help you critically assess the data presented to you. No data is perfect. By understanding the strengths and limitations of the data, you can avoid being misled—and make smarter, more informed decisions.
Release date: 2025-12-15
6. Life in the FastText Lane: Harnessing Linear Programming Constrained Machine Learning for Classifications Revision Archived
Articles and reports: 11-522-X202500100010
Description: Statistics Canada's Labour Force Survey (LFS) plays an essential role in the estimation of labour market conditions in Canada. Periodically, LFS revises its data to the most recent industry and occupational classification versions. Differences in versions can be extensive, including high-level and unit-group structural changes, creations, deletions, split-offs and combination of classification units (classes). Historically, to reconcile split-off classes - where one class splits into multiple classes - a sample of LFS split-off records would be manually recoded to the new classification version. Based on the split-off proportion observed in the recoded sample, a random allocation method would be applied on all data to reflect the changing Canadian labour market over time. This article proposes using machine learning (fastText), constrained to split-off proportions using linear programming, to revise industry and occupation classifications in LFS. The hybrid framework benefits from a text-based revision mechanism while adhering to traditional proportions driven estimates, thus ensuring a minimal impact on the comparability of published labour market indicators.
Release date: 2025-09-08
7. Leveraging Statistics Canada data integration opportunities for program evaluation
Articles and reports: 36-28-0001202500300002
Description: Government programs are evaluated to measure their effectiveness. This article discusses the benefits of using Statistics Canada data combined with the data collected from the government program to provide a far more comprehensive evaluation than program data alone can offer. The article also summarizes a recent example of a program evaluation that benefited from Statistics Canada data and the expertise of Statistics Canada researchers in analyzing the data.
Release date: 2025-03-26
8. Bayesian predictive inference of a finite population mean without specifying the relation between the study variable and the covariates
Articles and reports: 12-001-X202400200004
Description: While we avoid specifying the parametric relationship between the study variable and covariates, we illustrate the advantage of including a spatial component to better account for the covariates in our models to make Bayesian predictive inference. We treat each unique covariate combination as an individual stratum, then we use small area estimation techniques to make inference about the finite population mean of the continuous response variable. The two spatial models used are the conditional autoregressive and simple conditional autoregressive models. We include the spatial effects by creating the adjacency matrix via the Mahalanobis distance between covariates. We also show how to incorporate survey weights into the spatial models when dealing with probability survey data. We compare the results of two non-spatial models including the Scott-Smith model and the Battese, Harter, and Fuller model to the spatial models. We illustrate the comparison between the aforementioned models with an application using BMI data from eight counties in California. Our goal is to have neighboring strata yield similar predictions, and to increase the difference between strata that are not neighbors. Ultimately, using the spatial models shows less global pooling compared to the non-spatial models, which was the desired outcome.
Release date: 2024-12-20
9. Robust adaptive survey design for time changes in mixed-mode response propensities
Articles and reports: 12-001-X202400200005
Description: Adaptive survey designs (ASDs) tailor recruitment protocols to population subgroups that are relevant to a survey. In recent years, effective ASD optimization has been the topic of research and several applications. However, the performance of an optimized ASD over time is sensitive to time changes in response propensities. How adaptation strategies can adjust to such variation over time is not yet fully understood. In this paper, we propose a robust optimization approach in the context of sequential mixed-mode surveys employing Bayesian analysis. The approach is formulated as a mathematical programming problem that explicitly accounts for uncertainty due to time change. ASD decisions can then be made by considering time-dependent variation in conditional mode response propensities and between-mode correlations in response propensities. The approach is demonstrated using a case study: the 2014-2017 Dutch Health Survey. We evaluate the sensitivity of ASD performance to 1) the budget level and 2) the length of applicable historic time-series data. We find there is only a moderate dependence on the budget level and the dependence on historic data is moderated by the amount of seasonality during the year.
Release date: 2024-12-20
10. Longitudinal Immigration Database (IMDB) Technical Report, 2023
Surveys and statistical programs – Documentation: 11-633-X2024004
Description: The Longitudinal Immigration Database (IMDB) is a comprehensive source of data that plays a key role in the understanding of the economic behaviour of immigrants. It is the only annual Canadian dataset that allows users to study the characteristics of immigrants to Canada at the time of admission and their economic outcomes and regional (inter-provincial) mobility over a time span of more than 40 years.
Release date: 2024-12-09

Data (2) (2 results)

1. Canadian Statistical Geospatial Explorer Hub Archived
Data Visualization: 71-607-X2020010
Description: The Canadian Statistical Geospatial Explorer empowers users to discover geo-enabled data holdings of Statistics Canada at various levels of geography, including at the neighbourhood level. Users are able to visualize, thematically map, spatially explore and analyze, export and consume data in various formats. Users can also view the data superimposed on satellite imagery, topographic and street layers.
Release date: 2024-08-21
2. Housing Data Viewer Archived
Data Visualization: 71-607-X2019010
Description: The Housing Data Viewer is a visualization tool that allows users to explore Statistics Canada data on a map. Users can use the tool to navigate, compare and export data.
Release date: 2019-10-30

Analysis (256) (180 to 190 of 256 results)

181. Performing logistic regression on survey data with the new surveylogistic procedure Archived
Articles and reports: 11-522-X20020016723
Description:
Categorical outcomes, such as binary, ordinal and nominal responses, occur often in survey research. Logistic regression investigates the relationship between such categorical response variables and a set of explanatory variables. The LOGISTIC procedure can be used to perform a logistic analysis on data from a random sample. However, this approach is not valid if the data come from other sample designs, such as complex survey designs with stratification, clustering and/or unequal weighting. In these cases, specialized techniques must be applied in order to produce the appropriate estimates and standard errors.
The SURVEYLOGISTIC procedure, experimental in Version 9, brings logistic regression for survey data to the SAS System and delivers much of the functionality of the LOGISTIC procedure. This paper describes the methodological approach and applications for this new software.
Release date: 2004-09-13
182. The analysis of survey data using Stata: Some recent developments Archived
Articles and reports: 11-522-X20020016724
Description:
Some of the most commonly used statistical models are fitted using maximum likelihood (ML) or some extension of ML. Stata's ML command provides researchers and data analysts with a tool to develop estimation commands to fit their models using their data. Such models may include multiple equations, clustered observations, sampling weights and other survey design characteristics. These elements are discussed in this paper.
Release date: 2004-09-13
183. Bridging multiple-race responses in the U.S. Census to single-race categories for the calculation of vital rates Archived
Articles and reports: 11-522-X20020016725
Description:
In 1997, the US Office of Management and Budget issued revised standards for the collection of race information within the federal statistical system. One revision allows individuals to choose more than one race group when responding to federal surveys and other federal data collections. This change presents challenges for analyses that involve data collected under both the old and new race-reporting systems, since the data on race are not comparable. The following paper discusses the problems encountered by these changes and methods developed to overcome them.
Since most people under both systems report only a single race, a common proposed solution is to try to bridge the transition by assigning a single-race category to each multiple-race reporter under the new system, and to conduct analyses using just the observed and assigned single-race categories. Thus, the problem can be viewed as a missing-data problem, in which single-race responses are missing for multiple-race reporters and need to be imputed.
The US Office of Management and Budget suggested several simple bridging methods to handle this missing-data problem. Schenker and Parker (Statistics in Medicine, forthcoming) analysed data from the National Health Interview Survey of the US National Center for Health Statistics, which allows multiple-race reporting but also asks multiple-race reporters to specify a primary race, and found that improved bridging methods could result from incorporating individual-level and contextual covariates into the bridging models.
While Schenker and Parker discussed only three large multiple-race groups, the current application requires predicting single-race categories for several small multiple-race groups as well. Thus, problems of sparse data arise in fitting the bridging models. We address these problems by building combined models for several multiple-race groups, thus borrowing strength across them. These and other methodological issues are discussed.
Release date: 2004-09-13
184. WesVar: Software for analyzing data from complex surveys Archived
Articles and reports: 11-522-X20020016728
Description:
Nearly all surveys use complex sampling designs to collect data and these data are frequently used for statistical analyses beyond the estimation of simple descriptive parameters of the target population. Many procedures available in popular statistical software packages are not appropriate for this purpose because the analyses are based on the assumption that the sample has been drawn with simple random sampling. Therefore, the results of the analyses conducted using these software packages would not be valid when the sample design incorporates multistage sampling, stratification, or clustering. Two commonly used methods for analysing data from complex surveys are replication and Taylor linearization techniques. We discuss the use of WESVAR software to compute estimates and replicate variance estimates by properly reflecting complex sampling and estimation procedures. We also illustrate the WESVAR features by using data from two Westat surveys that employ complex survey designs: the Third International Mathematics and Science Study (TIMSS) and the National Health and Nutrition Examination Survey (NHANES).
Release date: 2004-09-13
185. Using IRT and factor scores in regression and other analyses: A review Archived
Articles and reports: 11-522-X20020016731
Description:
Behavioural researchers use a variety of techniques to predict respondent scores on constructs that are not directly observable. Examples of such constructs include job satisfaction, work stress, aptitude for graduate study, children's mathematical ability, etc. The techniques commonly used for modelling and predicting scores on such constructs include factor analysis, classical psychometric scaling and item response theory (IRT), and for each technique there are often several different strategies that can be used to generate individual scores. However, researchers are seldom satisfied with simply measuring these constructs. They typically use the derived scores in multiple regression, analysis of variance and numerous multivariate procedures. Though using predicted scores in this way can result in biased estimates of model parameters, not all researchers are aware of this difficulty. The paper will review the literature on this issue, with particular emphasis on IRT methods. Problems will be illustrated, some remedies suggested, and areas for further research will be identified.
Release date: 2004-09-13
186. Analysis of dose-response relationships on complex survey data Archived
Articles and reports: 11-522-X20020016732
Description:
Analysis of dose-response relationships has long been important in toxicology. More recently, this type of analysis has been employed to evaluate public education campaigns. The data that are collected in such evaluations are likely to come from standard household survey designs with all the usual complexities of multiple stages, stratification and variable selection probabilities. On a recent evaluation, a system was developed with the following features: categorization of doses into three or four levels, propensity scoring of dose selection and a new jack-knifed Jonckheere-Terpstra test for a monotone dose-response relationship. This system allows rapid production of tests for monotone dose-response relationships that are corrected both for sample design and for confounding. The focus of this paper will be the results of a Monte-Carlo simulation of the properties of the jack-knifed Jonckheere-Terpstra test.
Moreover, there is no experimental control over dosages and the possibility of confounding variables must be considered. Standard regressions in WESVAR and SUDAAN could be used to determine if there is a linear dose-response relationship while controlling on confounders, but such an approach obviously has low power to detect nonlinear but monotone dose-response relationships and is time-consuming to implement if there are a large number of possible outcomes of interest.
Release date: 2004-09-13
187. Longitudinal analysis of Labour Force Survey Data Archived
Articles and reports: 11-522-X20020016739
Description:
The Labour Force Survey (LFS) was not designed to be a longitudinal survey. However, given that respondent households typically remain in the sample for six consecutive months, it is possible to reconstruct six-month fragments of longitudinal data from the monthly records of household members. Such longitudinal data (altogether consisting of millions of person-months of individual- and family-level data) is useful for analyses of monthly labour market dynamics over relatively long periods of time, 20 years and more.
We make use of these data to estimate hazard functions describing transitions among the labour market states: self-employed, paid employee and not employed. Data on job tenure for the employed, and data on the date last worked for the not employed - together with the date of survey responses - permit the estimated models to include terms reflecting seasonality and macro-economic cycles, as well as the duration dependence of each type of transition. In addition, the LFS data permit spouse labour market activity and family composition variables to be included in the hazard models as time-varying covariates. The estimated hazard equations have been included in the LifePaths socio-economic microsimulation model. In this setting, the equations may be used to simulate lifetime employment activity from past, present and future birth cohorts. Cross-sectional simulation results have been used to validate these models by comparisons with census data from the period 1971 to 1996.
Release date: 2004-09-13
188. A causal event history approach to interrelated family events Archived
Articles and reports: 11-522-X20020016742
Geography: Canada
Description:
One of the most important advances brought about by life course and event history studies is the use of parallel or interdependent processes as explaining factors in transition rate models. The purpose of this paper is to demonstrate a causal approach to the study of interrelated family events. Various types of interdependent processes are described first, followed by two event history perspectives: the 'system' and 'causal' approaches. The authors assert that the causal approach is more appropriate from an analytical point of view as it provides a straightforward solution to simultaneity, cause-effect lags, and temporal shapes of effects. Based on comparative cross-national applications in West and East Germany, Canada, Latvia and the Netherlands, the usefulness of the causal approach is demonstrated by analysing two highly interdependent family processes: entry into marriage (for individuals who are in a consensual union) as the dependent process, and first pregnancy/childbirth as the explaining one. Both statistical and theoretical explanations are explored emphasizing the need for conceptual reasoning.
Release date: 2004-09-13
189. Modelling and analysis of duration data from longitudinal surveys Archived
Articles and reports: 11-522-X20020016743
Description:
There is much interest in using data from longitudinal surveys to help understand life history processes such as education, employment, fertility, health and marriage. The analysis of data on the durations of spells or sojourns that individuals spend in certain states (e.g., employment, marriage) is a primary tool in studying such processes. This paper examines methods for analysing duration data that address important features associated with longitudinal surveys: the use of complex survey designs in heterogeneous populations; missing or inaccurate information about the timing of events; and the possibility of non-ignorable dropout or censoring mechanisms. Parametric and non-parametric techniques for estimation and for model checking are considered. Both new and existing methodology are proposed and applied to duration data from Canada's Survey of Labour and Income Dynamics (SLID).
Release date: 2004-09-13
190. Analyzing developmental trajectories: An overview of a group-based approach Archived
Articles and reports: 11-522-X20020016744
Description:
A developmental trajectory describes the course of a behaviour over age or time. This technical paper provides an overview of a semi-parametric, group-based method for analysing developmental trajectories. This methodology provides an alternative to assuming a homogenous population of trajectories as is done in standard growth modelling.
Four capabilities are described: (1) the capability to identify, rather than assume, distinctive groups of trajectories; (2) the capability to estimate the proportion of the population following each such trajectory group; (3) the capability to relate group membership probability to individual characteristics and circumstances; and (4) the capability to use the group membership probabilities for various other purposes, such as creating profiles of group members.
In addition, two important extensions of the method are described: the capability to add time-varying covariates to trajectory models and the capability to estimate joint trajectory models of distinct but related behaviours. The former provides the statistical capacity for testing if a contemporary factor, such as an experimental intervention or a non-experimental event like pregnancy, deflects a pre-existing trajectory. The latter provides the capability to study the unfolding of distinct but related behaviours such as problematic childhood behaviour and adolescent drug abuse.
Release date: 2004-09-13

Reference (26) (10 to 20 of 26 results)

11. Note to Users of Data from the 2012 Canadian Income Survey
Notices and consultations: 75-513-X2014001
Description:
Starting with the 2012 reference year, annual individual and family income data is produced by the Canadian Income Survey (CIS). The CIS is a cross-sectional survey developed to provide information on the income and income sources of Canadians, along with their individual and household characteristics. The CIS reports on many of the same statistics as the Survey of Labour and Income Dynamics (SLID), which last reported on income for the 2011 reference year. This note describes the CIS methodology, as well as the main differences in survey objectives, methodology and questionnaires between CIS and SLID.
Release date: 2014-12-10
12. Using a Trend-cycle Approach to Estimate Changes in Southern Canada's Water Yield from 1971 to 2004 Archived
Surveys and statistical programs – Documentation: 16-001-M2010014
Description: Quantifying how Canada's water yield has changed over time is an important component of the water accounts maintained by Statistics Canada. This study evaluates the movement in the series of annual water yield estimates for Southern Canada from 1971 to 2004. We estimated the movement in the series using a trend-cycle approach and found that water yield for southern Canada has generally decreased over the period of observation.
Release date: 2010-09-13
13. Finding and Using Statistics Archived
Surveys and statistical programs – Documentation: 11-533-X
Description:
This guide has been created especially for users needing a step-by-step review on how to find, read and use data, with quick tips on locating information on the Statistics Canada website. Originally published in paper format in the 1980s, revised as part of the 1994 Statistics Canada Catalogue, and then transformed into an electronic version, this guide is continually being updated to maintain its currency and usefulness.
Release date: 2007-11-19
14. Trade in Culture Services: A Handbook of Concepts and Methods Archived
Surveys and statistical programs – Documentation: 81-595-M2007056
Geography: Canada
Description: This handbook discusses the collection and interpretation of statistical data on Canada's trade in culture services.
Release date: 2007-10-31
15. Producing Hours Worked for the SNA in Order to Measure Productivity: The Canadian Experience Archived
Surveys and statistical programs – Documentation: 15-206-X2006004
Description:
This paper provides a brief description of the methodology currently used to produce the annual volume of hours worked consistent with the System of National Accounts (SNA). These data are used for labour input in the annual and quarterly measures of labour productivity, as well as in the annual measures of multifactor productivity. For this purpose, hours worked are broken down by educational level and age group, so that changes in the composition of the labour force can be taken into account. They are also used to calculate hourly compensation and the unit labour cost and for simulations of the SNA Input-Output Model; as such, they are integrated as labour force inputs into most SNA satellite accounts (i.e., environment, tourism).
Release date: 2006-10-27
16. Constant Dollar Adjustment of Expenditure Data from the Survey of Household Spending Archived
Surveys and statistical programs – Documentation: 62F0026M2005005
Description:
This discussion paper reviews the previous research into the subject of presenting historical time series and comparisons in constant dollars for the Survey of Household Spending (SHS), and its predecessor the Family Expenditure Survey (FAMEX). It examines two principal methods of converting spending data into constant dollars. The purpose of this discussion paper is to show interested parties how the two methods differ in complexity of implementation and interpretation.
Release date: 2005-07-15
17. The CRISP-NLSCY files Archived
Notices and consultations: 12-002-X20050018033
Description:
Dr. J. Douglas Willms and his staff at the Canadian Research Institute for Social Policy (CRISP) at the University of New Brunswick (Fredericton Campus) have developed a set of files for researchers interested in using Statistics Canada's National Longitudinal Survey of Children and Youth (NLSCY) data sets. "The Files" consist of SPSS data and syntax, which are intended to assist researchers in conducting more efficient longitudinal analyses using NLSCY data.
Release date: 2005-06-23
18. Using Median Expenditures: Impact on Household Spending Data Archived
Surveys and statistical programs – Documentation: 62F0026M2005001
Description:
This paper provides some guidance to users on the use of medians and also gives some examples of situations in which the median can be a more appropriate measure than the average.
Release date: 2005-05-17
19. Culture Goods Trade Estimates: Methodology and Technical Notes Archived
Surveys and statistical programs – Documentation: 81-595-M2004020
Geography: Canada
Description:
This article discusses the collection and interpretation of statistical data on Canada's trade in culture goods. It defines the products that are included in culture trade and explains how appropriate products are selected from the relevant classification standards.
This version has been replaced by Culture Goods Trade Data User Guide, Catalogue No. 81-595-MIE2006040.
Release date: 2004-07-28
20. Occupation, 2001 Census Technical Report (Reference Products: 2001 Census) Archived
Surveys and statistical programs – Documentation: 92-388-X
Description:
This report contains basic conceptual and data quality information to help users interpret and make use of census occupation data. It gives an overview of the collection, coding (to the 2001 National Occupational Classification), edit and imputation of the occupation data from the 2001 Census. The report describes procedural changes between the 2001 and earlier censuses, and provides an analysis of the quality level of the 2001 Census occupation data. Finally, it details the revision of the 1991 Standard Occupational Classification used in the 1991 and 1996 Censuses to the 2001 National Occupational Classification for Statistics used in 2001. The historical comparability of data coded to the two classifications is discussed. Appendices to the report include a table showing historical data for the 1991, 1996 and 2001 Censuses.
Release date: 2004-07-15

Date modified: 2026-06-22
