Data analysis

Results

All (289) (1 to 10 of 289 results)

1. Quantitative impact analysis: A practical overview
Articles and reports: 36-28-0001202600500003
Description: This spotlight article outlines practical methods for assessing the economic impacts of public programs delivered by federal agencies and Crown corporations. It summarizes key steps in conducting quantitative impact analysis, including data linkage, cohort construction and implementation of quasi-causal estimators.
Release date: 2026-05-27
2. Analytical Studies: Methods and References
Journals and periodicals: 11-633-X
Description: Papers in this series provide background discussions of the methods used to develop data for economic, health, and social analytical studies at Statistics Canada. They are intended to provide readers with information on the statistical methods, standards and definitions used to develop databases for research purposes. All papers in this series have undergone peer and institutional review to ensure that they conform to Statistics Canada's mandate and adhere to generally accepted standards of good professional practice.
Release date: 2026-05-27
3. When and How to Use Area-Level Measures for Health Analysis: A Review and Recommendation Report
Surveys and statistical programs – Documentation: 11-633-X2026001
Description: This report defines key concepts related to area-level analysis and introduces area-level measures developed and utilized at Statistics Canada for health analysis. It also provides a decision-making framework and practical recommendations to help researchers select appropriate methods. The goal is to guide readers on when area-level analysis is appropriate and what type of area-level measure is suitable to achieve research objectives.
Release date: 2026-03-05
4. Collinearity diagnostics in generalized linear models fitted with survey data
Articles and reports: 12-001-X202500200004
Description: The class of generalized linear models (GLM) is a flexible generalization of ordinary least squares regression that allows the linear model to be related to the response variable via a link function and assumes the magnitude of the variance of each measurement to be a function of its predicted value. Multicollinearity in GLMs can inflate variances of the estimated coefficients and cause poor prediction in certain regions of the regression space. It may also cause a nonsignificant Wald statistic even when the predictors are highly predictive in a model of the family of GLMs. Little previous research has closely investigated the diagnostics of multicollinearity in GLMs, especially when complex survey data are used. In this paper, we develop variance inflation factors (VIFs) that measure the amount that the variance of a parameter estimator is increased due to multicollinearity in GLMs. We also extend VIFs and condition indexes to apply to complex survey data, accounting for design features, e.g. weights, clusters, and strata. Illustrations of these methods are given using data from a household survey of health and nutrition.
Release date: 2025-12-23
5. Fighting Misinformation
Stats in brief: 89-20-00062025001
Description: This video is designed to help you critically assess the data presented to you. No data is perfect. By understanding the strengths and limitations of the data, you can avoid being misled—and make smarter, more informed decisions.
Release date: 2025-12-15
6. Life in the FastText Lane: Harnessing Linear Programming Constrained Machine Learning for Classifications Revision (Archived)
Articles and reports: 11-522-X202500100010
Description: Statistics Canada's Labour Force Survey (LFS) plays an essential role in the estimation of labour market conditions in Canada. Periodically, LFS revises its data to the most recent industry and occupational classification versions. Differences in versions can be extensive, including high-level and unit-group structural changes, creations, deletions, split-offs and combinations of classification units (classes). Historically, to reconcile split-off classes - where one class splits into multiple classes - a sample of LFS split-off records would be manually recoded to the new classification version. Based on the split-off proportion observed in the recoded sample, a random allocation method would be applied on all data to reflect the changing Canadian labour market over time. This article proposes using machine learning (fastText), constrained to split-off proportions using linear programming, to revise industry and occupation classifications in LFS. The hybrid framework benefits from a text-based revision mechanism while adhering to traditional proportion-driven estimates, thus ensuring a minimal impact on the comparability of published labour market indicators.
Release date: 2025-09-08
7. Leveraging Statistics Canada data integration opportunities for program evaluation
Articles and reports: 36-28-0001202500300002
Description: Government programs are evaluated to measure their effectiveness. This article discusses the benefits of using Statistics Canada data combined with the data collected from the government program to provide a far more comprehensive evaluation than program data alone can offer. The article also summarizes a recent example of a program evaluation that benefited from Statistics Canada data and the expertise of Statistics Canada researchers in analyzing the data.
Release date: 2025-03-26
8. Bayesian predictive inference of a finite population mean without specifying the relation between the study variable and the covariates
Articles and reports: 12-001-X202400200004
Description: While we avoid specifying the parametric relationship between the study variable and covariates, we illustrate the advantage of including a spatial component to better account for the covariates in our models to make Bayesian predictive inference. We treat each unique covariate combination as an individual stratum, then we use small area estimation techniques to make inference about the finite population mean of the continuous response variable. The two spatial models used are the conditional autoregressive and simple conditional autoregressive models. We include the spatial effects by creating the adjacency matrix via the Mahalanobis distance between covariates. We also show how to incorporate survey weights into the spatial models when dealing with probability survey data. We compare the results of two non-spatial models including the Scott-Smith model and the Battese, Harter, and Fuller model to the spatial models. We illustrate the comparison between the aforementioned models with an application using BMI data from eight counties in California. Our goal is to have neighboring strata yield similar predictions, and to increase the difference between strata that are not neighbors. Ultimately, using the spatial models shows less global pooling compared to the non-spatial models, which was the desired outcome.
Release date: 2024-12-20
9. Robust adaptive survey design for time changes in mixed-mode response propensities
Articles and reports: 12-001-X202400200005
Description: Adaptive survey designs (ASDs) tailor recruitment protocols to population subgroups that are relevant to a survey. In recent years, effective ASD optimization has been the topic of research and several applications. However, the performance of an optimized ASD over time is sensitive to time changes in response propensities. How adaptation strategies can adjust to such variation over time is not yet fully understood. In this paper, we propose a robust optimization approach in the context of sequential mixed-mode surveys employing Bayesian analysis. The approach is formulated as a mathematical programming problem that explicitly accounts for uncertainty due to time change. ASD decisions can then be made by considering time-dependent variation in conditional mode response propensities and between-mode correlations in response propensities. The approach is demonstrated using a case study: the 2014-2017 Dutch Health Survey. We evaluate the sensitivity of ASD performance to 1) the budget level and 2) the length of applicable historic time-series data. We find there is only a moderate dependence on the budget level and the dependence on historic data is moderated by the amount of seasonality during the year.
Release date: 2024-12-20
10. Longitudinal Immigration Database (IMDB) Technical Report, 2023
Surveys and statistical programs – Documentation: 11-633-X2024004
Description: The Longitudinal Immigration Database (IMDB) is a comprehensive source of data that plays a key role in the understanding of the economic behaviour of immigrants. It is the only annual Canadian dataset that allows users to study the characteristics of immigrants to Canada at the time of admission and their economic outcomes and regional (inter-provincial) mobility over a time span of more than 40 years.
Release date: 2024-12-09

Data (2) (2 results)

1. Canadian Statistical Geospatial Explorer Hub (Archived)
Data Visualization: 71-607-X2020010
Description: The Canadian Statistical Geospatial Explorer empowers users to discover geo-enabled data holdings of Statistics Canada at various levels of geography, including at the neighbourhood level. Users are able to visualize, thematically map, spatially explore and analyze, export and consume data in various formats. Users can also view the data superimposed on satellite imagery, topographic and street layers.
Release date: 2024-08-21
2. Housing Data Viewer (Archived)
Data Visualization: 71-607-X2019010
Description: The Housing Data Viewer is a visualization tool that allows users to explore Statistics Canada data on a map. Users can use the tool to navigate, compare and export data.
Release date: 2019-10-30

Analysis (256) (171 to 180 of 256 results)

171. Comparing a rate in a subpopulation to the rate in the full population: How it may be done when using survey data, and available software tools (Archived)
Articles and reports: 12-002-X20050018030
Description:
People often wish to use survey micro-data to study whether the rate of occurrence of a particular condition in a subpopulation is the same as the rate of occurrence in the full population. This paper describes some alternatives for making inferences about such a rate difference and shows whether and how these alternatives may be implemented in three different survey software packages. The software packages illustrated - SUDAAN, WesVar and Bootvar - all can make use of bootstrap weights provided by the analyst to carry out variance estimation.
Release date: 2005-06-23
172. Revisions of the Canadian National Tourism Indicators (Archived)
Stats in brief: 13-604-M2005047
Description:
This paper discusses the revision policy of Canada's National Tourism Indicators (NTI) and summarizes results from some recent studies of data revisions to the NTI. The discussion is timely, as the adoption of explicit data revision policies has been emphasized recently as an essential element in the good governance of statistical systems.
The paper starts with a brief description of the NTI, their underlying conceptual framework, and their sources and methods. Next comes a discussion of the need for data revisions, and an outline of various types of revisions. Then a few sections are devoted to the new NTI revision policy adopted with the first quarter 2004 estimates, and the associated costs and benefits. Revision studies, which have been used to assess quality of national accounts estimates, and the database established to track data revisions to the NTI are described next. Last, results from some recent NTI data revision exercises and studies are summarized.
Release date: 2005-01-28
173. Research and analysis to better understand data collection activities (Archived)
Articles and reports: 11-522-X20030017604
Description:
This paper explains the scope, objectives and challenges of research and analysis on operations at Statistics Canada and gives some examples of the work accomplished to date.
Release date: 2005-01-26
174. A design-based procedure for the analysis of experiments embedded in complex sample surveys (Archived)
Articles and reports: 11-522-X20030017702
Description:
This paper proposes a procedure to test hypotheses about differences between sample estimates observed under alternative survey methodologies.
Release date: 2005-01-26
175. A Comparison of Canadian and U.S. Productivity Levels: An Exploration of Measurement Issues (Archived)
Articles and reports: 11F0027M2005028
Geography: Canada
Description:
This paper examines the level of labour productivity in Canada relative to that of the United States in 1999. In doing so, it addresses two main issues. The first is the comparability of the measures of GDP and labour inputs that the statistical agency in each country produces. Second, it investigates how a price index can be constructed to reconcile estimates of Canadian and U.S. GDP per hour worked that are calculated in Canadian and U.S. dollars respectively. After doing so, and taking into account alternative assumptions about Canada/U.S. prices, the paper provides point estimates of Canada's relative labour productivity of the total economy of around 93% that of the United States. The paper points out that at least a 10 percentage point confidence interval should be applied to these estimates. The size of the range is particularly sensitive to assumptions that are made about import and export prices.
Release date: 2005-01-20
176. Describing the Distribution of Income: Guidelines for Effective Analysis (Archived)
Articles and reports: 75F0002M2004010
Description:
This document offers a set of guidelines for analysing income distributions. It focuses on the basic intuition of the concepts and techniques instead of the equations and technical details.
Release date: 2004-10-08
177. Comparison of design-based and model-based methods in analyzing complex health survey data: A case study (Archived)
Articles and reports: 11-522-X20020016708
Description:
In this paper, we discuss the analysis of complex health survey data by using multivariate modelling techniques. Main interests are in various design-based and model-based methods that aim at accounting for the design complexities, including clustering, stratification and weighting. Methods covered include generalized linear modelling based on pseudo-likelihood and generalized estimating equations, linear mixed models estimated by restricted maximum likelihood, and hierarchical Bayes techniques using Markov Chain Monte Carlo (MCMC) methods. The methods will be compared empirically, using data from an extensive health interview and examination survey conducted in Finland in 2000 (Health 2000 Study).
The data of the Health 2000 Study were collected using personal interviews, questionnaires and clinical examinations. A stratified two-stage cluster sampling design was used in the survey. The sampling design involved positive intra-cluster correlation for many study variables. For a closer investigation, we selected a small number of study variables from the health interview and health examination phases. In many cases, the different methods produced similar numerical results and supported similar statistical conclusions. Methods that failed to account for the design complexities sometimes led to conflicting conclusions. We also discuss the application of the methods in this paper by using standard statistical software products.
Release date: 2004-09-13
178. Interval censoring of smoking cessation in the National Population Health Survey (Archived)
Articles and reports: 11-522-X20020016712
Description:
In this paper, we consider the effect of the interval censoring of cessation time on intensity parameter estimation with regard to smoking cessation and pregnancy. The three waves of the National Population Health Survey allow the methodology of event history analysis to be applied to smoking initiation, cessation and relapse. One issue of interest is the relationship between smoking cessation and pregnancy. If a longitudinal respondent who is a smoker at the first cycle ceases smoking by the second cycle, we know the cessation time to within an interval of length at most a year, since the respondent is asked for the age at which she stopped smoking, and her date of birth is known. We also know whether she is pregnant at the time of the second cycle, and whether she has given birth since the time of the first cycle. For many such subjects, we know the date of conception to within a relatively small interval. If we knew the time of smoking cessation and pregnancy period exactly for each member who experienced one or other of these events between cycles, we could model their temporal relationship through their joint intensities.
Release date: 2004-09-13
179. Application of the delete-a-group jackknife variance estimator to analyses of data from a complex longitudinal survey (Archived)
Articles and reports: 11-522-X20020016714
Description:
In this highly technical paper, we illustrate the application of the delete-a-group jack-knife variance estimator approach to a particular complex multi-wave longitudinal study, demonstrating its utility for linear regression and other analytic models. The delete-a-group jack-knife variance estimator is proving a very useful tool for measuring variances under complex sampling designs. This technique divides the first-phase sample into mutually exclusive and nearly equal variance groups, deletes one group at a time to create a set of replicates and makes analogous weighting adjustments in each replicate to those done for the sample as a whole. Variance estimation proceeds in the standard (unstratified) jack-knife fashion.
Our application is to the Chicago Health and Aging Project (CHAP), a community-based longitudinal study examining risk factors for chronic health problems of older adults. A major aim of the study is the investigation of risk factors for incident Alzheimer's disease. The current design of CHAP has two components: (1) Every three years, all surviving members of the cohort are interviewed on a variety of health-related topics. These interviews include cognitive and physical function measures. (2) At each of these waves of data collection, a stratified Poisson sample is drawn from among the respondents to the full population interview for detailed clinical evaluation and neuropsychological testing. To investigate risk factors for incident disease, a 'disease-free' cohort is identified at the preceding time point and forms one major stratum in the sampling frame.
We provide proofs of the theoretical applicability of the delete-a-group jack-knife for particular estimators under this Poisson design, paying needed attention to the distinction between finite-population and infinite-population (model) inference. In addition, we examine the issue of determining the 'right number' of variance groups.
Release date: 2004-09-13
180. A comparison of approaches to modelling health and environment (Archived)
Articles and reports: 11-522-X20020016719
Description:
This study takes a look at the modelling methods used for public health data. Public health has a renewed interest in the impact of the environment on health. Ecological or contextual studies ideally investigate these relationships using public health data augmented with environmental characteristics in multilevel or hierarchical models. In these models, individual respondents in health data are the first level and community data are the second level. Most public health data use complex sample survey designs, which require analyses accounting for the clustering, nonresponse, and poststratification to obtain representative estimates of prevalence of health risk behaviours.
This study uses the Behavioral Risk Factor Surveillance System (BRFSS), a state-specific US health risk factor surveillance system conducted by the Centers for Disease Control and Prevention, which assesses health risk factors in over 200,000 adults annually. BRFSS data are now available at the metropolitan statistical area (MSA) level and provide quality health information for studies of environmental effects. MSA-level analyses combining health and environmental data are further complicated by joint requirements of the survey sample design and the multilevel analyses.
We compare three modelling methods in a study of physical activity and selected environmental factors using BRFSS 2000 data. Each of the methods described here is a valid way to analyse complex sample survey data augmented with environmental information, although each accounts for the survey design and multilevel data structure in a different manner and is thus appropriate for slightly different research questions.
Release date: 2004-09-13

Reference (26) (11 to 20 of 26 results)

11. Note to Users of Data from the 2012 Canadian Income Survey
Notices and consultations: 75-513-X2014001
Description:
Starting with the 2012 reference year, annual individual and family income data is produced by the Canadian Income Survey (CIS). The CIS is a cross-sectional survey developed to provide information on the income and income sources of Canadians, along with their individual and household characteristics. The CIS reports on many of the same statistics as the Survey of Labour and Income Dynamics (SLID), which last reported on income for the 2011 reference year. This note describes the CIS methodology, as well as the main differences in survey objectives, methodology and questionnaires between CIS and SLID.
Release date: 2014-12-10
12. Using a Trend-cycle Approach to Estimate Changes in Southern Canada's Water Yield from 1971 to 2004 (Archived)
Surveys and statistical programs – Documentation: 16-001-M2010014
Description: Quantifying how Canada's water yield has changed over time is an important component of the water accounts maintained by Statistics Canada. This study evaluates the movement in the series of annual water yield estimates for Southern Canada from 1971 to 2004. We estimated the movement in the series using a trend-cycle approach and found that water yield for southern Canada has generally decreased over the period of observation.
Release date: 2010-09-13
13. Finding and Using Statistics (Archived)
Surveys and statistical programs – Documentation: 11-533-X
Description:
This guide has been created especially for users needing a step-by-step review on how to find, read and use data, with quick tips on locating information on the Statistics Canada website. Originally published in paper format in the 1980s, revised as part of the 1994 Statistics Canada Catalogue, and then transformed into an electronic version, this guide is continually being updated to maintain its currency and usefulness.
Release date: 2007-11-19
14. Trade in Culture Services: A Handbook of Concepts and Methods (Archived)
Surveys and statistical programs – Documentation: 81-595-M2007056
Geography: Canada
Description: This handbook discusses the collection and interpretation of statistical data on Canada's trade in culture services.
Release date: 2007-10-31
15. Producing Hours Worked for the SNA in Order to Measure Productivity: The Canadian Experience (Archived)
Surveys and statistical programs – Documentation: 15-206-X2006004
Description:
This paper provides a brief description of the methodology currently used to produce the annual volume of hours worked consistent with the System of National Accounts (SNA). These data are used for labour input in the annual and quarterly measures of labour productivity, as well as in the annual measures of multifactor productivity. For this purpose, hours worked are broken down by educational level and age group, so that changes in the composition of the labour force can be taken into account. They are also used to calculate hourly compensation and the unit labour cost and for simulations of the SNA Input-Output Model; as such, they are integrated as labour force inputs into most SNA satellite accounts (i.e., environment, tourism).
Release date: 2006-10-27
16. Constant Dollar Adjustment of Expenditure Data from the Survey of Household Spending (Archived)
Surveys and statistical programs – Documentation: 62F0026M2005005
Description:
This discussion paper reviews the previous research into the subject of presenting historical time series and comparisons in constant dollars for the Survey of Household Spending (SHS), and its predecessor the Family Expenditure Survey (FAMEX). It examines two principal methods of converting spending data into constant dollars. The purpose of this discussion paper is to show interested parties how the two methods differ in complexity of implementation and interpretation.
Release date: 2005-07-15
17. The CRISP-NLSCY files (Archived)
Notices and consultations: 12-002-X20050018033
Description:
Dr. J. Douglas Willms and his staff at the Canadian Research Institute for Social Policy (CRISP) at the University of New Brunswick (Fredericton Campus) have developed a set of files for researchers interested in using Statistics Canada's National Longitudinal Survey of Children and Youth (NLSCY) data sets. "The Files" consist of SPSS data and syntax, which are intended to assist researchers in conducting more efficient longitudinal analyses using NLSCY data.
Release date: 2005-06-23
18. Using Median Expenditures: Impact on Household Spending Data (Archived)
Surveys and statistical programs – Documentation: 62F0026M2005001
Description:
This paper provides some guidance to users on the use of medians and gives some examples of situations in which the median can be a more appropriate measure than the average.
Release date: 2005-05-17
19. Culture Goods Trade Estimates: Methodology and Technical Notes (Archived)
Surveys and statistical programs – Documentation: 81-595-M2004020
Geography: Canada
Description:
This article discusses the collection and interpretation of statistical data on Canada's trade in culture goods. It defines the products that are included in culture trade and explains how appropriate products are selected from the relevant classification standards.
This version has been replaced by Culture Goods Trade Data User Guide, Catalogue No. 81-595-MIE2006040.
Release date: 2004-07-28
20. Occupation, 2001 Census Technical Report (Reference Products: 2001 Census) (Archived)
Surveys and statistical programs – Documentation: 92-388-X
Description:
This report contains basic conceptual and data quality information to help users interpret and make use of census occupation data. It gives an overview of the collection, coding (to the 2001 National Occupational Classification), edit and imputation of the occupation data from the 2001 Census. The report describes procedural changes between the 2001 and earlier censuses, and provides an analysis of the quality level of the 2001 Census occupation data. Finally, it details the revision of the 1991 Standard Occupational Classification used in the 1991 and 1996 Censuses to the 2001 National Occupational Classification for Statistics used in 2001. The historical comparability of data coded to the two classifications is discussed. Appendices to the report include a table showing historical data for the 1991, 1996 and 2001 Censuses.
Release date: 2004-07-15

Date modified: 2026-06-22
