Data analysis

Results

All (289) (0 to 10 of 289 results)

  • Articles and reports: 36-28-0001202600500003
    Description: This spotlight article outlines practical methods for assessing the economic impacts of public programs delivered by federal agencies and Crown corporations. It summarizes key steps in conducting quantitative impact analysis, including data linkage, cohort construction and implementation of quasi-causal estimators.
    Release date: 2026-05-27

  • Journals and periodicals: 11-633-X
    Description: Papers in this series provide background discussions of the methods used to develop data for economic, health, and social analytical studies at Statistics Canada. They are intended to provide readers with information on the statistical methods, standards and definitions used to develop databases for research purposes. All papers in this series have undergone peer and institutional review to ensure that they conform to Statistics Canada's mandate and adhere to generally accepted standards of good professional practice.
    Release date: 2026-05-27

  • Surveys and statistical programs – Documentation: 11-633-X2026001
    Description: This report defines key concepts related to area-level analysis and introduces area-level measures developed and utilized at Statistics Canada for health analysis. It also provides a decision-making framework and practical recommendations to help researchers select appropriate methods. The goal is to guide readers on when area-level analysis is appropriate and what type of area-level measure is suitable to achieve research objectives.
    Release date: 2026-03-05

  • Articles and reports: 12-001-X202500200004
    Description: The class of generalized linear models (GLM) is a flexible generalization of ordinary least squares regression that allows the linear model to be related to the response variable via a link function and assumes the magnitude of the variance of each measurement to be a function of its predicted value. Multicollinearity in GLMs can inflate variances of the estimated coefficients and cause poor prediction in certain regions of the regression space. It may also cause a nonsignificant Wald statistic even when the predictors are highly predictive in a model of the family of GLMs. Little previous research has closely investigated the diagnostics of multicollinearity in GLMs, especially when complex survey data are used. In this paper, we develop variance inflation factors (VIFs) that measure the amount that the variance of a parameter estimator is increased due to multicollinearity in GLMs. We also extend VIFs and condition indexes to apply to complex survey data, accounting for design features, e.g. weights, clusters, and strata. Illustrations of these methods are given using data from a household survey of health and nutrition.
    Release date: 2025-12-23

  • Stats in brief: 89-20-00062025001
    Description: This video is designed to help you critically assess the data presented to you. No data is perfect. By understanding the strengths and limitations of the data, you can avoid being misled—and make smarter, more informed decisions.
    Release date: 2025-12-15

  • Articles and reports: 11-522-X202500100010
    Description: Statistics Canada's Labour Force Survey (LFS) plays an essential role in the estimation of labour market conditions in Canada. Periodically, the LFS revises its data to the most recent industry and occupational classification versions. Differences between versions can be extensive, including high-level and unit-group structural changes, creations, deletions, split-offs and combinations of classification units (classes). Historically, to reconcile split-off classes (where one class splits into multiple classes), a sample of LFS split-off records would be manually recoded to the new classification version. Based on the split-off proportion observed in the recoded sample, a random allocation method would be applied to all data to reflect the changing Canadian labour market over time. This article proposes using machine learning (fastText), constrained to split-off proportions using linear programming, to revise industry and occupation classifications in the LFS. The hybrid framework benefits from a text-based revision mechanism while adhering to traditional proportion-driven estimates, thus ensuring a minimal impact on the comparability of published labour market indicators.
    Release date: 2025-09-08

  • Articles and reports: 36-28-0001202500300002
    Description: Government programs are evaluated to measure their effectiveness. This article discusses the benefits of using Statistics Canada data combined with the data collected from the government program to provide a far more comprehensive evaluation than program data alone can offer. The article also summarizes a recent example of a program evaluation that benefited from Statistics Canada data and the expertise of Statistics Canada researchers in analyzing the data.
    Release date: 2025-03-26

  • Articles and reports: 12-001-X202400200004
    Description: While we avoid specifying the parametric relationship between the study variable and covariates, we illustrate the advantage of including a spatial component to better account for the covariates in our models to make Bayesian predictive inference. We treat each unique covariate combination as an individual stratum, then we use small area estimation techniques to make inference about the finite population mean of the continuous response variable. The two spatial models used are the conditional autoregressive and simple conditional autoregressive models. We include the spatial effects by creating the adjacency matrix via the Mahalanobis distance between covariates. We also show how to incorporate survey weights into the spatial models when dealing with probability survey data. We compare the results of two non-spatial models including the Scott-Smith model and the Battese, Harter, and Fuller model to the spatial models. We illustrate the comparison between the aforementioned models with an application using BMI data from eight counties in California. Our goal is to have neighboring strata yield similar predictions, and to increase the difference between strata that are not neighbors. Ultimately, using the spatial models shows less global pooling compared to the non-spatial models, which was the desired outcome.
    Release date: 2024-12-20

  • Articles and reports: 12-001-X202400200005
    Description: Adaptive survey designs (ASDs) tailor recruitment protocols to population subgroups that are relevant to a survey. In recent years, effective ASD optimization has been the topic of research and several applications. However, the performance of an optimized ASD over time is sensitive to time changes in response propensities. How adaptation strategies can adjust to such variation over time is not yet fully understood. In this paper, we propose a robust optimization approach in the context of sequential mixed-mode surveys employing Bayesian analysis. The approach is formulated as a mathematical programming problem that explicitly accounts for uncertainty due to time change. ASD decisions can then be made by considering time-dependent variation in conditional mode response propensities and between-mode correlations in response propensities. The approach is demonstrated using a case study: the 2014-2017 Dutch Health Survey. We evaluate the sensitivity of ASD performance to 1) the budget level and 2) the length of applicable historic time-series data. We find there is only a moderate dependence on the budget level and the dependence on historic data is moderated by the amount of seasonality during the year.
    Release date: 2024-12-20

  • Surveys and statistical programs – Documentation: 11-633-X2024004
    Description: The Longitudinal Immigration Database (IMDB) is a comprehensive source of data that plays a key role in the understanding of the economic behaviour of immigrants. It is the only annual Canadian dataset that allows users to study the characteristics of immigrants to Canada at the time of admission and their economic outcomes and regional (inter-provincial) mobility over a time span of more than 40 years.
    Release date: 2024-12-09

Data (2) (2 results)

  • Data Visualization: 71-607-X2020010
    Description: The Canadian Statistical Geospatial Explorer empowers users to discover geo-enabled data holdings of Statistics Canada at various levels of geography, including the neighbourhood level. Users are able to visualize, thematically map, spatially explore and analyze, export and consume data in various formats. Users can also view the data superimposed on satellite imagery, topographic and street layers.
    Release date: 2024-08-21

  • Data Visualization: 71-607-X2019010
    Description: The Housing Data Viewer is a visualization tool that allows users to explore Statistics Canada data on a map. Users can use the tool to navigate, compare and export data.
    Release date: 2019-10-30

Analysis (256) (180 to 190 of 256 results)

  • Articles and reports: 11-522-X20020016723
    Description:

    Categorical outcomes, such as binary, ordinal and nominal responses, occur often in survey research. Logistic regression investigates the relationship between such categorical response variables and a set of explanatory variables. The LOGISTIC procedure can be used to perform a logistic analysis on data from a random sample. However, this approach is not valid if the data come from other sample designs, such as complex survey designs with stratification, clustering and/or unequal weighting. In these cases, specialized techniques must be applied in order to produce the appropriate estimates and standard errors.

    The SURVEYLOGISTIC procedure, experimental in Version 9, brings logistic regression for survey data to the SAS System and delivers much of the functionality of the LOGISTIC procedure. This paper describes the methodological approach and applications for this new software.

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016724
    Description:

    Some of the most commonly used statistical models are fitted using maximum likelihood (ML) or some extension of ML. Stata's ML command provides researchers and data analysts with a tool to develop estimation commands to fit their models using their data. Such models may include multiple equations, clustered observations, sampling weights and other survey design characteristics. These elements are discussed in this paper.

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016725
    Description:

    In 1997, the US Office of Management and Budget issued revised standards for the collection of race information within the federal statistical system. One revision allows individuals to choose more than one race group when responding to federal surveys and other federal data collections. This change presents challenges for analyses that involve data collected under both the old and new race-reporting systems, since the data on race are not comparable. The following paper discusses the problems encountered by these changes and methods developed to overcome them.

    Since most people under both systems report only a single race, a commonly proposed solution is to bridge the transition by assigning a single-race category to each multiple-race reporter under the new system, and to conduct analyses using just the observed and assigned single-race categories. Thus, the problem can be viewed as a missing-data problem, in which single-race responses are missing for multiple-race reporters and need to be imputed.

    The US Office of Management and Budget suggested several simple bridging methods to handle this missing-data problem. Schenker and Parker (Statistics in Medicine, forthcoming) analysed data from the National Health Interview Survey of the US National Center for Health Statistics, which allows multiple-race reporting but also asks multiple-race reporters to specify a primary race, and found that improved bridging methods could result from incorporating individual-level and contextual covariates into the bridging models.

    While Schenker and Parker discussed only three large multiple-race groups, the current application requires predicting single-race categories for several small multiple-race groups as well. Thus, problems of sparse data arise in fitting the bridging models. We address these problems by building combined models for several multiple-race groups, thus borrowing strength across them. These and other methodological issues are discussed.

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016728
    Description:

    Nearly all surveys use complex sampling designs to collect data and these data are frequently used for statistical analyses beyond the estimation of simple descriptive parameters of the target population. Many procedures available in popular statistical software packages are not appropriate for this purpose because the analyses are based on the assumption that the sample has been drawn with simple random sampling. Therefore, the results of the analyses conducted using these software packages would not be valid when the sample design incorporates multistage sampling, stratification, or clustering. Two commonly used methods for analysing data from complex surveys are replication and Taylor linearization techniques. We discuss the use of WESVAR software to compute estimates and replicate variance estimates by properly reflecting complex sampling and estimation procedures. We also illustrate the WESVAR features by using data from two Westat surveys that employ complex survey designs: the Third International Mathematics and Science Study (TIMSS) and the National Health and Nutrition Examination Survey (NHANES).

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016731
    Description:

    Behavioural researchers use a variety of techniques to predict respondent scores on constructs that are not directly observable. Examples of such constructs include job satisfaction, work stress, aptitude for graduate study, children's mathematical ability, etc. The techniques commonly used for modelling and predicting scores on such constructs include factor analysis, classical psychometric scaling and item response theory (IRT), and for each technique there are often several different strategies that can be used to generate individual scores. However, researchers are seldom satisfied with simply measuring these constructs. They typically use the derived scores in multiple regression, analysis of variance and numerous multivariate procedures. Though using predicted scores in this way can result in biased estimates of model parameters, not all researchers are aware of this difficulty. The paper will review the literature on this issue, with particular emphasis on IRT methods. Problems will be illustrated, some remedies suggested, and areas for further research will be identified.

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016732
    Description:

    Analysis of dose-response relationships has long been important in toxicology. More recently, this type of analysis has been employed to evaluate public education campaigns. The data that are collected in such evaluations are likely to come from standard household survey designs with all the usual complexities of multiple stages, stratification and variable selection probabilities. In a recent evaluation, a system was developed with the following features: categorization of doses into three or four levels, propensity scoring of dose selection and a new jack-knifed Jonckheere-Terpstra test for a monotone dose-response relationship. This system allows rapid production of tests for monotone dose-response relationships that are corrected both for sample design and for confounding. The focus of this paper will be the results of a Monte-Carlo simulation of the properties of the jack-knifed Jonckheere-Terpstra test.

    Moreover, there is no experimental control over dosages and the possibility of confounding variables must be considered. Standard regressions in WESVAR and SUDAAN could be used to determine if there is a linear dose-response relationship while controlling for confounders, but such an approach has low power to detect nonlinear but monotone dose-response relationships and is time-consuming to implement if there is a large number of possible outcomes of interest.

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016739
    Description:

    The Labour Force Survey (LFS) was not designed to be a longitudinal survey. However, given that respondent households typically remain in the sample for six consecutive months, it is possible to reconstruct six-month fragments of longitudinal data from the monthly records of household members. Such longitudinal data (altogether consisting of millions of person-months of individual- and family-level data) are useful for analyses of monthly labour market dynamics over relatively long periods of time, 20 years or more.

    We make use of these data to estimate hazard functions describing transitions among the labour market states: self-employed, paid employee and not employed. Data on job tenure for the employed and data on the date last worked for the not employed, together with the date of survey responses, permit the estimated models to include terms reflecting seasonality and macro-economic cycles, as well as the duration dependence of each type of transition. In addition, the LFS data permit spouse labour market activity and family composition variables to be included in the hazard models as time-varying covariates. The estimated hazard equations have been included in the LifePaths socio-economic microsimulation model. In this setting, the equations may be used to simulate lifetime employment activity for past, present and future birth cohorts. Cross-sectional simulation results have been used to validate these models by comparisons with census data from the period 1971 to 1996.

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016742
    Geography: Canada
    Description:

    One of the most important advances brought about by life course and event history studies is the use of parallel or interdependent processes as explaining factors in transition rate models. The purpose of this paper is to demonstrate a causal approach to the study of interrelated family events. Various types of interdependent processes are described first, followed by two event history perspectives: the 'system' and 'causal' approaches. The authors assert that the causal approach is more appropriate from an analytical point of view as it provides a straightforward solution to simultaneity, cause-effect lags, and temporal shapes of effects. Based on comparative cross-national applications in West and East Germany, Canada, Latvia and the Netherlands, the usefulness of the causal approach is demonstrated by analysing two highly interdependent family processes: entry into marriage (for individuals who are in a consensual union) as the dependent process, and first pregnancy/childbirth as the explaining one. Both statistical and theoretical explanations are explored emphasizing the need for conceptual reasoning.

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016743
    Description:

    There is much interest in using data from longitudinal surveys to help understand life history processes such as education, employment, fertility, health and marriage. The analysis of data on the durations of spells or sojourns that individuals spend in certain states (e.g., employment, marriage) is a primary tool in studying such processes. This paper examines methods for analysing duration data that address important features associated with longitudinal surveys: the use of complex survey designs in heterogeneous populations; missing or inaccurate information about the timing of events; and the possibility of non-ignorable dropout or censoring mechanisms. Parametric and non-parametric techniques for estimation and for model checking are considered. Both new and existing methodology are proposed and applied to duration data from Canada's Survey of Labour and Income Dynamics (SLID).

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016744
    Description:

    A developmental trajectory describes the course of a behaviour over age or time. This technical paper provides an overview of a semi-parametric, group-based method for analysing developmental trajectories. This methodology provides an alternative to assuming a homogeneous population of trajectories, as is done in standard growth modelling.

    Four capabilities are described: (1) the capability to identify, rather than assume, distinctive groups of trajectories; (2) the capability to estimate the proportion of the population following each such trajectory group; (3) the capability to relate group membership probability to individual characteristics and circumstances; and (4) the capability to use the group membership probabilities for various other purposes, such as creating profiles of group members.

    In addition, two important extensions of the method are described: the capability to add time-varying covariates to trajectory models and the capability to estimate joint trajectory models of distinct but related behaviours. The former provides the statistical capacity for testing if a contemporary factor, such as an experimental intervention or a non-experimental event like pregnancy, deflects a pre-existing trajectory. The latter provides the capability to study the unfolding of distinct but related behaviours such as problematic childhood behaviour and adolescent drug abuse.

    Release date: 2004-09-13

Reference (26) (20 to 30 of 26 results)

  • Surveys and statistical programs – Documentation: 11F0019M2003207
    Geography: Canada
    Description:

    The estimation of intergenerational earnings mobility is rife with measurement problems since the researcher does not observe permanent, lifetime earnings. Nearly all studies make corrections for mean variation in earnings because of the age differences among respondents. Recent works employ average earnings or instrumental variable methods to address the effects of measurement error as a result of transitory earnings shocks and mis-reporting. However, empirical studies of intergenerational mobility have paid no attention to the changes in earnings variance across the life cycle suggested by economic models of human capital investment.

    Using information from the Intergenerational Income Data from Canada and the National Longitudinal Survey and Panel Study of Income Dynamics from the United States, this study finds a strong association between age at observation and estimated earnings persistence. Part of this age-dependence is related to a general increase in transitory earnings variance during the collection of data. An independent effect of life cycle investment is also identified. These findings are then applied to the variation among intergenerational earnings persistence studies. Among studies with similar methodologies, one-third of the variance in published estimates of earnings persistence is attributable to cross-study differences in the age of responding fathers. Finally, these results call into question tests for the importance of credit constraints based on measures of earnings at different points in the life cycle.

    Release date: 2003-08-05

  • Surveys and statistical programs – Documentation: 12-584-G
    Description:

    This book introduces technical aspects of the Statistics Canada Total Work Accounts System (TWAS). The TWAS is designed to facilitate the analysis of issues that require simultaneous consideration of both paid work and unpaid productive work. Its key contribution is to allocate the deemed output of each episode of unpaid work activity to a specific beneficiary or group of beneficiaries (called "destinations"). The guide presents the criteria used to decide the allocation of each work episode to one of the destinations, as well as the pseudo code for DESTIN, the key variable of the System. This pseudo code allows programmers to quickly create the actual programming code needed to derive the DESTIN variable in their own microdata files of diary-based time-use records. The guide also discusses illustrative applications of the System, as well as its key limitations.

    Release date: 2002-02-12

  • Notices and consultations: 87-003-X19970012882
    Geography: Canada
    Description:

    The purpose of this article is to inform Travel-log readers of the availability of a new analytical tool - the National Tourism Indicators. These estimates, which measure trends in tourism in Canada, are placed in perspective here, taking into account the concepts and definitions used in developing them.

    Release date: 1997-01-08

  • Surveys and statistical programs – Documentation: 11F0019M1995083
    Geography: Canada
    Description:

    This paper examines the robustness of a measure of the average complete duration of unemployment in Canada to a host of assumptions used in its derivation. In contrast to the average incomplete duration of unemployment, which is a lagging cyclical indicator, this statistic is a coincident indicator of the business cycle. The impact of using a steady-state as opposed to a non-steady-state assumption, as well as the impact of various corrections for response bias, are explored. It is concluded that a non-steady-state estimator would be a valuable complement to the statistics on unemployment duration that are currently released by many statistical agencies, and particularly Statistics Canada.

    Release date: 1995-12-30

  • Surveys and statistical programs – Documentation: 75F0002M1993014
    Description:

    This paper presents the results from test 3A of the Survey of Labour and Income Dynamics (SLID), conducted in January 1993, with a view to identifying any necessary changes to the questions or to the algorithm used to derive labour force status.

    Release date: 1995-12-30

  • Surveys and statistical programs – Documentation: 75F0002M1994018
    Description:

    This document describes the demographic, cultural and geographic derived variables for the Survey of Labour and Income Dynamics (SLID).

    Release date: 1995-12-30