Data analysis

Results

All (289) (0 to 10 of 289 results)

  • Articles and reports: 36-28-0001202600500003
    Description: This spotlight article outlines practical methods for assessing the economic impacts of public programs delivered by federal agencies and Crown corporations. It summarizes key steps in conducting quantitative impact analysis, including data linkage, cohort construction and implementation of quasi-causal estimators.
    Release date: 2026-05-27

  • Journals and periodicals: 11-633-X
    Description: Papers in this series provide background discussions of the methods used to develop data for economic, health, and social analytical studies at Statistics Canada. They are intended to provide readers with information on the statistical methods, standards and definitions used to develop databases for research purposes. All papers in this series have undergone peer and institutional review to ensure that they conform to Statistics Canada's mandate and adhere to generally accepted standards of good professional practice.
    Release date: 2026-05-27

  • Surveys and statistical programs – Documentation: 11-633-X2026001
    Description: This report defines key concepts related to area-level analysis and introduces area-level measures developed and utilized at Statistics Canada for health analysis. It also provides a decision-making framework and practical recommendations to help researchers select appropriate methods. The goal is to guide readers on when area-level analysis is appropriate and what type of area-level measure is suitable to achieve research objectives.
    Release date: 2026-03-05

  • Articles and reports: 12-001-X202500200004
    Description: The class of generalized linear models (GLM) is a flexible generalization of ordinary least squares regression that allows the linear model to be related to the response variable via a link function and assumes the magnitude of the variance of each measurement to be a function of its predicted value. Multicollinearity in GLMs can inflate variances of the estimated coefficients and cause poor prediction in certain regions of the regression space. It may also cause a nonsignificant Wald statistic even when the predictors are highly predictive in a model of the family of GLMs. Little previous research has closely investigated the diagnostics of multicollinearity in GLMs, especially when complex survey data are used. In this paper, we develop variance inflation factors (VIFs) that measure the amount that the variance of a parameter estimator is increased due to multicollinearity in GLMs. We also extend VIFs and condition indexes to apply to complex survey data, accounting for design features, e.g. weights, clusters, and strata. Illustrations of these methods are given using data from a household survey of health and nutrition.
    Release date: 2025-12-23

  • Stats in brief: 89-20-00062025001
    Description: This video is designed to help you critically assess the data presented to you. No data is perfect. By understanding the strengths and limitations of the data, you can avoid being misled—and make smarter, more informed decisions.
    Release date: 2025-12-15

  • Articles and reports: 11-522-X202500100010
    Description: Statistics Canada's Labour Force Survey (LFS) plays an essential role in the estimation of labour market conditions in Canada. Periodically, LFS revises its data to the most recent industry and occupational classification versions. Differences in versions can be extensive, including high-level and unit-group structural changes, creations, deletions, split-offs and combinations of classification units (classes). Historically, to reconcile split-off classes (where one class splits into multiple classes), a sample of LFS split-off records would be manually recoded to the new classification version. Based on the split-off proportion observed in the recoded sample, a random allocation method would be applied on all data to reflect the changing Canadian labour market over time. This article proposes using machine learning (fastText), constrained to split-off proportions using linear programming, to revise industry and occupation classifications in LFS. The hybrid framework benefits from a text-based revision mechanism while adhering to traditional proportion-driven estimates, thus ensuring a minimal impact on the comparability of published labour market indicators.
    Release date: 2025-09-08

  • Articles and reports: 36-28-0001202500300002
    Description: Government programs are evaluated to measure their effectiveness. This article discusses the benefits of using Statistics Canada data combined with the data collected from the government program to provide a far more comprehensive evaluation than program data alone can offer. The article also summarizes a recent example of a program evaluation that benefited from Statistics Canada data and the expertise of Statistics Canada researchers in analyzing the data.
    Release date: 2025-03-26

  • Articles and reports: 12-001-X202400200004
    Description: While we avoid specifying the parametric relationship between the study variable and covariates, we illustrate the advantage of including a spatial component to better account for the covariates in our models to make Bayesian predictive inference. We treat each unique covariate combination as an individual stratum, then we use small area estimation techniques to make inference about the finite population mean of the continuous response variable. The two spatial models used are the conditional autoregressive and simple conditional autoregressive models. We include the spatial effects by creating the adjacency matrix via the Mahalanobis distance between covariates. We also show how to incorporate survey weights into the spatial models when dealing with probability survey data. We compare the results of two non-spatial models including the Scott-Smith model and the Battese, Harter, and Fuller model to the spatial models. We illustrate the comparison between the aforementioned models with an application using BMI data from eight counties in California. Our goal is to have neighboring strata yield similar predictions, and to increase the difference between strata that are not neighbors. Ultimately, using the spatial models shows less global pooling compared to the non-spatial models, which was the desired outcome.
    Release date: 2024-12-20

  • Articles and reports: 12-001-X202400200005
    Description: Adaptive survey designs (ASDs) tailor recruitment protocols to population subgroups that are relevant to a survey. In recent years, effective ASD optimization has been the topic of research and several applications. However, the performance of an optimized ASD over time is sensitive to time changes in response propensities. How adaptation strategies can adjust to such variation over time is not yet fully understood. In this paper, we propose a robust optimization approach in the context of sequential mixed-mode surveys employing Bayesian analysis. The approach is formulated as a mathematical programming problem that explicitly accounts for uncertainty due to time change. ASD decisions can then be made by considering time-dependent variation in conditional mode response propensities and between-mode correlations in response propensities. The approach is demonstrated using a case study: the 2014-2017 Dutch Health Survey. We evaluate the sensitivity of ASD performance to 1) the budget level and 2) the length of applicable historic time-series data. We find there is only a moderate dependence on the budget level and the dependence on historic data is moderated by the amount of seasonality during the year.
    Release date: 2024-12-20

  • Surveys and statistical programs – Documentation: 11-633-X2024004
    Description: The Longitudinal Immigration Database (IMDB) is a comprehensive source of data that plays a key role in the understanding of the economic behaviour of immigrants. It is the only annual Canadian dataset that allows users to study the characteristics of immigrants to Canada at the time of admission and their economic outcomes and regional (inter-provincial) mobility over a time span of more than 40 years.
    Release date: 2024-12-09
Data (2) (2 results)

  • Data Visualization: 71-607-X2020010
    Description: The Canadian Statistical Geospatial Explorer empowers users to discover geo-enabled data holdings of Statistics Canada at various levels of geography including at the neighbourhood level. Users are able to visualize, thematically map, spatially explore and analyze, export and consume data in various formats. Users can also view the data superimposed on satellite imagery, topographic and street layers.
    Release date: 2024-08-21

  • Data Visualization: 71-607-X2019010
    Description: The Housing Data Viewer is a visualization tool that allows users to explore Statistics Canada data on a map. Users can use the tool to navigate, compare and export data.
    Release date: 2019-10-30
Analysis (256) (210 to 220 of 256 results)

  • Articles and reports: 12-001-X19980024356
    Description:

    In the nonsurvey setting, "exact" confidence intervals for proportions calculated using the binomial distribution are frequently used instead of intervals based on approximate normality when the number of positive counts is small. With complex survey data, the binomial intervals are not applicable, so intervals based on the assumed approximate normality of the sample-weighted proportion are used, even if the number of positive counts is small. We propose a simple modification of the binomial intervals to be used in this situation. Limited simulations are presented that show the coverage probability of the proposed intervals is superior to that of the normality-based intervals, logit-transform intervals, and intervals based on a Poisson approximation. Applications are given involving the prevalence of Human Immunodeficiency Virus (HIV) based on data from the third National Health and Nutrition Examination Survey, and the proportion of users of cocaine based on data from the Hispanic Health and Nutrition Examination Survey.

    Release date: 1999-01-14

  • Articles and reports: 12-001-X19980013913
    Description:

    Temporary mobility is hypothesized to contribute toward within-household coverage error since it may affect an individual's determination of "usual residence" - a concept commonly applied when listing persons as part of a household-based survey or census. This paper explores a typology of temporary mobility patterns and how they relate to the identification of usual residence. Temporary mobility is defined by the pattern of movement away from, but usually back to a single residence over a two- to three-month reference period. The typology is constructed using two dimensions: the variety of places visited and the frequency of visits made. Using data from the U.S. Living Situation Survey (LSS) conducted in 1993, four types of temporary mobility patterns are identified. In particular, two groups exhibiting patterns of repeat visit behavior were found to contain more of the types of people who tend to be missed during censuses and surveys. Log-linear modeling indicates spent away and demographic characteristics.

    Release date: 1998-07-31

  • Articles and reports: 12-001-X19960022981
    Description:

    Results from the Current Population Survey split panel studies indicated a centralized computer-assisted telephone interviewing (CATI) effect on labor force estimates. One hypothesis is that the CATI interviewing increased the probability of respondents changing their reported labor force status. The two-sample McNemar test is appropriate for testing this type of hypothesis: the hypothesis of interest is that the marginal changes in each of two independent samples' tables are equal. We show two adaptations of this test to complex survey data, along with applications from the Current Population Survey's Parallel Survey split data and from the Current Population Survey's CATI Phase-in data.

    Release date: 1997-01-30

  • Articles and reports: 12-001-X199600114390
    Description:

    Data are often available only as a set of group or area means. However, it is well known that statistical analysis based on such data will often produce results very different from those obtained from analysing the corresponding individual or household data. If the results of area level analyses are thought to apply to the individual level then we risk committing the ecological fallacy. Aggregation or ecological effects arise in part because geographic areas are not comprised of random groupings of people or households but exhibit strong socio-economic differences between areas. The population structure must be incorporated into the statistical model underpinning the analysis if aggregation effects are to be understood. A simple general model is proposed to achieve this and the consequences of the model and its implications for the estimation of population means and covariance matrices are obtained. Furthermore, methods are suggested which can provide unbiased estimates of individual level parameters from aggregated data and so avoid the ecological fallacy. These methods rely on identifying the “grouping variables” that characterise the process that led to the population structure, or at least characterise the area differences. An estimate of the unit level covariance matrix of the grouping variables is required from some source. Data from the 1991 Census of the United Kingdom have been analysed to identify the important grouping variables and evaluate the effectiveness of the proposed adjustment methods for the estimation of covariance matrices and correlation coefficients. These results lead to a suggested strategy for the analysis of aggregated data.

    Release date: 1996-06-14

  • Articles and reports: 11F0019M1995084
    Geography: Canada
    Description:

    The objective of this paper is to introduce a new measure of the average duration of unemployment spells using Canadian data. The paper summarizes the work of Corak (1993) and Corak and Heisz (1994) on the average complete duration of unemployment in a non-technical way by focusing on the distinction between it and the average incomplete duration of unemployment, which is regularly released by Statistics Canada. It is pointed out that the latter is a lagging cyclical indicator. The average complete duration of unemployment is a more accurate indicator of prevailing labour market conditions, but some assumptions required in its derivation also imply that it lags actual developments.

    Release date: 1995-12-30

  • Articles and reports: 75F0002M1994004
    Description:

    This report describes major expected uses for the data from the Survey of Labour and Income Dynamics (SLID).

    Release date: 1995-12-30

  • Articles and reports: 75F0002M1995016
    Description:

    This paper examines the development of survey data files, or data modelling, for longitudinal surveys.

    Release date: 1995-12-30

  • Articles and reports: 12-001-X199500214394
    Description:

    In a 1992 National Test Census the mailing sequence of a prenotice letter, census form, reminder postcard, and replacement census form resulted in an overall mailback response of 63.4 percent. The response was substantially higher than the 49.2 percent response rate obtained in the 1986 National Content Test Census, which also utilized a replacement form mailing. Much of this difference appeared to be the result of the prenotice - census form - reminder sequence, but the extent to which each main effect and interactions contributed to overall response was not known. This paper reports results from the 1992 Census Implementation Test, a test of the individual and combined effectiveness of a prenotice letter, a stamped return envelope and a reminder postcard, on response rates. This was a national sample of households (n = 50,000) conducted in the fall of 1992. A factorial design was used to test all eight possible combinations of the main effects and interactions. Logistic regression and multiple comparisons were employed to analyze test results.

    Release date: 1995-12-15

  • Articles and reports: 12-001-X199500214396
    Description:

    We summarize some salient aspects of the theory of estimation functions for finite populations. In particular, we discuss the problem of estimation of means and totals and extend this theory to estimating functions. We then apply this estimating functions framework to the problem of estimating measures of income inequality. The resulting statistics are nonlinear functions of the observations. Some of them depend on the order of observations or quantiles. Consequently, the mean squared errors of these estimates are inexpressible by simple formulae and cannot be estimated by conventional variance estimation methods. We show that within the estimating function framework this problem can be resolved using the Taylor linearization method. Finally, we illustrate the proposed methodology using income data from the Canadian Survey of Consumer Finances and comparing it to the ‘delete-one-cluster’ jackknifing method.

    Release date: 1995-12-15

  • Articles and reports: 12-001-X199500114416
    Description:

    Stanley Warner was widely known for the creation of the randomized response technique for asking sensitive questions in surveys. Over almost two decades he also formulated and developed statistical methodology for another problem, that of deriving balanced information in advocacy settings so that both positions regarding a policy issue can be fairly and adequately represented. We review this work, including two survey applications implemented by Warner in which he applied the methodology, and we set the ideas into the context of current methodological thinking.

    Release date: 1995-06-15
Reference (26) (20 to 30 of 26 results)

  • Surveys and statistical programs – Documentation: 11F0019M2003207
    Geography: Canada
    Description:

    The estimation of intergenerational earnings mobility is rife with measurement problems since the researcher does not observe permanent, lifetime earnings. Nearly all studies make corrections for mean variation in earnings because of the age differences among respondents. Recent works employ average earnings or instrumental variable methods to address the effects of measurement error as a result of transitory earnings shocks and mis-reporting. However, empirical studies of intergenerational mobility have paid no attention to the changes in earnings variance across the life cycle suggested by economic models of human capital investment.

    Using information from the Intergenerational Income Data from Canada and the National Longitudinal Survey and Panel Study of Income Dynamics from the United States, this study finds a strong association between age at observation and estimated earnings persistence. Part of this age-dependence is related to a general increase in transitory earnings variance during the collection of data. An independent effect of life cycle investment is also identified. These findings are then applied to the variation among intergenerational earnings persistence studies. Among studies with similar methodologies, one-third of the variance in published estimates of earnings persistence is attributable to cross-study differences in the age of responding fathers. Finally, these results call into question tests for the importance of credit constraints based on measures of earnings at different points in the life cycle.

    Release date: 2003-08-05

  • Surveys and statistical programs – Documentation: 12-584-G
    Description:

    This book introduces technical aspects of the Statistics Canada Total Work Accounts System (TWAS). The TWAS is designed to facilitate the analysis of issues that require simultaneous consideration of both paid work and unpaid productive work. Its key contribution is to allocate the deemed output of each episode of unpaid work activity to a specific beneficiary or group of beneficiaries (called "destinations"). The guide presents the criteria used to decide the allocation of each work episode to one of the destinations, as well as the pseudo code for DESTIN, the key variable of the System. This pseudo code allows programmers to quickly create the actual programming code needed to derive the DESTIN variable in their own microdata files of diary-based time-use records. The guide also discusses illustrative applications of the System, as well as its key limitations.

    Release date: 2002-02-12

  • Notices and consultations: 87-003-X19970012882
    Geography: Canada
    Description:

    The purpose of this article is to inform Travel-log readers of the availability of a new analytical tool - the National Tourism Indicators. These estimates, which measure trends in tourism in Canada, are placed in perspective here, taking into account the concepts and definitions used in developing them.

    Release date: 1997-01-08

  • Surveys and statistical programs – Documentation: 11F0019M1995083
    Geography: Canada
    Description:

    This paper examines the robustness of a measure of the average complete duration of unemployment in Canada to a host of assumptions used in its derivation. In contrast to the average incomplete duration of unemployment, which is a lagging cyclical indicator, this statistic is a coincident indicator of the business cycle. The impact of using a steady-state as opposed to a non-steady-state assumption, as well as the impact of various corrections for response bias, are explored. It is concluded that a non-steady-state estimator would be a valuable complement to the statistics on unemployment duration that are currently released by many statistical agencies, and particularly Statistics Canada.

    Release date: 1995-12-30

  • Surveys and statistical programs – Documentation: 75F0002M1993014
    Description:

    This paper presents the results from test 3A of the Survey of Labour and Income Dynamics (SLID), conducted in January 1993, with a view to identifying any necessary changes to the questions or to the algorithm used to derive labour force status.

    Release date: 1995-12-30

  • Surveys and statistical programs – Documentation: 75F0002M1994018
    Description:

    This document describes the demographic, cultural and geographic derived variables for the Survey of Labour and Income Dynamics (SLID).

    Release date: 1995-12-30