Data analysis

Results

All (289) (0 to 10 of 289 results)

  • Articles and reports: 36-28-0001202600500003
    Description: This spotlight article outlines practical methods for assessing the economic impacts of public programs delivered by federal agencies and Crown corporations. It summarizes key steps in conducting quantitative impact analysis, including data linkage, cohort construction and implementation of quasi-causal estimators.
    Release date: 2026-05-27

  • Journals and periodicals: 11-633-X
    Description: Papers in this series provide background discussions of the methods used to develop data for economic, health, and social analytical studies at Statistics Canada. They are intended to provide readers with information on the statistical methods, standards and definitions used to develop databases for research purposes. All papers in this series have undergone peer and institutional review to ensure that they conform to Statistics Canada's mandate and adhere to generally accepted standards of good professional practice.
    Release date: 2026-05-27

  • Surveys and statistical programs – Documentation: 11-633-X2026001
    Description: This report defines key concepts related to area-level analysis and introduces area-level measures developed and utilized at Statistics Canada for health analysis. It also provides a decision-making framework and practical recommendations to help researchers select appropriate methods. The goal is to guide readers on when area-level analysis is appropriate and what type of area-level measure is suitable to achieve research objectives.
    Release date: 2026-03-05

  • Articles and reports: 12-001-X202500200004
    Description: The class of generalized linear models (GLM) is a flexible generalization of ordinary least squares regression that allows the linear model to be related to the response variable via a link function and assumes the magnitude of the variance of each measurement to be a function of its predicted value. Multicollinearity in GLMs can inflate variances of the estimated coefficients and cause poor prediction in certain regions of the regression space. It may also cause a nonsignificant Wald statistic even when the predictors are highly predictive in a model of the family of GLMs. Little previous research has closely investigated the diagnostics of multicollinearity in GLMs, especially when complex survey data are used. In this paper, we develop variance inflation factors (VIFs) that measure the amount that the variance of a parameter estimator is increased due to multicollinearity in GLMs. We also extend VIFs and condition indexes to apply to complex survey data, accounting for design features, e.g. weights, clusters, and strata. Illustrations of these methods are given using data from a household survey of health and nutrition.
    Release date: 2025-12-23

  • Stats in brief: 89-20-00062025001
    Description: This video is designed to help you critically assess the data presented to you. No data is perfect. By understanding the strengths and limitations of the data, you can avoid being misled and make smarter, more informed decisions.
    Release date: 2025-12-15

  • Articles and reports: 11-522-X202500100010
    Description: Statistics Canada's Labour Force Survey (LFS) plays an essential role in the estimation of labour market conditions in Canada. Periodically, LFS revises its data to the most recent industry and occupational classification versions. Differences in versions can be extensive, including high-level and unit-group structural changes, creations, deletions, split-offs and combination of classification units (classes). Historically, to reconcile split-off classes - where one class splits into multiple classes - a sample of LFS split-off records would be manually recoded to the new classification version. Based on the split-off proportion observed in the recoded sample, a random allocation method would be applied on all data to reflect the changing Canadian labour market over time. This article proposes using machine learning (fastText), constrained to split-off proportions using linear programming, to revise industry and occupation classifications in LFS. The hybrid framework benefits from a text-based revision mechanism while adhering to traditional proportion-driven estimates, thus ensuring a minimal impact on the comparability of published labour market indicators.
    Release date: 2025-09-08

  • Articles and reports: 36-28-0001202500300002
    Description: Government programs are evaluated to measure their effectiveness. This article discusses the benefits of using Statistics Canada data combined with the data collected from the government program to provide a far more comprehensive evaluation than program data alone can offer. The article also summarizes a recent example of a program evaluation that benefited from Statistics Canada data and the expertise of Statistics Canada researchers in analyzing the data.
    Release date: 2025-03-26

  • Articles and reports: 12-001-X202400200004
    Description: While we avoid specifying the parametric relationship between the study variable and covariates, we illustrate the advantage of including a spatial component to better account for the covariates in our models to make Bayesian predictive inference. We treat each unique covariate combination as an individual stratum, then we use small area estimation techniques to make inference about the finite population mean of the continuous response variable. The two spatial models used are the conditional autoregressive and simple conditional autoregressive models. We include the spatial effects by creating the adjacency matrix via the Mahalanobis distance between covariates. We also show how to incorporate survey weights into the spatial models when dealing with probability survey data. We compare the results of two non-spatial models including the Scott-Smith model and the Battese, Harter, and Fuller model to the spatial models. We illustrate the comparison between the aforementioned models with an application using BMI data from eight counties in California. Our goal is to have neighboring strata yield similar predictions, and to increase the difference between strata that are not neighbors. Ultimately, using the spatial models shows less global pooling compared to the non-spatial models, which was the desired outcome.
    Release date: 2024-12-20

  • Articles and reports: 12-001-X202400200005
    Description: Adaptive survey designs (ASDs) tailor recruitment protocols to population subgroups that are relevant to a survey. In recent years, effective ASD optimization has been the topic of research and several applications. However, the performance of an optimized ASD over time is sensitive to time changes in response propensities. How adaptation strategies can adjust to such variation over time is not yet fully understood. In this paper, we propose a robust optimization approach in the context of sequential mixed-mode surveys employing Bayesian analysis. The approach is formulated as a mathematical programming problem that explicitly accounts for uncertainty due to time change. ASD decisions can then be made by considering time-dependent variation in conditional mode response propensities and between-mode correlations in response propensities. The approach is demonstrated using a case study: the 2014-2017 Dutch Health Survey. We evaluate the sensitivity of ASD performance to 1) the budget level and 2) the length of applicable historic time-series data. We find there is only a moderate dependence on the budget level and the dependence on historic data is moderated by the amount of seasonality during the year.
    Release date: 2024-12-20

  • Surveys and statistical programs – Documentation: 11-633-X2024004
    Description: The Longitudinal Immigration Database (IMDB) is a comprehensive source of data that plays a key role in the understanding of the economic behaviour of immigrants. It is the only annual Canadian dataset that allows users to study the characteristics of immigrants to Canada at the time of admission and their economic outcomes and regional (inter-provincial) mobility over a time span of more than 40 years.
    Release date: 2024-12-09

Data (2) (2 results)

  • Data Visualization: 71-607-X2020010
    Description: The Canadian Statistical Geospatial Explorer empowers users to discover geo-enabled data holdings of Statistics Canada at various levels of geography including at the neighbourhood level. Users are able to visualize, thematically map, spatially explore and analyze, export and consume data in various formats. Users can also view the data superimposed on satellite imagery, topographic and street layers.
    Release date: 2024-08-21

  • Data Visualization: 71-607-X2019010
    Description: The Housing Data Viewer is a visualization tool that allows users to explore Statistics Canada data on a map. Users can use the tool to navigate, compare and export data.
    Release date: 2019-10-30

Analysis (256) (220 to 230 of 256 results)

  • Articles and reports: 12-001-X199400214421
    Description:

    This paper discusses testing a single hypothesis about linear regression coefficients based on sample survey data. It suggests that when the design-based linearization variance estimator for a regression coefficient is used it should be adjusted to reduce its slight model bias and that a Satterthwaite-like estimation of its effective degrees of freedom be made. A very important special case of this analysis is its application to domain means.

    Release date: 1994-12-15

  • Articles and reports: 11F0019M1994070
    Geography: Canada
    Description:

    This paper uses job turnover data to compare how job creation, job destruction and net job change differ for small and large establishments in the Canadian manufacturing sector. It uses several different techniques to correct for the regression-to-the-mean problem that, it has been suggested, might incorrectly lead to the conclusion that small establishments create a disproportionate number of new jobs. It finds that net job creation for smaller establishments is greater than that of large establishments after such changes are made. The paper also compares the importance of small and large establishments in the manufacturing sectors of Canada and the United States. The Canadian manufacturing sector is shown to have both a larger proportion of employment in smaller establishments and a small establishment sector that is growing in importance relative to that of the United States.

    Release date: 1994-11-16

  • Articles and reports: 12-001-X199300214452
    Description:

    Surveys across time can serve many objectives. The first half of the paper reviews the abilities of alternative survey designs across time - repeated surveys, panel surveys, rotating panel surveys and split panel surveys - to meet these objectives. The second half concentrates on panel surveys. It discusses the decisions that need to be made in designing a panel survey, the problems of wave nonresponse, time-in-sample bias and the seam effect, and some methods for the longitudinal analysis of panel survey data.

    Release date: 1993-12-15

  • Articles and reports: 12-001-X199300214460
    Description:

    Methods for estimating response bias in surveys require “unbiased” remeasurements for at least a subsample of observations. The usual estimator of response bias is the difference between the mean of the original observations and the mean of the unbiased observations. In this article, we explore a number of alternative estimators of response bias derived from a model prediction approach. The assumed sampling design is a stratified two-phase design implementing simple random sampling in each phase. We assume that the characteristic, y, is observed for each unit selected in phase 1 while the true value of the characteristic, μ, is obtained for each unit in the subsample selected at phase 2. We further assume that an auxiliary variable x is known for each unit in the phase 1 sample and that the population total of x is known. A number of models relating y, μ and x are assumed which yield alternative estimators of E(y - μ), the response bias. The estimators are evaluated using a bootstrap procedure for estimating variance, bias, and mean squared error. Our bootstrap procedure is an extension of the Bickel-Freedman single phase method to the case of a stratified two-phase design. As an illustration, the methodology is applied to data from the National Agricultural Statistics Service reinterview program. For these data, we show that the usual difference estimator is outperformed by the model-assisted estimator suggested by Särndal, Swensson and Wretman (1991), thus indicating that improvements over the traditional estimator are possible using the model prediction approach.

    Release date: 1993-12-15

  • Articles and reports: 12-001-X199300114476
    Description:

    This paper focuses on how to deal with record linkage errors when engaged in regression analysis. Recent work by Rubin and Belin (1991) and by Winkler and Thibaudeau (1991) provides the theory, computational algorithms, and software necessary for estimating matching probabilities. These advances allow us to update the work of Neter, Maynes, and Ramanathan (1965). Adjustment procedures are outlined and some successful simulations are described. Our results are preliminary and intended largely to stimulate further work.

    Release date: 1993-06-15

  • Articles and reports: 12-001-X199200214484
    Description:

    Maximum likelihood estimation from complex sample data requires additional modeling due to the information in the sample selection. Alternatively, pseudo maximum likelihood methods that consist of maximizing estimates of the census score function can be applied. In this article we review some of the approaches considered in the literature and compare them with a new approach derived from the ideas of ‘weighted distributions’. The focus of the comparisons is on situations where some or all of the design variables are unknown or misspecified. The results obtained for the new method are encouraging, but the study is limited so far to simple situations.

    Release date: 1992-12-15

  • Articles and reports: 12-001-X199200214487
    Description:

    This paper reviews the idea of robustness for randomisation and model-based inference for descriptive and analytic surveys. The lack of robustness for model-based procedures can be partially overcome by careful design. In this paper a robust model-based approach to analysis is proposed based on smoothing methods.

    Release date: 1992-12-15

  • Articles and reports: 12-001-X199200114494
    Description:

    This article presents a selected annotated bibliography of the literature on capture-recapture (dual system) estimation of population size, on extensions to the basic methodology, and the application of these techniques in the context of census undercount estimation.

    Release date: 1992-06-15

  • Articles and reports: 12-001-X199200114498
    Description:

    One way to assess the undercount at subnational levels (e.g. the state level) is to obtain sample data from a post-enumeration survey, and then smooth those data based on a linear model of explanatory variables. The relative importance of sampling-error variances to corresponding model-error variances determines the amount of smoothing. Maximum likelihood estimation can lead to oversmoothing, so making the assessment of undercount over-reliant on the linear model. Restricted maximum likelihood (REML) estimators do not suffer from this drawback. Empirical Bayes prediction of undercount based on REML will be presented in this article, and will be compared to maximum likelihood and a method of moments by both simulation and example. Large-sample distributional properties of the REML estimators allow accurate mean squared prediction errors of the REML-based smoothers to be computed.

    Release date: 1992-06-15

  • Articles and reports: 12-001-X199200114499
    Description:

    This paper reviews some of the arguments for and against adjusting the U.S. census of 1980, and the decision of the court.

    Release date: 1992-06-15

Reference (26) (10 to 20 of 26 results)

  • Notices and consultations: 75-513-X2014001
    Description:

    Starting with the 2012 reference year, annual individual and family income data is produced by the Canadian Income Survey (CIS). The CIS is a cross-sectional survey developed to provide information on the income and income sources of Canadians, along with their individual and household characteristics. The CIS reports on many of the same statistics as the Survey of Labour and Income Dynamics (SLID), which last reported on income for the 2011 reference year. This note describes the CIS methodology, as well as the main differences in survey objectives, methodology and questionnaires between CIS and SLID.

    Release date: 2014-12-10

  • Surveys and statistical programs – Documentation: 16-001-M2010014
    Description: Quantifying how Canada's water yield has changed over time is an important component of the water accounts maintained by Statistics Canada. This study evaluates the movement in the series of annual water yield estimates for Southern Canada from 1971 to 2004. We estimated the movement in the series using a trend-cycle approach and found that water yield for southern Canada has generally decreased over the period of observation.
    Release date: 2010-09-13

  • Surveys and statistical programs – Documentation: 11-533-X
    Description:

    This guide has been created especially for users needing a step-by-step review on how to find, read and use data, with quick tips on locating information on the Statistics Canada website. Originally published in paper format in the 1980s, revised as part of the 1994 Statistics Canada Catalogue, and then transformed into an electronic version, this guide is continually being updated to maintain its currency and usefulness.

    Release date: 2007-11-19

  • Surveys and statistical programs – Documentation: 81-595-M2007056
    Geography: Canada
    Description: This handbook discusses the collection and interpretation of statistical data on Canada's trade in culture services.
    Release date: 2007-10-31

  • Surveys and statistical programs – Documentation: 15-206-X2006004
    Description:

    This paper provides a brief description of the methodology currently used to produce the annual volume of hours worked consistent with the System of National Accounts (SNA). These data are used for labour input in the annual and quarterly measures of labour productivity, as well as in the annual measures of multifactor productivity. For this purpose, hours worked are broken down by educational level and age group, so that changes in the composition of the labour force can be taken into account. They are also used to calculate hourly compensation and the unit labour cost and for simulations of the SNA Input-Output Model; as such, they are integrated as labour force inputs into most SNA satellite accounts (i.e., environment, tourism).

    Release date: 2006-10-27

  • Surveys and statistical programs – Documentation: 62F0026M2005005
    Description:

    This discussion paper reviews the previous research into the subject of presenting historical time series and comparisons in constant dollars for the Survey of Household Spending (SHS), and its predecessor the Family Expenditure Survey (FAMEX). It examines two principal methods of converting spending data into constant dollars. The purpose of this discussion paper is to show interested parties how the two methods differ in complexity of implementation and interpretation.

    Release date: 2005-07-15

  • Notices and consultations: 12-002-X20050018033
    Description:

    Dr. J. Douglas Willms, and his staff at the Canadian Research Institute for Social Policy (CRISP) at the University of New Brunswick (Fredericton Campus), have developed a set of files for researchers interested in using Statistics Canada's National Longitudinal Survey of Children and Youth (NLSCY) data sets. "The Files" consist of SPSS data and syntax, which are intended to assist researchers in conducting more efficient longitudinal analyses, using NLSCY data.

    Release date: 2005-06-23

  • Surveys and statistical programs – Documentation: 62F0026M2005001
    Description:

    This paper provides some guidance to users on the use of medians and also gives some examples of situations when it can be a more appropriate measure than the average.

    Release date: 2005-05-17

  • Surveys and statistical programs – Documentation: 81-595-M2004020
    Geography: Canada
    Description:

    This article discusses the collection and interpretation of statistical data on Canada's trade in culture goods. It defines the products that are included in culture trade and explains how appropriate products are selected from the relevant classification standards.

    This version has been replaced by Culture Goods Trade Data User Guide, Catalogue No. 81-595-MIE2006040.

    Release date: 2004-07-28

  • Surveys and statistical programs – Documentation: 92-388-X
    Description:

    This report contains basic conceptual and data quality information to help users interpret and make use of census occupation data. It gives an overview of the collection, coding (to the 2001 National Occupational Classification), edit and imputation of the occupation data from the 2001 Census. The report describes procedural changes between the 2001 and earlier censuses, and provides an analysis of the quality level of the 2001 Census occupation data. Finally, it details the revision of the 1991 Standard Occupational Classification used in the 1991 and 1996 Censuses to the 2001 National Occupational Classification for Statistics used in 2001. The historical comparability of data coded to the two classifications is discussed. Appendices to the report include a table showing historical data for the 1991, 1996 and 2001 Censuses.

    Release date: 2004-07-15