Data analysis

Results

All (289) (0 to 10 of 289 results)

  • Articles and reports: 36-28-0001202600500003
    Description: This spotlight article outlines practical methods for assessing the economic impacts of public programs delivered by federal agencies and Crown corporations. It summarizes key steps in conducting quantitative impact analysis, including data linkage, cohort construction and implementation of quasi-causal estimators.
    Release date: 2026-05-27

  • Journals and periodicals: 11-633-X
    Description: Papers in this series provide background discussions of the methods used to develop data for economic, health, and social analytical studies at Statistics Canada. They are intended to provide readers with information on the statistical methods, standards and definitions used to develop databases for research purposes. All papers in this series have undergone peer and institutional review to ensure that they conform to Statistics Canada's mandate and adhere to generally accepted standards of good professional practice.
    Release date: 2026-05-27

  • Surveys and statistical programs – Documentation: 11-633-X2026001
    Description: This report defines key concepts related to area-level analysis and introduces area-level measures developed and utilized at Statistics Canada for health analysis. It also provides a decision-making framework and practical recommendations to help researchers select appropriate methods. The goal is to guide readers on when area-level analysis is appropriate and what type of area-level measure is suitable to achieve research objectives.
    Release date: 2026-03-05

  • Articles and reports: 12-001-X202500200004
    Description: The class of generalized linear models (GLM) is a flexible generalization of ordinary least squares regression that allows the linear model to be related to the response variable via a link function and assumes the magnitude of the variance of each measurement to be a function of its predicted value. Multicollinearity in GLMs can inflate variances of the estimated coefficients and cause poor prediction in certain regions of the regression space. It may also cause a nonsignificant Wald statistic even when the predictors are highly predictive in a model of the family of GLMs. Little previous research has closely investigated the diagnostics of multicollinearity in GLMs, especially when complex survey data are used. In this paper, we develop variance inflation factors (VIFs) that measure the amount that the variance of a parameter estimator is increased due to multicollinearity in GLMs. We also extend VIFs and condition indexes to apply to complex survey data, accounting for design features, e.g. weights, clusters, and strata. Illustrations of these methods are given using data from a household survey of health and nutrition.
    Release date: 2025-12-23

  • Stats in brief: 89-20-00062025001
    Description: This video is designed to help you critically assess the data presented to you. No data is perfect. By understanding the strengths and limitations of the data, you can avoid being misled—and make smarter, more informed decisions.
    Release date: 2025-12-15

  • Articles and reports: 11-522-X202500100010
    Description: Statistics Canada's Labour Force Survey (LFS) plays an essential role in the estimation of labour market conditions in Canada. Periodically, LFS revises its data to the most recent industry and occupational classification versions. Differences in versions can be extensive, including high-level and unit-group structural changes, creations, deletions, split-offs and combinations of classification units (classes). Historically, to reconcile split-off classes - where one class splits into multiple classes - a sample of LFS split-off records would be manually recoded to the new classification version. Based on the split-off proportion observed in the recoded sample, a random allocation method would be applied on all data to reflect the changing Canadian labour market over time. This article proposes using machine learning (fastText), constrained to split-off proportions using linear programming, to revise industry and occupation classifications in LFS. The hybrid framework benefits from a text-based revision mechanism while adhering to traditional proportion-driven estimates, thus ensuring a minimal impact on the comparability of published labour market indicators.
    Release date: 2025-09-08

  • Articles and reports: 36-28-0001202500300002
    Description: Government programs are evaluated to measure their effectiveness. This article discusses the benefits of using Statistics Canada data combined with the data collected from the government program to provide a far more comprehensive evaluation than program data alone can offer. The article also summarizes a recent example of a program evaluation that benefited from Statistics Canada data and the expertise of Statistics Canada researchers in analyzing the data.
    Release date: 2025-03-26

  • Articles and reports: 12-001-X202400200004
    Description: While we avoid specifying the parametric relationship between the study variable and covariates, we illustrate the advantage of including a spatial component to better account for the covariates in our models to make Bayesian predictive inference. We treat each unique covariate combination as an individual stratum, then we use small area estimation techniques to make inference about the finite population mean of the continuous response variable. The two spatial models used are the conditional autoregressive and simple conditional autoregressive models. We include the spatial effects by creating the adjacency matrix via the Mahalanobis distance between covariates. We also show how to incorporate survey weights into the spatial models when dealing with probability survey data. We compare the results of two non-spatial models including the Scott-Smith model and the Battese, Harter, and Fuller model to the spatial models. We illustrate the comparison between the aforementioned models with an application using BMI data from eight counties in California. Our goal is to have neighboring strata yield similar predictions, and to increase the difference between strata that are not neighbors. Ultimately, using the spatial models shows less global pooling compared to the non-spatial models, which was the desired outcome.
    Release date: 2024-12-20

  • Articles and reports: 12-001-X202400200005
    Description: Adaptive survey designs (ASDs) tailor recruitment protocols to population subgroups that are relevant to a survey. In recent years, effective ASD optimization has been the topic of research and several applications. However, the performance of an optimized ASD over time is sensitive to time changes in response propensities. How adaptation strategies can adjust to such variation over time is not yet fully understood. In this paper, we propose a robust optimization approach in the context of sequential mixed-mode surveys employing Bayesian analysis. The approach is formulated as a mathematical programming problem that explicitly accounts for uncertainty due to time change. ASD decisions can then be made by considering time-dependent variation in conditional mode response propensities and between-mode correlations in response propensities. The approach is demonstrated using a case study: the 2014-2017 Dutch Health Survey. We evaluate the sensitivity of ASD performance to 1) the budget level and 2) the length of applicable historic time-series data. We find there is only a moderate dependence on the budget level and the dependence on historic data is moderated by the amount of seasonality during the year.
    Release date: 2024-12-20

  • Surveys and statistical programs – Documentation: 11-633-X2024004
    Description: The Longitudinal Immigration Database (IMDB) is a comprehensive source of data that plays a key role in the understanding of the economic behaviour of immigrants. It is the only annual Canadian dataset that allows users to study the characteristics of immigrants to Canada at the time of admission and their economic outcomes and regional (inter-provincial) mobility over a time span of more than 40 years.
    Release date: 2024-12-09
Data (2) (2 results)

  • Data Visualization: 71-607-X2020010
    Description: The Canadian Statistical Geospatial Explorer empowers users to discover geo-enabled data holdings of Statistics Canada at various levels of geography, including at the neighbourhood level. Users are able to visualize, thematically map, spatially explore and analyze, export and consume data in various formats. Users can also view the data superimposed on satellite imagery, topographic and street layers.
    Release date: 2024-08-21

  • Data Visualization: 71-607-X2019010
    Description: The Housing Data Viewer is a visualization tool that allows users to explore Statistics Canada data on a map. Users can use the tool to navigate, compare and export data.
    Release date: 2019-10-30
Analysis (256) (200 to 210 of 256 results)

  • Articles and reports: 12-001-X20000025539
    Description:

    In this paper we will combine two applications of multilevel models. The multilevel model is suitable to analyze interviewer effects on survey data. It can also be used to analyze longitudinal - "repeated measurements" - data. We will analyze a data quality indicator of panel data that come from the Belgian Election Studies.

    Release date: 2001-02-28

  • Articles and reports: 11F0019M2001158
    Geography: Canada
    Description:

    Several recent papers have cited non-linearities in the relationship between incomes of parents and their children as evidence of important intergenerational credit constraints. This paper argues that any pattern in the conditional expectation function can be justified by a properly constructed story with credit constraints. This raises questions about the validity of the approach. Quantile regressions provide an alternative test. Using data from Canadian tax files, this paper finds results contrary to the credit constraints hypothesis; the non-linearities in the regression function are driven by the low-ability (unconstrained) sons rather than high-ability (presumably constrained) sons.

    Release date: 2001-01-30

  • Articles and reports: 53-222-X19980006587
    Description:

    The primary purpose of this article is to present new time series data and to demonstrate their analytical potential, not to provide a detailed analysis of these data. The analysis in section 5.2.4 will deal primarily with the trends of major variables dealing with domestic and transborder traffic.

    Release date: 2000-03-07

  • Low Income Cut-offs (Archived)
    Journals and periodicals: 13-551-X
    Description:

    Low income cut-offs (LICOs) are intended to convey the income level at which a family may be in straitened circumstances because it has to spend a greater portion of its income on the basics (food, clothing and shelter) than does the average family of similar size. The LICOs vary by family size and by size of community.

    This publication provides a brief explanation of how the LICOs are derived and updated annually. In addition, it provides on a historical basis, LICOs for different family sizes by size of area of residence. LICOs are calculated based on the spending patterns of families on basic 'necessities' - food, shelter and clothing - as collected from the Survey of Household Spending (formerly referred to as the Family Expenditure Survey (FAMEX)).

    Release date: 1999-12-10

  • Articles and reports: 12-001-X19990014714
    Description:

    In this paper a general multilevel model framework is used to provide estimates for small areas using survey data. This class of models allows for variation between areas because of: (i) differences in the distributions of unit-level variables between areas, (ii) differences in the distribution of area-level variables between areas, and (iii) area-specific components of variance which make provision for additional local variation which cannot be explained by unit-level or area-level covariates. Small area estimators are derived for this multilevel model formulation and an approximation to the mean square error (MSE) of each small area estimate for this general class of mixed models is provided together with an estimator of this MSE. Both the approximation to the MSE and the estimator of MSE take into account three sources of variation: (i) the prediction MSE assuming that both the fixed coefficients and the components of variance terms in the multilevel model are known, (ii) the additional component due to the fact that the fixed coefficients must be estimated, and (iii) the further component due to the fact that the components of variance in the model must be estimated. The proposed methods are estimated using a large data set as a basis for numerical investigation. The results confirm that the extra components of variance contained in multilevel models as well as small area covariates can improve small area estimates and that the MSE approximation and estimator are satisfactory.

    Release date: 1999-10-08

  • Articles and reports: 12-001-X19980024347
    Description:

    We review the current status of various aspects of the design and analysis of studies where the same units are investigated at several points in time. These studies include longitudinal surveys, and longitudinal analyses of retrospective studies and of administrative or census data. The major focus is the special problems posed by the longitudinal nature of the study. We discuss four of the major components of longitudinal studies in general; namely, Design, Implementation, Evaluation and Analysis. Each of these components requires special considerations when planning a longitudinal study. Some issues relating to the longitudinal nature of the studies are: concepts and definitions, frames, sampling, data collection, nonresponse treatment, imputation, estimation, data validation, data analysis and dissemination. Assuming familiarity with the basic requirements for conducting a cross-sectional survey, we highlight the issues and problems that become apparent for many longitudinal studies.

    Release date: 1999-01-14

  • Articles and reports: 12-001-X19980024348
    Description:

    Gross flows among labour force states are of great importance in understanding labour market dynamics. Observed flows are typically subject to classification errors, which may induce serious bias. In this paper, some of the most common strategies used to collect longitudinal information about labour force condition are reviewed, together with the modelling approaches developed to correct gross flows affected by classification errors. A general framework for estimating gross flows is outlined. Examples are given of different model specifications, applied to data collected with different strategies. Specifically, two cases are considered, i.e., gross flows from (i) the U.S. Survey of Income and Program Participation and (ii) the French Labour Force Survey, a yearly survey collecting retrospective monthly information.

    Release date: 1999-01-14

  • Articles and reports: 12-001-X19980024350
    Description:

    In longitudinal surveys, simple estimates of change, such as differences of percentages, may not always be efficient enough to detect changes of practical relevance, especially in sub-populations. The use of models, which can represent the dependence structure of the longitudinal survey, can help to solve this problem. One of the main characteristics observed by the Swiss Labour Force Survey (SLFS) is the employment status. As the survey is designed as a rotating panel, the data from the SLFS are multivariate categorical data, where a large proportion of the response profiles are missing by design. The multivariate logistic model, introduced by Glonek and McCullagh (1995) as a generalisation of logistic regression, is attractive in this context, since it allows for dependent repeated observations and incomplete response profiles. We show that, using multivariate logistic regression, we can represent the complex dependence structure of the SLFS by a small number of parameters, and obtain more efficient estimates of change.

    Release date: 1999-01-14

  • Articles and reports: 12-001-X19980024351
    Description:

    To calculate price indexes, data on "the same item" (actually a collection of items narrowly defined) must be collected across time periods. The question arises whether such "quasi-longitudinal" data can be modeled in such a way as to shed light on what a price index is. Leading thinkers on price indexes have questioned the feasibility of using statistical modeling at all for characterizing price indexes. This paper suggests a simple state space model of price data, yielding a consumer price index that is given in terms of the parameters of the model.

    Release date: 1999-01-14

  • Articles and reports: 12-001-X19980024353
    Description:

    This paper studies response errors in the Current Population Survey of the U.S. Bureau of the Census and assesses their impact on the unemployment rates published by the Bureau of Labor Statistics. The measurement of these error rates is obtained from reinterview data, using an extension of the Hui and Walter (1980) procedure for the evaluation of diagnostic tests. Unlike prior studies which assumed that the reconciled reinterview yields the true status, the method estimates the error rates in both interviews. Using these estimated error rates, we show that the misclassification in the original survey creates a cyclical effect on the reported estimated unemployment rates. In particular, the degree of underestimation increases when true unemployment is high. As there was insufficient data to distinguish between a model assuming that the misclassification rates are the same throughout the business cycle, and one that allows the error rates to differ in periods of low, moderate and high unemployment, our findings should be regarded as preliminary. Nonetheless, they indicate that the relationship between the models used to assess the accuracy of diagnostic tests, and those measuring misclassification rates of survey data, deserves further study.

    Release date: 1999-01-14
Reference (26) (10 to 20 of 26 results)

  • Notices and consultations: 75-513-X2014001
    Description:

    Starting with the 2012 reference year, annual individual and family income data is produced by the Canadian Income Survey (CIS). The CIS is a cross-sectional survey developed to provide information on the income and income sources of Canadians, along with their individual and household characteristics. The CIS reports on many of the same statistics as the Survey of Labour and Income Dynamics (SLID), which last reported on income for the 2011 reference year. This note describes the CIS methodology, as well as the main differences in survey objectives, methodology and questionnaires between CIS and SLID.

    Release date: 2014-12-10

  • Surveys and statistical programs – Documentation: 16-001-M2010014
    Description: Quantifying how Canada's water yield has changed over time is an important component of the water accounts maintained by Statistics Canada. This study evaluates the movement in the series of annual water yield estimates for Southern Canada from 1971 to 2004. We estimated the movement in the series using a trend-cycle approach and found that water yield for southern Canada has generally decreased over the period of observation.
    Release date: 2010-09-13

  • Surveys and statistical programs – Documentation: 11-533-X
    Description:

    This guide has been created especially for users needing a step-by-step review on how to find, read and use data, with quick tips on locating information on the Statistics Canada website. Originally published in paper format in the 1980s, revised as part of the 1994 Statistics Canada Catalogue, and then transformed into an electronic version, this guide is continually being updated to maintain its currency and usefulness.

    Release date: 2007-11-19

  • Surveys and statistical programs – Documentation: 81-595-M2007056
    Geography: Canada
    Description: This handbook discusses the collection and interpretation of statistical data on Canada's trade in culture services.
    Release date: 2007-10-31

  • Surveys and statistical programs – Documentation: 15-206-X2006004
    Description:

    This paper provides a brief description of the methodology currently used to produce the annual volume of hours worked consistent with the System of National Accounts (SNA). These data are used for labour input in the annual and quarterly measures of labour productivity, as well as in the annual measures of multifactor productivity. For this purpose, hours worked are broken down by educational level and age group, so that changes in the composition of the labour force can be taken into account. They are also used to calculate hourly compensation and the unit labour cost and for simulations of the SNA Input-Output Model; as such, they are integrated as labour force inputs into most SNA satellite accounts (i.e., environment, tourism).

    Release date: 2006-10-27

  • Surveys and statistical programs – Documentation: 62F0026M2005005
    Description:

    This discussion paper reviews the previous research into the subject of presenting historical time series and comparisons in constant dollars for the Survey of Household Spending (SHS), and its predecessor the Family Expenditure Survey (FAMEX). It examines two principal methods of converting spending data into constant dollars. The purpose of this discussion paper is to show interested parties how the two methods differ in complexity of implementation and interpretation.

    Release date: 2005-07-15

  • Notices and consultations: 12-002-X20050018033
    Description:

    Dr. J. Douglas Willms, and his staff at the Canadian Research Institute for Social Policy (CRISP) at the University of New Brunswick (Fredericton Campus), have developed a set of files for researchers interested in using Statistics Canada's National Longitudinal Survey of Children and Youth (NLSCY) data sets. "The Files" consist of SPSS data and syntax, which are intended to assist researchers in conducting more efficient longitudinal analyses, using NLSCY data.

    Release date: 2005-06-23

  • Surveys and statistical programs – Documentation: 62F0026M2005001
    Description:

    This paper provides some guidance to users on the use of medians and also gives some examples of situations when it can be a more appropriate measure than the average.

    Release date: 2005-05-17

  • Surveys and statistical programs – Documentation: 81-595-M2004020
    Geography: Canada
    Description:

    This article discusses the collection and interpretation of statistical data on Canada's trade in culture goods. It defines the products that are included in culture trade and explains how appropriate products are selected from the relevant classification standards.

    This version has been replaced by Culture Goods Trade Data User Guide, Catalogue No. 81-595-MIE2006040.

    Release date: 2004-07-28

  • Surveys and statistical programs – Documentation: 92-388-X
    Description:

    This report contains basic conceptual and data quality information to help users interpret and make use of census occupation data. It gives an overview of the collection, coding (to the 2001 National Occupational Classification), edit and imputation of the occupation data from the 2001 Census. The report describes procedural changes between the 2001 and earlier censuses, and provides an analysis of the quality level of the 2001 Census occupation data. Finally, it details the revision of the 1991 Standard Occupational Classification used in the 1991 and 1996 Censuses to the 2001 National Occupational Classification for Statistics used in 2001. The historical comparability of data coded to the two classifications is discussed. Appendices to the report include a table showing historical data for the 1991, 1996 and 2001 Censuses.

    Release date: 2004-07-15